[SPARK-47602][CORE] Resource managers: Migrate logError with variables to structured logging framework#45808
[SPARK-47602][CORE] Resource managers: Migrate logError with variables to structured logging framework#45808panbingkun wants to merge 5 commits into
Conversation
…s to structured logging framework
| * part of the ThreadContext. | ||
| */ | ||
| case class MDC(key: LogKey.Value, value: String) | ||
| case class MDC(key: LogKey, value: Any) |
There was a problem hiding this comment.
Make the type of value from String to Any, so that when using MDC, some code can be omitted, eg: String.valueOf(...) , x.toString
|
|
||
| def expectedPatternForBasicMsg(level: Level): String | ||
|
|
||
| def expectedPatternForBasicMsgWithException(level: Level): String |
There was a problem hiding this comment.
Add a new test for basicMsg + Exception
There was a problem hiding this comment.
Nit: Let's add comments about each pattern is about
| */ | ||
| object LogKey extends Enumeration { | ||
| val EXECUTOR_ID, MIN_SIZE, MAX_SIZE = Value | ||
| val APPLICATION_ID, APPLICATION_STATE, BUCKET, CONTAINER_ID, EXECUTOR_ID, POD_ID = Value |
There was a problem hiding this comment.
Each enum can take one line.
val APPLICATION_ID = Value
val APPLICATION_STATE = Value
| val MIN_SIZE, MAX_SIZE, MAX_EXECUTOR_FAILURES = Value | ||
| val EXIT_CODE = Value | ||
|
|
||
| type LogKey = Value |
There was a problem hiding this comment.
Because I want to use 'xxx: LogKey' instead of 'xxx: LogKey.Value' to define a type, it feels like it's a personal preference. Should we restore it?
|
|
||
| test("check MDC message") { | ||
| val log = log"This is a log, exitcode ${MDC(EXIT_CODE, 10086)}" | ||
| assert(log.message === "This is a log, exitcode 10086") |
There was a problem hiding this comment.
Let's verify the returned map as well
| import org.apache.spark.deploy.k8s.KubernetesConf | ||
| import org.apache.spark.deploy.k8s.KubernetesUtils.addOwnerReference | ||
| import org.apache.spark.internal.Logging | ||
| import org.apache.spark.internal.{Logging, LogKey => LK, MDC} |
There was a problem hiding this comment.
Why alias as LK? LogKey is fine
|
@panbingkun Thanks for the work. LGTM except some minor comments. |
|
I will update |
Done. |
| } else { | ||
| logError(s"Driver terminated with exit code ${exitCode}! Shutting down. $remoteAddress") | ||
| logError(log"Driver terminated with exit code ${MDC(EXIT_CODE, exitCode)}! " + | ||
| log"Shutting down. ${MDC(REMOTE_ADDRESS, remoteAddress)}") |
There was a problem hiding this comment.
@gengliangwang
Perhaps it would be more reasonable to write this in this way
logError(log"Driver terminated with exit code ${MDC(EXIT_CODE, exitCode)}! " + log"Shutting down. $remoteAddress")
Not every variable needs to be recorded in the field content of JSON
Unfortunately, we currently do not support this syntax as above, it will fail to compile.
So I submitted a separate PR to address this issue: #45813
There was a problem hiding this comment.
@panbingkun Let's put all the variables into the context during the migration. If the context is too verbose, we can configure log4j to only show a subset of keys later.
Otherwise, there will be back-and-forth during the migration. remoteAddress can be helpful for debugging here.
|
Thanks, merging to master |
| case e: Exception => | ||
| logError(s"Cannot get the creationTimestamp of the pod: ${state.pod}", e) | ||
| logError(log"Cannot get the creationTimestamp of the pod: " + | ||
| log"${MDC(LogKey.POD_ID, state.pod)}", e) |
There was a problem hiding this comment.
IMO POD_ID is not a suitable key here. state.pod actually outputs a whole Pod Spec, and there is no such a "pod id" concept in the K8s, in practice, we use namespace and pod name to identify a pod.
There was a problem hiding this comment.
@pan3793 Thanks for the suggestion. Do you mind creating a PR to improve this?
There was a problem hiding this comment.
I would like to have a try, it may take a few hours to learn this new log framework.
There was a problem hiding this comment.
The inital PR of the framework is in #45729. Thank you in advance!
There was a problem hiding this comment.
@pan3793 Thank you very much for correcting this issue.
What changes were proposed in this pull request?
The pr aims to:
logErrorin moduleResource managerswith variables tostructured logging framework.*LoggingSuiteMDCSuitetypeofvaluefromStringtoAny( so that when usingMDC, some code can beomitted, eg:String.valueOf(...),x.toString)Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.