[SPARK-11328] [SQL] Improve error message when hitting this issue. #9942
nongli wants to merge 4 commits into
add to whitelist
Test build #46619 has finished for PR 9942 at commit
We can remove this line; Scala will throw the uncaught exception anyway.
The issue is that the output committer is not idempotent, so retry attempts fail because the output file already exists. It is not safe to clean up the file, as this output committer is by design not retryable. Currently, the job fails with a confusing "file already exists" error. This patch is a stopgap that tells the user to look at the top of the error log for the proper message. This is difficult to test locally, as Spark is hardcoded not to retry. Manually verified by raising the number of retry attempts.
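The failure mode described above can be sketched without Spark. This is a minimal illustration, not the code from this patch: `TaskWriteException`, `NonRetryableWrite`, and the file name `part-00000` are hypothetical names, and `java.nio.file.Files.createFile` stands in for whatever the real writer does when it opens its output.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}

// Hypothetical stand-in for SparkException so the sketch runs without Spark.
final class TaskWriteException(message: String) extends RuntimeException(message)

object NonRetryableWrite {
  // Simulates one task attempt. Creating the output file fails if an earlier
  // attempt already created it, which is what makes a retry non-idempotent.
  def attempt(output: Path): Unit =
    try {
      Files.createFile(output)
    } catch {
      case _: FileAlreadyExistsException =>
        // The existing file is deliberately left in place (cleanup is not safe),
        // and the user is pointed at the first failure instead.
        throw new TaskWriteException(
          s"The output file $output already exists, most likely because an earlier " +
            "attempt of this task failed. Look at the earlier logs for the original error.")
    }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("non-retryable-write")
    val output = dir.resolve("part-00000")
    attempt(output) // first attempt creates the file
    try attempt(output) // the retry hits the file left behind
    catch { case e: TaskWriteException => println(e.getMessage) }
  }
}
```

Calling `attempt` twice on the same path reproduces the situation: the second call surfaces the hint rather than a bare file-exists error.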
earlier logs => earlier logs or stage page?
Oh, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L406 is another place where we create a writer.
Seems the JIRA number is not the right one?
Ah, it is actually https://issues.apache.org/jira/browse/SPARK-11328
Maybe it's a better idea to wrap the current exception instance in a SparkException.
Example: new SparkException("some error message", e)
I explicitly did not do that because that makes it really seem like the root cause is "file already exists". This error would be in the middle of a big stack trace. We can go either way though.
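The trade-off in this exchange can be sketched as follows. This is only an illustration under assumptions: `JobException` is a hypothetical stand-in for SparkException, and `WrapOrNot`, `wrapped`, `unwrapped`, and the hint text are invented for the example.

```scala
import java.nio.file.FileAlreadyExistsException

// Hypothetical stand-in for SparkException with a (message, cause) constructor
// and a message-only constructor.
class JobException(message: String, cause: Throwable)
    extends RuntimeException(message, cause) {
  def this(message: String) = this(message, null)
}

object WrapOrNot {
  val hint = "A previous task attempt failed; see the earlier logs for the original error."

  // Option suggested in review: keep the file-exists error as the cause.
  // It then shows up as a "Caused by:" section in the stack trace.
  def wrapped(e: Throwable): JobException = new JobException(hint, e)

  // Option taken by the author: drop the cause so that "file already exists"
  // does not read as the root cause. The argument is intentionally unused.
  def unwrapped(e: Throwable): JobException = new JobException(hint)

  def main(args: Array[String]): Unit = {
    val original = new FileAlreadyExistsException("part-00000")
    println(wrapped(original).getCause)
    println(unwrapped(original).getCause)
  }
}
```

Either way the user sees the same hint; the difference is only whether the file-exists error stays attached as the cause.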
d19b5c3 to c4375ec
Test build #46989 has finished for PR 9942 at commit