fix: coalesce should return correct datatype #168
Merged
viirya merged 2 commits into apache:main on Mar 5, 2024
viirya
commented
Mar 5, 2024
Comment on lines +37 to +48
test("coalesce should return correct datatype") {
  Seq(true, false).foreach { dictionaryEnabled =>
    withTempDir { dir =>
      val path = new Path(dir.toURI.toString, "test.parquet")
      makeParquetFileAllTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
      withParquetTable(path.toString, "tbl") {
        checkSparkAnswerAndOperator(
          "SELECT coalesce(cast(_18 as date), cast(_19 as date), _20) FROM tbl")
      }
    }
  }
}
Member
Author
Due to apache/datafusion#9458, the declared return type of DataFusion's coalesce function differs from the type of the array it actually outputs, so this test fails with:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2) (192.168.86.44 executor driver): org.apache.comet.CometNativeException: Arrow error: Invalid argument error: column types must match schema types, expected Utf8 but found Date32 at column index 0
at org.apache.comet.Native.executePlan(Native Method)
at org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:65)
at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:111)
at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:126)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
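To make the failure mode concrete, here is a minimal toy model of the kind of check that produces this message. It is a sketch only: `BatchCheckSketch`, `Column`, and `validateBatch` are hypothetical names, not Arrow's or Comet's API. It assumes the schema was built from the type coalesce declared (Utf8) while the array it returned was Date32.

```scala
// Toy model only: every name here is hypothetical, not Arrow's or Comet's API.
object BatchCheckSketch {
  sealed trait DataType
  case object Utf8 extends DataType
  case object Date32 extends DataType

  // A column is reduced to just the type of the array it carries.
  final case class Column(dataType: DataType)

  // Each column's actual type must equal the type the schema declares for that
  // position; the first mismatch is reported in the same shape as the error above.
  def validateBatch(schema: Seq[DataType], columns: Seq[Column]): Either[String, Unit] =
    schema.zip(columns).zipWithIndex.collectFirst {
      case ((expected, col), i) if col.dataType != expected =>
        s"column types must match schema types, expected $expected but found ${col.dataType} at column index $i"
    }.toLeft(())
}
```

In this model, `validateBatch(Seq(Utf8), Seq(Column(Date32)))` returns a `Left` carrying the mismatch message, while matching types return `Right(())`.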
viirya
commented
Mar 5, 2024
Comment on lines +1504 to +1506
// TODO: Remove this once we have new DataFusion release which includes
// the fix: https://github.com/apache/arrow-datafusion/pull/9459
castToProto(None, a.dataType, childExpr)
Member
Author
This is a temporary workaround until a new DataFusion release includes the fix: apache/datafusion#9459
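The idea behind the workaround can be sketched with a toy expression tree. This is not Comet's serde code: `Expr`, `ColumnRef`, `Coalesce`, `Cast`, and `serialize` are hypothetical stand-ins. It assumes that `castToProto(None, a.dataType, childExpr)` wraps the serialized coalesce in a cast to the type Spark declares for the expression, as its name and arguments suggest.

```scala
// Toy model only: these types are hypothetical stand-ins, not Comet's classes.
object CoalesceWorkaroundSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object DateType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class ColumnRef(name: String, dataType: DataType) extends Expr
  // dataType is the result type Spark declares for the coalesce.
  final case class Coalesce(children: Seq[Expr], dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr

  // Wrap every Coalesce in an explicit Cast to Spark's declared type, so the
  // plan's output type no longer relies on the native function's own
  // return-type inference.
  def serialize(expr: Expr): Expr = expr match {
    case c: Coalesce     => Cast(c.copy(children = c.children.map(serialize)), c.dataType)
    case Cast(child, dt) => Cast(serialize(child), dt)
    case other           => other
  }
}
```

For example, serializing `Coalesce(Seq(ColumnRef("_20", DateType)), DateType)` in this model yields the same coalesce wrapped in `Cast(_, DateType)`. Once the upstream fix is released, the wrapper becomes redundant and can be dropped, which is what the TODO in the diff records.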
Member
Author
cc @sunchao
Member
Author
Merged. Thanks.
Which issue does this PR close?
Closes #167.
Rationale for this change
What changes are included in this PR?
How are these changes tested?