[SPARK-18877][SQL][BACKPORT-2.1] CSVInferSchema.inferField on DecimalType should find a common type with typeSoFar #16463
Closed
dongjoon-hyun wants to merge 1 commit into
…alType should find a common type with `typeSoFar`
CSV type inference throws `IllegalArgumentException` on decimal numbers with heterogeneous precisions and scales because the current logic uses the last decimal type in a **partition**. Specifically, `inferRowType`, the **seqOp** of **aggregate**, returns the last decimal type. This PR fixes it to use `findTightestCommonType`.
**decimal.csv**
```
9.03E+12
1.19E+11
```
**BEFORE**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
|-- _c0: decimal(3,-9) (nullable = true)
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
16/12/16 14:32:49 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 3
```
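The "precision 4 exceeds max precision 3" message can be reproduced with plain `java.math.BigDecimal` arithmetic. The following is a minimal sketch outside Spark; `RescaleCheck` and `precisionAtScale` are hypothetical names introduced only for illustration.

```scala
import java.math.BigDecimal

// Hypothetical helper, not part of Spark: reports how many digits a CSV
// field needs once it is rescaled to a given decimal scale.
object RescaleCheck {
  def precisionAtScale(field: String, scale: Int): Int =
    // setScale(scale) throws ArithmeticException if rounding would be needed;
    // for the two sample values at scale -9 no rounding occurs.
    new BigDecimal(field).setScale(scale).precision
}
```

With the inferred scale of -9, `RescaleCheck.precisionAtScale("1.19E+11", -9)` is 3, but `RescaleCheck.precisionAtScale("9.03E+12", -9)` is 4, because the value becomes 9030 × 10^9. That fourth digit does not fit in `decimal(3,-9)`.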
**AFTER**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
|-- _c0: decimal(4,-9) (nullable = true)
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
+---------+
| _c0|
+---------+
|9.030E+12|
| 1.19E+11|
+---------+
```
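The difference between keeping the last decimal type and merging types can be sketched without Spark. This is an illustration under stated assumptions, not the patch itself: `Dec` is a hypothetical stand-in for `DecimalType`, `widen` is one plausible way to compute a common decimal type, and the sketch ignores any maximum-precision limit.

```scala
import java.math.BigDecimal

// Hypothetical stand-in for Spark's DecimalType; not the Spark class.
final case class Dec(precision: Int, scale: Int)

object DecimalWidening {
  // Type of a single field, taken from the parsed value.
  def typeOf(field: String): Dec = {
    val d = new BigDecimal(field)
    Dec(d.precision, d.scale)
  }

  // Common type of two decimals: keep the larger scale and the larger
  // count of digits to the left of the scale position.
  def widen(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val range = math.max(a.precision - a.scale, b.precision - b.scale)
    Dec(range + scale, scale)
  }

  // Old behaviour in this sketch: each row overwrites the type seen so far.
  def lastWins(rows: Seq[String]): Dec =
    rows.map(typeOf).reduceLeft((_, next) => next)

  // Fixed behaviour in this sketch: each row is merged with the type so far.
  def merged(rows: Seq[String]): Dec =
    rows.map(typeOf).reduceLeft(widen)
}
```

For the two rows of `decimal.csv`, `DecimalWidening.lastWins(Seq("9.03E+12", "1.19E+11"))` yields `Dec(3,-9)` and `DecimalWidening.merged(Seq("9.03E+12", "1.19E+11"))` yields `Dec(4,-9)`, matching the BEFORE and AFTER schemas.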
Pass the newly added test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #16320 from dongjoon-hyun/SPARK-18877.
Test build #70829 has finished for PR 16463 at commit
Hi, @gatorsmile.
jaceklaskowski approved these changes Jan 4, 2017
Thank you for the review, @jaceklaskowski.
asfgit pushed a commit that referenced this pull request Jan 4, 2017
…lType should find a common type with `typeSoFar`
## What changes were proposed in this pull request?
CSV type inference throws `IllegalArgumentException` on decimal numbers with heterogeneous precisions and scales because the current logic uses the last decimal type in a **partition**. Specifically, `inferRowType`, the **seqOp** of **aggregate**, returns the last decimal type. This PR fixes it to use `findTightestCommonType`.
**decimal.csv**
```
9.03E+12
1.19E+11
```
**BEFORE**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
|-- _c0: decimal(3,-9) (nullable = true)
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
16/12/16 14:32:49 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 3
```
**AFTER**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
|-- _c0: decimal(4,-9) (nullable = true)
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
+---------+
| _c0|
+---------+
|9.030E+12|
| 1.19E+11|
+---------+
```
## How was this patch tested?
Pass the newly added test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #16463 from dongjoon-hyun/SPARK-18877-BACKPORT-21.
LGTM. Thanks, merging to 2.1!
Could you please close it and open one for branch 2.0? Thanks!
Sure! Thank you, @gatorsmile.