MINOR: [C++][Parquet] Rephrase decimal annotation#33694
MINOR: [C++][Parquet] Rephrase decimal annotation#33694wjones127 merged 4 commits intoapache:masterfrom
Conversation
|
@pitrou What about this? |
0c9e284 to
bc94680
Compare
bc94680 to
08f5d60
Compare
|
Breadcrumb: These changes are in response to comments: |
wjones127
left a comment
There was a problem hiding this comment.
Thanks for doing this. I've added several suggestions :)
cpp/src/parquet/properties.h
Outdated
| /// | ||
| /// According to the specs, DECIMAL can be used to annotate the following types: | ||
| /// - int32: for 1 <= precision <= 9. | ||
| /// - int64: for 1 <= precision <= 18; precision < 10 will produce a warning. |
There was a problem hiding this comment.
"precision < 10 will produce a warning." is this true of our implementation? Maybe we should skip saying that.
There was a problem hiding this comment.
This is copied from the specs. In our impl no warning is produced because we have fixed physical types for decimals and guarantee that precision < 10 will not use int64 type.
There was a problem hiding this comment.
Then perhaps we should change to this? So that we document our libraries behavior, and not just what the spec says?
| /// - int64: for 1 <= precision <= 18; precision < 10 will produce a warning. | |
| /// - int64: for 10 <= precision <= 18. |
There was a problem hiding this comment.
Once this is done, I think this is ready to be merged :)
There was a problem hiding this comment.
Fixed. Please check again.
cpp/src/parquet/properties.h
Outdated
| /// As a consequence, decimal columns stored in integer types are more compact | ||
| /// but in a risk that the parquet file may not be readable by previous Arrow C++ | ||
| /// versions or other implementations. |
There was a problem hiding this comment.
| /// As a consequence, decimal columns stored in integer types are more compact | |
| /// but in a risk that the parquet file may not be readable by previous Arrow C++ | |
| /// versions or other implementations. | |
| /// As a consequence, decimal columns stored in integer types are more compact. |
I'm not sure it's worth saying that older versions don't support them. In Arrow C++, read support for these was added at the same time for the other physical types for decimals (original pr). And that was a long time ago, before version 0.10.0. I'm sure it was similarly well supported in parquet-mr.
There was a problem hiding this comment.
I have checked that both apache/parquet-mr and apache/impala support reading this long ago. It seems that the native parquet reader in the velox does not handle this yet. I'm not familiar with other implementations but there may be some do not support it.
IMHO, we cannot take care of all other implementations. There is no risk to the parquet-cpp itself because the reader support is already in and it is well documented by the parquet specs. So I have removed the comment about the risk. @wjones127 @pitrou
|
I found that this feature is already |
I'm afraid it might be too late for that, since the release vote has already passed. So this will have to be a minor breaking change in the 12.0.0 release. |
|
Benchmark runs are scheduled for baseline = b7fd793 and contender = 068f499. 068f499 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
What changes are included in this PR?
enable_short_decimal_stored_as_integer.