GH-35576: [C++] Make Decimal{128,256}::FromReal more accurate #35997

pitrou merged 4 commits into apache:main from
Conversation
18155dc to f8a6443
@felipecrv @benibus Would either of you have time to review this?
westonpace left a comment:
I won't promise to follow the conversion algorithm entirely :) but this looks well tested and the bit shifting routines make sense.
cpp/src/arrow/util/basic_decimal.cc (outdated)
Is this going away because it is superseded by the version in GenericBasicDecimal?
cpp/src/arrow/util/decimal_test.cc (outdated)
Why remove these cases? We can still convert negative numbers, right? Isn't it just less precise?
Because CheckDecimalFromReal and CheckDecimalFromRealIntegerString now automatically deduce the negative test cases from the positive ones.
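As a hypothetical illustration of that deduction (this is Python with an assumed `decimal_from_real` stand-in, not the actual C++ test helpers), a helper that derives the negative case from the positive one might look like:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def decimal_from_real(value, precision, scale):
    # Stand-in for the real conversion: quantize the float's shortest
    # repr to `scale` fractional digits using exact decimal arithmetic.
    q = Decimal(1).scaleb(-scale)
    return str(Decimal(repr(value)).quantize(q, rounding=ROUND_HALF_EVEN))

def check_decimal_from_real(real, precision, scale, expected):
    # Exercise both the positive case and the automatically deduced
    # negative case, expecting the negated string for the latter.
    for sign in (1.0, -1.0):
        exp = expected if sign > 0 else "-" + expected
        assert decimal_from_real(sign * real, precision, scale) == exp

check_decimal_from_real(1234567890.0, 38, 2, "1234567890.00")
```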
python/pyarrow/tests/test_compute.py (outdated)
Is this a TODO (should we create an issue?) or is it a fact, in which case we can get rid of the XXX?
Not sure. I wouldn't create an issue for it as the effort-benefit ratio is probably low.
felipecrv left a comment:
I can't claim I went through all the details of the implementation, but it looks like the kind of code that doesn't have rarely-taken branches that would go unexercised by the unit tests. LGTM.
The original algorithm for real-to-decimal conversion did its computations in the floating-point domain, accumulating rounding errors, especially for large scale or precision values, such as:

```
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 11))
<pyarrow.lib.Decimal128Array object at 0x7f05f4a3f1c0>
[
  1234567889.99999995904
]
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 12))
<pyarrow.lib.Decimal128Array object at 0x7f05f494f9a0>
[
  1234567890.000000057344
]
```

The new algorithm strives to avoid precision loss by doing all its computations in the decimal domain. However, negative scales, which are presumably infrequent, fall back on the old algorithm.
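A minimal Python sketch of why float-domain scaling loses precision here (illustrative values from the example above, not Arrow's actual code):

```python
# Converting 1234567890.0 to decimal128(38, 11).
scale = 11
value = 1234567890.0

# Float domain: scaling by 10**scale as a double rounds the product,
# because 1234567890 * 10**11 needs more than 53 significand bits.
float_domain = int(value * 10.0**scale)

# Decimal domain: exact integer arithmetic keeps every digit.
decimal_domain = int(value) * 10**scale

print(float_domain)    # 123456788999999995904, i.e. 1234567889.99999995904
print(decimal_domain)  # 123456789000000000000, i.e. 1234567890.00000000000
```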
5f0abee to 87be93a
I think I have addressed all review comments. Would you like to take another look? @westonpace @felipecrv
Conbench analyzed the 7 benchmark runs on this commit. There were 35 benchmark results indicating a performance regression. The full Conbench report has more details.
```
return {
    // -- Stress the 24 bits of precision of a float
    // 2**63 + 2**40
    FromFloatTestParam{9.223373e+18f, 19, 0, "9223373136366403584"},
```
@pitrou Hi, the expected return value of FromFloatTestParam{5.76460752e13f, 18, 4, "57646075230342.3488"} is 57646073774080.0000, which seems different from the original value. Does that meet expectations?
There was a problem hiding this comment.
What do you call "the original value" in that context?

Both values are actually equal in float32:

```
>>> np.float32(5.76460752e13) == np.float32(57646073774080)
True
```
I see what you mean. Thanks!
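For reference, the float32 equality discussed above can also be checked without NumPy; a sketch using only the standard library (the spacing arithmetic is my own working, not from the PR):

```python
import struct

def to_float32(x):
    # Round a Python float to its nearest IEEE 754 binary32 value
    # by packing and unpacking it as a 32-bit float.
    return struct.unpack('f', struct.pack('f', x))[0]

# A float32 significand has 24 bits, so near 5.76e13 (between 2**45
# and 2**46) representable values are spaced 2**22 = 4194304 apart.
assert 57646073774080 == 13743895 * 2**22  # exactly representable

# Both literals from the discussion round to the same binary32 value.
print(to_float32(5.76460752e13) == to_float32(57646073774080.0))  # True
```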