ARROW-12657: [C++] Adding String hex to numeric conversion#11161
ARROW-12657: [C++] Adding String hex to numeric conversion#11161wmalpica wants to merge 4 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Add a test case for hex with negative sign (eg., -0x34) and another for multiple prefixed zeros (eg., 00000x7f).
There was a problem hiding this comment.
Those cases are explicitly not allowed. Do you mean that I should add them to CheckCastFails?
A negative before a hex does not make sense. If the value can be negative (signed integer) then you should use the binary representation of that negative number, for example for int8 0xFF is -1 and 0xF0 is -128.
For the case of multiple prefixed zeros, I opted for not allowing that to be allowed, since allowing it would make the logic more complex (probably slower) and it seems like it would be a very odd way for a hex string to be formatted.
There was a problem hiding this comment.
Ohh, sorry...I meant as failing test cases. It is just a way to ensure more code coverage in tests.
cpp/src/arrow/util/value_parsing.h
Outdated
There was a problem hiding this comment.
Simplify to return ARROW_PREDICT_TRUE(ParseHex(s, length, out));.
|
LGTM. I added some final minor suggestions. |
cpp/src/arrow/util/value_parsing.h
Outdated
There was a problem hiding this comment.
A lookup table approach may have better performance.
We can leave it as a followup task.
There was a problem hiding this comment.
It might be worth investigating, but I would request we leave it as followup task.
e0e3f9f to
18e8e96
Compare
|
The LZ4 build failures on MinGW are unrelated. |
This PR adds the ability for the StringConverter to parse hex strings of the form: 0x123abc to an integer. The strings must start with "0x" case insensitive. The values ABCDEF are case insensitive. Note that there was already a Hex Parsing function here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/string.cc#L76 Which was not really flexible enough for what was needed. Not sure it it would be best to replace it with the new functionality added by this PR. I am open to suggestions This work was originally done here apache#11064 but had to create a new PR due to a rebase issue Closes apache#11161 from wmalpica/ARROW-12657_3 Lead-authored-by: William Malpica <16705032+wmalpica@users.noreply.github.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
This PR adds the ability for the StringConverter to parse hex strings of the form:
0x123abc to an integer.
The strings must start with "0x" case insensitive.
The values ABCDEF are case insensitive.
Note that there was already a Hex Parsing function here:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/string.cc#L76
Which was not really flexible enough for what was needed.
Not sure it it would be best to replace it with the new functionality added by this PR. I am open to suggestions
This work was originally done here #11064 but had to create a new PR due to a rebase issue