ARROW-12946: [C++] String swap case kernel #10855
Christian8491 wants to merge 9 commits into apache:master from Christian8491:ARROW-12946-String-swap-case-kernel
|
LGTM. Need to add implementation for UTF8. |
pitrou
left a comment
Thanks for doing this! Just a couple comments.
|
@lidavidm please enable CI |
|
For CI, it appears that Ubuntu 20.04 provides utf8proc 2.5, but the isupper/islower functions are not provided until 2.6: https://juliastrings.github.io/utf8proc/releases/ Meanwhile, RTools35 is using utf8proc 2.4; it builds utf8proc from source due to some other issue, but judging by the compiler command, I believe it still uses the system headers, since they come before the self-built utf8proc headers. I think we will need to bump the minimum utf8proc version and adjust the build flags so that the self-built utf8proc headers take precedence. |
|
An alternative solution is to use the helper functions that are already available; see https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1391-L1408 Related notes in Arrow regarding lower/upper casing of UTF8 can be found in: |
Yeah the logs say which doesn't make sense. |
|
As @edponce suggested, I replaced some |
|
Some tests invoke the incorrect kernel (ASCII test uses |
This PR adds a swapcase compute kernel for strings. It is similar to Python's str.swapcase().
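
To illustrate the intended behavior for the ASCII case, here is a minimal standalone sketch. It is not the kernel code from this PR: the names SwapCaseAsciiChar and SwapCaseAscii are hypothetical, and the sketch operates on std::string rather than on Arrow arrays. It only shows the per-byte rule, assuming that bytes outside A-Z and a-z should be left untouched.

```cpp
#include <string>

// Hypothetical helper (not part of Arrow): swap the case of a single ASCII byte.
inline char SwapCaseAsciiChar(char c) {
  if (c >= 'a' && c <= 'z') return static_cast<char>(c - 'a' + 'A');
  if (c >= 'A' && c <= 'Z') return static_cast<char>(c - 'A' + 'a');
  return c;  // digits, punctuation, and non-ASCII bytes pass through unchanged
}

// Hypothetical helper (not part of Arrow): apply the per-byte rule to a whole string.
inline std::string SwapCaseAscii(const std::string& input) {
  std::string output(input);
  for (char& c : output) {
    c = SwapCaseAsciiChar(c);
  }
  return output;
}
```

With this sketch, SwapCaseAscii("Hello World 123") yields "hELLO wORLD 123", matching what Python's str.swapcase() returns for the same ASCII input. A UTF8 variant would need to decode codepoints and use Unicode case mapping instead of byte arithmetic.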