ARROW-12714: [C++] String title case kernel #10869
edponce wants to merge 23 commits into apache:master from
213d532 to 2a5d59f
d83bc17 to 5a02758
@edponce Do you know when this will be ready for review? Or do you need help on this?
@pitrou I am working on completing this PR today and would greatly appreciate your review.
5a02758 to 6d23d4b
The capitalize and title kernels are the first vector string kernels that perform code point transforms. These code point transforms (case changes) can increase the number of bytes in a string, and thus required the use of
cc @pitrou
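To illustrate why a case change cannot be assumed to preserve byte length: in UTF-8, a code point and its case-mapped counterpart may need a different number of bytes. The sketch below is a standalone illustration, not Arrow code; `Utf8Length` and `EncodedSize` are hypothetical helpers, and the example mappings come from the Unicode Character Database.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of bytes UTF-8 uses to encode a code point
// (assumes cp is a valid Unicode scalar value).
int Utf8Length(uint32_t cp) {
  assert(cp <= 0x10FFFF);
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

// Total UTF-8 size of a sequence of code points, e.g. a string before or
// after a case mapping has been applied to each code point.
int64_t EncodedSize(const std::vector<uint32_t>& codepoints) {
  int64_t total = 0;
  for (uint32_t cp : codepoints) total += Utf8Length(cp);
  return total;
}

// Example simple uppercase mappings:
//   U+023F LATIN SMALL LETTER S WITH SWASH TAIL -> U+2C7E (2 bytes -> 3 bytes)
//   U+0131 LATIN SMALL LETTER DOTLESS I         -> U+0049 'I' (2 bytes -> 1 byte)
```

So an uppercased UTF-8 string can be longer or shorter than its input, which is why a code point transform kernel cannot simply reuse the input's byte size when allocating its output.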
78911fd to 02957f2
@ianmcook Could you revise the R binding for the titlecase kernel?
0fff885 to 78bd427
pitrou left a comment:
LGTM, just a couple more questions / comments
78bd427 to 5955c4e
pitrou left a comment:
Just one question, you may or may not want to act on it.
However, can you fix the lint failure? `archery lint --clang-format` should do it.
Thank you very much, @edponce!
This PR adds scalar string compute functions for titlecasing a string, namely "ascii_title" and "utf8_title". Simple titlecasing is performed: only cased characters that follow an uncased character are uppercased.
Additional changes included with this PR are:
* restructure StringTransformCodepointXXX classes to support vector string kernels using codepoint transforms
* update capitalize kernels
Closes apache#10869 from edponce/ARROW-12714-String-title-case-kernel
Authored-by: Eduardo Ponce <edponce00@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
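The titlecasing rule described above can be sketched for ASCII input as follows. This is a hypothetical standalone function (`AsciiTitleSketch`), not Arrow's kernel. It assumes, as Python's `str.title` does, that cased characters which do not start a word are lowercased; the description above only states which characters are uppercased.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of ASCII titlecasing; not Arrow's implementation.
// A cased character (A-Z, a-z) that follows an uncased character (or starts
// the string) is uppercased; any other cased character is lowercased
// (assumption borrowed from Python's str.title). All other bytes, including
// non-ASCII bytes, pass through unchanged.
std::string AsciiTitleSketch(const std::string& input) {
  std::string output;
  output.reserve(input.size());  // ASCII case changes never alter byte length
  bool previous_is_cased = false;
  for (char ch : input) {
    const unsigned char c = static_cast<unsigned char>(ch);
    const bool is_upper = c >= 'A' && c <= 'Z';
    const bool is_lower = c >= 'a' && c <= 'z';
    const bool is_cased = is_upper || is_lower;
    if (is_lower && !previous_is_cased) {
      output.push_back(static_cast<char>(c - 'a' + 'A'));  // start of a word
    } else if (is_upper && previous_is_cased) {
      output.push_back(static_cast<char>(c - 'A' + 'a'));  // inside a word
    } else {
      output.push_back(ch);  // already in the right case, or uncased
    }
    previous_is_cased = is_cased;
  }
  assert(output.size() == input.size());
  return output;
}
```

For example, `AsciiTitleSketch("they're bill's friends")` yields `"They'Re Bill'S Friends"`: the apostrophe is uncased, so the letter after it counts as following an uncased character. Unlike the UTF-8 variant, the ASCII variant can write its output into a buffer of exactly the input's size.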
This PR adds scalar string compute functions for titlecasing a string, namely "ascii_title" and "utf8_title". Simple titlecasing is performed: only cased characters that follow an uncased character are uppercased.
Additional changes included with this PR are: