ARROW-12948: [C++][Python] Add slice_replace kernel#10494
ARROW-12948: [C++][Python] Add slice_replace kernel#10494lidavidm wants to merge 3 commits intoapache:masterfrom
Conversation
|
Needs rebasing onto #10496. |
pitrou
left a comment
There was a problem hiding this comment.
Thank you, this looks good on the principle.
There was a problem hiding this comment.
Should probably be called binary_replace_slice since it works on non-Ascii input as well (it just slices in byte units, not codeunits).
cpp/src/arrow/compute/api_scalar.h
Outdated
There was a problem hiding this comment.
Hmm... I'm not sure the default values will be picked up. Is it just for documentation?
There was a problem hiding this comment.
opts.start can be negative, so you should perhaps use before_slice instead.
There was a problem hiding this comment.
Good catch. I added a test case for this too.
There was a problem hiding this comment.
This looks a bit pedantic. Just output - output_start?
python/pyarrow/tests/test_compute.py
Outdated
There was a problem hiding this comment.
It's not really useful to re-write the same tests in Python. What you could do is generate slices as in test_slice_compatibility to check Pandas compatibility.
docs/source/cpp/compute.rst
Outdated
| +--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+ | ||
| | utf8_lower | Unary | String-like | String-like | \(8) | | | ||
| +--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+ | ||
| | utf8_replace_slice | Unary | String-like | String-like | \(2) | :struct:`ReplaceSliceOptions` | |
There was a problem hiding this comment.
I believe the note number should be (4)?
This adds a slice_replace kernel mimicking Pandas's str.slice_replace. There are both ascii and UTF8 variants, indexing respectively with bytes and codepoints. The ascii variant also works on binary arrays.