ARROW-10306: [C++] Add string replacement kernel#8468
ARROW-10306: [C++] Add string replacement kernel#8468maartenbreddels wants to merge 3 commits intoapache:masterfrom
Conversation
|
You may want to remove the regex variant of this if you want to move this forward without depending on resolving the re2 dependency issue. |
|
@maartenbreddels would it be practical to split this into two PRs? (one for plain replace, other for re2-based replace) Or would you prefer first to have the re2 dependency issue resolved? (ARROW-10541) |
c55f9ec to
1099f4e
Compare
6f41599 to
e1c0571
Compare
|
I'd rather keep this 1 PR, looks like #8756 is working |
|
@pitrou this is ready for review, failure seems unrelated (minio on windows). |
There was a problem hiding this comment.
ReplaceString below is basically independent from Type, but using this idiom may compile it twice. Can you find another way to parametrize the kernel?
(hint: perhaps use composition rather than inheritance)
There was a problem hiding this comment.
Maybe I misunderstand, but via offset_type we are not independent of Type right?
There was a problem hiding this comment.
Well, I don't understand why offset_type is being used here. ValueDataBuilder is basically a TypedBufferBuilder<uint8_t>, it's used for building the string data, it doesn't deal with string offsets.
There was a problem hiding this comment.
Ok, yes, I see it now. I'm using offset_type in this class as well, which I shouldn't, I think that's what led me to this. This requires a bit of refactoring.
There was a problem hiding this comment.
This pattern occurs more often in the file, I didn't realize this lead to slower compilation and probably larger binary sizes. I think it requires a refactor that is larger than this PR. Also, I won't have the time currently to do this. Can we merge this as is, and I'll open a Jira issue?
There was a problem hiding this comment.
Well, you don't need to refactor other kernels for now, but I suppose this one could easily be adapted, no? :-)
There was a problem hiding this comment.
Similarly as above, this looks basically independent from Type.
There was a problem hiding this comment.
Note that the GlobalReplace loop works a bit differently, it calls Match then Rewrite, avoiding the duplicate matching calls. Not sure it's worth optimizing this, though:
https://github.com/google/re2/blob/master/re2/re2.cc#L427
There was a problem hiding this comment.
Good idea, I prefer to keep it as it is, I left a comment in the code so this doesn't get lost.
e1c0571 to
60d6589
Compare
3f8aa29 to
bd5e8fe
Compare
|
@maartenbreddels Is it ready for review again? Feel free to ping me. |
|
@pitrou Yes, apart from an unanswered question this is ready for review 👍 |
d4608a9 to
356c300
Compare
bd5e8fe to
436a9e2
Compare
|
@pitrou this is ready for review (assuming you agree with the above plan of doing a refactor later on) |
|
(gentle ping here, would really like to see those PRs merged for 4.0 in April!) |
|
I'll rebase and update this PR. |
|
Hmm, there's a re2 link error on RTools 3.5: |
|
Haven't seen that before, and we've been building with re2 for months now. Maybe this is the first time we're building something that actually uses re2? It's possible that re2 needs a "backport" library built with the rtools3.5 toolchain; an immediate workaround could be to build with |
|
@github-actions crossbow submit -g r |
|
Revision: 82bb60d Submitted crossbow builds: ursacomputing/crossbow @ actions-233 |
|
The as-cran failure is legit: https://github.com/ursacomputing/crossbow/runs/2197449739?check_suite_focus=true#step:7:210 It's a weird build setup that uses |
|
I made ARROW-12094 for fixing that build. Merging now. |
Two new kernels