ARROW-10640: [C++] An "if_else" kernel to combine two arrays based on a mask#10410
ARROW-10640: [C++] An "if_else" kernel to combine two arrays based on a mask#10410nirandaperera wants to merge 39 commits intoapache:masterfrom
Conversation
|
@bkietz I think the bitmap ops approach is simpler than the bitmap visitor approach. WDYT? bitmap visitor - |
lidavidm
left a comment
There was a problem hiding this comment.
This looks good! I left some general comments.
As for visitor vs bitmap ops, I think the visitor-based one is fine. It does break down the cases nicely, even if it's a bit verbose.
There was a problem hiding this comment.
AllocateBitmap doesn't zero the allocation
arrow/cpp/src/arrow/compute/kernel.h
Lines 61 to 64 in 1769888
There was a problem hiding this comment.
I think there is a mismatch in the kernel comment and the impl.
arrow/cpp/src/arrow/compute/kernel.cc
Line 56 in 29130ca
It looks like it is zeroed out for bitmaps
There was a problem hiding this comment.
Hmm, we should probably update comment then. CC @bkietz given KernelContext::AllocateBitmap has to zero the allocation to satisfy Valgrind et al, should we just go ahead and guarantee that it returns a pre-cleared bitmap?
There was a problem hiding this comment.
I'd prefer to leave this for a follow up where we can try to zero only the final byte (as described by the comment) and see if Valgrind is appeased. If not, the comment can be updated
There was a problem hiding this comment.
You could just inline the memset call here - no need to put a helper in a header IMO unless we're going to use it a lot.
There was a problem hiding this comment.
Please inline the call to memset here
|
@github-actions autotune |
There was a problem hiding this comment.
Please inline the call to memset here
|
I did a quick comparison with |
We may be able to further improve this if we directly use vector operations inside the kernel (I haven't checked the compiled code yet, may be compiler does that already), because if-else use case directly map to |
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
6208b36 to
4dbe076
Compare
bkietz
left a comment
There was a problem hiding this comment.
Excellent work, thanks for doing this!
|
It looks like there's an actual build failure on CentOS/GCC4.8 (expand "Dump install logs" to get the actual failure): Details |
Ah! thanks @lidavidm I missed this... it seems to be a gcc 4.x thing, isn't it. BTW @bkietz should I add string type support to this as well, or can I leave it for later development? |
I think it is important to add support for strings as well, but it's probably a good idea to leave that for a follow-up PR so we can get this merged? (I can imagine the implementation to be a bit different / more complex for variable length array elements) |
|
I agree, and also Niranda already filed ARROW-12955 to take care of that followup. |
lidavidm
left a comment
There was a problem hiding this comment.
Looks like that fixed the builds, and the remaining failures are not relevant.
Adding a preliminary impl for an
if_else(cond: Datum, left: Datum, right: Datum)function. It works as follows,nullvalues will be promoted to the output.