ARROW-10540: [Rust] Extended filter kernel to all types and improved performance #8954
Closed
jorgecarleitao wants to merge 7 commits into apache:master from jorgecarleitao:mutable_filter2
Contributor
Amazing, great work! Here is my benchmark result on a 2018 MacBook Pro:
Member
Author
I am moving this to draft because I do not believe the 200x speedup. It looks too good to be true.
This PR improves the filter kernel, which now builds on MutableArrayData.
There are two novel ideas here:
It minimizes the number of memcopies when building the filtered array, both for single-filter and multi-filter operations.
For single-filter operations, it uses an iterator to create the new array on the fly. For multi-filter operations, it persists the iterator's result in a vector and iterates over that vector once per array.
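The two ideas above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it uses a plain `&[bool]` mask and `Vec<T>` columns instead of Arrow arrays and bitmaps, and the names `SliceIter`, `slices`, `filter` and `filter_many` are hypothetical. The assumption is that the iterator yields contiguous runs of selected slots, so each run is copied with a single slice copy instead of one copy per element.

```rust
/// Hypothetical iterator over contiguous runs of `true` in a mask,
/// yielded as half-open `(start, end)` index ranges.
struct SliceIter<'a> {
    mask: &'a [bool],
    pos: usize,
}

impl<'a> Iterator for SliceIter<'a> {
    type Item = (usize, usize);

    fn next(&mut self) -> Option<Self::Item> {
        // Skip slots that are filtered out.
        while self.pos < self.mask.len() && !self.mask[self.pos] {
            self.pos += 1;
        }
        if self.pos == self.mask.len() {
            return None;
        }
        // Extend the run for as long as slots are selected.
        let start = self.pos;
        while self.pos < self.mask.len() && self.mask[self.pos] {
            self.pos += 1;
        }
        Some((start, self.pos))
    }
}

fn slices(mask: &[bool]) -> SliceIter<'_> {
    SliceIter { mask, pos: 0 }
}

/// Single filter: consume the iterator on the fly, one copy per run.
fn filter<T: Copy>(values: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(values.len(), mask.len());
    let mut out = Vec::new();
    for (start, end) in slices(mask) {
        out.extend_from_slice(&values[start..end]);
    }
    out
}

/// Multi filter: scan the mask once, persist the runs in a vector,
/// then replay that vector for every column.
fn filter_many<T: Copy>(columns: &[Vec<T>], mask: &[bool]) -> Vec<Vec<T>> {
    let ranges: Vec<(usize, usize)> = slices(mask).collect();
    let selected: usize = ranges.iter().map(|(s, e)| e - s).sum();
    columns
        .iter()
        .map(|col| {
            assert_eq!(col.len(), mask.len());
            // The output length is known up front, so allocate once.
            let mut out = Vec::with_capacity(selected);
            for &(start, end) in &ranges {
                out.extend_from_slice(&col[start..end]);
            }
            out
        })
        .collect()
}
```

In this sketch, `filter(&values, &mask)` never materializes the runs, while `filter_many(&columns, &mask)` pays for the mask scan once and reuses the persisted runs for each column, which is the trade-off described above.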
This PR also improves the performance of MutableArrayData by avoiding some bound checks via unsafe (properly documented).
Summary of the benchmarks:
Code used to benchmark:
Benchmark result: