GH-37055: [C++] Optimize hash kernels for Dictionary ChunkedArrays #38394
felipecrv merged 6 commits into apache:main
Conversation
Benchmark result: https://gist.github.com/js8544/d64f289313c814a2df6a87a945a0b382#file-pr38394-txt There is also a significant improvement in the user's Python case: #37055 (comment)
It was mentioned in #9683 (comment) that we can compute the result of each chunk and then merge them with the dictionary unifier. This PR saves calls to the dictionary unifier; we can also further optimize the unifying process itself. This will be done once we have a faster hashtable: #38372.
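The saving can be illustrated with a minimal, self-contained sketch (hypothetical `Dictionary` type and cost model, not Arrow's actual `DictionaryUnifier` API): before this change, every chunk re-unifies the already-accumulated dictionary, while a reused unifier feeds each chunk's dictionary in exactly once:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a chunk's dictionary values.
using Dictionary = std::vector<std::string>;

// Count how many dictionary values each strategy feeds to the unifier.
// Before this PR: every chunk re-unifies the accumulated dictionary.
size_t NaiveCost(const std::vector<Dictionary>& chunks) {
  std::set<std::string> merged;
  size_t cost = 0;
  for (const auto& chunk : chunks) {
    cost += merged.size();  // re-feeding the existing merged dictionary
    cost += chunk.size();   // plus the new chunk's dictionary
    merged.insert(chunk.begin(), chunk.end());
  }
  return cost;
}

// After this PR: one unifier persists across chunks, so each chunk's
// dictionary is fed to it exactly once.
size_t ReusedUnifierCost(const std::vector<Dictionary>& chunks) {
  size_t cost = 0;
  for (const auto& chunk : chunks) cost += chunk.size();
  return cost;
}
```

With many chunks the naive cost grows with the size of the accumulated dictionary on every step, while the reused unifier's cost is linear in the total dictionary input.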
@felipecrv Do you mind having a look at this?
Soon! |
Do we even need dictionary_ to be a member variable now? Wouldn't it suffice to perform a single DictionaryUnifier::GetResult call at the end?
Documentation for DictionaryUnifier::GetResult says the unifier can't be re-used after a call to GetResult [1].
My suggestion (that I think will work well):
Rename dictionary_ to first_dictionary_ and change Append(arr) to transition through this state machine on each call:
```
// ---------------------------------------------------------------------------------------------
// Current State                                  Next State
// ---------------------------------------------------------------------------------------------
// !first_dictionary_ && !dictionary_unifier_ --> first_dictionary_ = arr_dict_
//                                                UNCHANGED dictionary_unifier_
// ---------------------------------------------------------------------------------------------
// first_dictionary_ && !dictionary_unifier_  --> if !first_dictionary_.Equals(arr_dict) then
//                                                  dictionary_unifier_ = unify(first_dictionary_, arr_dict)
//                                                  first_dictionary_ = nullptr
//                                                else
//                                                  UNCHANGED first_dictionary_, dictionary_unifier_
//                                                end
// ---------------------------------------------------------------------------------------------
// dictionary_unifier_                        --> dictionary_unifier_ = unify(dictionary_unifier_, arr_dict)
// ---------------------------------------------------------------------------------------------
```
You will then have to re-think how `dictionary_value_type` and `dictionary` work below.
[1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/array/array_dict.h#L169
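A self-contained sketch of this suggested state machine (simplified `Dictionary`/`Unifier` stand-ins, not Arrow's real types — the actual code would hold a `DictionaryUnifier` and call `GetResult` once at the end):

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only.
using Dictionary = std::vector<std::string>;

struct Unifier {
  std::set<std::string> values;
  void Unify(const Dictionary& d) { values.insert(d.begin(), d.end()); }
};

class Appender {
 public:
  void Append(const Dictionary& arr_dict) {
    if (unifier_) {
      // State 3: a unifier already exists; keep folding dictionaries in.
      unifier_->Unify(arr_dict);
    } else if (!first_dictionary_) {
      // State 1: first chunk seen; just remember its dictionary.
      first_dictionary_ = std::make_unique<Dictionary>(arr_dict);
    } else if (*first_dictionary_ != arr_dict) {
      // State 2: dictionaries differ; create the unifier lazily and
      // release first_dictionary_.
      unifier_ = std::make_unique<Unifier>();
      unifier_->Unify(*first_dictionary_);
      unifier_->Unify(arr_dict);
      first_dictionary_.reset();
    }
    // else: identical dictionary, nothing to do (fast path).
  }

  bool unified() const { return unifier_ != nullptr; }

 private:
  std::unique_ptr<Dictionary> first_dictionary_;
  std::unique_ptr<Unifier> unifier_;
};
```

The point of the lazy transition is that a chunked array whose chunks all share one dictionary never constructs a unifier at all.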
@js8544 to clarify, `dictionary_unifier_ = unify(dictionary_unifier_, arr_dict)` means something like `unifier->unify(arr_dict)` in the actual code :)
Force-pushed 001a846 to f09ab5c
felipecrv left a comment
I don't see any changes other than the rebase.
Still working on it :)

@felipecrv Updated as you suggested.

@js8544 I won't merge my own suggestions before you have a chance to reply and apply them yourself. If you're OK with them and CI passes with them applied I will merge.
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
The macOS build is failing for some unknown reason, and this has been rebased very recently, so I'm merging.

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit ec41209. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
GH-37055: [C++] Optimize hash kernels for Dictionary ChunkedArrays (apache#38394)

### Rationale for this change
When merging dictionaries across chunks, the hash kernels unnecessarily unify the existing dictionary, dragging down the performance.

### What changes are included in this PR?
Reuse the dictionary unifier across chunks.

### Are these changes tested?
Yes, with a new benchmark for dictionary chunked arrays.

### Are there any user-facing changes?
No.

* Closes: apache#37055

Lead-authored-by: Jin Shang <shangjin1997@gmail.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
`value_counts` extremely slow for chunked `DictionaryArray` #37055