ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column#6985
ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column#6985emkornfield wants to merge 40 commits intoapache:masterfrom
Conversation
39280c5 to
fe2f609
Compare
cpp/src/parquet/CMakeLists.txt
Outdated
cpp/src/parquet/column_reader.cc
Outdated
There was a problem hiding this comment.
nit: AVx2 -> AVX2, o detect -> to detect, BIM2 -> BMI2
There was a problem hiding this comment.
Remove the comment as it was stale.
pitrou
left a comment
There was a problem hiding this comment.
I can't comment on the correctness of the levels handling (someone competent will have to review). I would like to see some code cleanup and organization changes, though.
There was a problem hiding this comment.
should be removed now.
cpp/src/parquet/column_reader.cc
Outdated
There was a problem hiding this comment.
I think it would be nice to move the low-level handling of levels to a dedicated C++ file.
There was a problem hiding this comment.
I've moved all everything in the anonymous namespace to level_conversion.cc
There was a problem hiding this comment.
Note that this prevents inlining of the Scalar version which was happening before, but I'm not sure that is a big deal.
cpp/src/parquet/column_reader.cc
Outdated
cpp/src/parquet/level_conversion.h
Outdated
cpp/src/parquet/level_conversion.h
Outdated
There was a problem hiding this comment.
I don't think it makes sense for this to be a helper function.
There was a problem hiding this comment.
I agree now that the AppendWord is moved to FirstTimeBitmapWriter
cpp/src/parquet/level_conversion.h
Outdated
There was a problem hiding this comment.
Then please only define this function if ARROW_HAVE_BMI2 is defined.
There was a problem hiding this comment.
Please, a more specific name (such as BitmapToString).
There was a problem hiding this comment.
Then just don't define the test.
There was a problem hiding this comment.
good point. done.
cpp/src/parquet/column_reader.cc
Outdated
There was a problem hiding this comment.
typo: max_repitition_level -> max_repetition_level
cpp/src/parquet/column_reader.cc
Outdated
There was a problem hiding this comment.
Do we need to move these two lines before line 186?
There was a problem hiding this comment.
I think just 1.
cpp/src/parquet/column_reader.cc
Outdated
There was a problem hiding this comment.
Since this path is executed only when #if defined(ARROW_HAVE_BMI2) is true, it would be good to add such as code
#if defined(ARROW_HAVE_BMI2)
...
#else
assert(false && "must not execute this without BMI2");
#endif
There was a problem hiding this comment.
FYI: Most of these tests fail on big-endian platform
[ FAILED ] TestAppendBitmap.TestOffsetOverwritesCorrectBitsOnExistingByte
[ FAILED ] TestAppendBitmap.TestOffsetShiftBitsCorrectly
[ FAILED ] TestAppendBitmap.AllBytesAreWrittenWithEnoughSpace
[ FAILED ] TestAppendBitmap.OnlyApproriateBytesWrittenWhenLessThen8BytesAvailable
[ FAILED ] TestAppendToValidityBitmap.BasicOperation
There was a problem hiding this comment.
There was a substantial refactoring, I tried to provide guards but without a big-endian machine in CI it will be very hard to to catch all of these issues.
cpp/src/parquet/level_conversion.h
Outdated
There was a problem hiding this comment.
Why do we need to use union? bytes[] is used only at line 106.
It looks simple to explicitly extract the lowest 8bit instead of using union.
There was a problem hiding this comment.
I originally thought I would have to do more. I removed the union.
|
@pitrou I think I addressed your comments. One of them that went stale was the complexity for "AppendWord", I tried to remove parts that did not seem to affect performance on benchmarks for parquet column reading but I spent some time trying to maximize word level parallelism for the unaligned case, because I think at least for repeated fields I expect this case to be fairly common. As a point of comparison using the AppendWord implementation I put in place for BigEndian shows much smaller improvements. Really this should use CopyBitmap but I didn't want to start moving a move code then I needed to in bit_util.h for this PR. I opened If more comments are needed or you think there is a cleaner way of writing the code, I'm happy for input. @wesm do you want to look at the parquet specific logic/comments to make sure I captured them correctly? |
|
I'll try to have a closer look tomorrow or Tuesday |
|
@wesm wanted to make sure this is still on your radar |
|
Yep sorry thanks for the nudge, will look today |
wesm
left a comment
There was a problem hiding this comment.
At a high level this looks good.
One problem with introducing more SIMD code is that we won't yet have a runtime dispatching strategy. We will need to go through all of our SIMD accelerations in this library and refactor things so that we can build a "fat" binary that includes both AVX2/BMI2-accelerated versions and the non-SIMD versions. That way we can reap the benefits of this work in portable packages like Python wheels / conda packages.
cpp/src/arrow/util/bit_util.h
Outdated
cpp/src/arrow/util/bit_util.h
Outdated
There was a problem hiding this comment.
This section is a bit too much bit manipulation for me to carefully scrutinize but I trust you. There are some typos in the comments
There was a problem hiding this comment.
OK, if there is anything I can add for documentation please let me know.
There was a problem hiding this comment.
This is cryptic. Can you rewrite it in a more understandable way? For example (untested):
if (bit_mask_ == 0x1) {
current_byte_ = 0;
} else {
current_byte_ = *(append_position + bytes_for_word - 1);
}|
I think this is fine to merge once most of the typos in the comments are fixed. A rebase will probably fix the Rust lint error |
cpp/src/arrow/util/bit_util.h
Outdated
cpp/src/arrow/util/bit_util.h
Outdated
There was a problem hiding this comment.
OK, if there is anything I can add for documentation please let me know.
This reverts commit 59e39794974ebade0cff0ba702b6cbc9cbd43b87.
|
@pitrou I think the C GLib build failure looks unrelated to me. I fixed the s390x build. |
|
@wesm I expect your PR to get merged first but if it doesn't consider using PopCount methods in this one. |
|
+1. I'm going to go ahead and merge this and then I'll rebase #7352 |
This change does the following:
With AVX2 enabled this seems to improve benchmarks by 20% for nullable columns (BM_Read..) on my box. I didn't see any impacts in other benchmarkmarks.
See checklist for what is remaining. If possible i'd like to get early feedback on naming and my approach to checking for BMI2.
Still needed: