ARROW-10040: [Rust] Add slice that realigns Buffer by nevi-me · Pull Request #8223 · apache/arrow

nevi-me · 2020-09-19T19:27:36Z

Has the consequence of removing the alignment limit on bool kernels.
It however comes at the cost of slower buffer manipulation.

Has the consequence of removing the alignment limit on bool kernels. It however comes at the cost of slower buffer manipulation.

nevi-me · 2020-09-19T19:31:37Z

@jhorstmann @paddyhoran this is related to the alignment fixes made recently. I noticed while reviewing another PR that we had a limitation on boolean kernels if offsets weren't a multiple of 8. So I've implemented a slice on Buffer that copies the buffer and removes the offsets.

I expect there to be a minor perf impact, and I only use the above slice method when necessary.

It's very likely that my implementation can be improved upon, but I'm at my current limits on bit manipulation. Please see https://godbolt.org/z/bMc5qd for some of the assembly generated. Not happy with the instruction count, and would appreciate some guidance or improvements. Feel free to push any changes directly on this branch.

github-actions · 2020-09-19T19:31:50Z

https://issues.apache.org/jira/browse/ARROW-10040

jhorstmann · 2020-09-20T20:17:42Z

Hi @nevi-me I guess I have to thank you for pushing this topic. I played around today with a different approach today that uses an iterator over the validity mask and returns bits in chunks of u8/u16/u32/u64. This can then be used together with a chunked iterator over the typed data of buffer to efficiently implement kernels. So far it's more of a proof of concept because it doesn't handle the remainder bits of this chunking yet. It's also largely untested. Generated code for a kernel looks very very nice, the inner loop gets unrolled and it even uses avx512 mask registers if available:

https://rust.godbolt.org/z/qGcrW8

This approach could also be used with explicit vector instructions in the inner loop, and a scalar loop for the remainder. Ensuring no out of bounds reads if the chunk size matches the vector register size.

Big todos are the remainder part and a lot of tests.

jhorstmann · 2020-09-21T14:28:50Z

My proof of concept is available at https://github.com/jhorstmann/bititer-poc/blob/master/src/lib.rs but I'm not sure when I will have time to integrate it with arrow.

nevi-me · 2020-09-22T11:45:17Z

@jhorstmann can I close this PR, and rely on your implementation when ready? Also, do you think we'd be able to use your implementation in parquet, as we might need that for iterating through Boolean arrays when converting from Arrow to Parquet.

jhorstmann · 2020-09-22T14:22:11Z

@nevi-me can you point me to the part of the parquet code that you have in mind? I found the BitReader used by bit packed encoding but that seems to solve a more general problem of reading arbitrary bit widths and also seems to work on a different parquet::util::memory::Buffer implementation instead of arrow::Buffer.

nevi-me · 2020-09-25T14:30:43Z

Hey @jhorstmann, I haven't had time to look, but maybe I'm confused. What I recall is that I needed a way of converting a Buffer to a Vec<bool> when I write to Parquet. It might even be that I don't need to do this though, but I'll see when I get time to work on the Arrow Parquet writer some more.

I'm closing this PR

@nevi-me

…itrary offsets @nevi-me this is the chunked iterator based approach i mentioned in #8223 I'm not fully satisfied with the solution yet: - I'd prefer to move all the bit-based functions into `Bitmap`, but creating a `Bitmap` from a `&Buffer` would involve cloning an `Arc`. - I need to do some benchmarking about how much the `packed_simd` implementation actually helps. If it's not a big difference I'd propose to remove it to simplify the code. Closes #8262 from jhorstmann/ARROW-10040-unaligned-bit-buffers Authored-by: Jörn Horstmann <joern.horstmann@signavio.com> Signed-off-by: Neville Dipale <nevilledips@gmail.com>

nevi-me added 2 commits September 19, 2020 20:57

ARROW-10040: [Rust] Add slice that realigns Buffer

2a42f4a

Has the consequence of removing the alignment limit on bool kernels. It however comes at the cost of slower buffer manipulation.

fix clippy lint

b7a4e81

andygrove added the Component: Rust label Sep 20, 2020

jhorstmann mentioned this pull request Sep 24, 2020

ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets #8262

Closed

nevi-me closed this Sep 25, 2020

nevi-me deleted the ARROW-10040 branch October 17, 2020 19:21

asfimport mentioned this pull request Oct 9, 2020

[Rust] Create a way to slice unalligned offset buffers #26062

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARROW-10040: [Rust] Add slice that realigns Buffer#8223

ARROW-10040: [Rust] Add slice that realigns Buffer#8223
nevi-me wants to merge 2 commits intoapache:masterfrom
nevi-me:ARROW-10040

nevi-me commented Sep 19, 2020

Uh oh!

nevi-me commented Sep 19, 2020

Uh oh!

github-actions bot commented Sep 19, 2020

Uh oh!

jhorstmann commented Sep 20, 2020

Uh oh!

jhorstmann commented Sep 21, 2020

Uh oh!

nevi-me commented Sep 22, 2020

Uh oh!

jhorstmann commented Sep 22, 2020

Uh oh!

nevi-me commented Sep 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

nevi-me commented Sep 19, 2020

Uh oh!

nevi-me commented Sep 19, 2020

Uh oh!

github-actions bot commented Sep 19, 2020

Uh oh!

jhorstmann commented Sep 20, 2020

Uh oh!

jhorstmann commented Sep 21, 2020

Uh oh!

nevi-me commented Sep 22, 2020

Uh oh!

jhorstmann commented Sep 22, 2020

Uh oh!

nevi-me commented Sep 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments