ARROW-7284: [Java] ensure java implementation meets clarified dictionary spec#5945
Closed
tianchen92 wants to merge 2 commits intoapache:masterfrom
Closed
ARROW-7284: [Java] ensure java implementation meets clarified dictionary spec#5945tianchen92 wants to merge 2 commits intoapache:masterfrom
tianchen92 wants to merge 2 commits intoapache:masterfrom
Conversation
Contributor
Author
|
cc @emkornfield |
Contributor
|
@tianchen92 sorry, I've been backlogged will try to review your open PRs over the next week or so (unless others get to them sooner CC @praveenbingo @pravindra @BryanCutler) |
emkornfield
reviewed
Jan 10, 2020
java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
Outdated
Show resolved
Hide resolved
emkornfield
reviewed
Jan 10, 2020
java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
Outdated
Show resolved
Hide resolved
emkornfield
reviewed
Jan 10, 2020
java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
Outdated
Show resolved
Hide resolved
emkornfield
reviewed
Jan 10, 2020
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowDictionaryBatch.java
Show resolved
Hide resolved
emkornfield
reviewed
Jan 10, 2020
Contributor
Author
|
Thanks @emkornfield, PR updated according to your comments. |
emkornfield
reviewed
Jan 17, 2020
| serializeDictionaryBatch(out, dictionary1, false, closeableList); | ||
|
|
||
| // write recordBatch2 | ||
| serializeRecordBatch(out, Arrays.asList(encodedVector1, encodedVector2), closeableList); |
Contributor
There was a problem hiding this comment.
is this necessary for this test?
Contributor
Author
There was a problem hiding this comment.
I guess yes, if we only sent dictionaries without batches, seems dictionaries won't be read since we made this change in https://issues.apache.org/jira/browse/ARROW-6040
emkornfield
pushed a commit
that referenced
this pull request
Feb 18, 2020
… batch Related to [ARROW-7546](https://issues.apache.org/jira/browse/ARROW-7546). Per discussion #5945 (comment). In ARROW-7284, we write a simple method to concat vectors. However, ARROW-7073 is about to concat vector values efficiently, after this PR merged, we should use this new implementation in ArrowReader. Closes #6431 from tianchen92/ARROW-7546 and squashes the following commits: 9bced46 <tianchen> ARROW-7546: Use new implementation to concat vectors values in batch Authored-by: tianchen <[email protected]> Signed-off-by: Micah Kornfield <[email protected]>
This was referenced Jan 17, 2020
kou
pushed a commit
to apache/arrow-java
that referenced
this pull request
Nov 25, 2024
… batch Related to [ARROW-7546](https://issues.apache.org/jira/browse/ARROW-7546). Per discussion apache/arrow#5945 (comment). In ARROW-7284, we write a simple method to concat vectors. However, ARROW-7073 is about to concat vector values efficiently, after this PR merged, we should use this new implementation in ArrowReader. Closes #6431 from tianchen92/ARROW-7546 and squashes the following commits: 9bced461c <tianchen> ARROW-7546: Use new implementation to concat vectors values in batch Authored-by: tianchen <[email protected]> Signed-off-by: Micah Kornfield <[email protected]>
pribor
pushed a commit
to GlobalWebIndex/arrow
that referenced
this pull request
Oct 24, 2025
…ary spec Related to [ARROW-7284](https://issues.apache.org/jira/browse/ARROW-7284). As discussed on [[link](https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E)]. This is the java side implementation. 1. It is not required that all dictionary batches occur at the beginning of the IPC stream format (if a the first record batch has an all null dictionary encoded column, the null column's dictionary might not be sent until later in the stream). 2. A second dictionary batch for the same ID that is not a "delta batch" in an IPC stream indicates the dictionary should be replaced. 3. Clarifies that the file format, can only contain 1 "NON-delta" dictionary batch and multiple "delta" dictionary batches. 4. Add an enum to dictionary metadata for possible future changes in what format dictionary batches can be sent. (the most likely would be an array Map<Int, Value>). An enum is needed as a place holder to allow for forward compatibility past the release 1.0.0. Closes apache#5945 from tianchen92/ARROW-7284 and squashes the following commits: 16a609d <tianchen> resolve some comments 1231756 <tianchen> ARROW-7284: ensure java implementation meets clarified dictionary spec Authored-by: tianchen <[email protected]> Signed-off-by: Micah Kornfield <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related to ARROW-7284.
As discussed on [link]. This is the java side implementation.
It is not required that all dictionary batches occur at the beginning
of the IPC stream format (if a the first record batch has an all null
dictionary encoded column, the null column's dictionary might not be sent
until later in the stream).
A second dictionary batch for the same ID that is not a "delta batch"
in an IPC stream indicates the dictionary should be replaced.
Clarifies that the file format, can only contain 1 "NON-delta"
dictionary batch and multiple "delta" dictionary batches.
Add an enum to dictionary metadata for possible future changes in what
format dictionary batches can be sent. (the most likely would be an array
Map<Int, Value>). An enum is needed as a place holder to allow for forward
compatibility past the release 1.0.0.