Yicong-Huang commented Jan 27, 2026

What's Changed

Fix BaseVariableWidthVector/BaseLargeVariableWidthVector IPC serialization when valueCount is 0.

Problem

When valueCount == 0, setReaderAndWriterIndex() set offsetBuffer.writerIndex(0), so readableBytes() returned 0. The IPC serializer uses readableBytes() to decide how many bytes of each buffer to write, so the offset buffer was written to the IPC stream with 0 bytes. This crashes IPC readers in other libraries, because the Arrow spec requires the offset buffer to contain at least one entry, [0].

This is a follow-up to #967, which fixed the same issue in ListVector/LargeListVector.

Fix

Simplify setReaderAndWriterIndex() to always use (valueCount + 1) * OFFSET_WIDTH for the offset buffer's writerIndex. When valueCount == 0, this sets writerIndex to OFFSET_WIDTH, so offset[0] is included in serialization.
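
For illustration, the arithmetic can be sketched as a standalone model. This is not the Arrow Java source: the class OffsetWriterIndexSketch and both method names are hypothetical, and the pre-fix branch is an assumption reconstructed from the Problem description above (writerIndex 0 for an empty vector, the usual formula otherwise). The offset widths of 4 and 8 bytes correspond to the 32-bit and 64-bit offsets used by the regular and Large vectors.

```java
// Illustrative model only; names are hypothetical and not part of Arrow Java.
final class OffsetWriterIndexSketch {
    // Bytes per offset entry: 32-bit offsets for BaseVariableWidthVector,
    // 64-bit offsets for BaseLargeVariableWidthVector.
    static final int OFFSET_WIDTH = 4;
    static final int LARGE_OFFSET_WIDTH = 8;

    private OffsetWriterIndexSketch() {}

    /** Assumed pre-fix behavior: an empty vector exposes zero readable offset bytes. */
    static long writerIndexBeforeFix(int valueCount, int offsetWidth) {
        if (valueCount == 0) {
            return 0L;
        }
        return (valueCount + 1L) * offsetWidth;
    }

    /** Post-fix behavior: always valueCount + 1 offset entries, including offset[0]. */
    static long writerIndexAfterFix(int valueCount, int offsetWidth) {
        return (valueCount + 1L) * offsetWidth;
    }
}
```

With valueCount == 0, writerIndexAfterFix returns OFFSET_WIDTH (4 bytes, or 8 for the Large variant), while the pre-fix model returns 0. For any non-empty vector the two agree, so the change only affects the empty case.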

Testing

Added tests for empty VarCharVector and LargeVarCharVector, verifying that the offset buffer reports the correct readableBytes().

Related to #343.

@Yicong-Huang

cc @viirya @jbonofre

lidavidm added the bug-fix label Jan 27, 2026