Implement BinaryRow deserialization to decode partition binary bytes #126

@luoyuxia

Parent Issue

Part of #124 (support partitioned table)

Background

In Paimon, ManifestEntry._PARTITION stores partition values as raw BinaryRow binary bytes (Vec<u8>). Currently, BinaryRow (crates/paimon/src/spec/data_file.rs:30-51) is a stub that only stores arity — it cannot hold or parse actual partition data.

To support partitioned table reading, we need BinaryRow to wrap the raw bytes and provide typed accessor methods.

What needs to be done

  1. Enhance BinaryRow to hold actual binary data

    • Add a data: Vec<u8> (or Bytes) field to back the row
    • Parse the binary layout: null bit set (8-byte aligned; its first 8 bits serve as the row header) + fixed-length part (8 bytes per field) + variable-length part
  2. Implement typed getters

    • is_null_at(pos) — check the null bit
    • get_int(pos) -> i32
    • get_long(pos) -> i64
    • get_short(pos) -> i16
    • get_byte(pos) -> i8
    • get_boolean(pos) -> bool
    • get_float(pos) -> f32
    • get_double(pos) -> f64
    • get_string(pos) -> &str / String (variable-length: read offset and length from the fixed part, then read the bytes from the variable part)
    • get_binary(pos) -> &[u8]
  3. Add a constructor from raw bytes

    • BinaryRow::from_bytes(arity: i32, data: Vec<u8>) -> BinaryRow
  4. Unit tests

    • Round-trip: construct binary data manually, verify typed getters
    • Test null handling
    • Test variable-length fields (String, Binary)
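
A minimal sketch of how the pieces above could fit together, not the crate's actual implementation. It assumes the layout used by Paimon's Java BinaryRow as commonly described: a one-byte header that shares the first 8-byte-aligned region with the null bits, little-endian fixed-length slots, and variable-length values of at most 7 bytes stored inline in their slot with the top bit of the slot set. The `pos: usize` parameter type, the `read` and `field_offset` helpers, and `sample_row` are hypothetical names chosen for illustration. Any arity or length prefix that the serialized partition bytes may carry is assumed to have been stripped before `from_bytes` is called.

```rust
/// Hypothetical sketch of a byte-backed BinaryRow (layout details are assumptions).
pub struct BinaryRow {
    arity: i32,
    null_bits_size: usize, // size in bytes of header + null bit set
    data: Vec<u8>,
}

const HEADER_SIZE_IN_BITS: usize = 8; // assumed: 1-byte header before the null bits

impl BinaryRow {
    pub fn from_bytes(arity: i32, data: Vec<u8>) -> BinaryRow {
        // Header + one null bit per field, rounded up to a multiple of 8 bytes.
        let null_bits_size = ((arity as usize + 63 + HEADER_SIZE_IN_BITS) / 64) * 8;
        BinaryRow { arity, null_bits_size, data }
    }

    pub fn arity(&self) -> i32 {
        self.arity
    }

    /// Byte offset of the 8-byte fixed-length slot for field `pos`.
    fn field_offset(&self, pos: usize) -> usize {
        self.null_bits_size + pos * 8
    }

    /// Copies N bytes starting at `offset`; panics if out of range.
    fn read<const N: usize>(&self, offset: usize) -> [u8; N] {
        let mut buf = [0u8; N];
        buf.copy_from_slice(&self.data[offset..offset + N]);
        buf
    }

    pub fn is_null_at(&self, pos: usize) -> bool {
        let bit = pos + HEADER_SIZE_IN_BITS; // null bits follow the header bits
        (self.data[bit / 8] & (1u8 << (bit % 8))) != 0
    }

    pub fn get_int(&self, pos: usize) -> i32 {
        i32::from_le_bytes(self.read(self.field_offset(pos)))
    }

    pub fn get_long(&self, pos: usize) -> i64 {
        i64::from_le_bytes(self.read(self.field_offset(pos)))
    }

    pub fn get_binary(&self, pos: usize) -> &[u8] {
        let slot = self.field_offset(pos);
        let word = u64::from_le_bytes(self.read(slot));
        if (word & (1u64 << 63)) != 0 {
            // Inline value: length in the low 7 bits of the top byte, bytes in the slot.
            let len = ((word >> 56) & 0x7f) as usize;
            &self.data[slot..slot + len]
        } else {
            // Offset (from row start) in the high 32 bits, length in the low 32 bits.
            let start = (word >> 32) as usize;
            let len = (word & 0xffff_ffff) as usize;
            &self.data[start..start + len]
        }
    }

    pub fn get_string(&self, pos: usize) -> &str {
        std::str::from_utf8(self.get_binary(pos)).expect("field is not valid UTF-8")
    }
}

/// Hand-built row with arity 3: int 7, string "hello paimon", null.
pub fn sample_row() -> BinaryRow {
    let mut data = vec![0u8; 8 + 3 * 8]; // null-bit region + three fixed slots
    data[8..12].copy_from_slice(&7i32.to_le_bytes());
    let s = b"hello paimon";
    let word = ((data.len() as u64) << 32) | s.len() as u64; // offset | length
    data[16..24].copy_from_slice(&word.to_le_bytes());
    data.extend_from_slice(s); // variable-length part
    data[1] |= 1 << 2; // mark field 2 as null (bit 8 + 2)
    BinaryRow::from_bytes(3, data)
}
```

The getters here index directly into the buffer and panic on malformed input; a real implementation would likely validate lengths up front or return a `Result`. `sample_row` mirrors the "construct binary data manually" style of test listed in step 4.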

Reference

Affected files

  • crates/paimon/src/spec/data_file.rs: BinaryRow struct
