Iceberg-rust Write support
I've noticed a lot of interest in write support in Iceberg-rust. This issue aims to break this down into smaller pieces so they can be picked up in parallel.
Appetizers
If you're not super familiar with the codebase, feel free to pick up one of the appetizers below. Some are related to the write path and some are not, but all are good things to get in and a good way to get to know the codebase:
- `DataFileWriterBuilder` tests: #726

Commit path
The commit path entails writing a new metadata JSON.
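As a sketch of what that means: every commit produces a brand-new, immutable metadata JSON file, and the catalog pointer is then atomically swapped to it. The types and file-naming below are illustrative only (hand-rolled JSON, hypothetical `TableMetadata` fields), not the crate's actual API:

```rust
// Hypothetical, simplified sketch of the Iceberg commit path.
// A real implementation serializes the full metadata (schemas, snapshots,
// partition specs, snapshot log) and relies on the catalog for the atomic swap.

#[derive(Debug, Clone)]
struct TableMetadata {
    format_version: u8,
    last_sequence_number: i64,
    current_snapshot_id: Option<i64>,
}

/// Build the next metadata file name and its JSON body for a commit.
/// `version` is the version number of the current metadata file.
fn commit(prev: &TableMetadata, version: u64, new_snapshot_id: i64) -> (String, String) {
    let next = TableMetadata {
        format_version: prev.format_version,
        last_sequence_number: prev.last_sequence_number + 1,
        current_snapshot_id: Some(new_snapshot_id),
    };
    // JSON is hand-rolled here only to keep the sketch dependency-free.
    let body = format!(
        "{{\"format-version\":{},\"last-sequence-number\":{},\"current-snapshot-id\":{}}}",
        next.format_version,
        next.last_sequence_number,
        next.current_snapshot_id.unwrap(), // always Some: set just above
    );
    // Metadata files are versioned; flipping the catalog pointer to the new
    // file is the atomic step that makes the commit visible.
    (format!("v{}.metadata.json", version + 1), body)
}

fn main() {
    let prev = TableMetadata {
        format_version: 2,
        last_sequence_number: 4,
        current_snapshot_id: Some(100),
    };
    let (name, body) = commit(&prev, 4, 101);
    println!("{name}");
    println!("{body}");
}
```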
- `initial-default`: #737
- `add_files`. This is done with the Java API, where during writing the upper and lower bounds are tracked and the numbers of null and NaN records are counted. Most of this is in, except the NaN counts: Implement nan_value_counts && distinct_counts metrics in parquet writer #417

Related operations
These are not on the critical path to enable writes, but are related to it:
- `SchemaUpdate` logic to Iceberg-Rust: #697
- `unionByName` to union two schemas by field name, providing easy schema evolution: Update a TableSchema from a Schema #698

Metadata tables
Metadata tables are used to inspect the table. Having these tables also makes the maintenance procedures easy to implement, since you can list all the snapshots and expire the ones older than a certain threshold.
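To illustrate the snapshot-expiration idea: once a snapshots metadata table exists, expiration is essentially a filter on the commit timestamp. The `Snapshot` struct and `expire_snapshots` function below are hypothetical, not the crate's API:

```rust
// Illustrative sketch: split snapshots into kept vs. expired by an age
// threshold, as a maintenance procedure built on a snapshots metadata table.

#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    snapshot_id: i64,
    timestamp_ms: i64,
}

/// Split snapshots into (kept, expired) by an age threshold.
/// The current snapshot is always kept, no matter how old it is.
fn expire_snapshots(
    snapshots: &[Snapshot],
    older_than_ms: i64,
    current_snapshot_id: i64,
) -> (Vec<Snapshot>, Vec<Snapshot>) {
    snapshots.iter().cloned().partition(|s| {
        s.snapshot_id == current_snapshot_id || s.timestamp_ms >= older_than_ms
    })
}

fn main() {
    let snapshots = vec![
        Snapshot { snapshot_id: 1, timestamp_ms: 1_000 },
        Snapshot { snapshot_id: 2, timestamp_ms: 2_000 },
        Snapshot { snapshot_id: 3, timestamp_ms: 3_000 },
    ];
    // Expire everything older than t=2_500, keeping snapshot 3 as current.
    let (kept, expired) = expire_snapshots(&snapshots, 2_500, 3);
    println!("kept: {kept:?}, expired: {expired:?}");
}
```

In a real implementation, expiring a snapshot also means cleaning up the data and manifest files that are no longer reachable from any remaining snapshot.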
Integration Tests
Integration tests with other engines, such as Spark.
Contribute
If you want to contribute to the upcoming milestone, feel free to comment on this issue. If there is anything unclear or missing, feel free to reach out here as well 👍