
parallel-workload: Check stalls #35750 (Draft)

def- wants to merge 11 commits into MaterializeInc:main from def-:pr-pw-stall-check

Conversation

def- (Contributor) commented Mar 27, 2026

Inspired by https://materializeinc.slack.com/archives/CU7ELJ6E9/p1774556502630089

We should have had this originally!

github-actions (Contributor) commented

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute:, storage:, adapter:, sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@def- def- force-pushed the pr-pw-stall-check branch 6 times, most recently from 8d73f63 to bef3a21 on March 31, 2026 07:34
@def- def- force-pushed the pr-pw-stall-check branch from bef3a21 to f5a6e3d on April 16, 2026 02:35
def- (Contributor, Author) commented Apr 16, 2026

@def- def- force-pushed the pr-pw-stall-check branch from f5a6e3d to 9b9e224 on May 10, 2026 15:46
@def- def- requested review from DAlperin and bosconi May 10, 2026 15:48
@def- def- marked this pull request as ready for review May 10, 2026 15:48
@def- def- requested a review from a team as a code owner May 10, 2026 15:48
def- (Contributor, Author) commented May 10, 2026

Ready for review.

New test run: https://buildkite.com/materialize/nightly/builds/16376
Edit: Unexpectedly there are a bunch of new failures; I have to check in with Dov.

@def- def- marked this pull request as draft May 10, 2026 15:59
The reader for range columns called `push_range_with`, which writes the
range to the row verbatim. For discrete range types (int4/int8/date),
Parquet files authored by external engines may encode ranges with
non-canonical bounds — e.g. `[1,10]` for `int4range`, which MZ stores
internally as `[1,11)`. Without canonicalization those rows do not
compare or hash equal to logically-identical values constructed inside
MZ, so `COPY FROM PARQUET` rows mismatch their pure-SQL counterparts.

Switch the decode path to `push_range`, which canonicalizes the range
before packing. Add a unit test that decodes a hand-built non-canonical
arrow StructArray and a testdrive regression test that round-trips a
DuckDB-authored Parquet file through `COPY FROM ... (FORMAT PARQUET)`.

Fixes DB-55.

https://claude.ai/code/session_01KXioUsT2r5o2tRwrsHm4iR
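
For illustration, here is a minimal, self-contained sketch of the canonicalization rule for discrete ranges. It is not Materialize's actual `push_range` implementation (`Int4Range` and `canonicalize` are hypothetical names), but it shows why `[1,10]` and `[1,11)` must collapse to the same representation before packing:

```rust
// Canonical form used here: inclusive lower bound, exclusive upper bound.
#[derive(Debug, PartialEq)]
struct Int4Range {
    lower: i32, // inclusive
    upper: i32, // exclusive
}

/// Canonicalize arbitrary bounds into the `[lower, upper)` form.
/// Real code must also handle empty ranges, infinite bounds, and
/// i32 overflow; this sketch ignores those cases.
fn canonicalize(lower: i32, lower_inclusive: bool, upper: i32, upper_inclusive: bool) -> Int4Range {
    Int4Range {
        // `(1,` becomes `[2,`
        lower: if lower_inclusive { lower } else { lower + 1 },
        // `,10]` becomes `,11)`
        upper: if upper_inclusive { upper + 1 } else { upper },
    }
}

fn main() {
    // `[1,10]` as authored by an external Parquet writer...
    let external = canonicalize(1, true, 10, true);
    // ...equals the `[1,11)` form MZ constructs internally, so the two
    // rows now compare and hash equal.
    assert_eq!(external, Int4Range { lower: 1, upper: 11 });
}
```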
claude and others added 10 commits May 11, 2026 14:24
Three runtime errors surface when a parallel-workload run drives an
Iceberg sink with a wider mix of column types:

1. "Datum Int16 does not match builder Int32Builder" — Iceberg has no
   smallint, so the writer context arrow schema (derived from the iceberg
   schema) uses Int32 while Materialize rows still carry Datum::Int16.
   Add a lossless Int16 -> Int32 promotion in ArrowColumn::append_datum,
   mirroring the existing UInt16 -> Int32 case (sketched below, after this
   list).

2. "Field 'value' missing extension metadata" — Materialize names map
   fields entries/keys/values, but iceberg-rust's arrow conversion names
   them key_value/key/value. merge_field_metadata_recursive matched by
   name and silently dropped the value field's extension metadata, so
   ArrowBuilder later failed when constructing the inner builder. Match
   the map entries struct positionally instead.

3. "Failed to create EqualityDeleteWriterConfig: field_id N not found" —
   the planner accepted Range types as Iceberg equality delete keys, but
   ranges lower into Iceberg structs and iceberg-rust's
   RecordBatchProjector skips nested fields, so the equality field id is
   unreachable at runtime. Drop Range from the allow-list so the failure
   is caught at sink creation instead.
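
A hedged sketch of the promotion from item 1. `Datum` here is a stand-in for mz_repr's type, and the real `ArrowColumn::append_datum` handles many more variants; only the shape of the fix is shown:

```rust
use arrow::array::Int32Builder;

// Stand-in for mz_repr::Datum, reduced to the two cases that matter here.
enum Datum {
    Int16(i16),
    Int32(i32),
}

// The writer's arrow schema says Int32 (Iceberg has no smallint), so an
// Int16 datum is widened losslessly instead of tripping the
// "Datum Int16 does not match builder Int32Builder" error.
fn append_datum(builder: &mut Int32Builder, datum: Datum) {
    match datum {
        Datum::Int16(v) => builder.append_value(i32::from(v)),
        Datum::Int32(v) => builder.append_value(v),
    }
}

fn main() {
    let mut builder = Int32Builder::new();
    append_datum(&mut builder, Datum::Int16(7));
    append_datum(&mut builder, Datum::Int32(42));
    let array = builder.finish();
    assert_eq!(array.value(0), 7);
    assert_eq!(array.value(1), 42);
}
```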
Explain why positional matching in `merge_map_entries_metadata` is correct
by citing the Arrow `Schema.fbs` definition of `Map` as
`List<entries: Struct<key, value>>` with non-enforced field names.
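
As a hedged illustration (the function name mirrors the commit's `merge_map_entries_metadata`, but this is a sketch rather than the PR's code), positional matching over the entries struct is just a zip by index, which is safe precisely because `Schema.fbs` fixes the shape but not the names:

```rust
use std::sync::Arc;
use arrow::datatypes::Fields;

/// Copy per-field metadata from `ours` (entries/keys/values) onto `theirs`
/// (key_value/key/value) by position. Arrow defines Map as
/// List<entries: Struct<key, value>>, so index 0 is always the key field
/// and index 1 the value field, whatever the fields are called.
fn merge_entries_positionally(ours: &Fields, theirs: &Fields) -> Fields {
    ours.iter()
        .zip(theirs.iter())
        .map(|(o, t)| Arc::new(t.as_ref().clone().with_metadata(o.metadata().clone())))
        .collect()
}
```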
Pins the three runtime failures fixed in the previous commit:

- `merge_map_entries_preserves_value_extension_metadata`: unit test that
  builds a materialize-shaped map field (entries/keys/values) and an
  iceberg-shaped one (key_value/key/value) and asserts the merge copies
  the value field's extension metadata positionally (a sketch of this
  test's shape follows the list).
- `test/iceberg/key-validation.td`: adds a Range key rejection block
  alongside the existing Map/List rejections.
- `test/iceberg/catalog.td`: adds smallint and map[text=>text] sinks
  exercising the Int16->Int32 promotion and the entries-struct metadata
  merge end-to-end against a real Iceberg catalog.
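
A hypothetical shape for that first unit test, reusing the `merge_entries_positionally` sketch from the previous commit; the metadata key/value pair is illustrative, not the real extension metadata:

```rust
use std::collections::HashMap;
use arrow::datatypes::{DataType, Field, Fields};

#[test]
fn merge_preserves_value_extension_metadata() {
    // Illustrative extension metadata; the real key/value differ.
    let ext: HashMap<String, String> =
        [("ARROW:extension:name".to_string(), "mz.example".to_string())].into();
    // materialize-shaped entries struct; the value field carries metadata.
    let ours = Fields::from(vec![
        Field::new("keys", DataType::Utf8, false),
        Field::new("values", DataType::Utf8, true).with_metadata(ext.clone()),
    ]);
    // iceberg-shaped entries struct, no metadata.
    let theirs = Fields::from(vec![
        Field::new("key", DataType::Utf8, false),
        Field::new("value", DataType::Utf8, true),
    ]);
    let merged = merge_entries_positionally(&ours, &theirs);
    // Names come from the iceberg shape, metadata survives positionally.
    assert_eq!(merged[1].name(), "value");
    assert_eq!(merged[1].metadata(), &ext);
}
```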
DuckDB's iceberg_scan returns 0 rows for map-valued tables in the
versions we test against, so the round-trip via map_keys/map_values was
not actually exercising the metadata merge — the assertion just failed
with no actionable signal. Check mz_sink_statuses for `running` instead:
without `merge_map_entries_metadata` the sink stalls with "Field 'value'
missing extension metadata" during ArrowBuilder construction, which is
exactly the regression we want to pin.
builder_for_datatype was hard-coding MapFieldNames::default()
(entries/keys/values) when constructing the MapBuilder, regardless of
what the surrounding Schema actually said. For Iceberg the schema's map
fields are key_value/key/value (preserved by merge_map_entries_metadata),
so the resulting MapArray's nested DataType disagreed with the schema and
RecordBatch::try_new rejected every row — the sink stalled silently and
iceberg_scan saw an empty table.

Read entry/key/value names off the schema's entries struct so the
MapArray matches whichever convention the caller chose. COPY TO S3
PARQUET keeps building its schema with the arrow-rs defaults, so its
output is unchanged.

Also restores the DuckDB iceberg_scan assertion on the map_table sink in
catalog.td now that the round-trip actually works.
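
For illustration, deriving the names from the schema might look like this sketch (a hypothetical helper, not the PR's exact change to `builder_for_datatype`):

```rust
use arrow::array::builder::MapFieldNames;
use arrow::datatypes::DataType;

/// Read the entry/key/value names off a Map type's entries struct so the
/// built MapArray matches whichever naming convention the schema uses
/// (entries/keys/values for MZ, key_value/key/value for Iceberg).
fn map_field_names(map_type: &DataType) -> Option<MapFieldNames> {
    let DataType::Map(entries, _sorted) = map_type else {
        return None;
    };
    let DataType::Struct(kv) = entries.data_type() else {
        return None;
    };
    // The entries struct always has exactly a key field and a value field.
    Some(MapFieldNames {
        entry: entries.name().clone(),
        key: kv[0].name().clone(),
        value: kv[1].name().clone(),
    })
}
```

Passing the result to `MapBuilder::new(Some(names), key_builder, value_builder)` then yields a `MapArray` whose nested `DataType` agrees with the schema, so `RecordBatch::try_new` no longer rejects the rows.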
@def- def- force-pushed the pr-pw-stall-check branch from 9b9e224 to 93cb7f3 on May 11, 2026 14:24