Skip to content

Skip local build on CRC mismatch during ONLINE transition for upsert tables#18895

Open
deepthi912 wants to merge 1 commit into
apache:masterfrom
deepthi912:upsert-skip-local-build-on-crc-mismatch
Open

Skip local build on CRC mismatch during ONLINE transition for upsert tables#18895
deepthi912 wants to merge 1 commit into
apache:masterfrom
deepthi912:upsert-skip-local-build-on-crc-mismatch

Conversation

@deepthi912

@deepthi912 deepthi912 commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Summary

When a CONSUMING segment transitions to ONLINE and its offset matches the committed offset in ZK, the server builds the segment locally and swaps it in. For upsert tables this is unsafe: a locally-built segment can diverge from the segment committed by the winning replica (e.g. different docId ordering), so its CRC differs from the committed CRC. Swapping in a CRC-divergent local build can leave the upsert metadata (validDocIds / primary-key → record-location) inconsistent across replicas.

For example consider a case where the time column/comparison columns are transformed on tables, or maybe commit time compaction differers crc between replicas

Changes

  • Adds a buildSegmentAndReplace(SegmentZKMetadata) overload that, for upsert-enabled tables, compares the locally-built segment's CRC against the committed CRC in ZK (via BaseTableDataManager.hasSameCRC). On a mismatch it skips the local build so the caller downloads the committed segment instead.
  • The guard is applied on both the matched-offset and post-catch-up reconciliation paths in goOnlineFromConsuming.
  • If the locally-built segment metadata cannot be read, the check reports a mismatch so the transition downloads the committed segment rather than throwing.

Non-upsert tables and callers that do not pass the committed metadata are unaffected.

Rolling upgrade

Behavior change is scoped to upsert tables; no wire/ZK format change. Mixed-version replicas converge to the committed segment either way.

Testing

Added coverage in RealtimeSegmentDataManagerTest for the CRC-match, CRC-mismatch, unreadable-metadata, and non-upsert paths.

…nsition

When a CONSUMING segment transitions to ONLINE and its offset matches the
committed offset in ZK, the server builds the segment locally and swaps it in.
For upsert tables this is unsafe: a locally-built segment can diverge from the
segment committed by the winning replica (e.g. different docId ordering), so its
CRC differs from the committed CRC. Swapping in a CRC-divergent local build can
leave the upsert metadata (validDocIds / primary-key -> record-location)
inconsistent across replicas.

This change adds a buildSegmentAndReplace(SegmentZKMetadata) overload that, for
upsert-enabled tables, compares the locally-built segment's CRC against the
committed CRC in ZK (via BaseTableDataManager.hasSameCRC). On a mismatch it skips
the local build and the caller downloads the committed segment instead. The guard
is applied on both the matched-offset and post-catch-up reconciliation paths in
goOnlineFromConsuming. Non-upsert tables and callers that do not pass the
committed metadata are unaffected.

If the locally-built segment metadata cannot be read, the check reports a
mismatch so the transition downloads the committed segment rather than throwing.

Rolling upgrade: behavior change is scoped to upsert tables; no wire/ZK format
change. Mixed-version replicas converge to the committed segment either way.
@codecov-commenter

codecov-commenter commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.79%. Comparing base (cd6dd61) to head (f285d4c).
⚠️ Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
...a/manager/realtime/RealtimeSegmentDataManager.java 87.50% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18895      +/-   ##
============================================
- Coverage     64.80%   64.79%   -0.02%     
- Complexity     1322     1347      +25     
============================================
  Files          3393     3393              
  Lines        211332   211677     +345     
  Branches      33234    33308      +74     
============================================
+ Hits         136952   137152     +200     
- Misses        63329    63442     +113     
- Partials      11051    11083      +32     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.79% <87.50%> (-0.02%) ⬇️
temurin 64.79% <87.50%> (-0.02%) ⬇️
unittests 64.79% <87.50%> (-0.02%) ⬇️
unittests1 56.97% <87.50%> (-0.03%) ⬇️
unittests2 37.16% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@deepthi912 deepthi912 changed the title For upsert tables, skip local build on CRC mismatch during ONLINE transition Skip local build on CRC mismatch during ONLINE transition for upsert tables Jul 1, 2026
@deepthi912 deepthi912 added real-time Related to realtime table ingestion and serving upsert Related to upsert functionality labels Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

real-time Related to realtime table ingestion and serving upsert Related to upsert functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants