Skip to content

[CELEBORN-2342] Fix object aliasing in LegacySkewHandlingPartitionValidator corrupting sub-range metadata#3708

Closed
xumingming wants to merge 1 commit into
apache:mainfrom
xumingming:fix/legacy-skew-validator-object-aliasing
Closed

[CELEBORN-2342] Fix object aliasing in LegacySkewHandlingPartitionValidator corrupting sub-range metadata#3708
xumingming wants to merge 1 commit into
apache:mainfrom
xumingming:fix/legacy-skew-validator-object-aliasing

Conversation

@xumingming

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

When an AQE-skewed partition is split into N sub-ranges, the first sub-range's CommitMetadata object was stored by reference in both subRangeToCommitMetadataMap and currentCommitMetadataForReducer. Each subsequent sibling RPC mutated that object in-place via addCommitData(), silently inflating the TreeMap entry from bytes(A) to bytes(A)+bytes(siblings). Any task retry then sent the correct bytes(A) but found the inflated value, causing a permanent CelebornIOException mismatch and job abort.

Why are the changes needed?

This is a bug of E2E Integrity Check.

Does this PR resolve a correctness bug?

  • Yes

Does this PR introduce any user-facing change?

  • Yes

How was this patch tested?

Added Unit Test.

…idator corrupting sub-range metadata

When an AQE-skewed partition is split into N sub-ranges, the first sub-range's CommitMetadata
object was stored by reference in both subRangeToCommitMetadataMap and
currentCommitMetadataForReducer. Each subsequent sibling RPC mutated that object in-place via
addCommitData(), silently inflating the TreeMap entry from bytes(A) to bytes(A)+bytes(siblings).
Any task retry then sent the correct bytes(A) but found the inflated value, causing a permanent
CelebornIOException mismatch and job abort.

Fix: store a defensive copy in subRangeToCommitMetadataMap so sibling merges into
currentCommitMetadataForReducer cannot corrupt the per-range stored value.
@xumingming

Copy link
Copy Markdown
Contributor Author

@gauravkm Can you help review this PR?

@gauravkm gauravkm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for fixing!

@SteNicholas

Copy link
Copy Markdown
Member

Thanks for fix. Merged to main(v0.7.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants