Fix: Align sort_merge_join filter output with join schema to fix right-anti panic #18800
rluvaton merged 5 commits into apache:main
hmm, why is the test failing?
Looks like this includes fixes created by, let's wait for it to be merged. Also, once merged, can you update the fuzz tests for join:
Also, when fixing a bug, we require added tests that fail on main and pass on this PR.
Sure, will do.
    .columns()
    .iter()
    .skip(right_columns_length)
    .take(left_columns_length)
I don't think we should add it, as we can lose columns or hide bugs if the rest of the columns are not all left columns.
Right, should I drop the .take(left_columns_length) and add a sanity check like assert_eq_or_internal_err!(..., left_columns_length, "...") before extending right_columns?
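To make the proposal above concrete, here is a minimal sketch of "skip the right columns, then check the remainder has exactly the expected left width" in plain Rust. It is not the DataFusion code: `Column`, `split_right_then_left`, and the right-then-left layout are assumptions for illustration, and a plain `Result<_, String>` stands in for whatever internal-error macro the real change uses.

```rust
/// Hypothetical stand-in for an Arrow column; the real code works on array references.
type Column = Vec<i32>;

/// Splits `columns` into (right, left), assuming the right-side columns come first.
/// Instead of silently truncating with `.take(left_len)`, it returns an error when
/// the remainder after skipping `right_len` columns is not exactly `left_len` wide.
fn split_right_then_left(
    columns: &[Column],
    right_len: usize,
    left_len: usize,
) -> Result<(Vec<Column>, Vec<Column>), String> {
    if columns.len() < right_len {
        return Err(format!(
            "expected at least {right_len} right columns, found {}",
            columns.len()
        ));
    }
    let (right, rest) = columns.split_at(right_len);
    // Sanity check: everything after the right columns must be a left column.
    if rest.len() != left_len {
        return Err(format!(
            "expected {left_len} left columns after skipping {right_len}, found {}",
            rest.len()
        ));
    }
    Ok((right.to_vec(), rest.to_vec()))
}

fn main() {
    let columns: Vec<Column> = vec![vec![1, 2], vec![3, 4], vec![5, 6]];

    // One right column followed by two left columns: the widths add up.
    let (right, left) = split_right_then_left(&columns, 1, 2).unwrap();
    println!("right = {right:?}, left = {left:?}");

    // A wrong left width is reported instead of being hidden by truncation.
    println!("{:?}", split_right_then_left(&columns, 1, 1));
}
```

The point of the check is the reviewer's concern: with `.take(...)`, an unexpected extra column would be dropped quietly, whereas an explicit width comparison surfaces it as an error.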
Which issue does this PR close?
sort_merge_join and different number of columns between each side #18787.
Rationale for this change
Sort-merge joins assumed both inputs had the same width when applying filter results. With different column counts (like right-anti joins) we built batches that no longer matched the output schema, so Arrow panicked during concat_batches.
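The failure mode described above can be illustrated with a toy model in plain Rust. This is only a sketch under assumptions: `Batch` and `concat` are hypothetical stand-ins, not Arrow's API, and this `concat` returns an error where the PR description reports a panic. It shows a batch built with the wrong side's width being rejected when combined with batches that match the output schema.

```rust
/// Hypothetical, simplified batch: column names plus integer columns.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Vec<String>,
    columns: Vec<Vec<i32>>,
}

/// Toy concatenation: every batch must match the expected output schema exactly.
fn concat(schema: &[String], batches: &[Batch]) -> Result<Batch, String> {
    let mut columns: Vec<Vec<i32>> = vec![Vec::new(); schema.len()];
    for (i, batch) in batches.iter().enumerate() {
        if batch.schema.as_slice() != schema {
            return Err(format!(
                "batch {i} has {} columns, output schema has {}",
                batch.schema.len(),
                schema.len()
            ));
        }
        for (dst, src) in columns.iter_mut().zip(&batch.columns) {
            dst.extend_from_slice(src);
        }
    }
    Ok(Batch { schema: schema.to_vec(), columns })
}

fn main() {
    // A right-anti join emits only right-side columns; here the right side has one.
    let output_schema = vec!["r0".to_string()];

    let good = Batch { schema: output_schema.clone(), columns: vec![vec![1, 2]] };
    // A batch mistakenly built with the left side's width (two columns).
    let bad = Batch {
        schema: vec!["l0".to_string(), "l1".to_string()],
        columns: vec![vec![3], vec![4]],
    };

    println!("{:?}", concat(&output_schema, &[good.clone(), good.clone()]));
    println!("{:?}", concat(&output_schema, &[good, bad]));
}
```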
What changes are included in this PR?
Are these changes tested?
Manual testing and ran previous tests.
Are there any user-facing changes?