fix: Pick correct columns in Sort Merge Equijoin #18772
Merged
rluvaton merged 5 commits into apache:main on Nov 18, 2025
rluvaton
reviewed
Nov 17, 2025
Member
I think this should also be changed to:
Suggested change: `.take(left_columns_length)`
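The suggestion amounts to cutting the joined output's columns at the left input's width instead of at an assumed position. A minimal sketch in plain Rust, assuming the joined layout is all left columns followed by all right columns. The function `split_joined_columns` and its slice-of-`T` column representation are hypothetical illustrations, not DataFusion's actual code or types:

```rust
/// Splits the columns of a joined row into (left, right) parts.
///
/// Assumption for illustration only: the joined layout is every left-side
/// column followed by every right-side column.
fn split_joined_columns<T: Clone>(
    columns: &[T],
    left_columns_length: usize,
) -> (Vec<T>, Vec<T>) {
    // Take exactly as many columns as the left input has.
    let left: Vec<T> = columns
        .iter()
        .take(left_columns_length)
        .cloned()
        .collect();
    // Everything after those columns belongs to the right input.
    let right: Vec<T> = columns
        .iter()
        .skip(left_columns_length)
        .cloned()
        .collect();
    (left, right)
}
```

With a 2-column left side and a 3-column right side, `split_joined_columns(&cols, 2)` returns the first two entries as the left part and the remaining three as the right part. When both sides have the same width, a cut taken at the right side's width gives the same result, which is why equal-width test tables can hide this kind of mistake.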
Member
I was able to reproduce the bug when changing sort_merge_join.slt so that one table has 3 columns (see the diff below).
Can you please update the description and update/add tests? I would update sort_merge_join.slt as in the diff below so that all tests cover this case. Make sure to add a comment explaining why one table has 3 columns and t1 has 2.
Updated the slt file to reproduce the error:
diff --git a/datafusion/sqllogictest/test_files/sort_merge_join.slt b/datafusion/sqllogictest/test_files/sort_merge_join.slt
--- a/datafusion/sqllogictest/test_files/sort_merge_join.slt (revision f3980641660997345af6061dc3b34f365020bd07)
+++ b/datafusion/sqllogictest/test_files/sort_merge_join.slt (date 1763411346050)
@@ -26,7 +26,7 @@
CREATE TABLE t1(a text, b int) AS VALUES ('Alice', 50), ('Alice', 100), ('Bob', 1);
statement ok
-CREATE TABLE t2(a text, b int) AS VALUES ('Alice', 2), ('Alice', 1);
+CREATE TABLE t2(a text, b int, c int) AS VALUES ('Alice', 2, 77), ('Alice', 1, 66);
# inner join query plan with join filter
query TT
@@ -64,83 +64,83 @@
----
# left join without join filter
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
# left join with join filter
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b * 50 <= t1.b
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Bob 1 NULL NULL NULL
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b < t1.b
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
# right join without join filter
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 RIGHT JOIN t2 ON t1.a = t2.a
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
# right join with join filter
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 RIGHT JOIN t2 ON t1.a = t2.a AND t2.b * 50 <= t1.b
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 RIGHT JOIN t2 ON t1.a = t2.a AND t1.b > t2.b
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
# full join without join filter
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 FULL JOIN t2 ON t1.a = t2.a
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 FULL JOIN t2 ON t1.a = t2.a AND t2.b * 50 > t1.b
----
-Alice 100 NULL NULL
-Alice 50 Alice 2
-Bob 1 NULL NULL
-NULL NULL Alice 1
+Alice 100 NULL NULL NULL
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
+NULL NULL Alice 1 66
-query TITI rowsort
+query TITII rowsort
SELECT * FROM t1 FULL JOIN t2 ON t1.a = t2.a AND t1.b > t2.b + 50
----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 NULL NULL
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 NULL NULL NULL
+Bob 1 NULL NULL NULL
statement ok
DROP TABLE t1;
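The expected rows in the updated test can be checked by hand with a naive nested-loop join. The sketch below is only a reference model written for this discussion: `left_join_rows` is a hypothetical helper, it is unrelated to the sort merge join implementation, and it approximates `rowsort` by sorting the rendered rows as strings.

```rust
/// Reference LEFT JOIN of t1(a, b) with t2(a, b, c) on `t1.a = t2.a`
/// plus an extra join filter, written as a naive nested loop.
/// Rows are rendered as space-separated strings, like the .slt output.
fn left_join_rows(
    t1: &[(&str, i32)],
    t2: &[(&str, i32, i32)],
    filter: impl Fn(i32, i32) -> bool, // called as filter(t1.b, t2.b)
) -> Vec<String> {
    let mut rows = Vec::new();
    for &(la, lb) in t1 {
        let mut matched = false;
        for &(ra, rb, rc) in t2 {
            if la == ra && filter(lb, rb) {
                matched = true;
                rows.push(format!("{} {} {} {} {}", la, lb, ra, rb, rc));
            }
        }
        if !matched {
            // An unmatched left row is padded with one NULL per right
            // column, so a 3-column t2 yields three NULLs.
            rows.push(format!("{} {} NULL NULL NULL", la, lb));
        }
    }
    // Approximation of `rowsort`: sort the rendered rows as strings.
    rows.sort();
    rows
}
```

Called with the table values from the diff and the filter `|t1_b, t2_b| t2_b * 50 <= t1_b`, this yields the four rows listed under "left join with join filter", including the `Bob 1 NULL NULL NULL` row with three trailing NULLs.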
Contributor
Author
Thanks @rluvaton, sure.
logan-keede pushed a commit to logan-keede/datafusion that referenced this pull request on Nov 23, 2025
## Which issue does this PR close?

- Closes apache#18804.

## Rationale for this change

## What changes are included in this PR?

Take correct columns

## Are these changes tested?

Yes,
- added rust tests to hit invalid code paths
- sqltests
- fuzz tests enhancement to fuzzify columns count

Fuzz tests are taken from @rluvaton's apache#18788, excluding those this PR doesn't fix:
```
fuzz_cases::join_fuzz::test_right_anti_join_1k_binary_filtered
fuzz_cases::join_fuzz::test_right_anti_join_1k_filtered
fuzz_cases::join_fuzz::test_right_semi_join_1k_filtered
```

## Are there any user-facing changes?