support filter pushdown with left-semi joins in HashJoinExec#17153
support filter pushdown with left-semi joins in HashJoinExec#17153adriangb wants to merge 6 commits intoapache:mainfrom
Conversation
jonathanc-n
left a comment
There was a problem hiding this comment.
Just small comment for now, going to look more into it tomorrow.
| } | ||
|
|
||
| // Right child analysis | ||
| if right_preserved && !is_left_only_predicate(filter, left_schema_len) { |
There was a problem hiding this comment.
Maybe we can add a TODO here or just refactor now as I believe !is_left_only_predicate here is not the correct behaviour due to mixed predicates.
There was a problem hiding this comment.
Would love to get some help understanding what the correct behavior actually is... the abstract semantics of joins is a bit beyond my database knowledge
There was a problem hiding this comment.
Shouldn't !is_left_only_predicate be a is_right_only_predicate?
There was a problem hiding this comment.
I'm assuming so, or we end up pushing mixed predicates into the from_child call, which is unnecessary
| /// assert_eq!(JoinType::Left.lr_is_preserved(), (true, false)); | ||
| /// assert_eq!(JoinType::LeftSemi.lr_is_preserved(), (true, false)); | ||
| /// ``` | ||
| pub fn lr_is_preserved(self) -> (bool, bool) { |
There was a problem hiding this comment.
i think we should use a slightly more clear name, maybe like join_side_is_preserved() or something along those lines
|
Thanks @adriangb for working on this. I think it would be nice to add a sqllogictest example showing the pushdown. Here is an example: copy (select i as k from generate_series(1, 10) t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 100) t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
select *
from t1
where k in (
select k
from t2
where t2.v >= 10
);However, I noticed that the filter is being pushed down to the wrong table: HashJoinExec
DataSourceExec t1.parquet -- should be added to t1
DataSourceExec t2.parquet predicate=v@1 >= 10 AND DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 10 ]I checked and it is also happening with inner joins (#17196), so its probably not related to this specific PR. |
|
Closing in favor of @kosiew 's work |
Closes #16973