Refactor extract_join_keys and move the ExtractEquijoinPredicate rule #4760
alamb merged 9 commits into apache:master
| col("t2.a") + lit(2i32).cast_to(&DataType::UInt32, &t2_schema)?, | ||
| ) | ||
| .alias("t1.a + 1 = t2.a + 2"); | ||
| let plan = LogicalPlanBuilder::from(t1) |
The split_conjunction will unalias the expr.
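To illustrate the remark about unaliasing, here is a minimal sketch using a toy expression type. The `Expr` enum and the `collect` helper are hypothetical stand-ins written for this example, not DataFusion's actual types or implementation; the sketch only shows why an alias such as "t1.a + 1 = t2.a + 2" does not survive once a filter is split on AND.

```rust
// Toy model only: this `Expr` is a hypothetical stand-in, not DataFusion's type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

/// Split `expr` on AND into its conjuncts, looking through aliases on the way.
fn split_conjunction(expr: &Expr) -> Vec<&Expr> {
    let mut parts = Vec::new();
    collect(expr, &mut parts);
    parts
}

// Hypothetical recursive helper for the sketch above.
fn collect<'a>(expr: &'a Expr, parts: &mut Vec<&'a Expr>) {
    match expr {
        Expr::And(left, right) => {
            collect(left, parts);
            collect(right, parts);
        }
        // The alias wrapper is dropped; only the inner expression is kept.
        Expr::Alias(inner, _) => collect(inner, parts),
        other => parts.push(other),
    }
}
```

In this toy model, splitting an aliased equality yields the bare equality, so any later step that inspects the conjuncts never sees the alias name.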
| " CoalesceBatchesExec: target_batch_size=4096", | ||
| " RepartitionExec: partitioning=Hash([Column { name: \"t1.t1_id + Int64(12)\", index: 2 }], 2)", | ||
| " ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t1_id@0 + CAST(12 AS UInt32) as t1.t1_id + Int64(12)]", | ||
| " RepartitionExec: partitioning=Hash([Column { name: \"t1.t1_id + UInt32(12)\", index: 2 }], 2)", |
After we move the ExtractEquijoinPredicate rule, the optimizer will:
- Simplify the expression.
- Extract join keys from the join filter.
Since ExtractEquijoinPredicate can unalias the filter, the result is now the same as #4755.
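One possible reading of the plan change in the diff above (the key name going from `t1.t1_id + Int64(12)` to `t1.t1_id + UInt32(12)`) is that simplification folds the cast on the literal before the join key is extracted and named. The sketch below models that with a toy expression type; `Expr`, `CastToUInt32`, and `simplify` are hypothetical names for this example and do not reflect DataFusion's real simplifier.

```rust
use std::fmt;

// Toy model only: hypothetical stand-in for an expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(&'static str),
    Int64(i64),
    UInt32(u32),
    CastToUInt32(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{}", name),
            Expr::Int64(v) => write!(f, "Int64({})", v),
            Expr::UInt32(v) => write!(f, "UInt32({})", v),
            Expr::CastToUInt32(inner) => write!(f, "CAST({} AS UInt32)", inner),
            Expr::Add(l, r) => write!(f, "{} + {}", l, r),
        }
    }
}

/// Fold a cast of an in-range Int64 literal into a UInt32 literal.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::CastToUInt32(inner) => match simplify(*inner) {
            Expr::Int64(v) if u32::try_from(v).is_ok() => Expr::UInt32(v as u32),
            // Anything else keeps its cast.
            other => Expr::CastToUInt32(Box::new(other)),
        },
        Expr::Add(l, r) => Expr::Add(Box::new(simplify(*l)), Box::new(simplify(*r))),
        other => other,
    }
}
```

If the key expression is displayed only after `simplify` has run, its name is built from the folded literal rather than from the original cast.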
| Arc::new(DecorrelateWhereExists::new()), | ||
| Arc::new(DecorrelateWhereIn::new()), | ||
| Arc::new(ScalarSubqueryToJoin::new()), | ||
| Arc::new(ExtractEquijoinPredicate::new()), |
| use std::sync::Arc; | ||
| // equijoin predicate | ||
| type EquijoinPredicate = (Expr, Expr); |
| match &expr { | ||
| Expr::BinaryExpr(BinaryExpr { left, op, right }) => match op { | ||
| Operator::Eq => { | ||
| ) -> Result<(Vec<EquijoinPredicate>, Option<Expr>)> { |
this is a much nicer interface 👍 for being self documenting
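A rough sketch of what an interface shaped like `(Vec<EquijoinPredicate>, Option<Expr>)` can look like, using a toy expression type. Everything here is an assumption made for illustration: the `Expr` enum, the helper names, and the `left`/`right` table-name parameters are hypothetical, and the `Result` wrapper from the real signature is omitted for brevity.

```rust
use std::collections::BTreeSet;

// Toy model only: hypothetical stand-in for an expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(&'static str, &'static str), // (table, column)
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// (left-side key, right-side key)
type EquijoinPredicate = (Expr, Expr);

// Collect the names of all tables referenced by `expr`.
fn tables_in(expr: &Expr, out: &mut BTreeSet<&'static str>) {
    match expr {
        Expr::Column(table, _) => {
            out.insert(*table);
        }
        Expr::Literal(_) => {}
        Expr::Eq(l, r) | Expr::Gt(l, r) | Expr::And(l, r) => {
            tables_in(l, out);
            tables_in(r, out);
        }
    }
}

// True when `expr` references columns of `table` and of no other table.
fn only_from(expr: &Expr, table: &str) -> bool {
    let mut seen = BTreeSet::new();
    tables_in(expr, &mut seen);
    seen.len() == 1 && seen.contains(table)
}

// Flatten nested ANDs into a list of conjuncts.
fn split_and(expr: Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_and(*l, out);
            split_and(*r, out);
        }
        other => out.push(other),
    }
}

/// Split a join filter into equijoin key pairs and the remaining filter, if any.
fn extract_equijoin_predicates(
    filter: Expr,
    left: &str,
    right: &str,
) -> (Vec<EquijoinPredicate>, Option<Expr>) {
    let mut conjuncts = Vec::new();
    split_and(filter, &mut conjuncts);

    let mut keys = Vec::new();
    let mut rest: Option<Expr> = None;
    for conjunct in conjuncts {
        match conjunct {
            // `left_expr = right_expr`: already in (left, right) order.
            Expr::Eq(l, r) if only_from(&l, left) && only_from(&r, right) => {
                keys.push((*l, *r))
            }
            // `right_expr = left_expr`: swap so the left-side key comes first.
            Expr::Eq(l, r) if only_from(&l, right) && only_from(&r, left) => {
                keys.push((*r, *l))
            }
            // Everything else is AND-ed back into the residual filter.
            other => {
                rest = Some(match rest {
                    Some(acc) => Expr::And(Box::new(acc), Box::new(other)),
                    None => other,
                });
            }
        }
    }
    (keys, rest)
}
```

Returning the key pairs and the leftover filter as separate values makes the two outcomes explicit to the caller, which is the self-documenting quality the comment above points to.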
I will take a careful look at this PR tomorrow.
| #[test] | ||
| fn join_with_alias_filter() -> Result<()> { |
BTW, I recommend adding an integration test to show the plan after all the rules have optimized it.
Since we can't create a join whose condition is an alias in SQL, I added an integration test that uses the DataFrame API.
Thanks @ygf11 @jackwener and @liukun4515
Benchmark runs are scheduled for baseline = 0d6d371 and contender = 93052cd. 93052cd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #4759.
Rationale for this change
Other rules may depend on ExtractEquijoinPredicate.
What changes are included in this PR?
- Refactor extract_join_keys to use split_conjunction.
- Move the ExtractEquijoinPredicate rule behind SubqueryFilterToJoin.
Are these changes tested?
Yes.
Are there any user-facing changes?