fix: struct field don't push down to TableScan#8774
Conversation
|
|
||
| query RRR | ||
| select min(f5), max(f5), avg(f5) from m2 where tag_id = '1000' and time < '2024-01-03T14:46:35+01:00' group by type; | ||
| select min(f5), max(f5), avg(f5) from m2 where tag_id = '1000' and time < '2024-01-03T14:46:35+01:00' group by type order by min(f5); |
There was a problem hiding this comment.
just make result stable
|
the root reason maybe in below code I'm not sure why we handle it this way. I tried commenting out this line, and all the tests still passed(the only test don't pass is not related to this, and I comment above in |
alamb
left a comment
There was a problem hiding this comment.
Thank you @haohuaijin -- given this PR fixes the bug and includes good test coverage ❤️ I think we could merge it as is.
However, I had some suggestions to improve the code even more -- do you think we should try to do those as part of this PR or maybe as a follow on PR?
| | Expr::IsNull(expr) | ||
| | Expr::Negative(expr) => outer_columns_helper(expr, columns), | ||
| Expr::Column(_) | Expr::Literal(_) | Expr::Wildcard { .. } => true, | ||
| _ => false, |
There was a problem hiding this comment.
What would you think about removing this catchall (and and thus explicitly enumerating all Expr types so this type of bug can't happen again)?
| _ => false, |
An alternate idea would be to remove this explicit walk down the tree entirely and use https://docs.rs/datafusion/latest/datafusion/logical_expr/utils/fn.expr_to_columns.html instead?
This pushdown code may predate the more generaic tree walks
There was a problem hiding this comment.
Good idea. Already removed the catchall.
An alternate idea would be to remove this explicit walk down the tree entirely and use https://docs.rs/datafusion/latest/datafusion/logical_expr/utils/fn.expr_to_columns.html instead?
the outer_columns_helper function is used to collect all outer_ref_column that expr_to_columns function doesn't return these columns. Because expr_to_columns use TreeNode to walks the expr tree, but TreeNode don't walk the outer_ref_column
https://github.com/apache/arrow-datafusion/blob/dd4263f843e093c807d63edf73a571b1ba2669b5/datafusion/expr/src/tree_node/expr.rs#L69-L77
There was a problem hiding this comment.
the outer_columns_helper function is used to collect all outer_ref_column that expr_to_columns function doesn't return these columns. Because expr_to_columns use TreeNode to walks the expr tree, but TreeNode don't walk the outer_ref_column
I see -- thank you for the explanation, which makes sense. It fiddled a little with the code, and I think I have a cleanup here that uses the TreeNode walking to avoid having to enumerate Expr variants #8787
| | Expr::IsNotNull(expr) | ||
| | Expr::IsNull(expr) | ||
| | Expr::Negative(expr) => outer_columns_helper(expr, columns), | ||
| Expr::Column(_) |
Which issue does this PR close?
Closes #8735
Rationale for this change
when debug #8735, I find not only the case in #8735 don't push down, we have other case don't push down
for example
What changes are included in this PR?
add more case to
outer_columns_helper.Are these changes tested?
yes, add more tests
Are there any user-facing changes?