support unnest as subexpression #9592
Conversation
|
|
||
| ## Unnest multiple columns | ||
| query II | ||
| select unnest(column1), unnest(column2) from unnest_table; |
There was a problem hiding this comment.
The result of this query should be incorrect. In DuckDB:
D CREATE TABLE unnest_table
AS VALUES
([1,2,3], [7], 1),
([4,5], [8,9,10], 2),
([6], [11,12], 3),
([12], [null, 42, null], null),
-- null array to verify the `preserve_nulls` option
(null, null, 4)
;
D select unnest(col0), unnest(col1) from unnest_table;
┌──────────────┬──────────────┐
│ unnest(col0) │ unnest(col1) │
│ int32 │ int32 │
├──────────────┼──────────────┤
│ 1 │ 7 │
│ 2 │ │
│ 3 │ │
│ 4 │ 8 │
│ 5 │ 9 │
│ │ 10 │
│ 6 │ 11 │
│ │ 12 │
│ 12 │ │
│ │ 42 │
│ │ │
├──────────────┴──────────────┤
│ 11 rows 2 columns │
└─────────────────────────────┘For multiple unnest expressions, we might need to extend the functionality of UnnestExec to support it. I think we could implement it later, maybe in another PR.
There was a problem hiding this comment.
@jonahgao Agreed, I will remove this feature.
|
|
||
| ## Unnest multiple columns | ||
| query error DataFusion error: This feature is not implemented: Only support single unnest expression for now | ||
| select unnest(column1), unnest(column2) from unnest_table; |
There was a problem hiding this comment.
It seems multiple columns are not supported yet
There was a problem hiding this comment.
What I mean multiple columns is actually multiple unnest expression, it seems misleading
There was a problem hiding this comment.
@jayzhan211 I renamed it to "multiple unnest functions"
| data: transformed_expr, | ||
| transformed, | ||
| tnr: _, | ||
| } = expr.transform_up_mut(&mut |expr: Expr| { |
There was a problem hiding this comment.
Why the logic of unnest is split out of others two?
is doing if else directly without transformed more readable?
if unnest
else if column
else {
}There was a problem hiding this comment.
@jayzhan211 The transform is essential for this feature. We need to rewrite the expression in bottom-up recursion such that we can extract the child and parent expression of the unnest function. The reason to split out the Column variant is that: we need to retain the relation field
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Ok, I got it. First time seeing transform_up_mut, not familiar with it.
| Ok(arg_data_types[0].clone()) | ||
| let arg_data_type = arg_data_types[0].clone(); | ||
| // Unnest's output type is the inner type of the list | ||
| match arg_data_type{ |
jayzhan211
left a comment
There was a problem hiding this comment.
LGTM, thanks @YjyJeff
datafusion/sql/src/select.rs
Outdated
| transformed, | ||
| tnr: _, | ||
| } = expr.transform_up_mut(&mut |expr: Expr| { | ||
| let column_name = expr.display_name()?; |
There was a problem hiding this comment.
This can be moved inside the if statement, because it is not used in the else statement.
There was a problem hiding this comment.
Looks good to me!
My only slight concern is that there are two projections around the unnest plan, and there are always an extra copy of projection expressions. However, this seems hard to avoid unless UnnestExec can unnest an expression, rather than just a column. This could perhaps be a future work.
Thank you for your nice contribution @YjyJeff
Which issue does this PR close?
Closees 9591 and 9589
Rationale for this change
Users could use unnest function with other expressions instead of writing a new wrapper query
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No