Conversation
|
Current implementation. There are two workflows for unnest
Workflow 1: Workflow 2: |
| set datafusion.execution.target_partitions = 1; | ||
|
|
||
| query ? | ||
| select unnest(make_array(1,2,3,4,5)) |
There was a problem hiding this comment.
I also recommend to create test with one of aggregate functions as stated in the description of the issue: #6555
SELECT sum(a) AS total FROM (SELECT unnest(make_array(3, 5, 6) AS a) AS b;
----
14
There was a problem hiding this comment.
This one also. Add to task list
| )?); | ||
|
|
||
| self.apply_expr_alias(apply_name_plan, alias.columns) | ||
| let plan = self.apply_expr_alias(plan, alias.columns)?; |
There was a problem hiding this comment.
The order is changed so that we can get either expr or table. expr.
Example in array.slt
table.expr: SELECT sum(data.a), sum(data.b) FROM unnest(make_array(1, 2, 3), make_array(4, 5, 6, 7)) as data(a, b);
expr: SELECT sum(a), sum(b) FROM unnest(make_array(1, 2, 3), make_array(4, 5, 6, 7)) as data(a, b);
|
Wait for sqlparser 0.36 |
|
I was planning to release sqlparser in a few weeks -- but if it is blocking this PR I can find time to make a release sooner |
I hope sqlparser can be released soon! Thanks! |
|
Next release is tracked by apache/datafusion-sqlparser-rs#905 I will try and prioritize it but this week is short for me because of the US holiday (and I am spending most of my DataFusion time on #6800) |
|
@alamb Kindly remind you. :) |
9127da7 to
e55a2fb
Compare
|
Possible improvement: TODO in another PR
|
alamb
left a comment
There was a problem hiding this comment.
Thank you @jayzhan211 -- this is getting there but I don't think it is ready to merge yet (for reasons I left in comments below)
My major concerns are that we aren't using the LogicalPlan::Unnest and UnnestExec physical operator -- doing so would mean that improvements we plan (like making UnnestExec more efficient) will not apply to using unnest as a function
As this PR is large and I would like to move it along, I suggest
- Remove
flatten(we can put that into another PR) - Make a PR with just the
Expr::Unnest-- maybe this could simply create aLogicalPlan::Unnest? - Make a PR to add any additional features needed for unnest (like multiple column support, for example)
What do you think?
| # TODO: Unnest columns fails | ||
| query error DataFusion error: SQL error: TokenizerError\("Unterminated string literal at Line: 2, Column 95"\) | ||
| caused by | ||
| Internal error: UNNEST only supports list type\. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker |
There was a problem hiding this comment.
this appears still to be NotImplemented
datafusion/expr/src/expr.rs
Outdated
| args: flatten_args, | ||
| })) | ||
| } | ||
| _ => Err(DataFusionError::Internal(format!( |
There was a problem hiding this comment.
| _ => Err(DataFusionError::Internal(format!( | |
| _ => Err(DataFusionError::NotYetImplemented(format!( |
There was a problem hiding this comment.
The updated version is in https://github.com/apache/arrow-datafusion/pull/7461/files
| Ok(Self::from(unnest(self.plan, column.into())?)) | ||
| } | ||
|
|
||
| pub fn join_unnest_plans( |
There was a problem hiding this comment.
Is it possible to use https://docs.rs/datafusion/latest/datafusion/physical_plan/unnest/struct.UnnestExec.html directly rather than building up a special LogicalPlan? It seems like UnnestExec already does this
Let me work on #6995 first while thinking about step 2, since I don't think simply creating LogicalPlan from Expr::Unnest is trivial. |
|
Marking as draft to signify it is not waiting on review |
7c56682 to
cfcb8b3
Compare
To create LogicalPlan::Unnest, it seems we can only do that in the optimization step. But this can't avoid schema change, due to Convert expr to logical plan in |
|
The Unnest function is now processing in the planning step, converting Expr::Unnest to LogicalPlan::Unnest in this phase. Nothing to do in the optimize phase and only the existing UnnestExec is processed in the PhysicalPlan phase. Not sure if this is a good design but it works :) Maybe we can review the overall design, and then I will address the comments and resolve conflicts. |
|
I will try and review this more carefully later today and provide feedback. Thank you @jayzhan211 |
72273a5 to
907cc71
Compare
datafusion/expr/src/expr_schema.rs
Outdated
| .collect::<Result<Vec<_>>>()?; | ||
|
|
||
| if data_types.is_empty() { | ||
| return Err(DataFusionError::NotImplemented( |
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
907cc71 to
199f770
Compare
|
Marking this one as draft as it has bitrotted for a while and I think we need to work on simplifing the array code we have before we add more |
|
I think DataFusion now supports UNNEST -- is this PR still being worked on? |
We support single columns only, the challenging one multiple columns has not yet been supported, I remember @jonahgao is working on it. I forgot which PR you mentioned. |
Not started yet. I just discovered that it hasn't been implemented in #9592 (comment), but I can start working on it from now on. |
|
I think I might have misunderstood. Neither of the two queries below has been implemented, the one discussed in this PR seems to be the latter. DataFusion CLI v37.0.0
❯ select unnest(ARRAY[1,2]), unnest(ARRAY[3,4,5]);
This feature is not implemented: Only support single unnest expression for now
❯ select * from unnest(ARRAY[1,2], ARRAY[2,3]);
This feature is not implemented: unnest() does not support multiple arguments yetI plan to try to implement them both. |
Which issue does this PR close?
Ref #6555
Closes #.
Rationale for this change
Support
unnestfor arraysWhat changes are included in this PR?
select unnest()select * from unnest()datafusion.execution.target_partitions > 1(We might not need this?)Are these changes tested?
Are there any user-facing changes?