Implement array_agg aggregate function #1300
| Ok(vec![self.array.clone()]) | ||
| } | ||
|
|
||
| fn update(&mut self, values: &[ScalarValue]) -> Result<()> { |
Did you leave update_batch and merge_batch on purpose?
I think at this point it is hard to think of a much more efficient implementation (avoiding converting every item to scalars), given that we don't have columnar storage for aggregates yet.
yea, I just rely on the default implementation.
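To illustrate what relying on a default batch implementation means here, the following is a minimal sketch, not DataFusion's actual API. The trait `RowAccumulator`, the struct `Collect`, and the use of `i64` columns are all hypothetical stand-ins; the point is only that a default `update_batch` can walk the input row by row and defer to `update`, which is where the per-item conversion cost mentioned above comes from.

```rust
// Hypothetical, simplified accumulator trait; not DataFusion's actual API.
trait RowAccumulator {
    /// Consume one row, given as one scalar per input column.
    fn update(&mut self, row: &[i64]);

    /// Default batch path: walk the columns row by row and defer to `update`.
    /// Assumes all columns have the same length.
    fn update_batch(&mut self, columns: &[Vec<i64>]) {
        if columns.is_empty() {
            return;
        }
        for i in 0..columns[0].len() {
            // Building a per-row Vec is the per-item conversion cost
            // that a columnar implementation would avoid.
            let row: Vec<i64> = columns.iter().map(|c| c[i]).collect();
            self.update(&row);
        }
    }
}

/// Hypothetical accumulator that only implements the row-wise path.
#[derive(Debug, Default)]
struct Collect {
    items: Vec<i64>,
}

impl RowAccumulator for Collect {
    fn update(&mut self, row: &[i64]) {
        // Single-argument aggregate: keep the first (only) column's value.
        self.items.push(row[0]);
    }
}
```

An accumulator written this way gets batch support for free, at the price of one small allocation per row.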
|
|
||
| #[derive(Debug)] | ||
| pub(crate) struct ArrayAggAccumulator { | ||
| array: ScalarValue, |
I think this could be a Vec<ScalarValue> instead.
You would only need to convert it to a List in evaluate and state, which makes the update implementation simpler.
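A rough sketch of that suggestion, under stated assumptions: `Value` below is a hypothetical, simplified stand-in for `ScalarValue`, and `ArrayAggSketch` is an illustrative name, not the struct from this PR. It shows the accumulator holding a plain `Vec` and wrapping it in a list only in `evaluate` and `state`.

```rust
// Simplified stand-in for DataFusion's ScalarValue; hypothetical, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Utf8(String),
    List(Vec<Value>),
}

/// Illustrative accumulator that buffers values in a plain Vec.
#[derive(Debug, Default)]
struct ArrayAggSketch {
    values: Vec<Value>,
}

impl ArrayAggSketch {
    /// Append the single input value for this row.
    fn update(&mut self, row: &[Value]) {
        // array_agg takes one argument, so each row carries one value.
        assert_eq!(row.len(), 1, "array_agg expects exactly one input expression");
        self.values.push(row[0].clone());
    }

    /// Wrap the buffered values in a list only when a result is requested.
    fn evaluate(&self) -> Value {
        Value::List(self.values.clone())
    }

    /// The intermediate state is the same single list.
    fn state(&self) -> Vec<Value> {
        vec![self.evaluate()]
    }
}
```

With this layout `update` is a single `push`, and the list conversion happens once per call to `evaluate` or `state` rather than once per row.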
| Ok(Field::new( | ||
| &self.name, | ||
| DataType::List(Box::new(Field::new( | ||
| "element", |
Thanks for catching this. It should be item, to be consistent.
| Ok(vec![Field::new( | ||
| &format_state_name(&self.name, "array_agg"), | ||
| DataType::List(Box::new(Field::new( | ||
| "item", |
jimexist
left a comment
Thanks for the pull request. Can you add SQL tests and psql comparison integration tests?
I'd like to cover the 0-, 1-, and 2-element cases, and also a case with a GROUP BY, to see how each is handled.
alamb
left a comment
Thank you @liukun4515 -- I agree with @jimexist that having the end to end tests (aka showing that one can invoke array_agg from SQL) is quite important. Otherwise this PR is 👍 to me
capkurmagati
left a comment
Thanks for the contribution.
I'm not sure if it would be better to do this in another PR, but both Trino and PostgreSQL support ORDER BY inside array_agg.
Have you considered supporting it?
Maybe that can be put into a separate pull request? I agree that it is useful to add |
Thanks @jimexist @alamb @capkurmagati Let me add some end to end tests. I will try to add |
| Ok(()) | ||
| } | ||
|
|
||
| fn merge(&mut self, states: &[ScalarValue]) -> Result<()> { |
BTW, after we changed the state from a ScalarValue to a Vec<ScalarValue> in an earlier commit, we also need to update how merge works. It can no longer call update directly. I found this when adding the e2e tests.
| return Ok(()); | ||
| }; | ||
|
|
||
| match &states[0] { |
I think it would make sense to assert here that states.len() == 1, so we don't silently end up ignoring any other items that might be added (accidentally or erroneously).
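A minimal sketch of a merge with that assertion, under the same caveats as any illustration: `Value` is a hypothetical stand-in for `ScalarValue`, `ArrayAggSketch` is an illustrative name, and the early return on empty input is an assumption based on the `return Ok(());` fragment visible in the diff. The merge unpacks the single list state and extends the buffer, rather than calling `update`.

```rust
// Simplified stand-in for DataFusion's ScalarValue; hypothetical, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
}

/// Illustrative accumulator whose state is one list of buffered values.
#[derive(Debug, Default)]
struct ArrayAggSketch {
    values: Vec<Value>,
}

impl ArrayAggSketch {
    /// Fold another accumulator's state into this one.
    fn merge(&mut self, states: &[Value]) -> Result<(), String> {
        // Assumed behavior: nothing to merge when no state is supplied.
        if states.is_empty() {
            return Ok(());
        }
        // The accumulator emits exactly one state field, so any extra
        // entries would indicate a bug and must not be dropped silently.
        assert!(states.len() == 1, "array_agg states must contain a single list");
        match &states[0] {
            // Append the partial result's elements instead of nesting the list.
            Value::List(items) => {
                self.values.extend(items.iter().cloned());
                Ok(())
            }
            other => Err(format!("expected a list state, got {:?}", other)),
        }
    }
}
```

Merging `[1, 2]` and then `[3]` leaves the buffer holding three elements in arrival order; a non-list state is reported as an error rather than pushed as a single element.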
datafusion/tests/sql.rs
| let mut ctx = ExecutionContext::new(); | ||
| register_aggregate_csv(&mut ctx).await?; | ||
| let sql = | ||
| "SELECT array_agg(c13) FROM (SELECT * FROM aggregate_test_100 LIMIT 2) test"; |
Can you please add an ORDER BY to this query? As written it may be non-deterministic (the first two rows returned are not guaranteed to be in any particular order):
| "SELECT array_agg(c13) FROM (SELECT * FROM aggregate_test_100 LIMIT 2) test"; | |
| "SELECT array_agg(c13) FROM (SELECT * FROM aggregate_test_100 ORDER BY c13 LIMIT 2) test"; |
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn csv_query_array_agg_empty() -> Result<()> { |
datafusion/tests/sql.rs
| let mut ctx = ExecutionContext::new(); | ||
| register_aggregate_csv(&mut ctx).await?; | ||
| let sql = | ||
| "SELECT array_agg(c13) FROM (SELECT * FROM aggregate_test_100 LIMIT 1) test"; |
same comment here about needing an ORDER BY please
* Implement array_agg aggregate function.
* Avoid copying.
* Fix clippy.
* For review comment.
* Add e2e tests.
* Add assert and order by.
Thanks @houqp @jimexist @Dandandan @alamb @capkurmagati!
array_agg aggregate function
Which issue does this PR close?
Closes #1085.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?