Skip to content

Mark ARRAY_AGG(DISTINCT ...) not implemented#1534

Merged
alamb merged 1 commit intoapache:masterfrom
james727:array_distinct_notimpl
Jan 11, 2022
Merged

Mark ARRAY_AGG(DISTINCT ...) not implemented#1534
alamb merged 1 commit intoapache:masterfrom
james727:array_distinct_notimpl

Conversation

@james727
Copy link
Contributor

@james727 james727 commented Jan 9, 2022

Which issue does this PR close?

This is the last change required to close #1512

Rationale for this change

Right now array_agg(distinct ...) doesn't work. The physical plan construction logic uses the non-distinct array_agg whether or not distinct was specified. Interestingly enough it still works correctly under certain conditions, due to the SingleDistinctToGroupBy optimizer rule.

As an example, consider the following queries:

--- Works, since logical plan is rewritten with a subquery and non-distinct agg. Will return:
--- +------------------------------------------+
--- | ARRAYAGG(DISTINCT aggregate_test_100.c2) |
--- +------------------------------------------+
--- | [2, 3, 5, 1, 4]                          |
--- +------------------------------------------+
SELECT array_agg(DISTINCT c2) FROM aggregate_test_100;

--- Returns incorrect results, since SingleDistinctToGroupBy optimizer rule does not apply:
--- +--------------------------------------------------------------------------+
--- | ARRAYAGG(DISTINCT aggregate_test_100.c2)      | COUNT(DISTINCT UInt8(1)) |
--- +--------------------------------------------------------------------------+
--- | [2, 5, 1, 1, 5, 4, 3, 3, 1, 4, 1, 4, 3, ...]  | 1                        |
--- +--------------------------------------------------------------------------+
SELECT array_agg(DISTINCT c2), count(distinct 1) FROM aggregate_test_100;

After this change distinct array agg will throw an error when SingleDistinctToGroupBy does not apply.

I'm planning on working on actually implementing distinct array_agg after this, but figured this was worth fixing for now.

What changes are included in this PR?

This marks the aggregation not implemented, and adds a block for testing count/approx distinct/array agg with distinct = true.

@james727 james727 force-pushed the array_distinct_notimpl branch from 33a5b1a to dda9b5a Compare January 9, 2022 20:02
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @james727

do distinct aggregates make sense for array aggregates? If they make sense could you possibly file at ticket to track implementing them?

@alamb alamb merged commit b05feda into apache:master Jan 11, 2022
@james727 james727 deleted the array_distinct_notimpl branch January 11, 2022 19:50
@james727
Copy link
Contributor Author

@alamb yep - they do! I think #1323 covers it but I can add another ticket and close the set_agg one if easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Distinct aggregations work only partially

2 participants