-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Enforce explicit opt-in for WITHIN GROUP syntax in aggregate UDAFs
#18607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
68cf7ae
915e3bb
1679849
4f1431a
5c33aa0
fcdce76
5eaaddc
9e71211
45bd1da
62cc368
a2968ab
494142f
1811287
f79715a
193c6fc
a58a27f
2c2ff39
752c78c
333a3b8
1265c4b
566508b
c5dd922
a28b463
6e10ca1
b7183de
2366a36
b11e66d
6a90e31
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -48,6 +48,36 @@ FROM employees; | |
|
|
||
| Note: When no rows pass the filter, `COUNT` returns `0` while `SUM`/`AVG`/`MIN`/`MAX` return `NULL`. | ||
|
|
||
| ## WITHIN GROUP / Ordered-set aggregates | ||
|
|
||
| Some aggregate functions accept the SQL `WITHIN GROUP (ORDER BY ...)` clause to specify the ordering the | ||
| aggregate relies on. In DataFusion this is opt-in: only aggregate functions whose implementation returns | ||
| `true` from `AggregateUDFImpl::supports_within_group_clause()` accept the `WITHIN GROUP` clause. Attempting to | ||
| use `WITHIN GROUP` with a regular aggregate (for example, `SELECT SUM(x) WITHIN GROUP (ORDER BY x)`) will fail | ||
| during planning with an error: "WITHIN GROUP is only supported for ordered-set aggregate functions". | ||
|
|
||
| Currently, the built-in aggregate functions that support `WITHIN GROUP` are: | ||
|
|
||
| - `percentile_cont` — exact percentile aggregate (also available as `percentile_cont(column, percentile)`) | ||
| - `approx_percentile_cont` — approximate percentile using the t-digest algorithm | ||
| - `approx_percentile_cont_with_weight` — approximate weighted percentile using the t-digest algorithm | ||
|
|
||
| Note: rank-like functions such as `rank()`, `dense_rank()`, and `percent_rank()` are window functions and | ||
| use the `OVER (...)` clause; they are not ordered-set aggregates that accept `WITHIN GROUP` in DataFusion. | ||
|
|
||
| Example (ordered-set aggregate): | ||
|
|
||
| ```sql | ||
| percentile_cont(0.5) WITHIN GROUP (ORDER BY value) | ||
| ``` | ||
|
|
||
| Example (invalid usage — planner will error): | ||
|
|
||
| ```sql | ||
| -- This will fail: SUM is not an ordered-set aggregate | ||
| SELECT SUM(x) WITHIN GROUP (ORDER BY x) FROM t; | ||
| ``` | ||
|
Comment on lines
51
to
79
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we decide to add this to our documentation we should be explicit about which functions are ordered set aggregates. As it is, the user cannot tell by this alone.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I thought this part gave a good indication of ordered set aggregate functions.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem here is it is not an explicit list, and refers to rank as an ordered-set aggregation when it isn't in DataFusion. My concern here is that it'll inform the users that ordered-set aggregate functions exist, but isn't exactly clear which ones we have. |
||
|
|
||
| ## General Functions | ||
|
|
||
| - [array_agg](#array_agg) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure adding these tests here are necessary; SLTs should be sufficient and I believe some of the tests added here are already in SLTs, e.g.
datafusion/datafusion/sqllogictest/test_files/aggregate.slt
Lines 174 to 179 in d57e215
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the tests that are already covered in slt.