Optimize DISTINCT, ORDER BY and DISTINCT ON when Aggregation without Group By. #685
Many plan diffs, will fix later.
4a01111 to 1d31349
SRF will break the assumption.
select count(*), generate_series(1, 4) from t1;
 count | generate_series
-------+-----------------
     3 |               1
     3 |               2
     3 |               3
     3 |               4
(4 rows)
Fix it and Postgres'
I took a look at Orca, it has already optimized Even if with as distinct is a function which only works in a group. The function called
Yeah, see #677 (reply in thread)
1d31349 to 66614af
Orca removes the DISTINCT expression when it is used on the aggregate expression, even if there is a GROUP BY clause. Do we need to consider that?
We can consider that type of optimization in the future.
66614af to 96d4e42
For a query that has aggregation but no GROUP BY clause, the
DISTINCT/DISTINCT ON/ORDER BY clauses can be removed, because at
most one row is returned.
There is no need for a Unique or Sort step.
This simplifies the plan and processes fewer Aggref nodes during
planning.
select distinct on(count(b), count(c)) count(a), sum(b) from
t_distinct_sort order by count(c);
                             QUERY PLAN
--------------------------------------------------------------------
 Unique
   Output: (count(a)), (sum(b)), (count(c)), (count(b))
   Group Key: (count(c)), (count(b))
   ->  Sort
         Output: (count(a)), (sum(b)), (count(c)), (count(b))
         Sort Key: (count(t_distinct_sort.c)), (count(t_distinct_sort.b))
         ->  Finalize Aggregate
               Output: count(a), sum(b), count(c), count(b)
               ->  Gather Motion 3:1  (slice1; segments: 3)
                     Output: (PARTIAL count(a)), (PARTIAL sum(b)), (PARTIAL count(c)), (PARTIAL count(b))
                     ->  Partial Aggregate
                           Output: PARTIAL count(a), PARTIAL sum(b), PARTIAL count(c), PARTIAL count(b)
                           ->  Seq Scan on public.t_distinct_sort
                                 Output: a, b, c
After this commit:
select distinct on(count(b), count(c)) count(a), sum(b) from
t_distinct_sort order by count(c);
                       QUERY PLAN
--------------------------------------------------------
 Finalize Aggregate
   Output: count(a), sum(b)
   ->  Gather Motion 3:1  (slice1; segments: 3)
         Output: (PARTIAL count(a)), (PARTIAL sum(b))
         ->  Partial Aggregate
               Output: PARTIAL count(a), PARTIAL sum(b)
               ->  Seq Scan on public.t_distinct_sort
                     Output: a, b, c
 Optimizer: Postgres query optimizer
Authored-by: Zhang Mingli [email protected]
96d4e42 to d53a163
LGTM!
Many thanks @fanfuxiaoran for the detailed review!
For a query that has aggregation but no GROUP BY clause, the DISTINCT/DISTINCT ON/ORDER BY clauses can be removed, because at most one row is returned.
There is no need for a Unique or Sort step.
This simplifies the plan and processes fewer expressions, such as Aggref nodes, during planning.
DISTINCT
After this commit:
DISTINCT ON and ORDER BY
After this commit:
ORDER BY
After this commit:
DISTINCT and ORDER BY
After this commit:
Authored-by: Zhang Mingli [email protected]
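For readers who want the rule in executable form, here is a minimal sketch of the idea described above. It is not the planner code from this PR. The `Query` class and its fields (`has_aggs`, `group_by`, `target_has_srf`, `distinct`, `order_by`) are hypothetical stand-ins for illustration only. The set-returning-function guard reflects the SRF caveat raised in review; how the actual patch checks for it is an assumption here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical, heavily simplified stand-in for a parsed query."""
    has_aggs: bool = False        # target list contains aggregate calls
    group_by: tuple = ()          # GROUP BY expressions, empty if absent
    target_has_srf: bool = False  # target list contains a set-returning function
    distinct: tuple = ()          # DISTINCT / DISTINCT ON expressions
    order_by: tuple = ()          # ORDER BY expressions


def returns_at_most_one_row(q: Query) -> bool:
    # Aggregation without GROUP BY collapses the input into a single group.
    # An SRF in the target list can expand that single row again,
    # so it disqualifies the query (assumed guard, see lead-in).
    return q.has_aggs and not q.group_by and not q.target_has_srf


def strip_redundant_clauses(q: Query) -> Query:
    """Drop DISTINCT / DISTINCT ON / ORDER BY when they cannot change the result."""
    if not returns_at_most_one_row(q):
        return q
    # With at most one row, de-duplication and sorting are no-ops.
    return replace(q, distinct=(), order_by=())
```

In this model the example query, `select distinct on(count(b), count(c)) count(a), sum(b) ... order by count(c)`, loses both clauses, while `select count(*), generate_series(1, 4) from t1` is left untouched because of the SRF.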