Make UNION Parallel. #1213
The previous commit caused some crashes and failures. The RCA is: This looks like a separate issue that this PR exposes, so the fixes are kept as two commits.
Fixed in 32278d2
my-ship-it
left a comment
LGTM
This commit improves the handling of UNION operations in CBDB,
addressing problems with Parallel Append and Motion nodes within
subqueries. Parallel Append is now disabled for UNION operations to
prevent incorrect results caused by workers competing for subnodes.
Previously a worker could finish its task prematurely, which led to
data loss in plans involving Motion Senders.

To keep UNION parallel, we have introduced a Parallel-oblivious Append
approach. Multiple workers operate independently without sharing
state, which eliminates the coordination issues of the Parallel-aware
Append strategy.

These changes improve the reliability and correctness of UNION
operations while maintaining overall system performance, so CBDB can
support parallel processing of UNION safely.
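The difference between the two Append strategies can be illustrated with a toy model. This is only a sketch, not CBDB code: it assumes that a Motion routes every row to exactly one worker and that, under Parallel-aware Append, each subplan is claimed by a single worker. The names `route`, `parallel_aware_append`, and `parallel_oblivious_append` are hypothetical and exist only for this illustration.

```python
def route(rows, n_workers):
    """Toy stand-in for a Motion: send each row to exactly one worker."""
    parts = [[] for _ in range(n_workers)]
    for r in rows:
        parts[r % n_workers].append(r)  # assumed routing rule, for illustration
    return parts


def parallel_aware_append(subplans, n_workers):
    """Workers share state: each subplan is claimed by one worker only."""
    routed = [route(rows, n_workers) for rows in subplans]
    out = []
    for i, parts in enumerate(routed):
        owner = i % n_workers      # the worker that wins the claim (assumed)
        out.extend(parts[owner])   # rows routed to other workers are never read
    return out


def parallel_oblivious_append(subplans, n_workers):
    """No shared state: every worker runs every subplan on its own rows."""
    routed = [route(rows, n_workers) for rows in subplans]
    out = []
    for w in range(n_workers):
        for parts in routed:
            out.extend(parts[w])
    return out


if __name__ == "__main__":
    subplans = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(sorted(parallel_aware_append(subplans, 2)))      # some rows missing
    print(sorted(parallel_oblivious_append(subplans, 2)))  # every row present
```

In this model the shared-state variant drops the rows that were routed to a worker which never ran the corresponding subplan, while the oblivious variant reads each row exactly once.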
select distinct a from t_distinct_0 union select distinct b from t_distinct_0;
                              QUERY PLAN
----------------------------------------------------------------------
 Gather Motion 6:1  (slice1; segments: 6)
   ->  HashAggregate
         Group Key: t_distinct_0.a
         ->  Redistribute Motion 6:6  (slice2; segments: 6)
               Hash Key: t_distinct_0.a
               Hash Module: 3
               ->  Append
                     ->  GroupAggregate
                           Group Key: t_distinct_0.a
                           ->  Sort
                                 Sort Key: t_distinct_0.a
                                 ->  Redistribute Motion 6:6  (slice3; segments: 6)
                                       Hash Key: t_distinct_0.a
                                       Hash Module: 3
                                       ->  Streaming HashAggregate
                                             Group Key: t_distinct_0.a
                                             ->  Parallel Seq Scan on t_distinct_0
                     ->  GroupAggregate
                           Group Key: t_distinct_0_1.b
                           ->  Sort
                                 Sort Key: t_distinct_0_1.b
                                 ->  Redistribute Motion 6:6  (slice4; segments: 6)
                                       Hash Key: t_distinct_0_1.b
                                       Hash Module: 3
                                       ->  Streaming HashAggregate
                                             Group Key: t_distinct_0_1.b
                                             ->  Parallel Seq Scan on t_distinct_0 t_distinct_0_1
Authored-by: Zhang Mingli [email protected]
Correct the fetching of the upper UPPERREL_SETOP relation to avoid NULL bitmaps, which caused consider_parallel to be false. This inconsistency led to assertion failures in add_partial_path. The correct relation IDs are now set before entering make_union_unique.
Authored-by: Zhang Mingli [email protected]
Performance (see case[0] below):
no-parallel: 3031.346 ms
4-parallel UNION: 1226.660 ms (about 2.5x faster)
case[0]
Authored-by: Zhang Mingli [email protected]
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel