refactor: Use SpillManager for all spilling scenarios #15405
Merged
alamb merged 4 commits into apache:main on Mar 26, 2025
comphead (Contributor) approved these changes on Mar 25, 2025 and left a comment:
Thanks @2010YOUY01 lgtm
I'm also wondering whether we should give the spill manager the functions for reading spilled files?
alamb (Contributor) approved these changes on Mar 25, 2025 and left a comment:
Looks like a very nice refactor to me -- thank you @2010YOUY01
}

/// Emit all rows, sort them, and store them on disk.
/// Emit all intermediate aggregation states, sort them, and store them on disk.
let Some(emit) = self.emit(EmitTo::All, true)? else {
    return Ok(());
};
let sorted = sort_batch(&emit, self.spill_state.spill_expr.as_ref(), None)?;
Contributor
Eventually it might make sense to have the spill manager handle sorting the runs too (so it could potentially merge multiple files into a single run to reduce fanout, etc.)
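To make the "merge multiple files into a single run" idea concrete, here is a minimal sketch of a k-way merge over already-sorted runs. It is not DataFusion code: each `Vec<i64>` is a hypothetical stand-in for one sorted spill file, and `merge_sorted_runs` is an illustrative name, not an existing API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge several individually sorted runs into one sorted run.
/// Assumption: each `Vec<i64>` models one sorted spill file.
pub fn merge_sorted_runs(runs: Vec<Vec<i64>>) -> Vec<i64> {
    let total: usize = runs.iter().map(|r| r.len()).sum();
    let mut merged = Vec::with_capacity(total);

    // One cursor per run, so each run is consumed front to back.
    let mut cursors: Vec<std::vec::IntoIter<i64>> =
        runs.into_iter().map(|r| r.into_iter()).collect();

    // Min-heap of (value, run index), seeded with the head of every run.
    let mut heap = BinaryHeap::new();
    for (idx, cursor) in cursors.iter_mut().enumerate() {
        if let Some(v) = cursor.next() {
            heap.push(Reverse((v, idx)));
        }
    }

    // Repeatedly take the smallest head and refill from the same run.
    while let Some(Reverse((v, idx))) = heap.pop() {
        merged.push(v);
        if let Some(next) = cursors[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    merged
}
```

Merging k runs into one this way means a later merge step has fewer inputs to open at once, which is the fanout reduction mentioned above.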
/// split by `batch_size_rows`
#[deprecated(
    since = "46.0.0",
    note = "This method is deprecated. Use `SpillManager::spill_record_batch_by_size` instead."
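The excerpt above deprecates a free function in favor of a method on the manager. A minimal sketch of that pattern, using only the standard `#[deprecated]` attribute, might look like the following. `ToySpillManager`, `split_by_size`, `split_rows`, and the version string are all hypothetical names chosen for illustration; they are not DataFusion's actual types or signatures.

```rust
/// Hypothetical stand-in for a spill manager; not DataFusion's actual type.
pub struct ToySpillManager {
    pub max_rows_per_chunk: usize,
}

impl ToySpillManager {
    /// Split `rows` into chunks of at most `max_rows_per_chunk` rows.
    /// A limit of 0 is treated as 1 so that `chunks` never panics.
    pub fn split_by_size(&self, rows: &[i32]) -> Vec<Vec<i32>> {
        rows.chunks(self.max_rows_per_chunk.max(1))
            .map(|chunk| chunk.to_vec())
            .collect()
    }
}

/// Old entry point, kept so existing callers still compile.
/// It forwards to the new method, so behavior stays identical.
#[deprecated(
    since = "0.2.0",
    note = "Use `ToySpillManager::split_by_size` instead."
)]
pub fn split_rows(rows: &[i32], max_rows: usize) -> Vec<Vec<i32>> {
    ToySpillManager { max_rows_per_chunk: max_rows }.split_by_size(rows)
}
```

Callers of `split_rows` get a compiler warning pointing at the replacement, while the logic lives in one place.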
Contributor
Author
Yes, there is already a basic one. In the future, if there are more functions to read spilled files, I think they should also be included inside SpillManager.
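As a rough illustration of keeping the write and read paths behind one type, here is a small standard-library sketch. `ToySpillFiles`, `spill`, and `read_spill` are hypothetical names, and the one-value-per-line text format is an assumption made for brevity; it does not reflect how DataFusion serializes record batches.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Hypothetical type owning both sides of spilling; not DataFusion's API.
pub struct ToySpillFiles {
    dir: PathBuf,
    next_id: usize,
}

impl ToySpillFiles {
    pub fn new(dir: PathBuf) -> Self {
        Self { dir, next_id: 0 }
    }

    /// Write one value per line to a fresh file and return its path.
    pub fn spill(&mut self, values: &[i64]) -> io::Result<PathBuf> {
        let name = format!("toy_spill_{}_{}.txt", std::process::id(), self.next_id);
        self.next_id += 1;
        let path = self.dir.join(name);
        let mut writer = BufWriter::new(File::create(&path)?);
        for v in values {
            writeln!(writer, "{v}")?;
        }
        writer.flush()?;
        Ok(path)
    }

    /// Read a previously spilled file back into memory.
    pub fn read_spill(&self, path: &Path) -> io::Result<Vec<i64>> {
        let reader = BufReader::new(File::open(path)?);
        let mut values = Vec::new();
        for line in reader.lines() {
            let parsed = line?
                .trim()
                .parse::<i64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            values.push(parsed);
        }
        Ok(values)
    }
}
```

With both methods on the same type, the file format is defined in one place, so a change to how data is written cannot silently diverge from how it is read.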
Contributor
Thank you @2010YOUY01 and @comphead
qstommyshu pushed a commit to qstommyshu/datafusion that referenced this pull request on Mar 27, 2025
* Use SpillManager in all spilling scenarios
* resolve conflict
* fix ci format
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request on May 2, 2025
* Use SpillManager in all spilling scenarios
* resolve conflict
* fix ci format
Which issue does this PR close?
SpillManager in AggregateExec and SortMergeJoinExec #15374
Rationale for this change
#15355 introduced SpillManager as a new interface for spilling-related operations and updated SortExec to use it.
This PR updates all spilling-related operations to use the new SpillManager interface.
What changes are included in this PR?
[AggregateExec, SortMergeJoinExec]: SpillMetrics spill_manager inside SpillManager interface
Are these changes tested?
Existing tests.
Are there any user-facing changes?
No