Fix array_agg memory over use by gabotechs · Pull Request #16346 · apache/datafusion

gabotechs · 2025-06-09T12:45:46Z

Which issue does this PR close?

Closes #.

Rationale for this change

The different accumulators of the array_agg function store certain scalar values as part of their state, and for the same reason that the following PR

fix: overcounting of memory in first/last. #15924

was needed for the first/last functions, it is also needed here.

What changes are included in this PR?

Reuses the tooling shipped in #15924 to compact the scalar values held by the different array_agg accumulators.

Are these changes tested?

Yes, by new and existing tests.

Are there any user-facing changes?

Users running with a bounded memory pool may stop seeing certain errors caused by failed memory allocations.
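To make the over-accounting concrete, here is a minimal toy model in plain Rust. It does not use the Arrow or DataFusion APIs: `Slice`, `reported_size`, `compact`, `state_size`, and `accumulate` are hypothetical names chosen for illustration. The assumption being modeled is that a value sliced out of a large batch keeps the whole backing buffer alive, so an accumulator that sums per-value sizes charges the full batch once per stored value.

```rust
use std::sync::Arc;

/// Toy stand-in for a value that views part of a shared buffer.
#[derive(Clone)]
struct Slice {
    buffer: Arc<Vec<u8>>, // shared with every other slice of the same batch
    offset: usize,
    len: usize,
}

impl Slice {
    /// Bytes this value keeps alive: the whole backing buffer, not just `len`.
    fn reported_size(&self) -> usize {
        self.buffer.len()
    }

    /// Copies the viewed range into a fresh buffer of exactly `len` bytes.
    fn compact(&self) -> Slice {
        let bytes = self.buffer[self.offset..self.offset + self.len].to_vec();
        Slice { buffer: Arc::new(bytes), offset: 0, len: self.len }
    }
}

/// Sums the reported size of every value held in an accumulator's state.
fn state_size(values: &[Slice]) -> usize {
    values.iter().map(Slice::reported_size).sum()
}

/// Splits one 1000-byte batch into 100 ten-byte values, optionally
/// compacting each value before it is stored.
fn accumulate(compacting: bool) -> Vec<Slice> {
    let batch = Arc::new(vec![0u8; 1000]);
    (0..100)
        .map(|i| Slice { buffer: Arc::clone(&batch), offset: i * 10, len: 10 })
        .map(|s| if compacting { s.compact() } else { s })
        .collect()
}
```

In this model `state_size(&accumulate(false))` is 100000 while `state_size(&accumulate(true))` is 1000. The price of compacting is one extra copy per stored value, which is the trade-off this kind of fix accepts in exchange for a size estimate that tracks what the state actually needs.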

datafusion/functions-aggregate/src/array_agg.rs

LiaCastaneda · 2025-06-09T13:51:15Z

datafusion/common/src/scalar/mod.rs

+    /// Compacts ([ScalarValue::compact]) the current [ScalarValue] and returns it.
+    pub fn compacted(mut self) -> Self {


Would there be any benefit in adding #[inline] to this, since it's a small function?

🤔 I don't have enough evidence to justify that #[inline] is better here. The function is not really in the hot path of any operation, so if you ask me, I'd just trust the compiler to do what's right.
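For readers unfamiliar with the shape of the method in the diff above, a consuming `compacted` is commonly a thin wrapper over an in-place `compact`. The sketch below shows that pattern on a hypothetical `Window` type. It is not the `ScalarValue` implementation, whose body does not appear in this thread, and `Window`, `values`, and `sample` are made-up names.

```rust
use std::sync::Arc;

/// Hypothetical value that views `len` elements of a shared buffer.
struct Window {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl Window {
    /// Elements currently visible through the window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// In-place variant: replaces the shared buffer with a private copy
    /// holding only the visible elements.
    fn compact(&mut self) {
        let owned = self.values().to_vec();
        self.buffer = Arc::new(owned);
        self.offset = 0;
    }

    /// Consuming variant: compacts `self` and hands it back, so it can be
    /// used inline in an expression such as `state.push(value.compacted())`.
    fn compacted(mut self) -> Self {
        self.compact();
        self
    }
}

/// Builds a window over elements 2..5 of a ten-element buffer.
fn sample() -> Window {
    Window { buffer: Arc::new((0..10).collect()), offset: 2, len: 3 }
}
```

A wrapper this small is a plausible inlining candidate either way, which fits the reply above: without a measurement showing otherwise, leaving the decision to the compiler is a reasonable default.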

alamb

Thanks @gabotechs and @LiaCastaneda -- this makes sense to me.

I also made a small PR to improve the docs:

#16361

alamb · 2025-06-11T00:47:40Z

datafusion/functions-aggregate/src/array_agg.rs

        Ok(())
    }

+    #[test]


I verified these tests cover the code in this PR -- they fail without the changes in the PR

assertion `left == right` failed left: 2652 right: 732

alamb · 2025-06-11T01:04:11Z

datafusion/functions-aggregate/src/array_agg.rs

+            // storing it here directly copied/compacted avoids over accounting memory
+            // not used here.
+            self.values
+                .push(make_array(copy_array_data(&val.to_data())));


I found this code confusing at first too, so I tried to add some additional documentation:

Document copy_array_data function with example #16361

Another thing that might make this code easier to understand would be to refactor it into a function, so that it looks more like:

Suggested change

-                .push(make_array(copy_array_data(&val.to_data())));
+                .push(copy_array(val))

Or something like that:

    /// Copies an array to a new array with minimal memory overhead
    fn copy_array(array: &dyn Array) -> ArrayRef { .. }

This is definitely not required, just something that occurred to me while reviewing.
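The refactor suggested above can be sketched on a toy model. Everything here is a hypothetical stand-in rather than an Arrow or DataFusion API: `ToyData` and `ToyArray` only mimic the idea of array data viewed through an offset, and `toy_copy_data` and `toy_make_array` play the roles that `copy_array_data` and `make_array` play in the diff. Only the shape of the helper is the point.

```rust
use std::sync::Arc;

/// Toy stand-in for array data: a window into a shared buffer.
#[derive(Clone)]
struct ToyData {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

/// Toy stand-in for an array built on top of `ToyData`.
struct ToyArray {
    data: ToyData,
}

impl ToyArray {
    /// Builds an array that views `len` elements of `values` from `offset`.
    fn window(values: Vec<i32>, offset: usize, len: usize) -> ToyArray {
        ToyArray { data: ToyData { buffer: Arc::new(values), offset, len } }
    }

    /// Cheap: clones the `Arc`, not the elements.
    fn to_data(&self) -> ToyData {
        self.data.clone()
    }

    /// Elements visible through the window.
    fn values(&self) -> &[i32] {
        &self.data.buffer[self.data.offset..self.data.offset + self.data.len]
    }

    /// Number of elements the backing buffer keeps alive.
    fn retained(&self) -> usize {
        self.data.buffer.len()
    }
}

/// First step of the call chain: copy only the viewed elements.
fn toy_copy_data(data: &ToyData) -> ToyData {
    let owned = data.buffer[data.offset..data.offset + data.len].to_vec();
    ToyData { buffer: Arc::new(owned), offset: 0, len: data.len }
}

/// Second step of the call chain: wrap the data back into an array.
fn toy_make_array(data: ToyData) -> ToyArray {
    ToyArray { data }
}

/// The suggested helper: one name for the whole chain, so call sites read
/// `values.push(copy_array(&val))` instead of spelling out each step.
fn copy_array(array: &ToyArray) -> ToyArray {
    toy_make_array(toy_copy_data(&array.to_data()))
}
```

The helper changes no behavior. It gives the three-step chain a name and a single place for the doc comment that explains why the copy is made.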

alamb · 2025-06-11T13:04:15Z

Thanks again @gabotechs and @LiaCastaneda

* Fix array_agg memory over accounting * Add comment (cherry picked from commit 8a2d618)

* Fix array_agg memory over use (apache#16346) * Fix array_agg memory over accounting * Add comment (cherry picked from commit 8a2d618) * Fix test

Fix array_agg memory over accounting

33df7e2

gabotechs commented Jun 9, 2025

LiaCastaneda reviewed Jun 9, 2025

Add comment

f5eba19

github-actions bot added the common (Related to common crate) and functions (Changes to functions implementation) labels Jun 9, 2025

alamb changed the title ~~Fix array_agg memory over accounting~~ Fix array_agg memory over use Jun 11, 2025

alamb mentioned this pull request Jun 11, 2025

Document copy_array_data function with example #16361

Merged

alamb approved these changes Jun 11, 2025

alamb merged commit 8a2d618 into apache:main Jun 11, 2025
29 checks passed

gabotechs added a commit to DataDog/datafusion that referenced this pull request Jun 12, 2025

Fix array_agg memory over use (apache#16346)

3006a14

* Fix array_agg memory over accounting * Add comment (cherry picked from commit 8a2d618)

gabotechs mentioned this pull request Jun 12, 2025

[branch-48] Fix array_agg memory overaccounting DataDog/datafusion#31

Merged

This was referenced Jul 18, 2025

Address memory over-accounting in array_agg #16816

Merged

[DISCUSSION] Memory accounting model discussion #16841

Open

This was referenced Sep 29, 2025

Implement GroupsAccumulator for array_agg aggregation function #10145

Open

A collection of array_agg improvements and issues #17829

Open

alamb merged 2 commits into apache:main from gabotechs:fix-array-agg-memory-over-accounting


3 participants
