[Executorch] Use temp allocator for allocating scratch memory #15728
meta-codesync[bot] merged 28 commits into gh/kimishpatel/211/base from …
This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/) [ghstack-poisoned]
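The description says a caching temp allocator reduces allocation overhead. A minimal sketch of why, assuming a hypothetical `CachingTempAllocator` class (not ExecuTorch's allocator or API): blocks handed out during one call are recycled on `reset()`, so repeated calls that request the same scratch sizes stop reaching the system allocator after the first call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical caching temp allocator, for illustration only; it is not
// ExecuTorch's implementation. Blocks handed out during one call are moved
// to a free list by reset() and reused by later calls, so steady-state calls
// with the same scratch sizes make no new heap allocations.
class CachingTempAllocator {
 public:
  void* allocate(std::size_t size) {
    // First fit: reuse any cached block that is large enough.
    for (std::size_t i = 0; i < free_.size(); ++i) {
      if (free_[i].size >= size) {
        Block block = std::move(free_[i]);
        free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
        return take(std::move(block));
      }
    }
    // Cache miss: only this path reaches the system allocator.
    ++system_allocations_;
    return take(Block{std::make_unique<char[]>(size), size});
  }

  // Called between operator invocations: blocks are recycled, not freed.
  void reset() {
    for (Block& block : in_use_) {
      free_.push_back(std::move(block));
    }
    in_use_.clear();
  }

  std::size_t system_allocations() const { return system_allocations_; }

 private:
  struct Block {
    std::unique_ptr<char[]> data;
    std::size_t size;
  };

  // Records the block as in use and returns its storage.
  void* take(Block block) {
    void* ptr = block.data.get();
    in_use_.push_back(std::move(block));
    return ptr;
  }

  std::vector<Block> in_use_;
  std::vector<Block> free_;
  std::size_t system_allocations_ = 0;
};
```

For example, a loop that calls `allocate(4096)`, `allocate(1024)`, then `reset()` 100 times reports `system_allocations() == 2`: only the first iteration misses the cache. A plain `std::vector` scratch buffer, by contrast, allocates and frees its storage on every call.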
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15728
Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures
As of commit af048bd with merge base a09a4b7:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ory" This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/) [ghstack-poisoned]
Pull Request Overview
This PR refactors memory allocation in the Flash Attention implementation to use the temporary memory allocator from `RuntimeContext` instead of local `std::vector` buffers (which heap-allocate their storage on each call). This enables the use of caching allocators when available, reducing allocation overhead.
- Adds a `RuntimeContext& ctx` parameter to the `cpu_flash_attention` function
- Replaces local `std::vector` scratch buffers with `ctx.allocate_temp()` calls, falling back to heap allocation
- Removes the unnecessary `buf_reduced` allocation (dead code for unsupported reduced types)
- Updates all call sites to pass the `RuntimeContext`
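The list above describes replacing local vectors with `ctx.allocate_temp()` plus a heap fallback. A minimal sketch of that pattern, under stated assumptions: `FakeRuntimeContext`, `ScratchBuffer`, and `run_with_scratch` are hypothetical names, and the context is a toy bump allocator rather than ExecuTorch's `RuntimeContext`. It shows only the control flow: try the temp allocator first, and own a heap buffer only when that fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for the runtime context, for illustration only.
// As far as we know, ExecuTorch's allocate_temp() returns a Result<void*>;
// here a nullptr return models its error case (no temp allocator configured,
// or the request does not fit).
struct FakeRuntimeContext {
  char* pool = nullptr;
  std::size_t capacity = 0;
  std::size_t used = 0;

  void* allocate_temp(std::size_t size, std::size_t alignment = 16) {
    if (pool == nullptr) {
      return nullptr;
    }
    // Round the offset up to a multiple of alignment (assumes pool itself is
    // suitably aligned).
    std::size_t offset = (used + alignment - 1) / alignment * alignment;
    if (offset + size > capacity) {
      return nullptr;
    }
    used = offset + size;
    return pool + offset;
  }
};

// Scratch buffer that prefers the context's temp allocator and falls back to
// a heap allocation it owns. Temp-allocator memory is not freed here; its
// lifetime belongs to whoever owns that allocator.
class ScratchBuffer {
 public:
  ScratchBuffer(FakeRuntimeContext& ctx, std::size_t bytes)
      : data_(static_cast<char*>(ctx.allocate_temp(bytes))) {
    if (data_ == nullptr) {
      heap_ = std::make_unique<char[]>(bytes);
      data_ = heap_.get();
    }
  }

  char* data() const { return data_; }
  bool from_heap() const { return heap_ != nullptr; }

 private:
  char* data_;
  std::unique_ptr<char[]> heap_;  // set only on the fallback path
};

// Hypothetical kernel: the context is threaded in as a parameter so the
// scratch allocation can use it, similar in spirit to the ctx parameter the
// PR adds to cpu_flash_attention. Returns whether the heap fallback was used.
bool run_with_scratch(FakeRuntimeContext& ctx, std::size_t scratch_bytes) {
  ScratchBuffer scratch(ctx, scratch_bytes);
  std::memset(scratch.data(), 0, scratch_bytes);  // stand-in for real work
  return scratch.from_heap();
}
```

With a 1024-byte pool, a 256-byte request is served from the pool, while a 4096-byte request (or any request on a context with no pool) takes the heap path. Keeping the fallback inside one small RAII type means each call site only changes how it obtains the pointer, not how it uses the buffer.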
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| extension/llm/custom_ops/op_sdpa_impl.h | Modified the `cpu_flash_attention` signature to accept `RuntimeContext`; replaced vector allocations with temp allocator calls |
| extension/llm/custom_ops/op_sdpa.cpp | Updated all call sites (6 locations) to pass the `ctx` parameter to `cpu_flash_attention` |
```cpp
namespace sdpa::impl {

static std::vector<char> scratch_for_quant_dequant_vec;
```
This static vector scratch_for_quant_dequant_vec is declared but never used in the code. It appears to be a leftover from the refactoring where the local vector was replaced with the temp allocator approach. This should be removed.
Pull Request resolved: #15728 This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. ghstack-source-id: 325470498 @exported-using-ghexport Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/)
Pull Request resolved: #15728 This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. ghstack-source-id: 325471916 @exported-using-ghexport Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/)
Pull Request resolved: #15728 This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. ghstack-source-id: 325655867 @exported-using-ghexport Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/)
Merged 765bc5e into gh/kimishpatel/211/base
Pull Request resolved: #15728 This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. ghstack-source-id: 327518050 @exported-using-ghexport Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/)
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #15728 by @kimishpatel
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/211/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/211/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/203/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/211/orig
Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/)
@diff-train-skip-merge
Co-authored-by: Kimish Patel <kimishpatel@fb.com>
…h#16121) This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#15728 by @kimishpatel ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/211/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/211/head Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/203/orig Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/211/orig Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/) @diff-train-skip-merge Co-authored-by: Kimish Patel <kimishpatel@fb.com>
Stack from ghstack (oldest at bottom):
This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead.
Differential Revision: D85532076