Back FreeableBuffer with int64_t #14570
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14570
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure — as of commit 14be5a9 with merge base 0e74a17, one new job has failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
runtime/core/freeable_buffer.h
Outdated
```cpp
FreeFn free_fn_;
FreeUInt64Fn free_uint64_fn_;
```
It looks like at most one free function can be set, and it needs to correspond to the contents of `data_`, right? If so, I would reorganize things like so:

```cpp
struct PointerData {
  FreeFn free_fn_;
  const void* data_;
};
struct Int64Data {
  FreeUInt64Fn free_fn_;
  uint64_t data_;
};
std::variant<PointerData, Int64Data> data_;
```

This way we only increase the size of FreeableBuffer by 8 bytes instead of 16.
Thank you! This is really helpful, I'll try it.
47c4504 to 4ed9675
Summary: This diff backs FreeableBuffer with an int64_t, instead of a void*.

## Usecase

The PTE file lives on a 32-bit system, where void* is 4 bytes. The PTD file lives on a 36-bit system, which requires int64_t to address it. We want to fetch addresses from the PTD file and pass them to the accelerator. ExecuTorch APIs return a FreeableBuffer when fetching data (data_loader.h, named_data_map.h). FreeableBuffer is currently backed by a void*, which could truncate a 36-bit address on a 32-bit system.

Note that we still want the existing void* behavior to load segments etc., and only want int64_t behavior when fetching weights from the named_data_map.

## Potential concerns

* Increased memory usage: an additional 4 bytes for each FreeableBuffer, plus some extra for the std::variant template. For the PTE file, this is on the order of the number of segments, which is usually small.
* Increased runtime latency: calls to the existing void* API now perform truncation checks.

## Alternatives

Why did we choose this solution? It seems to be the least intrusive of a number of options.

1. Compiler macro to switch the backing value of FreeableBuffer to int64_t for specific builds. However, void* and int64_t are both required: void* for the existing use cases (fetching delegate blobs). Both APIs must exist.
2. Template FreeableBuffer on void* and int64_t. Messy, as the template is contagious; it would need to be applied to data_loader.h, named_data_map.h, and potentially program/method, bloating the core runtime with templating.
3. Store int64_t addresses in FreeableBuffer and ask the user to parse the address to load the data. This avoids changing FreeableBuffer, but is not semantically correct: the API would not return a buffer of data, but an address that the user must parse and then load data from.
4. Add a specific API to named_data_map.h that returns an int64_t buffer. Not a good use of API surface.

https://docs.google.com/document/d/11dMXh1N66rfY-8aO3N-ra2dDPx0RGCoc1rlCSN80Zf8/edit?tab=t.0

Differential Revision: D83007972
4ed9675 to c9ad791
c9ad791 to 80d60cb
80d60cb to ea447c3
ea447c3 to 4c8a15a
swolchok left a comment
feel free to ping me if I don't re-review within a day next time
Reviewed By: swolchok
Differential Revision: D83007972
4c8a15a to 14be5a9
This seems to have broken the Cortex-M size test :(
Differential Revision: D83007972 Pull Request resolved: pytorch#14570
regressed 70 bytes after pytorch#14570