Back FreeableBuffer with int64_t #14570
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14570
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure — as of commit 14be5a9 with merge base 0e74a17, one new job has failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
runtime/core/freeable_buffer.h
Outdated
```cpp
FreeFn free_fn_;
FreeUInt64Fn free_uint64_fn_;
```
It looks like at most one free function can be set, and it needs to correspond to the contents of `data_`, right? If so, I would reorganize things like so:

```cpp
struct PointerData {
  FreeFn free_fn_;
  const void* data_;
};
struct Int64Data {
  FreeUInt64Fn free_fn_;
  uint64_t data_;
};
std::variant<PointerData, Int64Data> data_;
```

This way we only increase the size of FreeableBuffer by 8 bytes instead of 16.
Thank you! This is really helpful, I'll try it.
47c4504 to 4ed9675
Summary: This diff backs FreeableBuffer with an int64_t, instead of a void*.

## Usecase

The PTE file lives on a 32-bit system, where void* is 4 bytes. The PTD file lives on a 36-bit system, which requires int64_t to address it. We want to fetch addresses from the PTD file and pass them to the accelerator. ExecuTorch APIs return a FreeableBuffer when fetching data (data_loader.h, named_data_map.h). FreeableBuffer is currently backed by a void*, which could truncate a 36-bit address on a 32-bit system.

Note that we still want the existing void* behavior to load segments etc., and only want int64_t behavior when fetching weights from the named_data_map.

## Potential concerns

* Increased memory usage: an additional 4 bytes for each FreeableBuffer, plus some extra for the std::variant template. For the PTE file, this is on the order of the number of segments, which is usually small.
* Increased runtime latency: calls to the existing void* API now perform truncation checks.

## Alternatives

Why did we choose this solution? It seems to be the least intrusive of a number of options.

1. Compiler macro to switch the backing value of FreeableBuffer to int64_t for specific builds. However, void* and int64_t are both required: void* for the existing use cases (fetching delegate blobs). Both APIs must exist.
2. Template FreeableBuffer on void* and int64_t. Messy, as the template is contagious; it would need to be applied to data_loader.h, named_data_map.h, and potentially program/method, bloating the core runtime with templating.
3. Store int64_t addresses in FreeableBuffer and ask the user to parse the address to load the data. This avoids changing FreeableBuffer, but is not semantically correct: the API would not return a buffer of data, but an address that the user must parse and then load data from.
4. Add a specific API to named_data_map.h that returns an int64_t buffer. Not a good use of API surface.

https://docs.google.com/document/d/11dMXh1N66rfY-8aO3N-ra2dDPx0RGCoc1rlCSN80Zf8/edit?tab=t.0

Differential Revision: D83007972
4ed9675 to c9ad791
c9ad791 to 80d60cb
80d60cb to ea447c3
ea447c3 to 4c8a15a
swolchok left a comment
feel free to ping me if I don't re-review within a day next time
Reviewed By: swolchok
Differential Revision: D83007972
4c8a15a to 14be5a9
This seems to have broken the Cortex-M size test :(
Differential Revision: D83007972 Pull Request resolved: pytorch#14570
regressed 70 bytes after pytorch#14570