Use to_edge_transform_and_lower for XNNPack #8624
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8624. Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 951d91e with merge base 77589c6. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
|
Have you tested the performance of a model (like Llama 3B) before and after? Asking because export_llama is used by different users to prepare .pte files for CPU. Please make sure there's no perf regression. |
|
Running on demand perf benchmark here: https://github.com/pytorch/executorch/actions/runs/13505211116 |
    )
    if args.verbose:
        print_delegation_info(builder.edge_manager.exported_program().graph_module)
    if args.num_sharding > 0 and args.qnn:
This code shouldn't be here right?
Oh yeah, technically it should be removed since it's QNN-specific. Will remove after benchmarking finishes.
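To illustrate the point being made above: the guard in the diff fires only when the QNN backend is enabled, so in an XNNPack-only export flow the branch is dead code. The helper below is a hypothetical sketch mirroring the condition, not actual export_llama code.

```python
# Hypothetical sketch of the guard flagged in the review: it mirrors
# `args.num_sharding > 0 and args.qnn` from the diff. With XNNPack
# selected (qnn=False) the branch can never run, hence "should remove".
from types import SimpleNamespace


def qnn_sharding_applies(args) -> bool:
    # True only when sharding is requested AND the QNN backend is on.
    return args.num_sharding > 0 and args.qnn


xnnpack_args = SimpleNamespace(num_sharding=4, qnn=False)
qnn_args = SimpleNamespace(num_sharding=4, qnn=True)

print(qnn_sharding_applies(xnnpack_args))  # False: dead code for XNNPack
print(qnn_sharding_applies(qnn_args))      # True: branch is QNN-specific
```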
|
@iseeyuan please see the performance benchmark graph I posted in the pr description |
|
Can you run internal CI tests before merging? Otherwise looks good. |
|
@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Use `to_edge_transform_and_lower` in `export_llama` for XNNPack. As part of these changes, this also means that you cannot specify multiple backends in `export_llama` in the args, although I'm not sure if that is happening anywhere at the moment. Closes #8621

Performance regression benchmarking for XNNPack (on Android) vs. the past 3 days:

<img width="1427" alt="Screenshot 2025-02-24 at 11 39 52 AM" src="https://github.com/user-attachments/assets/1640cf2c-a579-491f-8940-7ccfbe464903" />

These benchmark numbers normally fluctuate a bit across runs, and these differences are within the usual fluctuation ranges.

Test Plan: See if CI passes

Differential Revision: D70124742
Pulled By: jackzhxng
Force-pushed from b30733d to 1b0c5e4.
|
This pull request was exported from Phabricator. Differential Revision: D70124742 |
|
Force-pushed from 930ec0e to 951d91e.
|
Differential Revision: D70221944
Pull Request resolved: #8717
Summary
Use `to_edge_transform_and_lower` in `export_llama` for XNNPack. As part of these changes, this also means that you cannot specify multiple backends in `export_llama` in the args, although I'm not sure if that is happening anywhere at the moment.

Closes #8621
Performance regression benchmarking for XNNPack (on Android) vs. the past 3 days:

These benchmark numbers normally fluctuate a bit across runs, and these differences are within the usual fluctuation ranges.
Test plan
See if CI passes
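The single-backend restriction described in the summary could be enforced with a check along the following lines. This is a hedged sketch: `pick_backend` and the flag names are hypothetical illustrations, not `export_llama`'s actual argument handling. The underlying idea is that `to_edge_transform_and_lower` is called with the partitioner list for one backend, so the args must resolve to exactly one.

```python
# Hypothetical sketch of the "one backend only" restriction: given a
# mapping of backend flags (names are illustrative), exactly one must
# be enabled, since the lowering call targets a single backend's
# partitioner. Raises if zero or multiple backends are selected.
def pick_backend(args: dict) -> str:
    enabled = [name for name in ("xnnpack", "qnn", "coreml", "mps") if args.get(name)]
    if len(enabled) != 1:
        raise ValueError(f"exactly one backend must be selected, got {enabled}")
    return enabled[0]


print(pick_backend({"xnnpack": True}))  # xnnpack
```

A check like this fails fast at argument-parsing time instead of producing a confusing partially-lowered program later in the export pipeline.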