Add a fast path for _clone_dim_order by GregoryComer · Pull Request #15815 · pytorch/executorch

GregoryComer · 2025-11-13T19:44:01Z

Summary

Add a direct memcpy fast path for the portable `_clone_dim_order` op, which can be a performance bottleneck. I'd like to optimize these ops out of the graph more aggressively, but this fast path should significantly reduce their performance impact.
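A minimal sketch of what such a fast path can look like. It assumes a hypothetical `TensorView` struct and `DType` enum (not the ExecuTorch `Tensor` API) and densely packed tensors. When the input and output share dtype, shape, and dim order, their byte layouts are identical, so the clone reduces to a single `memcpy`. Otherwise the hypothetical `try_clone_fast_path` returns `false` and the caller falls back to the general element-wise copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical types for illustration only; not the ExecuTorch Tensor API.
enum class DType { Float32, Int32 };

std::size_t element_size(DType t) {
  switch (t) {
    case DType::Float32: return sizeof(float);
    case DType::Int32: return sizeof(std::int32_t);
  }
  return 0;
}

struct TensorView {
  void* data;                           // assumed densely packed storage
  DType dtype;
  std::vector<std::int64_t> sizes;
  std::vector<std::uint8_t> dim_order;  // dims from outermost to innermost
};

std::size_t numel(const TensorView& t) {
  std::size_t n = 1;
  for (std::int64_t s : t.sizes) n *= static_cast<std::size_t>(s);
  return n;
}

// Copies `in` into `out` with one memcpy when their memory layouts match.
// Returns false (copying nothing) when the caller must use the general
// element-wise path, e.g. because the clone changes the dim order.
bool try_clone_fast_path(const TensorView& in, const TensorView& out) {
  if (in.dtype != out.dtype || in.sizes != out.sizes ||
      in.dim_order != out.dim_order) {
    return false;
  }
  const std::size_t nbytes = numel(in) * element_size(in.dtype);
  if (nbytes > 0) {
    std::memcpy(out.data, in.data, nbytes);
  }
  return true;
}
```

The check is deliberately conservative: layouts that happen to coincide despite differing dim orders (for example, when the permuted dims have size 1) fall through to the general path.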

Test plan

Existing correctness tests for the _clone_dim_order implementation should cover it.

For performance, I ran a quick test on an x86 server with a (1, 128, 256, 256) tensor in the default dim order. This is intended as a smoke test, not a proper benchmark. I included numbers for both optimized and debug builds: optimized matters more, but very long debug runs can be painful during development.

[Optimized Build]
Before: 27.9 ms
After: 6.4 ms

[Debug Build]
Before: 5947.01 ms
After: 7.2 ms
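For scale, a quick sketch of the arithmetic behind these numbers. The PR does not state the dtype, so the byte count below assumes float32 (4 bytes per element); the speedups use only the reported timings, which work out to roughly 4.4x for the optimized build and about 826x for the debug build.

```cpp
#include <cstdint>
#include <cstdio>

// Shape of the smoke-test tensor from the PR description.
constexpr std::int64_t kShape[4] = {1, 128, 256, 256};

// Assumption: float32 elements; the PR does not specify the dtype.
constexpr std::int64_t kAssumedElementBytes = 4;

constexpr std::int64_t num_elements() {
  std::int64_t n = 1;
  for (int i = 0; i < 4; ++i) n *= kShape[i];
  return n;
}

// Ratio of the "before" time to the "after" time.
double speedup(double before_ms, double after_ms) {
  return before_ms / after_ms;
}

void print_summary() {
  std::printf("elements: %lld\n", static_cast<long long>(num_elements()));
  std::printf("bytes (assuming float32): %lld\n",
              static_cast<long long>(num_elements() * kAssumedElementBytes));
  std::printf("optimized speedup: %.1fx\n", speedup(27.9, 6.4));
  std::printf("debug speedup: %.0fx\n", speedup(5947.01, 7.2));
}
```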

pytorch-bot · 2025-11-13T19:44:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15815

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 421c6dc with merge base e774b77:

NEW FAILURE - The following job has failed:

pull / test-moshi-linux / linux-job (gh)
RuntimeError: Command docker exec -t 76ba0d56cd2c18e8e2b8adde386c1a39d7340c44ed31634379e4ff35ca2ba2e2 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-11-13T19:44:46Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync · 2025-11-13T19:57:51Z

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D86993338.

GregoryComer · 2025-11-13T23:18:32Z

Note that the moshi failure is pre-existing.

Gasoonjia

LGTM!


meta-cla bot added the CLA Signed label Nov 13, 2025

GregoryComer requested a review from Gasoonjia November 13, 2025 19:50

GregoryComer force-pushed the clone-dim-order-fast-path branch from 3f1cb30 to 929d52b November 13, 2025 19:53

Commit 421c6dc: Add a fast path for _clone_dim_order

GregoryComer force-pushed the clone-dim-order-fast-path branch from 929d52b to 421c6dc November 13, 2025 20:09

GregoryComer marked this pull request as ready for review November 13, 2025 21:35

GregoryComer requested a review from manuelcandales as a code owner November 13, 2025 21:35

Gasoonjia approved these changes Nov 14, 2025


GregoryComer merged commit 0704ae3 into pytorch:main Nov 14, 2025
231 of 234 checks passed

GregoryComer merged 1 commit into pytorch:main from GregoryComer:clone-dim-order-fast-path

GregoryComer commented Nov 13, 2025 •

edited

Loading

Uh oh!

pytorch-bot bot commented Nov 13, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

meta-codesync bot commented Nov 13, 2025

Uh oh!

GregoryComer commented Nov 13, 2025

Uh oh!

Gasoonjia left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

GregoryComer commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

pytorch-bot bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15815

❌ 1 New Failure

Uh oh!

github-actions bot commented Nov 13, 2025

This PR needs a release notes: label

Uh oh!

meta-codesync bot commented Nov 13, 2025

Uh oh!

GregoryComer commented Nov 13, 2025

Uh oh!

Gasoonjia left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

GregoryComer commented Nov 13, 2025 •

edited

Loading

pytorch-bot bot commented Nov 13, 2025 •

edited

Loading

This PR needs a `release notes:` label