JIT: form ldp/stp for adjacent Arm64 indexed accesses#129910
JIT: form ldp/stp for adjacent Arm64 indexed accesses#129910AndyAyersMS wants to merge 2 commits into
Conversation
Arm64 can't encode [base, index, #off], so a base+index used at multiple offsets must be materialized anyway; re-enable CSE for it before pairing so adjacent loads/stores fold into ldp/stp. Fixes dotnet#93263. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the CoreCLR Arm64 JIT CSE phase to selectively re-enable CSE for certain base + index address subexpressions so they can be materialized once and reused, enabling adjacent Arm64 indexed accesses to fold into ldp/stp pairs. It also adds a new JIT disasm-based regression test covering the expected pairing and address reuse patterns.
Changes:
- Add an Arm64-only pre-pass in
optOptimizeValnumCSEsto clearGTF_ADDRMODE_NO_CSEfor profitablebase + indexaddress expressions. - Add a new JIT opt test that checks for
ldp/stpformation and (heuristically) shared base registers for multiple-offset accesses. - Wire the new helper into the compiler interface (
compiler.h).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/coreclr/jit/optcse.cpp | Adds Arm64-only logic to re-enable CSE for certain base+index address expressions ahead of valnum CSE. |
| src/coreclr/jit/compiler.h | Declares the new Arm64-only helper on Compiler. |
| src/tests/JIT/opt/AddrMode/LdpStpPairing.cs | Adds disasm-based test cases for Arm64 ldp/stp pairing and shared-address reuse scenarios. |
| src/tests/JIT/opt/AddrMode/LdpStpPairing.csproj | Adds the new test project with disasm checking enabled and required environment variables set. |
|
diffs -- not much ldp/stp formation, but quite a bit of improved address mode usage. TP hist (~0.1%) seems reasonable. |
Only re-enable CSE for a genuine base+index (non-constant index); a base+constant is already free as [base, #imm], so CSE'ing it regressed struct-field/unbox patterns. Also compute offset adjacency in O(n). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Fixed review nits, addressed a few of the regressions (require index to be non-const). |
|
latest diffs |
|
It looks like the diffs are heavily dominated by crossgen, do we think it's worth the complexity for non-R2R scenario? Or maybe we should never bother with DONT_CSE for addressing modes and let CSE phase to recognize it |
Yeah not sure it is worth it just for crossgen. I believe GTF_ADDRMODE_NO_CSE is just a CQ hint, and it may be too broadly applied (or as you suggest possibly entirely unnnceessary on address modes). A first step might be to introduce a stress mode that simply ignores it and make sure it is indeed just a CQ hint. |
Arm64 can't encode [base, index, #off], so a base+index used at multiple offsets must be materialized anyway; re-enable CSE for it before pairing so adjacent loads/stores fold into ldp/stp.
Fixes #93263.