[ARM32] Eliminate red zone usage in runtime stubs #129398
On ARM32 Linux, the area below SP is not guaranteed to be preserved across signal delivery. Replace red zone reads/writes with explicit stack adjustments (push/pop) in:
- NativeAOT interop thunks (ldr pc dispatch, no stack intermediate)
- NativeAOT UniversalTransition (caller pushes args onto stack)
- NativeAOT interface dispatch stubs (PROLOG_STACK_ALLOC instead of sub-SP stores)
- CoreCLR VTableCallStub (pre-indexed str/post-indexed ldr)

Guarded by FEATURE_AVOID_RED_ZONE, enabled for ARM32 non-Windows targets.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @agocke, @dotnet/ilc-contrib
Windows ARM32 is no longer supported.
Windows ARM32 is no longer supported, so every ARM32 target is Linux. The red zone avoidance is always needed, so remove the preprocessor guard and delete the old red zone code paths entirely.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7cc9b73 to 59bc77c
The ldr pc dispatch needs only 12 bytes (mov r12 + ldr pc), no padding required. This increases thunks per page from 204 to 341 (67% more). Also shorten verbose comments per review feedback.
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
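The thunk counts quoted in this commit message can be reproduced with a little integer arithmetic. The sketch below assumes a 4096-byte page, which the commit message does not state but which matches both figures; `kAssumedPageSize`, `ThunksPerPage`, and `PercentIncrease` are illustrative names, not identifiers from the runtime.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a 4096-byte page. The commit message does not state the page size.
constexpr std::size_t kAssumedPageSize = 4096;

// Number of whole fixed-size thunks that fit in one page.
constexpr std::size_t ThunksPerPage(std::size_t pageSize, std::size_t thunkSize)
{
    return pageSize / thunkSize;
}

// Percentage increase from `before` to `after`, rounded to the nearest integer.
constexpr std::size_t PercentIncrease(std::size_t before, std::size_t after)
{
    return ((after - before) * 100 + before / 2) / before;
}

// Old 20-byte thunk versus new 12-byte thunk (mov r12 + ldr pc).
static_assert(ThunksPerPage(kAssumedPageSize, 20) == 204, "old thunk size");
static_assert(ThunksPerPage(kAssumedPageSize, 12) == 341, "new thunk size");
static_assert(PercentIncrease(204, 341) == 67, "roughly 67% more thunks per page");
```

The checks are compile-time only, so the snippet documents the arithmetic without adding any runtime cost.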
- StubDispatch: use PROLOG_PUSH/EPILOG_POP {r1,r2} instead of manual
STACK_ALLOC + str/ldr
- UniversalTransition: replace interleaved ldr/push dance with a single
PROLOG_PUSH {r0-r3} then load caller args from known stack offsets
- Clean up stale red zone comments
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
59bc77c to 87288df
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
Segfaults in many Linux ARM32 NAOT tests. Could you please take a look?
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
@MichalStrehovsky PTAL
Pull request overview
This PR updates several ARM32 stubs and NativeAOT transitions to avoid writing below sp (red zone) by switching to explicit stack adjustments (push/pop / stack alloc), and updates related thunk/transition conventions accordingly.
Changes:
- CoreCLR ARM32 interface/vtable-related stubs: replace red-zone saves/restores with stack-based sequences.
- NativeAOT ARM32 thunk and interop paths: shrink thunk stubs by branching via `ldr pc` while preserving `r12` as the thunk data pointer, and adjust `RhCommonStub` accordingly.
- NativeAOT ARM32 universal transition: change extra-argument passing to caller-pushed stack args and update the corresponding stack frame layout and unwind helper logic.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/coreclr/vm/arm/virtualcallstubcpu.hpp | Updates VTableCall stub encoding/size logic to use push/pop-style stack ops instead of red zone. |
| src/coreclr/runtime/arm/StubDispatch.S | Replaces red-zone register spills in cached interface dispatch stubs and adjusts slow-path arg passing to universal transition. |
| src/coreclr/nativeaot/Runtime/ThunksMapping.cpp | Changes ARM thunk stub shape and size to branch via ldr pc and keep r12 as data pointer. |
| src/coreclr/nativeaot/Runtime/StackFrameIterator.cpp | Updates ARM universal transition stack frame layout to account for caller-pushed extra args. |
| src/coreclr/nativeaot/Runtime/EHHelpers.cpp | Adjusts ARM unwind helper to compensate for new interface dispatch stack usage on null-this AV. |
| src/coreclr/nativeaot/Runtime/arm/UniversalTransition.S | Switches universal transition extra args from red zone to caller-pushed stack args and updates prolog/epilog accordingly. |
| src/coreclr/nativeaot/Runtime/arm/InteropThunksHelpers.S | Updates RhCommonStub to consume r12 directly (no red-zone load). |
| src/coreclr/nativeaot/Runtime/arm/DispatchResolve.S | Replaces red-zone spills with stack pushes and updates slow-path argument setup for universal transition. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
All Arm32 test failures are known |
uint64_t m_fpArgRegs[8];            // ChildSP+008 CallerSP-078 (0x40 bytes) (d0-d7)
uint64_t m_returnBlock[4];          // ChildSP+048 CallerSP-038 (0x20 bytes)
uintptr_t m_intArgRegs[4];          // ChildSP+068 CallerSP-018 (0x10 bytes) (r0-r3)
uintptr_t m_callerPushedArgs[2];    // ChildSP+078 CallerSP-008 (0x8 bytes) (extra arg + target fn)
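The offsets in these comments can be cross-checked mechanically. The following is a minimal sketch, not the runtime's actual definition: it models the 32-bit target with `uint32_t` in place of `uintptr_t` so the result does not depend on the host, and it uses a hypothetical `m_leading8Bytes` placeholder for whatever occupies `ChildSP+000`, which the excerpt does not show. It also assumes `CallerSP` equals `ChildSP` plus the size of the modeled frame, as the paired offsets in the comments imply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-independent model of the ARM32 frame fields quoted in the diff.
struct ModelTransitionFrame
{
    uint64_t m_leading8Bytes;        // hypothetical: contents of ChildSP+000 are not shown
    uint64_t m_fpArgRegs[8];         // d0-d7
    uint64_t m_returnBlock[4];
    uint32_t m_intArgRegs[4];        // r0-r3 (uintptr_t on the real 32-bit target)
    uint32_t m_callerPushedArgs[2];  // extra arg + target fn
};

// Assumption: CallerSP = ChildSP + sizeof(frame).
constexpr std::size_t kCallerSpOffset = sizeof(ModelTransitionFrame);

static_assert(offsetof(ModelTransitionFrame, m_fpArgRegs) == 0x08, "ChildSP+008");
static_assert(offsetof(ModelTransitionFrame, m_returnBlock) == 0x48, "ChildSP+048");
static_assert(offsetof(ModelTransitionFrame, m_intArgRegs) == 0x68, "ChildSP+068");
static_assert(offsetof(ModelTransitionFrame, m_callerPushedArgs) == 0x78, "ChildSP+078");
static_assert(kCallerSpOffset - offsetof(ModelTransitionFrame, m_fpArgRegs) == 0x78, "CallerSP-078");
static_assert(kCallerSpOffset - offsetof(ModelTransitionFrame, m_callerPushedArgs) == 0x08, "CallerSP-008");
```

Under these assumptions the commented `ChildSP` and `CallerSP` offsets agree with each other and with the stated field sizes.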
Does src/coreclr/nativeaot/Common/src/Internal/Runtime/TransitionBlock.cs need a matching ARM32 layout update?
Hmmm, that would be a messy change with a lot of eventual fallout. It creates holes in structs.
@cshung Could you please change the prolog of universal transition to create the same layout as before, and revert this change?
Something like:
// Caller pushed 8 bytes: [sp]=extra arg, [sp+4]=target fn
.pad #8
PROLOG_PUSH "{r0-r1}"
ldr r12, [sp, #12] // Capture target function (caller's [sp+4], now at sp+8+4)
ldr r1, [sp, #8] // Capture extra arg (caller's [sp], now at sp+8)
str r3, [sp, #12] // Now we can store remaining arg registers into the space used for the hidden args
str r2, [sp, #8]
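One way to sanity-check the offsets in a prolog of this shape is to model the stack in plain C++. The sketch below is a toy model and not runtime code: `ToyStack`, `PrologResult`, and `RunToyProlog` are hypothetical names. It assumes a full-descending stack where the caller has pushed the extra argument and the target function (8 bytes) and the callee then pushes `r0` and `r1` (another 8 bytes), so the caller-pushed words sit at `sp+8` and `sp+12` when they are read.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy full-descending stack; `sp` is a byte offset into 64 bytes of memory.
struct ToyStack
{
    std::array<uint32_t, 16> mem{};
    uint32_t sp = 64;

    void Push(uint32_t value) { sp -= 4; mem[sp / 4] = value; }
    uint32_t Load(uint32_t offset) const { return mem[(sp + offset) / 4]; }
    void Store(uint32_t offset, uint32_t value) { mem[(sp + offset) / 4] = value; }
};

struct PrologResult
{
    uint32_t target;                    // value captured into r12
    uint32_t extraArg;                  // value captured into r1
    std::array<uint32_t, 4> savedArgs;  // words at [sp], [sp+4], [sp+8], [sp+12]
};

inline PrologResult RunToyProlog(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3,
                                 uint32_t extraArg, uint32_t targetFn)
{
    ToyStack s;
    // Caller side: afterwards [sp]=extra arg, [sp+4]=target fn.
    s.Push(targetFn);
    s.Push(extraArg);
    // Callee side: push {r0-r1} leaves r0 at the lower address.
    s.Push(r1);
    s.Push(r0);

    PrologResult out{};
    out.target = s.Load(12);   // caller's [sp+4], now 8 bytes higher
    out.extraArg = s.Load(8);  // caller's [sp], now 8 bytes higher
    s.Store(12, r3);           // reuse the hidden-arg slots for r3 and r2
    s.Store(8, r2);
    for (uint32_t i = 0; i < 4; ++i)
        out.savedArgs[i] = s.Load(4 * i);
    return out;
}
```

In this model, after the two stores the four words starting at `sp` hold `r0` through `r3` contiguously, which is the property the "same layout as before" request depends on.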
Co-authored-by: Michal Strehovský <MichalStrehovsky@users.noreply.github.com>
On ARM32 Linux, the area below SP is not guaranteed to be preserved across signal delivery. The runtime previously used the red zone (writing below SP without adjusting it) in several stubs, which can cause silent corruption or crashes when a signal is delivered at the wrong moment.
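The failure mode can be illustrated with a small simulation. This is a toy model under stated assumptions, not a reproduction of real signal delivery: `ToyMachine`, `SpillBelowSp`, and `SpillWithPush` are hypothetical names, the 8-word signal frame size is arbitrary, and the "signal" is simply a routine that overwrites memory directly below the current stack pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy full-descending stack; `sp` is a word index into `mem`.
struct ToyMachine
{
    std::array<uint32_t, 32> mem{};
    std::size_t sp = 32;

    // Models asynchronous signal delivery: a frame is built directly below sp,
    // overwriting whatever was stored there, and sp is unchanged on return.
    void DeliverSignal()
    {
        const std::size_t frameWords = 8;  // hypothetical frame size
        for (std::size_t i = 1; i <= frameWords; ++i)
            mem[sp - i] = 0xDEADBEEF;
    }
};

// Red zone style: store below sp without moving sp, then reload.
inline uint32_t SpillBelowSp(uint32_t value)
{
    ToyMachine m;
    m.mem[m.sp - 1] = value;
    m.DeliverSignal();          // signal arrives between store and reload
    return m.mem[m.sp - 1];
}

// push/pop style: move sp first, so the saved word is at or above sp.
inline uint32_t SpillWithPush(uint32_t value)
{
    ToyMachine m;
    m.sp -= 1;
    m.mem[m.sp] = value;        // push
    m.DeliverSignal();
    uint32_t reloaded = m.mem[m.sp];
    m.sp += 1;                  // pop
    return reloaded;
}
```

In this model the value spilled below `sp` is overwritten by the simulated signal frame, while the pushed value survives because the frame is built below the adjusted stack pointer.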
This PR eliminates all red zone usage in ARM32 runtime stubs by replacing sub-SP reads/writes with explicit stack adjustments (push/pop):
- NativeAOT interop thunks (`ThunksMapping.cpp`): use `ldr pc` dispatch directly from `r12`, no stack intermediate. This also shrinks `THUNK_SIZE` from 20 to 12 bytes.
- NativeAOT interface dispatch stubs (`DispatchResolve.S`, `StubDispatch.S`): `PROLOG_PUSH`/`EPILOG_POP` instead of red zone stores.