Remove Wrapper delegate kind, switch ARM32 VSD hidden arg to R12 #129403
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
@copilot Remove all wrapper handling code from
@copilot Remove all wrapper handling code from MulticastDelegate.CoreCLR.cs too.
Tagging subscribers to this area: @agocke
- Simplify DispatchHolder AV comment per feedback
- Move ResolveHolder MethodTable load after push to preserve r4
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
…nstants Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
@copilot Build failure Assertion failed.
Fixed in 21e17b1. The issue was that R12 is a high register (R8-R15) and the simple form of EmitLDR requires both source and destination to be low registers (R0-R7). Changed the delay-load/VSD path to use R0 as scratch instead of R12, while keeping R12 for the hidden indirection cell as required by the new ABI.
@copilot Please revert the last commit. It is the wrong fix for the issue. The thunk must not clobber the r0 - r3 registers. Instead, fix the instruction encoder to support the r12 register as needed.
… scratch" This reverts commit 21e17b1. Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
Use 32-bit encoding with offset 0 for high registers while keeping compact 16-bit encoding for low registers (R0-R7). Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
Reverted in 78e5b3e and fixed the instruction encoder in 1d5ac1c. ARMEmitter.EmitLDR now uses 32-bit encoding with offset 0 for high registers (R8-R15) including R12, while keeping the compact 16-bit encoding for low registers (R0-R7). Build verified successfully.
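The encoding choice described above can be sketched as follows. This is an illustrative C++ model, not the actual `ARMEmitter.EmitLDR` code: the `Reg` enum, `IsLowReg`, and `EncodeLdrZeroOffset` are hypothetical names, and the sketch only covers `ldr rt, [rn]` with offset 0. It assumes the standard Thumb-2 encodings: the 16-bit immediate form, which has 3-bit register fields, and the 32-bit `ldr.w` immediate form, which has 4-bit register fields.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register numbering for this sketch: R0..R12 map to 0..12.
enum Reg : uint8_t { R0 = 0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12 };

// The 16-bit Thumb LDR form only has 3-bit register fields (R0-R7).
constexpr bool IsLowReg(Reg r) { return r <= R7; }

// Encodes `ldr rt, [rn]` (offset 0) as a sequence of Thumb halfwords.
std::vector<uint16_t> EncodeLdrZeroOffset(Reg rt, Reg rn)
{
    if (IsLowReg(rt) && IsLowReg(rn))
    {
        // 16-bit form: 0110 1 imm5 Rn Rt, with imm5 = 0.
        return { static_cast<uint16_t>(0x6800 | (rn << 3) | rt) };
    }

    // 32-bit form: 1111 1000 1101 Rn, then Rt imm12, with imm12 = 0.
    return { static_cast<uint16_t>(0xF8D0 | rn),
             static_cast<uint16_t>(rt << 12) };
}
```

Under these assumptions, `EncodeLdrZeroOffset(R0, R1)` yields the single halfword `0x6808`, while `EncodeLdrZeroOffset(R12, R12)` yields `0xF8DC 0xC000`, so a thunk can load through `r12` without borrowing any of `r0` - `r3` as scratch.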
…hunk.cs Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
…g; use it in ImportThunk for R12 section index Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
Use r4 instead of r12 to load ResolveCacheElem.token during the cache loop in ResolveHolder::Initialize. This prevents the hidden IndirectionCellAndFlags in r12 from being clobbered on a pMT-match / token-mismatch path that falls through to _slowEntryPoint.
- Replace 32-bit `ldr r12, [r6, #token]` (2 WORDs) with 16-bit `ldr r4, [r6, #token]` (1 WORD, safe since r4 is reloaded per loop)
- Change `cmp r12, r5` → `cmp r4, r5`
- Adjust preceding pMT `bne nextEntry` offset from 8 → 7 (1 fewer WORD)
- Decrement resolveEntryPointLen from 32 → 31 to match
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
// The following macro relies on this entry point being DWORD-aligned. We've already asserted that the
// overall stub is aligned above, just need to check that the preceding stubs occupy an even number of
// WORD slots.
_ASSERTE((n & 1) == 0);
@copilot resolveEntryPointLen is odd now, but this assert checks that it is even. Delete the nop for alignment above and adjust resolveEntryPointLen as appropriate
Fixed in the latest commit: removed the alignment NOP from the slow-entry path of the resolve entry point and updated resolveEntryPointLen from 31 to 30. With the NOP gone the instruction count is 30 (even), satisfying the (n & 1) == 0 assert without needing any padding.
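The length bookkeeping behind this exchange can be checked with a small model. This is a sketch, not code from the PR: the constant and function names are hypothetical, sizes are counted in 16-bit WORDs as in the stub comments, and it assumes the stub itself starts on a DWORD boundary, so the next entry point is DWORD-aligned exactly when the preceding WORD count is even.

```cpp
#include <cassert>
#include <cstddef>

// Instruction sizes in 16-bit WORDs (hypothetical names for this sketch).
constexpr std::size_t kWideLdrWords   = 2; // 32-bit `ldr r12, [r6, #token]`
constexpr std::size_t kNarrowLdrWords = 1; // 16-bit `ldr r4, [r6, #token]`
constexpr std::size_t kNopWords       = 1; // alignment NOP in the slow-entry path

// New total length after swapping one instruction for another.
constexpr std::size_t ReplaceInstruction(std::size_t totalWords,
                                         std::size_t oldWords,
                                         std::size_t newWords)
{
    return totalWords - oldWords + newWords;
}

// Mirrors the `(n & 1) == 0` check: an even number of WORDs keeps the
// following entry point DWORD-aligned.
constexpr bool NextEntryIsDwordAligned(std::size_t precedingWords)
{
    return (precedingWords & 1) == 0;
}

// 32 WORDs originally; the narrower load removes one WORD.
constexpr std::size_t afterNarrowLdr = ReplaceInstruction(32, kWideLdrWords, kNarrowLdrWords);
// Dropping the now-unneeded alignment NOP removes one more.
constexpr std::size_t afterNopRemoval = afterNarrowLdr - kNopWords;

static_assert(afterNarrowLdr == 31 && !NextEntryIsDwordAligned(afterNarrowLdr),
              "an odd length would trip the alignment assert");
static_assert(afterNopRemoval == 30 && NextEntryIsDwordAligned(afterNopRemoval),
              "an even length satisfies the alignment assert");
```

The same arithmetic explains why the NOP became redundant: it existed to pad an odd count to an even one, and shrinking the load by one WORD made the unpadded count even on its own.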
…eEntryPointLen to 30 Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
Description
CoreCLR’s `Wrapper` delegate kind was retained only to satisfy an ARM32 VSD ABI quirk (`r4` hidden cell). This change removes that special delegate path by moving ARM32 VSD hidden-parameter passing to `r12` and deleting wrapper-delegate infrastructure across VM/JIT/AOT/contracts and `System.Private.CoreLib`.
ARM32 VSD calling-convention alignment
- … `IndirectionCellAndFlags` in `r12`.
- … `REG_R12`.
- `LookupHolder::Initialize` and `ResolveStub` slow-entry push incoming hidden `r12` to the stack and load dispatch token into `r12` before branching to the worker.
- `ResolveWorkerAsmStub` now uses a fake-prolog/transition-block shape (matching the existing ARM delay-load pattern) so argument registers are preserved correctly while passing the dispatch token and hidden cell/flags to `VSD_ResolveWorker`.
- … `ResolveStub::slowEntryPointLen` to match the emitted slow-entry instruction count.
- … `slowEntryPointLen` at 6, which is already even and satisfies the subsequent alignment invariant (`(n & 1) == 0`).
Wrapper delegate infrastructure removal
- … `COMDelegate`.
- … `MulticastDelegate.CoreCLR.cs`.
Cross-layer contract cleanup
- … `wrapperDelegateInvoke` and `offsetOfWrapperDelegateIndirectCell` from EE/JIT interfaces and SuperPMI agnostic records.
- … `jiteeversionguid.h` to match the interface change.
IL stub enum cleanup
- … (`method.hpp`, `dllimport.h`) and corresponding cDAC contract mapping.
ARM32 ResolveStub correctness fix
- … `ResolveStub::_failEntryPoint` to avoid clobbering `r12` (which now carries `IndirectionCellAndFlags`).
- … `r4` as decrement scratch for `_pCounter` update and preserves callee-saved `r4` via stack save/restore in the fail path.
JIT cleanup after feedback
- … `VirtualStubParamInfo` constructor argument in `src/coreclr/jit/compiler.h`.
- … `src/coreclr/jit/compiler.cpp` accordingly.
- … `GTF_CALL_M_STACK_ARRAY` to the removed wrapper-flag slot (`0x00004000`) so call flag values remain contiguous.
Delegate VM cleanup after feedback
- … `refRealDelegate` local and associated `GCPROTECT_BEGIN/END` in `COMDelegate::BindToMethod`.
- … `(*pRefThis)` directly.
- … `_ReturnAddress` intrinsic declaration from `comdelegate.cpp`.
R2R compatibility boundary
Documentation update
- … `docs/design/coreclr/botr/clr-abi.md` hidden-parameter section: ARM32 VSD hidden parameter is now `R12`.
Customer Impact
Without this change, ARM32 virtual stub dispatch can mis-handle hidden argument/token flow in edge paths and wrapper-delegate-specific infrastructure remains in place even though it is no longer needed. This can lead to correctness risk and unnecessary maintenance burden. The slow-entry assertion fixes also prevent ARM checked-build assertion failures in `ResolveHolder::Initialize` without unnecessary instruction padding.
Regression
No known product regression is being fixed; this is ABI/calling-convention cleanup plus correctness hardening in ARM32 VSD paths. The latest updates fix regressions introduced during this PR’s ARM32 stub reshaping (slow-entry length mismatch and follow-up assertion failures).
Testing
- `./build.sh clr+libs+host`
- `./build.sh clr.runtime -arch arm -c Release`
- `./build.sh clr -arch arm -c Release` (failed in `ILCompiler_publish.csproj` due to missing `libjitinterface_arm.so`, unrelated to the ARM32 asm helper change)
- `./build.sh clr.runtime -arch arm -c Checked` (passed)
Risk
Medium. The change touches low-level CoreCLR ARM32 stub assembly and calling-convention plumbing, but is narrowly scoped to VSD worker handoff and aligned with existing fake-prolog transition-block patterns used elsewhere in ARM stubs. The final slow-entry fix is low risk and removes unnecessary NOP padding while keeping emitted metadata and alignment invariants correct.