[mono] Fix re-entrant AssemblyLoadContext resolution stack overflow under full-AOT#129693
[mono] Fix re-entrant AssemblyLoadContext resolution stack overflow under full-AOT#129693pavelsavara wants to merge 9 commits into
Conversation
…nder full-AOT Invoking a managed ALC resolve hook (e.g. MonoResolveUsingLoad) constructs an AssemblyName, whose parsing uses a generic method. Under full-AOT, JIT-compiling that generic method can itself trigger resolution of the same assembly, which re-invokes the same hook and recurses without bound until the stack overflows. This only reproduces under full-AOT because in JIT/interp the generic method is already compiled and cached, so the nested resolution does not recur. Add a thread-local re-entrancy guard in invoke_resolve_method keyed by (resolve_method, assembly_name), mirroring the existing TLS recursion guard in mono_class_setup_fields. When resolution re-enters the same hook for the same assembly name on the current thread, return NULL to break the cycle.
|
/azp run runtime-extra-platforms |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR adds a thread-local re-entrancy guard around managed AssemblyLoadContext resolve hook invocation in Mono, preventing unbounded recursion (and eventual stack overflow) when assembly resolution re-enters the same resolve path on the same thread.
Changes:
- Introduces a per-thread “resolve in progress” tracking list (TLS) to detect and short-circuit re-entrant resolve calls.
- Allocates the new TLS key during
mono_alcs_init. - Adds a debug trace when re-entrant resolution is skipped.
…untime-llvm resolve_method is a single cached MonoMethod shared by all AssemblyLoadContexts, so keying the per-thread recursion guard on (resolve_method, assembly_name) alone would wrongly suppress a legitimate nested resolve of the same name on a different ALC. Include the ALC instance in the key. Also add the ALC and arch/exception mini paths to the runtime-llvm PR trigger so this full-AOT fix exercises the LLVMFULLAOT leg.
The ALC re-entrancy fix (dotnet#129693) addresses the full-AOT stack overflow tracked by dotnet#129508, so drop the IsMonoFULLAOT ActiveIssue. The separate IsWasm ActiveIssue (dotnet#124219) is kept.
…ession test This PR fixes the re-entrant AssemblyLoadContext resolution stack overflow tracked by dotnet#129508, so drop the IsMonoFULLAOT ActiveIssue. The separate IsWasm ActiveIssue (dotnet#124219) is kept. Provides the regression coverage requested in review.
|
Why is this a llvm bump regression ? |
@BrzVlad the LLVM-23 bump's main contribution to #129693 was turning on a test leg that finally ran the offending test in the only mode that crashes — the bump branch went red, we triaged the new failures, and this latent ALC re-entrancy fell out of that. More details here https://gist.github.com/pavelsavara/87f39ec5f1387a3770d11b2e2793b96f |
# Conflicts: # eng/pipelines/runtime-llvm.yml
…avelsavara/runtime into mono-fix-alc-reentrant-resolve
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/tests/JIT/Regression/JitBlue/Runtime_105619/Runtime_105619.cs:125
- TestEntryPoint currently swallows all exceptions, which means the test can still pass even if the collectible ALC load or reflection lookup fails (e.g., type/method not found) as long as the process doesn’t crash. Now that the MonoFULLAOT ActiveIssue is removed, it would be better for the test to fail on unexpected load/lookup errors and only ignore exceptions thrown by the invoked payload (typically wrapped in TargetInvocationException).
try
{
CollectibleALC alc = new CollectibleALC();
System.Reflection.Assembly asm = alc.LoadFromAssemblyPath(System.Reflection.Assembly.GetExecutingAssembly().Location);
System.Reflection.MethodInfo mi = asm.GetType(typeof(Program).FullName).GetMethod(nameof(MainInner));
Summary
Fixes an unbounded recursion (stack overflow) in Mono assembly resolution that reproduces under full-AOT when an assembly reference is resolved through a managed
AssemblyLoadContextresolve hook.Root cause
invoke_resolve_methodinvokes a managed ALC resolve hook (e.g.AssemblyLoadContext.MonoResolveUsingLoad). That hook constructs anAssemblyName, whose parsing uses a generic method (MemoryExtensions.Split<…>). Under full-AOT, JIT-compiling that generic instance can itself trigger resolution of the same assembly, which re-invokes the same hook, and so on without bound until the stack overflows.The crash only reproduces under full-AOT. In JIT/interpreter the generic method is already compiled and cached, so the nested resolution does not recur.
Observed native recursion cycle (one lap, repeating identically):
Fix
Add a thread-local re-entrancy guard in
invoke_resolve_method, keyed by(resolve_method, assembly_name). This mirrors the existing TLS recursion guard inmono_class_setup_fields(setup_fields_tls_id). When resolution re-enters the same hook for the same assembly name on the current thread, the nested call returnsNULLto break the cycle.Validation
Reproduced and verified against a merged JIT regression assembly that loads itself into a collectible
AssemblyLoadContextand invokes a method via reflection:SIGSEGV(stack overflow) under Mono full-AOT.Note
This description was generated with the assistance of GitHub Copilot.
Part of the MonoAOT LLVM 23 full-AOT regression set tracked by #129508. Built and tested together with the Emscripten 5.0.6 / LLVM 23 bump on #129396, where the re-enabled
Runtime_105619test passes on theruntime-llvmAllSubsets_Mono_LLVMFULLAOT_RuntimeTestsleg.Contributes to #129508.