
[mono] Fix re-entrant AssemblyLoadContext resolution stack overflow under full-AOT #129693

Open: pavelsavara wants to merge 9 commits into dotnet:main from pavelsavara:mono-fix-alc-reentrant-resolve

pavelsavara (Member) commented Jun 22, 2026

Summary

Fixes an unbounded recursion (stack overflow) in Mono assembly resolution that reproduces under full-AOT when an assembly reference is resolved through a managed AssemblyLoadContext resolve hook.

Root cause

invoke_resolve_method invokes a managed ALC resolve hook (e.g. AssemblyLoadContext.MonoResolveUsingLoad). That hook constructs an AssemblyName, whose parsing uses a generic method (MemoryExtensions.Split<…>). Under full-AOT, JIT-compiling that generic instance can itself trigger resolution of the same assembly, which re-invokes the same hook, and so on without bound until the stack overflows.

The crash only reproduces under full-AOT. In JIT/interpreter the generic method is already compiled and cached, so the nested resolution does not recur.

Observed native recursion cycle (one lap, repeating identically):

mono_class_get_checked
  → mono_class_from_typeref_checked
  → mono_assembly_load_reference
  → mono_assembly_request_byname
  → mono_runtime_try_invoke[_handle]
  → wrapper_runtime_invoke
  → AssemblyLoadContext.MonoResolveUsingLoad        (managed resolve hook)
  → new AssemblyName(string) → AssemblyNameParser.Parse → …
  → MemoryExtensions.Split → SplitCore<…>           (generic method)
  → generic_trampoline_jit                          (JIT the generic instance under full-AOT)
  → native class/type setup → mono_class_get_checked → ∞

Fix

Add a thread-local re-entrancy guard in invoke_resolve_method, keyed by (resolve_method, assembly_name). This mirrors the existing TLS recursion guard in mono_class_setup_fields (setup_fields_tls_id). When resolution re-enters the same hook for the same assembly name on the current thread, the nested call returns NULL to break the cycle.
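A minimal C sketch of the guard described above, under stated assumptions: `ResolveMethod`, `ResolveFrame`, `ResolveHook`, `invoke_resolve_sketch`, and `reentrant_hook` are hypothetical stand-ins, not Mono's actual types or functions, and the sketch uses C11 `_Thread_local` rather than Mono's own TLS helpers. Each in-progress resolve pushes a stack-allocated frame onto a per-thread list. A nested call with the same (method, assembly name) key returns NULL instead of invoking the hook again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the cached resolve MonoMethod. */
typedef struct {
    const char *name;
} ResolveMethod;

/* One frame per resolve currently in progress on this thread. Frames live on
 * the C stack of the invoking call and are chained through `prev`. */
typedef struct ResolveFrame {
    const ResolveMethod *method;
    const char *assembly_name;
    struct ResolveFrame *prev;
} ResolveFrame;

/* Head of this thread's in-progress list (C11 thread-local storage). */
static _Thread_local ResolveFrame *resolve_in_progress = NULL;

/* Hypothetical hook signature: returns a resolved name, or NULL. */
typedef const char *(*ResolveHook)(const ResolveMethod *m, const char *name);

static int is_resolving(const ResolveMethod *m, const char *name)
{
    for (const ResolveFrame *f = resolve_in_progress; f != NULL; f = f->prev)
        if (f->method == m && strcmp(f->assembly_name, name) == 0)
            return 1;
    return 0;
}

static const char *invoke_resolve_sketch(const ResolveMethod *m, const char *name, ResolveHook hook)
{
    if (is_resolving(m, name))
        return NULL; /* re-entrant call for the same key: break the cycle */

    ResolveFrame frame = { m, name, resolve_in_progress };
    resolve_in_progress = &frame;          /* push */
    const char *result = hook(m, name);
    assert(resolve_in_progress == &frame); /* nested calls must have popped their frames */
    resolve_in_progress = frame.prev;      /* pop before returning */
    return result;
}

static int hook_calls = 0;
static const char *nested_result = "not called";

/* Simulates the full-AOT cycle: while handling `name`, the hook triggers a
 * nested resolution of the same name on the same thread. */
static const char *reentrant_hook(const ResolveMethod *m, const char *name)
{
    hook_calls++;
    nested_result = invoke_resolve_sketch(m, name, reentrant_hook);
    return name; /* the outer call still completes */
}
```

Without the `is_resolving` check, `reentrant_hook` would call itself until the stack overflowed, which is the failure mode in the trace above. The real hook is managed code, so a real guard would also have to pop its frame when the invocation reports an exception. This sketch has no such path.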

Validation

Reproduced and verified against a merged JIT regression assembly that loads itself into a collectible AssemblyLoadContext and invokes a method via reflection:

  • Before: deterministic SIGSEGV (stack overflow) under Mono full-AOT.
  • After: the test passes (exit 100), behaving the same as JIT/interpreter.
  • The full merged regression assembly (hundreds of sub-tests that load many assemblies normally) passes under full-AOT with no regressions, confirming the guard does not affect normal, non-re-entrant assembly loading.

Note

This description was generated with the assistance of GitHub Copilot.


Part of the MonoAOT LLVM 23 full-AOT regression set tracked by #129508. Built and tested together with the Emscripten 5.0.6 / LLVM 23 bump on #129396, where the re-enabled Runtime_105619 test passes on the runtime-llvm AllSubsets_Mono_LLVMFULLAOT_RuntimeTests leg.

Contributes to #129508.

…nder full-AOT

Invoking a managed ALC resolve hook (e.g. MonoResolveUsingLoad) constructs an
AssemblyName, whose parsing uses a generic method. Under full-AOT, JIT-compiling
that generic method can itself trigger resolution of the same assembly, which
re-invokes the same hook and recurses without bound until the stack overflows.
This only reproduces under full-AOT because in JIT/interp the generic method is
already compiled and cached, so the nested resolution does not recur.

Add a thread-local re-entrancy guard in invoke_resolve_method keyed by
(resolve_method, assembly_name), mirroring the existing TLS recursion guard in
mono_class_setup_fields. When resolution re-enters the same hook for the same
assembly name on the current thread, return NULL to break the cycle.
Copilot AI review requested due to automatic review settings June 22, 2026 12:50
github-actions bot added the area-AssemblyLoader-coreclr label Jun 22, 2026
pavelsavara self-assigned this Jun 22, 2026
pavelsavara added area-Codegen-AOT-mono and removed area-AssemblyLoader-coreclr labels Jun 22, 2026
pavelsavara (Member, Author) commented:

/azp run runtime-extra-platforms

azure-pipelines commented:
Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a thread-local re-entrancy guard around managed AssemblyLoadContext resolve hook invocation in Mono, preventing unbounded recursion (and eventual stack overflow) when assembly resolution re-enters the same resolve path on the same thread.

Changes:

  • Introduces a per-thread “resolve in progress” tracking list (TLS) to detect and short-circuit re-entrant resolve calls.
  • Allocates the new TLS key during mono_alcs_init.
  • Adds a debug trace when re-entrant resolution is skipped.
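The per-thread TLS key from the bullets above can be illustrated with POSIX thread-specific data, under stated assumptions: `alcs_init_sketch`, `get_in_progress`, `set_in_progress`, and `other_thread_sees_empty_list` are hypothetical names, and raw `pthread_key_create` / `pthread_getspecific` / `pthread_setspecific` stand in for whatever TLS wrapper Mono actually uses. The key is allocated once at init, and each thread then sees its own independent value. A resolve in progress on one thread therefore never suppresses a resolve of the same name on another thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Allocated once during (hypothetical) ALC subsystem init. */
static pthread_key_t resolve_tls_key;
static int resolve_tls_ready = 0;

/* Not thread-safe: assumed to run once during single-threaded startup. */
static int alcs_init_sketch(void)
{
    if (resolve_tls_ready)
        return 0;
    if (pthread_key_create(&resolve_tls_key, NULL) != 0)
        return -1;
    resolve_tls_ready = 1;
    return 0;
}

/* Head of the calling thread's in-progress resolve list (opaque here). */
static void *get_in_progress(void)
{
    assert(resolve_tls_ready);
    return pthread_getspecific(resolve_tls_key);
}

static int set_in_progress(void *head)
{
    assert(resolve_tls_ready);
    return pthread_setspecific(resolve_tls_key, head);
}

static void *observe_from_other_thread(void *out)
{
    /* A new thread starts with NULL for every key. */
    *(int *)out = (get_in_progress() == NULL);
    return NULL;
}

/* Returns 1 if a second thread sees an empty list, 0 if not, -1 on error. */
static int other_thread_sees_empty_list(void)
{
    pthread_t t;
    int saw_empty = 0;
    if (pthread_create(&t, NULL, observe_from_other_thread, &saw_empty) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return saw_empty;
}
```

Scoping the guard per thread rather than process-wide matters here. A global "in progress" set would make a concurrent, unrelated resolve of the same assembly on another thread look re-entrant.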

Comment thread src/mono/mono/metadata/assembly-load-context.c Outdated
Comment thread src/mono/mono/metadata/assembly-load-context.c
Comment thread src/mono/mono/metadata/assembly-load-context.c
…untime-llvm

resolve_method is a single cached MonoMethod shared by all AssemblyLoadContexts, so keying the per-thread recursion guard on (resolve_method, assembly_name) alone would wrongly suppress a legitimate nested resolve of the same name on a different ALC. Include the ALC instance in the key. Also add the ALC and arch/exception mini paths to the runtime-llvm PR trigger so this full-AOT fix exercises the LLVMFULLAOT leg.
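A sketch of why the ALC belongs in the key, under stated assumptions: `Alc`, `ResolveMethod`, `ResolveFrame`, `is_resolving`, `nested_is_suppressed`, and the assembly name `"Dep"` are hypothetical stand-ins. The `include_alc` flag exists only to compare the two keying schemes side by side. Because one cached resolve method serves every ALC, a (method, name) key treats a nested resolve of the same name on a *different* ALC as re-entrant. Adding the ALC instance to the key lets that legitimate nested resolve through while still catching a true same-ALC cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int id; } Alc;                     /* hypothetical ALC instance */
typedef struct { const char *name; } ResolveMethod; /* one cached method shared by all ALCs */

typedef struct ResolveFrame {
    const ResolveMethod *method;
    const Alc *alc;
    const char *assembly_name;
    struct ResolveFrame *prev;
} ResolveFrame;

static _Thread_local ResolveFrame *resolve_in_progress = NULL;

/* include_alc == 0: key is (method, name); include_alc == 1: key is (method, alc, name). */
static int is_resolving(const ResolveMethod *m, const Alc *alc, const char *name, int include_alc)
{
    for (const ResolveFrame *f = resolve_in_progress; f != NULL; f = f->prev) {
        if (f->method != m || strcmp(f->assembly_name, name) != 0)
            continue;
        if (include_alc && f->alc != alc)
            continue; /* same name on a different ALC: not a cycle */
        return 1;
    }
    return 0;
}

/* While `outer` is resolving "Dep", would a nested resolve of "Dep" on `inner` be suppressed? */
static int nested_is_suppressed(const Alc *outer, const Alc *inner, int include_alc)
{
    static const ResolveMethod shared = { "MonoResolveUsingLoad" };
    ResolveFrame frame = { &shared, outer, "Dep", resolve_in_progress };
    resolve_in_progress = &frame;
    int suppressed = is_resolving(&shared, inner, "Dep", include_alc);
    assert(resolve_in_progress == &frame);
    resolve_in_progress = frame.prev;
    return suppressed;
}
```

With two ALCs `a` and `b`, `nested_is_suppressed(&a, &b, 0)` returns 1, which is the wrong suppression the commit message describes. `nested_is_suppressed(&a, &b, 1)` returns 0, and `nested_is_suppressed(&a, &a, 1)` still returns 1.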
pavelsavara added a commit to pavelsavara/runtime that referenced this pull request Jun 23, 2026
The ALC re-entrancy fix (dotnet#129693) addresses the full-AOT stack overflow tracked by dotnet#129508, so drop the IsMonoFULLAOT ActiveIssue. The separate IsWasm ActiveIssue (dotnet#124219) is kept.
…ession test

This PR fixes the re-entrant AssemblyLoadContext resolution stack overflow tracked by dotnet#129508, so drop the IsMonoFULLAOT ActiveIssue. The separate IsWasm ActiveIssue (dotnet#124219) is kept. Provides the regression coverage requested in review.
Copilot AI review requested due to automatic review settings June 23, 2026 11:27

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

BrzVlad (Member) commented Jun 24, 2026:

Why is this an LLVM bump regression?

pavelsavara (Member, Author) replied, quoting BrzVlad:

"Why is this an LLVM bump regression?"

@BrzVlad The LLVM 23 bump's main contribution to #129693 was turning on a test leg that finally ran the offending test in the only mode that crashes (full-AOT). The bump branch went red, we triaged the new failures, and this latent ALC re-entrancy bug fell out of that.

More details: https://gist.github.com/pavelsavara/87f39ec5f1387a3770d11b2e2793b96f

Copilot AI review requested due to automatic review settings June 24, 2026 16:35

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread eng/pipelines/runtime-llvm.yml Outdated
Comment thread src/tests/JIT/Regression/JitBlue/Runtime_105619/Runtime_105619.cs
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 25, 2026 15:04

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/tests/JIT/Regression/JitBlue/Runtime_105619/Runtime_105619.cs:125

  • TestEntryPoint currently swallows all exceptions, which means the test can still pass even if the collectible ALC load or reflection lookup fails (e.g., type/method not found) as long as the process doesn’t crash. Now that the MonoFULLAOT ActiveIssue is removed, it would be better for the test to fail on unexpected load/lookup errors and only ignore exceptions thrown by the invoked payload (typically wrapped in TargetInvocationException).
        try
        {
            CollectibleALC alc = new CollectibleALC();
            System.Reflection.Assembly asm = alc.LoadFromAssemblyPath(System.Reflection.Assembly.GetExecutingAssembly().Location);
            System.Reflection.MethodInfo mi = asm.GetType(typeof(Program).FullName).GetMethod(nameof(MainInner));


4 participants