
[mono] Keep in-flight exception alive across LLVM resume unwind #129713

Merged
pavelsavara merged 8 commits into dotnet:main from pavelsavara:mono-llvm-resume-unwind-gc-safe
Jun 24, 2026

Conversation

@pavelsavara

@pavelsavara pavelsavara commented Jun 22, 2026

Member

Summary

When an exception is unwound through an LLVM-compiled finally/fault handler under full-AOT, mono_handle_exception_internal saves the resume state and transfers control to the managed handler, which later calls back into mono_resume_unwind to continue unwinding. The in-flight exception object was stored across that window in ResumeState.ex_obj as a raw pointer, explicitly marked /* FIXME: GC */:

jit_tls->resume_state.ex_obj = obj;

The managed handler can reach a GC safepoint (for example the Monitor.Exit call emitted into a synchronized method wrapper). A moving GC then relocates the exception object, leaving ex_obj stale. When mono_resume_unwind resumes and passes that pointer to mono_object_isinst_checked while searching for a matching catch clause, it dereferences a garbage MonoClass:

SIGSEGV
  mono_class_is_assignable_from_internal
  mono_object_isinst_checked
  ... (second-pass catch search)
  llvm_resume_unwind_trampoline

This reproduces intermittently under load (multiple threads throwing/catching while allocating), and only with the LLVM full-AOT unwinder: the llvmonly catch path already keeps the exception alive via a pinned GC handle (ResumeState.ex_gchandle), and the JIT uses a different unwind mechanism.

Fix

Store the in-flight exception in a pinned GC handle (ex_gchandle) across the LLVM resume window, mirroring the existing llvmonly catch path, and read it back in mono_resume_unwind. The now-unused raw ex_obj field is removed from ResumeState.
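The failure mode and the fix can be illustrated with a toy model. This is only a sketch: every name below (ToyHeap, toy_gc_move, toy_handle_new, and so on) is hypothetical and none of it is Mono's GC or handle API. The toy handle tracks the moved object rather than pinning it; a pinned handle would instead prevent the move. Either way, the handle rather than a cached raw pointer is what the resuming code should trust.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these are hypothetical names, not Mono APIs. */
typedef struct {
    int class_id; /* stands in for the object's class pointer */
} ToyObject;

enum { TOY_SLOTS = 4, TOY_MAX_HANDLES = 4 };

typedef struct {
    ToyObject space_a[TOY_SLOTS];
    ToyObject space_b[TOY_SLOTS];
    ToyObject *live;                     /* semispace currently in use */
    ToyObject *handles[TOY_MAX_HANDLES]; /* index 0 means "no handle" */
} ToyHeap;

static void toy_heap_init(ToyHeap *h) {
    memset(h, 0, sizeof *h);
    h->live = h->space_a;
}

static ToyObject *toy_alloc(ToyHeap *h, int slot, int class_id) {
    h->live[slot].class_id = class_id;
    return &h->live[slot];
}

static unsigned toy_handle_new(ToyHeap *h, ToyObject *obj) {
    for (unsigned i = 1; i < TOY_MAX_HANDLES; i++) {
        if (!h->handles[i]) {
            h->handles[i] = obj;
            return i;
        }
    }
    return 0; /* table full */
}

static ToyObject *toy_handle_get(ToyHeap *h, unsigned handle) {
    return handle ? h->handles[handle] : NULL;
}

static void toy_handle_free(ToyHeap *h, unsigned handle) {
    if (handle)
        h->handles[handle] = NULL;
}

/* "Moving GC": copy everything to the other semispace, fix up the handle
 * table, then poison the old space so stale pointers read garbage. */
static void toy_gc_move(ToyHeap *h) {
    ToyObject *from = h->live;
    ToyObject *to = (from == h->space_a) ? h->space_b : h->space_a;
    memcpy(to, from, sizeof h->space_a);
    for (unsigned i = 1; i < TOY_MAX_HANDLES; i++) {
        if (h->handles[i])
            h->handles[i] = to + (h->handles[i] - from);
    }
    memset(from, 0xDD, sizeof h->space_a);
    h->live = to;
}

/* Returns 1 when the raw pointer went stale but the handle stayed valid. */
static int toy_demo_stale_pointer(void) {
    ToyHeap heap;
    toy_heap_init(&heap);

    ToyObject *raw = toy_alloc(&heap, 0, 42);     /* like the old ex_obj */
    unsigned handle = toy_handle_new(&heap, raw); /* like ex_gchandle */

    toy_gc_move(&heap); /* the "handler" reached a GC safepoint */

    ToyObject *fresh = toy_handle_get(&heap, handle);
    int raw_is_stale = (raw->class_id != 42);
    int handle_ok = (fresh != raw && fresh->class_id == 42);

    toy_handle_free(&heap, handle);
    return raw_is_stale && handle_ok;
}
```

In the toy, reading through raw after the move sees poisoned memory, which corresponds to the garbage MonoClass in the crash stack above, while the pointer re-fetched through the handle still sees class_id 42.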

Testing

Re-enables JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840 on Mono (the [ActiveIssue] annotation is removed so CI exercises the fix). Validated with an x64 Mono LLVM full-AOT build: the test crashed in ~17% of runs (5/30) before the change and in 0/300 runs after it, with JIT mode unaffected.
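As a rough sanity check on those numbers, assume the runs are independent and take the pre-fix crash rate to be the observed 5/30 (both are assumptions, not something the test logs establish). The chance of 300 clean runs without the fix would then be:

$$
P(\text{0 crashes in 300 runs}) = \left(1 - \tfrac{5}{30}\right)^{300} = \left(\tfrac{5}{6}\right)^{300} \approx 1.8 \times 10^{-24}
$$

Under those assumptions the clean result is very unlikely to be luck, within the limits of a 30-run estimate of the original rate.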

Note

This change was authored with the assistance of GitHub Copilot.


Part of the MonoAOT LLVM 23 full-AOT regression set tracked by #129508. Built and tested together with the Emscripten 5.0.6 / LLVM 23 bump on #129396, where the re-enabled b143840 test passes on the runtime-llvm AllSubsets_Mono_LLVMFULLAOT_RuntimeTests leg.

Contributes to #129508.

When unwinding an exception through an LLVM-compiled finally/fault handler in full-AOT mode, mono_handle_exception_internal stored the in-flight exception object in ResumeState.ex_obj as a raw pointer (marked /* FIXME: GC */) before transferring control to the managed handler. That handler (e.g. Monitor.Exit emitted by a synchronized method wrapper) can reach a GC safepoint, and a moving GC then relocates the exception object, leaving the stored pointer stale. mono_resume_unwind later passed the stale pointer to mono_object_isinst_checked while searching for a matching catch clause, dereferencing a garbage MonoClass and crashing intermittently with a SIGSEGV under load.

Store the exception in a pinned GC handle (ResumeState.ex_gchandle) instead, mirroring the existing llvmonly catch path, and read it back in mono_resume_unwind. The now-unused raw ex_obj field is removed.

Re-enables the JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840 test on Mono.

Copilot AI left a comment

Contributor


Pull request overview

Updates Mono’s LLVM full-AOT exception resume plumbing to keep the in-flight exception reachable across the managed finally/fault window by storing it in ResumeState as a GCHandle (and retrieving it in mono_resume_unwind), and re-enables a regression test previously skipped on Mono.

Changes:

  • Remove the raw ResumeState.ex_obj field and rely on ResumeState.ex_gchandle for keeping the exception alive across the LLVM resume window.
  • In mono_handle_exception_internal, allocate/free the resume-state GCHandle when unwinding through LLVM-compiled finally/fault; in mono_resume_unwind, retrieve the exception from the handle and free it before continuing unwinding.
  • Re-enable b143840 on Mono by removing the ActiveIssue skip annotation in the IL test.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/tests/JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840/b143840.il Unskips the regression test on Mono by removing ActiveIssue.
src/mono/mono/mini/mini-runtime.h Removes the raw exception pointer field from ResumeState, leaving GCHandle-based storage.
src/mono/mono/mini/mini-exceptions.c Stores the in-flight exception in a GCHandle during LLVM finally/fault unwinding and retrieves it in mono_resume_unwind.

Comment thread src/mono/mono/mini/mini-runtime.h
Comment thread src/mono/mono/mini/mini-exceptions.c
Comment thread src/mono/mono/mini/mini-exceptions.c Outdated
@pavelsavara pavelsavara added this to the 11.0.0 milestone Jun 22, 2026
pavelsavara added a commit that referenced this pull request Jun 22, 2026
Re-link the disabled Mono full-AOT tests from the #129508 tracking issue to the individual PRs that fix them (nullabletypes -> #129702, call05_large/small -> #129708, WPF_3226 -> #129710, b143840 -> #129713, UnitTest_GVM_TypeLoadException -> #129715). The tests stay disabled; Runtime_105619 keeps the #129508 link.
…time-llvm

The IL-state (AOTed) catch path asserted resume_state.ex_gchandle was NULL before setting it. Now that the finally-resume path shares the same field (this PR replaced the raw ex_obj with a pinned handle), a finally handler that throws a superseding exception caught in AOTed code could reach this path with a stale handle still set, firing the assert and leaking the handle. Free any existing handle instead, mirroring the finally path. Also add arch-backend/exception/runtime mini paths to the runtime-llvm PR trigger so this PR exercises the LLVMFULLAOT leg.
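A minimal sketch of the "free any existing handle instead of asserting" pattern described in this commit, using a toy handle table. All names here (ToyHandleTable, toy_set_resume_exception, and so on) are hypothetical stand-ins, not the actual Mono functions or fields.

```c
#include <assert.h>

/* Toy handle table; hypothetical names, not Mono's GC handle API. */
enum { TOY_MAX_HANDLES = 8 };

typedef struct {
    int in_use[TOY_MAX_HANDLES]; /* slot 0 is reserved as "no handle" */
} ToyHandleTable;

typedef struct {
    unsigned ex_gchandle; /* 0 means "not set" */
} ToyResumeState;

static unsigned toy_table_alloc(ToyHandleTable *t) {
    for (unsigned i = 1; i < TOY_MAX_HANDLES; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = 1;
            return i;
        }
    }
    return 0; /* table full */
}

static void toy_table_free(ToyHandleTable *t, unsigned handle) {
    assert(handle != 0 && t->in_use[handle]);
    t->in_use[handle] = 0;
}

static int toy_table_live_count(const ToyHandleTable *t) {
    int n = 0;
    for (unsigned i = 1; i < TOY_MAX_HANDLES; i++)
        n += t->in_use[i];
    return n;
}

/* Tolerant setter: release a leftover handle rather than asserting that
 * the field is empty, then install a handle for the new exception. */
static void toy_set_resume_exception(ToyHandleTable *t, ToyResumeState *rs) {
    if (rs->ex_gchandle) {
        toy_table_free(t, rs->ex_gchandle);
        rs->ex_gchandle = 0;
    }
    rs->ex_gchandle = toy_table_alloc(t);
}

/* Returns the number of live handles after a superseding exception. */
static int toy_demo_superseding_exception(void) {
    ToyHandleTable table = {{0}};
    ToyResumeState rs = {0};

    toy_set_resume_exception(&table, &rs); /* finally path roots exception A */
    toy_set_resume_exception(&table, &rs); /* handler throws B; catch path roots B */

    return toy_table_live_count(&table);
}
```

With an assert-on-non-empty setter the second call would abort; with an unconditional overwrite it would leave two live handles. The tolerant setter leaves exactly one.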
Removing ResumeState.ex_obj shrinks MonoJitTlsData by one pointer (216->212 on wasm32) and shifts the fields after resume_state. Regenerates wasm32-unknown-none.h and wasm32-unknown-wasip2.h so the browser-wasm and wasi-wasm MonoAOTOffsets legs pass.
Applied the same mechanical shift the build produced for wasm32: removing ResumeState.ex_obj shrinks MonoJitTlsData by one pointer, so DECL_SIZE2(MonoJitTlsData) and every field after resume_state are reduced by the target pointer size (8 for aarch64/x86_64, 4 for armv7/i686). These targets' offsets cannot be regenerated on this host (need the Android NDK / macOS), so they were applied manually; the mobile/apple MonoAOTOffsets CI legs validate them.
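The mechanical shift can be checked on simplified layouts. The structs below are hypothetical, pointer-only stand-ins (the real ResumeState and MonoJitTlsData have many more fields); they only illustrate why removing one pointer field shrinks the enclosing struct by one pointer and moves every later field down by the same amount, while earlier fields stay put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layouts; pointer-sized fields only, so no
 * padding interferes with the arithmetic. */
typedef struct {
    void *ctx;
    void *ex_obj; /* the field this PR removes */
    void *ji;
} OldResumeState;

typedef struct {
    void *ctx;
    void *ji;
} NewResumeState;

typedef struct {
    void *before;
    OldResumeState resume_state;
    void *after1;
    void *after2;
} OldTls;

typedef struct {
    void *before;
    NewResumeState resume_state;
    void *after1;
    void *after2;
} NewTls;

/* How much the enclosing struct shrinks. */
static size_t toy_size_shrink(void) {
    return sizeof(OldTls) - sizeof(NewTls);
}

/* How far a field located after resume_state moves. */
static size_t toy_after_shift(void) {
    return offsetof(OldTls, after2) - offsetof(NewTls, after2);
}

/* Fields located before resume_state do not move. */
static size_t toy_before_shift(void) {
    return offsetof(OldTls, before) - offsetof(NewTls, before);
}
```

On a 32-bit target such as wasm32 the shift is 4 bytes, which matches the 216->212 change mentioned above; on 64-bit targets it is 8.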
Only consume resume_state.ex_gchandle when it is set, so an unexpected or double resume path doesn't free handle 0; ex_obj falls back to NULL, matching the prior behavior. Addresses review feedback.
Copilot AI review requested due to automatic review settings June 23, 2026 11:42

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comment thread src/mono/mono/mini/mini-exceptions.c Outdated
Comment thread eng/pipelines/runtime-llvm.yml Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 23, 2026 18:07

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comment thread src/mono/mono/mini/mini-exceptions.c Outdated
In mono_resume_unwind, copy the GC handle to a local and clear resume_state.ex_gchandle before calling mono_handle_exception_internal, freeing it only after the call returns. This keeps the in-flight exception rooted across the managed catch/finally search (which can GC or trigger a nested LLVM finally resume that installs its own handle) and avoids accidentally freeing a newly-installed handle.
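The ordering described in this commit (copy to a local, clear the field, continue unwinding, free afterwards) can be sketched on a toy handle table. The names are hypothetical: toy_continue_unwind stands in for the call to mono_handle_exception_internal, and its nested flag simulates a nested LLVM finally resume that installs its own handle.

```c
#include <assert.h>

/* Toy handle table; hypothetical names, not Mono's GC handle API. */
enum { TOY_MAX_HANDLES = 8 };

typedef struct {
    int in_use[TOY_MAX_HANDLES]; /* slot 0 is reserved as "no handle" */
} ToyHandleTable;

typedef struct {
    unsigned ex_gchandle; /* 0 means "not set" */
} ToyResumeState;

static unsigned toy_table_alloc(ToyHandleTable *t) {
    for (unsigned i = 1; i < TOY_MAX_HANDLES; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = 1;
            return i;
        }
    }
    return 0; /* table full */
}

static void toy_table_free(ToyHandleTable *t, unsigned handle) {
    assert(handle != 0 && t->in_use[handle]);
    t->in_use[handle] = 0;
}

/* Stand-in for continuing the unwind; when nested is nonzero it installs
 * a new handle in the shared resume state. */
static void toy_continue_unwind(ToyHandleTable *t, ToyResumeState *rs, int nested) {
    if (nested)
        rs->ex_gchandle = toy_table_alloc(t);
}

static void toy_resume_unwind(ToyHandleTable *t, ToyResumeState *rs, int nested) {
    unsigned local = rs->ex_gchandle; /* may be 0 on an unexpected resume */
    rs->ex_gchandle = 0;              /* the field is now free for reuse */

    toy_continue_unwind(t, rs, nested); /* exception stays rooted via local */

    if (local)
        toy_table_free(t, local); /* frees only the handle we took, never a new one */
}

/* Returns 1 if the outer handle was freed and the nested one survived. */
static int toy_demo_nested_resume(void) {
    ToyHandleTable table = {{0}};
    ToyResumeState rs = {0};

    rs.ex_gchandle = toy_table_alloc(&table);
    unsigned outer = rs.ex_gchandle;

    toy_resume_unwind(&table, &rs, 1);

    unsigned nested = rs.ex_gchandle;
    return !table.in_use[outer] && nested != 0 && nested != outer &&
           table.in_use[nested];
}

/* Returns 1 if resuming with no handle set is a harmless no-op. */
static int toy_demo_resume_without_handle(void) {
    ToyHandleTable table = {{0}};
    ToyResumeState rs = {0};

    toy_resume_unwind(&table, &rs, 0);
    return rs.ex_gchandle == 0;
}
```

Freeing rs->ex_gchandle after the call instead of the local copy would release the nested handle in the first demo, which is the accidental free this commit avoids.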

Also switch the two resume-state exception handles from pinned to non-pinned: the object only needs to stay alive across the resume window (it is always re-fetched via the handle), matching interp_set_resume_state and avoiding unnecessary pinning that inhibits GC compaction.

Addresses PR review feedback.
@pavelsavara

Member Author

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vitek-karas vitek-karas left a comment

Member


Looks good - thanks!

@akoeplinger

akoeplinger commented Jun 24, 2026

Member

The Regression_o_2 failure in runtime-extra-platforms (Build linux-x64 Release AllSubsets_Mono_MiniFullAot_RuntimeTests minifullaot) is most likely unrelated; I've seen it fail in main too.

@pavelsavara

Member Author

/ba-g unrelated failures


6 participants