[mono] Keep in-flight exception alive across LLVM resume unwind #129713
Merged
pavelsavara merged 8 commits on Jun 24, 2026
When unwinding an exception through an LLVM-compiled finally/fault handler in full-AOT mode, mono_handle_exception_internal stored the in-flight exception object in ResumeState.ex_obj as a raw pointer (marked /* FIXME: GC */) before transferring control to the managed handler. That handler (e.g. Monitor.Exit emitted by a synchronized method wrapper) can reach a GC safepoint, and a moving GC then relocates the exception object, leaving the stored pointer stale. mono_resume_unwind later passed the stale pointer to mono_object_isinst_checked while searching for a matching catch clause, dereferencing a garbage MonoClass and crashing intermittently with a SIGSEGV under load. Store the exception in a pinned GC handle (ResumeState.ex_gchandle) instead, mirroring the existing llvmonly catch path, and read it back in mono_resume_unwind. The now-unused raw ex_obj field is removed. Re-enables the JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840 test on Mono.
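The failure mode described above can be illustrated with a toy model. This is only a sketch: the two-space "heap", the handle table, and every `toy_*` name are hypothetical stand-ins and not Mono APIs. It assumes a collector that copies rooted objects and overwrites the old space, so that a stale raw pointer reads a recognizable poison value instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these names are Mono APIs. */
typedef struct { int klass_id; } ToyObject;

enum { TOY_SLOTS = 4, TOY_MAX_HANDLES = 4, TOY_POISON = -1 };

static ToyObject toy_from_space[TOY_SLOTS];
static ToyObject toy_to_space[TOY_SLOTS];
static ToyObject *toy_handles[TOY_MAX_HANDLES]; /* index 0 means "no handle" */

/* Root an object in the handle table; returns 0 if the table is full. */
static unsigned toy_handle_new(ToyObject *obj)
{
    for (unsigned h = 1; h < TOY_MAX_HANDLES; h++) {
        if (toy_handles[h] == NULL) {
            toy_handles[h] = obj;
            return h;
        }
    }
    return 0;
}

static ToyObject *toy_handle_target(unsigned h) { return toy_handles[h]; }
static void toy_handle_free(unsigned h) { toy_handles[h] = NULL; }

/* One moving collection: copy every rooted object to to-space, update its
 * handle, then poison from-space so stale pointers read garbage. */
static void toy_moving_gc(void)
{
    size_t next = 0;
    for (unsigned h = 1; h < TOY_MAX_HANDLES; h++) {
        if (toy_handles[h] != NULL) {
            toy_to_space[next] = *toy_handles[h];
            toy_handles[h] = &toy_to_space[next];
            next++;
        }
    }
    for (size_t i = 0; i < TOY_SLOTS; i++)
        toy_from_space[i].klass_id = TOY_POISON;
}

/* Old shape: the exception is remembered only as a raw pointer. */
int toy_resume_with_raw_pointer(void)
{
    toy_from_space[0].klass_id = 42;
    ToyObject *ex_obj = &toy_from_space[0];
    toy_moving_gc();         /* the "managed handler" reaches a safepoint */
    return ex_obj->klass_id; /* stale: reads the poisoned slot */
}

/* New shape: the exception is rooted by a handle and re-fetched afterwards. */
int toy_resume_with_handle(void)
{
    toy_from_space[0].klass_id = 42;
    unsigned h = toy_handle_new(&toy_from_space[0]);
    assert(h != 0);
    toy_moving_gc();
    int klass_id = toy_handle_target(h)->klass_id;
    toy_handle_free(h);
    return klass_id;
}
```

In this model `toy_resume_with_raw_pointer()` returns `TOY_POISON` while `toy_resume_with_handle()` returns 42. The toy handle behaves like a non-pinned handle (the object moves and the handle is updated); a pinned handle would instead keep the object in place, which also keeps a raw pointer valid.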
Pull request overview
Updates Mono’s LLVM full-AOT exception resume plumbing to keep the in-flight exception reachable across the managed finally/fault window by storing it in ResumeState as a GCHandle (and retrieving it in mono_resume_unwind), and re-enables a regression test previously skipped on Mono.
Changes:
- Remove the raw `ResumeState.ex_obj` field and rely on `ResumeState.ex_gchandle` for keeping the exception alive across the LLVM resume window.
- In `mono_handle_exception_internal`, allocate/free the resume-state GCHandle when unwinding through LLVM-compiled `finally`/`fault`; in `mono_resume_unwind`, retrieve the exception from the handle and free it before continuing unwinding.
- Re-enable `b143840` on Mono by removing the `ActiveIssue` skip annotation in the IL test.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/tests/JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840/b143840.il | Unskips the regression test on Mono by removing ActiveIssue. |
| src/mono/mono/mini/mini-runtime.h | Removes the raw exception pointer field from ResumeState, leaving GCHandle-based storage. |
| src/mono/mono/mini/mini-exceptions.c | Stores the in-flight exception in a GCHandle during LLVM finally/fault unwinding and retrieves it in mono_resume_unwind. |
pavelsavara added a commit that referenced this pull request on Jun 22, 2026
Re-link the disabled Mono full-AOT tests from the #129508 tracking issue to the individual PRs that fix them (nullabletypes -> #129702, call05_large/small -> #129708, WPF_3226 -> #129710, b143840 -> #129713, UnitTest_GVM_TypeLoadException -> #129715). The tests stay disabled; Runtime_105619 keeps the #129508 link.
…time-llvm The IL-state (AOTed) catch path asserted resume_state.ex_gchandle was NULL before setting it. Now that the finally-resume path shares the same field (this PR replaced the raw ex_obj with a pinned handle), a finally handler that throws a superseding exception caught in AOTed code could reach this path with a stale handle still set, firing the assert and leaking the handle. Free any existing handle instead, mirroring the finally path. Also add arch-backend/exception/runtime mini paths to the runtime-llvm PR trigger so this PR exercises the LLVMFULLAOT leg.
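A minimal sketch of the "free any existing handle instead of asserting" shape described in this commit message, using a toy handle table. All `toy_*` names are hypothetical; the real code works on `resume_state.ex_gchandle` with Mono's GC handle functions, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these names are Mono APIs. */
typedef struct { int klass_id; } ToyObject;
typedef struct { unsigned ex_gchandle; } ToyResumeState; /* 0 means "unset" */

enum { TOY_MAX_HANDLES = 4 };
static ToyObject *toy_handles[TOY_MAX_HANDLES]; /* index 0 is never used */

/* Root an object in the handle table; returns 0 if the table is full. */
static unsigned toy_handle_new(ToyObject *obj)
{
    for (unsigned h = 1; h < TOY_MAX_HANDLES; h++) {
        if (toy_handles[h] == NULL) {
            toy_handles[h] = obj;
            return h;
        }
    }
    return 0;
}

static void toy_handle_free(unsigned h) { toy_handles[h] = NULL; }

/* Count live handles so that a leak would be observable. */
int toy_live_handles(void)
{
    int live = 0;
    for (unsigned h = 1; h < TOY_MAX_HANDLES; h++)
        live += (toy_handles[h] != NULL);
    return live;
}

/* Install the in-flight exception. A handle left behind by an earlier path
 * (for example a finally handler that threw a superseding exception) is
 * released first rather than tripping an "must be unset" assertion. */
void toy_set_resume_exception(ToyResumeState *state, ToyObject *ex)
{
    if (state->ex_gchandle != 0) {
        toy_handle_free(state->ex_gchandle);
        state->ex_gchandle = 0;
    }
    state->ex_gchandle = toy_handle_new(ex);
}
```

Calling `toy_set_resume_exception` twice on the same state leaves exactly one live handle, pointing at the second exception; with an assertion in place of the `if`, the second call would abort instead.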
Removing ResumeState.ex_obj shrinks MonoJitTlsData by one pointer (216->212 on wasm32) and shifts the fields after resume_state. Regenerates wasm32-unknown-none.h and wasm32-unknown-wasip2.h so the browser-wasm and wasi-wasm MonoAOTOffsets legs pass.
Applied the same mechanical shift the build produced for wasm32: removing ResumeState.ex_obj shrinks MonoJitTlsData by one pointer, so DECL_SIZE2(MonoJitTlsData) and every field after resume_state are reduced by the target pointer size (8 for aarch64/x86_64, 4 for armv7/i686). These targets' offsets cannot be regenerated on this host (need the Android NDK / macOS), so they were applied manually; the mobile/apple MonoAOTOffsets CI legs validate them.
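The mechanical shift can be checked on a reduced model. The structs below are hypothetical layouts, not the real `ResumeState` or `MonoJitTlsData`; they assume every field is pointer-sized so that no padding is involved. Under that assumption, dropping one pointer field shrinks the enclosing struct and moves every later field by exactly `sizeof(void *)`, which is the 216->212 step quoted for wasm32.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts for illustration; field names other than ex_obj,
 * ex_gchandle and resume_state are invented. */
typedef struct { void *ex_obj; void *ex_gchandle; void *ctx; } ToyResumeStateOld;
typedef struct { void *ex_gchandle; void *ctx; } ToyResumeStateNew;

typedef struct {
    void *first;
    ToyResumeStateOld resume_state;
    void *after_a;
    void *after_b;
} ToyTlsOld;

typedef struct {
    void *first;
    ToyResumeStateNew resume_state;
    void *after_a;
    void *after_b;
} ToyTlsNew;

/* How much the enclosing struct shrinks when ex_obj is removed. */
size_t toy_size_delta(void)
{
    return sizeof(ToyTlsOld) - sizeof(ToyTlsNew);
}

/* How far a field placed after resume_state moves toward the start. */
size_t toy_offset_delta_after_a(void)
{
    return offsetof(ToyTlsOld, after_a) - offsetof(ToyTlsNew, after_a);
}

size_t toy_offset_delta_after_b(void)
{
    return offsetof(ToyTlsOld, after_b) - offsetof(ToyTlsNew, after_b);
}
```

Fields before `resume_state`, and `resume_state` itself, keep their offsets; only the total size and the fields after it change, each by one target pointer.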
Only consume resume_state.ex_gchandle when it is set, so an unexpected or double resume path doesn't free handle 0; ex_obj falls back to NULL, matching the prior behavior. Addresses review feedback.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
In mono_resume_unwind, copy the GC handle to a local and clear resume_state.ex_gchandle before calling mono_handle_exception_internal, freeing it only after the call returns. This keeps the in-flight exception rooted across the managed catch/finally search (which can GC or trigger a nested LLVM finally resume that installs its own handle) and avoids accidentally freeing a newly-installed handle. Also switch the two resume-state exception handles from pinned to non-pinned: the object only needs to stay alive across the resume window (it is always re-fetched via the handle), matching interp_set_resume_state and avoiding unnecessary pinning that inhibits GC compaction. Addresses PR review feedback.
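The ordering described here (take the handle into a local, clear the field, continue unwinding, then free the local) can be sketched with a toy handle table. Every `toy_*` name is hypothetical, and the callback stands in for the call to `mono_handle_exception_internal`; this is an illustration of the ownership pattern, not the runtime's code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these names are Mono APIs. */
typedef struct { int klass_id; } ToyObject;
typedef struct { unsigned ex_gchandle; } ToyResumeState; /* 0 means "unset" */
typedef void (*ToyUnwindFn)(ToyResumeState *state, ToyObject *ex);

enum { TOY_MAX_HANDLES = 8 };
static ToyObject *toy_handles[TOY_MAX_HANDLES]; /* index 0 is never used */
static ToyObject toy_nested_exception = { 7 };
static ToyObject *toy_last_seen; /* exception observed by the last callback */

/* Root an object in the handle table; returns 0 if the table is full. */
static unsigned toy_handle_new(ToyObject *obj)
{
    for (unsigned h = 1; h < TOY_MAX_HANDLES; h++) {
        if (toy_handles[h] == NULL) {
            toy_handles[h] = obj;
            return h;
        }
    }
    return 0;
}

static void toy_handle_free(unsigned h) { toy_handles[h] = NULL; }

/* Resume step: the local copy owns the handle for the whole call. */
void toy_resume_unwind(ToyResumeState *state, ToyUnwindFn continue_unwind)
{
    unsigned local = state->ex_gchandle;
    state->ex_gchandle = 0; /* a nested resume may now install its own handle */
    ToyObject *ex = local ? toy_handles[local] : NULL; /* unset: fall back to NULL */
    continue_unwind(state, ex); /* exception stays rooted through `local` */
    if (local)
        toy_handle_free(local); /* frees only the handle taken above */
}

/* Callback modelling a nested LLVM finally resume that installs a handle. */
void toy_nested_install(ToyResumeState *state, ToyObject *ex)
{
    toy_last_seen = ex;
    state->ex_gchandle = toy_handle_new(&toy_nested_exception);
}

/* Callback that only records what it was given. */
void toy_record_only(ToyResumeState *state, ToyObject *ex)
{
    (void)state;
    toy_last_seen = ex;
}
```

Had the field been freed after the call without first being copied and cleared, the free would have hit whatever handle the nested resume installed. With the local copy, the outer handle is released and the nested one survives.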
Member, Author
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
Member
The Regression_o_2 failure in
akoeplinger approved these changes on Jun 24, 2026
Member, Author
/ba-g unrelated failures
Summary
When an exception is unwound through an LLVM-compiled `finally`/`fault` handler under full-AOT, `mono_handle_exception_internal` saves the resume state and transfers control to the managed handler, which later calls back into `mono_resume_unwind` to continue unwinding. The in-flight exception object was stored across that window in `ResumeState.ex_obj` as a raw pointer, explicitly marked `/* FIXME: GC */`.

The managed handler can reach a GC safepoint (for example the `Monitor.Exit` call emitted into a `synchronized` method wrapper). A moving GC then relocates the exception object, leaving `ex_obj` stale. When `mono_resume_unwind` resumes and passes that pointer to `mono_object_isinst_checked` while searching for a matching `catch` clause, it dereferences a garbage `MonoClass`.

This reproduces intermittently under load (multiple threads throwing/catching while allocating), and only with the LLVM full-AOT unwinder: the `llvmonly` catch path already keeps the exception alive via a pinned GC handle (`ResumeState.ex_gchandle`), and the JIT uses a different unwind mechanism.

Fix

Store the in-flight exception in a pinned GC handle (`ex_gchandle`) across the LLVM resume window, mirroring the existing `llvmonly` catch path, and read it back in `mono_resume_unwind`. The now-unused raw `ex_obj` field is removed from `ResumeState`.

Testing

Re-enables `JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840` on Mono (the `[ActiveIssue]` annotation is removed so CI exercises the fix). Validated with an x64 Mono LLVM full-AOT build: the test crashed in ~17% of runs (5/30) before the change and in 0/300 runs after it, with JIT mode unaffected.

Note

This change was authored with the assistance of GitHub Copilot.

Part of the MonoAOT LLVM 23 full-AOT regression set tracked by #129508. Built and tested together with the Emscripten 5.0.6 / LLVM 23 bump on #129396, where the re-enabled `b143840` test passes on the `runtime-llvm` `AllSubsets_Mono_LLVMFULLAOT_RuntimeTests` leg.

Contributes to #129508.