Fix GC Issue #400

Merged
cfis merged 6 commits into master from dev on Mar 23, 2026

cfis (Collaborator) commented Mar 14, 2026

rb_gc_register_address() can trigger GC, so Anchor must register its VALUE slot before storing a heap object in it. Use RB_GC_GUARD(value) to keep the VALUE alive through the end of the method. #399

cfis added 3 commits March 13, 2026 18:27
…VALUE slot before storing a heap object in it. Use RB_GC_GUARD(value) to keep the VALUE alive through the end of the method.


The Anchor class uses rb_gc_register_address/rb_gc_unregister_address to
protect Ruby VALUEs held by Rice::Object (via Pin) from garbage collection.
The destructor skipped rb_gc_unregister_address when Anchor::enabled_ was
false, a flag set by a callback registered with rb_set_end_proc during the
first Anchor construction.

The problem is that rb_set_end_proc callbacks and Ruby at_exit blocks both
live in the same end_procs list and execute in LIFO order (last registered,
first run). If a test framework like minitest registers its at_exit block
BEFORE the Rice extension loads (a common pattern: require "minitest/autorun"
before require "my_extension"), the execution order becomes:

  1. Anchor::disable runs first  → enabled_ = false
  2. minitest at_exit runs second → test suite executes

During step 2, every temporary Rice::Object creates an Anchor that calls
rb_gc_register_address. When the temporary is destroyed, the Anchor
destructor sees enabled_ == false and skips rb_gc_unregister_address. The
address remains in Ruby's global_object_list but the Anchor memory is freed.
When GC runs (especially under GC.stress), rb_vm_mark at vm.c:3008 walks
global_object_list and dereferences the freed pointer:

  rb_gc_mark_maybe(*list->varptr);  // invalid read of size 8
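
The failure sequence above can be sketched with a toy model. This is not Ruby's or Rice's code: ToyVm, ToyAnchor, and stale_entries are hypothetical names, the end_procs vector only imitates LIFO execution, and global_list merely stands in for the list that rb_vm_mark walks. The model counts stale entries instead of dereferencing them, since reading a freed address is exactly the invalid read being described.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Toy model only: every name here is hypothetical, nothing is Ruby's API.
struct ToyVm {
    std::vector<std::function<void()>> end_procs;  // run last-registered-first
    std::list<const void*> global_list;            // stands in for global_object_list

    void register_address(const void* p) { global_list.push_back(p); }
    void unregister_address(const void* p) { global_list.remove(p); }

    void run_end_procs() {
        while (!end_procs.empty()) {
            auto proc = std::move(end_procs.back());  // LIFO: take the newest
            end_procs.pop_back();
            proc();
        }
    }
};

struct ToyAnchor {
    static bool enabled;  // plays the role of Anchor::enabled_
    ToyVm& vm;
    int slot = 0;         // plays the role of the protected VALUE

    explicit ToyAnchor(ToyVm& v) : vm(v) { vm.register_address(&slot); }

    // Pre-fix behaviour as described: skip unregistration once disabled.
    ~ToyAnchor() {
        if (enabled) vm.unregister_address(&slot);
    }
};
bool ToyAnchor::enabled = true;

// Returns how many addresses remain registered after their owners are gone.
std::size_t stale_entries(bool framework_required_first) {
    ToyAnchor::enabled = true;
    ToyVm vm;
    auto run_tests = [&vm] { ToyAnchor temporary(vm); };  // created, then destroyed
    auto disable = [] { ToyAnchor::enabled = false; };

    if (framework_required_first) {
        vm.end_procs.push_back(run_tests);  // require "minitest/autorun"
        vm.end_procs.push_back(disable);    // require "my_extension"
    } else {
        vm.end_procs.push_back(disable);
        vm.end_procs.push_back(run_tests);
    }
    vm.run_end_procs();
    return vm.global_list.size();
}
```

In this model, stale_entries(true) leaves one entry that points at a destroyed ToyAnchor, while stale_entries(false) leaves none. The only difference between the two calls is the registration order.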

The fix replaces rb_set_end_proc with ruby_vm_at_exit (declared in ruby/vm.h).
The key difference in timing:

  - rb_set_end_proc fires DURING rb_ec_exec_end_proc, interleaved with
    at_exit blocks in LIFO order, so its position depends on require order.

  - ruby_vm_at_exit fires DURING ruby_vm_destruct, which runs AFTER all
    end_procs/at_exit blocks have completed and AFTER rb_ec_finalize.
    This is the correct boundary: all Ruby code has finished, but C++
    static destruction (where global Rice::Object instances are destroyed)
    has not yet begun.
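
The timing difference can be illustrated with a small two-phase model of shutdown. It is a sketch under assumptions, not Ruby's implementation: ToyShutdown, Hook, and shutdown_trace are hypothetical names, the first phase imitates the LIFO end_procs list, and the second phase stands in for the callbacks that ruby_vm_at_exit registers. Ordering among several second-phase hooks is not modelled, because only one is registered here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: every name here is hypothetical, nothing is Ruby's API.
struct ToyShutdown {
    std::vector<std::function<void()>> end_procs;      // at_exit blocks and end procs
    std::vector<std::function<void()>> vm_exit_hooks;  // stand-in for ruby_vm_at_exit

    void run() {
        // Phase 1: end_procs, last-registered-first.
        while (!end_procs.empty()) {
            auto proc = std::move(end_procs.back());
            end_procs.pop_back();
            proc();
        }
        // Phase 2: VM teardown hooks, only after phase 1 has finished.
        for (auto& hook : vm_exit_hooks) hook();
    }
};

enum class Hook { EndProc, VmAtExit };

// Records the order in which the test suite and the disable callback run.
std::vector<std::string> shutdown_trace(Hook hook, bool framework_required_first) {
    std::vector<std::string> trace;
    ToyShutdown vm;
    auto tests = [&trace] { trace.push_back("tests"); };
    auto disable = [&trace] { trace.push_back("disable"); };

    if (framework_required_first) vm.end_procs.push_back(tests);
    if (hook == Hook::EndProc) {
        vm.end_procs.push_back(disable);      // old approach: shares the LIFO list
    } else {
        vm.vm_exit_hooks.push_back(disable);  // new approach: separate, later phase
    }
    if (!framework_required_first) vm.end_procs.push_back(tests);

    vm.run();
    return trace;
}
```

With Hook::EndProc the trace flips between "disable, tests" and "tests, disable" depending on require order. With Hook::VmAtExit the model always yields "tests, disable".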

Additionally, rb_gc_unregister_address is now called directly instead of
through detail::protect (rb_protect wrapper). This is safe because
rb_gc_unregister_address is a simple linked-list removal that never raises
Ruby exceptions and never triggers GC (xfree does not run GC, even under
GC.stress).
Avoiding rb_protect in the destructor also eliminates any risk of longjmp
during stack unwinding.

cfis added 3 commits March 22, 2026 20:47
g++ 15 produces a false positive -Warray-bounds warning when inlining
through Ruby's RSTRING macro (ruby/internal/core/rstring.h). This is
harmless but breaks builds that use -Werror. Add -Wno-array-bounds to
mkmf-rice.rb and CMakePresets.json for all GCC targets.

Also document a separate g++ 15.2.1 / binutils 2.45.1 issue where LTO
triggers an internal assembler segfault. This is not something Rice can
fix, but the workaround (-fno-lto) is documented in build_settings.md.

Both issues were discovered while investigating #399.

cfis merged commit 3f5652f into master Mar 23, 2026
21 checks passed