
fix: call callbacks via the executable trampoline address (#390) #427

Merged. romgrk merged 1 commit into master from fix/390-closure-native-address (Jun 14, 2026)
romgrk (Owner) commented Jun 13, 2026

Summary

Addresses #390 — callback segfaults on Ubuntu 26 (libffi 3.5), e.g. node examples/glib-timeout.js crashing the moment the callback fires.

Root cause

When a JS function is passed as a callback argument, node-gtk hands the C side the address of the ffi_closure. On libffi 3.4+ the executable trampoline lives in a separate memory mapping from the writable ffi_closure struct (static trampolines / W^X), so the ffi_closure* pointer is not itself callable — invoking it segfaults. The correct, callable address is g_callable_info_get_closure_native_address().

History — why this isn't just "re-merge #391"

#391 made the same change but was reverted in #393 because it broke startup. The most likely cause of that break is g_callable_info_get_closure_native_address() returning NULL on some GI builds, so a NULL callback pointer was passed at bootstrap. This PR re-applies the fix with a NULL guard that falls back to the closure pointer, so the callback pointer is never NULL.

Why this is safe everywhere

The pointer handed to C changes only when native_address is non-NULL and differs from closure — i.e. precisely the libffi-3.4+ static-trampoline case that currently segfaults. In every configuration where the old code worked:

  • if the two addresses coincide (my Arch box, older libffi) → identical behavior;
  • if introspection returns NULL → falls back to closure → identical behavior.

So it cannot regress a setup that currently works; it only repairs the one that crashes.

Testing

  • Verified on Arch / libffi 3.5.2 / glib 2.88.1 (same versions as the maintainer): instrumented the closure, confirmed closure == native_address here, so the change is a no-op locally — startup is fine, examples/glib-timeout.js runs, and Gst.Promise.newWithChangeFunc(...) + reply() fires its change func without crashing.
  • Full suite: 75 passing. The only failure is the pre-existing, environmental require.js GIRepository 3.0/2.0 bootstrap failure, which also reproduces on master.

⚠️ Please verify before merging

I could not reproduce the original #393 startup break locally (here the two addresses are equal, so #391 would have worked too). The NULL-guard is my best diagnosis of that break, but since your machine is where it manifested, please confirm node examples/glib-timeout.js and a normal require('node-gtk') still start cleanly for you. If it still breaks at startup, that points to get_closure_native_address returning a non-NULL-but-wrong value on your GI, and I'll dig further.

🤖 Generated with Claude Code

On libffi 3.4+ the executable trampoline is a separate memory mapping from
the writable ffi_closure, so the closure pointer itself is not callable;
passing it to C as the callback function pointer segfaults when the callback
fires. This is reproducible on Ubuntu 26 (libffi 3.5) and matches the
`node examples/glib-timeout.js` crash in the report.

Pass g_callable_info_get_closure_native_address() instead of the raw closure.
This re-applies #391, which was reverted in #393 because it broke startup —
guarded here by falling back to the closure pointer when introspection returns
NULL, so a callback pointer is never NULL at bootstrap. On platforms where the
two addresses coincide (older libffi, or where the closure is already
executable) the behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

romgrk (Owner, Author) commented Jun 14, 2026

Verified on real Ubuntu 26.04 (rootless podman container)

Built both master and this branch from source inside ubuntu:26.04 with the exact reported stack — libffi 3.5.2, libgirepository 1.86.0, glib 2.88, node 22 — and ran examples/glib-timeout.js:

| Build | Result |
| --- | --- |
| master (2d2edd2) | Prints "Run loop.", then SIGSEGV (139) the instant the first callback fires; reproduces #390 |
| this PR (501db55) | Prints "Run loop.", "count 0/1/2/3", "Loop ran."; callbacks fire, loop completes ✅ |

gdb on master confirms the crash is in the callback trampoline; the fix (passing the address returned by g_callable_info_get_closure_native_address() instead of the writable ffi_closure*) resolves it. So the divergence I couldn't reproduce on Arch (where closure == native_address) is real on Ubuntu's libffi build, and this is the correct fix.

One caveat — a separate teardown crash (not this PR)

With gi.startLoop(), the program now reaches normal exit and then SIGSEGVs during Node/libuv platform shutdown:

#0 uv_close ()
#1 node::PerIsolatePlatformData::Shutdown()
#2 node::NodePlatform::UnregisterIsolate()
#3 node::NodeMainInstance::~NodeMainInstance()

This is not the callback bug and not caused by this PR — master can't even reach it. It's the StartLoop() integration (src/loop.cc): the GSource wrapping uv_default_loop() / uv_backend_fd is attached and never detached, so it's still live when Node closes its loop at exit. It does not happen without startLoop(), and process.exit(0) avoids it. Worth a separate issue.

romgrk merged commit 5e48748 into master on Jun 14, 2026
9 checks passed