fix(sse): prevent memory leak from zombie SSE connections #22551

Closed
zhumengzhu wants to merge 1 commit into anomalyco:dev from zhumengzhu:fix/sse-memory-leak

Conversation

@zhumengzhu

Fixes #22198, fixes #22422.

Problem

Every SSE endpoint (/event, /global/event, /global/sync-event) relies
solely on stream.onAbort(stop) to release resources on disconnect:

const stop = () => {
  clearInterval(heartbeat)  // 10s heartbeat timer
  unsub()                   // Bus.subscribeAll subscription
  q.push(null)              // unblock the for-await loop
}
stream.onAbort(stop)        // ← only cleanup path

When a connection is left in TCP CLOSE_WAIT (for example, when the Electron
shell reconnects on minimize/restore, or when a reverse proxy does not
propagate the FIN), Bun/Hono never fires stream.onAbort. All three resources
leak for the lifetime of the process.

The finally { stop() } block does not help: for await is suspended on
q.next(), which returns a Promise that never resolves because stop() is
the only thing that pushes null into the queue.
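
The hang described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: MiniQueue, consume, and demo are hypothetical names, and the loop uses a plain while/await instead of for await, on the assumption that the real AsyncQueue behaves the same way when it is empty.

```typescript
// Hypothetical minimal queue; the real AsyncQueue in util/queue.ts may differ.
class MiniQueue<T> {
  private items: T[] = []
  private waiters: ((value: T) => void)[] = []

  push(item: T): void {
    const waiter = this.waiters.shift()
    if (waiter) waiter(item) // hand the item straight to a suspended consumer
    else this.items.push(item)
  }

  next(): Promise<T> {
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T)
    // Empty queue: this promise stays pending until someone calls push().
    return new Promise((resolve) => this.waiters.push(resolve))
  }
}

// Consumer shaped like the SSE handler: loop until null, clean up in finally.
async function consume(q: MiniQueue<string | null>, log: string[]): Promise<void> {
  try {
    while (true) {
      const item = await q.next() // suspends here while the queue is empty
      if (item === null) break
      log.push(item)
    }
  } finally {
    log.push("cleanup") // only reached once the loop exits
  }
}

async function demo(): Promise<string[]> {
  const q = new MiniQueue<string | null>()
  const log: string[] = []
  const running = consume(q, log)
  q.push("event-1")
  // No null has been pushed yet, so the consumer stays suspended and
  // the finally block has not run.
  await new Promise((resolve) => setTimeout(resolve, 20))
  log.push("still-waiting")
  q.push(null) // what stop() would do if an abort path ever fired
  await running
  return log
}
```

In this sketch "cleanup" is logged only after null is pushed. If nothing ever pushes null, the finally block is never reached, which matches the behaviour described for the zombie connections.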

With 66–74 zombie connections observed in the wild, RSS grew at ~14 MB/sec
and peaked at 24.5 GB before OOM.

Full root cause analysis with sequence and flow diagrams:
https://gist.github.com/zhumengzhu/1e3392390b34a913d9dd59a61df87f9a

Fix

Four changes across four files:

c.req.raw.signal as a second abort path (event.ts, global.ts)

Request.signal is aborted by Bun when the TCP connection closes, including
CLOSE_WAIT cases where stream.onAbort does not fire:

stream.onAbort(stop)
c.req.raw.signal.addEventListener("abort", stop)  // added
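
As a rough illustration of wiring cleanup to an AbortSignal, the sketch below uses a standalone AbortController in place of the signal behind c.req.raw.signal. The helper onSignalAbort is hypothetical and not part of this PR; it also covers the case where the signal was aborted before the listener was attached, since a listener added after the abort event has fired is never called.

```typescript
// Hypothetical helper: run `stop` when the signal aborts, or immediately
// if it has already aborted.
function onSignalAbort(signal: AbortSignal, stop: () => void): void {
  if (signal.aborted) {
    stop()
    return
  }
  signal.addEventListener("abort", stop, { once: true })
}

const abortLog: string[] = []

// Case 1: the connection closes after the handler has registered its cleanup.
const live = new AbortController()
onSignalAbort(live.signal, () => abortLog.push("live"))
live.abort()

// Case 2: the connection was already gone when the handler started.
const dead = new AbortController()
dead.abort()
onSignalAbort(dead.signal, () => abortLog.push("dead"))
```

Both cases end with the cleanup callback having run once. Whether the early-abort case can occur in the actual handlers is not established by this PR; the check is included only as a defensive measure.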

writeSSE try/catch (event.ts, global.ts)

If neither abort path fires, the next write to a dead socket throws. Cleanup
happens at most one heartbeat interval (10 s) after the connection dies:

try {
  await stream.writeSSE({ data })
} catch {
  stop()
  return
}
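
The write-failure path can be exercised without a real socket. This is a sketch under stated assumptions: Writer, flakyWriter, and pump are hypothetical names, and the fake writer simply throws after a fixed number of writes to stand in for a dead connection.

```typescript
// Hypothetical shape of the stream object; only writeSSE is modelled.
type Writer = { writeSSE(message: { data: string }): Promise<void> }

// Fake writer that accepts `failAfter` writes and then throws on every call.
function flakyWriter(failAfter: number): Writer {
  let writes = 0
  return {
    async writeSSE() {
      if (writes >= failAfter) throw new Error("socket closed")
      writes++
    },
  }
}

// Send messages in order; on the first failed write, run cleanup and stop.
async function pump(writer: Writer, messages: string[], stop: () => void): Promise<number> {
  let sent = 0
  for (const data of messages) {
    try {
      await writer.writeSSE({ data })
      sent++
    } catch {
      stop()
      return sent
    }
  }
  return sent
}
```

With a writer that fails after two writes, pump returns 2 and calls stop once; the remaining messages are never attempted.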

AsyncQueue capacity limit (util/queue.ts)

Added an optional limit parameter (default 1000). When the queue is full
and no resolver is waiting, the oldest entry is dropped. Defensive backstop:
memory growth is bounded even if both abort paths are somehow missed.
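
One way such a drop-oldest limit could look is sketched below. BoundedQueue and its size getter are hypothetical and written only from the description above; the actual change in util/queue.ts may be structured differently.

```typescript
// Hypothetical bounded queue: drops the oldest buffered entry on overflow.
class BoundedQueue<T> {
  private items: T[] = []
  private resolvers: ((value: T) => void)[] = []
  private readonly limit: number

  constructor(limit: number = 1000) {
    this.limit = limit
  }

  push(item: T): void {
    const resolve = this.resolvers.shift()
    if (resolve) {
      resolve(item) // a consumer is waiting: bypass the buffer entirely
      return
    }
    if (this.items.length >= this.limit) this.items.shift() // drop oldest
    this.items.push(item)
  }

  next(): Promise<T> {
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T)
    return new Promise((resolve) => this.resolvers.push(resolve))
  }

  get size(): number {
    return this.items.length
  }
}

// Zombie scenario: nobody is consuming, yet the buffer never exceeds the limit.
const backlog = new BoundedQueue<number>(3)
for (let i = 1; i <= 5; i++) backlog.push(i)
```

After the five pushes the buffer holds 3, 4, and 5: entries 1 and 2 were dropped, and FIFO order is preserved for what remains.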

GlobalBus.setMaxListeners(100) (bus/global.ts)

Each SSE connection registers one listener on GlobalBus and removes it on
disconnect. With the fixes above the count stays low in practice, but the
default limit of 10 still triggers MaxListenersExceededWarning during any
transient burst of reconnections.
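
The listener accounting can be illustrated with a plain Node EventEmitter. This assumes GlobalBus behaves like an EventEmitter, which the MaxListenersExceededWarning suggests but the PR does not state; bus and the "event" name are hypothetical stand-ins.

```typescript
import { EventEmitter } from "node:events"

const bus = new EventEmitter() // hypothetical stand-in for GlobalBus
const defaultLimit = bus.getMaxListeners() // Node's default threshold

bus.setMaxListeners(100) // raise the warning threshold, as the PR does

// Simulate a burst of 20 concurrent SSE connections, one listener each.
const handlers = Array.from({ length: 20 }, () => () => {})
for (const handler of handlers) bus.on("event", handler)
const duringBurst = bus.listenerCount("event")

// Each connection removes its listener on disconnect.
for (const handler of handlers) bus.off("event", handler)
const afterBurst = bus.listenerCount("event")
```

Note that the limit only controls when Node prints a warning; it does not cap how many listeners can be registered, so it is not a substitute for the cleanup fixes.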

The existing done flag in stop() ensures idempotency: if onAbort and
c.req.raw.signal both fire, cleanup runs exactly once.
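
A done-flag guard of this kind might look like the following sketch. createStop and the three resource callbacks are hypothetical names; they stand in for clearInterval(heartbeat), unsub(), and q.push(null) from the handler.

```typescript
// Hypothetical factory: returns a stop() that releases resources at most once.
function createStop(resources: {
  clearTimer: () => void
  unsubscribe: () => void
  wake: () => void
}): () => void {
  let done = false
  return () => {
    if (done) return // a second abort path fired: nothing left to do
    done = true
    resources.clearTimer()
    resources.unsubscribe()
    resources.wake()
  }
}

const released: string[] = []
const stopOnce = createStop({
  clearTimer: () => released.push("timer"),
  unsubscribe: () => released.push("subscription"),
  wake: () => released.push("queue"),
})

stopOnce() // e.g. triggered by stream.onAbort
stopOnce() // e.g. triggered by the request signal shortly after
```

Each resource is released once even though stop is invoked twice, so registering the same callback on several abort paths is safe.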

Testing

Full test suite: 1890 pass, 1 fail — the failing test
(tool.write > file permissions) is pre-existing on upstream/dev,
confirmed before any changes.

Targeted tests for the changed logic:

  • test-queue.ts — 13 assertions: capacity limit, FIFO ordering, resolver bypass, null termination, zombie scenario
  • test-sse-logic.ts — 6 assertions: each code path including confirmation of the upstream bug

Raw output: test-output.txt

Note: a direct before/after RSS comparison on Linux local connections is not
meaningful — onAbort fires correctly there and CLOSE_WAIT does not occur.
The logic tests directly verify the broken code paths.

Prior art

Approach for c.req.raw.signal inspired by #18395 (@abrekhov).
writeSSE try/catch approach from #15646 (@brendandebeasi).
setMaxListeners fix addresses the same symptom as #21873 (@saurav-shakya).
All three were closed without merging; this is an independent reimplementation.

SSE connections stuck in TCP CLOSE_WAIT never triggered stream.onAbort(),
leaving AsyncQueue, heartbeat intervals, and Bus subscriptions running
forever — causing unbounded memory growth (~14 MB/sec, peak 24.5 GB).

Four changes across four files:

- util/queue.ts: add capacity limit (default 1000), drop oldest entry on
  overflow — defensive backstop so growth is bounded even if cleanup is missed
- server/instance/event.ts: listen on c.req.raw.signal as a second abort
  path; wrap writeSSE in try/catch and call stop() on write failure
- server/instance/global.ts: same two changes applied to streamEvents()
  which backs /global/event and /global/sync-event
- bus/global.ts: raise GlobalBus.setMaxListeners to 100 — each SSE
  connection adds one listener; the default limit of 10 causes
  MaxListenersExceededWarning under normal concurrent usage

Fixes anomalyco#22198, fixes anomalyco#22422

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions bot added the needs:compliance label (this means the issue will auto-close after 2 hours) on Apr 15, 2026
@github-actions
Contributor

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@zhumengzhu zhumengzhu closed this Apr 15, 2026

Development

Successfully merging this pull request may close these issues.

Memory leak: SSE connections stuck in CLOSE_WAIT cause unbounded AsyncQueue growth (~14 MB/sec) Memory Leak Warning
