fix(sse): prevent memory leak from zombie SSE connections#22551
Closed
zhumengzhu wants to merge 1 commit intoanomalyco:devfrom
Closed
fix(sse): prevent memory leak from zombie SSE connections#22551zhumengzhu wants to merge 1 commit intoanomalyco:devfrom
zhumengzhu wants to merge 1 commit intoanomalyco:devfrom
Conversation
SSE connections stuck in TCP CLOSE_WAIT never triggered stream.onAbort(), leaving AsyncQueue, heartbeat intervals, and Bus subscriptions running forever — causing unbounded memory growth (~14 MB/sec, peak 24.5 GB). Four changes across four files: - util/queue.ts: add capacity limit (default 1000), drop oldest entry on overflow — defensive backstop so growth is bounded even if cleanup is missed - server/instance/event.ts: listen on c.req.raw.signal as a second abort path; wrap writeSSE in try/catch and call stop() on write failure - server/instance/global.ts: same two changes applied to streamEvents() which backs /global/event and /global/sync-event - bus/global.ts: raise GlobalBus.setMaxListeners to 100 — each SSE connection adds one listener; the default limit of 10 causes MaxListenersExceededWarning under normal concurrent usage Fixes anomalyco#22198, fixes anomalyco#22422 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
|
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #22198, fixes #22422.
Problem
Every SSE endpoint (
/event,/global/event,/global/sync-event) reliessolely on
stream.onAbort(stop)to release resources on disconnect:In TCP CLOSE_WAIT — the Electron shell reconnecting on minimize/restore, or a
reverse proxy that does not propagate the FIN — Bun/Hono never fires
stream.onAbort. All three resources leak for the lifetime of the process.The
finally { stop() }block does not help:for awaitis suspended onq.next(), which returns a Promise that never resolves becausestop()isthe only thing that pushes
nullinto the queue.With 66–74 zombie connections observed in the wild, RSS grew at ~14 MB/sec
and peaked at 24.5 GB before OOM.
Full root cause analysis with sequence and flow diagrams:
https://gist.github.com/zhumengzhu/1e3392390b34a913d9dd59a61df87f9a
Fix
Four changes across four files:
c.req.raw.signalas a second abort path (event.ts,global.ts)Request.signalis aborted by Bun when the TCP connection closes, includingCLOSE_WAIT cases where
stream.onAbortdoes not fire:writeSSEtry/catch (event.ts,global.ts)If neither abort path fires, the next write to a dead socket throws. Cleanup
happens at most one heartbeat interval (10 s) after the connection dies:
AsyncQueuecapacity limit (util/queue.ts)Added an optional
limitparameter (default1000). When the queue is fulland no resolver is waiting, the oldest entry is dropped. Defensive backstop:
memory growth is bounded even if both abort paths are somehow missed.
GlobalBus.setMaxListeners(100)(bus/global.ts)Each SSE connection registers one listener on
GlobalBusand removes it ondisconnect. With the fixes above the count stays low in practice, but the
default limit of 10 still triggers
MaxListenersExceededWarningduring anytransient burst of reconnections.
The existing
doneflag instop()ensures idempotency — ifonAbortandreq.raw.signalboth fire, cleanup runs exactly once.Testing
Full test suite: 1890 pass, 1 fail — the failing test
(
tool.write > file permissions) is pre-existing onupstream/dev,confirmed before any changes.
Targeted tests for the changed logic:
test-queue.ts— 13 assertions: capacity limit, FIFO ordering, resolver bypass, null termination, zombie scenariotest-sse-logic.ts— 6 assertions: each code path including confirmation of the upstream bugRaw output:
test-output.txtNote: a direct before/after RSS comparison on Linux local connections is not
meaningful —
onAbortfires correctly there and CLOSE_WAIT does not occur.The logic tests directly verify the broken code paths.
Prior art
Approach for
c.req.raw.signalinspired by #18395 (@abrekhov).writeSSEtry/catch approach from #15646 (@brendandebeasi).setMaxListenersfix addresses the same symptom as #21873 (@saurav-shakya).All three were closed without merging; this is an independent reimplementation.