fix: don't leak exec I/O threads when waiting on the change fails#2558
Open
tonyandrewmeyer wants to merge 2 commits into
Open
fix: don't leak exec I/O threads when waiting on the change fails#2558tonyandrewmeyer wants to merge 2 commits into
tonyandrewmeyer wants to merge 2 commits into
Conversation
…ails If the change for an exec didn't finish before the client-side timeout (for example, because an orphaned grandchild of the exec'd command kept the stdio pipe open past the exec timeout), or waiting on it failed in any other way, ExecProcess._wait() propagated the exception without any teardown. The I/O pump threads were left blocked in recv() on the still open websockets, and being non-daemon threads they then blocked interpreter shutdown indefinitely (wedging the Juju hook). Tear the exec down when the wait fails: - Shut the websocket connections down at the OS level (SHUT_RDWR), not just close the file descriptors: closing a socket doesn't wake other threads blocked in recv() on it, so a plain shutdown() (or joining the threads first, as the success path does) would hang. - Treat a connection closed underneath an I/O pump thread as end-of-stream rather than letting the exception escape, which printed noisy WebSocketConnectionClosedException tracebacks from threading.excepthook. - Create the I/O threads as daemon threads, so that any teardown path that's still missed (including never calling wait()) can't block interpreter shutdown. Also close any already-connected websockets when connecting the exec websockets fails partway through, rather than leaking them. Fixes canonical#2556 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…#21) The error path added in _teardown_after_error() nulls out _cancel_stdin and _cancel_reader after using them, but the success path in _wait() did not. Because both wait() and wait_output() call _wait(), calling them in turn (or wait() twice) would re-run _cancel_stdin() and os.close() the already-closed cancel_reader fd, raising OSError on the second call. Clear both on the success path too, matching _teardown_after_error(), and add a regression test. Also note in _force_close_websocket() that it depends on websocket-client's internal WebSocket.sock attribute. Co-authored-by: Claude <noreply@anthropic.com>
hpidcock
approved these changes
Jun 12, 2026
hpidcock
left a comment
Member
There was a problem hiding this comment.
Seems logical, not a python expert so had to read up on threading/daemon threads.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
If the change for an exec doesn't finish before the client-side timeout, or waiting on it fails in any other way,
ExecProcess._wait()propagates the exception without any teardown. The I/O pump threads are left blocked inrecv()on the still open websockets, and being non-daemon threads they then block interpreter shutdown indefinitely, wedging the Juju hook. See #2556 for the reproduction details (note that the leak needs the change to outlive the client-side wait, for example an orphan holding the exec's stdio pipe open; a plain exec timeout completes the change server-side and already cleans up correctly).This tears the exec down when the wait fails:
SHUT_RDWR) rather than only closing the file descriptors. Closing a socket doesn't wake other threads blocked inrecv()on it, so thefinallyapproach suggested in the issue (or joining the threads first, as the success path does) would hang the hook immediately instead of at exit.WebSocketExceptionorOSError) as end-of-stream rather than letting the exception escape the thread, which printed noisyWebSocketConnectionClosedExceptiontracebacks viathreading.excepthook.wait()at all) can't block interpreter shutdown.The behaviour of the success path is unchanged: the threads are still joined before the websockets are shut down, so output is never truncated.
Verified against a real Pebble (v1.31.0) on Python 3.10 through 3.14: before the fix the reproduction from #2556 leaks two
copyfileobjthreads and hangs at exit; with the fix it raisesops.pebble.TimeoutErroras before, leaves no threads behind, and exits cleanly.Fixes #2556