fix: add retry status timeout to prevent infinite hang #2263
Open
cloudwaddie-agent wants to merge 2 commits into code-yeongyu:dev from
When runtime fallback triggers on errors such as 503, the session status becomes 'retry'. The poll-for-completion loop treats 'retry' the same as 'busy', so it waits indefinitely with no timeout.
This fix adds a 3-minute timeout for the retry status (based on max_fallback_attempts * cooldown_seconds = 3 * 60s). If the session stays in retry status longer than the timeout, the poll loop exits with an error instead of hanging forever (the hang previously surfaced as exit code 143).
Fixes: Session hangs with exit code 143 when 503 errors trigger runtime fallback retry loops
This was referenced Mar 3, 2026
1 issue found across 1 file
Confidence score: 2/5
- There is a high-confidence, high-severity logic issue in `src/cli/run/poll-for-completion.ts`: `retryStatusStartTimestamp` is not reset on `busy`, which can incorrectly accumulate elapsed time across unrelated retries.
- This can falsely trigger the 3-minute timeout and prematurely fail completion polling, so there is clear user-impacting regression risk if merged as-is.
- Given the concrete behavior impact and strong confidence (8/10 severity, 10/10 confidence), this is better treated as high merge risk until fixed.
- Pay close attention to `src/cli/run/poll-for-completion.ts`: timeout tracking needs to be scoped to the current retry window, not carried across transient status changes.
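One way to scope the timer to the current retry window is sketched below. This is not the PR's code: only `retryStatusStartTimestamp` and the `busy`/`retry` statuses come from the discussion above, while the `RetryWindow` shape, the `retryWindowExpired` helper, and the `idle` status are assumptions made for illustration.

```typescript
// Hypothetical status set; "idle" is an assumption, "busy" and "retry" are from the PR.
type SessionStatus = "idle" | "busy" | "retry";

// Hypothetical holder for the timestamp named in the review.
interface RetryWindow {
  // Start of the current uninterrupted run of "retry" polls, or null if none.
  retryStatusStartTimestamp: number | null;
}

// Returns true once the current retry window has lasted at least timeoutMs.
function retryWindowExpired(
  state: RetryWindow,
  status: SessionStatus,
  now: number,
  timeoutMs: number,
): boolean {
  if (status !== "retry") {
    // Any non-retry status (including "busy") closes the window,
    // so time from an earlier retry is never carried forward.
    state.retryStatusStartTimestamp = null;
    return false;
  }
  if (state.retryStatusStartTimestamp === null) {
    // First "retry" poll of a new window: start the clock here.
    state.retryStatusStartTimestamp = now;
  }
  return now - state.retryStatusStartTimestamp >= timeoutMs;
}
```

With this shape, two short retries separated by a long `busy` period are measured independently, which is the behavior the review asks for.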
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/cli/run/poll-for-completion.ts">
<violation number="1" location="src/cli/run/poll-for-completion.ts:115">
P1: Custom agent: **Opencode Compatibility**
The `retryStatusStartTimestamp` is not reset when the status changes to `busy`, causing accumulated time between unrelated transient errors to falsely trigger the 3-minute timeout and prematurely terminate long-running sessions.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Summary
When runtime fallback triggers on errors like 503, the session status becomes "retry". The poll-for-completion loop treats "retry" the same as "busy", causing it to wait indefinitely without any timeout.
This fix adds a 3-minute timeout for the retry status (based on max_fallback_attempts * cooldown_seconds = 3 * 60s). If the session stays in retry status longer than the timeout, the poll loop exits with an error instead of hanging forever (the hang previously surfaced as exit code 143).
Changes
- `DEFAULT_RETRY_STATUS_TIMEOUT_MS` constant (180000 ms = 3 minutes)
- `retryStatusTimeoutMs` option on the `PollOptions` interface
Root Cause
The issue was reported in CloudWaddie/actions-agent#82 where the agent hangs with exit code 143 when 503 errors trigger runtime fallback retry loops. The session gets stuck in "retry" status and the poll loop keeps waiting forever.
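As a rough illustration of the behavior described above, the following sketch shows a poll loop that gives up once the session has been in "retry" for longer than the configured timeout. It is not the code in `src/cli/run/poll-for-completion.ts`: `DEFAULT_RETRY_STATUS_TIMEOUT_MS`, `retryStatusTimeoutMs`, and `PollOptions` are named in this PR, but the `getStatus` callback, the injectable `now` clock, the `pollIntervalMs` option, the "idle" completion status, and the returned outcome strings are all assumptions.

```typescript
// Values quoted in the PR description: 3 attempts * 60 s cooldown.
const MAX_FALLBACK_ATTEMPTS = 3;
const COOLDOWN_SECONDS = 60;
const DEFAULT_RETRY_STATUS_TIMEOUT_MS =
  MAX_FALLBACK_ATTEMPTS * COOLDOWN_SECONDS * 1000; // 180000 ms = 3 minutes

interface PollOptions {
  retryStatusTimeoutMs?: number; // option named in the PR
  pollIntervalMs?: number; // hypothetical, for this sketch only
}

// "idle" as the completed state is an assumption.
type Status = "idle" | "busy" | "retry";
type PollOutcome = "completed" | "retry-timeout";

async function pollForCompletion(
  getStatus: () => Promise<Status>, // hypothetical status source
  options: PollOptions = {},
  now: () => number = Date.now, // injectable clock, for testing
): Promise<PollOutcome> {
  const timeoutMs = options.retryStatusTimeoutMs ?? DEFAULT_RETRY_STATUS_TIMEOUT_MS;
  const intervalMs = options.pollIntervalMs ?? 500;
  let retryStart: number | null = null;

  for (;;) {
    const status = await getStatus();
    if (status === "idle") return "completed";

    if (status === "retry") {
      const current = now();
      if (retryStart === null) retryStart = current;
      // Give up instead of waiting forever on a stuck retry.
      if (current - retryStart >= timeoutMs) return "retry-timeout";
    } else {
      // "busy": real work is happening, so close any open retry window.
      retryStart = null;
    }

    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would map `"retry-timeout"` to an error exit. The sketch also resets the timer on "busy", so separate retry episodes are not added together.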
Testing
The fix should be tested by:
Fixes: CloudWaddie/actions-agent#82
Summary by cubic
Prevent infinite hangs when runtime fallback keeps retrying by adding a 3-minute timeout for sessions in "retry". The poll loop now exits with a clear, actionable error instead of waiting forever.
Written for commit e1ce286. Summary will update on new commits.