agent+acp: preserve session on retriable agent errors #544

Merged: tlongwell-block merged 1 commit into main from acp-retry-on-retriable-error on May 12, 2026

@tlongwell-block tlongwell-block commented May 12, 2026

Don't invalidate the session when the agent returns an application error (AgentError). The agent caught the problem before the session became unusable — the existing requeue mechanism retries on the same session with full context intact.

What

One change in pool.rs: skip agent.state.invalidate(&source) when the error is AcpError::AgentError.

Why

Transient LLM glitches (e.g. stop=tool_use with zero tool calls) cause sprout-agent to return a JSON-RPC application error. The session is still healthy — MCP servers are running, history is valid, the stdio pipe is intact. Previously the harness invalidated the session on any prompt error, forcing a fresh session/new on retry: new MCP server processes, lost history, lost todo state.

Now the harness distinguishes retriable application errors from session-corrupting transport/protocol errors. The session stays alive and the batch retries on the same session via the existing requeue + backoff mechanism.
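
For readers without the diff open, the distinction described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual code in pool.rs: only `AcpError::AgentError` and the `invalidate` call are named in this PR, while the `Transport` and `Protocol` variants, the `SessionState` struct, the string-keyed session map, and the `handle_prompt_error` helper are assumptions made so the example compiles on its own.

```rust
use std::collections::HashMap;

/// Stand-in for the harness error type. Only `AgentError` is named in the PR;
/// the other two variants are hypothetical placeholders.
#[derive(Debug)]
enum AcpError {
    /// JSON-RPC application error returned by the agent; the session is still usable.
    AgentError(String),
    /// Hypothetical: transport failure such as a broken stdio pipe.
    Transport(String),
    /// Hypothetical: protocol violation that leaves session state in doubt.
    Protocol(String),
}

/// Hypothetical session registry: maps a source key to a live session id.
#[derive(Default)]
struct SessionState {
    sessions: HashMap<String, u64>,
}

impl SessionState {
    /// Drop the session for `source`, so the next attempt must create a new one.
    fn invalidate(&mut self, source: &str) {
        self.sessions.remove(source);
    }
}

/// Hypothetical helper: decide what to do with the session after a failed prompt.
/// Returns `true` if the session was preserved for the retry.
fn handle_prompt_error(state: &mut SessionState, source: &str, err: &AcpError) -> bool {
    if matches!(err, AcpError::AgentError(_)) {
        // Application error: keep the session so the requeued batch
        // retries with history and MCP servers intact.
        return true;
    }
    // Transport or protocol error: the session may be corrupted, so drop it.
    state.invalidate(source);
    false
}
```

Under these assumptions, an `AgentError` leaves the entry for the source in place, while any other variant removes it and the retry starts from a fresh session.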

@tlongwell-block tlongwell-block force-pushed the acp-retry-on-retriable-error branch from cd48954 to 36f2996 Compare May 12, 2026 15:34
@tlongwell-block tlongwell-block changed the title acp: retry transient agent errors on the same session agent+acp: preserve session on retriable agent errors May 12, 2026
Don't invalidate the session when the agent returns AgentError. The
agent caught the problem (e.g. LLM returned stop=tool_use with zero
tool_calls) and the session is still usable. The existing requeue
mechanism retries on the same session with full context intact.

Previously, any prompt error invalidated the session, forcing a fresh
session/new on retry — new MCP servers, lost history, lost todo state.
Now AgentError is distinguished from transport/protocol errors that
genuinely corrupt session state.
@tlongwell-block tlongwell-block force-pushed the acp-retry-on-retriable-error branch from 36f2996 to 248338c Compare May 12, 2026 15:51
@tlongwell-block tlongwell-block merged commit 385e171 into main May 12, 2026
15 checks passed
@tlongwell-block tlongwell-block deleted the acp-retry-on-retriable-error branch May 12, 2026 17:57