fix(sprout-agent): cap tool-result text at 50 KiB with middle elision #952

Open: tlongwell-block wants to merge 1 commit into main from eva/tool-result-text-cap

@tlongwell-block (Collaborator)

Problem

A single MCP tool result can carry up to 8 MiB of text into agent history (MAX_TOOL_RESULT_BYTES, raised from 256 KiB in #602 to let view_image return real images). Tool results are re-sent on every request, so a couple of fat cats blow past the context budget, force a handoff, and the agent keeps only a lossy ~8k-token summary — it feels reset mid-task. Reported by Wes in #sprout-bugs; the investigation itself got handed off four times mid-research, which is all the repro anyone needs.

Fix

Split the budget at the existing chokepoint, tool_result_content() in mcp.rs — the single path every MCP tool result takes before entering history:

  • Text capped at 50 KiB per result (new SPROUT_AGENT_MAX_TOOL_RESULT_TEXT_BYTES, validated 1 KiB..=8 MiB). 50 KiB matches the shell caps in sprout-dev-mcp, goose, and pi; codex's 10 KB is the aggressive end.
  • Images keep the 8 MiB total budget that #602 (dev-mcp: add view_image tool) needed: they're genuinely large and already accounted for via estimated_bytes.
  • Middle elision, not head-only: head and tail survive, with an inline marker reporting bytes elided. Build logs put the verdict at the end; head-only truncation is how an agent confidently retries a command whose failure reason was cut off. (Codex's strategy; pi/goose tail-keep makes the same point.)
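
To make the text cap and middle elision concrete, here is a minimal Rust sketch. It is not the code in this PR: the names `parse_text_cap` and `elide_middle`, the even head/tail split, the marker wording, and the assumption that the setting is given in raw bytes are all hypothetical. Only the 50 KiB default, the 1 KiB..=8 MiB range, and the "keep head and tail, report bytes elided" behavior come from the description above.

```rust
// Hypothetical constants mirroring the numbers in the PR description.
const DEFAULT_TEXT_CAP: usize = 50 * 1024; // 50 KiB default
const MIN_TEXT_CAP: usize = 1024; // 1 KiB lower bound
const MAX_TEXT_CAP: usize = 8 * 1024 * 1024; // 8 MiB upper bound

/// Validate a raw setting (assumed to be a byte count) against the allowed range.
/// `None` models an unset variable and falls back to the default.
fn parse_text_cap(raw: Option<&str>) -> Result<usize, String> {
    let raw = match raw {
        Some(r) => r,
        None => return Ok(DEFAULT_TEXT_CAP),
    };
    let n: usize = raw
        .trim()
        .parse()
        .map_err(|e| format!("invalid text cap {:?}: {}", raw, e))?;
    if (MIN_TEXT_CAP..=MAX_TEXT_CAP).contains(&n) {
        Ok(n)
    } else {
        Err(format!("text cap {} outside 1 KiB..=8 MiB", n))
    }
}

/// Keep roughly `max_bytes / 2` bytes from each end of `text` and replace the
/// middle with a marker. In this sketch the marker itself is NOT counted
/// against `max_bytes`; a real implementation may budget for it.
fn elide_middle(text: &str, max_bytes: usize) -> String {
    if text.len() <= max_bytes {
        return text.to_string();
    }
    let half = max_bytes / 2;
    // Move the head cut backwards until it sits on a UTF-8 char boundary.
    let mut head_end = half;
    while !text.is_char_boundary(head_end) {
        head_end -= 1;
    }
    // Move the tail cut forwards until it sits on a UTF-8 char boundary.
    let mut tail_start = text.len() - half;
    while !text.is_char_boundary(tail_start) {
        tail_start += 1;
    }
    let elided = tail_start - head_end;
    format!(
        "{}\n[... {} bytes elided ...]\n{}",
        &text[..head_end],
        elided,
        &text[tail_start..]
    )
}
```

Nudging both cut points to char boundaries means the function never slices through a multi-byte character, which is the failure mode a UTF-8 boundary test would target. Because the tail is preserved, the last lines of a build log survive even when the middle is dropped.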

Why here

  • Not sprout-dev-mcp: its tools already self-limit (shell = 50 KiB + artifact spill). Fixing only the tools leaves every other MCP server able to flood history — goose has exactly this gap. Defense belongs in the harness.
  • Not request-assembly or compaction-time: source-time truncation is deterministic and prompt-cache-friendly — what the model saw once is what it sees forever. Codex truncates at history-record time; results enter our history through this one path, so source-time alone covers it.

Outcome

Worst-case turn (64 tool calls) drops from "up to 512 MiB of history" to ≤3,200 KiB (about 3.1 MiB) of text. Context fills from work, not from one fat cat. Handoffs become rare instead of routine.
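
The worst-case figures follow from multiplying the per-result cap by the number of tool calls in a turn. A small sketch of that arithmetic, with `worst_case_bytes` as a hypothetical helper and any marker overhead ignored:

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Upper bound on tool-result bytes added to history in one turn,
/// assuming every call returns a result at the per-result cap.
fn worst_case_bytes(calls: u64, per_result_cap: u64) -> u64 {
    calls * per_result_cap
}
```

With 64 calls, an 8 MiB cap gives `worst_case_bytes(64, 8 * MIB)`, which is 512 MiB, while a 50 KiB cap gives `worst_case_bytes(64, 50 * KIB)`, which is 3,276,800 bytes (3,200 KiB).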

Verification

  • cargo test -p sprout-agent — 110 passed, 0 failed (full suite, incl. 5 new tests: middle-elision, budget-untouched, image-exempt, UTF-8 boundary stress)
  • cargo clippy -p sprout-agent --all-targets clean, cargo fmt --check clean
  • Merges clean against origin/main

A single MCP tool result could carry 8 MiB of text into history
(MAX_TOOL_RESULT_BYTES, raised from 256 KiB in #602 for view_image).
A couple of fat results blow the context budget, force a handoff, and
the agent loses its thread mid-task.

Split the budget at the existing chokepoint, tool_result_content():

- text is capped at 50 KiB per result (new
  SPROUT_AGENT_MAX_TOOL_RESULT_TEXT_BYTES, validated 1 KiB..=8 MiB),
  matching the shell caps in sprout-dev-mcp, goose, and pi
- images keep the 8 MiB total budget #602 needed
- oversized text is middle-elided: head and tail survive (build logs
  end with the verdict), with an inline marker reporting bytes elided
  so the model knows to re-run a narrower command

Truncating at the source (not request assembly) keeps history
deterministic and prompt-cache-friendly; codex, pi, and goose all
converge on this shape.

Co-authored-by: npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
@tlongwell-block requested a review from a team as a code owner on June 10, 2026, 18:57