relay: round-trip tool metadata + surface upstream non-2xx as error (PR #323 review fix)#325
Conversation
Two PR #323 review fixes. P1: Tool-call metadata MUST round-trip through the envelope. A multi- turn relayed session that fires a tool call had assistant messages with ToolCalls and tool-result messages with ToolCallId/Name; both got dropped because BuildRelayRequestBody only emitted role+content, and the worker's DecodeMessages only read those two. The second turn then sent OpenAI/Anthropic/Gemini a request with no tool_call_id -> "missing tool_call_id" 400 / silent stall. Encoder now emits the OpenAI-shape fields (tool_calls array, tool_call_id, name) when non-empty; worker decoder reads them back. Same wire fields workers already see for response tool calls (HandleRelayRespond), so any worker that already forwards results forward this too. P2: Workers must encode any non-2xx upstream provider status as a relay error, not as a successful reply. Pre-fix, BuildResponseJSON only treated StatusCode=-1 (socket/TLS failure) as an error. A 429 or 5xx from upstream OpenAI/etc. fell into the else branch with the error JSON body as content, and the gateway's DecodeResponse mapped that to StatusCode := 200 -- bypassing Cfg.Fallbacks entirely and surfacing the provider error text to the agent as the assistant's reply. New ResponseIsError helper treats StatusCode -1 and anything outside [200..299] as error (StatusCode = 0 from older providers still treats as success to preserve compat). Tests: extended src/tests/relay_queue_tests.pas with - tool-call metadata round-trips through the envelope - worker encodes non-2xx upstream status as a relay error covering 429 / 500 / -1 / 200 / 0 paths. Same suite, runs via `make test-relay-queue`. All 20 cases green. BuildRelayWorkerResponseJSON is now exported from PasClaw.Cmd.Relay solely so the test program can drive the encoder directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00d9ba60c8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| TCObj.PutStr('id', Messages[i].ToolCalls[j].Id); | ||
| TCObj.PutStr('type', Messages[i].ToolCalls[j].Kind); | ||
| FnObj := TJsonObject.Create; | ||
| FnObj.PutStr('name', Messages[i].ToolCalls[j].Func.Name); | ||
| FnObj.PutStr('arguments', Messages[i].ToolCalls[j].Func.Arguments); |
There was a problem hiding this comment.
Preserve provider_signature with relayed tool calls
When the relay worker is backed by Gemini 3+ and returns a tool call, the TToolCall.ProviderSignature must be echoed on the next request so PasClaw.Providers.Gemini can emit thoughtSignature (see src/pkg/providers/PasClaw.Providers.Gemini.pas:531-539). This new request envelope serializes only id/type/function fields, and the matching decoder only restores those fields, so relayed Gemini tool-call continuations still drop the signature and the next Gemini request fails with the documented missing thought_signature 400.
Useful? React with 👍 / 👎.
Summary
Two review fixes against PR #323. Stacked on
claude/cmd-relay-workerso the diff shows only the delta; auto-rebases to main once #323 merges.P1 — tool-call metadata must round-trip
A multi-turn relayed session that fires a tool call had its assistant
ToolCallsand tool-resultToolCallId/Namesilently dropped — both ends of the wire were stripping them. The next turn forwarded a request with notool_call_id, and OpenAI 400'd with "missing tool_call_id" (Anthropic / Gemini equivalent on their side). Fixed at both ends:BuildRelayRequestBody(gateway side,PasClaw.Providers.Relay.pas): emitstool_callsarray (id / type / function.name / function.arguments), top-leveltool_call_id, and top-levelnamewhen non-empty.DecodeMessages(worker side,PasClaw.Cmd.Relay.pas): reads them back into the correspondingTMessagefields, with a newDecodeToolCallshelper that mirrors the existing response-side decoder inHandleRelayRespond.P2 — surface upstream non-2xx as a relay error
BuildResponseJSONonly encodederrorwhenStatusCode = -1(pre-HTTP socket failure). A 429 / 5xx from upstream OpenAI/Anthropic/etc. fell into the success branch, the gateway saw no error field, mapped toStatusCode := 200, and the agent saw the provider's error JSON as the assistant's reply — bypassingCfg.Fallbacksentirely. NewResponseIsErrorhelper treats-1and any code outside [200..299] as an error;StatusCode = 0(older providers that don't populate the code) still treats as success for back-compat. RenamedBuildResponseJSON→BuildRelayWorkerResponseJSONand exposed it in the interface so the new test can drive it.Test plan
Both fixes covered by new cases in
src/tests/relay_queue_tests.pas:tool-call metadata round-trips through the envelope— assistant withToolCalls, tool result withToolCallId+Name; assertstool_callsarray,id, functionname+arguments,tool_call_id,role:"tool"all present.worker encodes non-2xx upstream status as a relay error— coversStatusCode = 429,500,-1(all →errorkey with descriptive prefix), and200/0(both → noerrorkey, success path).make test-relay-queue— 20/20 cases green.make allclean.🤖 Generated with Claude Code
https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
Generated by Claude Code