tools: parallel dispatch of read-only tool calls in RunToolLoop #98
Merged
When the model returns multiple tool_use blocks in one turn,
RunToolLoop now runs the read-only ones concurrently on worker
threads instead of one-at-a-time. Mutating tools (fs_write,
shell_exec, fs_edit_hashline) still serialise: each lands in its
own batch of one, so race-prone tools never run concurrently.
Picoclaw is fully serial (pkg/agent/pipeline_execute.go, plain
for-range loop); nanobot offers parallel dispatch as an opt-in flag
on AgentRunSpec (nanobot/agent/runner.py, batched asyncio.gather).
This brings PasClaw to nanobot parity, with the same "categorise
tools first, batch by category" insight.
Implementation:
PasClaw.Tools.Types
+ TToolCategory = (tcMutating, tcReadOnly)
+ TTool.Category field
Zero-init defaults to tcMutating so a forgotten Category on
any record-built tool fails closed — never accidentally
parallel.
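The fail-closed default can be illustrated with a small sketch. This is Python, not the PasClaw Pascal source; ToolCategory, Tool and may_run_in_parallel are hypothetical names chosen to mirror TToolCategory and TTool, and a dataclass field default stands in for Pascal zero-initialisation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class ToolCategory(IntEnum):
    # MUTATING is deliberately the zero value, mirroring tcMutating being
    # the first member of the Pascal enumeration.
    MUTATING = 0
    READ_ONLY = 1


@dataclass
class Tool:
    name: str
    run: Callable[[dict], str]
    # A tool whose author forgets the category gets the safe, serial one.
    category: ToolCategory = ToolCategory.MUTATING


def may_run_in_parallel(tool: Tool) -> bool:
    """Only tools explicitly marked read-only are eligible for parallel batches."""
    return tool.category is ToolCategory.READ_ONLY


# Category omitted: falls back to MUTATING and stays serial.
forgotten = Tool(name="fs_write", run=lambda args: "ok")
# Category set explicitly: eligible for a parallel batch.
explicit = Tool(name="fs_read", run=lambda args: "data",
                category=ToolCategory.READ_ONLY)
```

The point of the design is that the unsafe state (parallel) requires a deliberate act, while the state reached by omission is the conservative one.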
PasClaw.Tools.Obj
+ TPasClawTool.Category: TToolCategory virtual
Default tcMutating; OOP tool authors override to tcReadOnly
when their Run does no shared-state mutation. Install
propagates Self.Category into the TTool record.
PasClaw.Tools (built-in OOP wrappers)
+ TWebSearchTool, TWebFetchTool, TMemoryTool override
Category → tcReadOnly.
TShellTool stays at base tcMutating; TFileSystemTool is a
bundle that delegates to RegisterFSTools so per-sub-tool
Category is set there.
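A rough sketch of the override-and-propagate pattern described for the OOP wrappers, again in Python rather than Pascal. BaseTool, WebFetchTool, ShellTool and the dict-based registry are illustrative stand-ins for TPasClawTool, TWebFetchTool, TShellTool and the TTool record; the run bodies are placeholders that touch neither the network nor a shell.

```python
from enum import IntEnum


class ToolCategory(IntEnum):
    MUTATING = 0
    READ_ONLY = 1


class BaseTool:
    """Hypothetical base class standing in for TPasClawTool."""
    name = "base"

    @property
    def category(self) -> ToolCategory:
        # Conservative default; subclasses must opt in to READ_ONLY.
        return ToolCategory.MUTATING

    def run(self, args: dict) -> str:
        raise NotImplementedError

    def install(self, registry: dict) -> None:
        # Copy the object's category into the flat registry record,
        # as Install is described as doing for the TTool record.
        registry[self.name] = {"run": self.run, "category": self.category}


class WebFetchTool(BaseTool):
    name = "web_fetch"

    @property
    def category(self) -> ToolCategory:
        return ToolCategory.READ_ONLY

    def run(self, args: dict) -> str:
        return f"fetched {args['url']}"  # placeholder; no network access


class ShellTool(BaseTool):
    name = "shell_exec"  # inherits the MUTATING default

    def run(self, args: dict) -> str:
        return f"ran {args['cmd']}"  # placeholder; nothing is executed


registry: dict = {}
for tool in (WebFetchTool(), ShellTool()):
    tool.install(registry)
```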
Built-in record-based tools (PasClaw.Tools.{WebSearch,
WebFetch,Memory,FS}) — explicitly set Category on each Register*
call. Read-only: web_search, web_fetch, memory_search, fs_read,
fs_grep, fs_list. Mutating: fs_write, fs_edit_hashline,
shell_exec.
PasClaw.Tools.ToolLoop
+ TToolLoopConfig.Parallel: Boolean (default False on the
record). Caller opts in.
+ DispatchOneToolCall function — extracted from the old inline
for-loop body, now the single source of truth for preflight
+ RunTool + fs_edit_hashline retry logic. Pure with respect
to shared state (per-call HTTP clients, thread-safe
LogWarn, read-only sandbox globals).
+ TToolCallWorker — TThread that runs DispatchOneToolCall on
one TToolCallDispatch slot. Suspended on construct, main
thread Starts + WaitFor + Free in array order.
+ PartitionToolBatches helper — walks Resp.ToolCalls, coalesces
consecutive tcReadOnly calls into one batch, emits a
batch-of-one for every tcMutating. Order preserved across
batches so tool_result history append order matches the
model's tool_use emit order (Anthropic pairs by id, but
OpenAI/Gemini pair positionally — both work).
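The batching rule described for PartitionToolBatches can be sketched as follows. This is a Python illustration under assumptions, not the Pascal helper: calls are reduced to bare tool names, partition_tool_batches and CATEGORIES are hypothetical, and the category table simply copies the read-only/mutating lists given above.

```python
from typing import Dict, List

READ_ONLY = "read_only"
MUTATING = "mutating"

# Categories as listed in the description; unknown names fall back to MUTATING.
CATEGORIES: Dict[str, str] = {
    "web_search": READ_ONLY, "web_fetch": READ_ONLY, "memory_search": READ_ONLY,
    "fs_read": READ_ONLY, "fs_grep": READ_ONLY, "fs_list": READ_ONLY,
    "fs_write": MUTATING, "fs_edit_hashline": MUTATING, "shell_exec": MUTATING,
}


def partition_tool_batches(calls: List[str]) -> List[List[str]]:
    """Coalesce consecutive read-only calls; give every mutating call its own batch."""
    batches: List[List[str]] = []
    run_open = False  # True while the last batch is a read-only run still accepting calls
    for name in calls:
        if CATEGORIES.get(name, MUTATING) == READ_ONLY:
            if run_open:
                batches[-1].append(name)
            else:
                batches.append([name])
                run_open = True
        else:
            batches.append([name])  # batch of one
            run_open = False        # a mutating call closes any read-only run
    return batches


example = partition_tool_batches(
    ["fs_read", "fs_grep", "fs_write", "fs_list", "web_fetch", "shell_exec"]
)
# example == [["fs_read", "fs_grep"], ["fs_write"], ["fs_list", "web_fetch"], ["shell_exec"]]
```

Flattening the batches always reproduces the input order, which is what keeps the tool_result history aligned with the model's tool_use order.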
Rewrote the dispatch block: for each batch, fire all OnToolCall
in array order on the main thread, then run the batch (parallel
if Cfg.Parallel and size > 1; serial otherwise), then fire all
OnToolResult and append tool_result messages in array order on
the main thread. Callbacks stay on the main thread (avoids
threading the embedder's Indy/TUI Form across).
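The three-phase shape of the rewritten dispatch block might look roughly like this in Python. It is a sketch under assumptions: run_batch and its parameters are hypothetical, a ThreadPoolExecutor is swapped in for the per-call TToolCallWorker threads, and dispatch_one stands in for DispatchOneToolCall.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Call = Tuple[str, dict]  # (tool name, arguments)


def run_batch(
    batch: List[Call],
    dispatch_one: Callable[[Call], str],
    on_tool_call: Callable[[Call], None],
    on_tool_result: Callable[[Call, str], None],
    history: List[dict],
    parallel: bool = False,
) -> None:
    # 1. Announce every call on the calling thread, in array order.
    for call in batch:
        on_tool_call(call)

    # 2. Execute. Workers only compute a result; they never touch callbacks or history.
    if parallel and len(batch) > 1:
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results = list(pool.map(dispatch_one, batch))  # map preserves input order
    else:
        results = [dispatch_one(call) for call in batch]

    # 3. Report results and append history on the calling thread, in array order.
    for call, result in zip(batch, results):
        on_tool_result(call, result)
        history.append({"role": "tool", "name": call[0], "content": result})


events: List[str] = []
history: List[dict] = []
callback_threads = set()


def fake_dispatch(call: Call) -> str:
    return f"{call[0]}:done"


def on_call(call: Call) -> None:
    callback_threads.add(threading.get_ident())
    events.append("call " + call[0])


def on_result(call: Call, result: str) -> None:
    callback_threads.add(threading.get_ident())
    events.append("result " + call[0])


run_batch([("fs_read", {}), ("fs_grep", {})], fake_dispatch,
          on_call, on_result, history, parallel=True)
```

Because phases 1 and 3 never leave the calling thread, UI or network code hooked into the callbacks does not need to be thread-safe; only dispatch_one does.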
Callers opted in to Parallel := True:
src/cmd/PasClaw.Cmd.Agent (CLI agent loop)
src/pkg/agent/PasClaw.Agent (TPasClawAgent.ChatHistory)
src/pkg/gateway/PasClaw.Gateway.Server (3 sites: gateway, chat/completions, responses endpoint)
src/pkg/tui/PasClaw.TUI (interactive TUI)
src/pkg/channels/PasClaw.Channels.{Telegram,Discord,LINE,WhatsApp,Matrix,IRC}
Six channel handlers all share the same one-line pattern.
Verification:
Wall-clock test (written, run, then removed; not committed): two
TPasClawTool subclasses, each Sleep(500ms) and tcReadOnly, registered
via Agent.RegisterTool. Stub TFakeProvider returns one round of two
tool_use blocks then loop-exit. RunToolLoop timed both ways:
serial (Parallel=False): 1000 ms
parallel (Parallel=True): 501 ms
wall-clock reduction: 49.9% (about a 2x speedup)
Matches the theoretical max — max(500,500) = 500ms parallel vs
500+500 = 1000ms serial.
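A comparable wall-clock check can be sketched in Python. This is not the removed Pascal test: sleepy_tool, time_serial, time_parallel and reduction are hypothetical helpers, and a thread pool replaces RunToolLoop. The reduction helper also shows how the 49.9% figure follows from the two measurements.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def sleepy_tool(delay_s: float) -> Callable[[], str]:
    """Build a stand-in read-only tool that only sleeps."""
    def run() -> str:
        time.sleep(delay_s)
        return "ok"
    return run


def time_serial(tools: List[Callable[[], str]]) -> float:
    start = time.perf_counter()
    for tool in tools:
        tool()
    return time.perf_counter() - start


def time_parallel(tools: List[Callable[[], str]]) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = [pool.submit(tool) for tool in tools]
        for future in futures:
            future.result()  # wait for every tool before stopping the clock
    return time.perf_counter() - start


def reduction(serial_ms: float, parallel_ms: float) -> float:
    """Fraction of wall-clock time saved by running in parallel."""
    return 1.0 - parallel_ms / serial_ms


if __name__ == "__main__":
    tools = [sleepy_tool(0.5), sleepy_tool(0.5)]
    print(f"serial:   {time_serial(tools) * 1000:.0f} ms")
    print(f"parallel: {time_parallel(tools) * 1000:.0f} ms")
    print(f"reduction for 1000 ms vs 501 ms: {reduction(1000, 501):.1%}")
```

With two equal sleeps the ideal parallel time is the longer of the two, so the reduction approaches but never exceeds 50%.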
make smoke 11/11 green. All three samples (Console / Simple /
Server) still build.
https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
This was referenced May 30, 2026
Summary
When the model returns multiple tool_use blocks in one turn, RunToolLoop now runs the read-only ones concurrently on worker threads instead of one-at-a-time. Mutating tools (fs_write, shell_exec, fs_edit_hashline) still serialise: each lands in its own batch of one, so race-prone tools never run concurrently.
Picoclaw is fully serial (pkg/agent/pipeline_execute.go, plain for-range loop). Nanobot offers parallel dispatch as an opt-in flag on AgentRunSpec (nanobot/agent/runner.py, batched asyncio.gather). This PR brings PasClaw to nanobot parity, using the same "categorise tools first, batch by category" insight.
Implementation
Tool category metadata
- PasClaw.Tools.Types.TToolCategory = (tcMutating, tcReadOnly): tcMutating is the first/zero-init value so a forgotten Category on any record-built tool fails closed (never accidentally parallel). TTool.Category field on the record.
- TPasClawTool.Category: TToolCategory virtual, default tcMutating. OOP tool authors override to tcReadOnly when their Run does no shared-state mutation. Install propagates Self.Category into the TTool record.
- PasClaw.Tools: TWebSearchTool, TWebFetchTool, TMemoryTool override → tcReadOnly. TShellTool stays tcMutating. TFileSystemTool is a bundle that delegates to RegisterFSTools (per-sub-tool category set there).
- Built-in record-based tools explicitly set Category on each Register* call:
  - Read-only: web_search, web_fetch, memory_search, fs_read, fs_grep, fs_list
  - Mutating: fs_write, fs_edit_hashline, shell_exec
Loop dispatch
- TToolLoopConfig.Parallel: Boolean (default False on the record; caller opts in).
- DispatchOneToolCall function extracted from the old inline for-loop body. Single source of truth for preflight + Registry.RunTool + fs_edit_hashline retry logic. Pure with respect to shared state (per-call HTTP clients, thread-safe LogWarn, read-only sandbox globals).
- TToolCallWorker: TThread runs DispatchOneToolCall on one TToolCallDispatch slot. Suspended on construct, main thread Starts + WaitFors + Frees in array order.
- PartitionToolBatches walks Resp.ToolCalls, coalesces consecutive tcReadOnly calls into one batch, emits a batch-of-one for every tcMutating. Order preserved across batches so tool_result history append order matches the model's tool_use emit order (Anthropic pairs by id, OpenAI/Gemini pair positionally; both work).
- Rewrote the dispatch block: for each batch, fire all OnToolCall in array order on the main thread, then run the batch (parallel if Cfg.Parallel and size > 1; serial otherwise), then fire all OnToolResult and append tool_result messages in array order on the main thread. Callbacks stay on the main thread, which avoids threading the embedder's Indy/TUI/Form across.
Opt-in by 12 callers
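- Cmd.Agent (CLI agent loop)
- TPasClawAgent.ChatHistory
- Gateway.Server (3 sites: gateway, chat/completions, responses endpoint)
- TUI.PasClaw.TUI (interactive TUI)
- Channels.{Telegram,Discord,LINE,WhatsApp,Matrix,IRC} (six channel handlers, all sharing the same one-line pattern)
Verification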
Wall-clock test (built + run + removed; not committed): two TPasClawTool subclasses each Sleep(500ms), tcReadOnly. Stub TFakeProvider returns one round of two tool_use blocks then loop-exit. RunToolLoop timed both ways:
serial (Parallel=False): 1000 ms
parallel (Parallel=True): 501 ms
wall-clock reduction: 49.9% (about a 2x speedup)
Matches the theoretical max: max(500, 500) = 500ms parallel vs 500 + 500 = 1000ms serial.
make smoke 11/11 green. All three samples (Console / Simple / Server) still build.
Future work
- tcMutating (registered without Category set → zero-init falls through to mutating). Correct for now: they share a single stdin/stdout pipe per server. Adding a request-mux lock would let them parallelize too.
- max_parallel_tools cap: not bounded today. If a runaway agent ever fans out 20+ concurrent web fetches and trips a resource limit, add a queue.
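One possible shape for the max_parallel_tools cap mentioned above, sketched in Python as an assumption rather than a planned design: a fixed-size thread pool acts as the queue, so at most max_parallel_tools calls run at once while results still come back in submission order. run_read_only_batch, make_fetch and the stats dict are hypothetical, and the fetches are simulated with short sleeps.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_read_only_batch(calls: List[Callable[[], str]],
                        max_parallel_tools: int) -> List[str]:
    """Run calls with bounded concurrency; results keep submission order."""
    workers = max(1, min(max_parallel_tools, len(calls)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call) for call in calls]  # excess calls wait in the pool's queue
        return [future.result() for future in futures]


# Instrumentation to observe how many simulated fetches overlap.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def make_fetch(index: int) -> Callable[[], str]:
    def fetch() -> str:
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.02)  # simulated network latency
        with stats_lock:
            stats["active"] -= 1
        return f"page-{index}"
    return fetch


results = run_read_only_batch([make_fetch(i) for i in range(20)],
                              max_parallel_tools=4)
```

Here twenty simulated fetches are submitted but no more than four are ever in flight, which is the behaviour a cap would need to guarantee.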
Generated by Claude Code