tools: parallel dispatch of read-only tool calls in RunToolLoop #98

Merged: FMXExpress merged 1 commit into main from claude/parallel-tools on May 30, 2026
@FMXExpress (Owner)

Summary

When the model returns multiple tool_use blocks in one turn, RunToolLoop now runs the read-only ones concurrently on worker threads instead of one at a time. Mutating tools (fs_write, shell_exec, fs_edit_hashline) still serialise: each lands in its own batch of one, so race-prone tools never run concurrently.

Picoclaw is fully serial (pkg/agent/pipeline_execute.go, plain for-range loop). Nanobot offers parallel dispatch as an opt-in flag on AgentRunSpec (nanobot/agent/runner.py, batched asyncio.gather). This PR brings PasClaw to nanobot parity, using the same "categorise tools first, batch by category" insight.

Implementation

Tool category metadata

  • PasClaw.Tools.Types.TToolCategory = (tcMutating, tcReadOnly). tcMutating is the first (zero-init) value, so a forgotten Category on any record-built tool fails closed (never accidentally parallel).
  • TTool.Category field on the record.
  • TPasClawTool.Category: TToolCategory virtual, default tcMutating. OOP tool authors override to tcReadOnly when their Run does no shared-state mutation. Install propagates Self.Category into the TTool record.
  • Built-in OOP wrappers in PasClaw.Tools: TWebSearchTool, TWebFetchTool, TMemoryTool override → tcReadOnly. TShellTool stays tcMutating. TFileSystemTool is a bundle that delegates to RegisterFSTools (per-sub-tool category set there).
  • Built-in record-based tools explicitly set Category on each Register* call:
    • Read-only: web_search, web_fetch, memory_search, fs_read, fs_grep, fs_list
    • Mutating: fs_write, fs_edit_hashline, shell_exec
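
The fail-closed default described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than the Object Pascal used in this PR, and the names `ToolCategory`, `Tool`, and `can_run_in_parallel` are hypothetical stand-ins, not PasClaw identifiers. The point it shows is that putting the mutating value at zero means an unset category can never be read as read-only.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class ToolCategory(IntEnum):
    # MUTATING is deliberately the zero value, mirroring tcMutating being
    # the first member of the Pascal enum.
    MUTATING = 0
    READ_ONLY = 1


@dataclass
class Tool:
    name: str
    run: Callable[[dict], str]
    # A tool registered without an explicit category falls back to the
    # zero value, so it is treated as mutating and never parallelised.
    category: ToolCategory = ToolCategory(0)


def can_run_in_parallel(tool: Tool) -> bool:
    """Only tools explicitly marked read-only are eligible for batching."""
    return tool.category is ToolCategory.READ_ONLY


# Hypothetical registrations: one explicit, one with the category forgotten.
fs_read = Tool("fs_read", lambda args: "contents", ToolCategory.READ_ONLY)
forgotten = Tool("custom_tool", lambda args: "done")
```

With this layout, forgetting the category costs some parallelism but cannot introduce a race.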

Loop dispatch

  • TToolLoopConfig.Parallel: Boolean (default False on the record — caller opts in).
  • DispatchOneToolCall function extracted from the old inline for-loop body. Single source of truth for preflight + Registry.RunTool + fs_edit_hashline retry logic. Pure with respect to shared state (per-call HTTP clients, thread-safe LogWarn, read-only sandbox globals).
  • TToolCallWorker: a TThread that runs DispatchOneToolCall on one TToolCallDispatch slot. Workers are created suspended; the main thread calls Start, WaitFor, and Free on them in array order.
  • PartitionToolBatches walks Resp.ToolCalls, coalesces consecutive tcReadOnly calls into one batch, emits a batch-of-one for every tcMutating. Order preserved across batches so tool_result history append order matches the model's tool_use emit order (Anthropic pairs by id, OpenAI/Gemini pair positionally — both work).
  • Rewrote the dispatch block: for each batch, fire all OnToolCall in array order on the main thread, then run the batch (parallel if Cfg.Parallel and size > 1; serial otherwise), then fire all OnToolResult and append tool_result messages in array order on the main thread. Callbacks stay on the main thread, which avoids calling the embedder's Indy/TUI/Form code from worker threads.
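
The partition-then-dispatch flow in the list above can be sketched roughly as follows. This is a Python approximation, not the Pascal code in the PR: the `(call id, tool name)` tuple shape, the `READ_ONLY` set lookup, and the function names are assumptions made for illustration. It shows consecutive read-only calls being coalesced, every other call getting a batch of one, and callbacks plus history appends staying in array order on the calling thread.

```python
import threading
from typing import Callable, List, Tuple

# Tool names taken from the read-only list in this PR; anything else is
# treated as mutating (fail closed).
READ_ONLY = {"web_search", "web_fetch", "memory_search",
             "fs_read", "fs_grep", "fs_list"}

ToolCall = Tuple[str, str]  # (call id, tool name): hypothetical shape


def partition_tool_batches(calls: List[ToolCall]) -> List[List[ToolCall]]:
    """Coalesce consecutive read-only calls; mutating calls stand alone."""
    batches: List[List[ToolCall]] = []
    for call in calls:
        read_only = call[1] in READ_ONLY
        last_is_read_only = bool(batches) and all(
            c[1] in READ_ONLY for c in batches[-1])
        if read_only and last_is_read_only:
            batches[-1].append(call)
        else:
            batches.append([call])
    return batches


def run_batch(batch: List[ToolCall], run_tool: Callable[[ToolCall], str],
              parallel: bool) -> List[str]:
    """Run one batch; each worker writes only to its own result slot."""
    results: List[str] = [""] * len(batch)

    def worker(slot: int) -> None:
        results[slot] = run_tool(batch[slot])

    if parallel and len(batch) > 1:
        threads = [threading.Thread(target=worker, args=(i,))
                   for i in range(len(batch))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for i in range(len(batch)):
            worker(i)
    return results


def dispatch(calls, run_tool, on_call, on_result, parallel):
    """Callbacks and history appends happen on the calling thread, in order."""
    history = []
    for batch in partition_tool_batches(calls):
        for call in batch:
            on_call(call)
        results = run_batch(batch, run_tool, parallel)
        for call, result in zip(batch, results):
            on_result(call, result)
            history.append((call[0], result))
    return history
```

Because each worker writes only to its own slot and the main thread reads the slots after joining, the results need no lock in this sketch.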

Opt-in by 12 callers

  • Cmd.Agent (CLI agent loop)
  • TPasClawAgent.ChatHistory
  • Gateway.Server (3 sites — gateway, chat/completions, responses endpoint)
  • PasClaw.TUI (interactive TUI)
  • 6 channel handlers (Telegram, Discord, LINE, WhatsApp, Matrix, IRC) — same one-line pattern

Verification

Wall-clock test (built + run + removed; not committed): two TPasClawTool subclasses each Sleep(500ms), tcReadOnly. Stub TFakeProvider returns one round of two tool_use blocks then loop-exit. RunToolLoop timed both ways:

serial   (Parallel=False): 1000 ms
parallel (Parallel=True):   501 ms
speedup: 49.9%
PASS

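The 49.9% figure is the reduction in wall-clock time (roughly a 2x speedup), which matches the theoretical best case for two equal tasks: max(500, 500) = 500 ms parallel vs 500 + 500 = 1000 ms serial.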
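
The throwaway test was not committed, so the following is only a rough Python analogue of its shape, not the original Pascal harness. The names `slow_read_only_tool` and `run_two` are hypothetical. It times two 500 ms sleeping "tools" run back to back and then on two threads.

```python
import threading
import time


def slow_read_only_tool() -> str:
    """Stand-in for a read-only tool that takes about 500 ms."""
    time.sleep(0.5)
    return "ok"


def run_two(parallel: bool) -> float:
    """Run the tool twice and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    if parallel:
        threads = [threading.Thread(target=slow_read_only_tool)
                   for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for _ in range(2):
            slow_read_only_tool()
    return time.perf_counter() - start


serial_s = run_two(parallel=False)
parallel_s = run_two(parallel=True)
print(f"serial:   {serial_s * 1000:.0f} ms")
print(f"parallel: {parallel_s * 1000:.0f} ms")
print(f"wall-clock reduction: {(1 - parallel_s / serial_s) * 100:.1f}%")
```

Exact timings vary by machine and scheduler, so the sketch prints whatever it measures rather than asserting specific numbers.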

make smoke 11/11 green. All three samples (Console / Simple / Server) still build.

Future work

  • MCP-stdio tools default to tcMutating (registered without Category set → zero-init falls through to mutating). Correct for now — they share a single stdin/stdout pipe per server. Adding a request-mux lock would let them parallelize too.
  • max_parallel_tools cap: not bounded today. If a runaway agent ever fans out 20+ concurrent web fetches and trips a resource limit, add a queue.
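
One possible shape for the max_parallel_tools cap is a bounded worker pool. This is a Python sketch under assumptions, not a design committed to in this PR: `run_read_only_batch` and its default of 4 are hypothetical, and a Pascal implementation would need its own queue or pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")
R = TypeVar("R")


def run_read_only_batch(calls: Sequence[C], run_tool: Callable[[C], R],
                        max_parallel_tools: int = 4) -> List[R]:
    """Run a read-only batch with at most max_parallel_tools in flight."""
    # Never ask for more workers than there are calls, and never fewer
    # than one (ThreadPoolExecutor rejects max_workers <= 0).
    workers = max(1, min(max_parallel_tools, len(calls)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of which
        # call finishes first, so history order is preserved.
        return list(pool.map(run_tool, calls))
```

Excess calls simply wait in the pool's internal queue, so a fan-out of 20 fetches with a cap of 4 would run in waves rather than all at once.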

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon


Generated by Claude Code

When the model returns multiple tool_use blocks in one turn,
RunToolLoop now runs the read-only ones concurrently on worker
threads instead of one at a time. Mutating tools (fs_write,
shell_exec, fs_edit_hashline) still serialise: each lands in its own
batch of one, so race-prone tools never run concurrently.

Picoclaw is fully serial (pkg/agent/pipeline_execute.go, plain
for-range loop); nanobot has it as an opt-in flag on AgentRunSpec
(nanobot/agent/runner.py, batched asyncio.gather). This brings
PasClaw to nanobot parity, with the same "categorise tools first,
batch by category" insight.

Implementation:

  PasClaw.Tools.Types
    + TToolCategory = (tcMutating, tcReadOnly)
    + TTool.Category field
    Zero-init defaults to tcMutating so a forgotten Category on
    any record-built tool fails closed — never accidentally
    parallel.

  PasClaw.Tools.Obj
    + TPasClawTool.Category: TToolCategory virtual
    Default tcMutating; OOP tool authors override to tcReadOnly
    when their Run does no shared-state mutation. Install
    propagates Self.Category into the TTool record.

  PasClaw.Tools (built-in OOP wrappers)
    + TWebSearchTool, TWebFetchTool, TMemoryTool override
      Category → tcReadOnly.
    TShellTool stays at base tcMutating; TFileSystemTool is a
    bundle that delegates to RegisterFSTools so per-sub-tool
    Category is set there.

  Built-in record-based tools (PasClaw.Tools.{WebSearch,
  WebFetch,Memory,FS}) — explicitly set Category on each Register*
  call. Read-only: web_search, web_fetch, memory_search, fs_read,
  fs_grep, fs_list. Mutating: fs_write, fs_edit_hashline,
  shell_exec.

  PasClaw.Tools.ToolLoop
    + TToolLoopConfig.Parallel: Boolean (default False on the
      record). Caller opts in.
    + DispatchOneToolCall function — extracted from the old inline
      for-loop body, now the single source of truth for preflight
      + RunTool + fs_edit_hashline retry logic. Pure with respect
      to shared state (per-call HTTP clients, thread-safe
      LogWarn, read-only sandbox globals).
    + TToolCallWorker — TThread that runs DispatchOneToolCall on
      one TToolCallDispatch slot. Suspended on construct, main
      thread Starts + WaitFor + Free in array order.
    + PartitionToolBatches helper — walks Resp.ToolCalls, coalesces
      consecutive tcReadOnly calls into one batch, emits a
      batch-of-one for every tcMutating. Order preserved across
      batches so tool_result history append order matches the
      model's tool_use emit order (Anthropic pairs by id, but
      OpenAI/Gemini pair positionally — both work).
    Rewrote the dispatch block: for each batch, fire all OnToolCall
    in array order on the main thread, then run the batch (parallel
    if Cfg.Parallel and size > 1; serial otherwise), then fire all
    OnToolResult and append tool_result messages in array order on
    the main thread. Callbacks stay on the main thread (avoids
    threading the embedder's Indy/TUI Form across).

  Callers opted in to Parallel := True:
    src/cmd/PasClaw.Cmd.Agent        (CLI agent loop)
    src/pkg/agent/PasClaw.Agent      (TPasClawAgent.ChatHistory)
    src/pkg/gateway/PasClaw.Gateway.Server (3 sites — gateway,
                                            chat/completions,
                                            responses endpoint)
    src/pkg/tui/PasClaw.TUI          (interactive TUI)
    src/pkg/channels/PasClaw.Channels.{Telegram,Discord,LINE,
                                       WhatsApp,Matrix,IRC}
    Six channel handlers all share the same one-line pattern.

Verification:

  Wall-clock test (write + run + remove): two TPasClawTool subclasses
  each Sleep(500ms), tcReadOnly, registered via Agent.RegisterTool.
  Stub TFakeProvider returns one round of two tool_use blocks then
  loop-exit. RunToolLoop timed both ways:

    serial   (Parallel=False): 1000 ms
    parallel (Parallel=True):   501 ms
    speedup: 49.9%

  Matches the theoretical max — max(500,500) = 500ms parallel vs
  500+500 = 1000ms serial.

  make smoke 11/11 green. All three samples (Console / Simple /
  Server) still build.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon