⚡ Claude Token Optimization 2026-06-04 - security-guard #4306

Description

@github-actions

Target Workflow: security-guard

Source report: #4304
Estimated cost per active run: $0.45
Total tokens per active run: ~424K raw / ~3.4M effective
Cache read rate: N/A (api-proxy passthrough — no cache telemetry)
Cache write rate: N/A
LLM turns: 6.2 avg (instrumented runs); the max-turns cap is 6, but many runs exceed it by one final turn
Model: claude-sonnet-4-5

Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | github (pull_requests + repos toolsets), ~10 tools |
| Tools actually used | mcp__github__get_pull_request_diff, mcp__github__list_pull_request_files, mcp__github__get_pull_request |
| Bash tools (actual) | gh pr diff, gh pr view, gh api repos/... (all avoidable) |
| Network groups | github only ✅ |
| Pre-agent steps | Yes, fetches up to 100 KB of PR diff |
| Prompt size | ~4,200 chars (~1,050 tokens static) + injected diff (up to ~25K tokens) |

Effective token multiplier: 8.1×. Each turn re-sends the accumulated context, so ~424K raw tokens per active run are billed as ~3.4M effective tokens across 6+ turns.

Key Problem: Agent Ignores the Pre-Fetched Diff

The steps: pre-agent block fetches the PR diff (up to 100 KB) and injects it into the prompt as PR_FILES. Despite this, the tool_usage log shows the agent making 7 redundant gh pr diff bash calls (across 4 runs), 13 gh api repos/... calls (across 10 runs), and 4 gh pr view calls.

This adds 1–2 extra turns, each of which re-sends the entire accumulated context (~50–70K tokens). Across 12 instrumented runs at $0.45/run, this alone costs an estimated $2.00–$2.50 in avoidable turns per week.

Recommendations

1. Enforce fast-path: block ALL bash diff/API calls in the prompt

Estimated savings: ~100–150K effective tokens/run (~35–45% cost reduction)

The current prompt says "Do NOT call gh pr diff, git diff, or gh api .../files" but allows "direct file reads from the checked-out repository" and doesn't explicitly block gh pr view. The agent is treating these as loopholes.

Change in .github/workflows/security-guard.md — replace the current instruction block:

3. **Use the pre-fetched diff below as your primary source of truth. Do NOT call `gh pr diff`, `git diff`, or `gh api .../files`.** If you see `[DIFF TRUNCATED ...]`, fetch full context once with `mcp__github__get_pull_request_diff`, then continue.
4. **Do not use local branch comparisons or commit history** (for example `git diff main...HEAD` or `git log main..`) unless you first confirm the base branch exists locally...
5. **Use direct file reads from the checked-out repository** only for files you need to inspect further...

Replace with a single hard rule:

3. **Use ONLY the pre-fetched diff below.** Do NOT call `gh pr diff`, `gh pr view`, `gh api`, `git diff`, `git log`, or `git show`. Do NOT read files from the checkout. If `[DIFF TRUNCATED ...]` appears, call `mcp__github__get_pull_request_diff` once — then stop making tool calls and analyze inline.

This eliminates the 2–3 extra bash-tool turns per active run.
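
To check whether the agent actually follows the hard rule after the change, the bash commands from a run could be screened against the blocked prefixes. The following is a minimal sketch, not part of the workflow: `is_blocked` and `find_violations` are hypothetical helpers, and the input is assumed to be a plain list of command strings extracted from the tool_usage log (the log format itself is not specified here). It matches only on the leading tokens, so commands wrapped in `bash -c` or pipelines, and direct file reads, would need extra handling.

```python
import shlex

# Command prefixes forbidden by the proposed hard rule (copied from the rule text).
BLOCKED_PREFIXES = [
    ("gh", "pr", "diff"),
    ("gh", "pr", "view"),
    ("gh", "api"),
    ("git", "diff"),
    ("git", "log"),
    ("git", "show"),
]


def is_blocked(command: str) -> bool:
    """Return True if the command starts with one of the blocked prefixes."""
    try:
        tokens = tuple(shlex.split(command))
    except ValueError:
        # Unbalanced quotes: fall back to naive whitespace splitting.
        tokens = tuple(command.split())
    return any(tokens[: len(prefix)] == prefix for prefix in BLOCKED_PREFIXES)


def find_violations(commands):
    """Return the commands that the hard rule forbids, in their original order."""
    return [command for command in commands if is_blocked(command)]


if __name__ == "__main__":
    # Hypothetical sample of bash calls from one run.
    sample = ["gh pr diff 4306", "ls -la", "gh pr view 4306 --json title"]
    for command in find_violations(sample):
        print("blocked:", command)
```

A non-empty result for a run after the prompt change would indicate the agent is still finding a way around the rule.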

2. Pre-fetch PR metadata in steps: to eliminate gh pr view calls

Estimated savings: ~35–50K effective tokens/run (~10–15%)

The agent makes gh pr view calls to get PR title, description, and author. Pre-fetch this in the steps: block and inject it alongside the diff.

Add to the pr-diff step or as a new step:

- name: Fetch PR metadata
  id: pr-meta
  if: github.event.pull_request.number
  run: |
    # Render title, author, branches, and description as a short markdown header.
    # The 2000-character cap on the description is an assumption; tune as needed.
    PR_INFO=$(gh pr view "$PR_NUMBER" --repo "$GH_REPO" \
      --json title,author,body,baseRefName,headRefName \
      --jq '"**Title:** " + .title + "\n**Author:** " + .author.login + "\n**Base→Head:** " + .baseRefName + "→" + .headRefName + "\n**Description:** " + ((.body // "") | .[0:2000])')
    # Write a multiline step output using the name<<DELIMITER syntax.
    {
      echo "PR_META<<GHAWMETA"
      echo "$PR_INFO"
      echo "GHAWMETA"
    } >> "$GITHUB_OUTPUT"
  env:
    GH_TOKEN: ${{ github.token }}
    PR_NUMBER: ${{ github.event.pull_request.number }}
    GH_REPO: ${{ github.repository }}

Then inject ${{ steps.pr-meta.outputs.PR_META }} at the top of the "Changed Files" section in the prompt.
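
The step above relies on the multiline `name<<DELIMITER` form of `$GITHUB_OUTPUT`. With a fixed delimiter, a PR title or description that happens to contain a line equal to the delimiter would end the value early. The sketch below illustrates one way to avoid that by generating a random delimiter per write; `format_multiline_output` is a hypothetical helper for illustration, not something the workflow currently contains.

```python
import uuid


def format_multiline_output(name: str, value: str) -> str:
    """Format a multiline step output as `name<<DELIM`, the value, then `DELIM`."""
    # A random delimiter makes an accidental match with user-controlled text unlikely.
    delimiter = f"ghadelimiter_{uuid.uuid4().hex}"
    if delimiter in value:
        # Practically unreachable, but fail loudly rather than emit a truncated value.
        raise ValueError("delimiter collides with output value")
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


if __name__ == "__main__":
    block = format_multiline_output("PR_META", "**Title:** Example\n**Author:** someone")
    print(block, end="")
```

In a workflow step, the returned string would be appended to the file named by the `GITHUB_OUTPUT` environment variable.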

3. Remove verbose "Repository Context" section

Estimated savings: ~20–30K effective tokens/run (~5–8%) across 6 turns

The current prompt has a 400+ word "Repository Context" and "Architecture" description explaining how AWF containers work. The security agent only needs to know which patterns are dangerous — not the full architecture.

Replace the entire "Repository Context" block with a 3-line summary:

## Repository Context

AWF is a network firewall for AI agents. Security-critical files: `src/host-iptables.ts`, `containers/agent/setup-iptables.sh`, `src/squid-config.ts`, `src/docker-manager.ts`, `containers/agent/entrypoint.sh`, `src/domain-patterns.ts`.

This saves ~450 tokens × 6 turns = ~2,700 tokens raw, ~21,900 effective tokens per run.
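
The arithmetic behind that estimate can be reproduced with a few lines. This is a sketch under the report's own assumptions (a static saving re-sent on every turn, scaled by the 8.1× effective/raw multiplier); `effective_tokens_saved` is a hypothetical helper name.

```python
def effective_tokens_saved(tokens_removed, turns, multiplier):
    """Return (raw, effective) tokens saved when a static prompt chunk is removed."""
    raw = tokens_removed * turns      # the chunk is re-sent on every turn
    effective = raw * multiplier      # scale by the effective/raw multiplier
    return raw, effective


raw_saved, effective_saved = effective_tokens_saved(450, 6, 8.1)
print(f"raw: {raw_saved}, effective: {round(effective_saved)}")
```

The result is 2,700 raw and about 21,870 effective tokens, which the report rounds to ~21,900.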

4. Reduce max-turns from 6 to 4

Estimated savings: ~80–120K effective tokens/run (~25%) for runs currently hitting the cap

With recommendation #1 eliminating bash tool calls, the agent should complete in 3–4 turns:

  • Turn 1: Read pre-fetched diff + fast-path check
  • Turn 2: Deep analysis (if needed)
  • Turn 3: Write PR comment + call noop/add_labels

Then lower the cap in the workflow's engine block:

engine:
  id: claude
  model: claude-sonnet-4-5
  max-turns: 4   # was: 6

Cache Analysis (Anthropic-Specific)

Cache telemetry is not available for this workflow (token_usage_summary absent — api-proxy operates in passthrough mode without cache tracking).

However, the 8.1× effective/raw multiplier suggests that prompt caching is active: the growing context is cached and reused within each session. This is beneficial, because the first turn's large context (PR diff + system prompt) is cached for subsequent turns. Without cache telemetry this remains an inference rather than a measurement.

The problem is not cache inefficiency — it's too many turns causing too many cache reads. Reducing turns (via recommendations #1 and #4) will cut cache-read roundtrips proportionally.

| Turn | Est. Raw Tokens | Effective Tokens (8.1× factor) | Note |
|---|---|---|---|
| 1 | ~55K | ~55K | First turn, full context |
| 2 | ~60K | ~120K | Growing context |
| 3 | ~65K | ~260K | Cache reads compound |
| 4 | ~70K | ~540K | Large cache-read |
| 5 | ~75K | ~1.1M | Dominates cost |
| 6 | ~80K | ~1.3M | Often just writing the comment |

Reducing from 6 to 3 turns is projected to cut effective tokens by ~75% (from 3.4M to ~900K per active run, as listed under Expected Impact).
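
As a rough cross-check, the per-turn estimates in the table can be summed to see how much of the effective-token total falls in the later turns. This sketch uses only the table's approximate figures (in thousands of tokens) and is an illustration rather than a measurement.

```python
# Effective tokens per turn, in thousands, as estimated in the table above.
effective_k = [55, 120, 260, 540, 1100, 1300]

total_k = sum(effective_k)                # all six turns
first_three_k = sum(effective_k[:3])      # turns 1 to 3 only
late_share = 1 - first_three_k / total_k  # fraction attributable to turns 4 to 6

print(f"total: ~{total_k}K, turns 1-3: ~{first_three_k}K, turns 4-6 share: {late_share:.0%}")
```

On these figures, turns 4 to 6 account for roughly 87% of the ~3.4M effective tokens, so a ~75% reduction is on the conservative side of what the table implies.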

Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/active run (raw) | 424K | ~200K | −53% |
| Effective tokens/active run | 3.4M | ~900K | −74% |
| Cost/active run | $0.45 | ~$0.12 | −73% |
| LLM turns/active run | 6.2 | ~3 | −3 turns |
| Total weekly cost (Security Guard) | $5.36 | ~$1.45 | −$3.91/wk |
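
The percentages in the Savings column follow directly from the Current and Projected values. A short sketch that recomputes them, using only the figures from the table above (the dictionary keys are just labels chosen for this example):

```python
# (current, projected) pairs taken from the Expected Impact table.
rows = {
    "raw tokens (K)": (424, 200),
    "effective tokens (K)": (3400, 900),
    "cost per run ($)": (0.45, 0.12),
}


def pct_change(current, projected):
    """Percentage change from current to projected (negative means a reduction)."""
    return (projected - current) / current * 100


changes = {name: round(pct_change(cur, proj)) for name, (cur, proj) in rows.items()}
weekly_saving = round(5.36 - 1.45, 2)  # weekly cost difference in dollars

for name, change in changes.items():
    print(f"{name}: {change}%")
print(f"weekly saving: ${weekly_saving}")
```

Rerunning this with measured values after the changes land would show how close the projection was.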

Implementation Checklist

Generated by Daily Claude Token Optimization Advisor · sonnet46 1.7M ·
