Target Workflow: security-guard
Source report: #4304
Estimated cost per active run: $0.45
Total tokens per active run: ~424K raw / ~3.4M effective
Cache read rate: N/A (api-proxy passthrough — no cache telemetry)
Cache write rate: N/A
LLM turns: 6.2 avg (instrumented runs); max-turns cap is 6, many runs exceed via +1 final turn
Model: claude-sonnet-4-5
Current Configuration
| Setting | Value |
|---|---|
| Tools loaded | github (pull_requests + repos toolsets), ~10 tools |
| Tools actually used | mcp__github__get_pull_request_diff, mcp__github__list_pull_request_files, mcp__github__get_pull_request |
| Bash tools (actual) | gh pr diff, gh pr view, gh api repos/... (all avoidable) |
| Network groups | github only ✅ |
| Pre-agent steps | Yes, fetches up to 100 KB of PR diff |
| Prompt size | ~4,200 chars (~1,050 tokens static) + injected diff (up to ~25K tokens) |
Effective token multiplier: 8.1×. Each turn re-sends the accumulated context, so the ~424K raw tokens per run are billed as ~3.4M effective tokens across 6+ turns.
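As a quick sanity check on the figures above, the relationship between raw and effective tokens can be sketched in Python. The function name is illustrative, and the 8.1× factor is taken from the report as given rather than derived from telemetry.

```python
def effective_tokens(raw_tokens: int, multiplier: float) -> float:
    """Scale a raw per-run token count by the reported effective/raw multiplier."""
    return raw_tokens * multiplier


raw = 424_000      # ~424K raw tokens per active run (from the report)
multiplier = 8.1   # effective/raw multiplier (from the report)

total = effective_tokens(raw, multiplier)
print(f"{total / 1e6:.1f}M effective tokens")  # prints "3.4M effective tokens"
```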
Key Problem: Agent Ignores the Pre-Fetched Diff
The steps: pre-agent block fetches the full PR diff and injects it as PR_FILES into the prompt. Despite this, the tool_usage log shows the agent making 7 redundant gh pr diff bash calls (across 4 runs) and 13 gh api repos/... calls (across 10 runs), plus 4 gh pr view calls.
This adds 1–2 extra turns, each of which re-sends the entire accumulated context (~50–70K tokens). Across 12 instrumented runs at $0.45/run, this alone costs an estimated $2.00–$2.50 in avoidable turns per week.
Recommendations
1. Enforce fast-path: block ALL bash diff/API calls in the prompt
Estimated savings: ~100–150K effective tokens/run (~35–45% cost reduction)
The current prompt says "Do NOT call gh pr diff, git diff, or gh api .../files" but allows "direct file reads from the checked-out repository" and doesn't explicitly block gh pr view. The agent is treating these as loopholes.
Change in .github/workflows/security-guard.md — replace the current instruction block:
3. **Use the pre-fetched diff below as your primary source of truth. Do NOT call `gh pr diff`, `git diff`, or `gh api .../files`.** If you see `[DIFF TRUNCATED ...]`, fetch full context once with `mcp__github__get_pull_request_diff`, then continue.
4. **Do not use local branch comparisons or commit history** (for example `git diff main...HEAD` or `git log main..`) unless you first confirm the base branch exists locally...
5. **Use direct file reads from the checked-out repository** only for files you need to inspect further...
Replace with a single hard rule:
3. **Use ONLY the pre-fetched diff below.** Do NOT call `gh pr diff`, `gh pr view`, `gh api`, `git diff`, `git log`, or `git show`. Do NOT read files from the checkout. If `[DIFF TRUNCATED ...]` appears, call `mcp__github__get_pull_request_diff` once — then stop making tool calls and analyze inline.
This eliminates the 2–3 extra bash-tool turns per active run.
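To check whether the agent actually follows the hard rule after the change, one option is to scan the bash commands recorded in the tool-usage log. The sketch below is a hypothetical helper, not part of the workflow: the log is assumed to be available as a plain list of command strings, the sample commands (including PR number 123) are made up, and the pattern covers only the commands named in the rule, not file reads from the checkout.

```python
import re

# Commands forbidden by the hard rule: gh pr diff, gh pr view, gh api,
# git diff, git log, git show. Matched anywhere in the command line so that
# chained commands such as "cd repo && gh pr diff" are also caught.
FORBIDDEN = re.compile(
    r"\b(?:gh\s+pr\s+(?:diff|view)|gh\s+api|git\s+(?:diff|log|show))\b"
)


def is_forbidden(command: str) -> bool:
    """Return True if the command invokes one of the forbidden CLI calls."""
    return FORBIDDEN.search(command) is not None


def count_violations(commands: list[str]) -> int:
    """Count how many logged commands violate the rule."""
    return sum(1 for command in commands if is_forbidden(command))


# Hypothetical log excerpt for illustration only.
sample_log = [
    "gh pr diff 123",
    "gh pr view 123 --json title",
    "gh api repos/...",
    "cat src/squid-config.ts",
    "cd repo && git log main..",
]
print(count_violations(sample_log))  # prints 4
```

A count above zero on post-change runs would suggest the prompt wording still leaves a loophole.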
2. Pre-fetch PR metadata in steps: to eliminate gh pr view calls
Estimated savings: ~35–50K effective tokens/run (~10–15%)
The agent makes gh pr view calls to get PR title, description, and author. Pre-fetch this in the steps: block and inject it alongside the diff.
Add to the pr-diff step or as a new step:
- name: Fetch PR metadata
  id: pr-meta
  if: github.event.pull_request.number
  run: |
    PR_INFO=$(gh pr view "$PR_NUMBER" --repo "$GH_REPO" \
      --json title,author,body,baseRefName,headRefName \
      --jq '"**Title:** " + .title + "\n**Author:** " + .author.login + "\n**Base→Head:** " + .baseRefName + "→" + .headRefName')
    echo "PR_META<<GHAWMETA" >> "$GITHUB_OUTPUT"
    echo "$PR_INFO" >> "$GITHUB_OUTPUT"
    echo "GHAWMETA" >> "$GITHUB_OUTPUT"
  env:
    GH_TOKEN: ${{ github.token }}
    PR_NUMBER: ${{ github.event.pull_request.number }}
    GH_REPO: ${{ github.repository }}
Then inject ${{ steps.pr-meta.outputs.PR_META }} at the top of the "Changed Files" section in the prompt.
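The three `echo` lines in the step rely on the multi-line output syntax of GitHub Actions (`NAME<<DELIMITER`, the value, then the delimiter on its own line). A minimal Python sketch of the same framing is shown below to make the format explicit. The function name and the sample title and author are hypothetical, and the guard against the delimiter appearing in the value is an added precaution that the shell step does not perform.

```python
def format_multiline_output(name: str, value: str, delimiter: str = "GHAWMETA") -> str:
    """Frame a multi-line value the way the step appends it to $GITHUB_OUTPUT."""
    # If the delimiter appeared as a line inside the value, the output would be
    # cut short, so reject that case up front.
    if delimiter in value.splitlines():
        raise ValueError("delimiter must not appear as a line in the value")
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


# Hypothetical metadata, shaped like the --jq expression's result.
meta = "**Title:** Example change\n**Author:** octocat\n**Base→Head:** main→feature"
print(format_multiline_output("PR_META", meta), end="")
```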
3. Remove verbose "Repository Context" section
Estimated savings: ~20–30K effective tokens/run (~5–8%) across 6 turns
The current prompt has a 400+ word "Repository Context" and "Architecture" description explaining how AWF containers work. The security agent only needs to know which patterns are dangerous — not the full architecture.
Replace the entire "Repository Context" block with a 3-line summary:
## Repository Context
AWF is a network firewall for AI agents. Security-critical files: `src/host-iptables.ts`, `containers/agent/setup-iptables.sh`, `src/squid-config.ts`, `src/docker-manager.ts`, `containers/agent/entrypoint.sh`, `src/domain-patterns.ts`.
This saves ~450 tokens × 6 turns = ~2,700 tokens raw, ~21,900 effective tokens per run.
4. Reduce max-turns from 6 to 4
Estimated savings: ~80–120K effective tokens/run (~25%) for runs currently hitting the cap
With recommendation #1 eliminating bash tool calls, the agent should complete in 3–4 turns:
- Turn 1: Read pre-fetched diff + fast-path check
- Turn 2: Deep analysis (if needed)
- Turn 3: Write PR comment + call noop/add_labels
engine:
  id: claude
  model: claude-sonnet-4-5
  max-turns: 4 # was: 6
Cache Analysis (Anthropic-Specific)
Cache telemetry is not available for this workflow (token_usage_summary absent — api-proxy operates in passthrough mode without cache tracking).
However, the 8.1× effective/raw multiplier suggests that Anthropic's prefix caching is active: the growing context window is cached and reused within each session. This is beneficial, because the first turn's large context (PR diff + system prompt) is cached for subsequent turns.
The problem is not cache inefficiency — it's too many turns causing too many cache reads. Reducing turns (via recommendations #1 and #4) will cut cache-read roundtrips proportionally.
| Turn | Est. Raw Tokens | Effective Tokens (8.1× factor) | Note |
|---|---|---|---|
| 1 | ~55K | ~55K | First turn, full context |
| 2 | ~60K | ~120K | Growing context |
| 3 | ~65K | ~260K | Cache reads compound |
| 4 | ~70K | ~540K | Large cache-read |
| 5 | ~75K | ~1.1M | Dominates cost |
| 6 | ~80K | ~1.3M | Often just writing the comment |
Reducing from 6 to 3 turns cuts effective tokens by ~75%.
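To see how heavily the later turns weigh on the total, the per-turn effective-token estimates from the table can be summed. This is a rough sketch that takes the table's rounded estimates at face value; the variable and function names are illustrative.

```python
# Per-turn effective-token estimates for turns 1-6, copied from the table.
per_turn_effective = [55_000, 120_000, 260_000, 540_000, 1_100_000, 1_300_000]


def running_totals(values: list[int]) -> list[int]:
    """Return the cumulative sum after each turn."""
    totals, total = [], 0
    for value in values:
        total += value
        totals.append(total)
    return totals


cumulative = running_totals(per_turn_effective)
total = cumulative[-1]
last_two_share = sum(per_turn_effective[-2:]) / total

print(total)                    # prints 3375000, i.e. roughly the 3.4M per run
print(f"{last_two_share:.0%}")  # prints 71%: turns 5 and 6 carry most of the total
```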
Expected Impact
| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/active run (raw) | 424K | ~200K | −53% |
| Effective tokens/active run | 3.4M | ~900K | −74% |
| Cost/active run | $0.45 | ~$0.12 | −73% |
| LLM turns/active run | 6.2 | ~3 | −3 turns |
| Total weekly cost (Security Guard) | $5.36 | ~$1.45 | −$3.91/wk |
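The savings column can be reproduced from the current and projected values. The short Python check below uses only the numbers in the table; the helper name is illustrative.

```python
def pct_change(current: float, projected: float) -> int:
    """Percentage change from current to projected, rounded to a whole percent."""
    return round((projected - current) / current * 100)


print(pct_change(424_000, 200_000))    # raw tokens per run: prints -53
print(pct_change(3_400_000, 900_000))  # effective tokens per run: prints -74
print(pct_change(0.45, 0.12))          # cost per run: prints -73
print(round(5.36 - 1.45, 2))           # weekly saving in dollars: prints 3.91
```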
Generated by Daily Claude Token Optimization Advisor · sonnet46 1.7M
Implementation Checklist
- .github/workflows/security-guard.md: replace multi-line "do NOT call" rules with single hard rule (Rec #1)
- .github/workflows/security-guard.md: add pr-meta pre-agent step and inject into prompt (Rec #2)
- .github/workflows/security-guard.md: condense "Repository Context" section to 3 lines (Rec #3)
- .github/workflows/security-guard.md: set max-turns: 4 (Rec #4)
- gh aw compile .github/workflows/security-guard.md
- npx tsx scripts/ci/postprocess-smoke-workflows.ts