Source (redacted): Daily Copilot Token Usage Analyzer run (2026-05-15)
Estimated cost per run: N/A (Copilot billing)
Total tokens per run: ~1,177,011
Effective tokens per run: ~11,719,789
Cache hit rate (cache reads / total input): ~89.7% (1,049,528 cache read / 1,170,266 input)
Cache write tokens: 0 (no new prefix cache established this run)
LLM requests: 28
LLM turns (conversation): 1
Model: claude-sonnet-4.6
Last run status: ❌ Failure
Duration: 6.7 minutes
Current Configuration

| Setting | Value |
| --- | --- |
| Tools loaded | 2 (edit, bash) |
| Network groups | none configured |
| Pre-agent steps | ✅ Yes (git-changes, doc-files) |
| Prompt body size | ~4,000 chars (~1,000 tokens) |
| Timeout | 15 minutes |
| Model | claude-sonnet-4.6 (frontier) |

The tool surface is already minimal. The large token count (~1.17M) comes from 28 sequential bash round-trips, each carrying a growing conversation history as the agent individually reads source files, runs git show <sha> per commit, and reads documentation files.
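The headline figures can be re-derived from the run summary with a few lines of shell. This is only an arithmetic sketch: the function names are illustrative, and reading "total minus input" as output tokens is an assumption about how the analyzer counts.

```shell
#!/usr/bin/env bash
# Figures copied from the run summary.
total_tokens=1177011
input_tokens=1170266
cache_read_tokens=1049528
llm_requests=28

# Percentage with one decimal; awk is used because bash arithmetic is integer-only.
cache_hit_rate() { awk -v r="$1" -v i="$2" 'BEGIN { printf "%.1f", 100 * r / i }'; }

# Average input tokens per request, truncated to an integer.
avg_per_request() { awk -v i="$1" -v n="$2" 'BEGIN { printf "%d", i / n }'; }

echo "cache hit rate: $(cache_hit_rate "$cache_read_tokens" "$input_tokens")%"
echo "avg input tokens per request: $(avg_per_request "$input_tokens" "$llm_requests")"
echo "uncached input tokens: $((input_tokens - cache_read_tokens))"
# Assumption: whatever is not input is output.
echo "total minus input: $((total_tokens - input_tokens))"
```

An average of roughly 41,795 input tokens per request, against a prompt body of about 1,000 tokens, is consistent with the conversation history dominating each request.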
Recommendations
1. Downgrade model to claude-haiku-4-5 or gpt-4.1-mini
The system's own agentic assessment flagged this run as model_downgrade_available:
"This Repo Maintenance run may not need a frontier model. A smaller model (e.g. gpt-4.1-mini, claude-haiku-4-5) could handle the task at lower cost."
Documentation review — comparing markdown files against git diffs and editing prose — is well within smaller model capability. Add to frontmatter in .github/workflows/doc-maintainer.md:
engine:
  model: claude-haiku-4-5
or alternatively:
engine:
  model: gpt-4.1-mini
2. Pre-compute git diffs in the steps block (reduce 28 LLM requests)
Currently the agent gets a bare commit list (git log --name-only) and then runs git show <sha> individually for each relevant commit. Each additional bash call adds the full conversation history to the next request. With 28 total requests, the history accumulation is the primary token driver.
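A toy model shows why the number of round-trips matters more than the size of any single request. It assumes request k carries a fixed base context plus k increments of history; the base and step values below are hypothetical, chosen only to land near the observed total, and are not measurements from this run.

```shell
#!/usr/bin/env bash
# cumulative_input BASE STEP N
# Total input tokens over N sequential requests when request k (0-based)
# carries BASE + k*STEP tokens. BASE and STEP are hypothetical inputs.
cumulative_input() {
  local base=$1 step=$2 n=$3
  # Sum of an arithmetic series: N*BASE + STEP * (0 + 1 + ... + N-1)
  echo $(( n * base + step * n * (n - 1) / 2 ))
}

cumulative_input 15000 2000 28   # prints 1176000
cumulative_input 15000 2000 14   # prints 392000
```

Under these assumptions, halving the request count cuts cumulative input to about a third, because the history term grows with the square of the number of requests. Real savings would be smaller, since pre-computed diffs enlarge the first request.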
Replace the git-changes step to pre-compute full diffs for source-touching commits in a single shell pass:
steps:
  - name: Gather recent git diffs
    id: git-changes
    run: |
      # Group the commands so the diff body, not just the delimiters, is written to $GITHUB_OUTPUT
      {
        echo "RECENT_DIFFS<<EOF"
        # Get full diffs for src/, containers/, scripts/ changes in one pass
        git log --since="7 days ago" --format="%H %s" -- src/ containers/ scripts/ | \
          while read -r sha title; do
            echo "=== Commit $sha: $title ==="
            git show --stat --unified=3 "$sha" -- src/ containers/ scripts/ docs/ "*.md" 2>/dev/null | head -150
          done | head -500
        echo "EOF"
      } >> "$GITHUB_OUTPUT"
  - name: List documentation files
    id: doc-files
    run: |
      {
        echo "DOC_FILES<<EOF"
        find docs/ -name "*.md" 2>/dev/null | sort
        find . -maxdepth 1 -name "*.md" | sort
        echo "EOF"
      } >> "$GITHUB_OUTPUT"
  - name: Identify affected docs
    id: affected-docs
    run: |
      {
        echo "AFFECTED_DOCS<<EOF"
        # List source paths changed in the last 7 days; docs covering them are the likely candidates
        git log --since="7 days ago" --name-only --format="" -- src/ containers/ scripts/ | \
          grep -v '^$' | sort -u | head -30
        echo "EOF"
      } >> "$GITHUB_OUTPUT"
Then update the prompt to reference ${{ steps.git-changes.outputs.RECENT_DIFFS }} and instruct the agent to work from the pre-computed diffs rather than running git show individually. This alone could eliminate 10–15 bash round-trips.
3. Fix zero cache-write tokens (improve future cache reuse)
Estimated savings: reduces effective token cost by up to 50% on repeat runs
This run had 0 cache_write_tokens, meaning no new cache prefix was established. This is significant: the 89.7% cache-read rate came from cache entries written by prior runs, and if the pattern continues (0 writes), those entries will expire and future hit rates will fall. The likely cause is that dynamic git data (RECENT_COMMITS) is injected early in the system prompt, ahead of the stable static content, so the prefix changes on every run and cannot be reused.
Fix: Structure the prompt so static instructions appear first, dynamic data last:
# Documentation Maintainer

[All static instructions and guidelines; keep at top for caching]

---

## Context for This Run

### Recent Changes (Pre-computed)
${{ steps.git-changes.outputs.RECENT_DIFFS }}

### Documentation Files
${{ steps.doc-files.outputs.DOC_FILES }}
This ensures the large static block (~1,000 tokens) is always a stable prefix, allowing the model to cache it across daily runs.
4. Scope-limit documentation review with pre-computed affected files
Currently the agent reads all documentation files to decide which ones are affected. The affected-docs pre-step above can identify candidate docs automatically, and the prompt can instruct the agent to start with those files rather than reviewing all docs:
## Prioritized Documentation Files
The following files are likely affected based on changed source paths:
${{ steps.affected-docs.outputs.AFFECTED_DOCS }}
Review these first. Only review other docs if a clear connection to the changes is evident.
5. Add explicit exit condition to reduce wasted requests when no changes are relevant
Estimated savings: up to 1,177,011 tokens on no-op runs (eliminates full run)
The workflow has a skip-if-match guard for open PRs, but no guard for "no relevant source code changed in 7 days." Add a pre-step that checks for this case so the run can be skipped.
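One possible shape for the body of such a guard step is sketched below. The report refers to a step id `has-changes` with an output `has_changes`; the function itself, its name, and the `/dev/stdout` fallback are assumptions made for illustration, not the workflow's actual script.

```shell
#!/usr/bin/env bash
# has_changes SINCE PATH...
# Prints "true" if at least one commit newer than SINCE touched any PATH, else "false".
has_changes() {
  local since=$1
  shift
  if [ -n "$(git log -n 1 --since="$since" --format=%H -- "$@")" ]; then
    echo true
  else
    echo false
  fi
}

# Write the result as a step output. Outside GitHub Actions, where
# GITHUB_OUTPUT is unset, fall back to stdout so the script can be tried locally.
echo "has_changes=$(has_changes "7 days ago" src/ containers/ scripts/)" >> "${GITHUB_OUTPUT:-/dev/stdout}"
```

Note that a failing `git log` also yields `false` here, so a broken checkout would silently skip the run; a stricter variant could exit non-zero in that case.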
After applying the changes, compare token usage on a new run against this baseline (~1.17M tokens/run).
Additional Notes
The last run failed (error_count=1). Investigate the failure cause independently — if it's a tool or code issue it may inflate future token counts too.
The agentic_fraction = 1 confirms the agent spent the entire session in agentic (tool-calling) mode, so reducing round-trips (recommendation 2) is the highest-leverage structural change.
Target Workflow: Documentation Maintainer

Expected Impact

| Recommendation | Estimated savings |
| --- | --- |
| 1. Downgrade model | ~60–80% cost reduction; ~20–30% throughput improvement |
| 2. Pre-compute git diffs | ~300,000–500,000 tokens/run (~25–40% reduction) |
| 3. Fix zero cache-write tokens | Up to 50% lower effective token cost on repeat runs |
| 4. Scope-limit documentation review | ~100,000–200,000 tokens/run (~8–15% reduction) |
| 5. Add explicit exit condition | Up to 1,177,011 tokens on no-op runs (eliminates full run) |

For recommendation 5, the prompt should use a condition, or instruct the agent to exit immediately, if ${{ steps.has-changes.outputs.has_changes }} is false.

Implementation Checklist

- Add engine: { model: claude-haiku-4-5 } to frontmatter
- Replace git-changes step with full-diff pre-computation
- Add affected-docs step to scope document review
- Add has-changes guard step for no-op run skipping
- Run gh aw compile .github/workflows/doc-maintainer.md
- Run npx tsx scripts/ci/postprocess-smoke-workflows.ts (if applicable)