⚡ Copilot Token Optimization 2026-06-06 - Smoke Copilot BYOK #4429

@github-actions

Target Workflow: smoke-copilot-byok

Source report: #4427
Estimated cost per run: ~$8.78 (estimated from token counts × Anthropic pricing)
Total tokens per run: ~398K (avg over 5 runs)
Effective tokens per run: ~12.3M avg (30.8× cost multiplier)
Cache hit rate: ~0% (unique prompts every run — see Root Cause below)
LLM turns: 1

Current Configuration

| Setting | Value |
|---|---|
| Model | claude-opus-4.8 (explicitly overridden in env:) |
| Tools loaded | bash: ["*"], github: toolsets: [pull_requests] |
| Network groups | defaults, github |
| Pre-agent steps | ✅ Yes (pre-computes PR data, HTTP check, file I/O) |
| Prompt size | ~7.2 KB (173 lines) |
| Dynamic content in prompt | ✅ 4 template substitutions that change every run |

Root Cause Analysis

Problem 1 — Wrong Model for Task Complexity

The smoke test performs four trivial verifications:

  1. Run cat on a pre-created file to confirm it exists
  2. Call github-list_pull_requests and confirm data returns
  3. Check SMOKE_HTTP_CODE (always 200) is OK
  4. Confirm BYOK inference is working (trivially proven by the agent responding at all)

claude-opus-4.8 costs $15/M input and $75/M output, the most expensive tier. A task that needs only basic instruction-following and one cat command does not justify that price. The sibling Smoke Copilot workflow runs the same validation pattern on the default claude-sonnet-4.6 and shows only a ~10× effective-token multiplier, versus 30.8× for BYOK.

Problem 2 — Cache-Busting File Path

The pre-step creates a unique file path per run:

TEST_FILE="$TEST_DIR/smoke-test-copilot-byok-${GITHUB_RUN_ID}.txt"

This path is injected into the prompt body via ${{ steps.smoke-data.outputs.SMOKE_FILE_PATH }}. Since GITHUB_RUN_ID changes every run, the rendered prompt is different on every run, so prefix caching cannot reuse it (observed hit rate ~0%) and every token is billed at full price.

The combination of an expensive model + zero caching explains the 30.8× multiplier (vs ~2× we'd expect for Haiku with good caching).

Recommendations

1. Switch Model from claude-opus-4.8 to claude-haiku-3.5

Estimated savings: ~11.6M effective tokens/run (~94.7%)

Change in .github/workflows/smoke-copilot-byok.md:

 env:
-  COPILOT_MODEL: claude-opus-4.8
+  COPILOT_MODEL: claude-haiku-3.5

Pricing comparison:

| Model | Input | Output | Est. cost/run (no cache) |
|---|---|---|---|
| claude-opus-4.8 (current) | $15/M | $75/M | ~$8.78 |
| claude-haiku-3.5 (proposed) | $0.80/M | $4/M | ~$0.47 |

Effective tokens: ~12.3M/run → ~655K/run
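
As a rough check on the figures above, the sketch below rescales the current per-run numbers by the price ratio between the two models. It assumes the token mix per run is unchanged by the switch and that effective tokens scale linearly with price; neither is confirmed by the report. Because both Haiku prices are the same fraction of the Opus prices ($0.80/$15 = $4/$75), the input/output split does not affect the estimate. The function and variable names are illustrative only.

```python
from fractions import Fraction

# Prices in USD per million tokens, taken from the pricing comparison above.
OPUS = {"input": Fraction(15), "output": Fraction(75)}
HAIKU = {"input": Fraction(80, 100), "output": Fraction(4)}

def price_ratio(new, old):
    """Return the new/old price ratio, requiring input and output ratios to agree."""
    r_in = new["input"] / old["input"]
    r_out = new["output"] / old["output"]
    if r_in != r_out:
        raise ValueError("ratios differ; the input/output token split is needed")
    return r_in

ratio = price_ratio(HAIKU, OPUS)        # exactly 4/75
est_cost = 8.78 * float(ratio)          # current ~$8.78 per run, rescaled
est_effective = 12.3e6 * float(ratio)   # ASSUMPTION: effective tokens scale with price

print(f"price ratio: {float(ratio):.4f}")
print(f"estimated cost/run: ${est_cost:.2f}")
print(f"estimated effective tokens/run: {est_effective / 1e3:.0f}K")
```

Under these assumptions the rescaled cost rounds to the ~$0.47 in the table, and the rescaled effective-token figure lands within rounding of ~655K.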

The agent needs zero advanced reasoning: it confirms pre-computed results and writes a short 5–10 line comment. Haiku is purpose-built for exactly this level of task.

Note: the sibling smoke-copilot.md does NOT override COPILOT_MODEL and runs on the repo-level default (claude-sonnet-4.6). The BYOK override to Opus appears unnecessary for smoke validation — it just needs any model that can follow instructions and call tools.

If validating an Opus-specific inference path is genuinely required, consider a separate targeted test that runs less frequently (weekly instead of every 12h + every PR).

2. Fix Cache-Busting File Path

Estimated savings: additional ~20–30% after recommendation #1

Replace the run-unique file name with a deterministic one in the pre-step:

-TEST_FILE="$TEST_DIR/smoke-test-copilot-byok-${GITHUB_RUN_ID}.txt"
+TEST_FILE="$TEST_DIR/smoke-test-copilot-byok.txt"

The file is still freshly written each run (no stale-data risk), but the path is now constant. This allows prefix caching to activate on the stable instruction portion of the prompt.
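
The no-stale-data claim can be illustrated with a small Python sketch. The actual pre-step is a shell script; this version only mirrors its behavior under the assumption that the file is opened in truncating write mode on every run. The helper name and the sample contents are hypothetical.

```python
import tempfile
from pathlib import Path

def write_smoke_file(test_dir: Path, content: str) -> Path:
    """Write the smoke-test file at a fixed name, replacing any earlier copy."""
    test_file = test_dir / "smoke-test-copilot-byok.txt"
    test_file.write_text(content)  # write_text truncates, so old data cannot survive
    return test_file

with tempfile.TemporaryDirectory() as tmp:
    test_dir = Path(tmp)
    first = write_smoke_file(test_dir, "run A\n")   # hypothetical content for one run
    second = write_smoke_file(test_dir, "run B\n")  # hypothetical content for the next
    same_path = first == second
    latest = second.read_text()

print(same_path, repr(latest))
```

Both calls return the same path, and reading it back yields only the most recent content, which is the property the constant file name relies on.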

3. Move Dynamic Data to End of Prompt

Estimated savings: additional ~20% — enables prefix caching of the stable ~80% prefix

The current prompt inlines dynamic substitutions in the middle of the document, preventing prefix caching. Restructure so all ${{ steps.smoke-data.outputs.* }} references appear only in a final section:

-### 2. GitHub.com Connectivity
-Pre-step result: HTTP ${{ steps.smoke-data.outputs.SMOKE_HTTP_CODE }} from github.com.
-✅ if HTTP 200 or 301, ❌ otherwise.
-
-### 3. File Write/Read Test
-Pre-step wrote and read back: "${{ steps.smoke-data.outputs.SMOKE_FILE_CONTENT }}"
-File path: ${{ steps.smoke-data.outputs.SMOKE_FILE_PATH }}
-Verify by running `cat` on the file path using bash to confirm it exists.
+### 2. GitHub.com Connectivity
+Check the HTTP code in **Pre-Fetched Data** below. ✅ if HTTP 200 or 301, ❌ otherwise.
+
+### 3. File Write/Read Test
+Run `cat` on the file path from **Pre-Fetched Data** below to confirm it exists.
+
+...
 
-## Pre-Fetched PR Data
-
-```
-${{ steps.smoke-data.outputs.SMOKE_PR_DATA }}
-```
+## Pre-Fetched Data
+<!-- Dynamic section — keep all template substitutions here, at the end, to enable prefix caching above -->
+
+- HTTP code: `${{ steps.smoke-data.outputs.SMOKE_HTTP_CODE }}`
+- File path: `${{ steps.smoke-data.outputs.SMOKE_FILE_PATH }}`
+- File content: `${{ steps.smoke-data.outputs.SMOKE_FILE_CONTENT }}`
+- PR data:
+```
+${{ steps.smoke-data.outputs.SMOKE_PR_DATA }}
+```

With the stable instruction prefix fixed, prefix caching should activate on the majority of each prompt.
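
To make the effect of placement concrete, the sketch below renders a toy prompt in both layouts for two runs and measures how much of the text is identical from the start. Character-level comparison is only a proxy: real prefix caching operates on tokens and may have minimum-length or block-granularity rules that this ignores. The instruction strings and the per-run values are hypothetical stand-ins, not the real prompt.

```python
import os

# Hypothetical stand-ins for the stable instruction text.
INSTRUCTIONS_A = "Intro and rules. " * 50
INSTRUCTIONS_B = "Checks and output format. " * 50

def render_inline(pr_data: str) -> str:
    """Dynamic value sits between two stable blocks (current layout)."""
    return f"{INSTRUCTIONS_A}PR data: {pr_data}\n{INSTRUCTIONS_B}"

def render_at_end(pr_data: str) -> str:
    """All stable text first, dynamic value last (proposed layout)."""
    return f"{INSTRUCTIONS_A}{INSTRUCTIONS_B}PR data: {pr_data}\n"

def shared_prefix_fraction(a: str, b: str) -> float:
    """Fraction of prompt `a` that matches the start of prompt `b` exactly."""
    return len(os.path.commonprefix([a, b])) / len(a)

# Hypothetical per-run values that differ between two runs.
run1, run2 = "PR #101 open", "PR #202 open"
inline = shared_prefix_fraction(render_inline(run1), render_inline(run2))
at_end = shared_prefix_fraction(render_at_end(run1), render_at_end(run2))

print(f"inline: {inline:.1%} shared, at end: {at_end:.1%} shared")
```

In this toy setup the inline layout shares well under half of the prompt between runs, while the end-of-prompt layout shares almost all of it. The actual ratio depends on where the substitutions sit in the real 173-line prompt.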

Expected Impact

| Metric | Current | After #1 (model only) | After #1+#2+#3 (full) |
|---|---|---|---|
| Effective tokens/run | ~12.3M | ~655K | ~400K |
| Est. cost/run | ~$8.78 | ~$0.47 | ~$0.30 |
| Effective token multiplier | 30.8× | ~1.6× | ~1.0× |
| Reduction | | -94.7% | -96.7% |
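
The multiplier and reduction rows can be re-derived from the effective-token row. The sketch below assumes the multiplier is simply effective tokens divided by the ~398K total tokens per run quoted in the summary; the report does not spell out the formula. Since the inputs are rounded, the current multiplier computed this way may differ from 30.8× in the last digit.

```python
BASELINE_EFFECTIVE = 12.3e6   # current effective tokens/run, from the table
RAW_TOKENS = 398e3            # total tokens/run, from the summary at the top

def reduction_pct(after: float, before: float = BASELINE_EFFECTIVE) -> float:
    """Percentage reduction relative to the baseline."""
    return (1 - after / before) * 100

def multiplier(effective: float, raw: float = RAW_TOKENS) -> float:
    """ASSUMED definition: effective tokens per raw token."""
    return effective / raw

scenarios = [("after #1", 655e3), ("after #1+#2+#3", 400e3)]
for label, effective in scenarios:
    print(f"{label}: {multiplier(effective):.1f}x, -{reduction_pct(effective):.1f}%")

# Five runs at the fully optimized level, matching the projection period.
projected_total = 5 * 400e3
print(f"projected total over 5 runs: {projected_total / 1e6:.1f}M")
```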

Projected across 5 runs/12h period (as observed in report):

| | Current | Projected |
|---|---|---|
| Total effective tokens | 61.3M | ~2.0M |
| Savings | | -96.7% |

Implementation Checklist

  • In .github/workflows/smoke-copilot-byok.md: change COPILOT_MODEL: claude-opus-4.8 → COPILOT_MODEL: claude-haiku-3.5
  • In pre-step run: block: change smoke-test-copilot-byok-${GITHUB_RUN_ID}.txt → smoke-test-copilot-byok.txt
  • Restructure prompt: move all ${{ steps.smoke-data.outputs.* }} substitutions to a single Pre-Fetched Data section at the very end of the prompt body
  • Recompile: gh aw compile .github/workflows/smoke-copilot-byok.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Verify CI passes (smoke tests should still pass — task complexity unchanged)
  • Compare effective tokens on next run vs baseline (~12.3M avg); target <1M effective

Generated by Daily Copilot Token Optimization Advisor · sonnet46 1.7M ·
