[awf] api-proxy: correctly capture Responses API cache reads in token usage rollups by Copilot · Pull Request #4311 · github/gh-aw-firewall

Copilot · 2026-06-04T14:34:54Z

token-usage.jsonl was recording cache_read_tokens: 0 for Responses API traffic even when upstream usage metadata reported cache hits, which then propagated incorrect cache metrics into downstream usage summaries. This PR aligns cache-read extraction with Responses API payload shapes and adds a mismatch warning when per-event cache reads disappear in the rollup.

Parser updates: Responses API cache-read extraction
- Added shared cache-read extraction logic in containers/api-proxy/token-parsers.js.
- Supports all observed shapes:
  - usage.cache_read_input_tokens
  - usage.prompt_tokens_details.cached_tokens
  - token-detail entries like { token_type: "cache_read", token_count: N } (including nested details arrays).
- Applied across both non-streaming JSON and streaming/event parsing paths.

Rollup consistency guard
- In HTTP and WebSocket trackers, track the highest cache-read value observed in per-event usage.
- Emit token_cache_read_rollup_mismatch when observed cache reads are non-zero but normalized rollup resolves to zero.
- This gives explicit signal when upstream event-level cache data is present but final aggregate is inconsistent.
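
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in token-tracker-http.js or token-tracker-ws.js: `createCacheReadGuard`, `observe`, `check`, and the `logWarn` callback are hypothetical names. Only the event name and the two `*_cache_read_tokens` fields come from the PR.

```javascript
// Hypothetical sketch of the rollup consistency guard; names other than the
// event name and the two *_cache_read_tokens fields are assumptions.
function createCacheReadGuard(logWarn) {
  // Highest cache-read value seen in any per-event usage payload so far.
  let observedCacheReadTokens = 0;

  return {
    // Call once per streamed event (or once for a non-streaming body).
    observe(eventCacheReadTokens) {
      if (typeof eventCacheReadTokens === 'number'
        && eventCacheReadTokens > observedCacheReadTokens) {
        observedCacheReadTokens = eventCacheReadTokens;
      }
    },

    // Call after the final rollup has been normalized.
    // Returns true when a mismatch warning was emitted.
    check(normalized, context = {}) {
      if (observedCacheReadTokens > 0 && normalized.cache_read_tokens === 0) {
        logWarn('token_cache_read_rollup_mismatch', {
          ...context,
          observed_cache_read_tokens: observedCacheReadTokens,
          rolled_up_cache_read_tokens: normalized.cache_read_tokens,
        });
        return true;
      }
      return false;
    },
  };
}
```

Tracking the maximum rather than the last value means a later event that omits or zeroes its cache-read count cannot hide an earlier non-zero observation.
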

Targeted coverage for regression shape
- Added parser tests for Responses API payloads with prompt_tokens_details.cached_tokens: 32128.
- Added parser tests for token_type: "cache_read" token-detail entries.
- Added tracker test asserting mismatch warning emission when event-level cache reads are overwritten to zero in final rollup.

// New extraction path now handled
usage: {
  input_tokens: 40000,
  output_tokens: 64,
  prompt_tokens_details: { cached_tokens: 32128 }
}
// => normalized.cache_read_tokens = 32128
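
For orientation, a shared extractor covering the three shapes listed under "Parser updates" might look like the sketch below. It is not the implementation in containers/api-proxy/token-parsers.js. The order in which the shapes are checked, the `token_details` field name used to locate token-detail entries, and the helper `sumCacheReadEntries` are all assumptions made for illustration. The sketch also recurses through `details` arrays at any depth, which is a simplification.

```javascript
// Hypothetical sketch; field name `token_details` and the precedence order
// between shapes are assumptions, not taken from the PR.
function sumCacheReadEntries(entries) {
  // Returns the summed token_count of cache_read entries, or null if none exist.
  if (!Array.isArray(entries)) return null;
  let total = 0;
  let found = false;
  for (const entry of entries) {
    if (!entry || typeof entry !== 'object') continue;
    if (entry.token_type === 'cache_read' && typeof entry.token_count === 'number') {
      total += entry.token_count;
      found = true;
    }
    // Also look inside nested details arrays.
    const nested = sumCacheReadEntries(entry.details);
    if (nested !== null) {
      total += nested;
      found = true;
    }
  }
  return found ? total : null;
}

function extractCacheReadTokens(usage) {
  if (!usage || typeof usage !== 'object') return 0;

  // Shape 1: usage.cache_read_input_tokens
  if (typeof usage.cache_read_input_tokens === 'number') {
    return usage.cache_read_input_tokens;
  }

  // Shape 2: usage.prompt_tokens_details.cached_tokens
  const details = usage.prompt_tokens_details;
  if (details && typeof details.cached_tokens === 'number') {
    return details.cached_tokens;
  }

  // Shape 3: token-detail entries such as { token_type: "cache_read", token_count: N }
  return sumCacheReadEntries(usage.token_details) ?? 0;
}
```

Returning `null` from the helper when no `cache_read` entry exists keeps "no cache data present" distinct from "cache data present and equal to zero" until the final fallback.
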

github-actions · 2026-06-04T15:41:09Z

✅ Coverage Check Passed

Overall Coverage

Metric	Base	PR	Delta
Lines	96.53%	96.57%	📈 +0.04%
Statements	96.41%	96.45%	📈 +0.04%
Functions	98.69%	98.69%	➡️ +0.00%
Branches	90.53%	90.56%	📈 +0.03%

📁 Per-file Coverage Changes (1 files)

File	Lines (Before → After)	Statements (Before → After)
`src/config-writer.ts`	89.3% → 90.9% (+1.65%)	89.3% → 90.9% (+1.65%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot

Pull request overview

This PR fixes undercounted cache-read token metrics for OpenAI Responses API traffic in the api-proxy by improving cache-read extraction across multiple usage payload shapes, and adds a guard/warning when event-level cache-read usage is lost during rollup.

Changes:

Added shared cache-read extraction logic to parse prompt_tokens_details.cached_tokens and token_type: "cache_read" detail entries.
Updated HTTP and WebSocket token trackers to record the maximum per-event cache-read tokens and warn on rollup-to-zero mismatches.
Added targeted regression tests for Responses API cache-read payload shapes and the HTTP rollup mismatch warning.

Show a summary per file

File	Description
containers/api-proxy/token-parsers.js	Adds `extractCacheReadTokens()` and wires it into JSON + SSE parsing paths.
containers/api-proxy/token-tracker-http.js	Tracks max observed per-event cache reads and emits a rollup mismatch warning when needed.
containers/api-proxy/token-tracker-ws.js	Adds the same observed-cache-read tracking and mismatch warning for WebSocket streaming.
containers/api-proxy/token-tracker.parsing.test.js	Adds parser regression tests for Responses API cached token shapes and token detail entries.
containers/api-proxy/token-tracker.http.test.js	Adds an integration test asserting mismatch warning emission when rollup overwrites cache reads to zero.

Copilot's findings

Files reviewed: 5/5 changed files
Comments generated: 3

+    let total = 0;
+    let found = false;
+    for (const entry of entries) {
+      if (!entry || typeof entry !== 'object') continue;
+      if (entry.token_type !== 'cache_read') continue;
+      const count = entry.token_count;
+      if (typeof count === 'number') {
+        total += count;
+        found = true;
+      }
+    }


+    setTimeout(() => {
+      const lines = writeSpy.mock.calls
+        .map((call) => call[0])
+        .filter((line) => typeof line === 'string' && line.includes('test-cache-rollup-mismatch'))
+        .map((line) => {
+          try { return JSON.parse(line); } catch { return null; }
+        })
+        .filter(Boolean);
+
+      expect(lines.some((line) => line.event === 'token_cache_read_rollup_mismatch'
+        && line.observed_cache_read_tokens === 99
+        && line.rolled_up_cache_read_tokens === 0)).toBe(true);
+      writeSpy.mockRestore();
+      done();
+    }, 10);


+    if (observedCacheReadTokens > 0 && normalized.cache_read_tokens === 0) {
+      logRequest('warn', 'token_cache_read_rollup_mismatch', {
+        request_id: requestId,
+        provider,
+        model: streamingModel || 'unknown',
+        observed_cache_read_tokens: observedCacheReadTokens,
+        rolled_up_cache_read_tokens: normalized.cache_read_tokens,
+        streaming: true,
+        transport: 'websocket',
+      });


lpcox · 2026-06-04T15:51:50Z

@copilot address review feedback

…est spy cleanup, WS mismatch test

Copilot · 2026-06-04T15:57:46Z

Addressed all three review points in the latest commit:

- extractCacheReadTokens() nested details: the inner loop now also iterates over entry.details arrays, so { token_type: "cache_read" } entries nested one level deeper are correctly summed.
- HTTP test setTimeout spy cleanup: wrapped the assertion block in try/catch/finally so mockRestore() and done(err) always run even if an assertion throws.
- WebSocket mismatch test: added the test "warns when per-frame cache-read is later overwritten to zero in WebSocket rollup" to token-tracker.websocket.test.js, covering the token_cache_read_rollup_mismatch emission with transport: 'websocket'.
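
The spy-cleanup pattern from the second point can be sketched as a small helper. The PR applies the pattern inline inside the test's setTimeout callback, so `runDeferredAssertions` and its parameter names are hypothetical and exist only to make the control flow explicit.

```javascript
// Hypothetical helper illustrating the try/catch/finally cleanup pattern.
// assertFn:   function containing the expect(...) calls; may throw
// restoreSpy: cleanup that must always run, e.g. () => writeSpy.mockRestore()
// done:       callback-style test completion function, e.g. Jest's done
function runDeferredAssertions(assertFn, restoreSpy, done) {
  let error;
  try {
    assertFn();
  } catch (err) {
    // Capture the failure instead of letting it escape the timer callback.
    error = err;
  } finally {
    restoreSpy();
    // done(undefined) signals success; done(err) fails the test with the error.
    done(error);
  }
}

// Possible use inside a callback-style test (not executed here):
// setTimeout(() => runDeferredAssertions(
//   () => expect(lines.length).toBeGreaterThan(0),
//   () => writeSpy.mockRestore(),
//   done,
// ), 10);
```

Without the wrapper, an assertion that throws inside the timer callback would skip both the restore and the `done` call, leaving the spy installed for later tests and the test itself to time out.
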

github-actions · 2026-06-04T16:12:30Z

Smoke Test: Claude Engine

✅ GitHub API: 2 PR entries found
✅ GitHub check: playwright_check=PASS
✅ File verify: smoke-test-claude-26963433955.txt exists

Result: PASS

💥 [THE END] — Illustrated by Smoke Claude

github-actions · 2026-06-04T16:13:17Z

Smoke Test: Copilot BYOK (Direct) — Run #26963433956

Test	Result
GitHub MCP (list PRs)	✅ `Mark validateAwfFileConfig as test-only internal export`
GitHub.com connectivity	✅ HTTP 200/301
File write/read	✅ `/tmp/gh-aw/agent/smoke-test-copilot-byok-26963433956.txt`
BYOK inference (api-proxy → api.githubcopilot.com)	✅

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY) via api-proxy → api.githubcopilot.com
Author: @Copilot — Assignees: @lpcox, @Copilot
Overall: PASS

🔑 BYOK report filed by Smoke Copilot BYOK

github-actions · 2026-06-04T16:13:23Z

🔬 Smoke Test Results

Test	Result
GitHub MCP connectivity	✅ PR listed successfully
GitHub.com HTTP	⚠️ Pre-step data not injected (template vars unexpanded)
File write/read	⚠️ Pre-step data not injected (template vars unexpanded)

PR: [awf] api-proxy: correctly capture Responses API cache reads in token usage rollups
Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PARTIAL — MCP ✅, pre-step smoke data unavailable (workflow template vars not expanded)

📰 BREAKING: Report filed by Smoke Copilot

github-actions · 2026-06-04T16:14:01Z

`Mark validateAwfFileConfig as test-only internal export` ✅
`Narrow resetAgentExternallyKilled to internal test helper surface` ✅
Browser title check: ✅
File write/read check: ✅
Build check: ✅
Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

github-actions · 2026-06-04T16:14:13Z

Smoke Test: API Proxy OpenTelemetry Tracing

Scenario	Result	Notes
1. Module Loading	✅ Pass	`otel.js` loads successfully; exports: `startRequestSpan`, `setTokenAttributes`, `endSpan`, `endSpanError`, `shutdown`, `isEnabled`
2. Test Suite	✅ Pass	33/33 tests passed — spans, token attrs, parent context, OTLP export, file exporter, graceful degradation all covered
3. Env Var Forwarding	✅ Pass	`src/services/api-proxy-service-config.ts` forwards `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `GITHUB_AW_OTEL_TRACE_ID`, `GITHUB_AW_OTEL_PARENT_SPAN_ID`, and `OTEL_SERVICE_NAME` to the api-proxy container
4. Token Tracker Integration	✅ Pass	`onUsage` callback exists in `token-tracker-http.js` (line 62/268) as the OTEL hook point
5. OTEL Diagnostics	✅ Pass	No OTLP endpoint configured in this run → file fallback (`otel.jsonl`); no unexpected errors

All 5 scenarios pass. OTEL tracing integration is functioning correctly.

📡 OTel tracing validated by Smoke OTel Tracing

github-actions · 2026-06-04T16:14:25Z

Chroot Smoke Test Results

Runtime	Host Version	Chroot Version	Match?
Python	Python 3.12.13	Python 3.12.3	❌
Node.js	v24.16.0	v22.22.3	❌
Go	go1.22.12	go1.22.12	✅

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

github-actions · 2026-06-04T16:15:19Z

🏗️ Build Test Suite Results

Ecosystem	Project	Build/Install	Tests	Status
Bun	elysia	✅	1/1 passed	✅ PASS
Bun	hono	✅	1/1 passed	✅ PASS
C++	fmt	✅	N/A	✅ PASS
C++	json	✅	N/A	✅ PASS
Deno	oak	N/A	1/1 passed	✅ PASS
Deno	std	N/A	1/1 passed	✅ PASS
.NET	hello-world	✅	N/A	✅ PASS
.NET	json-parse	✅	N/A	✅ PASS
Go	color	✅	1/1 passed	✅ PASS
Go	env	✅	1/1 passed	✅ PASS
Go	uuid	✅	1/1 passed	✅ PASS
Java	gson	✅	1/1 passed	✅ PASS
Java	caffeine	✅	1/1 passed	✅ PASS
Node.js	clsx	✅	All passed	✅ PASS
Node.js	execa	✅	All passed	✅ PASS
Node.js	p-limit	✅	All passed	✅ PASS
Rust	fd	✅	1/1 passed	✅ PASS
Rust	zoxide	✅	1/1 passed	✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #4311 · sonnet46 931.5K · ◷

github-actions · 2026-06-04T16:16:01Z

Smoke Test Results

Check	Result
Redis PING	❌ Timeout (no response)
PostgreSQL pg_isready	❌ No response
PostgreSQL SELECT 1	❌ Not attempted (pg_isready failed)

Overall: FAIL — host.docker.internal is not reachable from this environment. Service containers appear unavailable.

🔌 Service connectivity validated by Smoke Services

github-actions · 2026-06-04T16:16:57Z

Smoke Test Results: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

Initial plan

f53800c

Copilot AI assigned Copilot and lpcox Jun 4, 2026

Copilot started work on behalf of lpcox June 4, 2026 14:35

Copilot AI linked an issue Jun 4, 2026 that may be closed by this pull request

[awf] api-proxy: cache_read_tokens reported as 0 despite cache hits in model usage metadata #4310

Closed

fix(api-proxy): parse responses cache reads and warn on rollup mismatch

d1dfce1

Copilot AI changed the title ~~[WIP] Fix cache_read_tokens reporting in API proxy~~ [awf] api-proxy: correctly capture Responses API cache reads in token usage rollups Jun 4, 2026

Copilot finished work on behalf of lpcox June 4, 2026 14:42

Copilot AI requested a review from lpcox June 4, 2026 14:42

lpcox marked this pull request as ready for review June 4, 2026 15:39

Copilot AI review requested due to automatic review settings June 4, 2026 15:39

Copilot started reviewing on behalf of lpcox June 4, 2026 15:39

Copilot AI reviewed Jun 4, 2026
