[awf] api-proxy: correctly capture Responses API cache reads in token usage rollups #4311

Merged
lpcox merged 3 commits into main from copilot/awf-fix-cache-read-tokens
Jun 4, 2026

Copilot AI commented Jun 4, 2026

Contributor

token-usage.jsonl was recording cache_read_tokens: 0 for Responses API traffic even when upstream usage metadata reported cache hits, which then propagated incorrect cache metrics into downstream usage summaries. This PR aligns cache-read extraction with Responses API payload shapes and adds a mismatch warning when per-event cache reads disappear in the rollup.

  • Parser updates: Responses API cache-read extraction

    • Added shared cache-read extraction logic in containers/api-proxy/token-parsers.js.
    • Supports all observed shapes:
      • usage.cache_read_input_tokens
      • usage.prompt_tokens_details.cached_tokens
      • token-detail entries like { token_type: "cache_read", token_count: N } (including nested details arrays).
    • Applied across both non-streaming JSON and streaming/event parsing paths.
  • Rollup consistency guard

    • In HTTP and WebSocket trackers, track the highest cache-read value observed in per-event usage.
    • Emit token_cache_read_rollup_mismatch when observed cache reads are non-zero but normalized rollup resolves to zero.
    • This gives explicit signal when upstream event-level cache data is present but final aggregate is inconsistent.
  • Targeted coverage for regression shape

    • Added parser tests for Responses API payloads with prompt_tokens_details.cached_tokens: 32128.
    • Added parser tests for token_type: "cache_read" token-detail entries.
    • Added tracker test asserting mismatch warning emission when event-level cache reads are overwritten to zero in final rollup.
// New extraction path now handled
usage: {
  input_tokens: 40000,
  output_tokens: 64,
  prompt_tokens_details: { cached_tokens: 32128 }
}
// => normalized.cache_read_tokens = 32128
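
The three payload shapes listed above can be handled by one small helper. The following is a minimal sketch, not the PR's actual extractCacheReadTokens(): the function names, the precedence order between shapes, and the `token_details` field name are assumptions made for illustration, since the description does not say where token-detail entries live in the payload.

```javascript
// Hypothetical sketch; not the PR's actual extractCacheReadTokens().

// Sums token_count over { token_type: "cache_read" } entries, descending into
// nested `details` arrays. Returns null when no numeric cache_read count is found.
function sumCacheReadEntries(entries) {
  if (!Array.isArray(entries)) return null;
  let total = 0;
  let found = false;
  for (const entry of entries) {
    if (!entry || typeof entry !== 'object') continue;
    if (entry.token_type === 'cache_read' && typeof entry.token_count === 'number') {
      total += entry.token_count;
      found = true;
    }
    const nested = sumCacheReadEntries(entry.details);
    if (nested !== null) {
      total += nested;
      found = true;
    }
  }
  return found ? total : null;
}

// Tries each observed shape in turn (assumed order) and falls back to 0.
function extractCacheReadTokensSketch(usage) {
  if (!usage || typeof usage !== 'object') return 0;
  if (typeof usage.cache_read_input_tokens === 'number') {
    return usage.cache_read_input_tokens;
  }
  const cached = usage.prompt_tokens_details && usage.prompt_tokens_details.cached_tokens;
  if (typeof cached === 'number') return cached;
  // Assumed field name: the PR text does not name the array holding detail entries.
  const fromDetails = sumCacheReadEntries(usage.token_details);
  return fromDetails === null ? 0 : fromDetails;
}
```

Returning null from the inner helper keeps "no cache_read entry present" distinct from "cache_read entry with a count of 0", which matters if a caller wants to fall through to another shape.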

Copilot AI changed the title [WIP] Fix cache_read_tokens reporting in API proxy [awf] api-proxy: correctly capture Responses API cache reads in token usage rollups Jun 4, 2026
Copilot finished work on behalf of lpcox June 4, 2026 14:42
Copilot AI requested a review from lpcox June 4, 2026 14:42
@lpcox lpcox marked this pull request as ready for review June 4, 2026 15:39
Copilot AI review requested due to automatic review settings June 4, 2026 15:39
@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 96.53% | 96.57% | 📈 +0.04% |
| Statements | 96.41% | 96.45% | 📈 +0.04% |
| Functions | 98.69% | 98.69% | ➡️ +0.00% |
| Branches | 90.53% | 90.56% | 📈 +0.03% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/config-writer.ts | 89.3% → 90.9% (+1.65%) | 89.3% → 90.9% (+1.65%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes undercounted cache-read token metrics for OpenAI Responses API traffic in the api-proxy by improving cache-read extraction across multiple usage payload shapes, and adds a guard/warning when event-level cache-read usage is lost during rollup.

Changes:

  • Added shared cache-read extraction logic to parse prompt_tokens_details.cached_tokens and token_type: "cache_read" detail entries.
  • Updated HTTP and WebSocket token trackers to record the maximum per-event cache-read tokens and warn on rollup-to-zero mismatches.
  • Added targeted regression tests for Responses API cache-read payload shapes and the HTTP rollup mismatch warning.
Show a summary per file
| File | Description |
| --- | --- |
| containers/api-proxy/token-parsers.js | Adds extractCacheReadTokens() and wires it into JSON + SSE parsing paths. |
| containers/api-proxy/token-tracker-http.js | Tracks max observed per-event cache reads and emits a rollup mismatch warning when needed. |
| containers/api-proxy/token-tracker-ws.js | Adds the same observed-cache-read tracking and mismatch warning for WebSocket streaming. |
| containers/api-proxy/token-tracker.parsing.test.js | Adds parser regression tests for Responses API cached token shapes and token detail entries. |
| containers/api-proxy/token-tracker.http.test.js | Adds an integration test asserting mismatch warning emission when rollup overwrites cache reads to zero. |
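
The tracker-side guard summarized above (remember the highest per-event cache-read value, then warn if the final rollup says zero) could look roughly like the sketch below. createCacheReadGuard is a hypothetical helper introduced only for illustration; judging by the review excerpts, the PR inlines this logic in the trackers and reports through logRequest(). The `log` callback here mirrors that (level, event, fields) call shape as an assumption.

```javascript
// Hypothetical helper; the real trackers inline this logic and call logRequest().
function createCacheReadGuard(log) {
  let observedCacheReadTokens = 0;
  return {
    // Call for each per-event usage payload; keeps the highest value seen.
    observe(cacheReadTokens) {
      if (typeof cacheReadTokens === 'number' && cacheReadTokens > observedCacheReadTokens) {
        observedCacheReadTokens = cacheReadTokens;
      }
    },
    // Call once with the final normalized rollup. Returns true if a warning was emitted.
    check(normalized, context) {
      if (observedCacheReadTokens > 0 && normalized.cache_read_tokens === 0) {
        log('warn', 'token_cache_read_rollup_mismatch', {
          ...context,
          observed_cache_read_tokens: observedCacheReadTokens,
          rolled_up_cache_read_tokens: normalized.cache_read_tokens,
        });
        return true;
      }
      return false;
    },
  };
}

// Usage: events report 99 cached tokens, but the rollup ends up at zero.
const warnings = [];
const guard = createCacheReadGuard((level, event, fields) => warnings.push({ level, event, ...fields }));
guard.observe(0);
guard.observe(99);
guard.observe(12);
guard.check({ cache_read_tokens: 0 }, { request_id: 'example-request', transport: 'http' });
```

Tracking the maximum rather than the last value means a later event that reports zero cannot hide an earlier non-zero reading from the check.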

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 3

Comment on lines +92 to +102
let total = 0;
let found = false;
for (const entry of entries) {
  if (!entry || typeof entry !== 'object') continue;
  if (entry.token_type !== 'cache_read') continue;
  const count = entry.token_count;
  if (typeof count === 'number') {
    total += count;
    found = true;
  }
}
Comment on lines +224 to +238
setTimeout(() => {
  const lines = writeSpy.mock.calls
    .map((call) => call[0])
    .filter((line) => typeof line === 'string' && line.includes('test-cache-rollup-mismatch'))
    .map((line) => {
      try { return JSON.parse(line); } catch { return null; }
    })
    .filter(Boolean);

  expect(lines.some((line) => line.event === 'token_cache_read_rollup_mismatch'
    && line.observed_cache_read_tokens === 99
    && line.rolled_up_cache_read_tokens === 0)).toBe(true);
  writeSpy.mockRestore();
  done();
}, 10);
Comment on lines +203 to +212
if (observedCacheReadTokens > 0 && normalized.cache_read_tokens === 0) {
  logRequest('warn', 'token_cache_read_rollup_mismatch', {
    request_id: requestId,
    provider,
    model: streamingModel || 'unknown',
    observed_cache_read_tokens: observedCacheReadTokens,
    rolled_up_cache_read_tokens: normalized.cache_read_tokens,
    streaming: true,
    transport: 'websocket',
  });
@github-actions github-actions Bot mentioned this pull request Jun 4, 2026

@lpcox

lpcox commented Jun 4, 2026

Collaborator

@copilot address review feedback

Copilot AI commented Jun 4, 2026

Contributor Author

@copilot address review feedback

Addressed all three review points in the latest commit:

  1. extractCacheReadTokens() nested details — the inner loop now also iterates over entry.details arrays, so { token_type: "cache_read" } entries nested one level deeper are correctly summed.

  2. HTTP test setTimeout spy cleanup — wrapped the assertion block in try/catch/finally so mockRestore() and done(err) always run even if an assertion throws.

  3. WebSocket mismatch test: added the test "warns when per-frame cache-read is later overwritten to zero in WebSocket rollup" to token-tracker.websocket.test.js, covering the token_cache_read_rollup_mismatch emission with transport: 'websocket'.
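
The spy-cleanup fix in point 2 follows a common pattern for timer-based tests: capture any assertion failure, always restore the spy, and pass the failure (if any) to the completion callback. The sketch below shows that pattern in plain JavaScript. runWithCleanup is a hypothetical helper, not code from the PR, which applies the same try/catch/finally inline inside the Jest test.

```javascript
// Hypothetical helper illustrating the try/catch/finally cleanup pattern.
function runWithCleanup(assertFn, restore, done) {
  let failure;
  try {
    assertFn();
  } catch (err) {
    // Hold on to the failure instead of letting it escape the timer callback.
    failure = err;
  } finally {
    restore();     // the spy is restored whether or not the assertions passed
    done(failure); // undefined on success, the captured error on failure
  }
}

// Usage inside a deferred check, with stand-ins for writeSpy.mockRestore() and done():
const cleanupLog = [];
setTimeout(() => {
  runWithCleanup(
    () => { /* assertions on captured log lines would go here */ },
    () => cleanupLog.push('restored'),
    (err) => cleanupLog.push(err ? 'failed' : 'passed'),
  );
}, 10);
```

Without the catch, an assertion that throws inside setTimeout would skip both the restore and the done() call, leaving the spy installed and the test waiting for a timeout.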

Copilot finished work on behalf of lpcox June 4, 2026 15:58
@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Smoke Test: Claude Engine

  • ✅ GitHub API: 2 PR entries found
  • ✅ GitHub check: playwright_check=PASS
  • ✅ File verify: smoke-test-claude-26963433955.txt exists

Result: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Smoke Test: Copilot BYOK (Direct) — Run #26963433956

Test Result
GitHub MCP (list PRs) Mark validateAwfFileConfig as test-only internal export
GitHub.com connectivity ✅ HTTP 200/301
File write/read /tmp/gh-aw/agent/smoke-test-copilot-byok-26963433956.txt
BYOK inference (api-proxy → api.githubcopilot.com)

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY) via api-proxy → api.githubcopilot.com
Author: @Copilot — Assignees: @lpcox, @Copilot
Overall: PASS

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

🔬 Smoke Test Results

| Test | Result |
| --- | --- |
| GitHub MCP connectivity | ✅ PR listed successfully |
| GitHub.com HTTP | ⚠️ Pre-step data not injected (template vars unexpanded) |
| File write/read | ⚠️ Pre-step data not injected (template vars unexpanded) |

PR: [awf] api-proxy: correctly capture Responses API cache reads in token usage rollups
Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PARTIAL — MCP ✅, pre-step smoke data unavailable (workflow template vars not expanded)

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor
  • Mark validateAwfFileConfig as test-only internal export ✅
  • Narrow resetAgentExternallyKilled to internal test helper surface ✅
  • Browser title check: ✅
  • File write/read check: ✅
  • Build check: ✅
  • Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Result | Notes |
| --- | --- | --- |
| 1. Module Loading | ✅ Pass | otel.js loads successfully; exports: startRequestSpan, setTokenAttributes, endSpan, endSpanError, shutdown, isEnabled |
| 2. Test Suite | ✅ Pass | 33/33 tests passed; spans, token attrs, parent context, OTLP export, file exporter, graceful degradation all covered |
| 3. Env Var Forwarding | ✅ Pass | src/services/api-proxy-service-config.ts forwards OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, and OTEL_SERVICE_NAME to the api-proxy container |
| 4. Token Tracker Integration | ✅ Pass | onUsage callback exists in token-tracker-http.js (line 62/268) as the OTEL hook point |
| 5. OTEL Diagnostics | ✅ Pass | No OTLP endpoint configured in this run → file fallback (otel.jsonl); no unexpected errors |

All 5 scenarios pass. OTEL tracing integration is functioning correctly.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Chroot Smoke Test Results

Runtime Host Version Chroot Version Match?
Python Python 3.12.13 Python 3.12.3
Node.js v24.16.0 v22.22.3
Go go1.22.12 go1.22.12

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #4311 · sonnet46 931.5K ·

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Smoke Test Results

| Check | Result |
| --- | --- |
| Redis PING | ❌ Timeout (no response) |
| PostgreSQL pg_isready | ❌ No response |
| PostgreSQL SELECT 1 | ❌ Not attempted (pg_isready failed) |

Overall: FAIL. host.docker.internal is not reachable from this environment. Service containers appear unavailable.

🔌 Service connectivity validated by Smoke Services

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Smoke Test Results: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@lpcox lpcox merged commit 9eb9aa4 into main Jun 4, 2026
67 of 68 checks passed
@lpcox lpcox deleted the copilot/awf-fix-cache-read-tokens branch June 4, 2026 16:49
Development

Successfully merging this pull request may close these issues.

[awf] api-proxy: cache_read_tokens reported as 0 despite cache hits in model usage metadata

3 participants