[awf] api-proxy: correctly capture Responses API cache reads in token usage rollups #4311
✅ Coverage Check Passed
Pull request overview
This PR fixes undercounted cache-read token metrics for OpenAI Responses API traffic in the api-proxy by improving cache-read extraction across multiple usage payload shapes, and adds a guard/warning when event-level cache-read usage is lost during rollup.
Changes:
- Added shared cache-read extraction logic to parse `prompt_tokens_details.cached_tokens` and `token_type: "cache_read"` detail entries.
- Updated HTTP and WebSocket token trackers to record the maximum per-event cache-read tokens and warn on rollup-to-zero mismatches.
- Added targeted regression tests for Responses API cache-read payload shapes and the HTTP rollup mismatch warning.
Show a summary per file
| File | Description |
|---|---|
| containers/api-proxy/token-parsers.js | Adds extractCacheReadTokens() and wires it into JSON + SSE parsing paths. |
| containers/api-proxy/token-tracker-http.js | Tracks max observed per-event cache reads and emits a rollup mismatch warning when needed. |
| containers/api-proxy/token-tracker-ws.js | Adds the same observed-cache-read tracking and mismatch warning for WebSocket streaming. |
| containers/api-proxy/token-tracker.parsing.test.js | Adds parser regression tests for Responses API cached token shapes and token detail entries. |
| containers/api-proxy/token-tracker.http.test.js | Adds an integration test asserting mismatch warning emission when rollup overwrites cache reads to zero. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 3
    let total = 0;
    let found = false;
    for (const entry of entries) {
      if (!entry || typeof entry !== 'object') continue;
      if (entry.token_type !== 'cache_read') continue;
      const count = entry.token_count;
      if (typeof count === 'number') {
        total += count;
        found = true;
      }
    }

    setTimeout(() => {
      const lines = writeSpy.mock.calls
        .map((call) => call[0])
        .filter((line) => typeof line === 'string' && line.includes('test-cache-rollup-mismatch'))
        .map((line) => {
          try { return JSON.parse(line); } catch { return null; }
        })
        .filter(Boolean);

      expect(lines.some((line) => line.event === 'token_cache_read_rollup_mismatch'
        && line.observed_cache_read_tokens === 99
        && line.rolled_up_cache_read_tokens === 0)).toBe(true);
      writeSpy.mockRestore();
      done();
    }, 10);

    if (observedCacheReadTokens > 0 && normalized.cache_read_tokens === 0) {
      logRequest('warn', 'token_cache_read_rollup_mismatch', {
        request_id: requestId,
        provider,
        model: streamingModel || 'unknown',
        observed_cache_read_tokens: observedCacheReadTokens,
        rolled_up_cache_read_tokens: normalized.cache_read_tokens,
        streaming: true,
        transport: 'websocket',
      });
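
The guard in the excerpt above can be sketched in isolation. This is a minimal illustration, not the tracker's actual code: `createCacheReadGuard` and its `observe`/`check` methods are hypothetical names, and the `logRequest(level, event, fields)` call shape is inferred from the excerpt.

```javascript
// Hypothetical helper (not from the PR): remembers the largest per-event
// cache-read count and reports a mismatch when the rollup drops it to zero.
function createCacheReadGuard(logRequest) {
  let observedCacheReadTokens = 0;

  return {
    // Call once per parsed usage event (SSE chunk or WebSocket frame).
    observe(eventCacheReadTokens) {
      if (Number.isFinite(eventCacheReadTokens)
        && eventCacheReadTokens > observedCacheReadTokens) {
        observedCacheReadTokens = eventCacheReadTokens;
      }
    },
    // Call once after the usage rollup has been normalized.
    // Returns true when a mismatch warning was emitted.
    check(normalized, context) {
      if (observedCacheReadTokens > 0 && normalized.cache_read_tokens === 0) {
        logRequest('warn', 'token_cache_read_rollup_mismatch', {
          ...context,
          observed_cache_read_tokens: observedCacheReadTokens,
          rolled_up_cache_read_tokens: normalized.cache_read_tokens,
        });
        return true;
      }
      return false;
    },
  };
}

// Usage with an in-memory logger standing in for the real one.
const logged = [];
const guard = createCacheReadGuard((level, event, fields) => {
  logged.push({ level, event, ...fields });
});
guard.observe(0);
guard.observe(99);
guard.observe(40); // lower than the current maximum, so it is ignored
const mismatch = guard.check(
  { cache_read_tokens: 0 },
  { request_id: 'req-1', transport: 'websocket' },
);
```

Tracking the maximum rather than the last value means a later event that omits cache-read data cannot hide an earlier non-zero observation.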
@copilot address review feedback
…est spy cleanup, WS mismatch test
Addressed all three review points in the latest commit:
Smoke Test: Claude Engine
Result: PASS
Smoke Test: Copilot BYOK (Direct) — Run #26963433956
Running in direct BYOK mode (
🔬 Smoke Test Results
PR: [awf] api-proxy: correctly capture Responses API cache reads in token usage rollups
Overall: PARTIAL. MCP ✅; pre-step smoke data unavailable (workflow template vars not expanded)

Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
See Network Configuration for more information.
Smoke Test: API Proxy OpenTelemetry Tracing
All 5 scenarios pass. OTEL tracing integration is functioning correctly.
Chroot Smoke Test Results
Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot.
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
Smoke Test Results
Overall: FAIL —
Smoke Test Results: FAIL

Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "localhost"
See Network Configuration for more information.
`token-usage.jsonl` was recording `cache_read_tokens: 0` for Responses API traffic even when upstream usage metadata reported cache hits, which then propagated incorrect cache metrics into downstream usage summaries. This PR aligns cache-read extraction with Responses API payload shapes and adds a mismatch warning when per-event cache reads disappear in the rollup.

Parser updates: Responses API cache-read extraction
- Shared cache-read extraction lives in `containers/api-proxy/token-parsers.js`.
- Supported shapes:
  - `usage.cache_read_input_tokens`
  - `usage.prompt_tokens_details.cached_tokens`
  - `{ token_type: "cache_read", token_count: N }` detail entries (including nested `details` arrays)

Rollup consistency guard
- The HTTP and WebSocket trackers emit `token_cache_read_rollup_mismatch` when observed cache reads are non-zero but the normalized rollup resolves to zero.

Targeted coverage for regression shape
- Parser regression test for `prompt_tokens_details.cached_tokens: 32128`.
- Parser regression test for `token_type: "cache_read"` token-detail entries.
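
The three payload shapes listed under the parser updates could be handled roughly as follows. This is a sketch under stated assumptions, not the PR's `extractCacheReadTokens()`: the function names, the precedence order, and the `token_details` field used to carry detail entries are all hypothetical.

```javascript
// Sum { token_type: "cache_read", token_count: N } entries, descending into
// nested `details` arrays. Returns null when no such entry is found, so the
// caller can tell "absent" apart from "present but zero".
function sumCacheReadEntries(entries) {
  if (!Array.isArray(entries)) return null;
  let total = 0;
  let found = false;
  for (const entry of entries) {
    if (!entry || typeof entry !== 'object') continue;
    if (entry.token_type === 'cache_read' && typeof entry.token_count === 'number') {
      total += entry.token_count;
      found = true;
    }
    const nested = sumCacheReadEntries(entry.details);
    if (nested !== null) {
      total += nested;
      found = true;
    }
  }
  return found ? total : null;
}

// Assumption: explicit counters win over detail entries, and detail entries
// live under a hypothetical `usage.token_details` array.
function extractCacheReadSketch(usage) {
  if (!usage || typeof usage !== 'object') return 0;
  if (typeof usage.cache_read_input_tokens === 'number') {
    return usage.cache_read_input_tokens;
  }
  const cached = usage.prompt_tokens_details && usage.prompt_tokens_details.cached_tokens;
  if (typeof cached === 'number') return cached;
  const fromEntries = sumCacheReadEntries(usage.token_details);
  return fromEntries === null ? 0 : fromEntries;
}

// Usage: one call per payload shape.
const fromCounter = extractCacheReadSketch({ cache_read_input_tokens: 64 });
const fromPromptDetails = extractCacheReadSketch({
  prompt_tokens_details: { cached_tokens: 32128 },
});
const fromEntries = extractCacheReadSketch({
  token_details: [
    { token_type: 'input', token_count: 10, details: [{ token_type: 'cache_read', token_count: 7 }] },
    { token_type: 'cache_read', token_count: 5 },
  ],
});
const fromEmpty = extractCacheReadSketch({});
```

Returning `null` from the entry walker keeps a missing detail list distinct from a reported count of zero; the top-level function only collapses that to `0` at the end.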