[aw] Failure Investigation — 6h window ending 2026-06-17 19:34 UTC
Executive summary
Scope: 29 failed/cancelled runs in github/gh-aw over the last 6h. 13 were cancelled (concurrency/guard, not defects); 16 were genuine failure outcomes across 6 signature clusters.
P0 (provider/config): Codex engine cannot reach its model — every Codex run 404s on gpt-5-codex-alpha-2025-11-07. Tracked per-workflow but root cause not escalated.
P1 (product bug, new/untracked): four asset-producing workflows succeed at the agent step but the upload_assets job fails because the declared PNG asset files are never staged. Filed as sub-issue below.
P1 (provider): Copilot-CLI BYOK proxy rejects claude-sonnet-4.6 with 400 model not supported; partially covered by the existing cascade rollup.
No open agentic-workflows issue qualified for closure — all 20 are <6h old and reflect still-active failure modes; none have fresh evidence of being fixed.
Evidence
C1: Codex model 404 (P0)
Dominant error: 404 Not Found: Model not found gpt-5-codex-alpha-2025-11-07 at (172.30.0.30/redacted)
The Codex engine resolves alias gpt-5-codex to backend id gpt-5-codex-alpha-2025-11-07, which no longer exists on the proxy. The "Unknown model gpt-5-codex ... fallback model metadata" path swaps metadata only: the request still targets the missing backend, so all 5 reconnect retries 404 and the turn fails with 0 tool calls.
Audit posture: read-only, turns 11→0 vs baseline (agent never started real work).
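The difference between a metadata-only fallback and a fallback that re-targets the request can be sketched as below. This is a minimal illustration under stated assumptions, not the Codex engine's or gh-aw's actual code: `ALIASES`, `LIVE_BACKENDS`, `FALLBACK_BACKEND`, `Route`, and both resolver functions are hypothetical names, and `example-live-model` is a placeholder id. Only the alias `gpt-5-codex` and the missing backend id come from the report.

```python
from dataclasses import dataclass

# Only the alias and the dead backend id are taken from the report.
ALIASES = {"gpt-5-codex": "gpt-5-codex-alpha-2025-11-07"}
LIVE_BACKENDS = {"example-live-model"}   # assumption: ids the proxy still serves
FALLBACK_BACKEND = "example-live-model"  # assumption: a configured fallback id


@dataclass
class Route:
    request_model: str   # id actually sent to the proxy
    metadata_model: str  # id used to look up model metadata


def resolve_metadata_only(alias: str) -> Route:
    """Mirrors the reported behaviour: the fallback changes metadata only."""
    backend = ALIASES.get(alias, alias)
    if backend in LIVE_BACKENDS:
        return Route(backend, backend)
    # The request still targets the missing backend, so it will 404.
    return Route(request_model=backend, metadata_model=FALLBACK_BACKEND)


def resolve_retargeting(alias: str) -> Route:
    """Roadmap behaviour: the fallback re-targets the request as well."""
    backend = ALIASES.get(alias, alias)
    if backend not in LIVE_BACKENDS:
        backend = FALLBACK_BACKEND
    return Route(backend, backend)


def would_404(route: Route) -> bool:
    """True when the request targets an id the (hypothetical) proxy lacks."""
    return route.request_model not in LIVE_BACKENDS
```

Under these assumptions, `would_404(resolve_metadata_only("gpt-5-codex"))` is true on every retry, which matches the five identical 404s, while `resolve_retargeting` routes the same alias to a live id.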
C2: Phantom asset (P1, new)
The upload_assets job fails with ERR_SYSTEM: Asset file not found: /tmp/gh-aw/safeoutputs/assets/quality_score_breakdown.png even though the agent job succeeded.
The agent emitted upload_asset safe-output items referencing PNGs with sha + byte size + markdown image links (e.g. quality_score_breakdown.png size=178654, historical_trends.png size=422386) that were never written to the staging dir.
Path mismatch: agent referenced .../.gh-aw-assets/(file) while the upload pipeline reads /tmp/gh-aw/safeoutputs/assets/; the safe-outputs-assets artifact was never produced (Artifact not found for name: safe-outputs-assets).
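A check of this kind could catch phantom assets before the upload job runs. The sketch below is hypothetical: the item schema (`path`, `size`, `sha`), the use of SHA-256, and the function names are assumptions for illustration, not the gh-aw safe-outputs format. It looks up each declared file by base name in the staging directory, since the agent referenced a different directory than the one the pipeline reads.

```python
import hashlib
import tempfile
from pathlib import Path


def find_phantom_assets(items, staging_dir):
    """Return (file name, reason) pairs for declared assets that are unusable."""
    problems = []
    staging = Path(staging_dir)
    for item in items:
        # Resolve by base name: the declared directory may differ from staging.
        path = staging / Path(item["path"]).name
        if not path.is_file():
            problems.append((path.name, "missing"))
            continue
        data = path.read_bytes()
        if len(data) != item["size"]:
            problems.append((path.name, "size mismatch"))
        elif hashlib.sha256(data).hexdigest() != item["sha"]:
            problems.append((path.name, "sha mismatch"))
    return problems


def demo():
    """Stage one of two declared files and report the other as a phantom."""
    payload = b"placeholder bytes, not a real PNG"
    with tempfile.TemporaryDirectory() as staging:
        (Path(staging) / "historical_trends.png").write_bytes(payload)
        declared = [
            {
                "path": ".../.gh-aw-assets/historical_trends.png",
                "size": len(payload),
                "sha": hashlib.sha256(payload).hexdigest(),
            },
            {
                "path": ".../.gh-aw-assets/quality_score_breakdown.png",
                "size": 178654,
                "sha": "0" * 64,  # placeholder digest
            },
        ]
        return find_phantom_assets(declared, staging)
```

Running the same check when a safe-output item is emitted, rather than only before upload, would let the agent step fail fast instead of reporting success.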
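C3: Copilot BYOK (P1, heterogeneous)
Representative §27703869930: 400 The requested model is not supported (model=claude-sonnet-4.6, isModelNotSupportedError=true), not retried; not auth, not denial-limit (permissionDeniedCount=0).
Another signature in the cluster: hasNumerousPermissionDenied (permissionDeniedCount=11).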
Existing issue correlation
Cascade rollup [aw] Failure cascade detected #39852 already groups 10 of today's [aw] * failed issues, consistent with C1/C3 sharing provider-side root causes.
Gaps: C2 (phantom asset) has no tracking coverage → addressed by the sub-issue below. C1 root cause is filed only as per-workflow symptoms; recommend treating [aw] Daily Cache Strategy Analyzer failed #39878 as the canonical P0 and updating the Codex model alias.
Closures: none performed — every open agentic-workflows issue is <6h old and reflects an active failure mode; no fresh evidence of resolution.
Fix roadmap
P0 — Restore the Codex model route. The configured Codex model resolves to gpt-5-codex-alpha-2025-11-07, which 404s on the proxy. Point the alias at a live model (or restore the backend) and make the "unknown model" fallback re-target the request, not just metadata. Canonical tracker: [aw] Daily Cache Strategy Analyzer failed #39878.
P1 — Fix asset staging (sub-issue below). Ensure declared upload_asset files are staged to /tmp/gh-aw/safeoutputs/assets/ before the upload_assets job, and validate file existence at safe-output emission time so the agent cannot declare phantom assets.
P1 — Copilot BYOK model support. Resolve claude-sonnet-4.6 rejection (400 not supported) on the BYOK proxy; covered by cascade [aw] Failure cascade detected #39852.
References: §27713303874 · §27713375907 · §27703869930
6h-window follow-up: 2026-06-18 08:26 UTC
Scope: 18 failed/cancelled runs. 5 cancelled Smoke CI (concurrency/guard, not defects); 13 genuine failure outcomes across 5 signature clusters. No P0 (no provider-down). No issue closures: all open trackers reflect still-active modes.
Failure cluster table (signature and priority):
upload_assets/Push assets, agent success (P1)
Execute Copilot CLI exit 1, classifiers false (P1)
Process Safe Outputs fail on unsatisfiable item (P1)
Process Safe Outputs on PR/dev branches (P2)
Closures: none. #39885 and #39946 both recur this window; parent (this issue) and #39790 (token audit) are unrelated/fresh.
New sub-issue: LintMonster. The scheduled run fails Process Safe Outputs because the agent emitted update_issue target:triggering (unsatisfiable outside issue context) and the skip is counted as a hard failure. Linked to this report.
Monitored singletons (no issue filed; single occurrence, post-agent infra step):
Avenger §27740190790 (main): agent job marked failure at Parse agent logs for step summary, but the agent itself completed and emitted a noop ("No PR created"). Post-processing log-parser step failure; watch for recurrence.
Daily Safe Outputs Git Simulator §27739899787 (main): agent success (emitted noop, state persisted to repo memory), but push_repo_memory/Push repo-memory changes (default) failed, likely a repo-memory branch push race/conflict. Single occurrence.
References: §27735258410 · §27737401463 · §27738455642
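The LintMonster failure turns on whether an unsatisfiable safe-output item counts as a skip or as a hard failure. The sketch below shows one way to separate the two outcomes. It is an assumption-laden illustration, not gh-aw's Process Safe Outputs logic: `Outcome`, `classify`, `job_failed`, and the item shape are hypothetical, and `ISSUE_EVENTS` is a guess at which GitHub Actions events carry an issue context.

```python
from enum import Enum

# Assumption: events whose payload includes a triggering issue.
ISSUE_EVENTS = {"issues", "issue_comment"}


class Outcome(Enum):
    APPLY = "apply"
    SKIP = "skip"  # unsatisfiable in this context: warn, do not fail the job
    FAIL = "fail"  # malformed item: a genuine processing error


def classify(item: dict, event_name: str) -> Outcome:
    """Decide how one safe-output item should be handled for this event."""
    if "type" not in item:
        return Outcome.FAIL
    if item.get("target") == "triggering" and event_name not in ISSUE_EVENTS:
        # No triggering issue exists, for example on a scheduled run.
        return Outcome.SKIP
    return Outcome.APPLY


def job_failed(items, event_name: str) -> bool:
    """Only FAIL outcomes fail the job; skips are reported but tolerated."""
    return any(classify(item, event_name) is Outcome.FAIL for item in items)
```

With this split, `update_issue` with `target: triggering` on a `schedule` event is classified as `SKIP`, so the job would log the skipped item without being marked as failed.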