Executive Summary
Telemetry source: Sentry spans dataset (org github, project gh-aw), last 24h. Sentry MCP authenticated; errors and logs datasets queried and both empty.
Overall health for the last 24h is stable, with one recurring failing workflow. Of ~2,213 spans carrying a gh-aw.run.status value, 24 spans across 6 distinct runs reported failure (≈1.1% of status-bearing spans). Failures are confirmed via the gh-aw.run.status:failure attribute, but the root cause (timeout vs. error vs. truncation) is inconclusive because span.status, gen_ai.response.finish_reasons, and service.version/release are null on every span in the window, a confirmed instrumentation/export gap.
The dominant operational signal is PR Sous Chef: 4 of the 6 failed runs, each correlated with a single long-running gpt-5-mini gen_ai span clustering around ~5–6 minutes (291s–374s). This is a recurring pattern, not a one-off outlier.
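The failure share quoted above follows from the gh-aw.run.status bucket counts listed under Notes. A minimal sketch of that arithmetic, assuming the bucket counts are copied by hand into a plain object (the object shape and the function name failureShare are illustrative and not part of any Sentry API):

```javascript
// Bucket counts for gh-aw.run.status over the 24h window (from Notes).
// Spans with a null status are deliberately left out of the denominator.
const statusBuckets = { success: 2189, failure: 24 };

// Share of status-bearing spans that reported failure.
function failureShare(buckets) {
  const total = Object.values(buckets).reduce((sum, n) => sum + n, 0);
  return total === 0 ? 0 : buckets.failure / total;
}

const share = failureShare(statusBuckets);
console.log(`${(share * 100).toFixed(1)}% of status-bearing spans failed`);
// prints "1.1% of status-bearing spans failed"
```

Because each failed run contributes several spans, this figure is a share of spans rather than of runs.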
Top Reliability Findings
| Priority | Workflow | Problem | Evidence | Next Action |
|---|---|---|---|---|
| P1 | PR Sous Chef | 4 failed runs; each dominated by a ~5–6 min gpt-5-mini agent span ending in failure | gh-aw.run.status:failure ×16 spans / 4 runs; longest gen_ai spans 373.8s, 373.0s, 341.6s, 291.1s (traces 59480a8d, 9dd3f585, f8f589e6, 74450fa9) | Inspect PR Sous Chef agent turn/timeout budget on gpt-5-mini; confirm whether the long span is a model-gateway stall or an agent timeout |
| P2 | Auto-Triage Issues | 1 failed run with a 130.7s gpt-5-mini span | gh-aw.run.status:failure ×4 spans / run 27111922896; trace 43e59de4, max span 130,661 ms | Verify run conclusion; same model family as PR Sous Chef, so check shared gateway behavior |
| P2 | Issue Monster | 1 failed run with a 98.8s claude-haiku-4.5 span | gh-aw.run.status:failure ×4 spans / run 27161284635; trace 9470d921, max span 98,812 ms | Verify run conclusion; different model, so likely independent of the gpt-5-mini pattern |
| P3 | All workflows (instrumentation) | span.status null on all 24,541 spans; gen_ai.response.finish_reasons null on all spans incl. the 22 gh-aw.conclusion.setup spans that should always emit it | Aggregate span.status → single bucket null (24,541); finish_reasons → single bucket null (24,569) | Investigate why OTLP status.code and conclusion-span finish_reasons are not reaching Sentry (see Notes); this blocks timeout-vs-truncation diagnosis |
| P3 | All workflows (release correlation) | service.version/release null on all spans despite being emitted | release and service.version aggregates → single null bucket (24,577) | Restore Sentry release mapping so regressions can be tied to a gh-aw version |
No timeouts, cancellations, OTLP export failures, or finish_reasons:length truncations could be confirmed — the fields needed to confirm them are not present (see Notes). The gateway.request / invoke_agent max-duration outliers (e.g. 2,830s on a null-transaction span, 909s on invoke_agent) were not marked failure and are treated as isolated slow-but-successful spans, not reliability defects.
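The per-workflow rows above come from grouping failing spans by workflow and run. The sketch below shows one way to reproduce that grouping offline, assuming the spans have been exported as plain objects; the field names workflow, runId, and durationMs are hypothetical and may not match the actual Sentry event payload. Only the longest failing gen_ai span per run, as quoted in this report, is used as sample data.

```javascript
// Longest failing gen_ai span per run, copied from this report.
// Field names (workflow, runId, durationMs) are hypothetical.
const failingSpans = [
  { workflow: "PR Sous Chef", runId: "27169378248", durationMs: 373016 },
  { workflow: "PR Sous Chef", runId: "27160481528", durationMs: 373775 },
  { workflow: "PR Sous Chef", runId: "27153104150", durationMs: 341566 },
  { workflow: "PR Sous Chef", runId: "27118267018", durationMs: 291120 },
  { workflow: "Auto-Triage Issues", runId: "27111922896", durationMs: 130661 },
  { workflow: "Issue Monster", runId: "27161284635", durationMs: 98812 },
];

// Group by workflow: count distinct failed runs and track the longest span.
function summarizeByWorkflow(spans) {
  const byWorkflow = new Map();
  for (const span of spans) {
    const entry = byWorkflow.get(span.workflow) ?? { runs: new Set(), maxMs: 0 };
    entry.runs.add(span.runId);
    entry.maxMs = Math.max(entry.maxMs, span.durationMs);
    byWorkflow.set(span.workflow, entry);
  }
  // Workflows with the most failed runs come first.
  return [...byWorkflow.entries()]
    .map(([workflow, e]) => ({ workflow, failedRuns: e.runs.size, maxMs: e.maxMs }))
    .sort((a, b) => b.failedRuns - a.failedRuns);
}

const summary = summarizeByWorkflow(failingSpans);
console.table(summary);
```

Counting distinct run IDs rather than spans keeps the ranking from being skewed by runs that emit more status-bearing spans than others.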
Representative Traces
All traces below are in github/gh-aw. Trace continuity was validated via list_events filtered by trace:<id> (the MCP build exposes no get_trace_details); setup, gen_ai, and gateway.request spans share the same trace ID, so correlation is intact.
PR Sous Chef (P1) — recurring long gpt-5-mini failure
- Trace 9dd3f5851632dc553e2b73ba5449175e: dominant span gen_ai / gpt-5-mini, 373,016 ms, gh-aw.run.status:failure. Same trace contains gh-aw.pre_activation.setup and gateway.request children (continuity intact). Run 27169378248 → https://github.com/github/gh-aw/actions/runs/27169378248
- Trace 59480a8daee57f8bfaea693a5038c0a3: gen_ai / gpt-5-mini, 373,775 ms, failure. Run 27160481528 → https://github.com/github/gh-aw/actions/runs/27160481528
- Trace f8f589e6a5257568ad60b83ad3fe61d2: gen_ai / gpt-5-mini, 341,566 ms, failure. Run 27153104150 → https://github.com/github/gh-aw/actions/runs/27153104150
- Trace 74450fa916d0ed9b6cfd733bdb402e64: gen_ai / gpt-5-mini, 291,120 ms, failure. Run 27118267018 → https://github.com/github/gh-aw/actions/runs/27118267018
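Auto-Triage Issues (P2)
- Trace 43e59de40cb8ea3905d0d009ed9d2073: gen_ai / gpt-5-mini, 130,661 ms, failure. Run 27111922896 → https://github.com/github/gh-aw/actions/runs/27111922896
Issue Monster (P2)
- Trace 9470d9213cba5fdd5127923eab96b4ba: gen_ai / claude-haiku-4.5, 98,812 ms, failure. Run 27161284635 → https://github.com/github/gh-aw/actions/runs/27161284635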
Recommendations
- Triage PR Sous Chef first (smallest useful fix). Pull the 4 GitHub Actions run logs above and confirm whether the ~5–6 min gpt-5-mini span is hitting an agent turn/timeout limit or a gateway stall. The tight 291–374s clustering suggests a bounded timeout rather than random model errors.
- Fix the finish_reasons export gap. actions/setup/js/send_otlp_span.cjs:2076 always pushes gen_ai.response.finish_reasons on the conclusion span (effectiveStopReason, set to "timeout" when isAgentTimedOut), yet it is null on all 22 gh-aw.conclusion.setup spans in Sentry. Confirm whether the array attribute is being dropped on export or not indexed by Sentry; without it, timeout vs. truncation cannot be distinguished.
- Restore span.status / release correlation. Map OTLP status.code/status.message to a Sentry-searchable span.status, and verify the service.version → Sentry release mapping so failures can be tied to a gh-aw version. Today the only reliable failure signal is the custom gh-aw.run.status attribute.
- Use gh-aw.workflow.name (dotted) in queries/dashboards, not gh_aw.workflow_name; the underscore form is null for all 24,577 spans, while the dotted form is correctly populated.
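For the finish_reasons recommendation, it may help to inspect how the attribute is encoded in the outgoing payload before it leaves the runner. In OTLP/JSON, an array attribute is carried as an arrayValue wrapping typed values. The sketch below encodes and reads back such an attribute; the helper names toStringArrayAttribute and readStringArrayAttribute are hypothetical and are not taken from send_otlp_span.cjs.

```javascript
// Encode a list of strings as an OTLP/JSON array attribute.
// Helper names are hypothetical, not taken from send_otlp_span.cjs.
function toStringArrayAttribute(key, values) {
  return {
    key,
    value: { arrayValue: { values: values.map((v) => ({ stringValue: String(v) })) } },
  };
}

// Read the attribute back from a span's attribute list, or null if absent.
function readStringArrayAttribute(attributes, key) {
  const attr = attributes.find((a) => a.key === key);
  const values = attr?.value?.arrayValue?.values;
  return Array.isArray(values) ? values.map((v) => v.stringValue) : null;
}

// Example: attributes of a conclusion span whose agent timed out.
const attributes = [
  toStringArrayAttribute("gen_ai.response.finish_reasons", ["timeout"]),
];

console.log(JSON.stringify(attributes[0]));
console.log(readStringArrayAttribute(attributes, "gen_ai.response.finish_reasons"));
```

Logging the encoded attribute just before export and comparing it with what Sentry shows for the same span would help separate "dropped on export" from "not indexed by the backend".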
Notes
Datasets checked
spans: ~24,541–24,577 spans / 24h — healthy export volume.
errors: empty (no results) — no exception events captured; reliability signal relies entirely on spans.
logs: empty (no results).
Confirmed-present attributes
gh-aw.run.status: success 2,189 · failure 24 · null 22,365.
gh-aw.workflow.name, gh-aw.run.id, gh-aw.repository, gen_ai.request.model all populated and usable for grouping.
Confirmed-missing attributes (instrumentation/export gap)
span.status: null on all 24,541 spans. Emit-side buildOTLPSpan (send_otlp_span.cjs:293) defaults code to OK(1) unless statusCode is passed, so failing runs may not set OTLP ERROR at all — the dotted gh-aw.run.status:failure is the only dependable failure filter.
gen_ai.response.finish_reasons: null on all spans, including conclusion spans where it is supposed to be unconditional. → cannot confirm any finish_reasons:length truncation or timeout finish reason.
service.version / release: null on all spans despite send_otlp_span.cjs:324 emitting service.version — Sentry release mapping appears backend-dependent and is currently not populated.
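The span.status note above describes a status code that defaults to OK unless one is passed explicitly. A minimal sketch of deriving the OTLP status from the run status instead, assuming the same status string that is already exported as gh-aw.run.status is available at span-build time; statusFromRunStatus is a hypothetical helper and does not reflect how buildOTLPSpan is actually written.

```javascript
// OTLP span status codes as defined by the OpenTelemetry protocol.
const STATUS_CODE_UNSET = 0;
const STATUS_CODE_OK = 1;
const STATUS_CODE_ERROR = 2;

// Hypothetical helper: derive the OTLP status object from the run status
// string that is already exported as gh-aw.run.status.
function statusFromRunStatus(runStatus, message = "") {
  if (runStatus === "failure") {
    return { code: STATUS_CODE_ERROR, message: message || "run reported failure" };
  }
  if (runStatus === "success") {
    return { code: STATUS_CODE_OK };
  }
  // Unknown or missing status: leave unset rather than claiming OK.
  return { code: STATUS_CODE_UNSET };
}

console.log(statusFromRunStatus("failure", "agent timed out"));
console.log(statusFromRunStatus("success"));
console.log(statusFromRunStatus(undefined));
```

Leaving the code unset for spans without a run status avoids reporting OK for spans whose outcome is unknown, which is the ambiguity this report runs into.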
Tooling limitations
- This Sentry MCP build exposes list_events but not search_events or get_trace_details; trace continuity was validated with trace:<id> filters on list_events instead.
- Inconclusive by design: failures are confirmed, but root cause classification (timeout / error / truncation) is inconclusive until the missing fields above are restored.
Generated by 🚨 Daily Reliability Review · 138 AIC · ⌖ 12.5 AIC · ⊞ 5.7K · ◷