
[reliability] Daily Reliability Review - 2026-06-08 #37969

@github-actions


Executive Summary

Telemetry source: Sentry spans dataset (org github, project gh-aw), last 24h. The Sentry MCP server authenticated successfully. The errors and logs datasets were also queried, and both returned no results.

Overall health over the last 24h is stable, apart from one recurring failing workflow. Of ~2,213 spans carrying a gh-aw.run.status value, 24 spans across 6 distinct runs reported failure (≈1.1% of status-bearing spans). Failures are confirmed via the gh-aw.run.status:failure attribute, but the root cause (timeout, error, or truncation) is inconclusive: span.status, gen_ai.response.finish_reasons, and service.version/release are null on every span in the window. This is a confirmed instrumentation/export gap.
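The ≈1.1% figure is a share of status-bearing spans. A per-run rate would also need the number of distinct runs that carry a status, which this review does not report. The following is a minimal arithmetic check using only the gh-aw.run.status counts listed under Notes. The helper name `failure_share` is hypothetical.

```python
def failure_share(status_counts: dict[str, int]) -> tuple[int, float]:
    """Return (status-bearing span count, fraction of those spans marked failure).

    Null-status spans are deliberately excluded from the denominator:
    only spans that carry a gh-aw.run.status value are counted.
    """
    status_bearing = status_counts["success"] + status_counts["failure"]
    return status_bearing, status_counts["failure"] / status_bearing


# gh-aw.run.status aggregate from this review (null bucket left out).
total, share = failure_share({"success": 2189, "failure": 24})
print(total, f"{share:.1%}")  # 2213 1.1%
```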

The dominant operational signal is PR Sous Chef, which accounts for 4 of the 6 failed runs. Each of those runs contains a single long-running gpt-5-mini gen_ai span, and the four spans cluster at roughly 5–6 minutes (291s–374s). This is a recurring pattern, not a one-off outlier.
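One rough way to quantify how tight the PR Sous Chef clustering is would be to compute the range and the coefficient of variation of the four longest gen_ai spans from the findings table. This is a sketch for illustration and not part of the review's method. With only four samples, the numbers are indicative at best.

```python
from statistics import mean, pstdev

# Longest gen_ai span of each failed PR Sous Chef run, in seconds.
durations_s = [373.8, 373.0, 341.6, 291.1]

low, high = min(durations_s), max(durations_s)
# Coefficient of variation: population standard deviation relative to the mean.
cv = pstdev(durations_s) / mean(durations_s)

print(f"{low}s-{high}s ({low / 60:.1f}-{high / 60:.1f} min), CV {cv:.2f}")
```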

Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
|---|---|---|---|---|
| P1 | PR Sous Chef | 4 failed runs; each dominated by a ~5–6 min gpt-5-mini agent span ending in failure | gh-aw.run.status:failure ×16 spans / 4 runs; longest gen_ai spans 373.8s, 373.0s, 341.6s, 291.1s (traces 59480a8d, 9dd3f585, f8f589e6, 74450fa9) | Inspect the PR Sous Chef agent turn/timeout budget on gpt-5-mini; confirm whether the long span is a model-gateway stall or an agent timeout |
| P2 | Auto-Triage Issues | 1 failed run with a 130.7s gpt-5-mini span | gh-aw.run.status:failure ×4 / run 27111922896; trace 43e59de4, max span 130661ms | Verify the run conclusion; same model family as PR Sous Chef, so check shared gateway behavior |
| P2 | Issue Monster | 1 failed run with a 98.8s claude-haiku-4.5 span | gh-aw.run.status:failure ×4 / run 27161284635; trace 9470d921, max span 98812ms | Verify the run conclusion; different model, so likely independent of the gpt-5-mini pattern |
| P3 | All workflows (instrumentation) | span.status null on all 24,541 spans; gen_ai.response.finish_reasons null on all spans, including the 22 gh-aw.conclusion.setup spans that should always emit it | Aggregate span.status → single bucket null (24,541); finish_reasons → single bucket null (24,569) | Investigate why OTLP status.code and conclusion-span finish_reasons are not reaching Sentry (see Notes); this blocks timeout-vs-truncation diagnosis |
| P3 | All workflows (release correlation) | service.version/release null on all spans despite being emitted | release and service.version aggregates → single null bucket (24,577) | Restore the Sentry release mapping so regressions can be tied to a gh-aw version |
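The Evidence column's "×N spans / M runs" notation counts failure-tagged spans and the distinct runs they belong to. The following is a sketch of that roll-up. It assumes each span has already been exported as a flat dict keyed by the dotted attribute names. Both this input shape and the helper `failed_runs_by_workflow` are assumptions, not the Sentry MCP response format.

```python
from collections import defaultdict


def failed_runs_by_workflow(spans):
    """Map workflow name -> (failure span count, distinct failed run count).

    Assumes each span is a dict keyed by dotted gh-aw.* attribute names.
    """
    span_counts = defaultdict(int)
    run_ids = defaultdict(set)
    for span in spans:
        # Only the custom attribute is trusted as a failure signal.
        if span.get("gh-aw.run.status") != "failure":
            continue
        workflow = span.get("gh-aw.workflow.name")
        span_counts[workflow] += 1
        run_ids[workflow].add(span.get("gh-aw.run.id"))
    return {wf: (span_counts[wf], len(run_ids[wf])) for wf in span_counts}
```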

No timeouts, cancellations, OTLP export failures, or finish_reasons:length truncations could be confirmed, because the fields needed to confirm them are missing (see Notes). The gateway.request and invoke_agent max-duration outliers (e.g. 2,830s on a null-transaction span, 909s on invoke_agent) were not marked as failures. They are treated as isolated slow-but-successful spans, not reliability defects.
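The following hypothetical decision rule makes it clearer why classification stalls. It applies to a span already tagged gh-aw.run.status:failure. The "timeout" and "length" values come from this review, specifically the conclusion span's effectiveStopReason and the finish_reasons:length truncation case. The rule itself and the name `classify_failure` are illustrative assumptions. When both inputs are null, as on every span in this window, the rule can only return "inconclusive".

```python
from typing import Optional, Sequence


def classify_failure(span_status: Optional[str],
                     finish_reasons: Optional[Sequence[str]]) -> str:
    """Hypothetical root-cause rule for a span tagged gh-aw.run.status:failure."""
    if finish_reasons:
        if "timeout" in finish_reasons:   # conclusion span when the agent timed out
            return "timeout"
        if "length" in finish_reasons:    # model output truncated
            return "truncation"
    if span_status not in (None, "ok"):   # any non-OK span status counts as an error
        return "error"
    # Neither field carries a signal: the failure is real but unexplained.
    return "inconclusive"
```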

Representative Traces


All traces below are in github/gh-aw. Trace continuity was validated via list_events filtered by trace:<id> (the MCP build exposes no get_trace_details); setup, gen_ai, and gateway.request spans share the same trace ID, so correlation is intact.
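The following sketch illustrates the continuity check described above. It groups spans by trace ID and flags any trace that is missing one of the expected span kinds. The field names `trace` and `kind`, the kind labels, and the helper `broken_traces` are all assumptions for illustration. list_events output would first need to be mapped into this shape.

```python
from collections import defaultdict

# Span kinds expected in every workflow trace (labels are illustrative).
REQUIRED_KINDS = frozenset({"setup", "gen_ai", "gateway.request"})


def broken_traces(spans):
    """Return sorted trace IDs whose spans do not cover every required kind."""
    kinds_seen = defaultdict(set)
    for span in spans:
        kinds_seen[span["trace"]].add(span["kind"])
    return sorted(t for t, seen in kinds_seen.items() if not REQUIRED_KINDS <= seen)
```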

PR Sous Chef (P1) — recurring long gpt-5-mini failure

Auto-Triage Issues (P2)

Issue Monster (P2)

Recommendations

  1. Triage PR Sous Chef first (smallest useful fix). Pull the 4 GitHub Actions run logs above and confirm whether the ~5–6 min gpt-5-mini span is hitting an agent turn/timeout limit or a gateway stall. The tight 291–374s clustering suggests a bounded timeout rather than random model errors.
  2. Fix the finish_reasons export gap. actions/setup/js/send_otlp_span.cjs:2076 always pushes gen_ai.response.finish_reasons on the conclusion span. Its value is effectiveStopReason, which is set to "timeout" when isAgentTimedOut is true. Yet the attribute is null on all 22 gh-aw.conclusion.setup spans in Sentry. Confirm whether the array attribute is dropped on export or simply not indexed by Sentry; without it, timeout and truncation cannot be distinguished.
  3. Restore span.status / release correlation. Map OTLP status.code/status.message to a Sentry-searchable span.status, and verify the service.version → Sentry release mapping so failures can be tied to a gh-aw version. Today the only reliable failure signal is the custom gh-aw.run.status attribute.
  4. Use gh-aw.workflow.name (dotted) in queries/dashboards, not gh_aw.workflow_name — the underscore form is null for all 24,577 spans; the dotted form is correctly populated.
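For recommendation 3, the emit-side change amounts to deriving the OTLP span status from the run outcome instead of defaulting to OK. The following is a minimal sketch in Python rather than the emitter's JavaScript. `otlp_status_for_run` is hypothetical and does not reproduce buildOTLPSpan. The numeric codes are the OTLP StatusCode enum values (UNSET = 0, OK = 1, ERROR = 2), which OTLP/JSON encodes as integers.

```python
# OTLP StatusCode enum values.
STATUS_UNSET, STATUS_OK, STATUS_ERROR = 0, 1, 2


def otlp_status_for_run(run_status, message=None):
    """Build an OTLP/JSON span `status` object from a gh-aw.run.status value."""
    if run_status == "failure":
        return {"code": STATUS_ERROR, "message": message or "gh-aw run failed"}
    if run_status == "success":
        return {"code": STATUS_OK}
    # Unknown outcome: leave the status unset rather than claiming success.
    return {"code": STATUS_UNSET}
```

Returning UNSET for an unknown outcome is a deliberate choice in this sketch. It avoids the current behavior, where a missing statusCode is reported as OK.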

Notes


Datasets checked

  • spans: ~24,541–24,577 spans / 24h — healthy export volume.
  • errors: empty (no results) — no exception events captured; reliability signal relies entirely on spans.
  • logs: empty (no results).

Confirmed-present attributes

  • gh-aw.run.status: success 2,189 · failure 24 · null 22,365.
  • gh-aw.workflow.name, gh-aw.run.id, gh-aw.repository, gen_ai.request.model all populated and usable for grouping.

Confirmed-missing attributes (instrumentation/export gap)

  • span.status: null on all 24,541 spans. On the emit side, buildOTLPSpan (send_otlp_span.cjs:293) defaults the status code to OK (1) unless statusCode is passed, so failing runs may never set OTLP ERROR at all. The dotted gh-aw.run.status:failure attribute is therefore the only dependable failure filter.
  • gen_ai.response.finish_reasons: null on all spans, including the conclusion spans where it is supposed to be emitted unconditionally. As a result, no finish_reasons:length truncation or timeout finish reason can be confirmed.
  • service.version / release: null on all spans, even though send_otlp_span.cjs:324 emits service.version. The Sentry release mapping appears to be backend-dependent and is currently not populated.
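The "single null bucket" test used throughout this review can be expressed as a small check over attribute aggregates. This is a sketch only. It assumes each aggregate has been reduced to a mapping from bucket value to span count, with None standing for Sentry's null bucket. `all_null_attributes` is a hypothetical helper.

```python
def all_null_attributes(aggregates):
    """Return sorted attribute names whose aggregate contains only the null bucket.

    `aggregates` maps attribute name -> {bucket value: span count};
    None stands for the null bucket.
    """
    return sorted(name for name, buckets in aggregates.items() if set(buckets) == {None})


# Counts taken from this review.
aggregates = {
    "span.status": {None: 24541},
    "release": {None: 24577},
    "gh-aw.run.status": {"success": 2189, "failure": 24, None: 22365},
}
print(all_null_attributes(aggregates))  # ['release', 'span.status']
```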

Tooling limitations

  • This Sentry MCP build exposes list_events but not search_events or get_trace_details; trace continuity was validated with trace:<id> filters on list_events instead.
  • Inconclusive by design: failures are confirmed, but root cause classification (timeout / error / truncation) is inconclusive until the missing fields above are restored.


Generated by 🚨 Daily Reliability Review · 138 AIC · ⌖ 12.5 AIC · ⊞ 5.7K

  • expires on Jun 10, 2026, 3:23 PM UTC-08:00
