[reliability] Daily Reliability Review - 2026-06-08 (#37969)

### Executive Summary

Telemetry source: Sentry **spans** dataset (org `github`, project `gh-aw`), last 24h. Sentry MCP authenticated; `errors` and `logs` datasets queried and **both empty**.

Overall health for the last 24h is **stable, with one recurring failing workflow**. Of ~2,213 spans carrying a `gh-aw.run.status` value, **24 spans across 6 distinct runs reported `failure`** (≈1.1% of status-bearing spans). Failures are **confirmed** via the `gh-aw.run.status:failure` attribute, but the **root cause (timeout vs. error vs. truncation) is inconclusive** because `span.status`, `gen_ai.response.finish_reasons`, and `service.version`/`release` are null on every span in the window, which is a confirmed instrumentation/export gap.

The dominant operational signal is **`PR Sous Chef`**: 4 of the 6 failed runs, each correlated with a single long-running `gpt-5-mini` `gen_ai` span clustering around **~5–6 minutes** (291s–374s). This is a recurring pattern, not a one-off outlier.

### Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | PR Sous Chef | 4 failed runs; each dominated by a ~5–6 min `gpt-5-mini` agent span ending in `failure` | `gh-aw.run.status:failure` ×16 spans / 4 runs; longest gen_ai spans 373.8s, 373.0s, 341.6s, 291.1s (traces `59480a8d`, `9dd3f585`, `f8f589e6`, `74450fa9`) | Inspect PR Sous Chef agent turn/timeout budget on `gpt-5-mini`; confirm whether the long span is a model-gateway stall or an agent timeout |
| P2 | Auto-Triage Issues | 1 failed run with a 130.7s `gpt-5-mini` span | `gh-aw.run.status:failure` ×4 spans / run `27111922896`; trace `43e59de4` max span 130,661 ms | Verify run conclusion; same model family as PR Sous Chef, so check shared gateway behavior |
| P2 | Issue Monster | 1 failed run with a 98.8s `claude-haiku-4.5` span | `gh-aw.run.status:failure` ×4 spans / run `27161284635`; trace `9470d921` max span 98,812 ms | Verify run conclusion; different model, so likely independent of the gpt-5-mini pattern |
| P3 | All workflows (instrumentation) | `span.status` null on **all 24,541** spans; `gen_ai.response.finish_reasons` null on **all** spans incl. the 22 `gh-aw.conclusion.setup` spans that should always emit it | Aggregate `span.status` → single bucket `null` (24,541); `finish_reasons` → single bucket `null` (24,569) | Investigate why OTLP `status.code` and conclusion-span `finish_reasons` are not reaching Sentry (see Notes) — blocks timeout-vs-truncation diagnosis |
| P3 | All workflows (release correlation) | `service.version`/`release` null on **all** spans despite being emitted | `release` and `service.version` aggregates → single `null` bucket (24,577) | Restore Sentry `release` mapping so regressions can be tied to a gh-aw version |

No timeouts, cancellations, OTLP export failures, or `finish_reasons:length` truncations could be **confirmed**, because the fields needed to confirm them are not present (see Notes). The `gateway.request` / `invoke_agent` max-duration outliers (e.g. 2,830s on a null-transaction span, 909s on `invoke_agent`) were **not** marked `failure` and are treated as isolated slow spans rather than reliability defects.

### Representative Traces

<details>
<summary>View representative traces</summary>

All traces below are in `github/gh-aw`. Trace continuity was validated via `list_events` filtered by `trace:<id>` (the MCP build exposes no `get_trace_details`); setup, `gen_ai`, and `gateway.request` spans share the same trace ID, so correlation is intact.

**PR Sous Chef (P1) — recurring long gpt-5-mini failure**
- Trace `9dd3f5851632dc553e2b73ba5449175e` — dominant span: `gen_ai` / `gpt-5-mini`, **373,016 ms**, `gh-aw.run.status:failure`. Same trace contains `gh-aw.pre_activation.setup` and `gateway.request` children (continuity intact). Run `27169378248` → https://github.com/github/gh-aw/actions/runs/27169378248
- Trace `59480a8daee57f8bfaea693a5038c0a3` — `gen_ai` / `gpt-5-mini`, **373,775 ms**, failure. Run `27160481528` → https://github.com/github/gh-aw/actions/runs/27160481528
- Trace `f8f589e6a5257568ad60b83ad3fe61d2` — `gen_ai` / `gpt-5-mini`, **341,566 ms**, failure. Run `27153104150` → https://github.com/github/gh-aw/actions/runs/27153104150
- Trace `74450fa916d0ed9b6cfd733bdb402e64` — `gen_ai` / `gpt-5-mini`, **291,120 ms**, failure. Run `27118267018` → https://github.com/github/gh-aw/actions/runs/27118267018
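
As a quick check on how closely these four durations cluster, the sketch below computes their range and mean in Python. The durations are copied from the traces listed above; the variable names are illustrative only.

```python
from statistics import mean

# gen_ai span durations (ms) for the four failed PR Sous Chef runs listed above,
# keyed by the short trace ID prefix.
durations_ms = {
    "59480a8d": 373_775,
    "9dd3f585": 373_016,
    "f8f589e6": 341_566,
    "74450fa9": 291_120,
}

values = list(durations_ms.values())
shortest_s = min(values) / 1000
longest_s = max(values) / 1000
mean_s = mean(values) / 1000

print(f"range: {shortest_s:.1f}s to {longest_s:.1f}s, mean: {mean_s:.1f}s")
# prints "range: 291.1s to 373.8s, mean: 344.9s"
```

Two of the four spans end within one second of each other (373.0s and 373.8s), which is the part of the pattern most consistent with a fixed limit; the other two are shorter and would need the run logs to explain.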

**Auto-Triage Issues (P2)**
- Trace `43e59de40cb8ea3905d0d009ed9d2073` — `gen_ai` / `gpt-5-mini`, **130,661 ms**, failure. Run `27111922896` → https://github.com/github/gh-aw/actions/runs/27111922896

**Issue Monster (P2)**
- Trace `9470d9213cba5fdd5127923eab96b4ba` — `gen_ai` / `claude-haiku-4.5`, **98,812 ms**, failure. Run `27161284635` → https://github.com/github/gh-aw/actions/runs/27161284635

</details>

### Recommendations

1. **Triage PR Sous Chef first (smallest useful fix).** Pull the 4 GitHub Actions run logs above and confirm whether the ~5–6 min `gpt-5-mini` span is hitting an agent turn/timeout limit or a gateway stall. The tight 291–374s clustering suggests a bounded timeout rather than random model errors.
2. **Fix the `finish_reasons` export gap.** `actions/setup/js/send_otlp_span.cjs:2076` always pushes `gen_ai.response.finish_reasons` on the conclusion span (`effectiveStopReason`, set to `"timeout"` when `isAgentTimedOut`), yet it is null on all 22 `gh-aw.conclusion.setup` spans in Sentry. Confirm whether the array attribute is being dropped on export or not indexed by Sentry; without it, timeout vs. truncation cannot be distinguished.
3. **Restore `span.status` / `release` correlation.** Map OTLP `status.code`/`status.message` to a Sentry-searchable `span.status`, and verify the `service.version` → Sentry `release` mapping so failures can be tied to a gh-aw version. Today the only reliable failure signal is the custom `gh-aw.run.status` attribute.
4. **Use `gh-aw.workflow.name` (dotted) in queries/dashboards**, not `gh_aw.workflow_name` — the underscore form is null for all 24,577 spans; the dotted form is correctly populated.
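
To illustrate recommendation 4, the following Python sketch assembles a filter string from the dotted attribute names. The `key:value` form matches the filters quoted in this report (for example `gh-aw.run.status:failure`); the helper name `build_filter` and the double-quoting of values that contain spaces are assumptions, not a confirmed part of the Sentry MCP interface.

```python
def build_filter(attrs: dict[str, str]) -> str:
    """Join attribute filters into one space-separated query string."""
    parts = []
    for key, value in attrs.items():
        # Assumption: values containing whitespace must be double-quoted.
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'
        parts.append(f"{key}:{value}")
    return " ".join(parts)


query = build_filter({
    "gh-aw.workflow.name": "PR Sous Chef",  # dotted form, populated
    "gh-aw.run.status": "failure",          # the only dependable failure signal
})
print(query)
# prints: gh-aw.workflow.name:"PR Sous Chef" gh-aw.run.status:failure
```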

### Notes

<details>
<summary>View notes</summary>

**Datasets checked**
- `spans`: ~24,541–24,577 spans / 24h — healthy export volume.
- `errors`: **empty** (no results) — no exception events captured; reliability signal relies entirely on spans.
- `logs`: **empty** (no results).

**Confirmed-present attributes**
- `gh-aw.run.status`: `success` 2,189 · `failure` 24 · `null` 22,365.
- `gh-aw.workflow.name`, `gh-aw.run.id`, `gh-aw.repository`, `gen_ai.request.model` all populated and usable for grouping.
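
The failure share quoted in the Executive Summary follows directly from these counts. A minimal Python sketch of the arithmetic, with the counts copied from the `gh-aw.run.status` breakdown above:

```python
# Span counts per gh-aw.run.status value, as reported above.
status_counts = {"success": 2_189, "failure": 24, "null": 22_365}

# Only spans that carry a status value count toward the denominator.
status_bearing = status_counts["success"] + status_counts["failure"]
failure_share = status_counts["failure"] / status_bearing

print(f"{status_bearing} status-bearing spans, {failure_share:.2%} failure")
# prints "2213 status-bearing spans, 1.08% failure"
```

Note that this is a per-span share. The 24 failing spans belong to 6 runs, and this report does not state the total number of runs, so a per-run failure rate cannot be derived from these figures.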

**Confirmed-missing attributes (instrumentation/export gap)**
- `span.status`: null on all 24,541 spans. Emit-side `buildOTLPSpan` (`send_otlp_span.cjs:293`) defaults `code` to OK(1) unless `statusCode` is passed, so failing runs may not set OTLP ERROR at all — the dotted `gh-aw.run.status:failure` is the only dependable failure filter.
- `gen_ai.response.finish_reasons`: null on all spans, including conclusion spans where it is supposed to be unconditional. As a result, **no `finish_reasons:length` truncation or timeout finish reason can be confirmed.**
- `service.version` / `release`: null on all spans despite `send_otlp_span.cjs:324` emitting `service.version` — Sentry release mapping appears backend-dependent and is currently not populated.
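
One way to narrow the `span.status` gap on the emit side would be to derive the OTLP status code from the run status instead of defaulting to OK. The Python sketch below shows the idea only: `status_code_for` is a hypothetical helper, not the logic in `send_otlp_span.cjs`, and it would not by itself fix a drop on the Sentry ingestion side. The numeric values follow the OTLP `Status.StatusCode` enum (UNSET = 0, OK = 1, ERROR = 2).

```python
from enum import IntEnum
from typing import Optional


class StatusCode(IntEnum):
    """Numeric values of the OTLP Status.StatusCode enum."""
    UNSET = 0
    OK = 1
    ERROR = 2


def status_code_for(run_status: Optional[str]) -> StatusCode:
    """Hypothetical mapping from gh-aw.run.status to an OTLP status code."""
    if run_status == "failure":
        return StatusCode.ERROR
    if run_status == "success":
        return StatusCode.OK
    # Spans without a run status stay UNSET rather than claiming OK.
    return StatusCode.UNSET


print(status_code_for("failure").name)  # prints "ERROR"
```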

**Tooling limitations**
- This Sentry MCP build exposes `list_events` but **not** `search_events` or `get_trace_details`; trace continuity was validated with `trace:<id>` filters on `list_events` instead.
- Scope of conclusions: failures are confirmed, but **root cause classification (timeout / error / truncation) remains inconclusive** until the missing fields above are restored.
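
Once `gen_ai.response.finish_reasons` reaches Sentry, the classification that is currently blocked could look roughly like the Python sketch below. The function name `classify_failure` and the category labels are hypothetical; the `"timeout"` and `"length"` values are the finish reasons mentioned in this report.

```python
from typing import Optional, Sequence


def classify_failure(finish_reasons: Optional[Sequence[str]]) -> str:
    """Classify a failed run from the finish reasons on its conclusion span."""
    if not finish_reasons:
        # Today's situation: the attribute is null on every span.
        return "inconclusive"
    if "timeout" in finish_reasons:
        return "timeout"
    if "length" in finish_reasons:
        return "truncation"
    # Any other finish reason needs the run logs to interpret.
    return "other"


print(classify_failure(None))         # prints "inconclusive"
print(classify_failure(["timeout"]))  # prints "timeout"
```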

**References:**
- [§27169378248](https://github.com/github/gh-aw/actions/runs/27169378248)
- [§27160481528](https://github.com/github/gh-aw/actions/runs/27160481528)
- [§27153104150](https://github.com/github/gh-aw/actions/runs/27153104150)

</details>







> Generated by [🚨 Daily Reliability Review](https://github.com/github/gh-aw/actions/runs/27172971812) · 138 AIC · ⌖ 12.5 AIC · ⊞ 5.7K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-reliability-review%22&type=issues)
> - [x] expires on Jun 10, 2026, 3:23 PM UTC-08:00
