ci-analysis: replace canned recommendations with JSON summary + agent reasoning
Apply the Data vs. Reasoning Boundary pattern:
- Script emits [CI_ANALYSIS_SUMMARY] JSON block with structured facts
(totalFailedJobs, failedJobNames, knownIssues, prCorrelation, recommendationHint)
- Removed 47-line if/elseif recommendation chain producing canned prose
- Added 'Generating Recommendations' section to SKILL.md with decision table
- Updated 'Presenting Results' to reference JSON summary flow
- Agent now reasons over structured data instead of parroting script output
Tested with Claude Sonnet 4 and GPT-5 against PR dotnet#124232 — both rated
JSON completeness 4/5 and generated better recommendations than the old
heuristic.
.github/skills/ci-analysis/SKILL.md
Lines changed: 55 additions & 24 deletions
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,6 @@
1
1
---
2
2
name: ci-analysis
3
-
description: Analyze CI build and test status from Azure DevOps and Helix for dotnet repository PRs. Use when checking CI status, investigating failures, determining if a PR is ready to merge, or given URLs containing dev.azure.com or helix.dot.net. Also use when asked "why is CI red", "test failures", "retry CI", "rerun tests", or "is CI green".
3
+
description: Analyze CI build and test status from Azure DevOps and Helix for dotnet repository PRs. Use when checking CI status, investigating failures, determining if a PR is ready to merge, or given URLs containing dev.azure.com or helix.dot.net. Also use when asked "why is CI red", "test failures", "retry CI", "rerun tests", "is CI green", "build failed", "checks failing", or "flaky tests".
4
4
---
5
5
6
6
# Azure DevOps and Helix CI Analysis
@@ -9,6 +9,8 @@ Analyze CI build status and test failures in Azure DevOps and Helix for dotnet r
9
9
10
10
> 🚨 **NEVER** use `gh pr review --approve` or `--request-changes`. Only `--comment` is allowed. Approval and blocking are human-only actions.
11
11
12
+
**Workflow**: Run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice.
13
+
12
14
## When to Use This Skill
13
15
14
16
Use this skill when:
@@ -81,11 +83,13 @@ The script operates in three distinct modes depending on what information you ha
81
83
6. Fetches console logs (with `-ShowLogs`)
82
84
7. Searches for known issues with "Known Build Error" label
83
85
8. Correlates failures with PR file changes
84
-
9. **Provides smart retry recommendations**
86
+
9. **Emits structured summary**: `[CI_ANALYSIS_SUMMARY]` JSON block with all key facts for the agent to reason over
87
+
88
+
> **After the script runs**, you (the agent) generate recommendations. The script collects data; you synthesize the advice. See [Generating Recommendations](#generating-recommendations) below.
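
The hand-off from script to agent can be sketched in Python as below. This is a minimal sketch, not the script's documented contract: it assumes the `[CI_ANALYSIS_SUMMARY]` marker appears once in the output and is followed by a single JSON object. The function name `extract_summary` is hypothetical.

```python
import json

MARKER = "[CI_ANALYSIS_SUMMARY]"


def extract_summary(output: str) -> dict:
    """Return the first JSON object that follows the marker in the script output.

    Assumption: the marker is followed (possibly after whitespace) by one
    JSON object; anything after that object is ignored.
    """
    start = output.find(MARKER)
    if start == -1:
        raise ValueError("summary marker not found in script output")
    brace = output.find("{", start + len(MARKER))
    if brace == -1:
        raise ValueError("no JSON object after summary marker")
    # raw_decode parses one JSON value starting at `brace` and tolerates trailing text.
    summary, _end = json.JSONDecoder().raw_decode(output, brace)
    return summary
```

Because `raw_decode` stops at the end of the first JSON value, trailing log lines or a closing marker after the block do not need special handling.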
85
89
86
90
### Build ID Mode (`-BuildId`)
87
91
1. Fetches the build timeline directly (skips PR discovery)
88
-
2. Performs steps 3–7 and 9 from PR Analysis Mode, but does **not** fetch Build Analysis known issues or correlate failures with PR file changes (those require a PR number)
92
+
2. Performs steps 3–7 from PR Analysis Mode, but does **not** fetch Build Analysis known issues or correlate failures with PR file changes (those require a PR number). Still emits `[CI_ANALYSIS_SUMMARY]` JSON.
1. With `-HelixJob` alone: enumerates work items for the job and summarizes their status
@@ -116,24 +120,49 @@ The script operates in three distinct modes depending on what information you ha
116
120
117
121
> ❌ **Missing packages on flow PRs are NOT always infrastructure failures.** When a codeflow or dependency-update PR fails with "package not found" or "version not available", don't assume it's a feed propagation delay. Flow PRs bring in behavioral changes from upstream repos that can cause the build to request *different* packages than before. Example: an SDK flow changed runtime pack resolution logic, causing builds to look for `Microsoft.NETCore.App.Runtime.browser-wasm` (CoreCLR — doesn't exist) instead of `Microsoft.NETCore.App.Runtime.Mono.browser-wasm` (what had always been used). The fix was in the flowed code, not in feed infrastructure. Always check *which* package is missing and *why* it's being requested before diagnosing as infrastructure.
118
122
119
-
## Retry Recommendations
123
+
## Generating Recommendations
124
+
125
+
After the script outputs the `[CI_ANALYSIS_SUMMARY]` JSON block, **you** synthesize recommendations. Do not parrot the JSON — reason over it.
126
+
127
+
### Decision logic
128
+
129
+
Read `recommendationHint` as a starting point, then layer in context:
130
+
131
+
| Hint | Action |
132
+
|------|--------|
133
+
|`BUILD_SUCCESSFUL`| No failures. Confirm CI is green. |
134
+
|`KNOWN_ISSUES_DETECTED`| Known tracked issues found. Recommend retry if failures match known issues. Link the issues. |
135
+
|`LIKELY_PR_RELATED`| Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. |
136
+
|`POSSIBLY_TRANSIENT`| No correlation with PR changes, no known issues. Suggest checking main branch, searching for issues, or retrying. |
137
+
|`REVIEW_REQUIRED`| Could not auto-determine cause. Review failures manually. |
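
As an illustration, the decision table above can be expressed as a simple lookup. The hint names and action text come from the table; the fallback to `REVIEW_REQUIRED` for an unrecognized or missing hint is an assumption of this sketch, as is the helper name `starting_action`.

```python
# Hint -> starting action, copied from the decision table.
HINT_ACTIONS = {
    "BUILD_SUCCESSFUL": "No failures. Confirm CI is green.",
    "KNOWN_ISSUES_DETECTED": "Recommend retry if failures match known issues. Link the issues.",
    "LIKELY_PR_RELATED": "Fix these before retrying; list correlatedFiles.",
    "POSSIBLY_TRANSIENT": "Check main branch, search for issues, or retry.",
    "REVIEW_REQUIRED": "Could not auto-determine cause. Review failures manually.",
}


def starting_action(summary: dict) -> str:
    """Map recommendationHint to a starting point (not the final answer)."""
    hint = summary.get("recommendationHint")
    # Assumption: unknown or missing hints are treated as REVIEW_REQUIRED.
    return HINT_ACTIONS.get(hint, HINT_ACTIONS["REVIEW_REQUIRED"])
```

The result is only a first draft of the advice; the contextual checks that follow still apply.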
120
138
121
-
The script provides a recommendation at the end:
139
+
Then layer in nuance the heuristic can't capture:
122
140
123
-
| Recommendation | Meaning |
124
-
|----------------|---------|
125
-
|**KNOWN ISSUES DETECTED**| Tracked issues found that may correlate with failures. Review details. |
|**POSSIBLY TRANSIENT**| No clear cause - check main branch, search for issues. |
128
-
|**REVIEW REQUIRED**| Could not auto-determine cause. Manual review needed. |
141
+
- **Mixed signals**: Some failures match known issues AND some correlate with PR changes → separate them. Known issues = safe to retry; correlated = fix first.
142
+
- **Canceled jobs with recoverable results**: If `canceledJobNames` is non-empty, mention that canceled jobs may have passing Helix results (see "Recovering Results from Canceled Jobs").
143
+
- **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear.
144
+
- **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build; use `totalFailedJobs` for the aggregate count.
145
+
- **BuildId mode**: `knownIssues` and `prCorrelation` will be empty (those require a PR number). Don't say "no known issues"; say "Build Analysis not available in BuildId mode."
146
+
- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on main. See the anti-patterns in "Interpreting Results" above.
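
Several of the checks in this list are mechanical enough to sketch in code. The field names (`knownIssues`, `prCorrelation`, `canceledJobNames`, `lastBuildJobSummary.pending`, `builds`, `totalFailedJobs`) are taken from the text above, but their exact shapes are assumptions here. In particular, this sketch assumes `correlatedFiles` is a list nested under `prCorrelation`. The `caveats` helper and its `build_id_mode` flag are hypothetical.

```python
def caveats(summary: dict, build_id_mode: bool = False) -> list:
    """Collect caveats that the recommendation hint alone cannot express."""
    notes = []
    known = summary.get("knownIssues") or []
    # Assumption: correlated files live at prCorrelation.correlatedFiles.
    correlated = (summary.get("prCorrelation") or {}).get("correlatedFiles") or []
    if build_id_mode:
        # Empty knownIssues here means "not available", not "none found".
        notes.append("Build Analysis not available in BuildId mode.")
    elif known and correlated:
        notes.append("Mixed signals: retry known-issue failures, fix PR-correlated ones first.")
    if summary.get("canceledJobNames"):
        notes.append("Canceled jobs may still have passing Helix results.")
    pending = (summary.get("lastBuildJobSummary") or {}).get("pending", 0)
    if pending > 0:
        notes.append(f"Build still in progress ({pending} pending); more failures may appear.")
    if len(summary.get("builds") or []) > 1:
        total = summary.get("totalFailedJobs", 0)
        notes.append(f"Multiple builds: {total} failed jobs in aggregate.")
    return notes
```

The infrastructure-versus-code judgment is deliberately left out, since it depends on evidence outside the JSON summary.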
147
+
148
+
### How to Retry
149
+
150
+
- **AzDO builds**: Comment `/azp run {pipeline-name}` on the PR (e.g., `/azp run dotnet-sdk-public`)
151
+
- **All pipelines**: Comment `/azp run` to retry all failing pipelines
152
+
- **Helix work items**: Cannot be individually retried; you must re-run the entire AzDO build
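
A small sketch of composing the retry comment described above. The `/azp run` syntax comes from the list; posting it through `gh pr comment` is an assumption of this example rather than something the skill prescribes, and both helper names are hypothetical. The code only builds the strings and runs nothing.

```python
from typing import List, Optional


def azp_retry_comment(pipeline: Optional[str] = None) -> str:
    """Build the PR comment text: one pipeline, or all failing pipelines."""
    return f"/azp run {pipeline}" if pipeline else "/azp run"


def gh_comment_argv(pr_number: int, body: str) -> List[str]:
    """Argument vector for posting the comment with the GitHub CLI (assumed workflow)."""
    return ["gh", "pr", "comment", str(pr_number), "--body", body]
```

For example, `gh_comment_argv(12345, azp_retry_comment("dotnet-sdk-public"))` yields the arguments for a single-pipeline retry on a placeholder PR number.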
153
+
154
+
### Tone
155
+
156
+
Be direct. Lead with the most important finding. Use 2-4 bullet points, not long paragraphs. Distinguish what's known vs. uncertain.
2. **Run the script** with `-ShowLogs` for detailed failure info
134
162
3. **Check Build Analysis** - Known issues are safe to retry
135
163
4. **Correlate with PR changes** - Same files failing = likely PR-related
136
-
5. **Interpret patterns** (but don't jump to conclusions):
164
+
5. **Compare with baseline** - If a test passes on main but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md), and **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work.
165
+
6. **Interpret patterns** (but don't jump to conclusions):
137
166
- Same error across many jobs → Real code issue
138
167
- Build Analysis flags a known issue → Safe to retry
139
168
- Failure is **not** in Build Analysis → Investigate further before assuming transient
@@ -142,19 +171,21 @@ The script provides a recommendation at the end:
142
171
143
172
## Presenting Results
144
173
145
-
The script provides a recommendation at the end, but this is based on heuristics and may be incomplete. Before presenting conclusions to the user:
146
-
147
-
> ❌ **Don't blindly trust the script's recommendation.** The heuristic can misclassify failures. If the recommendation says "POSSIBLY TRANSIENT" but you see the same test failing 5 times on the same code path the PR touched — it's PR-related.
174
+
The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both:
148
175
149
-
1. Review the detailed failure information, not just the summary
150
-
2. Look for patterns the script may have missed (e.g., related failures across jobs)
151
-
3. Consider the PR context (what files changed, what the PR is trying to do)
152
-
4. Present findings with appropriate caveats - state what is known vs. uncertain
153
-
5. If the script's recommendation seems inconsistent with the details, trust the details
176
+
1. Read the JSON summary for structured facts (failed jobs, known issues, PR correlation, recommendation hint)
177
+
2. Read the human-readable output for failure details, console logs, and error messages
178
+
3. Reason over both to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer
179
+
4. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds)
180
+
5. Consider the PR context (what files changed, what the PR is trying to do)
181
+
6. Present findings with appropriate caveats — state what is known vs. uncertain
154
182
155
183
## References
156
184
157
185
- **Helix artifacts & binlogs**: See [references/helix-artifacts.md](references/helix-artifacts.md)
186
+
- **Binlog comparison (passing vs failing)**: See [references/binlog-comparison.md](references/binlog-comparison.md)
187
+
- **Subagent delegation patterns**: See [references/delegation-patterns.md](references/delegation-patterns.md)
188
+
- **Azure CLI deep investigation**: See [references/azure-cli.md](references/azure-cli.md)
158
189
- **Manual investigation steps**: See [references/manual-investigation.md](references/manual-investigation.md)
159
190
- **AzDO/Helix details**: See [references/azdo-helix-reference.md](references/azdo-helix-reference.md)
160
191
@@ -164,9 +195,9 @@ Canceled jobs (typically from timeouts) often still have useful artifacts. The H
164
195
165
196
**To investigate canceled jobs:**
166
197
167
-
1. **Download build artifacts**: Use the AzDO artifacts API to get `Logs_Build_*` pipeline artifacts for the canceled job. These contain binlogs even for canceled jobs.
168
-
2. **Extract Helix job IDs**: Use the MSBuild MCP server to load the `SendToHelix.binlog` and search for `"Sent Helix Job"` messages. Each contains a Helix job ID.
169
-
3. **Query Helix directly**: For each job ID, query `https://helix.dot.net/api/2019-06-17/jobs/{jobId}/workitems` to get actual pass/fail results.
198
+
1. **Download build artifacts**: Use `az pipelines runs artifact download` (see [references/azure-cli.md](references/azure-cli.md)) to get pipeline artifacts for the canceled job. These contain binlogs even for canceled jobs.
199
+
2. **Extract Helix job IDs**: Use the MSBuild MCP server to load the `SendToHelix.binlog` and search for `"Sent Helix Job"` messages. Each contains a Helix job ID. See [references/binlog-comparison.md](references/binlog-comparison.md) for the full "binlogs to find binlogs" workflow.
200
+
3. **Query Helix directly**: For each job ID, use the CI script: `./scripts/Get-CIStatus.ps1 -HelixJob "{GUID}" -FindBinlogs`
170
201
171
202
**Example**: A `browser-wasm windows WasmBuildTests` job was canceled after 3 hours. The binlog (truncated) still contained 12 Helix job IDs. Querying them revealed all 226 work items passed — the "failure" was purely a timeout in the AzDO wrapper.
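
To make the job-ID extraction step concrete, here is a hedged sketch of pulling Helix job IDs out of message text that has already been retrieved from the binlog. The exact format of the `"Sent Helix Job"` messages is not specified in this document, so the sketch assumes only that each such message contains a GUID. The helper names are hypothetical; the URL pattern is the one from the earlier version of step 3.

```python
import re

# Standard 8-4-4-4-12 hexadecimal GUID.
GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def helix_job_ids(messages):
    """Return unique job IDs, in order, from messages mentioning 'Sent Helix Job'."""
    ids = []
    for msg in messages:
        if "Sent Helix Job" not in msg:
            continue  # ignore unrelated binlog messages
        for match in GUID.findall(msg):
            job_id = match.lower()
            if job_id not in ids:
                ids.append(job_id)
    return ids


def workitems_url(job_id: str) -> str:
    """Helix work-items endpoint for one job, per the URL pattern quoted in this diff."""
    return f"https://helix.dot.net/api/2019-06-17/jobs/{job_id}/workitems"
```

Deduplicating while preserving order keeps the list stable if the same job is logged more than once.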
172
203
@@ -278,5 +309,5 @@ This is especially useful when:
278
309
4. Use `-SearchMihuBot` for semantic search of related issues
279
310
5. Binlogs in artifacts help diagnose MSB4018 task failures
280
311
6. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties
281
-
7. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** — `state` contains `SUCCESS`/`FAILURE` directly
312
+
7. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions; `state` contains `SUCCESS`/`FAILURE` directly
282
313
8. When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls
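
For the `gh pr checks --json` tip above, a minimal sketch of reading the `state` field. It assumes the command was run with something like `--json name,state` and that its output (a JSON array of objects) has been captured as text. The function name is hypothetical.

```python
import json
from collections import Counter


def summarize_checks(gh_json: str):
    """Count checks per state and list the names of failing checks.

    Uses `state` directly; there is no `conclusion` field to consult.
    """
    checks = json.loads(gh_json)
    counts = Counter(check["state"] for check in checks)
    failing = [check["name"] for check in checks if check["state"] == "FAILURE"]
    return counts, failing
```

Any state other than `SUCCESS` or `FAILURE` is still counted, so unexpected values show up in the tally rather than being dropped.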