Run build-failure-analysis dotnet tool install from /tmp #8795
The agent job's `Install NuGet MCP Server` step ran
`dotnet tool install --global` from the repo workspace cwd. The repo's
`global.json` pins SDK `11.0.100-preview.5.26227.104` (internal-only
preview); on the fresh agent runner, only the 9.0.x SDK installed by
`actions/setup-dotnet@v5` is available. `dotnet` walked up to `global.json`, failed to find the
pinned SDK, printed our custom `errorMessage` ("The .NET SDK could not be
found, please run ./build.cmd...") and exited 155, taking the whole agent
job down (see run 26856420989).
Set `working-directory: /tmp` on the step so `dotnet` no longer walks
into our `global.json`; the `setup-dotnet`-installed SDK on PATH is
sufficient for the global tool install. Apply the same defensive guard to
the sister command-mode workflow, which was guaranteed to hit the same bug
on its next slash-command invocation.
Lock files were regenerated via `gh aw compile --strict`; this also
bumps gh-aw setup-action from v0.75.4 -> v0.76.1 in these two lock files
(10 other lock files in the repo were already on v0.76.1, so this just
keeps the metadata file in sync).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
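The root cause hinges on how `dotnet` locates `global.json`: the .NET host looks in the current working directory and then in each parent directory. The Python sketch below is a simplified model of that lookup, not the host's actual implementation, and `find_global_json` is a hypothetical helper. It illustrates why a step running from the workspace picks up the repo's SDK pin, while a step running from a directory with no `global.json` above it is free to use an SDK already installed next to the `dotnet` on PATH.

```python
from pathlib import Path
from typing import Optional


def find_global_json(start: Path) -> Optional[Path]:
    """Return the nearest global.json at or above `start`, or None.

    Simplified model: check `start`, then each parent up to the filesystem
    root, and stop at the first global.json found. The real .NET host does
    more (e.g. reading the file and applying its roll-forward policy).
    """
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / "global.json"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    import json
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: a repo with a pinned SDK, plus a sibling
        # directory standing in for /tmp.
        repo = Path(tmp) / "repo"
        (repo / "src").mkdir(parents=True)
        pin = {"sdk": {"version": "11.0.100-preview.5.26227.104"}}
        (repo / "global.json").write_text(json.dumps(pin))
        outside = Path(tmp) / "elsewhere"
        outside.mkdir()

        print("from repo/src:", find_global_json(repo / "src"))
        print("from elsewhere:", find_global_json(outside))
```

From inside the repo the lookup finds the pinned file; from the sibling directory it typically returns `None`, unless some ancestor of the temporary directory happens to contain its own `global.json`.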
Pull request overview
Updates the Build Failure Analysis agentic workflows to install the NuGet MCP Server from /tmp, preventing dotnet from discovering the repository global.json (which pins an internal-only SDK) on fresh agent runners and avoiding the observed exit-code-155 failure.
Changes:
- Set `working-directory: /tmp` on the "Install NuGet MCP Server" step in both Build Failure Analysis workflow sources.
- Regenerated the corresponding `*.lock.yml` workflows via `gh aw compile --strict` (including the compiler/version metadata bumps).
- Updated `.github/aw/actions-lock.json` to reflect `github/gh-aw-actions/setup*` at `v0.76.1`.
| File | Description |
|---|---|
| .github/workflows/build-failure-analysis.md | Runs the NuGet MCP Server tool install from /tmp to avoid global.json SDK resolution on the agent runner. |
| .github/workflows/build-failure-analysis.lock.yml | Regenerated lock workflow reflecting the /tmp working directory change and updated gh-aw metadata/pins. |
| .github/workflows/build-failure-analysis-command.md | Same /tmp working directory adjustment for the slash-command variant. |
| .github/workflows/build-failure-analysis-command.lock.yml | Regenerated lock workflow reflecting the /tmp working directory change and updated gh-aw metadata/pins. |
| .github/aw/actions-lock.json | Updates gh-aw action lock entries to v0.76.1 to match the regenerated lock workflows. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 0
Evangelink left a comment
Expert TestFx Review — PR #8795
Scope: CI/workflow-only fix. No production source, test, analyzer, or build-props files changed.
21-Dimension Results
✅ 21/21 dimensions clean — no findings.
Dimensions 1–2, 4–19 are not applicable (no C# code, tests, analyzers, serialization, or MSBuild files changed).
Dim 3 — Security & IPC Contract Safety — LGTM
`working-directory: /tmp` is used solely to prevent `dotnet` from traversing up to `global.json`. `dotnet tool install --global` installs to `~/.dotnet/tools`, writing nothing to `/tmp`. GitHub-hosted runners are ephemeral and job-isolated, so `/tmp` sharing is not a concern. No user-controlled data flows into the path.
Dim 20 — Build Infrastructure & Dependencies — LGTM
Root cause is correctly diagnosed: the agent job runs on a fresh runner where `./build.sh` was never executed, so the `11.0.100-preview.5.26227.104` SDK pin in `global.json` cannot be satisfied by the `setup-dotnet`-installed 9.0.x SDK on PATH. `working-directory: /tmp` is the minimal, idiomatic fix. Lock files were regenerated via `gh aw compile --strict`, as required by repo convention. The action pin bump (v0.75.4 → v0.76.1) is a natural side-effect of recompilation and is consistent with the 10 other lock files already on v0.76.1.
Dim 21 — Scope & PR Discipline — LGTM
PR is tightly scoped. The only "extra" churn is the action pin version bump, which is an inevitable consequence of recompilation and keeps the repo consistent rather than introducing drift.
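The point about avoiding drift can be checked mechanically. A minimal sketch, assuming each `*.lock.yml` mentions the setup action on a line containing `gh-aw-actions/setup` together with a `vX.Y.Z` tag (as the ref or in a trailing comment); the real lock-file layout may differ, and `setup_versions` and `find_drift` are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumption: the setup action is referenced on a line containing
# "gh-aw-actions/setup" plus a "vX.Y.Z" tag (ref or trailing "# vX.Y.Z").
SETUP_LINE = re.compile(r"gh-aw-actions/setup")
VERSION = re.compile(r"\bv\d+\.\d+\.\d+\b")


def setup_versions(text: str) -> set[str]:
    """Collect every vX.Y.Z tag on lines that reference the gh-aw setup action."""
    versions: set[str] = set()
    for line in text.splitlines():
        if SETUP_LINE.search(line):
            versions.update(VERSION.findall(line))
    return versions


def find_drift(lock_texts: dict[str, str], expected: str) -> dict[str, set[str]]:
    """Return {file name: versions} for files referencing anything but `expected`."""
    drift: dict[str, set[str]] = {}
    for name, text in lock_texts.items():
        versions = setup_versions(text)
        if versions - {expected}:
            drift[name] = versions
    return drift


if __name__ == "__main__":
    workflows = Path(".github/workflows")
    texts = {p.name: p.read_text() for p in sorted(workflows.glob("*.lock.yml"))}
    for name, versions in find_drift(texts, "v0.76.1").items():
        print(f"{name}: {sorted(versions)}")
```

An empty result means every lock file that mentions the setup action agrees on `v0.76.1`.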
No blocking or major issues found. This is a clean, minimal, well-reasoned CI fix.
Generated by Expert Code Review (on open) for issue #8795 · sonnet46 1.9M
What
In `.github/workflows/build-failure-analysis.md` (and its sister `build-failure-analysis-command.md`), set `working-directory: /tmp` on the agent job's `Install NuGet MCP Server` step so `dotnet` no longer walks up into the repo's `global.json` when resolving the SDK for `dotnet tool install --global`.
Why
Run 26856420989 failed at the `Install NuGet MCP Server` step with exit code 155:
`The .NET SDK could not be found, please run ./build.cmd on Windows or ./build.sh on Linux and macOS.`
`##[error]Process completed with exit code 155.`
Root cause: `global.json` pins SDK `11.0.100-preview.5.26227.104` (internal-only preview). The build job populates `.dotnet/` via `./build.sh`, so its own `dotnet tool install` works, but the agent job runs on a fresh runner where `./build.sh` never ran -- only the `actions/setup-dotnet@v5` 9.0.x SDK is on PATH. With cwd at the workspace, `dotnet` finds `global.json`, can't satisfy its pinned version, and emits our custom `errorMessage` before exiting 155, which then takes the whole agent job down.
Running the step from `/tmp` keeps `dotnet` from finding `global.json`; the `setup-dotnet`-installed SDK on PATH is sufficient for a global tool install.
Notes on the diff
- `.md` source edits (a comment + the `working-directory` line on the step).
- `.lock.yml` files were regenerated via `gh aw compile --strict`. This also bumped the gh-aw setup-action from `v0.75.4` -> `v0.76.1` in these two lock files; 10 other lock files in the repo were already on `v0.76.1`, so this just keeps the metadata file consistent.
- `actions-lock.json` updated to `v0.76.1` to match what these (and other) workflow lock files already reference.
Out of scope
Two other recent agentic-workflow failures were investigated but don't have a workflow-side fix:
- Efficiency Improver -- gh-aw's internal `safe_outputs` job failed at `git fetch --unshallow origin` with `Could not resolve host: github.com`. This is a transient DNS/network issue inside the sandboxed bundle-apply step; nothing we can configure here.
- Test Improver -- PR #8781 ("[test-improver] test: add StringComparison overload and null-handling tests for Assert.StartsWith/EndsWith") was created successfully, but the agent then emitted an `update_issue` payload with `target: "*"` and no `item_number`. This is an agent-output bug from the upstream `githubnext/agentics` `test-improver.md` Task 7 prompt and should be addressed upstream.
Verification
- `gh aw compile --strict` on the two affected workflows: 0 errors, 0 warnings, idempotent re-compile.
- `working-directory: /tmp` lands on the intended step in both `.lock.yml` files.
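The check that `working-directory: /tmp` lands on the intended step could be scripted roughly as follows. This is a line-based heuristic rather than a YAML parser: it assumes the step is written as `- name: Install NuGet MCP Server` with its other keys indented beneath the dash, and `step_has_tmp_cwd` is a hypothetical helper.

```python
import re
from pathlib import Path

STEP_NAME = "Install NuGet MCP Server"


def step_has_tmp_cwd(yaml_text: str, step_name: str = STEP_NAME) -> bool:
    """Return True if the step named `step_name` sets `working-directory: /tmp`.

    Heuristic, not a YAML parser: assumes the step starts with a
    `- name: <step_name>` line and its keys are indented deeper than the dash.
    """
    lines = yaml_text.splitlines()
    header = re.compile(rf"^(\s*)- name: ['\"]?{re.escape(step_name)}['\"]?\s*$")
    for i, line in enumerate(lines):
        match = header.match(line)
        if not match:
            continue
        dash_indent = len(match.group(1))
        for body in lines[i + 1:]:
            indent = len(body) - len(body.lstrip())
            if body.strip() and indent <= dash_indent:
                break  # reached the next step or the end of this steps list
            if body.strip() == "working-directory: /tmp":
                return True
        return False
    return False


if __name__ == "__main__":
    workflows = Path(".github/workflows")
    for path in sorted(workflows.glob("build-failure-analysis*.lock.yml")):
        status = "ok" if step_has_tmp_cwd(path.read_text()) else "MISSING"
        print(f"{path.name}: {status}")
```

A line-based scan keeps the sketch dependency-free; a stricter check would parse the YAML and inspect the step mapping directly.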