Run build-failure-analysis dotnet tool install from /tmp #8795

Merged
Evangelink merged 1 commit into main from dev/amauryleve/fix-build-failure-analysis-dotnet-tool on Jun 3, 2026

@Evangelink

Member

What

In .github/workflows/build-failure-analysis.md (and its sister build-failure-analysis-command.md), set working-directory: /tmp on the agent job's Install NuGet MCP Server step so dotnet no longer walks up into the repo's global.json when resolving the SDK for dotnet tool install --global.

Why

Run 26856420989 failed at the Install NuGet MCP Server step with exit code 155:

The .NET SDK could not be found, please run ./build.cmd on Windows or ./build.sh on Linux and macOS. ##[error]Process completed with exit code 155.

Root cause: global.json pins SDK 11.0.100-preview.5.26227.104 (internal-only preview). The build job populates .dotnet/ via ./build.sh so its own dotnet tool install works, but the agent job runs on a fresh runner where ./build.sh never ran -- only the actions/setup-dotnet@v5 9.0.x SDK is on PATH. With cwd at the workspace, dotnet finds global.json, can't satisfy its pinned version, and emits our custom errorMessage before exiting 155, which then takes the whole agent job down.
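The upward search described above can be approximated in a few lines of shell. This is only a sketch of the documented behavior (the SDK looks for `global.json` in the current directory and then in each parent directory); `find_global_json` is a hypothetical helper name, not part of the `dotnet` CLI, and the real resolver does more than locate the file.

```shell
#!/bin/sh
# Hypothetical helper: emulate the upward search for global.json.
# Prints the first global.json found when walking from the given directory
# towards the filesystem root; returns 1 if none is found.
find_global_json() {
  dir=$(cd "${1:-.}" && pwd -P) || return 2
  while :; do
    if [ -f "$dir/global.json" ]; then
      printf '%s\n' "$dir/global.json"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}
```

Called from anywhere inside the checked-out workspace, a helper like this would print the repo's `global.json`; called from `/tmp` on a typical runner it would be expected to find nothing, which is the property the fix relies on.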

Running the step from /tmp keeps dotnet from finding global.json; the setup-dotnet-installed SDK on PATH is sufficient for a global tool install.
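For reproducing the fix outside of Actions, the effect of `working-directory: /tmp` is roughly that of changing directory in a subshell before running the command. The sketch below assumes that equivalence; `run_in_dir` and `TOOL_PACKAGE` are hypothetical names introduced for illustration, and the package ID is deliberately left as a placeholder because the PR does not state it.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a different working directory
# without changing the caller's own cwd (the subshell isolates the cd).
run_in_dir() {
  target=$1
  shift
  ( cd "$target" && "$@" )
}

# Illustrative use only (requires the dotnet CLI; TOOL_PACKAGE is a placeholder):
# run_in_dir /tmp dotnet tool install --global "$TOOL_PACKAGE"
```

Because the `cd` happens in a subshell, later commands in the same script still run from the workspace, which mirrors how `working-directory` applies to a single step rather than to the whole job.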

Notes on the diff

  • Two .md source edits (a comment + the working-directory line on the step).
  • Two .lock.yml files were regenerated via gh aw compile --strict. This also bumped the gh-aw setup-action from v0.75.4 -> v0.76.1 in these two lock files; 10 other lock files in the repo were already on v0.76.1, so this just keeps the metadata file consistent.
  • actions-lock.json updated to v0.76.1 to match what these (and other) workflow lock files already reference.
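One way to sanity-check the consistency claim in the notes above is to list the distinct setup-action versions referenced across the lock files. This is a rough sketch: `setup_action_versions` is a hypothetical helper, and it assumes each pin appears on a line that contains both `gh-aw-actions/setup` and a `vX.Y.Z` tag, which may not match the actual compiled format.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct vX.Y.Z versions that appear on
# lines mentioning the gh-aw setup action in the given files.
# Assumption: the version tag sits on the same line as the action reference.
setup_action_versions() {
  grep -h 'gh-aw-actions/setup' "$@" 2>/dev/null \
    | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' \
    | sort -u
}

# Illustrative use from the repo root:
# setup_action_versions .github/workflows/*.lock.yml
```

If the lock files are in sync, a check like this would be expected to print a single version.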

Out of scope

Two other recent agentic-workflow failures were investigated but don't have a workflow-side fix:

Verification

  • gh aw compile --strict on the two affected workflows: 0 errors, 0 warnings, idempotent re-compile.
  • working-directory: /tmp lands on the intended step in both .lock.yml files.
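The second verification point can be scripted rather than checked by eye. The sketch below is a text-level check, not a YAML parser: `step_has_workdir` is a hypothetical helper, and it assumes the step's `name:` sits on the list-item line (`- name: ...`) with `working-directory:` somewhere before the next sibling step.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named step in a workflow file carries
# the expected working-directory.
# $1 = workflow file, $2 = step name, $3 = expected working directory
step_has_workdir() {
  awk -v step="$2" -v dir="$3" '
    {
      # List items look like "<indent>- "; RLENGTH is the indent plus 2.
      is_item = match($0, /^ *- /)
      # A list item at the same or shallower indent ends the current step.
      if (is_item && in_step && RLENGTH <= step_len) in_step = 0
      if (is_item && index($0, "name: " step)) { in_step = 1; step_len = RLENGTH }
      if (in_step && index($0, "working-directory: " dir)) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Illustrative use:
# step_has_workdir .github/workflows/build-failure-analysis.lock.yml \
#   "Install NuGet MCP Server" /tmp
```

The match is by substring, so `/tmp` would also accept `/tmp/foo`; a stricter check would need a real YAML parser.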

The agent job's `Install NuGet MCP Server` step ran
`dotnet tool install --global` from the repo workspace cwd. The repo's
`global.json` pins SDK `11.0.100-preview.5.26227.104` (internal-only
preview); on the fresh agent runner only `actions/setup-dotnet@v5` 9.0.x
is available. `dotnet` walked up to `global.json`, failed to find the
pinned SDK, printed our custom `errorMessage` ("The .NET SDK could not be
found, please run ./build.cmd...") and exited 155, taking the whole agent
job down (see run 26856420989).

Set `working-directory: /tmp` on the step so `dotnet` no longer walks
into our `global.json`; the `setup-dotnet`-installed SDK on PATH is
sufficient for the global tool install. Apply the same defensive guard to
the sister command-mode workflow, which was guaranteed to hit the same bug
on next slash-command invocation.

Lock files were regenerated via `gh aw compile --strict`; this also
bumps gh-aw setup-action from v0.75.4 -> v0.76.1 in these two lock files
(10 other lock files in the repo were already on v0.76.1, so this just
keeps the metadata file in sync).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 3, 2026 09:47

Copilot AI left a comment

Contributor

Pull request overview

Updates the Build Failure Analysis agentic workflows to install the NuGet MCP Server from /tmp, preventing dotnet from discovering the repository global.json (which pins an internal-only SDK) on fresh agent runners and avoiding the observed exit-code-155 failure.

Changes:

  • Set working-directory: /tmp on the “Install NuGet MCP Server” step in both Build Failure Analysis workflow sources.
  • Regenerated the corresponding *.lock.yml workflows via gh aw compile --strict (including the compiler/version metadata bumps).
  • Updated .github/aw/actions-lock.json to reflect github/gh-aw-actions/setup* at v0.76.1.

| File | Description |
| --- | --- |
| .github/workflows/build-failure-analysis.md | Runs the NuGet MCP Server tool install from /tmp to avoid global.json SDK resolution on the agent runner. |
| .github/workflows/build-failure-analysis.lock.yml | Regenerated lock workflow reflecting the /tmp working directory change and updated gh-aw metadata/pins. |
| .github/workflows/build-failure-analysis-command.md | Same /tmp working directory adjustment for the slash-command variant. |
| .github/workflows/build-failure-analysis-command.lock.yml | Regenerated lock workflow reflecting the /tmp working directory change and updated gh-aw metadata/pins. |
| .github/aw/actions-lock.json | Updates gh-aw action lock entries to v0.76.1 to match the regenerated lock workflows. |

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 0

@Evangelink Evangelink left a comment

Member Author

Expert TestFx Review — PR #8795

Scope: CI/workflow-only fix. No production source, test, analyzer, or build-props files changed.

21-Dimension Results

21/21 dimensions clean — no findings.

Dimensions 1–2, 4–19 are not applicable (no C# code, tests, analyzers, serialization, or MSBuild files changed).

Dim 3 — Security & IPC Contract Safety — LGTM

working-directory: /tmp is used solely to prevent dotnet from traversing up to global.json. dotnet tool install --global installs to ~/.dotnet/tools, writing nothing to /tmp. GitHub-hosted runners are ephemeral and job-isolated, so /tmp sharing is not a concern. No user-controlled data flows into the path.

Dim 20 — Build Infrastructure & Dependencies — LGTM

Root cause is correctly diagnosed: the agent job runs on a fresh runner where ./build.sh was never executed, so global.json's 11.0.100-preview.5.26227.104 SDK pin cannot be satisfied by the setup-dotnet-installed 9.0.x SDK on PATH. working-directory: /tmp is the minimal, idiomatic fix. Lock files regenerated via gh aw compile --strict as required by repo convention. The action pin bump (v0.75.4 -> v0.76.1) is a natural side-effect of recompilation and is consistent with the 10 other lock files already on v0.76.1.

Dim 21 — Scope & PR Discipline — LGTM

PR is tightly scoped. The only "extra" churn is the action pin version bump, which is an inevitable consequence of recompilation and keeps the repo consistent rather than introducing drift.


No blocking or major issues found. This is a clean, minimal, well-reasoned CI fix.

Generated by Expert Code Review (on open) for issue #8795 · sonnet46 1.9M

@Evangelink Evangelink enabled auto-merge (squash) June 3, 2026 10:39
@Evangelink Evangelink merged commit 32a2c95 into main Jun 3, 2026
29 checks passed
@Evangelink Evangelink deleted the dev/amauryleve/fix-build-failure-analysis-dotnet-tool branch June 3, 2026 12:22
3 participants