Skip to content

fix(buzz-dev-mcp): resolve non-WSL bash so the MCP shell works on Windows#1119

Merged
wpfleger96 merged 8 commits into
mainfrom
duncan/windows-shell-bash
Jun 18, 2026
Merged

fix(buzz-dev-mcp): resolve non-WSL bash so the MCP shell works on Windows#1119
wpfleger96 merged 8 commits into
mainfrom
duncan/windows-shell-bash

Conversation

@wpfleger96

@wpfleger96 wpfleger96 commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

The MCP shell tool is the only channel-posting transport an agent has on Buzz, and it was fully dead on Windows. shell.rs spawned a bare Command::new("bash"), which on Windows re-enters PATH search and resolves System32\bash.exe — the WSL launcher — failing at spawn with 0x8007072c before any command runs.

What changed

resolve_bash() now returns an absolute, non-WSL bash path and spawns it directly. Probe order (first hit wins), Windows-only — Unix is a no-op that preserves the existing bare-bash behavior:

  1. GIT_BASH env override (explicit operator escape hatch)
  2. Installed Git for Windows (%ProgramFiles%\Git, %LocalAppData%\Programs\Git)
  3. Bundled bash staged next to the executable (makes a bare, Git-less host work)
  4. PATH scan via std::env::split_paths, excluding %SystemRoot% so WSL's bash.exe is never selected

The %SystemRoot% exclusion compares path components case-insensitively (via is_under_dir): Windows paths are case-insensitive but Path::starts_with is not, so a PATH entry spelled C:\WINDOWS\System32 against a %SystemRoot% of C:\Windows would otherwise slip past the guard and re-resolve WSL's bash. The comparison is component-wise (not a lowercased substring match) so a sibling like C:\Windows2 is not falsely treated as inside %SystemRoot%.

If none resolve, the resolver returns an actionable error before spawn instead of the cryptic 0x8007072c.

Bundled bash

The app is self-contained, so it cannot assume Git for Windows is installed. There is no standalone "bash for Windows" upstream — working Windows bash ships only inside git-for-windows. The build downloads PortableGit 2.54.0 and keeps only the MSYS2 bash runtime (usr/ + bin/), dropping the mingw64/ git-program subtree (git.exe etc., which the MCP shell never invokes) as one separable unit.

The retained runtime is kept whole — bash loads msys-2.0.dll and other libraries lazily, and load-bearing pieces (terminfo, gawk libs) live alongside the docs under usr/, so a hand-trimmed copy could pass an existence check yet fail mid-command with a cryptic error. Dropping mingw64/ removes a separable subtree without ever touching the bash runtime.

The download/extract/drop logic lives in scripts/stage-windows-bash.sh, a self-contained script with no release-binary precondition so CI can call it directly (see below). scripts/bundle-sidecars.sh invokes it for Windows targets.

The git-bash install-root directory name is a single path contract shared byte-identical across three sites, each carrying a cross-reference comment:

File Role
scripts/bundle-sidecars.sh stages the runtime to desktop/src-tauri/binaries/git-bash
desktop/scripts/build-release-config.mjs Windows-only bundle.resources source → install-root target git-bash
crates/buzz-dev-mcp/src/shell.rs resolves git-bash\bin\bash.exe relative to the executable at runtime

CI

The windows-rust job gains two gates on real windows-latest:

  • cargo test -p buzz-dev-mcp runs the cfg(windows) bash-resolution tests (including the case-insensitive %SystemRoot% exclusion) — the package was never tested on a Windows runner before, so these had no coverage.
  • A bundled-bash staging smoke step runs scripts/stage-windows-bash.sh and spawns the staged bash.exe on a real coreutils pipeline (echo hello | rev). This exercises the PortableGit download, the SFX -o extraction, the mingw64/ drop, and the lazily-loaded MSYS DLL closure — behaviors that previously ran only in release.yml on a tag, i.e. unexercised until a release reached users.

Notes

The UTF-16 garbling in the original bug report was the WSL launcher's own error output; genuine MSYS2 bash writes UTF-8, so no output decode is added — one would corrupt real command output. The garbling disappears once a real bash is spawned.

…dows

The MCP shell spawned a bare `Command::new("bash")`, which on Windows
re-enters PATH search and resolves `System32\bash.exe` (the WSL launcher).
That fails at spawn with 0x8007072c, so the agent's only channel-posting
transport was fully dead on Windows.

resolve_bash() now returns an absolute, non-WSL bash path and spawns it
directly: GIT_BASH override -> installed Git for Windows -> bundled bash
staged next to the exe -> PATH scan excluding %SystemRoot%. Unix is a no-op.
The System32 exclusion compares path components case-INsensitively because
Windows paths are case-insensitive but `Path::starts_with` is not — a PATH
entry spelled `C:\WINDOWS\System32` would otherwise leak WSL's bash.

To make a bare (Git-less) host work, the build bundles a genuine bash: the
PortableGit MSYS runtime (`usr/` + `bin/`) with the separable `mingw64/`
git-program subtree dropped. The runtime is kept whole — bash loads its
DLLs lazily, so a hand-trimmed copy would pass an existence check yet fail
mid-command. The `git-bash` install-root dir name is a single path contract
shared byte-identical across shell.rs, build-release-config.mjs, and
bundle-sidecars.sh.

The download/extract/drop logic lives in scripts/stage-windows-bash.sh so
the Windows PR CI job can stage the tree and spawn the staged bash on a real
coreutils pipeline — gating the SFX extraction, the post-`mingw64`-drop spawn,
and the lazily-loaded DLL closure that would otherwise ship unexercised until
a tagged release.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
@wpfleger96 wpfleger96 force-pushed the duncan/windows-shell-bash branch from 9466e28 to 211e588 Compare June 18, 2026 19:36
npub1mn7jgtj4w2pd0g0zeuhxsa6jy6p0rewxz4kujt98my82ahfmp72sxjexk7 and others added 7 commits June 18, 2026 16:09
The idempotency skip trusted a single file (bin/bash.exe) as proof of a
whole stage. A cp -a interrupted between the sibling bin/ and usr/ subtrees
can leave bash.exe present while usr/ is incomplete, and the skip would then
accept that partial runtime and ship it. Unreachable in CI (fresh $RUNNER_TEMP)
and release (fresh checkout), so not a shipping defect — eliminated proactively.

A .stage-complete marker written last, only after cp -a and the bash.exe
integrity check both pass, is positive proof the whole tree landed and is
immune to which specific files a future partial leaves behind. An interrupted
stage never writes it, so the skip falls through to a clean re-extract (the
pre-extract rm -rf already cleans the partial).

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
… docs

rust-1.95.0 clippy on the Windows toolchain flags two doc comments in
shell.rs where a numbered list is immediately followed by a left-margin prose
line, parsing the prose as a malformed lazy continuation of the last list item.
Under -D warnings this failed the Windows Rust job at the Clippy step, skipping
Check/Test/the bundled-bash smoke gate. A blank /// line between each list and
its trailing paragraph is the standard markdown separation; doc-only, no
behavior change.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The three 'absolute path outside workspace is allowed' tests hardcoded /etc/hosts as the out-of-workspace fixture. On Windows that resolves to C:\etc\hosts, which does not exist, so the assertions got a path-not-found error (os error 3) instead of the intended outcome. Each test now writes a real file into a second tempdir outside the workspace root, giving a genuinely-absolute existing path on both platforms.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The bundled-bash resolver and CI smoke-step targeted git-bash\bin\bash.exe,
which is git-for-windows's 47KB compat launcher shim, not bash. The shim
validates its install root by requiring a sibling mingw64\bin marker that
Option 2 deliberately drops, so it printed "Top-level not found" and exited
before running any command. Retarget the bundled spawn to the real MSYS2 bash
at git-bash\usr\bin\bash.exe, which boots from its co-located msys-2.0.dll
and needs only usr/. The installed-Git branch stays on bin\bash.exe: a real
Git-for-Windows install has mingw64/, so its launcher is the correct entry.

Also swap the smoke probe's `rev` (not bundled in PortableGit at all) for
`tr a-z A-Z`, which is bundled in usr/bin/ with the same msys-2.0.dll closure,
and assert the file we actually run (usr/bin/bash.exe) in the staging integrity
check rather than the shim.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The timeout_ms runaway-command guard was a no-op on Windows: the kill
path was three cfg(not(unix)) stubs, so on timeout TerminateProcess killed
the bash parent but orphaned its MSYS-forked grandchildren (e.g. sleep).
The orphans held the stdout/stderr pipes open, blocking the reap until they
self-exited — surfacing as shell::tests::timeout_fires eating ~986s of CI,
and meaning a real agent's bash -c hitting its timeout was never killed.

Replace the PID-keyed guard and free kill functions with a platform-
abstracted KillGroup: Unix keeps the killpg behavior byte-for-byte; Windows
assigns the child to a Job Object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE so
terminating the job kills the whole tree atomically — the structural mirror
of killpg. The job HANDLE is held in KillGroup for the whole run so Drop is
the last-resort reaper; closing it earlier would fire KILL_ON_JOB_CLOSE and
kill the child the instant spawn returned.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The shell tool's async future holds a KillGroup across an .await and is
spawned as a Send future, but on Windows KillGroup wraps a raw HANDLE
(*mut c_void) which is neither Send nor Sync — so the Windows clippy gate
failed to compile (5 errors, all from this one missing bound). A job-object
handle is a thread-safe kernel reference, so the impls are sound; the Unix
path was unaffected and already Send + Sync.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The Job Object kill path is raw windows-sys FFI and is inherently unsafe,
but the crate was #![forbid(unsafe_code)] — a hard wall an inner #[allow]
cannot relax. Split the lint by target: Unix keeps the hard no-unsafe
guarantee (forbid), Windows uses deny so the KillGroup FFI items can carry
a scoped #[allow(unsafe_code)]. No safe Win32 API exists for job objects,
mirroring the git-sign-nostr unsafe-FFI precedent.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
@wpfleger96 wpfleger96 merged commit 4d2eb52 into main Jun 18, 2026
29 checks passed
@wpfleger96 wpfleger96 deleted the duncan/windows-shell-bash branch June 18, 2026 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant