Revert PR #237's GetConsoleOutputCP fix; cmd pipes use OEM, not console CP#239
Conversation
…le CP Eli's Windows test showed PR #237's review fix actively regressed the bug it was meant to address. cmd console at CP437 (his `chcp` confirms) pasclaw.exe launched from Windows Terminal -> our console CP=65001 GetConsoleOutputCP returns 65001 in DecodeShellOutputBytes MultiByteToWideChar(65001, [0x82 from dir's CP437 output], ...) -> 0x82 is invalid UTF-8 lead byte -> replaces with U+FFFD -> UTF8Encode yields 0xEF 0xBF 0xBD per replaced byte -> model sees `r�sum�.txt` instead of `résumé.txt` Exactly the original mojibake, just routed through a different codepage mismatch. The Codex P2 review reasoning ("operator ran chcp 65001 first") is wrong about how cmd.exe handles piped output. `chcp` switches the CONSOLE's codepage, but cmd.exe's redirected stdout uses the OEM codepage regardless of the console's chcp setting. Our spawned cmd.exe children also don't inherit a parent shell's chcp at all (they're new processes attached to the same console). So GetConsoleOutputCP returns the wrong CP for decoding piped cmd output -- it returns OUR console's CP, not the OEM CP that cmd actually used for its piped output. GetOEMCP returns the system default (CP437 on US English) which is what cmd.exe's pipe output ACTUALLY uses. The "I want UTF-8 output" case (where the operator genuinely wants cmd in CP65001 mode) can still be handled by putting `chcp 65001 &&` inside the shell_exec command, since that affects the spawned cmd.exe's own internal codepage handling. FPC make smoke + full make test (36 targets) green. The fix needs manual Windows verification: rebuild, run `dir /b` of a directory with résumé.txt, model should see résumé.txt clean. https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a56fc3771a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| running it inside their shell_exec command -- `cmd /c "chcp | ||
| 65001 && dir /b > out.utf8 && type out.utf8"` -- but the | ||
| default decode path must match cmd.exe's pipe default. } | ||
| CP := GetOEMCP; |
There was a problem hiding this comment.
Preserve UTF-8 output from PowerShell commands
When Windows shell_exec/execute_code runs a command that emits UTF-8 bytes, this unconditional OEM decode turns non-ASCII output into mojibake. A repo-visible path is Tool_ExecuteCodeImpl → BuildExecuteCodeArgv, which prefers pwsh on PATH for PowerShell scripts; Microsoft documents PowerShell 6+ as defaulting text output to UTF-8, so a script like Write-Output 'résumé' under pwsh produces UTF-8 bytes that are now decoded as CP437/CP OEM. The cmd.exe dir case may need OEM handling, but making the default decoder always GetOEMCP regresses UTF-8-producing children launched through the same RunOneShotWithEnv pipe.
Useful? React with 👍 / 👎.
Codex P2, legit. The straight revert to GetOEMCP fixed cmd.exe
pipe output (Eli's original bug) but regressed the pwsh case:
PowerShell 6+ defaults to UTF-8 stdout, so a script like
Write-Output 'résumé' from execute_code's pwsh branch produces
UTF-8 bytes that GetOEMCP would now decode as CP437 -> mojibake.
Auto-detection heuristic: try strict UTF-8 first
(MB_ERR_INVALID_CHARS), fall back to OEM only when validation
fails.
Pure ASCII (most output) -> valid UTF-8 (no-op)
pwsh UTF-8 output (résumé etc.) -> valid UTF-8, passes through
cmd.exe CP437 non-ASCII (0x82 etc.) -> invalid UTF-8 lead byte,
falls back to GetOEMCP=437
Robustness:
- 0x82 from CP437 has binary 10000010, a UTF-8 continuation
marker, NOT a valid lead byte -> reliably caught by
MB_ERR_INVALID_CHARS
- Multi-byte CP437 sequences for filenames similarly cluster
bytes that don't form valid UTF-8 sequences
- Edge case (OEM bytes that coincidentally form a valid UTF-8
sequence) is vanishingly unlikely for filename / dir output
New test TestAutoDetectPrefersUTF8ForValidSequences pins the pwsh
case: 8 UTF-8 bytes for "résumé" pass through verbatim when
Codepage=0. The earlier CP437 tests (TestCP437SingleNonAsciiByte
etc.) continue exercising the OEM-fallback path with explicit
Codepage=437.
FPC make smoke + full make test (36 targets) green.
https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
Both changelog files were stuck at the 2026-06-10 TUI / background- subagent / session_search / tool-RPC entries. ~7 weeks of merged PRs (#214 through #244) had landed without a corresponding entry in either file -- the git log was the only record of features like the knowledgebase, inbound MCP server mode, checkpoints + /undo, heartbeat daemon, shell backends (local | docker), send_message, promptware defense, Windows shell-output encoding fixes, fs_grep ripgrep optimisations, OpenTelemetry traces, and `pasclaw agent --quiet`. Added (newest first): 2026-06-13 #244 agent --quiet log-level clamp 2026-06-12 #243 dcc64 search path + agent --quiet / -q #242 OpenTelemetry traces (OTLP/HTTP+JSON) #241 fs_grep tier 5+6 (byte walker + Boyer-Moore-Horspool) #240 fs_grep tier 1-4 (deferred hash + skip-dirs + binary sniff + size cap) 2026-06-11 #239 Windows shell output: auto-detect UTF-8 vs OEM #238 Hashline fs_read header mojibake fix (¶ vs ¶) #237 Windows shell output: OEM codepage decode 2026-06-10 #233 Shell backends (local | docker via IShellBackend) #232 Heartbeat daemon + reversible condensation (CCR) #230 send_message tool + condenser families + promptware defense + task-aware MEMORY.md slicing 2026-06-09 #223 /goal Ralph loop + judge model + JSON-aware condenser #222 /undo wired into agent + sample dprojs #221 Checkpoints (auto-snapshot + TUI /undo) #220 mcp catalog: lenient hub projector #219 mcp: SSE / Streamable HTTP / stdio hub transports #218 Inbound MCP server mode #217 onboard: offer to index docs into the knowledgebase #214 Knowledgebase (RAG over operator-curated docs) Skipped from this changelog (build-system / Delphi-fix / FPC-fix PRs visible in the git log but not user-facing features): #215, #216, #224, #225, #226, #227, #228, #229, #231, #234, #236. Both files (README.md and docs/changelog.md) updated with the same entries -- intentional mirror so a reader on either surface sees the same history. Same prose in both; same footnote refs at the bottom. Validated: every [#N] reference in both files has a matching [#N]: URL footnote (no broken links).
Revert PR #237's review fix. Real-world Windows testing on Eli's box showed the
GetConsoleOutputCPchange actively regressed the bug it was supposed to address.What went wrong
Trace from Eli's session:
chcpconfirms CP437GetConsoleOutputCP()returns 65001DecodeShellOutputBytesrunsMultiByteToWideChar(65001, [..., 0x82, ...], ...)UTF8Encodeproduces the bytes 0xEF 0xBF 0xBD for eachr�sum�.txt(where�= U+FFFD) instead ofrésumé.txtSame mojibake as before, just routed through a different codepage mismatch.
Why the Codex P2 reasoning was wrong
The review argued "what if the operator ran
chcp 65001first?" The flaw:chcpdoesn't reliably switch cmd.exe's piped output encoding. cmd.exe writes redirected stdout in the OEM codepage regardless of the console's chcp setting (a documented Windows behavior). Our spawned cmd.exe children also don't inherit a parent shell's chcp — they're new processes attached to the same console.So
GetConsoleOutputCPreturns the parent shell's console CP (our pasclaw.exe's terminal — 65001 from Windows Terminal), but cmd.exe's piped output uses the OEM CP (CP437). Decoding 437 bytes as 65001 is the mojibake we just observed.GetOEMCPreturns the system OEM default — which is what cmd.exe's pipe output actually uses. The legitimate "I want UTF-8 cmd output" case can still be handled by the operator runningchcp 65001 &&inside theirshell_execcommand (that affects the spawned cmd.exe's own internal codepage handling for that specific invocation).Test plan
make smokeclean.make testgreen (36 targets).pasclaw agent -m "Run shell_exec to: cd %USERPROFILE%\.pasclaw && dir /b. Do you see résumé.txt?". Model should now seerésumé.txtclean, notr�sum�.txt.https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
Generated by Claude Code