Skip to content

Revert PR #237's GetConsoleOutputCP fix; cmd pipes use OEM, not console CP#239

Merged
FMXExpress merged 2 commits into
mainfrom
claude/revert-getconsoleoutputcp-use-oemcp
Jun 12, 2026
Merged

Revert PR #237's GetConsoleOutputCP fix; cmd pipes use OEM, not console CP#239
FMXExpress merged 2 commits into
mainfrom
claude/revert-getconsoleoutputcp-use-oemcp

Conversation

@FMXExpress

Copy link
Copy Markdown
Owner

Revert PR #237's review fix. Real-world Windows testing on Eli's box showed the GetConsoleOutputCP change actively regressed the bug it was supposed to address.

What went wrong

Trace from Eli's session:

  • His cmd console: chcp confirms CP437
  • His pasclaw.exe: launched from Windows Terminal which defaults to CP65001 (Win10+) → our process's GetConsoleOutputCP() returns 65001
  • DecodeShellOutputBytes runs MultiByteToWideChar(65001, [..., 0x82, ...], ...)
  • 0x82 is an invalid UTF-8 lead byte → Windows replaces with U+FFFD
  • UTF8Encode produces the bytes 0xEF 0xBF 0xBD for each
  • Model sees r�sum�.txt (where = U+FFFD) instead of résumé.txt

Same mojibake as before, just routed through a different codepage mismatch.

Why the Codex P2 reasoning was wrong

The review argued "what if the operator ran chcp 65001 first?" The flaw: chcp doesn't reliably switch cmd.exe's piped output encoding. cmd.exe writes redirected stdout in the OEM codepage regardless of the console's chcp setting (a documented Windows behavior). Our spawned cmd.exe children also don't inherit a parent shell's chcp — they're new processes attached to the same console.

So GetConsoleOutputCP returns the parent shell's console CP (our pasclaw.exe's terminal — 65001 from Windows Terminal), but cmd.exe's piped output uses the OEM CP (CP437). Decoding 437 bytes as 65001 is the mojibake we just observed.

GetOEMCP returns the system OEM default — which is what cmd.exe's pipe output actually uses. The legitimate "I want UTF-8 cmd output" case can still be handled by the operator running chcp 65001 && inside their shell_exec command (that affects the spawned cmd.exe's own internal codepage handling for that specific invocation).

Test plan

  • FPC make smoke clean.
  • Full make test green (36 targets).
  • Manual Windows: rebuild, run pasclaw agent -m "Run shell_exec to: cd %USERPROFILE%\.pasclaw && dir /b. Do you see résumé.txt?". Model should now see résumé.txt clean, not r�sum�.txt.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon


Generated by Claude Code

…le CP

Eli's Windows test showed PR #237's review fix actively regressed
the bug it was meant to address.

  cmd console at CP437 (his `chcp` confirms)
  pasclaw.exe launched from Windows Terminal -> our console CP=65001
  GetConsoleOutputCP returns 65001 in DecodeShellOutputBytes
  MultiByteToWideChar(65001, [0x82 from dir's CP437 output], ...)
  -> 0x82 is invalid UTF-8 lead byte
  -> replaces with U+FFFD
  -> UTF8Encode yields 0xEF 0xBF 0xBD per replaced byte
  -> model sees `r�sum�.txt` instead of `résumé.txt`

Exactly the original mojibake, just routed through a different
codepage mismatch.

The Codex P2 review reasoning ("operator ran chcp 65001 first") is
wrong about how cmd.exe handles piped output. `chcp` switches the
CONSOLE's codepage, but cmd.exe's redirected stdout uses the OEM
codepage regardless of the console's chcp setting. Our spawned
cmd.exe children also don't inherit a parent shell's chcp at all
(they're new processes attached to the same console). So
GetConsoleOutputCP returns the wrong CP for decoding piped cmd
output -- it returns OUR console's CP, not the OEM CP that cmd
actually used for its piped output.

GetOEMCP returns the system default (CP437 on US English) which is
what cmd.exe's pipe output ACTUALLY uses. The "I want UTF-8 output"
case (where the operator genuinely wants cmd in CP65001 mode) can
still be handled by putting `chcp 65001 &&` inside the shell_exec
command, since that affects the spawned cmd.exe's own internal
codepage handling.

FPC make smoke + full make test (36 targets) green. The fix needs
manual Windows verification: rebuild, run `dir /b` of a directory
with résumé.txt, model should see résumé.txt clean.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a56fc3771a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/pkg/platform/PasClaw.Platform.pas Outdated
running it inside their shell_exec command -- `cmd /c "chcp
65001 && dir /b > out.utf8 && type out.utf8"` -- but the
default decode path must match cmd.exe's pipe default. }
CP := GetOEMCP;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve UTF-8 output from PowerShell commands

When Windows shell_exec/execute_code runs a command that emits UTF-8 bytes, this unconditional OEM decode turns non-ASCII output into mojibake. A repo-visible path is Tool_ExecuteCodeImplBuildExecuteCodeArgv, which prefers pwsh on PATH for PowerShell scripts; Microsoft documents PowerShell 6+ as defaulting text output to UTF-8, so a script like Write-Output 'résumé' under pwsh produces UTF-8 bytes that are now decoded as CP437/CP OEM. The cmd.exe dir case may need OEM handling, but making the default decoder always GetOEMCP regresses UTF-8-producing children launched through the same RunOneShotWithEnv pipe.

Useful? React with 👍 / 👎.

Codex P2, legit. The straight revert to GetOEMCP fixed cmd.exe
pipe output (Eli's original bug) but regressed the pwsh case:
PowerShell 6+ defaults to UTF-8 stdout, so a script like
Write-Output 'résumé' from execute_code's pwsh branch produces
UTF-8 bytes that GetOEMCP would now decode as CP437 -> mojibake.

Auto-detection heuristic: try strict UTF-8 first
(MB_ERR_INVALID_CHARS), fall back to OEM only when validation
fails.

  Pure ASCII (most output)            -> valid UTF-8 (no-op)
  pwsh UTF-8 output (résumé etc.)     -> valid UTF-8, passes through
  cmd.exe CP437 non-ASCII (0x82 etc.) -> invalid UTF-8 lead byte,
                                         falls back to GetOEMCP=437

Robustness:
  - 0x82 from CP437 has binary 10000010, a UTF-8 continuation
    marker, NOT a valid lead byte -> reliably caught by
    MB_ERR_INVALID_CHARS
  - Multi-byte CP437 sequences for filenames similarly cluster
    bytes that don't form valid UTF-8 sequences
  - Edge case (OEM bytes that coincidentally form a valid UTF-8
    sequence) is vanishingly unlikely for filename / dir output

New test TestAutoDetectPrefersUTF8ForValidSequences pins the pwsh
case: 8 UTF-8 bytes for "résumé" pass through verbatim when
Codepage=0. The earlier CP437 tests (TestCP437SingleNonAsciiByte
etc.) continue exercising the OEM-fallback path with explicit
Codepage=437.

FPC make smoke + full make test (36 targets) green.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
@FMXExpress FMXExpress merged commit 84e61e6 into main Jun 12, 2026
FMXExpress pushed a commit that referenced this pull request Jun 13, 2026
Both changelog files were stuck at the 2026-06-10 TUI / background-
subagent / session_search / tool-RPC entries. ~7 weeks of merged
PRs (#214 through #244) had landed without a corresponding entry in
either file -- the git log was the only record of features like the
knowledgebase, inbound MCP server mode, checkpoints + /undo,
heartbeat daemon, shell backends (local | docker), send_message,
promptware defense, Windows shell-output encoding fixes, fs_grep
ripgrep optimisations, OpenTelemetry traces, and `pasclaw agent
--quiet`.

Added (newest first):

  2026-06-13
    #244  agent --quiet log-level clamp

  2026-06-12
    #243  dcc64 search path + agent --quiet / -q
    #242  OpenTelemetry traces (OTLP/HTTP+JSON)
    #241  fs_grep tier 5+6 (byte walker + Boyer-Moore-Horspool)
    #240  fs_grep tier 1-4 (deferred hash + skip-dirs + binary
          sniff + size cap)

  2026-06-11
    #239  Windows shell output: auto-detect UTF-8 vs OEM
    #238  Hashline fs_read header mojibake fix (¶ vs ¶)
    #237  Windows shell output: OEM codepage decode

  2026-06-10
    #233  Shell backends (local | docker via IShellBackend)
    #232  Heartbeat daemon + reversible condensation (CCR)
    #230  send_message tool + condenser families + promptware
          defense + task-aware MEMORY.md slicing

  2026-06-09
    #223  /goal Ralph loop + judge model + JSON-aware condenser
    #222  /undo wired into agent + sample dprojs
    #221  Checkpoints (auto-snapshot + TUI /undo)
    #220  mcp catalog: lenient hub projector
    #219  mcp: SSE / Streamable HTTP / stdio hub transports
    #218  Inbound MCP server mode
    #217  onboard: offer to index docs into the knowledgebase
    #214  Knowledgebase (RAG over operator-curated docs)

Skipped from this changelog (build-system / Delphi-fix / FPC-fix
PRs visible in the git log but not user-facing features):
#215, #216, #224, #225, #226, #227, #228, #229, #231, #234, #236.

Both files (README.md and docs/changelog.md) updated with the same
entries -- intentional mirror so a reader on either surface sees
the same history. Same prose in both; same footnote refs at the
bottom.

Validated: every [#N] reference in both files has a matching [#N]:
URL footnote (no broken links).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants