fix: emit real unified diff for remote workspace pending changes (#429)#430
Merged
Conversation
RemoteWorkspaceBackend::build_full_file_diff() was not a unified diff. It dumped the entire old content as '-' lines and the entire new content as '+' lines regardless of how small the actual change was. Agents that called workspace_git_diff to verify a surgical workspace_edit saw their whole file marked for removal and re-addition, and consistently misinterpreted this as 'whole-file line-ending weather' — concluding the edit had nuked the file even though the actual write to GitHub was correct. Real example: the world-of-wordpress agent hit this pattern three cycles in a row on the Observatory page in run 26050485830. A one-line <h2> heading change showed up as '@@ -1,250 +1,278 @@' followed by every line of the file removed then re-added, and the agent reverted its (correct) edit to avoid committing a noisy rewrite. This change replaces the fake whole-file diff with a real unified-diff generator: - Common-prefix/suffix trimming on the line arrays, so the O(ND) core only runs on the actually-different middle window. Typical 'surgical edit in large file' cases finish in O(N) instead of O(N^2). - Myers' O(ND) diff algorithm with V-array trace, walked backwards to reconstruct an ordered edit script. - Hunk grouping with three lines of context on each side, exactly the format git diff --no-color produces. - Proper @@ -start,count +start,count @@ markers, including the 0,0 edge cases for pure addition and pure deletion. Verified on seven representative cases: - two-line surgical (replaces phantom whole-file replace with single context line + one -/+ pair) - identical content (header only, no hunks) - pure addition (@@ -0,0 +1,N @@ matching git) - pure deletion (@@ -1,N +0,0 @@ matching git) - middle change in 10-line file (correct hunk with 3-line context) - two separate hunks (correctly split into two @@ blocks) - observatory-like 250-line file with one change at line 125 (7-line hunk instead of 528-line phantom diff) Existing smoke test passes unchanged (the -return 'old';/+return 'new'; assertions still match real unified diff output). Two new assertions lock in the new behavior: a real hunk header is emitted, and unchanged lines appear as space-prefixed context rather than as -/+ pairs. Closes #429. AI assistance: Yes - Claude Code (Sonnet 4.5) diagnosed the agent 'line-ending weather' pattern from the world-creator transcript, traced it to build_full_file_diff, implemented Myers' algorithm in PHP, and verified the output against representative cases including the Observatory-sized file. Chris confirmed the upstream-fix-first approach over agent-side workarounds.
Contributor
Homeboy Results —
|
Pre-existing phpstan finding surfaced by CI's --changed-since scoping when the diff generator overhaul touched this same file. $count > 0 at line 357 guarantees strpos cannot return false, but phpstan can't see that across the early-return. Make it explicit so static analysis is clean. No behavior change — the false branch falls back to the original content (which is unreachable anyway given the prior check).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
RemoteWorkspaceBackend::build_full_file_diff()was not a unified diff. It dumped the entire old content as-lines and the entire new content as+lines regardless of how small the actual change was. Agents callingworkspace_git_diffto verify a surgicalworkspace_editsaw their whole file marked for removal and re-addition, and consistently misinterpreted this as "whole-file line-ending weather" — concluding the edit had nuked the file even though the actual write to GitHub was correct.Real example: the world-of-wordpress agent hit this pattern three cycles in a row on the Observatory page in run 26050485830. A one-line
<h2>heading change showed up as@@ -1,250 +1,278 @@followed by every line removed then re-added, and the agent reverted its (correct) edit to avoid committing a noisy rewrite. The agent's restraint was correct given what it saw — but it was based on a phantom diff.What this change does
Replaces the fake whole-file diff with a real unified-diff generator:
git diff --no-colorproduces.@@ -start,count +start,count @@markers, including the 0,0 edge cases for pure addition and pure deletion.No new dependencies. ~280 lines of well-commented PHP added to a single private helper cluster.
Before vs after
Before (Observatory-like 250-line file with one change at line 125):
501 diff lines for a 1-line change. Agent panics.
After:
7 diff lines, real context, agent sees the actual change.
Verification
Tested against seven representative cases:
return 'old'→return 'new')@@ -0,0 +1,N @@matching git@@ -1,N +0,0 @@matching git@@blockstests/smoke-remote-workspace-backend.phppasses 15/15 assertions:-return 'old';and+return 'new';)@@ -1,2 +1,2 @@is emitted<?phpline appears as space-prefixed context, not as a-/+pairhomeboy lint --changed-since origin/main→ 0 findings.homeboy test --changed-since origin/main→ passed.Why this is upstream-first
The world creator's daily memory describes this as "line-ending weather" — a real perception bug that has already shaped three days of agent decisions. Per the site's "fix upstream first, never paper over" rule, the right move is to fix DMC so every consumer of
workspace_git_diffsees real diffs, rather than papering it over in the world-creator bundle prompt with "ignore the diff, trust the edit return value."AI assistance
build_full_file_diff, implemented Myers' algorithm in PHP, and verified the output against representative cases including the Observatory-sized file. Chris confirmed the upstream-fix-first approach over agent-side workarounds.Closes #429.