Skip to content

Replace em dash in serialize-dialog.js comment w/ ASCII dash#2203

Merged
this-is-shivamsingh merged 1 commit into
percy:masterfrom
davidrunger:patch-1
May 27, 2026
Merged

Replace em dash in serialize-dialog.js comment w/ ASCII dash#2203
this-is-shivamsingh merged 1 commit into
percy:masterfrom
davidrunger:patch-1

Conversation

@davidrunger

@davidrunger davidrunger commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

The em dash (U+2014 EM DASH) that is being replaced in this change with a simple dash (U+002D HYPHEN-MINUS) causes a problem for me (and maybe/probably other people). The em dash ultimately causes Ruby's HTTP library (which I use via the percy-capybara gem) to interpret a certain HTTP response from the Percy server as having a binary encoding, rather than a UTF-8 encoding, which then causes a warning from the json gem (/home/runner/work/david_runger/david_runger/vendor/bundle/ruby/4.0.0/gems/json-2.19.4/lib/json/common.rb:445: warning: JSON.generate: UTF-8 string passed as BINARY, this will raise an encoding error in json 3.0).

Replacing the em dash with a simple ASCII dash avoids this issue.

The non-ASCII dash was introduced in the latest released Percy version (1.31.12) via #2185.

The em dash (U+2014 EM DASH) being replaced in this change with a simple dash (U+002D HYPHEN-MINUS) causes a problem for me (and maybe/probably other people). The em dash ultimately causes Ruby's HTTP library (which I use via the `percy-capybara` gem) to interpret a certain HTTP response from the Percy server as having a binary encoding, rather than a UTF-8 encoding, which then causes a warning from the `json` gem (`/home/runner/work/david_runger/david_runger/vendor/bundle/ruby/4.0.0/gems/json-2.19.4/lib/json/common.rb:445: warning: JSON.generate: UTF-8 string passed as BINARY, this will raise an encoding error in json 3.0`).

Replacing the em dash with a simple ASCII dash avoids this issue.

The non-ASCII dash was introduced in the latest released Percy version (1.31.12) via percy#2185.
@davidrunger davidrunger requested a review from a team as a code owner April 29, 2026 05:23
@davidrunger

davidrunger commented Apr 29, 2026

Copy link
Copy Markdown
Contributor Author

As a side note, I see that the bottom of the PR #2185 (cc author @yashmahamulkar-bs) that introduced the issue that this PR aims to correct states:

🤖 Generated with Claude Code

I certainly appreciate that transparency!

LLMs are well known for their fondness for em dashes.

If it is possible to take some measure to minimize the likelihood of LLM agents such as Claude Code from introducing em dashes into code (e.g. by adding an AGENTS.md file to this GitHub repo with instructions not to use non-ASCII characters in code (including code comments)), then that would probably be valuable, to avoid issues similar to that which this PR aims to correct from being introduced by Claude again in the future.

@this-is-shivamsingh

Copy link
Copy Markdown
Contributor

Hi, @davidrunger thanks for raising the issue and appreciate your efforts on fixing this. I have ran the spec and will merge it as the specs passes.
Approved

@github-actions

Copy link
Copy Markdown
Contributor

This PR is stale because it has been open for more than 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions Bot added the 🍞 stale Closed due to inactivity label May 26, 2026
@davidrunger

davidrunger commented May 26, 2026

Copy link
Copy Markdown
Contributor Author

I'm commenting to try to remove the stale label. I will appreciate if this PR can be merged. Thank you!

@this-is-shivamsingh this-is-shivamsingh merged commit 19ae309 into percy:master May 27, 2026
41 checks passed
@this-is-shivamsingh

Copy link
Copy Markdown
Contributor

Hi, @davidrunger thank you for making this contribution, your changes are released in beta version: 1.32.0-beta.2. In our next stable release cycle it will be available in stable release v1.32.0

@davidrunger davidrunger deleted the patch-1 branch May 27, 2026 17:21
@davidrunger

davidrunger commented May 27, 2026

Copy link
Copy Markdown
Contributor Author

@this-is-shivamsingh Thank you very much for the merge and for releasing this change in a beta version (and for planning it for inclusion in the next stable release)!

I tried out the 1.32.0-beta.2 version, expecting that it would resolve the issue that I have been seeing, but, alas, it does not. I think that this is because, since the time when I submitted this pull request about 1 month ago (when I think that the em dash that is changed in this PR to a simple ASCII dash was the only non-ASCII character in the compiled @percy/dom/dist/bundle.js output), approximately 49 other non-ASCII characters have been introduced via various PRs, thereby reintroducing the original problem that motivated this pull request.

Here are a few semi-randomly selected examples of recent PRs that have introduced more non-ASCII characters (linking to specific lines that introduce non-ASCII characters):

  1. https://github.com/percy/cli/pull/2177/changes#diff-7208a668693a331a06457a39ec1a774f48475c7e1abf808d992f34b464ff558fR85 ([...] percyId → cloneEl [...])
  2. https://github.com/percy/cli/pull/2184/changes#diff-9824ccc3183bab9098281b8ae310bcea8c81433684fc02806e735ed97c66fe3cR87 ([...] this — see [...])
  3. https://github.com/percy/cli/pull/2228/changes#diff-8e7308c37c36858cdf01165e0ca74b92d369fee465b761b9f424330a38bbce98R28 ([...] clone — that [...])

I strongly suspect that probably almost all of the non-ASCII characters being added to the codebase were generated by LLM(s). That's why I had suggested in this comment that it might be a good idea to, for example, add an AGENTS.md file telling LLMs not to use non-ASCII characters in code.

Additionally, in case the LLMs are not perfectly obedient to that instruction (or in case a human tries to add a non-ASCII character to a code file), it would probably also be a good idea to add a check to CI that there are no non-ASCII characters in code files in the codebase, and which will fail if there are any non-ASCII characters in code files in the codebase.

Alongside those changes (i.e. something like an AGENTS.md file instruction not to use non-ASCII characters and a CI check that fails if there are any non-ASCII characters in code files), I believe that probably all of these non-ASCII characters (all of which, I think, have been added within the past month or so) should be removed from code files (which, indeed, would need to be done in order for the aforementioned CI check to be able to pass).

I realize that I'm sort of presuming to make a somewhat non-trivial proposal, but I think that this (or some other fix) is worth doing. As mentioned in my PR description, currently I think that users of the percy-capybara Ruby gem (maybe only those using it in conjunction with cuprite?) are currently facing a printed warning that is ultimately caused by these non-ASCII characters, but in the json (Ruby gem) version 3.0, that warning will be changed to an exception, in which case this issue will change from being a mere annoyance to a more serious blocker.

There might be (indeed, I think, there probably is) some alternative way to avoid the issue that is caused by non-ASCII characters in the compiled JavaScript file(s?), such as perhaps by making some change to the percy-capybara Ruby gem. I have not explored that possibility. Very possibly it might be worth making both the changes that I suggested for this repo and also making some change to the percy-capybara Ruby gem to force/declare an encoding for the relevant string(s) that will avoid the warning (later to become an exception) that I mentioned in this PR's description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍞 stale Closed due to inactivity

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants