Address feedback from plenary by bakkot · Pull Request #77 · tc39/proposal-regex-escaping

bakkot · 2024-04-09T05:07:53Z

Commits should be reviewed individually. Summary:

- use `\$` etc. where possible; this is not possible for all punctuators, but it is for some
- also escape line terminators; this was just an oversight on my part
- use `\n` etc. where possible
- if given a lone surrogate, escape it, so that if there is someday a non-BMP whitespace character, and we get x-mode RegExps, and someone writes `new RegExp(RegExp.escape('\uHEAD') + RegExp.escape('\uTAIL'), 'x')` where `\uHEAD\uTAIL` encodes that non-BMP whitespace character, the resulting RegExp will match the character instead of being interpreted as whitespace and therefore (in x-mode) ignored
- hex-escape ASCII letters at the start of the string so that `new RegExp('\c' + RegExp.escape('Z'))` will either be an error (in u- or v-mode RegExps) or match the string `'\\cZ'` (in other RegExps); this also has the effect that the result cannot combine with a preceding `\x` or `\u` when the first character is A-F or a-f. Fixes #66 ("Which leading characters should be escaped?").
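The rules summarized above can be sketched roughly as follows. This is an illustration only, not the proposal's spec text or a polyfill: `escapeSketch` and its helper constants are hypothetical names, and `\s` is used as an approximation of the |WhiteSpace| and |LineTerminator| productions.

```javascript
// Hypothetical sketch of the escaping rules listed in this PR summary.
const SYNTAX_CHARACTERS = new Set('^$\\.*+?()[]{}|/');
const CONTROL_ESCAPES = new Map([
  ['\t', 't'], ['\n', 'n'], ['\v', 'v'], ['\f', 'f'], ['\r', 'r'],
]);
const OTHER_PUNCTUATORS = new Set(',-=<>#&!%:;@~\'`"');
// Assumption: \s stands in for WhiteSpace and LineTerminator.
const WHITESPACE_OR_LINE_TERMINATOR = /^\s$/;

const hex = (n, width) => n.toString(16).padStart(width, '0');

function escapeSketch(input) {
  let escaped = '';
  for (const ch of input) { // iterates by code point; lone surrogates come through as-is
    const cp = ch.codePointAt(0);
    if (escaped === '' && /^[0-9A-Za-z]$/.test(ch)) {
      // Leading digit or ASCII letter: hex-escape so it cannot extend a preceding escape.
      escaped += '\\x' + hex(cp, 2);
    } else if (SYNTAX_CHARACTERS.has(ch)) {
      // Identity escape, e.g. \$ or \/
      escaped += '\\' + ch;
    } else if (CONTROL_ESCAPES.has(ch)) {
      // Control escape, e.g. \n
      escaped += '\\' + CONTROL_ESCAPES.get(ch);
    } else if (
      OTHER_PUNCTUATORS.has(ch) ||
      WHITESPACE_OR_LINE_TERMINATOR.test(ch) ||
      (cp >= 0xD800 && cp <= 0xDFFF) // lone surrogate
    ) {
      escaped += cp <= 0xFF ? '\\x' + hex(cp, 2) : '\\u' + hex(cp, 4);
    } else {
      escaped += ch;
    }
  }
  return escaped;
}
```

With this sketch, `escapeSketch('Z')` yields the four-character-escape `\x5a`, and `escapeSketch('$1.00')` yields `\$1\.00`.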

bakkot · 2024-04-09T05:08:31Z

cc @erights @waldemarhorwat @gibson042 Please take a look.

ljharb · 2024-04-09T05:12:55Z

          1. Let _otherPunctuators_ be the string-concatenation of *",-=<>#&!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK).
          1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_).
-          1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace| or |LineTerminator|, then
+          1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ is in the inclusive interval from U+D800 to U+DFFF (i.e. _c_ is a leading surrogate or trailing surrogate), then


1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ is a leading surrogate or a trailing surrogate, then

Strictly speaking, leading and trailing surrogates are defined to be code units, not code points. I figure it's close enough for the parenthetical, but not the actual algorithm step. I could change the parenthetical to be "c is the code point corresponding to a leading surrogate or trailing surrogate" but that's getting pretty wordy.

other than the oxford comma, this suggestion is a nit, so i'm ambivalent
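To illustrate the code unit versus code point distinction discussed in this thread: when a string is walked by code point, a well-formed surrogate pair shows up as one code point above U+FFFF, while a lone surrogate shows up as a code point in the U+D800 to U+DFFF interval. The helper names below (`codePoints`, `isSurrogateCodePoint`) are hypothetical and only serve the example.

```javascript
// Hypothetical helpers for illustration.
const codePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));
const isSurrogateCodePoint = (cp) => cp >= 0xD800 && cp <= 0xDFFF;

// A well-formed pair is a single code point outside the surrogate interval.
const paired = codePoints('\uD83D\uDE00');

// A lone leading surrogate is its own code point inside the interval.
const lone = codePoints('\uD83D');

// Escaping each half separately still produces a pattern that, in u-mode,
// matches the full character, because adjacent \uHEAD\uTAIL escapes are
// read as one surrogate pair.
const halves = new RegExp('^' + '\\ud83d' + '\\ude00' + '$', 'u');
const matchesWholeCharacter = halves.test('\u{1F600}');
```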

erights · 2024-04-09T21:33:04Z

How do I see a rendered form of this?

bakkot · 2024-04-09T22:00:20Z

I've put up a rendering here.

erights · 2024-04-09T22:13:15Z

Thanks. FWIW LGTM, but I delegate an approval decision to @gibson042 who understands this better than I do.

erights · 2024-04-09T22:13:51Z

Thanks for the rendering. Cut down the review effort for me by a huge factor.

gibson042

Nice catch on LineTerminator! I still prefer the universally applicable \x…/\u… approach, but this does work (even if it could get stale).

gibson042 · 2024-04-12T04:09:47Z

+          1. If _c_ is matched by |SyntaxCharacter| or _c_ is U+002F (SOLIDUS), then
+            1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_c_).


This strikes me as a point-in-time snapshot, and I'm not a fan because it will be weird if e.g. \@ becomes a valid escape in the future. My preference remains the universally applicable \x…/\u… approach. But that said, there is no technical issue here; merely an aesthetic one.
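A small sketch of the trade-off being discussed here: an identity escape such as `\$` and a hex escape such as `\x24` match the same character, but identity escapes are only valid in u-mode for syntax characters and `/`, which is why something like `\@` cannot be used today. The `compiles` helper is a hypothetical name for this example.

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    return false;
  }
}

// Identity escape and hex escape for "$" match the same string.
const identityMatches = new RegExp('^\\$$', 'u').test('$');
const hexMatches = new RegExp('^\\x24$', 'u').test('$');

// In u-mode, "\@" is currently a syntax error, while "\x40" is accepted.
const atIdentityCompiles = compiles('\\@', 'u');
const atHexCompiles = compiles('\\x40', 'u');
```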

jridgewell · 2024-04-12T22:53:06Z

-              1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence.
-              1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_.
+            1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then
+              1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.


Nit: this sentence is too long to not use a parenthetical.

Suggested change

1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.

1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text, which may be used after a `\0` character escape or a |DecimalEscape| such as `\1`, and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.

Those commas read as ungrammatical to me. In particular the second comma cuts a clause in the middle - the pattern text "may be used after \0 [...] and still match S". Open to other rephrasing here but I don't like this particular suggestion.

jridgewell · 2024-04-12T22:54:23Z

-              1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence.
-              1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_.
+            1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then
+              1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.


Also, TIL \c0 is a valid escape. This also prevents new RegExp("\\" + escape('n')) from combining.
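The combining behaviour mentioned here can be shown without `RegExp.escape` itself. The sketch below uses a hypothetical `hexEscape` helper to stand in for what escaping a leading ASCII letter produces, and compares it with concatenating the raw letter.

```javascript
// Hypothetical stand-in for the hex escape applied to a leading ASCII letter.
const hexEscape = (ch) => '\\x' + ch.charCodeAt(0).toString(16).padStart(2, '0');

// Raw letter after "\c": the two combine into the control escape \cZ (U+001A).
const combinedControl = new RegExp('^\\c' + 'Z' + '$');

// Hex-escaped letter after "\c": in non-u mode the pattern matches the literal text \cZ.
const separatedControl = new RegExp('^\\c' + hexEscape('Z') + '$');

// In u-mode the same source is rejected instead.
let uModeThrows = false;
try {
  new RegExp('^\\c' + hexEscape('Z') + '$', 'u');
} catch (e) {
  uModeThrows = e instanceof SyntaxError;
}

// Raw "n" after a lone backslash combines into \n (a newline).
const combinedNewline = new RegExp('^\\' + 'n' + '$');

// Hex-escaped "n" after a lone backslash: the backslash escapes the next
// backslash, so the pattern matches the literal text \x6e.
const separatedNewline = new RegExp('^\\' + hexEscape('n') + '$');
```

Either way the escaped output never silently changes the meaning of the preceding escape sequence: it produces an error or matches the literal text.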

bakkot added 5 commits April 8, 2024 20:49

use identity escapes where possible

4221ff9

also escape line terminators

07fc1bd

use control escapes where possible

b51ecb4

also escape lone surrogates

eef281a

also escape leading ascii letters

27eee05

bakkot force-pushed the address-feedback branch 2 times, most recently from 7843108 to a839978 · April 9, 2024 05:13

ljharb reviewed Apr 9, 2024

ljharb requested review from erights, gibson042 and waldemarhorwat April 9, 2024 15:22

gibson042 approved these changes Apr 12, 2024

jridgewell approved these changes Apr 12, 2024

ljharb force-pushed the address-feedback branch from 82c6d6f to 27eee05 · May 13, 2024 05:03

ljharb approved these changes May 13, 2024

ljharb merged commit 2481aa8 into tc39:main May 13, 2024

bakkot deleted the address-feedback branch May 13, 2024 05:35

gibson042 mentioned this pull request May 16, 2024

Path to Stage 4! #58

Closed

33 tasks

This was referenced Nov 26, 2024

Reference for stage 3 regex-escaping mdn/content#36928

Merged

Fix examples in readme #84

Merged
