cc @erights @waldemarhorwat @gibson042 Please take a look. |
| 1. Let _otherPunctuators_ be the string-concatenation of *",-=<>#&!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK). | ||
| 1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_). | ||
| 1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace| or |LineTerminator|, then | ||
| 1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ is in the inclusive interval from U+D800 to U+DFFF (i.e. _c_ is a leading surrogate or trailing surrogate), then |
1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ is a leading surrogate or a trailing surrogate, then
Strictly speaking, leading and trailing surrogates are defined to be code units, not code points. I figure it's close enough for the parenthetical, but not the actual algorithm step. I could change the parenthetical to be "c is the code point corresponding to a leading surrogate or trailing surrogate" but that's getting pretty wordy.
other than the oxford comma, this suggestion is a nit, so i'm ambivalent
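For readers following the step under discussion, here is a minimal JavaScript sketch of the condition it describes. The names `OTHER_PUNCTUATORS` and `needsHexEscape` are hypothetical and do not appear in the spec text. The sketch uses `/\s/` for the |WhiteSpace| / |LineTerminator| check on the understanding that `\s` is defined in terms of those same two productions.

```javascript
// Hypothetical helper, not spec text: mirrors the condition in the step above.
// The punctuator list is copied from the _otherPunctuators_ step, with the
// QUOTATION MARK (0x0022) appended.
const OTHER_PUNCTUATORS = ",-=<>#&!%:;@~'`" + "\u0022";

// `c` is a code point given as a number.
function needsHexEscape(c) {
  const ch = String.fromCodePoint(c);
  // Code points in U+D800..U+DFFF correspond to leading or trailing surrogates.
  const isSurrogateCodePoint = c >= 0xD800 && c <= 0xDFFF;
  // Assumption: /\s/ stands in for "matched by WhiteSpace or LineTerminator".
  const isWhiteSpaceOrLineTerminator = /^\s$/.test(ch);
  return (
    OTHER_PUNCTUATORS.includes(ch) ||
    isWhiteSpaceOrLineTerminator ||
    isSurrogateCodePoint
  );
}
```

Under these assumptions, `needsHexEscape(0x40)` (for `@`) and `needsHexEscape(0xD800)` are both true, while `needsHexEscape(0x61)` (for `a`) is false.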
|
How do I see a rendered form of this? |
|
I've put up a rendering here. |
|
Thanks. FWIW LGTM, but I delegate an approval decision to @gibson042 who understands this better than I do. |
|
Thanks for the rendering. Cut down the review effort for me by a huge factor. |
| 1. If _c_ is matched by |SyntaxCharacter| or _c_ is U+002F (SOLIDUS), then | ||
| 1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_c_). |
This strikes me as a point-in-time snapshot, and I'm not a fan because it will be weird if e.g. \@ becomes a valid escape in the future. My preference remains the universally applicable \x…/\u… approach. But that said, there is no technical issue here; merely an aesthetic one.
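To make the trade-off in this comment concrete, the following sketch compares the two styles. `backslashEscape`, `hexEscape`, and `compiles` are hypothetical helpers written for illustration only, and `hexEscape` assumes a code unit no greater than 0xFF. The sketch relies on the fact that in `u`-mode an identity escape is only permitted for a |SyntaxCharacter| or `/`.

```javascript
// SyntaxCharacter: ^ $ \ . * + ? ( ) [ ] { } |
const SYNTAX_CHARACTERS = "^$\\.*+?()[]{}|";

// Hypothetical: backslash-escape a single SyntaxCharacter or SOLIDUS,
// otherwise return null.
function backslashEscape(ch) {
  const eligible =
    ch.length === 1 && (SYNTAX_CHARACTERS.includes(ch) || ch === "/");
  return eligible ? "\\" + ch : null;
}

// Hypothetical: the "universally applicable" style, \x plus two hex digits.
// Assumes ch is a single code unit no greater than 0xFF.
function hexEscape(ch) {
  return "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0");
}

// Reports whether a pattern source is accepted with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}

const dollarOk = compiles(backslashEscape("$"), "u"); // \$ is a valid escape today
const atOk = compiles("\\@", "u");                    // \@ is not valid in u-mode today
const atHexOk = compiles(hexEscape("@"), "u");        // \x40 is valid in every mode
```

Here `dollarOk` and `atHexOk` are true and `atOk` is false, which is why the backslash style can only cover some punctuators at present.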
| 1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. | ||
| 1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_. | ||
| 1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then | ||
| 1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`. |
Nit: this sentence is too long to not use a parenthetical.
| 1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`. | |
| 1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text, which may be used after a `\0` character escape or a |DecimalEscape| such as `\1`, and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`. |
Those commas read as ungrammatical to me. In particular, the second comma cuts a clause in the middle: the pattern text "may be used after \0 [...] and still match S". I'm open to other rephrasings here, but I don't like this particular suggestion.
| 1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. | ||
| 1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_. | ||
| 1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then | ||
| 1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`. |
Also, TIL \c0 is a valid escape. This also prevents new RegExp("\\" + escape('n')) from combining.
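The contexts mentioned in the NOTE and in this comment can be checked with a short sketch. `escapeLeading` is a hypothetical stand-in that only handles the first character and emits a two-digit `\x` escape. That output form is an assumption for illustration, not a quotation of the algorithm, and the rest of the string is left untouched.

```javascript
// Hypothetical: escape only a leading ASCII digit or letter as \xHH.
function escapeLeading(s) {
  if (s.length === 0) return s;
  const first = s[0];
  if (!/^[0-9A-Za-z]$/.test(first)) return s;
  const hex = first.charCodeAt(0).toString(16).padStart(2, "0");
  return "\\x" + hex + s.slice(1);
}

// After \0: unescaped, the pattern text \01 is read as a legacy octal escape
// for U+0001 (non-u-mode), so it no longer matches NUL followed by "1".
const naiveDigit = new RegExp("^\\0" + "1" + "$").test("\u0000" + "1");
const safeDigit = new RegExp("^\\0" + escapeLeading("1") + "$").test("\u0000" + "1");

// After \c: unescaped, \cZ is the control escape for U+001A. Escaped, the
// non-u-mode pattern matches the literal text \cZ instead.
const naiveLetter = new RegExp("^\\c" + "Z" + "$").test("\u001A");
const safeLetter = new RegExp("^\\c" + escapeLeading("Z") + "$").test("\\cZ");

// After a lone backslash: the result cannot combine into \n.
const combinesIntoNewline = new RegExp("^\\" + escapeLeading("n") + "$").test("\n");
```

With this sketch, `naiveDigit` is false and `safeDigit` is true; `naiveLetter` and `safeLetter` are both true (the first because `\cZ` was formed, the second because it was not); and `combinesIntoNewline` is false. In `u`-mode, the pattern text `\c\x5a` is rejected outright.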
Commits should be reviewed individually. Summary:
- `\$` etc. where possible; this is not possible for all punctuators, but it is for some
- `\n` etc. where possible
- `x`-mode RegExps, and someone writes `new RegExp(RegExp.escape('\uHEAD') + RegExp.escape('\uTAIL'), 'x')` where `\uHEAD\uTAIL` encodes that non-BMP whitespace character, the resulting RegExp will match the character instead of being interpreted as whitespace and therefore (in `x`-mode) ignored
- `new RegExp('\c' + RegExp.escape('Z'))` will either be an error (in `u`- or `v`-mode RegExps) or match the string `'\\cZ'` (in other RegExps); this also has the effect that the result cannot combine with a preceding `\x` or `\u` when the first character is `A-F` or `a-f`.

Fixes "Which leading characters should be escaped?" (#66).
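The surrogate item in the summary can be partly illustrated with today's RegExps. There is no `x`-mode and no non-BMP whitespace character to test against, so this sketch uses U+1F600 as a stand-in for the `\uHEAD\uTAIL` pair. `escapeSurrogate` is a hypothetical helper, not the proposal's algorithm. It only shows that two separately escaped surrogates still match the whole character when concatenated.

```javascript
// Hypothetical: escape a single lone-surrogate code unit as \uXXXX and
// return any other code unit unchanged.
function escapeSurrogate(unit) {
  const code = unit.charCodeAt(0);
  if (code < 0xD800 || code > 0xDFFF) return unit;
  return "\\u" + code.toString(16).padStart(4, "0");
}

// Stand-in pair: U+1F600 is encoded as the code units D83D DE00.
const head = "\uD83D";
const tail = "\uDE00";

// Each half is escaped on its own, then the results are concatenated.
const source = escapeSurrogate(head) + escapeSurrogate(tail);

// In u-mode the adjacent \uD83D\uDE00 escapes denote the single code point.
const matchesWholeCharacter = new RegExp("^" + source + "$", "u").test(head + tail);
```

Here `source` is the pattern text `\ud83d\ude00` and `matchesWholeCharacter` is true.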