Change representation of string cursors to just i32 by wingo · Pull Request #9 · wingo/stringrefs

wingo · 2021-10-22T08:55:17Z

See the discussion in #6.

Fixes #6 and #8.

wingo · 2021-10-25T07:47:11Z

A couple questions still to address here:

- Nondeterminism. We should avoid creating a situation where a WebAssembly program will behave differently on $browser1 versus $browser2. Should we go farther than encouraging embeddings to define cursor meanings, and formally require them to do so?
- Programmer sloppiness. With opaque cursors, the set of valid cursors is only discoverable via string.advance / string.rewind / string.encode, and this behavior doesn't depend on the contents of the string. With i32 cursors, you could just increment the cursor from 0 and it would work for ASCII strings, for the suggested meanings of cursors (code units or byte offset), but trap for strings with codepoints above some limit. Need to discuss this.
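
The sloppiness concern can be illustrated with a minimal Python sketch. It assumes the "byte offset" meaning of a cursor over UTF-8 data; `is_valid_cursor` and `naive_walk` are hypothetical helpers written for this illustration, not instructions from the proposal, and "invalid" here stands in for whatever trap an engine would raise.

```python
def is_valid_cursor(data: bytes, cursor: int) -> bool:
    """Hypothetical check: a byte-offset cursor is valid when it sits on a
    code point boundary of the UTF-8 data, or exactly at the end."""
    if cursor < 0 or cursor > len(data):
        return False
    if cursor == len(data):
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[cursor] & 0xC0) != 0x80


def naive_walk(text: str) -> list:
    """Increment the cursor by one from 0, as a sloppy program might, and
    collect every cursor value that would be rejected."""
    data = text.encode("utf-8")
    return [c for c in range(len(data) + 1) if not is_valid_cursor(data, c)]


print(naive_walk("hello"))        # [] : every increment is valid for ASCII
print(naive_walk("h\u00e9llo"))   # [2] : cursor 2 is inside the two-byte "é"
```

The same program passes every test that uses ASCII input and fails only once a multi-byte code point appears, which is the behavior the question above is worried about.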

jakobkummerow · 2021-10-25T16:29:37Z

I think I prefer this over the initial version, because it makes things more explicit, and allows for simpler engines.

To get opacity of cursor values, an option would be to introduce an opaque type for them, which (contrary to the "stringcursor is a 3-tuple" approach) is designed such that engines can implement it by using a plain integer under the hood; in other words: stick with the unboxing currently drafted here, but replace a few occurrences of i32 with stringcursor. (As a hacky alternative that gets by without a new type, externref could be used as cursor type -- it wouldn't actually be a reference, but it would be opaque.) I'm not sure whether that's worth it though: Wasm is intended to be tool-generated, not human-written, so we don't really need guardrails against human coding errors; there are plenty of existing precedents of plain i32 values that have specific semantic meaning in the context where they're used, e.g. performing random i32 arithmetic on memory offsets typically doesn't make much sense either.

The string.measure instruction, instead of returning a pair bytes:i32, valid:i32 which is either (num_bytes, 1) or (0, 0), could also return a single bytes:i32 where -1 indicates "invalid". Benefit: slightly simpler; disadvantage: makes it impossible to handle strings taking more than 2 GiB (whereas with the pair, the bytes value could be interpreted as u32, allowing up to 4 GiB strings; though it remains to be seen whether other constraints will get in the way, ruling that out anyway).
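
To make the trade-off concrete, here is a small Python sketch of the two result encodings. The function names are hypothetical, `None` models an "invalid" measurement, and the signed reading of the single-value form follows the comment above (a -1 sentinel implies the result is treated as a signed i32).

```python
I32_MAX = 2**31 - 1   # largest value of a signed 32-bit integer
U32_MAX = 2**32 - 1   # largest value of an unsigned 32-bit integer


def measure_pair(num_bytes):
    """Pair encoding: (bytes, valid). Because validity has its own slot,
    the byte count can use the full unsigned 32-bit range."""
    if num_bytes is None:
        return (0, 0)
    if num_bytes > U32_MAX:
        raise OverflowError("byte count does not fit in 32 bits")
    return (num_bytes, 1)


def measure_sentinel(num_bytes):
    """Single-value encoding: -1 means invalid, so valid byte counts are
    limited to the non-negative signed range."""
    if num_bytes is None:
        return -1
    if num_bytes > I32_MAX:
        raise OverflowError("byte count collides with the signed range")
    return num_bytes


three_gib = 3 * 2**30
print(measure_pair(None))        # (0, 0)
print(measure_pair(three_gib))   # (3221225472, 1)
print(measure_sentinel(None))    # -1
# measure_sentinel(three_gib) raises OverflowError in this sketch.
```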

I am a bit worried about cursor validity checks, especially checking for cursors pointing at the second half of a surrogate pair. I expect that well-formed, bug-free modules will never use such cursors (so it would be sad if engines were forced to spend lots of CPU cycles on this check), but we do have to specify what happens if a module does create that situation. I think it would be best if we silently treated such a second-half-of-a-surrogate-pair like a lone surrogate (which is probably the behavior that would arise from an implementation that doesn't specifically check for this case).
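
A rough Python sketch of that "no special check" behavior, assuming cursors are UTF-16 code unit indices. `utf16_units` and `codepoint_at` are hypothetical helpers for illustration only; the point is that a decoder which only looks forward from the cursor naturally yields a lone surrogate when the cursor sits on the second half of a pair.

```python
def utf16_units(text: str) -> list:
    """Split a string into its UTF-16 code units (as integers)."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def codepoint_at(units: list, cursor: int) -> int:
    """Decode the code point starting at `cursor`, looking only forward.
    There is no check for whether `cursor` points into the middle of a pair."""
    unit = units[cursor]
    if 0xD800 <= unit <= 0xDBFF and cursor + 1 < len(units):
        low = units[cursor + 1]
        if 0xDC00 <= low <= 0xDFFF:
            # High surrogate followed by low surrogate: combine them.
            return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
    # BMP code point, or a surrogate with no partner in the forward direction.
    return unit


units = utf16_units("a\U0001F600")   # [0x61, 0xD83D, 0xDE00]
print(hex(codepoint_at(units, 1)))   # 0x1f600 : cursor at the start of the pair
print(hex(codepoint_at(units, 2)))   # 0xde00 : mid-pair cursor reads as a lone surrogate
```

Rejecting the mid-pair cursor instead would need an extra look at `units[cursor - 1]` on every access, which is the cost the comment above hopes to avoid.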

wingo · 2021-10-26T08:07:56Z

Thanks for the feedback, @jakobkummerow! Given the general OK, I will take the opaque-cursor, string-measure, and validity-check-cost questions to separate issues -- they certainly need good answers!

Change representation of string cursors to just i32

d0f53ad

See the discussion in #6. Fixes #6 and #8.

wingo linked an issue Oct 22, 2021 that may be closed by this pull request

Provide "current codepoint" accessor? #8

Closed

wingo merged commit 111b0ee into main Oct 26, 2021

wingo merged 1 commit into main from no-cursors
