In #9, we decided to avoid reference-typed string cursors, for efficiency. However, this opens the door to user-fabricated cursor values; users might even try to use every integer from 0 to the string length as a cursor. Under the standard cursor meanings that we anticipate for UTF-8 and UTF-16 strings, this works for some strings but not for others. It also creates a hazard: the same WebAssembly program can behave differently on different implementations.
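To make the hazard concrete, here is a rough Python sketch. It assumes (this is not settled anywhere in this issue) that a cursor is a code unit offset into the engine's internal encoding and that only offsets on a code point boundary are valid. The helper names `utf8_boundaries` and `utf16_boundaries` are made up for illustration.

```python
def utf8_boundaries(s: str) -> set:
    """Byte offsets into the UTF-8 encoding of s that sit on a code point boundary."""
    offsets = {0}
    pos = 0
    for ch in s:
        pos += len(ch.encode("utf-8"))  # 1 to 4 bytes per code point
        offsets.add(pos)
    return offsets


def utf16_boundaries(s: str) -> set:
    """Code unit offsets into the UTF-16 encoding of s that sit on a code point boundary."""
    offsets = {0}
    pos = 0
    for ch in s:
        pos += len(ch.encode("utf-16-le")) // 2  # 1 or 2 code units per code point
        offsets.add(pos)
    return offsets


ascii_only = "hello"
mixed = "a\U0001F600"  # "a" followed by one astral-plane code point

# ASCII: every integer from 0 to the length is a boundary in both encodings.
print(sorted(utf8_boundaries(ascii_only)), sorted(utf16_boundaries(ascii_only)))

# Non-ASCII: the valid offsets differ, so a fabricated cursor such as 3
# is fine under the assumed UTF-16 meaning but mid-character under UTF-8.
print(sorted(utf8_boundaries(mixed)), sorted(utf16_boundaries(mixed)))
```

Under these assumptions, a program that fabricates cursors by counting from 0 looks correct when tested with ASCII input, then diverges between a UTF-8 engine and a UTF-16 engine as soon as other characters show up.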
Possible solutions:
Opaque but scalar cursor type
As @jakobkummerow notes on #9:
To get opacity of cursor values, an option would be to introduce an opaque type for them, which (contrary to the "stringcursor is a 3-tuple" approach) is designed such that engines can implement it by using a plain integer under the hood; in other words: stick with the unboxing currently drafted here, but replace a few occurrences of i32 with stringcursor. (As a hacky alternative that gets by without a new type, externref could be used as cursor type -- it wouldn't actually be a reference, but it would be opaque.) I'm not sure whether that's worth it though: Wasm is intended to be tool-generated, not human-written, so we don't really need guardrails against human coding errors; there are plenty of existing precedents of plain i32 values that have specific semantic meaning in the context where they're used, e.g. performing random i32 arithmetic on memory offsets typically doesn't make much sense either.
Specified cursor meanings for different embeddings
One big hazard to avoid is a program behaving differently on $browser1 versus $browser2. But perhaps it's not a huge deal if a program behaves differently on a server-side embedding. Maybe the important thing is to specify the meaning of cursors in each embedding. For example, on the web, a cursor could be defined to be a code unit offset. Some aspects of doing this:
- This opens up the door for JS users to communicate positions in strings to WebAssembly programs via JS string positions.
- It restricts implementation freedom (boo).
- It reduces implementation-defined behavior (yay).
- How should the core spec make this request? Perhaps this requirement is just implicit from the core point of view and made explicit only as part of the JS embedding spec.
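As a rough illustration of the "restricts implementation freedom" point: if web cursors were specified as JS string positions (UTF-16 code unit offsets), an engine that stores strings as UTF-8 would need some mapping like the one below. This is only a sketch in Python. The function name `utf16_to_utf8_offset` is hypothetical, the linear scan is the naive approach rather than what an engine would necessarily do, and lone surrogates (which JS strings can contain) are ignored.

```python
def utf16_to_utf8_offset(s: str, utf16_offset: int) -> int:
    """Map a UTF-16 code unit offset in s to the matching UTF-8 byte offset.

    Raises ValueError if the offset is past the end of the string or falls
    between the two halves of a surrogate pair.
    """
    units = 0   # UTF-16 code units consumed so far
    nbytes = 0  # UTF-8 bytes consumed so far
    for ch in s:
        if units >= utf16_offset:
            break
        units += 2 if ord(ch) > 0xFFFF else 1
        nbytes += len(ch.encode("utf-8"))
    if units != utf16_offset:
        raise ValueError(f"offset {utf16_offset} is not on a code point boundary")
    return nbytes


s = "a\U0001F600b"
print(utf16_to_utf8_offset(s, 1))  # position after "a"
print(utf16_to_utf8_offset(s, 3))  # position after the astral code point
```

The scan is O(n) per lookup, which is the kind of cost a UTF-8 implementation would have to hide or cache if the web embedding pinned cursors to code unit offsets.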