[jsapi] inconsistent utf-8 decoding #915

@xtuc

Two UTF-8 decoders are specified:

From the WHATWG algorithm, the rules

  1. Set UTF-8 lower boundary to 0x80 and UTF-8 upper boundary to 0xBF.
  1. Set UTF-8 code point to (UTF-8 code point << 6) | (byte & 0x3F)

are not used in Wasm.
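
To make the two quoted rules concrete, here is a minimal sketch of the continuation-byte step. It is simplified: the bounds are fixed at 0x80..0xBF, whereas the full WHATWG decoder narrows them after certain lead bytes. The function name `appendContinuationByte` is hypothetical and does not come from either spec.

```javascript
// Hypothetical helper illustrating the continuation-byte step only.
// Assumption: bounds are fixed at 0x80..0xBF (the full WHATWG decoder
// adjusts them for some lead bytes, which this sketch ignores).
function appendContinuationByte(codePoint, byte) {
  if (byte < 0x80 || byte > 0xbf) {
    throw new RangeError("not a UTF-8 continuation byte: 0x" + byte.toString(16));
  }
  // Shift the accumulated bits left by 6 and append the low 6 bits of the byte.
  return (codePoint << 6) | (byte & 0x3f);
}

// Decode the three-byte sequence EF BF BD by hand.
let codePoint = 0xef & 0x0f; // a three-byte lead contributes its low 4 bits
codePoint = appendContinuationByte(codePoint, 0xbf);
codePoint = appendContinuationByte(codePoint, 0xbd);

console.log("U+" + codePoint.toString(16).toUpperCase()); // U+FFFD
```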

My understanding is that U+DC01 and U+FFFD should be treated as equal in the JS API, as tested here:

assert_sections(WebAssembly.Module.customSections(module, "na\uFFFDme"), [
  bytes,
]);
assert_sections(WebAssembly.Module.customSections(module, "na\uDC01me"), [
  bytes,
]);

In Wasm, however, they would be considered two different sections, which could cause subtle mismatches.
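
As an illustration of how the two names can collapse into one on the JS side, the sketch below uses the standard `TextEncoder`, which replaces lone surrogates with U+FFFD before encoding to UTF-8. This is not the JS API spec's own algorithm, only an assumption about the kind of conversion that would make the test above pass. The helper `toHex` is hypothetical.

```javascript
// TextEncoder converts its input to a scalar-value string first,
// so a lone surrogate such as U+DC01 becomes U+FFFD.
const encoder = new TextEncoder();

const lone = encoder.encode("na\uDC01me");
const replacement = encoder.encode("na\uFFFDme");

// Hypothetical helper: render bytes as space-separated hex.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(toHex(lone));        // 6e 61 ef bf bd 6d 65
console.log(toHex(replacement)); // 6e 61 ef bf bd 6d 65
```

Both strings produce the same bytes, so a lookup that compares UTF-8 encoded names cannot tell them apart.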

Note that this is the only occurrence of UTF-8 decoding in the JS spec.
