Escaped unicode characters are encoded incorrectly #338

@fsoikin

In strings with escaped Unicode characters (e.g. "\uABCD"), characters in the surrogate range from D800 to DFFF always get encoded as FFFD (65533 decimal), the Unicode replacement character.

Here's the output from F# Interactive:

    > "\uD500".[0] |> uint16 ;;
    val it : uint16 = 54528us
    > "\uD700".[0] |> uint16 ;;
    val it : uint16 = 55040us
    > "\uD800".[0] |> uint16 ;;
    val it : uint16 = 65533us
    > "\uD900".[0] |> uint16 ;;
    val it : uint16 = 65533us
    > "\uE000".[0] |> uint16 ;;
    val it : uint16 = 57344us
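
For comparison, here is a minimal sketch of a possible workaround. It assumes that only `"\uXXXX"` escapes in string literals are affected and that building the character from its numeric value bypasses the problem, which this report does not confirm. The helper name `codeUnit` is hypothetical.

```fsharp
open System

// Hypothetical helper: build a UTF-16 code unit from its numeric value
// instead of writing it as a "\uXXXX" escape inside a string literal.
let codeUnit (value: int) : char = char value

let high = codeUnit 0xD800   // first high surrogate
let low = codeUnit 0xDC00    // first low surrogate

// The numeric value survives the conversion unchanged.
printfn "%d" (uint16 high)                      // prints 55296

// Surrogate detection works on values built this way.
printfn "%b" (Char.IsHighSurrogate high)        // prints true
printfn "%b" (Char.IsSurrogatePair(high, low))  // prints true
```

If this holds, the values themselves are representable in a .NET `char`, and the loss would happen while the literal is being processed rather than at runtime.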

Here's an SO question that prompted me to fiddle with it: http://stackoverflow.com/questions/29359408/surrogate-pair-detection-fails

Here's another issue I've (mistakenly?) opened in the other repo: fsharp/fsharp#399
