Right now `string.measure` returns a value in [0, 2^31-1] on success, and -1 otherwise. This conflates failure to encode because of, e.g., a bad USV sequence with failure to encode because the result would be longer than 2^31-1 bytes (admittedly rare, and even impossible on some hosts; see #12 (comment)). Should we differentiate the two cases?
We could in fact use any negative value to indicate which codepoint couldn't be encoded. A return value of -cursor could indicate the cursor after the codepoint at which the error occurred. This fits because a string with 0 codepoints can't fail, so the cursor is always at least 1 and there are only 2^31 cursor values to encode, which map exactly onto the i32 range -1 through -2^31. It would still be a bit gnarly to distinguish overflow from can't-encode-codepoint, but perhaps that's OK.
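To make the -cursor idea concrete, here is a minimal sketch that models a string as a list of code points and measures its UTF-8 length. The names `measure_utf8` and `classify` are hypothetical and are not part of any proposal text. The sketch assumes that an overflow also reports -cursor (the cursor after the code point that pushed the length past the limit). Under that assumption, a caller can tell the two failures apart only by re-inspecting the code point just before the cursor, which is the "gnarly" part mentioned above.

```python
I32_MAX = 2**31 - 1  # largest length representable as a non-negative i32


def is_usv(cp: int) -> bool:
    """True if cp is a Unicode scalar value (not a surrogate, in range)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def utf8_len(cp: int) -> int:
    """Number of UTF-8 bytes needed to encode a scalar value."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def measure_utf8(codepoints: list[int], limit: int = I32_MAX) -> int:
    """Hypothetical measure: byte length on success, -cursor on failure.

    The cursor is the code point index *after* the failing code point, so it
    is always >= 1 and -cursor is always <= -1. Assumes len(codepoints) <= 2**31
    so that -cursor fits in an i32. `limit` is exposed only so that overflow
    can be exercised with small inputs.
    """
    total = 0
    for i, cp in enumerate(codepoints):
        cursor = i + 1
        if not is_usv(cp):
            return -cursor  # can't encode, e.g. a lone surrogate
        total += utf8_len(cp)
        if total > limit:
            return -cursor  # overflow (assumed to reuse the same scheme)
    return total


def classify(codepoints: list[int], result: int) -> tuple[str, int]:
    """Caller-side decoding: re-check the code point before the cursor."""
    if result >= 0:
        return ("ok", result)
    cursor = -result
    if not is_usv(codepoints[cursor - 1]):
        return ("bad-codepoint", cursor)
    return ("overflow", cursor)
```

For example, `measure_utf8([0x61, 0xD800, 0x62])` returns -2 because the lone surrogate at index 1 can't be encoded, and `classify` reports it as a bad code point at cursor 2.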
But, perhaps it's overkill!
cc @lars-t-hansen