
Simplify UTF-16 validation Vector128 codepath #121981

Merged: tannergooding merged 1 commit into dotnet:main from ylpoonlg:github-utf16-validation on Dec 2, 2025


@ylpoonlg (Contributor)

Re-attempt at #121383.

Refactor the vectorized code path by combining the SSE2 "intrinsified" path with the original Vector128 algorithm. Some platform-specific code remains (for AdvSimd), because relying entirely on the Vector128 APIs would cost too much performance. The main issue is that Arm has no single instruction corresponding to Vector128.ExtractMostSignificantBits, so forcing AdvSimd to produce the same mask format as the SSE2 algorithm is significantly slower. I also looked into using IndexOf, Count, etc., but they use ExtractMostSignificantBits too, so they have the same problem.
This PR encapsulates this difference in a few helper methods so that both platforms can share the same code path for the main algorithm.
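The mask-format difference described above can be modeled in a few lines. This is a scalar Python sketch, not the .NET code: `movemask` stands in for `Vector128.ExtractMostSignificantBits` on eight 16-bit lanes (one bit per lane), while `narrowed_mask` models an assumed Arm-friendly alternative in which each lane keeps 8 bits of a 64-bit value, as a narrowing instruction would produce. All function names are hypothetical, and whether the PR uses exactly this format is an assumption.

```python
# Scalar model of two SIMD mask formats for eight 16-bit lanes (a Vector128<ushort>).
# All function names are hypothetical; this is not the .NET implementation.

def compare_ge(chunk, threshold):
    """Lane-wise compare: 0xFFFF where lane >= threshold, else 0, like a SIMD compare."""
    return [0xFFFF if c >= threshold else 0 for c in chunk]

def movemask(lanes):
    """Models ExtractMostSignificantBits: the top bit of lane i becomes bit i."""
    mask = 0
    for i, lane in enumerate(lanes):
        mask |= ((lane >> 15) & 1) << i
    return mask

def narrowed_mask(lanes):
    """Assumed Arm-friendly format: keep the low byte of each lane, so lane i
    occupies bits 8*i .. 8*i+7 of a 64-bit value."""
    mask = 0
    for i, lane in enumerate(lanes):
        mask |= (lane & 0xFF) << (8 * i)
    return mask

# Format-aware helpers. They assume every lane is all-ones or all-zeros
# (true for compare results), so each set lane contributes bits_per_lane bits.
def count_set(mask, bits_per_lane):
    return bin(mask).count("1") // bits_per_lane

def first_set_index(mask, bits_per_lane):
    if mask == 0:
        return None
    lowest_bit = (mask & -mask).bit_length() - 1
    return lowest_bit // bits_per_lane

def utf8_bytes_no_surrogates(chunk, extract, bits_per_lane):
    """UTF-8 length of a surrogate-free chunk: 1 byte per char,
    +1 for each char >= U+0080, +1 more for each char >= U+0800."""
    total = len(chunk)
    total += count_set(extract(compare_ge(chunk, 0x80)), bits_per_lane)
    total += count_set(extract(compare_ge(chunk, 0x800)), bits_per_lane)
    return total

if __name__ == "__main__":
    chunk = [0x41, 0x42, 0x3B1, 0x4E2D, 0x43, 0x44, 0x45, 0x46]  # "AB", U+03B1, U+4E2D, "CDEF"
    print(utf8_bytes_no_surrogates(chunk, movemask, 1))       # 11
    print(utf8_bytes_no_surrogates(chunk, narrowed_mask, 8))  # 11
```

Only `count_set` and `first_set_index` know the mask layout, so the byte-counting logic is written once for both formats. This mirrors the PR's stated idea of hiding the platform difference behind helpers, although the real helpers and their names may differ.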

Performance-wise, the improvements are modest, but hopefully the code will be easier to maintain.

Arm Neoverse-V2:

| Method | Input | Version | Mean (μs) | Error (μs) | Ratio |
|---|---|---|---|---|---|
| GetByteCount | EnglishAllAscii | Before | 4.437 | 0.0437 | 1.000 |
| GetByteCount | EnglishAllAscii | After | 4.475 | 0.1618 | 1.009 |
| GetByteCount | EnglishMostlyAscii | Before | 20.387 | 0.1744 | 1.000 |
| GetByteCount | EnglishMostlyAscii | After | 19.941 | 0.1079 | 0.978 |
| GetByteCount | Chinese | Before | 9.145 | 0.0072 | 1.000 |
| GetByteCount | Chinese | After | 8.992 | 0.0069 | 0.983 |
| GetByteCount | Cyrillic | Before | 7.936 | 0.0095 | 1.000 |
| GetByteCount | Cyrillic | After | 7.812 | 0.0056 | 0.984 |
| GetByteCount | Greek | Before | 10.077 | 0.0106 | 1.000 |
| GetByteCount | Greek | After | 9.952 | 0.0120 | 0.988 |

Intel Sapphire Rapids:

| Method | Input | Version | Mean (μs) | Error (μs) | Ratio |
|---|---|---|---|---|---|
| GetByteCount | EnglishAllAscii | Before | 8.144 | 0.3398 | 1.000 |
| GetByteCount | EnglishAllAscii | After | 8.126 | 0.2759 | 0.998 |
| GetByteCount | EnglishMostlyAscii | Before | 22.971 | 0.4046 | 1.000 |
| GetByteCount | EnglishMostlyAscii | After | 22.155 | 0.9902 | 0.964 |
| GetByteCount | Chinese | Before | 10.582 | 0.3425 | 1.000 |
| GetByteCount | Chinese | After | 10.048 | 0.2135 | 0.950 |
| GetByteCount | Cyrillic | Before | 9.222 | 0.1874 | 1.000 |
| GetByteCount | Cyrillic | After | 9.100 | 0.2704 | 0.987 |
| GetByteCount | Greek | Before | 11.802 | 0.3551 | 1.000 |
| GetByteCount | Greek | After | 11.224 | 0.3505 | 0.951 |

Combine the SSE2 codepath with a more generic Vector128 algorithm.

AdvSimd is handled slightly differently to avoid using Vector128
ExtractMostSignificantBits: Arm has no equivalent instruction, so
performance would otherwise be much slower.
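For reference, the result any vectorized path has to agree with can be written as a plain scalar loop. The sketch below is a Python model, not the .NET code: it walks UTF-16 code units, stops at the first unpaired surrogate, and returns the index reached together with the UTF-8 byte count and scalar count of the valid prefix. The function name and this return shape are assumptions chosen for illustration; how the .NET code reports or replaces invalid data is not modeled.

```python
def validate_utf16(units):
    """Scan UTF-16 code units and stop at the first ill-formed one.

    Returns (index_reached, utf8_byte_count, scalar_count) for the valid
    prefix; index_reached == len(units) means the whole input is valid.
    Hypothetical model for illustration only.
    """
    i = 0
    utf8 = 0
    scalars = 0
    n = len(units)
    while i < n:
        c = units[i]
        if c < 0x80:
            utf8 += 1                      # ASCII: 1 UTF-8 byte
        elif c < 0x800:
            utf8 += 2                      # U+0080..U+07FF: 2 bytes
        elif 0xD800 <= c <= 0xDBFF:        # high surrogate needs a low surrogate next
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                utf8 += 4                  # supplementary plane: 4 bytes
                scalars += 1
                i += 2
                continue
            break                          # unpaired high surrogate
        elif 0xDC00 <= c <= 0xDFFF:
            break                          # low surrogate with no preceding high surrogate
        else:
            utf8 += 3                      # rest of the BMP: 3 bytes
        scalars += 1
        i += 1
    return i, utf8, scalars

if __name__ == "__main__":
    print(validate_utf16([0x68, 0xE9, 0x6C, 0x6C, 0x6F]))  # "héllo" -> (5, 6, 5)
    print(validate_utf16([0x61, 0xD800, 0x62]))            # lone surrogate -> (1, 1, 1)
```

A vectorized path, whether SSE2 or AdvSimd, would be expected to produce the same numbers while examining eight code units per step.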
@dotnet-policy-service (bot) added the community-contribution label (Indicates that the PR has been added by a community member) Nov 26, 2025
@github-actions (bot) added the needs-area-label label (An area label is needed to ensure this gets routed to the appropriate area owners) Nov 26, 2025
@ylpoonlg (Contributor, Author)

cc @dotnet/arm64-contrib @a74nh @SwapnilGaikwad @tannergooding @EgorBo

@ylpoonlg marked this pull request as ready for review November 26, 2025 10:15
@tannergooding merged commit ac7db14 into dotnet:main Dec 2, 2025
144 checks passed
@github-actions (bot) locked and limited conversation to collaborators Jan 2, 2026
@EgorBo (Member) commented Jan 13, 2026

Improvements: dotnet/perf-autofiling-issues#67360


Labels: community-contribution, needs-area-label


4 participants