Improve Adler32 vectorization #125191

Open
saucecontrol wants to merge 3 commits into dotnet:main from saucecontrol:adler32

saucecontrol (Member) commented Mar 4, 2026

This replaces the vectorized Adler32 implementation added in #124409.

Major Differences

  • Removes the Vector512 implementation, which was about 20% slower than the Vector256 implementation on compatible hardware.
  • Improves the performance of the Vector256 implementation by taking better advantage of pipelining to compensate for high-latency instructions.
  • Handles smaller-than-vector tails with SIMD, avoiding potentially long scalar loops.
  • Avoids dropping to scalar every NMax bytes, speeding up processing of large inputs.
  • Adds an Armv8.2 DP (dot product) implementation.
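
For readers unfamiliar with the NMax mentioned in the list above, here is a minimal scalar sketch of the delayed-modulo technique that Adler-32 implementations commonly use. It is not the code from this PR: the class name `Adler32Sketch` is hypothetical, and the value 5552 for `NMax` is the conventional zlib constant, assumed here rather than taken from the PR.

```csharp
using System;

// Hypothetical illustration only; not the implementation in this PR.
static class Adler32Sketch
{
    private const uint ModBase = 65521; // largest prime below 2^16
    private const int NMax = 5552;      // assumed: conventional zlib block size

    // Scalar reference: the sums are reduced modulo ModBase once per
    // NMax-byte block instead of after every byte.
    public static uint Compute(ReadOnlySpan<byte> data, uint adler = 1)
    {
        uint s1 = adler & 0xFFFF;
        uint s2 = adler >> 16;

        while (!data.IsEmpty)
        {
            int n = Math.Min(data.Length, NMax);
            foreach (byte b in data.Slice(0, n))
            {
                s1 += b;   // running sum of bytes
                s2 += s1;  // running sum of the running sums
            }

            // Safe to defer until here: neither sum can overflow 32 bits
            // within a block of at most NMax bytes.
            s1 %= ModBase;
            s2 %= ModBase;
            data = data.Slice(n);
        }

        return (s2 << 16) | s1;
    }
}
```

A vectorized implementation has to respect the same overflow limit, which is why the block boundary matters for both performance and correctness.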

In all, this amounts to a roughly 2x perf increase on large inputs, and even more on small inputs that are not an even multiple of vector size.

Benchmark Summary

x64

| Method | InputLength | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---:|---:|---:|---:|---:|---:|---:|
| Main | 16384 | 323.930 ns | 1.1096 ns | 1.0379 ns | 1.00 | 0.00 | 682 B |
| PR | 16384 | 176.340 ns | 0.3327 ns | 0.2778 ns | 0.54 | 0.00 | 801 B |

Arm64

| Method | InputLength | Mean | Error | StdDev | Ratio | RatioSD |
|---|---:|---:|---:|---:|---:|---:|
| Main | 16384 | 868.077 ns | 10.2150 ns | 9.5551 ns | 1.00 | 0.02 |
| PR | 16384 | 425.371 ns | 1.2299 ns | 1.0903 ns | 0.49 | 0.01 |

Detailed Benchmark Results

AMD AVX-512 (Zen 5)


BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7462/25H2/2025Update/HudsonValley2)
AMD Ryzen AI 9 HX 370 w/ Radeon 890M 2.00GHz, 1 CPU, 24 logical and 12 physical cores
.NET SDK 10.0.102
  [Host]     : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v4
  DefaultJob : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v4


| Method | InputLength | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---:|---:|---:|---:|---:|---:|---:|
| Main | 16 | 4.416 ns | 0.0152 ns | 0.0134 ns | 1.00 | 0.00 | 281 B |
| PR | 16 | 3.357 ns | 0.0155 ns | 0.0145 ns | 0.76 | 0.00 | 761 B |
| Main | 24 | 5.775 ns | 0.0136 ns | 0.0114 ns | 1.00 | 0.00 | 284 B |
| PR | 24 | 3.992 ns | 0.0134 ns | 0.0126 ns | 0.69 | 0.00 | 752 B |
| Main | 31 | 7.000 ns | 0.0310 ns | 0.0290 ns | 1.00 | 0.01 | 284 B |
| PR | 31 | 3.968 ns | 0.0080 ns | 0.0066 ns | 0.57 | 0.00 | 752 B |
| Main | 32 | 4.072 ns | 0.0115 ns | 0.0096 ns | 1.00 | 0.00 | 518 B |
| PR | 32 | 3.318 ns | 0.0095 ns | 0.0084 ns | 0.81 | 0.00 | 761 B |
| Main | 48 | 8.376 ns | 0.0165 ns | 0.0154 ns | 1.00 | 0.00 | 520 B |
| PR | 48 | 3.520 ns | 0.0137 ns | 0.0121 ns | 0.42 | 0.00 | 752 B |
| Main | 64 | 4.789 ns | 0.0079 ns | 0.0070 ns | 1.00 | 0.00 | 674 B |
| PR | 64 | 4.331 ns | 0.0059 ns | 0.0049 ns | 0.90 | 0.00 | 781 B |
| Main | 95 | 14.783 ns | 0.2521 ns | 0.3366 ns | 1.00 | 0.03 | 675 B |
| PR | 95 | 5.316 ns | 0.0160 ns | 0.0150 ns | 0.36 | 0.01 | 763 B |
| Main | 127 | 22.506 ns | 0.0464 ns | 0.0434 ns | 1.00 | 0.00 | 1,033 B |
| PR | 127 | 5.541 ns | 0.0118 ns | 0.0099 ns | 0.25 | 0.00 | 754 B |
| Main | 128 | 5.551 ns | 0.0170 ns | 0.0151 ns | 1.00 | 0.00 | 674 B |
| PR | 128 | 4.899 ns | 0.0076 ns | 0.0059 ns | 0.88 | 0.00 | 781 B |
| Main | 224 | 11.186 ns | 0.0167 ns | 0.0157 ns | 1.00 | 0.00 | 1,031 B |
| PR | 224 | 5.712 ns | 0.0098 ns | 0.0087 ns | 0.51 | 0.00 | 768 B |
| Main | 1000 | 28.755 ns | 0.5615 ns | 0.5252 ns | 1.00 | 0.02 | 1,041 B |
| PR | 1000 | 13.703 ns | 0.0371 ns | 0.0347 ns | 0.48 | 0.01 | 783 B |
| Main | 1024 | 20.507 ns | 0.0764 ns | 0.0715 ns | 1.00 | 0.00 | 682 B |
| PR | 1024 | 13.087 ns | 0.0274 ns | 0.0257 ns | 0.64 | 0.00 | 801 B |
| Main | 4096 | 80.564 ns | 1.6367 ns | 2.2944 ns | 1.00 | 0.04 | 682 B |
| PR | 4096 | 45.615 ns | 0.1160 ns | 0.1085 ns | 0.57 | 0.02 | 801 B |
| Main | 16384 | 323.930 ns | 1.1096 ns | 1.0379 ns | 1.00 | 0.00 | 682 B |
| PR | 16384 | 176.340 ns | 0.3327 ns | 0.2778 ns | 0.54 | 0.00 | 801 B |

Arm64 (Windows Dev Kit 2023)


BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7840/25H2/2025Update/HudsonValley2)
Snapdragon Compute Platform 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK 10.0.200-preview.0.26103.119
  [Host]     : .NET 10.0.3 (10.0.3, 10.0.326.7603), Arm64 RyuJIT armv8.0-a
  DefaultJob : .NET 10.0.3 (10.0.3, 10.0.326.7603), Arm64 RyuJIT armv8.0-a


| Method | InputLength | Mean | Error | StdDev | Ratio | RatioSD |
|---|---:|---:|---:|---:|---:|---:|
| Main | 16 | 9.931 ns | 0.1664 ns | 0.1557 ns | 1.00 | 0.02 |
| PR | 16 | 2.778 ns | 0.0080 ns | 0.0071 ns | 0.28 | 0.00 |
| Main | 24 | 11.030 ns | 0.0148 ns | 0.0132 ns | 1.00 | 0.00 |
| PR | 24 | 4.248 ns | 0.0078 ns | 0.0065 ns | 0.39 | 0.00 |
| Main | 31 | 14.650 ns | 0.0238 ns | 0.0222 ns | 1.00 | 0.00 |
| PR | 31 | 4.386 ns | 0.0044 ns | 0.0039 ns | 0.30 | 0.00 |
| Main | 32 | 5.378 ns | 0.0230 ns | 0.0215 ns | 1.00 | 0.01 |
| PR | 32 | 4.162 ns | 0.0100 ns | 0.0084 ns | 0.77 | 0.00 |
| Main | 48 | 11.665 ns | 0.0128 ns | 0.0120 ns | 1.00 | 0.00 |
| PR | 48 | 4.811 ns | 0.0137 ns | 0.0114 ns | 0.41 | 0.00 |
| Main | 64 | 6.777 ns | 0.0133 ns | 0.0125 ns | 1.00 | 0.00 |
| PR | 64 | 5.436 ns | 0.0094 ns | 0.0088 ns | 0.80 | 0.00 |
| Main | 95 | 20.609 ns | 0.0270 ns | 0.0252 ns | 1.00 | 0.00 |
| PR | 95 | 7.929 ns | 0.0173 ns | 0.0162 ns | 0.38 | 0.00 |
| Main | 127 | 21.668 ns | 0.0204 ns | 0.0181 ns | 1.00 | 0.00 |
| PR | 127 | 9.041 ns | 0.0157 ns | 0.0131 ns | 0.42 | 0.00 |
| Main | 128 | 9.553 ns | 0.0149 ns | 0.0140 ns | 1.00 | 0.00 |
| PR | 128 | 7.310 ns | 0.0162 ns | 0.0144 ns | 0.77 | 0.00 |
| Main | 224 | 14.551 ns | 0.0183 ns | 0.0162 ns | 1.00 | 0.00 |
| PR | 224 | 9.784 ns | 0.0134 ns | 0.0118 ns | 0.67 | 0.00 |
| Main | 1000 | 57.058 ns | 0.1198 ns | 0.1062 ns | 1.00 | 0.00 |
| PR | 1000 | 32.267 ns | 0.0572 ns | 0.0535 ns | 0.57 | 0.00 |
| Main | 1024 | 58.382 ns | 0.1277 ns | 0.1195 ns | 1.00 | 0.00 |
| PR | 1024 | 30.900 ns | 0.0654 ns | 0.0579 ns | 0.53 | 0.00 |
| Main | 4096 | 218.426 ns | 2.2794 ns | 2.0207 ns | 1.00 | 0.01 |
| PR | 4096 | 111.126 ns | 0.6393 ns | 0.5668 ns | 0.51 | 0.01 |
| Main | 16384 | 868.077 ns | 10.2150 ns | 9.5551 ns | 1.00 | 0.02 |
| PR | 16384 | 425.371 ns | 1.2299 ns | 1.0903 ns | 0.49 | 0.01 |

Intel AVX2 (Skylake)


BenchmarkDotNet v0.15.8, Windows 10 (10.0.19045.6456/22H2/2022Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK 10.0.103
  [Host]     : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v3


| Method | InputLength | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---:|---:|---:|---:|---:|---:|---:|
| Main | 16 | 7.880 ns | 0.1897 ns | 0.2258 ns | 1.00 | 0.04 | 252 B |
| PR | 16 | 6.551 ns | 0.0438 ns | 0.0409 ns | 0.83 | 0.02 | 790 B |
| Main | 24 | 10.550 ns | 0.0987 ns | 0.0875 ns | 1.00 | 0.01 | 252 B |
| PR | 24 | 8.161 ns | 0.0735 ns | 0.0574 ns | 0.77 | 0.01 | 781 B |
| Main | 31 | 13.016 ns | 0.0548 ns | 0.0513 ns | 1.00 | 0.01 | 252 B |
| PR | 31 | 8.073 ns | 0.0281 ns | 0.0249 ns | 0.62 | 0.00 | 781 B |
| Main | 32 | 8.517 ns | 0.0444 ns | 0.0393 ns | 1.00 | 0.01 | 486 B |
| PR | 32 | 6.719 ns | 0.0363 ns | 0.0322 ns | 0.79 | 0.01 | 790 B |
| Main | 48 | 13.106 ns | 0.2872 ns | 0.2949 ns | 1.00 | 0.03 | 488 B |
| PR | 48 | 8.041 ns | 0.0837 ns | 0.0654 ns | 0.61 | 0.01 | 781 B |
| Main | 64 | 9.075 ns | 0.0403 ns | 0.0377 ns | 1.00 | 0.01 | 486 B |
| PR | 64 | 9.077 ns | 0.0366 ns | 0.0342 ns | 1.00 | 0.01 | 814 B |
| Main | 95 | 22.044 ns | 0.4640 ns | 0.5157 ns | 1.00 | 0.03 | 488 B |
| PR | 95 | 11.108 ns | 0.0211 ns | 0.0187 ns | 0.50 | 0.01 | 796 B |
| Main | 127 | 20.746 ns | 0.1126 ns | 0.1054 ns | 1.00 | 0.01 | 488 B |
| PR | 127 | 11.440 ns | 0.0833 ns | 0.0650 ns | 0.55 | 0.00 | 787 B |
| Main | 128 | 10.805 ns | 0.0351 ns | 0.0293 ns | 1.00 | 0.00 | 486 B |
| PR | 128 | 10.097 ns | 0.0329 ns | 0.0291 ns | 0.93 | 0.00 | 814 B |
| Main | 224 | 13.447 ns | 0.2395 ns | 0.2123 ns | 1.00 | 0.02 | 486 B |
| PR | 224 | 11.717 ns | 0.0990 ns | 0.0773 ns | 0.87 | 0.01 | 805 B |
| Main | 1000 | 33.921 ns | 0.1759 ns | 0.1646 ns | 1.00 | 0.01 | 498 B |
| PR | 1000 | 26.133 ns | 0.1194 ns | 0.1059 ns | 0.77 | 0.00 | 815 B |
| Main | 1024 | 32.486 ns | 0.1178 ns | 0.0919 ns | 1.00 | 0.00 | 502 B |
| PR | 1024 | 27.179 ns | 0.1521 ns | 0.1187 ns | 0.84 | 0.00 | 833 B |
| Main | 4096 | 112.877 ns | 1.5260 ns | 1.2743 ns | 1.00 | 0.02 | 502 B |
| PR | 4096 | 72.805 ns | 0.4396 ns | 0.4112 ns | 0.65 | 0.01 | 833 B |
| Main | 16384 | 442.728 ns | 8.2479 ns | 7.3115 ns | 1.00 | 0.02 | 502 B |
| PR | 16384 | 271.680 ns | 1.1137 ns | 1.0418 ns | 0.61 | 0.01 | 833 B |

Copilot AI review requested due to automatic review settings March 4, 2026 21:27
@dotnet-policy-service bot added the community-contribution label Mar 4, 2026
dotnet-policy-service (Contributor) commented

Tagging subscribers to this area: @dotnet/area-system-io-hashing, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors Adler-32’s SIMD implementation in System.IO.Hashing to a new strategy-based vectorized core, updates tests to stress delayed-modulo overflow scenarios, and wires the new SIMD source file into the build.

Changes:

  • Added a new SIMD implementation (Adler32Simd.cs) with AVX2 / SSSE3 / Arm64 (incl. DP) selection and shared vectorized update core.
  • Updated Adler32 to route vectorized updates through the new implementation and adjusted constant visibility.
  • Modified Adler32 tests to better stress overflow safety and expanded length coverage.
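
As a rough illustration of the instruction-set selection described in the overview above, the sketch below probes hardware support with the `IsSupported` properties from `System.Runtime.Intrinsics`. The class and method names are hypothetical, and the probing order is an assumption for illustration; the actual selection logic in `Adler32Simd.cs` may differ.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration only; not the selection logic from this PR.
static class SimdPathSketch
{
    // Returns a label for the widest path this machine could take.
    // The order of the checks is an assumption made for this sketch.
    public static string Describe()
    {
        if (Avx2.IsSupported) return "AVX2";
        if (Ssse3.IsSupported) return "SSSE3";
        if (Dp.IsSupported) return "Arm64 DP";
        if (AdvSimd.Arm64.IsSupported) return "Arm64 AdvSimd";
        return "Scalar";
    }
}
```

Because the JIT treats each `IsSupported` check as a constant, branches for unsupported instruction sets are typically eliminated from the generated code.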

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/libraries/System.IO.Hashing/tests/Adler32Tests.cs | Updates large-input overflow-stress test and expands length coverage; removes a previous all-0xFF reference test. |
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32Simd.cs | Introduces the new SIMD implementation and strategy abstractions for vectorized Adler32 updates. |
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32.cs | Simplifies vectorization gating and delegates SIMD updating to the new implementation; exposes ModBase internally. |
| src/libraries/System.IO.Hashing/src/System.IO.Hashing.csproj | Includes the new Adler32Simd.cs for .NETCoreApp builds. |

@saucecontrol saucecontrol marked this pull request as ready for review March 4, 2026 23:10
Copilot AI review requested due to automatic review settings March 4, 2026 23:10
{
data[i] = (byte)('a' + (i % 26));
}

saucecontrol (Member, Author) commented

This test (and the other removed test) didn't check the actual boundary condition. For example, if NMax is changed to 8192 in the Adler32 implementation, the tests still pass.

The updated test correctly breaks if NMax is set as small as 5553.
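
The boundary discussed above can be checked with a little arithmetic. The sketch below uses the bound given in zlib's `adler32.c` comments, where NMax is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) fits in 32 bits. The class and method names are hypothetical and are not part of the PR or its tests.

```csharp
using System;

// Hypothetical illustration only; not part of the PR or its tests.
static class NMaxBound
{
    private const ulong ModBase = 65521;

    // Worst-case value of the second sum after n bytes of 0xFF,
    // starting from the largest reduced state s1 = s2 = ModBase - 1.
    public static ulong WorstCaseS2(ulong n) =>
        255 * n * (n + 1) / 2 + (n + 1) * (ModBase - 1);

    // True when n bytes can be accumulated before reducing
    // without overflowing a 32-bit accumulator.
    public static bool FitsInUInt32(ulong n) => WorstCaseS2(n) <= uint.MaxValue;
}
```

Under this bound, `FitsInUInt32(5552)` holds while `FitsInUInt32(5553)` does not, which is consistent with the updated test failing as soon as NMax is raised to 5553.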

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

saucecontrol (Member, Author) commented

cc @tannergooding @stephentoub

Labels

area-System.IO.Hashing, community-contribution
