Tagging subscribers to this area: @dotnet/area-system-io-hashing, @bartonjs, @vcsjones
Pull request overview
This PR refactors Adler-32’s SIMD implementation in System.IO.Hashing to a new strategy-based vectorized core, updates tests to stress delayed-modulo overflow scenarios, and wires the new SIMD source file into the build.
Changes:
- Added a new SIMD implementation (`Adler32Simd.cs`) with AVX2 / SSSE3 / Arm64 (incl. DP) selection and shared vectorized update core.
- Updated `Adler32` to route vectorized updates through the new implementation and adjusted constant visibility.
- Modified Adler32 tests to better stress overflow safety and expanded length coverage.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/libraries/System.IO.Hashing/tests/Adler32Tests.cs | Updates large-input overflow-stress test and expands length coverage; removes a previous all-0xFF reference test. |
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32Simd.cs | Introduces the new SIMD implementation and strategy abstractions for vectorized Adler32 updates. |
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32.cs | Simplifies vectorization gating and delegates SIMD updating to the new implementation; exposes ModBase internally. |
| src/libraries/System.IO.Hashing/src/System.IO.Hashing.csproj | Includes the new Adler32Simd.cs for .NETCoreApp builds. |
    {
        data[i] = (byte)('a' + (i % 26));
    }
This test (and the other removed test) didn't check the actual boundary condition. For example, if NMax is changed to 8192 in the Adler32 implementation, the tests still pass.
The updated test correctly breaks if NMax is set as small as 5553.
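To make the boundary concrete, here is a minimal scalar sketch of delayed-modulo Adler-32. It is not the code from this PR: the class `Adler32Sketch`, its methods, and the `blockSize` parameter are hypothetical names used only for illustration, and it assumes 32-bit unsigned accumulators with the standard modulus 65521. The worst case for a block is every byte being 0xFF with both running sums starting at 65520, which is what `ComputeNMax` checks.

```csharp
using System;

static class Adler32Sketch
{
    public const uint ModBase = 65521; // largest prime below 2^16

    // Worst-case value of the second sum after n bytes of 0xFF,
    // starting from s1 = s2 = ModBase - 1 (computed in 64 bits).
    static ulong WorstCaseS2(int n) =>
        255UL * (ulong)n * (ulong)(n + 1) / 2 + (ulong)(n + 1) * (ModBase - 1);

    // Largest block length whose worst case still fits in a uint.
    public static int ComputeNMax()
    {
        int n = 0;
        while (WorstCaseS2(n + 1) <= uint.MaxValue) n++;
        return n;
    }

    // Updates (s1, s2), reducing modulo ModBase only once per block.
    // Arithmetic is unchecked, so a too-large blockSize silently wraps.
    public static (uint S1, uint S2) Update(uint s1, uint s2, ReadOnlySpan<byte> data, int blockSize)
    {
        while (!data.IsEmpty)
        {
            int take = Math.Min(blockSize, data.Length);
            foreach (byte b in data.Slice(0, take))
            {
                s1 += b;
                s2 += s1;
            }
            s1 %= ModBase;
            s2 %= ModBase;
            data = data.Slice(take);
        }
        return (s1, s2);
    }

    public static uint Checksum(ReadOnlySpan<byte> data, int blockSize = 5552)
    {
        var (s1, s2) = Update(1, 0, data, blockSize);
        return (s2 << 16) | s1;
    }
}
```

Under these assumptions `ComputeNMax()` yields 5552. Calling `Update` with a starting state of (65520, 65520), 5553 bytes of 0xFF, and a block size of 5553 wraps the second sum and disagrees with a per-byte reduction (block size 1), while 5552 bytes at a block size of 5552 agree. A test whose data never drives the sums close to that state can pass even with a much larger block size.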
This replaces the vectorized Adler32 implementation added in #124409.
Major Differences
- […] `Vector512` implementation, which was about 20% slower than the `Vector256` implementation on compatible hardware.
- […] `Vector256` implementation by taking better advantage of pipelining to compensate for high-latency instructions.
- […] `NMax` bytes, speeding large input processing.

In all, this amounts to a roughly 2x perf increase on large inputs, and even more on small inputs that are not an even multiple of vector size.
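For readers unfamiliar with how Adler-32 lends itself to vectorization at all, the general idea can be sketched in scalar form. This is not the code in `Adler32Simd.cs`; the class `Adler32BlockSketch`, its methods, and the block length of 32 are assumptions chosen for illustration. For a block of n bytes, the first sum grows by the plain byte sum, and the second sum grows by n times the old first sum plus a weighted byte sum with weights n, n-1, ..., 1. Both inner sums are independent per byte, which is what lets a SIMD implementation compute them with wide multiply-add instructions.

```csharp
using System;

static class Adler32BlockSketch
{
    const uint ModBase = 65521;

    // Reference: update and reduce after every byte.
    public static (uint S1, uint S2) UpdateScalar(uint s1, uint s2, ReadOnlySpan<byte> data)
    {
        foreach (byte b in data)
        {
            s1 = (s1 + b) % ModBase;
            s2 = (s2 + s1) % ModBase;
        }
        return (s1, s2);
    }

    // Block form: per block of n bytes d[0..n-1],
    //   s1' = s1 + sum(d[i])
    //   s2' = s2 + n * s1 + sum((n - i) * d[i])
    // Reduces after every small block for simplicity; a real implementation
    // would defer the modulo across many blocks.
    public static (uint S1, uint S2) UpdateBlocked(uint s1, uint s2, ReadOnlySpan<byte> data, int blockLen = 32)
    {
        while (!data.IsEmpty)
        {
            int n = Math.Min(blockLen, data.Length);
            uint sum = 0, weighted = 0;
            for (int i = 0; i < n; i++)
            {
                sum += data[i];
                weighted += (uint)(n - i) * data[i];
            }
            s2 = (s2 + (uint)n * s1 + weighted) % ModBase;
            s1 = (s1 + sum) % ModBase;
            data = data.Slice(n);
        }
        return (s1, s2);
    }
}
```

Both methods should return the same state for any input, including lengths that are not a multiple of the block length, since the final short block simply uses a smaller n.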
Benchmark Summary
x64
Arm64
Detailed Benchmark Results
AMD AVX-512 (Zen 5)
Arm64 (Windows Dev Kit 2023)
Intel AVX2 (Skylake)