Clean up and optimize byte<->float and Rgba32 <-> Vector4 conversion #742
antonfirsov merged 24 commits into master
in BulkConvertByteToNormalizedFloat() and BulkConvertNormalizedFloatToByteClampOverflows()
Codecov Report
@@ Coverage Diff @@
## master #742 +/- ##
=========================================
Coverage ? 89.33%
=========================================
Files ? 972
Lines ? 42891
Branches ? 3038
=========================================
Hits ? 38318
Misses ? 3889
Partials ? 684
Codecov Report
@@ Coverage Diff @@
## master #742 +/- ##
=========================================
Coverage ? 89.31%
=========================================
Files ? 973
Lines ? 42979
Branches ? 3047
=========================================
Hits ? 38386
Misses ? 3911
Partials ? 682
JimBobSquarePants left a comment
I'll freely admit there's some stuff here I don't fully understand, but what I do understand seems sensible. Just a few questions.
}

[MethodImpl(InliningOptions.ShortMethod)]
public static float Clamp(float x, float min, float max) => Math.Min(max, Math.Max(min, x));
You might wanna benchmark this. The IComparableExtensions version should be a good bit faster.
I do, however, want to ditch the extension method in favor of clearer MathUtils.Clamp*** methods.
Good point, will compare!
The one in ComparableExtensions is faster, I'm removing this!
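For readers following the Clamp thread, here is a minimal sketch of the two styles being compared: the `Math.Min`/`Math.Max` form from the diff and a comparison-based form. The class and method names are hypothetical, and the comparison-based body is only an assumption about what such an extension might look like, not the actual ImageSharp code.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical names for illustration; not the ImageSharp API.
public static class ClampSketch
{
    // The form shown in the diff: two Math calls.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static float ClampMinMax(float x, float min, float max)
        => Math.Min(max, Math.Max(min, x));

    // A comparison-based form (assumed shape of a generic IComparable<T> helper).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T ClampCompare<T>(T value, T min, T max)
        where T : IComparable<T>
    {
        if (value.CompareTo(min) < 0)
        {
            return min;
        }

        if (value.CompareTo(max) > 0)
        {
            return max;
        }

        return value;
    }
}
```

Which one is faster depends on the runtime and JIT, so it is worth measuring both, as the thread suggests.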
/// http://lolengine.net/blog/2011/3/20/understanding-fast-float-integer-conversions
/// http://stackoverflow.com/a/536278
/// </summary>
internal static void BulkConvertByteToNormalizedFloat(ReadOnlySpan<byte> source, Span<float> dest)
Could this perhaps be private so it cannot be called without sanitation?
These methods are all unit tested separately (so coverage is independent of the current HW configuration), so we need them to be internal.
I can improve the input checking a bit, however.
Ah right... Makes sense then. Carry on!
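As a point of reference for what a byte to normalized float conversion computes, here is a scalar sketch with a basic input check. It is not the SIMD implementation from the PR; the class and method names are hypothetical, and only the signature shape mirrors the one in the diff.

```csharp
using System;

// Hypothetical scalar reference; the PR's version uses SIMD tricks instead.
public static class ByteToFloatSketch
{
    public static void ConvertByteToNormalizedFloat(ReadOnlySpan<byte> source, Span<float> dest)
    {
        // Input sanitation: the destination must be able to hold every source element.
        if (dest.Length < source.Length)
        {
            throw new ArgumentException("Destination is shorter than source.", nameof(dest));
        }

        // Map 0..255 to 0..1.
        const float Scale = 1f / 255f;
        for (int i = 0; i < source.Length; i++)
        {
            dest[i] = source[i] * Scale;
        }
    }
}
```

A scalar version like this is also handy in unit tests as the expected result for a vectorized implementation.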
{
    public static bool IsAvailable { get; } =
#if NETCOREAPP2_1
    // TODO: Also available in .NET 4.7.2, we need to add a build target!
Add one if you need one.
I'm planning to do this in a separate PR, or open an up-for-grabs issue. @iamcarbon is the champion of this stuff! 😄
s *= maxBytes;
s += half;

// I'm not sure if Vector4.Clamp() is properly implemented with intrinsics.
I do recall reading somewhere it wasn't.
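The scale, add-half, clamp sequence in the snippet above can be sketched in scalar form as follows. This is an illustration under assumptions: the class and method names are hypothetical, and the PR operates on vectors rather than on single floats.

```csharp
using System;

// Hypothetical scalar sketch of normalized float -> byte with overflow clamping.
public static class FloatToByteSketch
{
    public static void ConvertNormalizedFloatToByteClampOverflows(ReadOnlySpan<float> source, Span<byte> dest)
    {
        if (dest.Length < source.Length)
        {
            throw new ArgumentException("Destination is shorter than source.", nameof(dest));
        }

        for (int i = 0; i < source.Length; i++)
        {
            float s = source[i];
            s *= 255f;                            // scale 0..1 to 0..255
            s += 0.5f;                            // add half so truncation rounds to nearest
            s = Math.Min(255f, Math.Max(0f, s));  // clamp values outside the byte range
            dest[i] = (byte)s;                    // truncate toward zero
        }
    }
}
```

Clamping before the cast matters because casting an out-of-range float to byte does not saturate on its own.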
public override string ToString()
{
    return $"[{this.V0},{this.V1},{this.V2},{this.V3},{this.V4},{this.V5},{this.V6},{this.V7}]";
I'm favoring the TypeName(field, field, field) format now for ToString consistency.
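A small sketch of the TypeName(field, field, field) format, using a made-up three-field struct. `SampleTuple` is hypothetical and stands in for the tuple types touched by the PR.

```csharp
// Hypothetical type used only to illustrate the suggested ToString format.
public struct SampleTuple
{
    public int V0;
    public int V1;
    public int V2;

    public SampleTuple(int v0, int v1, int v2)
    {
        this.V0 = v0;
        this.V1 = v1;
        this.V2 = v2;
    }

    // TypeName(field, field, field)
    public override string ToString()
        => $"{nameof(SampleTuple)}({this.V0}, {this.V1}, {this.V2})";
}
```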
using System.Runtime.InteropServices;

using SixLabors.ImageSharp.Common.Tuples;
using SixLabors.ImageSharp.Tuples;
Would these come under our Primitives namespace?
Definitely a better place!
internal virtual void PackFromVector4(ReadOnlySpan<Vector4> sourceVectors, Span<TPixel> destinationColors, int count)
{
    GuardSpans(sourceVectors, nameof(sourceVectors), destinationColors, nameof(destinationColors), count);
    ReadOnlySpan<Vector4> sourceVectors1 = sourceVectors;
How come these are reassigned?
Good catch! Result of undoing a refactor with R#. Will fix it.
    return ImageMaths.ModuloP2(this.value, this.m);
}

// RESULTS:
I need to make sure I add the results like this when benchmarking. Really good idea.
Guard.MustBeSizedAtLeast(sourceColors, count, nameof(sourceColors));
Guard.MustBeSizedAtLeast(destinationVectors, count, nameof(destinationVectors));

if (count < 256 || !Vector.IsHardwareAccelerated)
Do we no longer need the count check?
It was an optimization for small buffers, but the new logic made it obsolete.
Okay 👍 Just wanted to make sure that you hadn't missed this.
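One way a vectorized routine can handle any buffer length without a size threshold is to process as many whole vectors as fit and finish the remainder with a scalar loop. The sketch below illustrates that general pattern only; it is an assumption, not the logic actually introduced by the PR, and `DispatchSketch.Scale` is a hypothetical name.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical sketch: vectorized bulk plus scalar remainder, no minimum count needed.
public static class DispatchSketch
{
    public static void Scale(Span<float> data, float factor)
    {
        int i = 0;

        if (Vector.IsHardwareAccelerated)
        {
            // Reinterpret as many whole Vector<float> lanes as fit in the buffer.
            Span<Vector<float>> vectors = MemoryMarshal.Cast<float, Vector<float>>(data);
            for (int v = 0; v < vectors.Length; v++)
            {
                vectors[v] *= factor;
            }

            i = vectors.Length * Vector<float>.Count;
        }

        // Scalar loop covers the tail, or everything when SIMD is unavailable.
        for (; i < data.Length; i++)
        {
            data[i] *= factor;
        }
    }
}
```

With this shape, a short buffer simply falls through to the scalar loop, so no separate small-buffer branch is required.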
}

[Benchmark(Baseline = true)]
//[Benchmark]
Should this still be commented?
The benchmark code is kept for reference, but there's no need to execute it unless someone wants to evaluate that specific method in future investigations.
I wish there was some better way to skip benchmarks without completely dropping their code.
}

[Benchmark(Baseline = true)]
//[Benchmark]
Should this still be commented?
@JimBobSquarePants @dlemstra all findings were addressed. Gonna merge this as soon as the compilation is finished so we can go on with #729.
Awesome!
Prerequisites
Description
- Span<byte> -> Span<float> (and opposite) conversion methods in SimdUtils, which would be useful for Epic: ResizeProcessor performance improvements (Memory & CPU) #733.
- Rgba32.PixelOperations to consume these uniformized converters in bulk conversions to/from Vector4
- Span<byte> -> Span<float> (thus Span<Rgba32> -> Span<Vector4>) is 3x faster than the current implementation on main
Benchmark results
Detailed benchmark results can be found in comments in benchmark code (ToVector4_Rgba32, PackFromVector4_Rgba32).
Here are the interesting bits:
ToVector4_Rgba32
PackFromVector4_Rgba32
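The description's remark that a Span<byte> -> Span<float> converter also covers Span<Rgba32> -> Span<Vector4> can be sketched like this: a span of 4-byte pixels is reinterpreted as bytes, a span of Vector4 as floats, and the byte to float conversion runs over the flat data. `SamplePixel` is a hypothetical stand-in for Rgba32, and the scalar loop stands in for the bulk SIMD converter.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical sketch; SamplePixel stands in for a 4-byte RGBA pixel type.
public static class PixelToVectorSketch
{
    [StructLayout(LayoutKind.Sequential)]
    public struct SamplePixel
    {
        public byte R;
        public byte G;
        public byte B;
        public byte A;
    }

    public static void ToVector4(ReadOnlySpan<SamplePixel> source, Span<Vector4> dest)
    {
        if (dest.Length < source.Length)
        {
            throw new ArgumentException("Destination is shorter than source.", nameof(dest));
        }

        // 4 bytes per pixel and 4 floats per Vector4, so the flat views line up element by element.
        ReadOnlySpan<byte> bytes = MemoryMarshal.Cast<SamplePixel, byte>(source);
        Span<float> floats = MemoryMarshal.Cast<Vector4, float>(dest);

        // Scalar stand-in for a bulk byte -> normalized float converter.
        for (int i = 0; i < bytes.Length; i++)
        {
            floats[i] = bytes[i] / 255f;
        }
    }
}
```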