Clean up and optimize byte<->float and Rgba32 <-> Vector4 conversion by antonfirsov · Pull Request #742 · SixLabors/ImageSharp

antonfirsov · 2018-10-20T23:54:31Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

Unified the Span<byte> -> Span<float> conversion methods (and their inverses) in SimdUtils, which will be useful for Epic: ResizeProcessor performance improvements (Memory & CPU) #733.
Changed Rgba32.PixelOperations to consume these unified converters in bulk conversions to/from Vector4.
Added new implementations of the conversions based on dotnet/corefx#15957. These are only accelerated by the newest RyuJIT deployed with .NET Core 2.1 and .NET Framework 4.7.2. ImageSharp does not currently target the latter directly, so the new implementations are only used on .NET Core 2.1:
- Span<byte> -> Span<float> (thus Span<Rgba32> -> Span<Vector4>) is 3x faster than the current implementation on main
- The opposite direction is ~2x faster
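
As a point of reference for what the bulk converters compute, here is a minimal scalar sketch of the two directions. It is not the SIMD code from this PR: the class and method names are hypothetical, and the scale by 255, add 0.5, clamp sequence is an assumption modeled on the FallbackIntrinsics128 snippet quoted in the review.

```csharp
using System;

// Hypothetical scalar reference, for illustration only (not part of SimdUtils).
public static class ScalarConversionSketch
{
    // Maps each byte in [0, 255] to a float in [0, 1].
    public static void ByteToNormalizedFloat(ReadOnlySpan<byte> source, Span<float> dest)
    {
        if (dest.Length < source.Length)
        {
            throw new ArgumentException("Destination is too short.", nameof(dest));
        }

        for (int i = 0; i < source.Length; i++)
        {
            dest[i] = source[i] / 255f;
        }
    }

    // Maps each float back to a byte, rounding to nearest and clamping
    // values outside [0, 1] to 0 or 255 instead of letting them wrap around.
    public static void NormalizedFloatToByteClampOverflows(ReadOnlySpan<float> source, Span<byte> dest)
    {
        if (dest.Length < source.Length)
        {
            throw new ArgumentException("Destination is too short.", nameof(dest));
        }

        for (int i = 0; i < source.Length; i++)
        {
            float s = (source[i] * 255f) + 0.5f;    // scale, then bias for rounding
            s = Math.Min(255f, Math.Max(0f, s));    // clamp to the byte range
            dest[i] = (byte)s;                      // truncation completes the rounding
        }
    }
}
```

A scalar version like this can serve as the ground truth when unit testing the accelerated paths, since every byte value should survive a round trip through both methods.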

Benchmark results

Detailed benchmark results can be found in comments in benchmark code (ToVector4_Rgba32, PackFromVector4_Rgba32).

Here are the interesting bits:

ToVector4_Rgba32

                     Method | Runtime | Count |        Mean |        Error |      StdDev | Scaled | ScaledSD |  Gen 0 | Allocated |
      FallbackIntrinsics128 |    Core |  2048 | 5,017.89 ns | 4,021.533 ns | 227.2241 ns |   1.24 |     0.05 |      - |       0 B |
         BasicIntrinsics256 |    Core |  2048 | 4,046.51 ns | 1,150.390 ns |  64.9992 ns |   1.00 |     0.00 |      - |       0 B |
         ExtendedIntrinsics |    Core |  2048 | 1,130.59 ns |   832.588 ns |  47.0427 ns |!! 0.28 |     0.01 |      - |       0 B |
       PixelOperations_Base |    Core |  2048 | 6,752.68 ns |   272.820 ns |  15.4148 ns |   1.67 |     0.02 |      - |      24 B |
PixelOperations_Specialized |    Core |  2048 | 1,126.13 ns |    79.192 ns |   4.4745 ns |!! 0.28 |     0.00 |      - |       0 B |

PackFromVector4_Rgba32

                     Method | Runtime | Count |         Mean |        Error |      StdDev | Scaled | ScaledSD |  Gen 0 | Allocated |
      FallbackIntrinsics128 |    Core |  2048 |  6,644.65 ns | 2,677.090 ns | 151.2605 ns |   1.69 |     0.05 |      - |       0 B |
         BasicIntrinsics256 |    Core |  2048 |  3,923.70 ns | 1,971.760 ns | 111.4081 ns |   1.00 |     0.00 |      - |       0 B |
          ExtendedIntrinsic |    Core |  2048 |  2,092.32 ns |   375.657 ns |  21.2253 ns |!! 0.53 |     0.01 |      - |       0 B |
       PixelOperations_Base |    Core |  2048 | 16,875.73 ns | 1,271.957 ns |  71.8679 ns |   4.30 |     0.10 |      - |      24 B |
PixelOperations_Specialized |    Core |  2048 |  2,129.92 ns |   262.888 ns |  14.8537 ns |!! 0.54 |     0.01 |      - |       0 B |

codecov · 2018-10-21T19:00:24Z

Codecov Report

❗ No coverage uploaded for pull request base (master@7e76506).
The diff coverage is 95.67%.

@@            Coverage Diff            @@
##             master     #742   +/-   ##
=========================================
  Coverage          ?   89.33%           
=========================================
  Files             ?      972           
  Lines             ?    42891           
  Branches          ?     3038           
=========================================
  Hits              ?    38318           
  Misses            ?     3889           
  Partials          ?      684

Impacted Files	Coverage Δ
...olorConverters/JpegColorConverter.FromYCbCrSimd.cs	`84.9% <ø> (ø)`
...ents/Decoder/ColorConverters/JpegColorConverter.cs	`89.85% <ø> (ø)`
...Converters/JpegColorConverter.FromYCbCrSimdAvx2.cs	`91.66% <ø> (ø)`
src/ImageSharp/Common/Tuples/Vector4Pair.cs	`62.5% <ø> (ø)`
....Tests/TestUtilities/Tests/TestEnvironmentTests.cs	`63.63% <ø> (ø)`
...mageSharp.Tests/TestUtilities/TestDataGenerator.cs	`100% <100%> (ø)`
.../ImageSharp/PixelFormats/Rgba32.PixelOperations.cs	`100% <100%> (ø)`
...ImageSharp/PixelFormats/PixelOperations{TPixel}.cs	`100% <100%> (ø)`
src/ImageSharp/Common/Tuples/Octet.cs	`90% <90%> (ø)`
...arp/Common/Helpers/SimdUtils.BasicIntrinsics256.cs	`92.2% <92.2%> (ø)`
... and 4 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7e76506...bf7c933.

codecov · 2018-10-21T19:00:24Z

Codecov Report

❗ No coverage uploaded for pull request base (master@7e76506).
The diff coverage is 93.32%.

@@            Coverage Diff            @@
##             master     #742   +/-   ##
=========================================
  Coverage          ?   89.31%           
=========================================
  Files             ?      973           
  Lines             ?    42979           
  Branches          ?     3047           
=========================================
  Hits              ?    38386           
  Misses            ?     3911           
  Partials          ?      682

Impacted Files	Coverage Δ
...ents/Decoder/ColorConverters/JpegColorConverter.cs	`89.85% <ø> (ø)`
...olorConverters/JpegColorConverter.FromYCbCrSimd.cs	`84.9% <ø> (ø)`
...Converters/JpegColorConverter.FromYCbCrSimdAvx2.cs	`91.66% <ø> (ø)`
src/ImageSharp/Common/Tuples/Vector4Pair.cs	`62.5% <ø> (ø)`
....Tests/TestUtilities/Tests/TestEnvironmentTests.cs	`63.63% <ø> (ø)`
tests/ImageSharp.Tests/Helpers/ImageMathsTests.cs	`100% <100%> (ø)`
.../Common/Helpers/SimdUtils.FallbackIntrinsics128.cs	`100% <100%> (ø)`
src/ImageSharp/Common/Helpers/ImageMaths.cs	`86.66% <100%> (ø)`
...mageSharp.Tests/TestUtilities/TestDataGenerator.cs	`100% <100%> (ø)`
.../ImageSharp/PixelFormats/Rgba32.PixelOperations.cs	`100% <100%> (ø)`
... and 7 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7e76506...90c7153.

JimBobSquarePants

I'll freely admit there's some stuff here I don't fully understand, but what I do understand seems sensible. Just a few questions.

JimBobSquarePants · 2018-10-21T19:21:36Z

src/ImageSharp/Common/Helpers/ImageMaths.cs

+        }
+
+        [MethodImpl(InliningOptions.ShortMethod)]
+        public static float Clamp(float x, float min, float max) => Math.Min(max, Math.Max(min, x));


You might wanna benchmark this. The IComparableExtensions version should be a good bit faster.

I do, however, want to ditch the extension method in favor of clearer MathUtils.Clamp*** methods.

Good point, I will compare them!

Do we need the duplication?

The one in ComparableExtensions is faster, I'm removing this!
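
To make the comparison in this thread concrete, here is a small sketch of the two styles of Clamp being discussed. The first method mirrors the one-liner from the diff. The second is an assumption about what a branch-based version such as the one in ComparableExtensions might look like; its body is not quoted anywhere in this PR, and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical side-by-side of two Clamp styles, for illustration only.
public static class ClampSketch
{
    // Same shape as the method added in the diff: two nested Math calls.
    public static float ClampWithMath(float x, float min, float max)
        => Math.Min(max, Math.Max(min, x));

    // Branch-based variant (assumed shape): returns early at either bound.
    public static float ClampWithBranches(float x, float min, float max)
    {
        if (x >= max)
        {
            return max;
        }

        if (x <= min)
        {
            return min;
        }

        return x;
    }
}
```

For ordinary inputs with min <= max both variants return the same value, so the choice between them comes down to measured speed, which is what the benchmark mentioned above would settle.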

JimBobSquarePants · 2018-10-21T19:30:50Z

src/ImageSharp/Common/Helpers/SimdUtils.BasicIntrinsics256.cs

+            /// http://lolengine.net/blog/2011/3/20/understanding-fast-float-integer-conversions
+            /// http://stackoverflow.com/a/536278
+            /// </summary>
+            internal static void BulkConvertByteToNormalizedFloat(ReadOnlySpan<byte> source, Span<float> dest)


Could this perhaps be private so it cannot be called without input sanitization?

These methods are all unit tested separately (so coverage is independent from current HW configuration), so we need them as internal.

I can improve the input checking a bit however.

Ah right... Makes sense then. Carry on!

JimBobSquarePants · 2018-10-21T19:34:51Z

src/ImageSharp/Common/Helpers/SimdUtils.ExtendedIntrinsics.cs

+        {
+            public static bool IsAvailable { get; } =
+#if NETCOREAPP2_1
+                // TODO: Also available in .NET 4.7.2, we need to add a build target!


Add one if you need one.

I'm planning to do this in a separate PR, or open an up-for-grabs issue. @iamcarbon is the champion of this stuff! 😄

JimBobSquarePants · 2018-10-21T19:36:22Z

src/ImageSharp/Common/Helpers/SimdUtils.FallbackIntrinsics128.cs

+                    s *= maxBytes;
+                    s += half;
+
+                    // I'm not sure if Vector4.Clamp() is properly implemented with intrinsics.


I do recall reading somewhere it wasn't.

JimBobSquarePants · 2018-10-21T19:38:49Z

src/ImageSharp/Common/Tuples/Octet.cs

+
+            public override string ToString()
+            {
+                return $"[{this.V0},{this.V1},{this.V2},{this.V3},{this.V4},{this.V5},{this.V6},{this.V7}]";


I'm favoring the TypeName(field, field, field) format now for ToString consistency.

JimBobSquarePants · 2018-10-21T19:39:28Z

...harp/Formats/Jpeg/Components/Decoder/ColorConverters/JpegColorConverter.FromYCbCrSimdAvx2.cs

 using System.Runtime.InteropServices;

-using SixLabors.ImageSharp.Common.Tuples;
+using SixLabors.ImageSharp.Tuples;


Would these come under our Primitives namespace?

Definitely a better place!

JimBobSquarePants · 2018-10-21T19:40:31Z

src/ImageSharp/PixelFormats/PixelOperations{TPixel}.cs

        internal virtual void PackFromVector4(ReadOnlySpan<Vector4> sourceVectors, Span<TPixel> destinationColors, int count)
        {
-            GuardSpans(sourceVectors, nameof(sourceVectors), destinationColors, nameof(destinationColors), count);
+            ReadOnlySpan<Vector4> sourceVectors1 = sourceVectors;


How come these are reassigned?

Good catch! Result of undoing a refactor with R#. Will fix it.

JimBobSquarePants · 2018-10-21T19:42:36Z

tests/ImageSharp.Benchmarks/General/BasicMath/ModuloPowerOfTwoVariable.cs

+            return ImageMaths.ModuloP2(this.value, this.m);
+        }
+
+        // RESULTS:


I need to make sure I add the results like this when benchmarking. Really good idea.
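
For readers unfamiliar with the operation being benchmarked here, the sketch below shows the usual bit-mask trick for a modulo by a power of two. It assumes that ImageMaths.ModuloP2 works this way; the method body is not quoted in this PR, and the class name is hypothetical.

```csharp
// Hypothetical illustration of a power-of-two modulo, not the ImageSharp source.
public static class ModuloSketch
{
    // Assumes m is a positive power of two. For non-negative x the result
    // equals x % m, but a single AND replaces the division.
    public static int ModuloP2(int x, int m) => x & (m - 1);
}
```

The mask works because m - 1 has exactly the low bits set that can hold a remainder smaller than m. It gives wrong answers when m is not a power of two, so callers have to guarantee that precondition.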

dlemstra · 2018-10-21T21:45:11Z

src/ImageSharp/Common/Helpers/ImageMaths.cs

+        }
+
+        [MethodImpl(InliningOptions.ShortMethod)]
+        public static float Clamp(float x, float min, float max) => Math.Min(max, Math.Max(min, x));


Do we need the duplication?

dlemstra · 2018-10-21T21:50:46Z

src/ImageSharp/PixelFormats/Rgba32.PixelOperations.cs

                Guard.MustBeSizedAtLeast(sourceColors, count, nameof(sourceColors));
                Guard.MustBeSizedAtLeast(destinationVectors, count, nameof(destinationVectors));

-                if (count < 256 || !Vector.IsHardwareAccelerated)


Do we no longer need the count check?

It was an optimization for small buffers, but the new logic made it obsolete.

Okay 👍 Just wanted to make sure that you had not missed this.

dlemstra · 2018-10-21T21:51:48Z

tests/ImageSharp.Benchmarks/Color/Bulk/PackFromVector4.cs

        }

-        [Benchmark(Baseline = true)]
+        //[Benchmark]


Should this still be commented?

The benchmark code serves as documentation, but executing it is unnecessary unless someone wants to evaluate that specific method in a future investigation.

I wish there was a better way to skip benchmarks without completely dropping their code.

dlemstra · 2018-10-21T21:52:10Z

tests/ImageSharp.Benchmarks/Color/Bulk/ToVector4.cs

        }

-        [Benchmark(Baseline = true)]
+        //[Benchmark]


Should this still be commented?

antonfirsov · 2018-10-21T22:14:52Z

@JimBobSquarePants @dlemstra all findings were addressed. Gonna merge this as soon as the compilation is finished so we can go on with #729.

JimBobSquarePants · 2018-10-21T22:48:38Z

Awesome!

antonfirsov added 20 commits October 14, 2018 22:44

BulkConvertByteToNormalizedFloat

260a8f8

SIMD byte -> float conversion: BulkConvertByteToNormalizedFloatFast

af7d96d

move tests

281c527

uniformize conversion code

3e5325e

BulkConvertNormalizedFloatToByteClampOverflows

df87a68

disappointing benchmark results

b8b411b

todo notes

a471420

Merge remote-tracking branch 'origin/master' into af/simd-conversion

708c3d2

cleanup

0f4f822

benchmark conversion steps separately

0e06eb6

fixed benchmarks and optimized implementation

0f538ff

fix accuracy issues

664d838

cleanup benchmarks

10afe65

Bulk conversion of arbitrary-sized Span-s of scalars

17f6dcc

fix benchmarks

34ab918

simplify Rgba32.PixelOperations, include benchmark results

2fcda3c

cleanup code and comments

cb8b48d

Merge remote-tracking branch 'origin/master' into af/simd-conversion

82faeec

FallbackIntrinsics128 + ImageMaths.Modulo* implementations

d1d52a7

minimize ceremonial overhead in BulkConvertByteToNormalizedFloat() and BulkConvertNormalizedFloatToByteClampOverflows()

bf7c933

fix comment

520c6fc

antonfirsov requested review from JimBobSquarePants, dlemstra and tocsoft October 21, 2018 19:10

JimBobSquarePants reviewed Oct 21, 2018

address review findings + some more cleanup

5c687fa

dlemstra reviewed Oct 21, 2018

drop slow Clamp() implementation

54ccf05

remove useless reassignment in PixelOperations{TPixel}

90c7153

antonfirsov mentioned this pull request Oct 21, 2018

#718 #730: Add Gray8 and Gray16 Pixel Formats and clean up IPixel #729

Merged

4 tasks

antonfirsov merged commit 6f5ebbb into master Oct 21, 2018

This was referenced Oct 22, 2018

Cross target .NET Framework 4.7.2 #743

Closed

Epic: ResizeProcessor performance improvements (Memory & CPU) #733

Closed

antonfirsov deleted the af/simd-conversion branch October 30, 2018 21:10
