[WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs #123145
iremyux wants to merge 62 commits into dotnet:main
| /// <returns>One of the enumeration values that describes the status with which the operation finished.</returns> | ||
| public OperationStatus Flush(Span<byte> destination, out int bytesWritten) | ||
| { | ||
| return Compress(ReadOnlySpan<byte>.Empty, destination, out _, out bytesWritten, isFinalBlock: false); |
Does this force writing output (if available)? I think this should pass FlushCode.SyncFlush to the native API.
| /// <param name="source">A read-only span of bytes containing the source data to compress.</param> | ||
| /// <param name="destination">When this method returns, a span of bytes where the compressed data is stored.</param> | ||
| /// <param name="bytesWritten">When this method returns, the total number of bytes that were written to <paramref name="destination"/>.</param> | ||
| /// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param> |
We should be clearer about which default we mean.
| /// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param> | |
| /// <param name="compressionLevel">A number representing compression level. -1 means implementation default, 0 is no compression, 1 is best speed, 9 is best compression.</param> |
| CompressionLevel.Fastest => ZLibNative.CompressionLevel.BestSpeed, | ||
| CompressionLevel.NoCompression => ZLibNative.CompressionLevel.NoCompression, | ||
| CompressionLevel.SmallestSize => ZLibNative.CompressionLevel.BestCompression, | ||
| _ => throw new ArgumentOutOfRangeException(nameof(compressionLevel)), |
This would fail on valid native compression levels not covered by the CompressionLevel enum. Instead, I think it should throw an out-of-range exception only when the value is < -1 or > 9.
To add to the above: callers who want a native compression level whose numeric value happens to equal a CompressionLevel enum value can no longer use that level either. Perhaps a solution is to expose one constructor overload that takes CompressionLevel and another that takes an int, which is cast to ZLibNative.CompressionLevel after a range check.
| /// <summary> | ||
| /// Defines the compression algorithms that can be used for <see cref="DeflateStream"/>, <see cref="GZipStream"/> or <see cref="ZLibStream"/>. | ||
| /// Defines the compression algorithms that can be used for <see cref="DeflateStream"/>, <see cref="GZipStream"/> or <see cref="ZLibStream"/>. |
The sentence should have "Defines" capitalized consistently with the rest of the documentation, and there's an extra space after "Defines". Should be "Defines the compression algorithms..."
…ux/dotnet-runtime into 62113-zlib-encoder-decoder
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| ArgumentOutOfRangeException.ThrowIfNegative(inputLength); | ||
| ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue); | ||
| return (long)Interop.ZLib.compressBound((uint)inputLength); |
DeflateEncoder.GetMaxCompressedLength relies on Interop.ZLib.compressBound returning a 32-bit value. For inputLength close to uint.MaxValue, zlib's compressBound calculation can exceed 4GiB, so truncating to uint can produce an undersized bound (and the method currently permits inputLength == uint.MaxValue). Consider changing the native/interop contract to return a 64-bit size (e.g., ulong/nuint) and adding an overflow check (throw) when the bound can't be represented safely.
| return (long)Interop.ZLib.compressBound((uint)inputLength); | |
| // Compute the same upper bound as zlib's compressBound macro using 64-bit arithmetic: | |
| // bound = sourceLen + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13 | |
| ulong length = (ulong)inputLength; | |
| ulong bound = length | |
| + (length >> 12) | |
| + (length >> 14) | |
| + (length >> 25) | |
| + 13; | |
| if (bound > long.MaxValue) | |
| { | |
| throw new ArgumentOutOfRangeException(nameof(inputLength)); | |
| } | |
| return (long)bound; |
| Calculates and returns an upper bound on the compressed size after deflate compressing sourceLen bytes. | ||
| This is a worst-case estimate that accounts for incompressible data and zlib wrapper overhead. | ||
| The actual compressed size will typically be smaller. | ||
| Returns the maximum number of bytes the compressed output could require. | ||
| */ | ||
| FUNCTIONEXPORT uint32_t FUNCTIONCALLINGCONVENTION CompressionNative_CompressBound(uint32_t sourceLen); |
CompressionNative_CompressBound is declared as returning uint32_t, but compressBound can compute values larger than 4GiB for large inputs. Since the managed API exposes GetMaxCompressedLength(long), consider returning a 64-bit size (uint64_t/size_t) from the native export (and updating interop) or tightening the accepted input range so the bound never overflows.
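To make the truncation concern concrete, here is a minimal C sketch. It assumes the classic zlib formula quoted in the suggestion earlier in this review (sourceLen + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13); the bound computed by the zlib implementation actually bundled with the runtime may differ. The function names are hypothetical and are not part of the PR or of zlib.

```c
#include <stdint.h>

/* Hypothetical helper: the classic zlib compressBound formula,
   evaluated entirely in 64-bit arithmetic so it cannot wrap for
   any 32-bit input. */
uint64_t bound64(uint64_t source_len)
{
    return source_len + (source_len >> 12) + (source_len >> 14)
         + (source_len >> 25) + 13;
}

/* Hypothetical helper: what an export that returns uint32_t would
   hand back, namely only the low 32 bits of the real bound. */
uint32_t bound32_truncated(uint32_t source_len)
{
    return (uint32_t)bound64(source_len);
}
```

Under that assumption, an input of UINT32_MAX bytes gives a 64-bit bound of 4296278153, while the truncated 32-bit value is 1310857, which is far smaller than the input itself.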
| _state = ZLibNative.ZLibStreamHandle.CreateForDeflate( | ||
| (ZLibNative.CompressionLevel)quality, | ||
| windowBits, | ||
| ZLibNative.Deflate_DefaultMemLevel, | ||
| ZLibNative.CompressionStrategy.DefaultStrategy); |
DeflateEncoder always passes ZLibNative.Deflate_DefaultMemLevel to CreateForDeflate, even when quality/options specify no compression (level 0). DeflateStream adjusts memLevel for NoCompression (uses Deflate_NoCompressionMemLevel=7), so aligning DeflateEncoder with that behavior would reduce memory usage and keep the span-based API consistent with the stream implementation.
| /// <summary> | ||
| /// Gets or sets the base-2 logarithm of the window size for a compression stream. | ||
| /// </summary> | ||
| /// <exception cref="ArgumentOutOfRangeException">The value is less than -1 or greater than 15, or between 0 and 7.</exception> | ||
| /// <remarks> | ||
| /// Can accept -1 or any value between 8 and 15 (inclusive). Larger values result in better compression at the expense of memory usage. | ||
| /// -1 requests the default window log which is currently equivalent to 15 (32KB window). The default value is -1. | ||
| /// </remarks> | ||
| public int WindowLog | ||
| { | ||
| get => _windowLog; | ||
| set | ||
| { | ||
| if (value != -1) | ||
| { | ||
| ArgumentOutOfRangeException.ThrowIfLessThan(value, ZLibNative.MinWindowLog); | ||
| ArgumentOutOfRangeException.ThrowIfGreaterThan(value, ZLibNative.MaxWindowLog); | ||
| } | ||
| _windowLog = value; | ||
| } |
ZLibCompressionOptions gained the WindowLog property (including range validation and new Min/Max/Default window log APIs), but the existing ZLibCompressionOptionsUnitTests don't cover default value, valid assignments, or out-of-range cases for WindowLog. Adding targeted tests would help prevent regressions in the new option surface area.
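As a rough guide to the boundary cases such tests might cover, the following C sketch mirrors the rule stated in the property's remarks (-1, or 8 through 15 inclusive). The function name is hypothetical, and the literal limits 8 and 15 are taken from the doc comment rather than from ZLibNative.MinWindowLog and ZLibNative.MaxWindowLog directly.

```c
#include <stdbool.h>

/* Hypothetical mirror of the setter's validation:
   -1 requests the default, otherwise the value must be 8..15. */
bool is_valid_window_log(int value)
{
    return value == -1 || (value >= 8 && value <= 15);
}
```

The interesting inputs are the edges: -1, 8 and 15 should be accepted, while -2, 0, 7 and 16 should be rejected.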
| /// <exception cref="ArgumentOutOfRangeException"><paramref name="inputLength"/> is negative or exceeds <see cref="uint.MaxValue"/>.</exception> | ||
| public static long GetMaxCompressedLength(long inputLength) | ||
| { | ||
| ArgumentOutOfRangeException.ThrowIfNegative(inputLength); | ||
| ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue); | ||
| // compressBound() returns the upper bound for zlib-wrapped deflate output. | ||
| // For raw deflate (no header/trailer) this slightly overestimates, which is safe. | ||
| return (long)Interop.ZLib.compressBound((uint)inputLength); |
The new encoders expose GetMaxCompressedLength(long), but these APIs are described as matching the BrotliEncoder pattern and are span-based (Span length is int). Consider aligning with BrotliEncoder by taking an int input size and throwing when the computed bound would exceed int.MaxValue; otherwise callers can receive bounds they cannot use to allocate a Span-backed buffer.
| /// <exception cref="ArgumentOutOfRangeException"><paramref name="inputLength"/> is negative or exceeds <see cref="uint.MaxValue"/>.</exception> | |
| public static long GetMaxCompressedLength(long inputLength) | |
| { | |
| ArgumentOutOfRangeException.ThrowIfNegative(inputLength); | |
| ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue); | |
| // compressBound() returns the upper bound for zlib-wrapped deflate output. | |
| // For raw deflate (no header/trailer) this slightly overestimates, which is safe. | |
| return (long)Interop.ZLib.compressBound((uint)inputLength); | |
| /// <exception cref="ArgumentOutOfRangeException"> | |
| /// <paramref name="inputLength"/> is negative or the computed maximum compressed length exceeds <see cref="int.MaxValue"/>. | |
| /// </exception> | |
| public static int GetMaxCompressedLength(int inputLength) | |
| { | |
| ArgumentOutOfRangeException.ThrowIfNegative(inputLength); | |
| // compressBound() returns the upper bound for zlib-wrapped deflate output. | |
| // For raw deflate (no header/trailer) this slightly overestimates, which is safe. | |
| uint bound = Interop.ZLib.compressBound((uint)inputLength); | |
| if (bound > int.MaxValue) | |
| { | |
| throw new ArgumentOutOfRangeException(nameof(inputLength)); | |
| } | |
| return (int)bound; |
| public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten, bool isFinalBlock) { throw null; } | ||
| public void Dispose() { } | ||
| public System.Buffers.OperationStatus Flush(System.Span<byte> destination, out int bytesWritten) { throw null; } | ||
| public static long GetMaxCompressedLength(long inputLength) { throw null; } |
GetMaxCompressedLength is part of the public surface area here and currently uses a long input/return type. If the intent is to follow the existing BrotliEncoder shape, consider using an int input size and throwing when the bound would exceed int.MaxValue (matching BrotliEncoder.GetMaxCompressedLength). This keeps the API consistent with Span-based limits and avoids returning sizes that can't back a Span.
| public static long GetMaxCompressedLength(long inputLength) { throw null; } | |
| public static int GetMaxCompressedLength(int inputLength) { throw null; } |
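A minimal C sketch of the int-bounded shape proposed here, again assuming the classic zlib formula (sourceLen + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13) and computing it in 64 bits so the range check cannot itself overflow. The function name and the -1 error return are hypothetical stand-ins for the managed method and its ArgumentOutOfRangeException.

```c
#include <stdint.h>

/* Hypothetical: returns the worst-case compressed size, or -1 when
   the input is negative or the bound does not fit in a signed
   32-bit length (the largest size a span can describe). */
int32_t max_compressed_length(int32_t input_length)
{
    if (input_length < 0)
        return -1;

    uint64_t len = (uint64_t)input_length;
    uint64_t bound = len + (len >> 12) + (len >> 14) + (len >> 25) + 13;

    return bound > (uint64_t)INT32_MAX ? -1 : (int32_t)bound;
}
```

With this formula, an input of INT32_MAX bytes is rejected because its bound (2148139081) exceeds INT32_MAX, so callers never receive a size they cannot allocate as a single span.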
This PR introduces new span-based, streamless compression and decompression APIs for Deflate, ZLib, and GZip formats, matching the existing BrotliEncoder/BrotliDecoder pattern.
New APIs
- DeflateEncoder / DeflateDecoder
- ZLibEncoder / ZLibDecoder
- GZipEncoder / GZipDecoder
These classes provide:
- Compress(), Decompress(), and Flush()
- TryCompress() and TryDecompress() for simple scenarios
- GetMaxCompressedLength() to calculate buffer sizes
Closes #62113
Closes #39327
Closes #44793