Add APIs to BlobBuilder for customizing the underlying byte array et al.#115294
Add APIs to BlobBuilder for customizing the underlying byte array et al.#115294teo-tsirpanis wants to merge 35 commits intodotnet:mainfrom
BlobBuilder for customizing the underlying byte array et al.#115294Conversation
|
Note regarding the |
1 similar comment
|
Note regarding the |
There was a problem hiding this comment.
Pull Request Overview
This PR adds new APIs to BlobBuilder for customizing the underlying byte array and updates related encoding and buffer-handling logic across metadata and core libraries. Key changes include replacing legacy UTF-8 encoding code with calls to the new System.Text.Unicode.Utf8 APIs, updating BlobBuilder’s API surface (including new constructors and properties), and adding NET-specific intrinsics support across several core modules.
Reviewed Changes
Copilot reviewed 31 out of 36 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| System/Reflection/Internal/Utilities/StreamExtensions.cs | Removed obsolete TryReadAll overload for Span to rely on newer API paths. |
| System/Reflection/Internal/Utilities/BlobUtilities.cs | Rewrote WriteUtf8 to use Utf8.FromUtf16 for encoding, replacing manual UTF-8 encoding logic. |
| System/Reflection/Metadata.cs | Added new BlobBuilder constructors, properties, and APIs including ReadOnlySpan/WriteBytes overloads. |
| System.Private.CoreLib (various files) | Updated intrinsics and preprocessor conditions (#if NET, #if SYSTEM_PRIVATE_CORELIB) for newer vectorized and ASCII helper routines. |
| Microsoft.Bcl.Memory (PACKAGE.md and others) | Updated documentation and type forwarding to include UTF-8 APIs for NET platforms. |
Files not reviewed (5)
- src/libraries/Microsoft.Bcl.Memory/src/Microsoft.Bcl.Memory.csproj: Language not supported
- src/libraries/Microsoft.Bcl.Memory/tests/Microsoft.Bcl.Memory.Tests.csproj: Language not supported
- src/libraries/System.Reflection.Metadata/System.Reflection.Metadata.sln: Language not supported
- src/libraries/System.Reflection.Metadata/src/Resources/Strings.resx: Language not supported
- src/libraries/System.Reflection.Metadata/src/System.Reflection.Metadata.csproj: Language not supported
...braries/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BlobUtilities.cs
Show resolved
Hide resolved
src/libraries/System.Reflection.Metadata/ref/System.Reflection.Metadata.cs
Show resolved
Hide resolved
Is this needed to introduce the new APIs? System.Text.Unicode.Utf8 change introduces a new dependency for System.Reflection.Metadata on .NET Framework that will be an extra work to push through the system. It would be better to avoid bundling the two changes together in a single PR. |
The other PR is approved and ready to merge #111292. After the merge and rebase this branch against main, those commits will disappear. |
The change that introduces System.Reflection.Metadata dependency on Microsoft.Bcl.Memory won't disappear. |
0638427 to
b3c957d
Compare
|
Switched back to the old Tests pass locally. This is ready for review. |
What''s the performance regression introduced by this rewrite on .NET Framework? Our primary interest in removing unsafe code is on latest .NET. It is fine to keep unsafe code for .NET Framework if it is required for decent performance. |
|
I updated the function to use unsafe code and wrote a benchmark to compare it with my initial safe edition. We cannot compare it with the existing unsafe implementation since the functions don't have the same signature and semantics. The numbers look promising so I switched to the
Benchmark code// See https://aka.ms/new-console-template for more information
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.InteropServices;
using System.Text;
BenchmarkRunner.Run<Utf8Bench>();
public class Utf8Bench
{
public string TestString = null!;
public byte[] TestBytes = new byte[2048];
[Params(16, 128)]
public int N { get; set; }
[GlobalSetup]
public void Setup()
{
var sb = new StringBuilder();
for (int i = 0; i < N; i++)
{
sb.Append('a');
}
for (int i = 0; i < N; i++)
{
sb.Append('Θ');
}
for (int i = 0; i < N; i++)
{
sb.Append("😂");
}
TestString = sb.ToString();
TestBytes = new byte[2048];
}
[Benchmark(Baseline = true)]
public int Safe()
{
WriteUtf8Safe(TestString.AsSpan(), TestBytes, out int charsRead, out int bytesWritten, true);
return charsRead + bytesWritten;
}
[Benchmark]
public int Unsafe()
{
WriteUtf8Unsafe(TestString.AsSpan(), TestBytes, out int charsRead, out int bytesWritten, true);
return charsRead + bytesWritten;
}
public static void WriteUtf8Safe(ReadOnlySpan<char> source, Span<byte> destination, out int charsRead, out int bytesWritten, bool allowUnpairedSurrogates)
{
// Copy from PR
}
public static unsafe void WriteUtf8Unsafe(ReadOnlySpan<char> source, Span<byte> destination, out int charsRead, out int bytesWritten, bool allowUnpairedSurrogates)
{
// Copy from PR
}
} |
f0e47b1 to
a5921ee
Compare
| /// <summary> | ||
| /// Changes the size of the byte array underpinning the <see cref="BlobBuilder"/>. | ||
| /// Derived types can override this method to control the allocation strategy. | ||
| /// </summary> | ||
| /// <param name="capacity">The array's new size.</param> | ||
| /// <seealso cref="Capacity"/> | ||
| protected virtual void SetCapacity(int capacity) | ||
| { | ||
| Array.Resize(ref _buffer, Math.Max(MinChunkSize, capacity)); | ||
| } |
There was a problem hiding this comment.
I don't understand how subclasses are supposed to override this. They will reassign the Buffer property, but according to a comment in #100418, reassigning Buffer clears the head chunk.
What are the semantics? Should Capacity's setter have additional logic?
There was a problem hiding this comment.
I came up with an alternative design that removes the need for the SetCapacity method. A buffer is resized by allocating a new BlobBuilder, swapping the buffers of this and the new BlobBuilder (while copying any data), and freeing the new BlobBuilder that now contains the old buffer.
This results in a simplified set of methods to extend, with the caveat that linking pooled and non-pooled BlobBuilders will not be supported. Applications will have to ensure that they always use pooled builders, and override OnLinking to throw if that's not the case. I can imagine this is possible to guarantee in Roslyn.
Let me know what you think.
a5921ee to
b0d2baa
Compare
Co-authored-by: Jared Parsons <jared@paranoidcoding.org>
|
I would like to see the numbers for the implementation in this PR and including walk clock time. |
|
It sounds like there's an assumption that we need to bring this new API surface to the .NET Framework polyfill. Why is that the case? |
This is a question for Roslyn (@jaredpar). These APIs are designed to optimize Roslyn. If Roslyn is fine with missing these APIs on netfx, we can certainly drop them there. |
I wrote a benchmark of .NET Framework
.NET 10
Sources are attached. I manually changed the TFM, the |
...braries/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BlobUtilities.cs
Outdated
Show resolved
Hide resolved
...braries/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BlobUtilities.cs
Outdated
Show resolved
Hide resolved
| set | ||
| { | ||
| if (value is null) | ||
| { | ||
| Throw.ArgumentNull(nameof(value)); | ||
| } | ||
|
|
||
| if (!IsHead) | ||
| { | ||
| Throw.InvalidOperationBuilderAlreadyLinked(); | ||
| } | ||
|
|
||
| _buffer = value; | ||
| _length = 0; | ||
| } |
There was a problem hiding this comment.
The Buffer property setter doesn't validate that the new buffer is at least MinChunkSize (16 bytes) long. This validation is performed in the constructor that accepts a buffer parameter, but not in the property setter. If a derived type sets Buffer to an array smaller than 16 bytes, subsequent operations may fail or cause undefined behavior.
There was a problem hiding this comment.
Added comment on why this is by design.
src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs
Outdated
Show resolved
Hide resolved
...s/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/MetadataBuilder.Heaps.cs
Show resolved
Hide resolved
src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs
Show resolved
Hide resolved
… in place or expanding it. This makes the `BlobBuilder` API simpler to extend, at the cost of requiring `OnLinking` to check that blob builders with differing buffer management strategies cannot be mixed.
src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs
Outdated
Show resolved
Hide resolved
| private void ResizeChunk(int newLength) | ||
| { | ||
| BlobBuilder newChunk = AllocateChunkHelper(newLength); | ||
| OnLinking(this, newChunk); |
There was a problem hiding this comment.
ResizeChunk calls OnLinking(this, newChunk) even though the operation is just a buffer resize/reallocation (not truly linking two builders). The newChunk returned from AllocateChunk is never added to the chain—its buffer is stolen and it's immediately freed. This will trigger the OnLinking override (intended for detecting linking between different BlobBuilder types) on every capacity resize or Capacity set, which is semantically incorrect. Derived types that override OnLinking to enforce pool consistency will get spurious notifications. Consider separating the "allocate a new buffer" concern from the "link two builders" notification.
| OnLinking(this, newChunk); |
There was a problem hiding this comment.
We are calling OnLinking out of abundance of caution, to enforce pool consistency between the newly allocated chunk, because we are going to swap their buffers. Added comment explaining it.
Fixes #99244
Fixes #100418
This PR builds on top of @jaredpar's branch to add APIs for customizing the underlying buffer of a
BlobBuilder. The chunking logic ofBlobBuilderwas updated to allocate multiple additional chunks with a user-customizable maximum size each. As part of this, we use APIs fromSystem.Text.Unicode.Utf8to encode UTF-8 strings, which increases performance and safety, and reduces duplicate code.