Partition some pools by core#40476
Conversation
|
Should probably measure the two different pool changes independently and then together. |
|
@BrennanConroy The Sockets change is only measurable when the memory block one is fixed, otherwise it's hidden. I can run one benchmark with memory / memory + socket at least to show it matters, but will do add the extended benchmarks only using the two changes, otherwise it's too many combinations. |
I see, makes sense 👍 |
|
How does the benchmark results look like? |
|
@sebastienros - should we have the YARP benchmarks running on this hardware? |
adamsitnik
left a comment
There was a problem hiding this comment.
I would love the see the JSON Platform results using these changes (it's the best benchmark for testing max throughput)
|
|
||
| public PinnedBlockMemoryPool() | ||
| { | ||
| _queues = new ConcurrentQueue<MemoryPoolBlock>[Environment.ProcessorCount]; |
There was a problem hiding this comment.
similarly, to https://github.com/dotnet/runtime/blob/07e87bc4cb0358f57e7116e047f7d2017b049cf9/src/libraries/System.Private.CoreLib/src/System/Buffers/TlsOverPerCoreLockedStacksArrayPool.cs#L301 I think it would be good to have some upper limit
|
|
||
| var partition = Thread.GetCurrentProcessorId() % _queues.Length; | ||
|
|
||
| if (_queues[partition].TryDequeue(out var block)) |
There was a problem hiding this comment.
What if Thread.GetCurrentProcessorId() changes between Rent and Return operations and the pool gets starved?
should it stop on first failure or try to dequeue from another partition?
There was a problem hiding this comment.
It likely would be better to assign a pooled item with a partition and return it to the same partition it was taken from. I had seen this issue before and recently again in testing with thread pool changes along with this change, where there is still a fair bit of allocation even with this change. The issue is that currently, some threads rent buffers and in many cases other threads return them, so the threads renting buffers will eventually run out and will have to allocate. Returning the buffers (or pooled items similarly for SocketSenderPool) to the same queue that it came from would maintain some balance between the caches, based on what I've seen it solves the allocation issues. The issue I saw with this change may not exist or be visible currently, but it's looming and I don't think there's much reason to not fix it.
| private const int MaxQueueSize = 1024; // REVIEW: Is this good enough? | ||
|
|
||
| private readonly ConcurrentQueue<SocketSender> _queue = new(); | ||
| private readonly ConcurrentQueue<SocketSender>[] _queues; |
There was a problem hiding this comment.
would it be possible to just disable the SocketSenders pooling for machines with large amount of cpu cores?
|
@sebastienros, any update on this PR? Would like to have this for .NET 7 for further experiments. |
|
Will come back to it very soon, as soon as I am done with output caching (#41037) |
|
Replaced by #42237 |
Benchmarks to follow