JIT: Check for potential store-to-load forwarding before reordering ldr -> ldp #105695
…dr -> ldp Very targeted fix for dotnet#93401 and dotnet#101437: before reordering two indirections, check if there is a potential store in the same loop that looks like it could end up being a candidate for store-to-load forwarding into one of those indirections. Some hardware does not handle store-to-load forwarding with the same fidelity when `stp`/`ldp` is involved compared to multiple `str`/`ldr`. If we detect the situation then avoid doing the reordering.
@EgorBot -arm64 --disasm --envvars "DOTNET_JitDisasm:List"

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

namespace System.Collections
{
    public class AddGivenSize
    {
        private int[] _uniqueValues;

        public int Size = 512;

        [GlobalSetup]
        public void Setup() => _uniqueValues = Enumerable.Range(0, Size).ToArray();

        // Appends every element to a pre-sized List<int>; the loop body is List<T>.Add.
        [Benchmark]
        public List<int> List()
        {
            var collection = new List<int>(Size);
            var uniqueValues = _uniqueValues;
            for (int i = 0; i < uniqueValues.Length; i++)
                collection.Add(uniqueValues[i]);
            return collection;
        }
    }
}
Benchmark results on Arm64
cc @dotnet/jit-contrib PTAL @EgorBo @AndyAyersMS Diffs. Regressions as one would expect. I had feared the throughput (TP) regressions would be bad, but apparently they are not. I wonder if a similar implementation could be used to allow if-conversion in loops, or if we should perhaps try for some more general loop-carried dependency analysis.
AndyAyersMS
left a comment
I suspect this only really matters for small loops where the loads are on the critical path, so a very small budget might suffice.
    }
};

pushPreds(m_block);
Lower runs in linear block order; is there any concern that we might not have lowered some of these preds yet?
I presume it's not important, since it's just a heuristic and STORINDs for primitives generally look the same before and after lowering (except for the addressing modes).
Yeah, I don't think there is a concern there. `gtPeelOffsets`, which this uses, works on both the pre- and post-lowering forms of address modes.
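For readers unfamiliar with offset peeling, here is a rough sketch of the idea: strip constant offsets from an address until a base remains, then compare base and byte range to decide whether a store looks like it could feed a load. The `Addr` type and the `PeelOffsets` and `MayOverlap` functions are hypothetical stand-ins invented for this illustration. They are not the JIT's `GenTree` or the real `gtPeelOffsets` signature, and the check is a heuristic, not a sound alias analysis.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature address expression; NOT the JIT's GenTree.
struct Addr
{
    enum Kind { Base, AddConst } kind;
    int         baseId;   // meaningful when kind == Base: identifies a local/register
    const Addr* inner;    // meaningful when kind == AddConst: the wrapped address
    int64_t     constant; // meaningful when kind == AddConst: the added offset
};

// Strip nested "+ constant" layers and accumulate the total offset.
// Returns the remaining base expression.
const Addr* PeelOffsets(const Addr* addr, int64_t* offset)
{
    assert(addr != nullptr && offset != nullptr);
    *offset = 0;
    while (addr->kind == Addr::AddConst)
    {
        *offset += addr->constant;
        addr = addr->inner;
    }
    return addr;
}

// Heuristic: the store "looks like" it can feed the load when both peel to the
// same base and their byte ranges [offs, offs + size) intersect.
bool MayOverlap(const Addr* storeAddr, int storeSize, const Addr* loadAddr, int loadSize)
{
    int64_t     storeOffs = 0;
    int64_t     loadOffs  = 0;
    const Addr* storeBase = PeelOffsets(storeAddr, &storeOffs);
    const Addr* loadBase  = PeelOffsets(loadAddr, &loadOffs);

    if (storeBase->baseId != loadBase->baseId)
    {
        return false; // different bases: treated as unrelated in this sketch
    }
    return (storeOffs < loadOffs + loadSize) && (loadOffs < storeOffs + storeSize);
}
```

Because only the peeled base and the summed offset are compared, it does not matter whether an offset is written as one addition or several nested ones, which is the property the reply above relies on.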
Improvements from this PR:
Very targeted fix for #93401 and #101437: before reordering two indirections, check if there is a potential store in the same loop that looks like it could end up being a candidate for store-to-load forwarding into one of those indirections. Some hardware does not handle store-to-load forwarding with the same fidelity when `stp`/`ldp` is involved compared to multiple `str`/`ldr`.
The detection is done by a graph walk that starts at the indirection and then walks backwards until it finds a store that would reach the indirection. The walk is limited to stay within the same loop as the indirections, and also limited by a budget of 100 nodes visited (for a large loop we expect this to not be as important).
If we detect the situation then avoid doing the reordering.
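The walk described above can be sketched roughly as follows. This is a simplified illustration, not the PR's code. The `Block` type, the `WalkResult` enum and `FindReachingStore` are hypothetical names. The sketch works at block granularity and counts visited blocks against the budget, whereas the real walk looks at individual stores. The `pushPreds` helper borrows its name from the diff fragment in the review thread, but its body here is an assumption.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for a flow-graph block; NOT the JIT's BasicBlock.
struct Block
{
    int                 id;
    bool                inLoop;              // part of the loop that contains the loads?
    bool                hasOverlappingStore; // contains a store that may overlap the loads?
    std::vector<Block*> preds;               // predecessor blocks
};

enum class WalkResult
{
    NoStoreFound,
    StoreFound,
    BudgetExceeded
};

// Walk backwards from `start` over predecessors, staying inside the loop, until a
// block with a potentially overlapping store is found or the budget is used up.
WalkResult FindReachingStore(Block* start, int budget)
{
    assert(start != nullptr && budget > 0);

    std::vector<Block*>        worklist;
    std::unordered_set<Block*> visited;

    // Queue unvisited predecessors that belong to the loop. `start` is deliberately
    // not pre-marked as visited, so it can be reached again through the back edge.
    auto pushPreds = [&](Block* block) {
        for (Block* pred : block->preds)
        {
            if (pred->inLoop && visited.insert(pred).second)
            {
                worklist.push_back(pred);
            }
        }
    };

    pushPreds(start);

    while (!worklist.empty())
    {
        if (budget-- <= 0)
        {
            return WalkResult::BudgetExceeded;
        }

        Block* block = worklist.back();
        worklist.pop_back();

        if (block->hasOverlappingStore)
        {
            return WalkResult::StoreFound;
        }

        pushPreds(block);
    }

    return WalkResult::NoStoreFound;
}
```

A caller would skip the `ldr` -> `ldp` reordering on `StoreFound`. The sketch reports an exhausted budget as a separate result and leaves the choice to the caller, since the description only says that large loops are expected to matter less.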
Not so happy with the complexity. I would probably rather disable this transformation entirely, but this is much more surgical.
Fix #93401
Fix #101437