ARM64-SVE: Implement IF_SVE_DW_2A, IF_SVE_DW_2B, IF_SVE_EB_1B by amanasifkhalid · Pull Request #97800 · dotnet/runtime

amanasifkhalid · 2024-02-01T02:57:01Z

Part of #94549. Note that the preferred disassembly for FMOV <Zd>.<T>, #0.0 is MOV <Zd>.<T>, 0, per the SVE docs: FMOV is a pseudo-instruction of DUP, which is aliased by MOV.

cstool output:

00C03825  mov   z0.b, #0
01C07825  mov   z1.h, #0
02C0B825  mov   z2.s, #0
03C0F825  mov   z3.d, #0
10702025  pext  p0.b, pn8[0]
31716025  pext  p1.h, pn9[1]
5272A025  pext  p2.s, pn0xA[2]
7373E025  pext  p3.d, pn0xB[3]
98742025  pext  { p8.b, p9.b }, pn0xC[0]
B9756025  pext  { p9.h, p10.h }, pn0xD[1]
DA74A025  pext  { p10.s, p11.s }, pn0xE[0]
FB75E025  pext  { p11.d, p12.d }, pn0xF[1]

JitDisasm output:

mov     z0.b, #0
mov     z1.h, #0
mov     z2.s, #0
mov     z3.d, #0
pext    p0.b, pn8[0]
pext    p1.h, pn9[1]
pext    p2.s, pn10[2]
pext    p3.d, pn11[3]
pext    { p8.b, p9.b }, pn12[0]
pext    { p9.h, p10.h }, pn13[1]
pext    { p10.s, p11.s }, pn14[0]
pext    { p11.d, p12.d }, pn15[1]

cc @dotnet/arm64-contrib

ghost · 2024-02-01T02:57:13Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Part of #94549. Note that the preferred disassembly for FMOV <Zd>.<T>, #0.0 is MOV <Zd>.<T>, 0, per the SVE docs: FMOV is a pseudo-instruction of DUP, which is aliased by MOV.

cstool output:

00C03825  mov   z0.b, #0
01C07825  mov   z1.h, #0
02C0B825  mov   z2.s, #0
03C0F825  mov   z3.d, #0
10702025  pext  p0.b, pn8[0]
31716025  pext  p1.h, pn9[1]
5272A025  pext  p2.s, pn0xA[2]
7373E025  pext  p3.d, pn0xB[3]
98742025  pext  { p8.b, p9.b }, pn0xC[0]
B9756025  pext  { p9.h, p10.h }, pn0xD[1]
DA74A025  pext  { p10.s, p11.s }, pn0xE[0]
FB75E025  pext  { p11.d, p12.d }, pn0xF[1]

JitDisasm output:

mov     z0.b, #0
mov     z1.h, #0
mov     z2.s, #0
mov     z3.d, #0
pext    p0.b, pn8[0]
pext    p1.h, pn9[1]
pext    p2.s, pn10[2]
pext    p3.d, pn11[3]
pext    { p8.b, p9.b }, pn12[0]
pext    { p9.h, p10.h }, pn13[1]
pext    { p10.s, p11.s }, pn14[0]
pext    { p11.d, p12.d }, pn15[1]

cc @dotnet/arm64-contrib

Author:	amanasifkhalid
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

ryujit-bot · 2024-02-01T05:49:44Z

Diff results for #97800

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.00% to +0.01%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	+0.01%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.01%
libraries_tests.run.linux.arm64.Release.mch	+0.01%

MinOpts (+0.00% to +0.04%)

Collection	PDIFF
coreclr_tests.run.linux.arm64.checked.mch	+0.02%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.02%
libraries_tests.run.linux.arm64.Release.mch	+0.02%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.02%
libraries.pmi.linux.arm64.checked.mch	+0.01%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.02%
benchmarks.run.linux.arm64.checked.mch	+0.02%
realworld.run.linux.arm64.checked.mch	+0.04%

Details here

a74nh

LGTM

a74nh · 2024-02-01T14:54:53Z

src/coreclr/jit/emitarm64.cpp

        }

+        case IF_SVE_EB_1B: // ........xx...... ...........ddddd -- SVE broadcast integer immediate (unpredicated)
+            // ins is MOV for this encoding, as it is the preferred disassembly, so pass FMOV to emitInsCodeSve


This is annoying, and slightly different to aliasing in other instructions.

Alternatively, keep it as FMOV in emitIns_R() and then covert to MOV in emitDispInsHelp().

I'm not sure either way is better. Happy to keep as is in here.

Alternatively, keep it as FMOV in emitIns_R() and then covert to MOV in emitDispInsHelp().

I'd like to keep the instruction as FMOV in emitIns_R, but in emitDispInsHelp, in order to print it as a MOV, we would have to add a check for this case at the beginning of the method so we don't print the instruction before modifying it. My current approach is hacky, but it allows us to contain this behavior to the relevant switch case.

This is fine to me. Comment describes why so it's all good.

amanasifkhalid · 2024-02-01T16:54:15Z

@kunalspathak this change does have some TP impact. I suspect this is from me adding the insScalableOpts parameter to emitIns_R_R_I. Just pointing this out if we're looking for reasons to split out the SVE instructions into separate emitIns methods.

ryujit-bot · 2024-02-01T19:51:00Z

Diff results for #97800

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	+0.01%

Details here

ryujit-bot · 2024-02-01T20:51:09Z

Diff results for #97800

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.00% to +0.01%)

Collection	PDIFF
libraries_tests.run.linux.arm64.Release.mch	+0.01%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%

MinOpts (+0.00% to +0.04%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.02%
libraries_tests.run.linux.arm64.Release.mch	+0.02%
realworld.run.linux.arm64.checked.mch	+0.04%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.02%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.02%
libraries.pmi.linux.arm64.checked.mch	+0.01%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.02%
coreclr_tests.run.linux.arm64.checked.mch	+0.02%

Details here

TIHan

LGTM.

I wouldn't worry too much about the TP regressions as I think I solved a chunk of it in #97739 by separating the SVE format cases in emitOutputInstr into their own function.

We could also separate out the SVE instruction cases in the emitIns functions into their own emitInsSve which could further help mitigate any TP regressions, but we don't have to do that now.

amanasifkhalid · 2024-02-02T21:17:13Z

Thanks for the reviews!

I wouldn't worry too much about the TP regressions as I think I solved a chunk of it in #97739 by separating the SVE format cases in emitOutputInstr into their own function.

I didn't realize we'd accumulated that much TP overhead from the SVE switch cases; >1% TP improvement is a lot...

We could also separate out the SVE instruction cases in the emitIns functions into their own emitInsSve which could further help mitigate any TP regressions, but we don't have to do that now.

emitOutputInstr is probably the worst offender, but I'd be curious to see how much more we can claw back by separating out the emitIns methods.

TIHan · 2024-02-02T21:29:49Z

I didn't realize we'd accumulated that much TP overhead from the SVE switch cases; >1% TP improvement is a lot...

The 1% TP improvement may not be accurate because it's running the ARM64 path on windows/x64 and it still counts the instructions generated for x64 too.

The TP regressions to really look at is the linux/arm64 running on linux/arm64, which improved modestly.

I'd be curious to see how much more we can claw back by separating out the emitIns methods.

I'd be curious as well. I already did it to emitIns_R_R_R_R in the PR, but it didn't change the TP, probably because emitIns_R_R_R_R is not common.

amanasifkhalid added 2 commits January 31, 2024 21:36

Implement IF_SVE_DW_2A, IF_SVE_DW_2B, IF_SVE_EB_1B

8859500

Fix comment

042b0cb

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 1, 2024

ghost assigned amanasifkhalid Feb 1, 2024

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 1, 2024

build-analysis bot mentioned this pull request Feb 1, 2024

System.Net.Security.Tests.SslStreamCertificateContextOcspLinuxTests.RefreshOcspResponse_BeforeExpiration test failure #97779

Closed

a74nh approved these changes Feb 1, 2024

View reviewed changes

Fix comment in emitarm64.h

0793edd

build-analysis bot mentioned this pull request Feb 1, 2024

System.Net.Security.Tests.SslStreamCertificateContextOcspLinuxTests.FetchOcspResponse_FirstInvalidThenValid test failure #97836

Closed

amanasifkhalid requested a review from kunalspathak February 1, 2024 21:30

TIHan approved these changes Feb 2, 2024

View reviewed changes

amanasifkhalid merged commit f568f42 into dotnet:main Feb 2, 2024

amanasifkhalid deleted the sve-pext branch February 2, 2024 21:42

amanasifkhalid mentioned this pull request Feb 2, 2024

Arm64: Implement SVE encodings #94549

Closed

github-actions bot locked and limited conversation to collaborators Mar 4, 2024

Conversation

amanasifkhalid commented Feb 1, 2024

Uh oh!

ghost commented Feb 1, 2024

Uh oh!

ryujit-bot commented Feb 1, 2024

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

Uh oh!

a74nh left a comment

Choose a reason for hiding this comment

Uh oh!

a74nh Feb 1, 2024

Choose a reason for hiding this comment

Uh oh!

amanasifkhalid Feb 1, 2024

Choose a reason for hiding this comment

Uh oh!

TIHan Feb 2, 2024

Choose a reason for hiding this comment

Uh oh!

amanasifkhalid commented Feb 1, 2024

Uh oh!

ryujit-bot commented Feb 1, 2024

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Uh oh!

ryujit-bot commented Feb 1, 2024

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

Uh oh!

TIHan left a comment

Choose a reason for hiding this comment

Uh oh!

amanasifkhalid commented Feb 2, 2024

Uh oh!

TIHan commented Feb 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

TIHan commented Feb 2, 2024 •

edited

Loading