Skip to content

Fix #74020: Optimize consecutive shifts in JIT Morph#122533

Closed
csa7mdm wants to merge 3 commits intodotnet:mainfrom
csa7mdm:fix/74020-jit-shift-opt
Closed

Fix #74020: Optimize consecutive shifts in JIT Morph#122533
csa7mdm wants to merge 3 commits intodotnet:mainfrom
csa7mdm:fix/74020-jit-shift-opt

Conversation

@csa7mdm
Copy link

@csa7mdm csa7mdm commented Dec 14, 2025

Description

This PR addresses issue #74020 by adding a peephole optimization in the JIT Morph phase (fgMorphSmpOpOptional) to combine consecutive right shift operations (GT_RSH/GT_RSZ) with constant shift amounts.

Per reviewer feedback from @tannergooding and @JulieLeeMSFT, the optimization was moved from Lowering to Morph so downstream passes (VN, CSE, LSRA) benefit from the simplified IR.

Changes

src/coreclr/jit/morph.cpp - Added case GT_RSH / case GT_RSZ to fgMorphSmpOpOptional:

  • Fold (x shift c1) shift c2 into x shift (c1+c2) for same-operation shifts
  • Handle RSH overshift with saturation to bitWidth - 1
  • Handle RSZ overshift by folding to zero constant
  • Guarded by fgGlobalMorph

Verification

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 14, 2025
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 14, 2025
@csa7mdm csa7mdm force-pushed the fix/74020-jit-shift-opt branch 2 times, most recently from 0b10860 to cca8115 Compare December 15, 2025 12:43
@vcsjones vcsjones added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 17, 2025
Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I do still think this would be better to handle in morph and that it would be good to track RSH(x, n) for 0 < n < bitWidth as being "never negative", but will defer to other input.

CC. @dotnet/jit-contrib for secondary review

@JulieLeeMSFT
Copy link
Member

LGTM. I do still think this would be better to handle in morph and that it would be good to track RSH(x, n) for 0 < n < bitWidth as being "never negative", but will defer to other input.

@csa7mdm, I also think it would be better to handle it in morph. Do you want to try that?

@JulieLeeMSFT JulieLeeMSFT added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 26, 2026
@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 26, 2026
@csa7mdm
Copy link
Author

csa7mdm commented Jan 27, 2026

n workflow(s) awaiting approva

@csa7mdm
Copy link
Author

csa7mdm commented Jan 27, 2026

n workflow(s) awaiting approval

@JulieLeeMSFT
Copy link
Member

Approved workflow to run.

@adamperlin
Copy link
Contributor

@csa7mdm it looks like the latest CI run failed if you want to take a look?

@dotnet-policy-service dotnet-policy-service bot added the linkable-framework Issues associated with delivering a linker friendly framework label Feb 24, 2026
@csa7mdm csa7mdm force-pushed the fix/74020-jit-shift-opt branch from b9cbf68 to fc342c4 Compare February 24, 2026 00:59
@csa7mdm csa7mdm changed the title Fix #74020: Optimize consecutive shifts in JIT Lowering Fix #74020: Optimize consecutive shifts in JIT Morph Feb 24, 2026
@csa7mdm
Copy link
Author

csa7mdm commented Feb 25, 2026

The two CI failures are unrelated to this change and are a known flaky infrastructure issue:

  • runtime (Build android-x64 Release AllSubsets_CoreCLR_Smoke) failed on System.Net.Http.Functional.Tests (Android emulator network certificate issue)
  • runtime (top-level) failed because it aggregates the above

The Build Analysis check (merge gate) passed with success, and all 57 JIT/CoreCLR-specific checks (superpmi-diffs, hardware-intrinsics, CoreCLR builds across all platforms) passed. This is flagged as a Known test error by the CI system.

@EgorBo
Copy link
Member

EgorBo commented Feb 25, 2026

Also, the PR seems to have 0 diffs
so it needs a good test coverage (because it's currently basically a dead code for CI) and, probably, some justification on where such patterns may appear.

Add peephole optimization in fgMorphSmpOpOptional to fold consecutive
same-operation right shifts with constant amounts:
  (x shift c1) shift c2 -> x shift (c1 + c2)

Handles overshift: RSH saturates to bitWidth-1, RSZ folds to zero.
Guarded by fgGlobalMorph, side-effect checks, and multi-reg checks.

Includes comprehensive tests in ShiftCombining.cs covering RSH/RSZ
for int32/int64, overshift saturation, boundary cases, and the
original division pattern from issue dotnet#74020.
@csa7mdm csa7mdm force-pushed the fix/74020-jit-shift-opt branch from 4d4423c to b90ca38 Compare February 25, 2026 23:37
@csa7mdm
Copy link
Author

csa7mdm commented Feb 25, 2026

Thanks @EgorBo for the review!

Addressed both concerns:

  1. Side-effects: Added !(op1->gtFlags & (GTF_ALL_EFFECT | GTF_SET_FLAGS)) and !op1->IsMultiRegNode() guards. The folding now bails out entirely if the inner shift subtree has any side effects, so op1.op1 side effects are preserved.

  2. Test coverage: Added ShiftCombining.cs in src/tests/JIT/opt/InstructionCombining/ with comprehensive tests covering:

Justification: The pattern appears when C# code does repeated integer division by powers of two (x / c1 / c2), which the JIT lowers to consecutive right shifts. Without this optimization, each division produces a separate shr instruction.

@csa7mdm
Copy link
Author

csa7mdm commented Feb 26, 2026

Thanks @EgorBo for the review!

Addressed both concerns:

  1. Side-effects: Added !(op1->gtFlags & (GTF_ALL_EFFECT | GTF_SET_FLAGS)) and !op1->IsMultiRegNode() guards. The folding now bails out if the inner shift subtree has any side effects.

  2. Test coverage: Added ShiftCombining.cs in src/tests/JIT/opt/InstructionCombining/ with tests covering RSH/RSZ combining (int32/int64), overshift saturation/zero, boundary cases, and the original x / 4 / 8 division pattern from Unnecessary shr emitted for repeated multiplication #74020.

Justification: The pattern appears when C# code does repeated integer division by powers of two (x / c1 / c2), which lowers to consecutive right shifts.

@csa7mdm
Copy link
Author

csa7mdm commented Mar 2, 2026

Thanks @EgorBo! Based on your detailed review and looking at this from a broader architectural perspective, I've implemented the following refactoring:

1. Stripped Overshift Path

Upon deeper root cause analysis, I agree with your assessment - handling over-shifts (shifting beyond bitWidth) in this specific peephole optimization is fragile. Since the result of an over-shift depends heavily on sign characteristics (0 vs AllBitsSet) and bit boundaries, it's safer and structurally cleaner to let other dedicated constant-folding passes or cns 0 simplifications tackle those edge cases.

2. Side Effects Handling

Given that we are removing the over-shift transformation (which previously would aggressively eliminate the inner shift's operand entirely and drop its side-effects), we no longer need the complex gtWrapWithSideEffects mitigation or conservative GTF flag checks. Instead, we safely reparent op1->gtGetOp1() directly into the outer shift, which structurally preserves any side-effects within the tree naturally.

3. JIT API Correctness & Consistency

Addressed all specific architectural nits to adhere strictly to JIT engineering guidelines:

  • Replaced raw assignments with safe API usage (BashToConst(combined)).
  • Added fgUpdateConstTreeValueNumber(op2) to maintain value-numbering consistency.
  • Purged raw field accesses (AsOp()->gtOp2) in favor of encapsulation (gtGetOp1()/gtGetOp2()).
  • Removed extraneous diagnostic logging overhead (OpName from JITDUMP).
  • Removed redundant fgGlobalMorph condition.

4. Semantic Testing Verification

Rewrote the test coverage. Using [MethodImpl(MethodImplOptions.NoInlining)] directly on the assertion blocks effectively hid the shift tree from the JIT, because the inner operands were constant-folded prior to Morph execution. The tests now pass unknown arguments down into [NoInlining] shift-operation methods (x >> c1 >> c2), strictly verifying runtime IL transformation rather than pre-compilation constant folding.

@csa7mdm csa7mdm closed this Mar 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member linkable-framework Issues associated with delivering a linker friendly framework

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants