[Wasm RyuJIT] Block Stores pt. 2 #124846

Merged
kg merged 29 commits into dotnet:main from kg:wasm-blockstore-2 on Mar 2, 2026

@kg kg commented Feb 25, 2026

  • Implement genCodeForCpObj
  • Fix two NIY scenarios in the stackifier
  • Fix generated stores not being lowered after RewriteLocalStackStore
  • Implement isContainableMemoryOp
  • Expand and add comments to ease development

This is sufficient to compile the following to valid WASM, though it currently crashes when attempting to call the write barrier helper:

    [StructLayout(LayoutKind.Explicit)]
    public struct S2 {
        [FieldOffset(0)]
        public string A;
        [FieldOffset(8)]
        public int b;
        [FieldOffset(16)]
        public string C;
        [FieldOffset(24)]
        public int d;
    }

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    static unsafe void copyStructWithRefs (ref S2 a, ref S2 b) {
        a = b;
    }
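As a rough illustration of why this copy needs a write barrier, the sketch below classifies each pointer-sized slot of a struct as a GC reference or plain data. `FieldInfo` and `ComputeGcSlotMap` are hypothetical names, not JIT APIs, and the struct size and pointer size are caller-supplied assumptions rather than values queried from the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one field: byte offset plus whether it is a GC reference.
struct FieldInfo
{
    std::size_t offset;
    bool        isGcRef;
};

// Returns one flag per pointer-sized slot: true if the slot holds a GC reference.
// structSize and ptrSize are assumptions supplied by the caller.
std::vector<bool> ComputeGcSlotMap(const std::vector<FieldInfo>& fields,
                                   std::size_t                   structSize,
                                   std::size_t                   ptrSize)
{
    assert(ptrSize != 0);
    std::vector<bool> slots((structSize + ptrSize - 1) / ptrSize, false);
    for (const FieldInfo& f : fields)
    {
        if (f.isGcRef)
        {
            assert(f.offset % ptrSize == 0);           // references must be slot-aligned
            assert(f.offset / ptrSize < slots.size()); // and must lie inside the struct
            slots[f.offset / ptrSize] = true;
        }
    }
    return slots;
}
```

For the field offsets of S2 above (0, 8, 16, 24), this marks slots 0 and 4 when a 4-byte pointer and a 28-byte struct are assumed, and slots 0 and 2 when an 8-byte pointer and a 32-byte struct are assumed. Those marked slots are the ones that would need a barrier when the destination is not on the stack.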

And the following just plain works, since copies to the stack don't need a write barrier:

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    static unsafe void loadStructWithRefs (ref S2 src) {
        S2 local = src;
    }

The following now works thanks to implementing isContainableMemoryOp and fixing a related bug:

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    static unsafe void fillStructFromLocal (ref S2 dest) {
        S2 local = default;
        local.b = 37;
        dest = local;
    }

Fixes #124903

@kg kg added the arch-wasm (WebAssembly architecture) label Feb 25, 2026
Copilot AI review requested due to automatic review settings February 25, 2026 05:52
@kg kg added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) label Feb 25, 2026
@dotnet-policy-service

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Reorder invariant nodes in simple scenarios in stackifier

Jitdump when moving nodes in stackifier

When regallocwasm creates a new store node, lower it
Apply regallocwasm fix from andy

Checkpoint

Checkpoint

Add comment

Speculatively implement the dstOnStack optimization (code that hits it doesn't compile yet)
@kg kg force-pushed the wasm-blockstore-2 branch from 867f769 to 4c8682d Compare February 25, 2026 05:59
Copilot AI left a comment

Pull request overview

This PR advances the WASM RyuJIT backend’s block-store support by adding cpobj codegen, addressing a stackifier NIY case, and ensuring newly generated stores are lowered post-rewrite to keep the pipeline consistent.

Changes:

  • Implement CodeGen::genCodeForCpObj for GT_STORE_BLK cpobj unrolling on WASM.
  • Extend WASM regalloc to track multi-use operands for GT_STORE_BLK and to re-lower stores created by RewriteLocalStackStore.
  • Improve lowering/stackifier behavior: mark cpobj operands as multiply-used and relax one stackifier NIY by moving invariant nodes.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Summary per file (file: description):

  • src/coreclr/jit/regallocwasm.h: Adds CollectReferencesForBlockStore declaration for block store multi-use tracking.
  • src/coreclr/jit/regallocwasm.cpp: Implements block-store reference collection and re-lowers stores created during local-store rewrite.
  • src/coreclr/jit/lowerwasm.cpp: Marks cpobj operands as MultiplyUsed and adjusts stackifier handling for invariant nodes.
  • src/coreclr/jit/gentree.cpp: Improves WASM dump text for native block-store opcodes (memory.copy vs memory.fill).
  • src/coreclr/jit/compiler.h: Adds WasmRegAlloc friend access for invoking lowering post-rewrite.
  • src/coreclr/jit/codegenwasm.cpp: Implements cpobj unrolled copying sequence (load/store vs helper call) and adds pointer-sized instruction aliases.

Comments suppressed due to low confidence (5)

src/coreclr/jit/codegenwasm.cpp:2368

  • The call to genEmitHelperCall(CORINFO_HELP_ASSIGN_BYREF, ...) is missing the required SP argument. CodeGen::genEmitHelperCall explicitly notes that for WASM helper calls, the stack-pointer argument must be first on the value stack (below any other args). Not pushing SP here would mismatch the helper signature and can explain the reported crash when calling the write barrier helper.

Suggestion: push GetStackPointerReg() (via local.get) as the first argument before pushing the destination/source byrefs, for each helper call site (or refactor to a small helper that emits the correct argument sequence).
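The ordering this suggestion describes can be sketched as a small model that builds a WAT-like instruction list. This is only an illustration under assumptions: `BuildHelperCallSequence`, the local names, and the helper symbol are all hypothetical, and the code does not use the JIT's emitter API or claim to match the real helper signature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a textual, WAT-like sequence for a helper call in which the stack
// pointer is pushed first, so it sits below every other argument on the
// value stack. All names passed in are hypothetical placeholders.
std::vector<std::string> BuildHelperCallSequence(const std::string&              spLocal,
                                                 const std::vector<std::string>& argLocals,
                                                 const std::string&              helperName)
{
    assert(!helperName.empty());
    std::vector<std::string> seq;
    seq.push_back("local.get $" + spLocal); // SP goes first
    for (const std::string& arg : argLocals)
    {
        seq.push_back("local.get $" + arg); // then the remaining arguments, in order
    }
    seq.push_back("call $" + helperName);
    return seq;
}
```

Calling it with a stack-pointer local followed by destination and source locals yields four entries: the SP push, the two argument pushes, and the call.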

        case PackOperAndType(GT_LE, TYP_DOUBLE):
            ins = INS_f64_le;
            break;
        case PackOperAndType(GT_GE, TYP_FLOAT):

src/coreclr/jit/codegenwasm.cpp:2324

  • genCodeForCpObj calls genConsumeRegs(cpObjNode), which only updates liveness for the GT_STORE_BLK node itself and does not consume/update liveness for its operands. This differs from patterns like genCodeForStoreInd, and can lead to incorrect liveness (and thus wrong code) for Addr()/Data().

Suggestion: consume the destination address and source address/value explicitly (e.g., using genConsumeAddress(dstAddr) and genConsumeRegs(...) in the correct execution order, or genConsumeOperands if appropriate) before emitting the copy sequence, and then do the usual life update for the store node.

        // So we can re-express say GT_GE (UN) as !GT_LT
        //

src/coreclr/jit/codegenwasm.cpp:2373

  • gcPtrCount is initialized to the total GC pointer slot count, but it is only decremented in the write-barrier path. When dstOnStack is true, GC pointer slots are copied via the non-WB load/store path and gcPtrCount will never reach 0, causing the final assert(gcPtrCount == 0) to fire in debug builds.

Suggestion: either decrement gcPtrCount whenever layout->IsGCPtr(i) (regardless of dstOnStack), or move the gcPtrCount accounting + assert under the !dstOnStack branch (similar to other target implementations).
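The first option in this suggestion can be modeled in isolation. The sketch below is a simplified stand-in for the per-slot loop, not the actual genCodeForCpObj code: `SlotCopy` and `PlanCpObj` are hypothetical names, and the slot map is passed in as a plain vector of flags.

```cpp
#include <cassert>
#include <vector>

// How a single pointer-sized slot would be copied in this simplified model.
enum class SlotCopy
{
    PlainLoadStore,
    WriteBarrierHelper
};

// Plans the copy for each slot and keeps the GC-pointer accounting balanced
// by decrementing for every GC slot, whether or not a barrier is used.
std::vector<SlotCopy> PlanCpObj(const std::vector<bool>& isGcPtr, bool dstOnStack)
{
    unsigned gcPtrCount = 0;
    for (bool gc : isGcPtr)
    {
        gcPtrCount += gc ? 1u : 0u;
    }

    std::vector<SlotCopy> plan;
    for (bool gc : isGcPtr)
    {
        if (gc)
        {
            gcPtrCount--; // counted regardless of dstOnStack
        }
        // Stack destinations never need a barrier; heap destinations need one per GC slot.
        plan.push_back((gc && !dstOnStack) ? SlotCopy::WriteBarrierHelper : SlotCopy::PlainLoadStore);
    }

    assert(gcPtrCount == 0); // holds for both dstOnStack values
    return plan;
}
```

If the decrement were moved inside the barrier branch only, the final assertion would fail whenever dstOnStack is true and the layout contains at least one GC slot, which is the failure the comment describes.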

    instruction ins;
    switch (PackOperAndType(op, treeNode->gtOp1->TypeGet()))
    {
        case PackOperAndType(GT_EQ, TYP_FLOAT):
            ins = INS_f32_eq;
            break;
        case PackOperAndType(GT_EQ, TYP_DOUBLE):
            ins = INS_f64_eq;
            break;
        case PackOperAndType(GT_NE, TYP_FLOAT):
            ins = INS_f32_ne;
            break;
        case PackOperAndType(GT_NE, TYP_DOUBLE):
            ins = INS_f64_ne;
            break;
        case PackOperAndType(GT_LT, TYP_FLOAT):
            ins = INS_f32_lt;
            break;
        case PackOperAndType(GT_LT, TYP_DOUBLE):
            ins = INS_f64_lt;
            break;
        case PackOperAndType(GT_LE, TYP_FLOAT):
            ins = INS_f32_le;
            break;
        case PackOperAndType(GT_LE, TYP_DOUBLE):
            ins = INS_f64_le;
            break;
        case PackOperAndType(GT_GE, TYP_FLOAT):
            ins = INS_f32_ge;
            break;
        case PackOperAndType(GT_GE, TYP_DOUBLE):
            ins = INS_f64_ge;
            break;

src/coreclr/jit/codegenwasm.cpp:2364

  • The offset computation for the write-barrier helper path hard-codes INS_i32_const/INS_i32_add, but the address locals may be i64 when TARGET_64BIT (and you already abstracted other pointer-sized ops via INS_I_*). This will produce invalid wasm or incorrect values on 64-bit.

Suggestion: use the pointer-sized const/add instructions (INS_I_const/INS_I_add) and the appropriate emitAttr for the constant (or select i32 vs i64 based on TARGET_64BIT) so the address arithmetic matches the address type.
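One way to picture the selection this suggestion asks for is a pair of compile-time helpers that pick the 32-bit or 64-bit opcode from the target's pointer width. The enum and function names below are hypothetical stand-ins and are not the JIT's `INS_I_*` definitions; only the underlying wasm opcodes (i32.const, i32.add, i64.const, i64.add) are real.

```cpp
// Hypothetical stand-ins for the four wasm opcodes involved in address arithmetic.
enum class WasmIns
{
    i32_const,
    i32_add,
    i64_const,
    i64_add
};

// Picks the constant-push opcode that matches the address width.
constexpr WasmIns PointerSizedConst(bool target64Bit)
{
    return target64Bit ? WasmIns::i64_const : WasmIns::i32_const;
}

// Picks the add opcode that matches the address width.
constexpr WasmIns PointerSizedAdd(bool target64Bit)
{
    return target64Bit ? WasmIns::i64_add : WasmIns::i32_add;
}

// The const/add pair must always agree in width, or the module fails validation.
static_assert(PointerSizedConst(false) == WasmIns::i32_const, "32-bit targets use i32.const");
static_assert(PointerSizedAdd(false) == WasmIns::i32_add, "32-bit targets use i32.add");
static_assert(PointerSizedConst(true) == WasmIns::i64_const, "64-bit targets use i64.const");
static_assert(PointerSizedAdd(true) == WasmIns::i64_add, "64-bit targets use i64.add");
```

Routing every address computation through one pair of helpers keeps the offset constant and the add in the same width as the address local, which is the mismatch the comment warns about.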

        case PackOperAndType(GT_LT, TYP_DOUBLE):
            ins = INS_f64_lt;
            break;
        case PackOperAndType(GT_LE, TYP_FLOAT):
            ins = INS_f32_le;
            break;

src/coreclr/jit/codegenwasm.cpp:2310

  • noway_assert(source->IsLocal()) / noway_assert(dstAddr->IsLocal()) will hard-fail compilation in non-DEBUG builds if either operand is not a local. Nothing in LowerBlockStore guarantees Addr() is a local, and other targets handle arbitrary address expressions here.

Suggestion: remove these noway_asserts and rely on GetMultiUseOperandReg (with MultiplyUsed marking where needed) to support non-local address expressions; if some shapes are truly unsupported for now, prefer an explicit NYI/IMPL_LIMITATION gate instead of a noway_assert in release builds.


    genTreeOps op          = treeNode->OperGet();

Copilot AI review requested due to automatic review settings February 25, 2026 15:30
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings February 25, 2026 18:36
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

kg commented Feb 25, 2026

@dotnet/jit-contrib Not ready to merge but I don't think it can reach ready without human review. Thanks to Single for walking me through a lot of the tricky parts.

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

@SingleAccretion SingleAccretion self-requested a review February 26, 2026 20:31
Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 27, 2026 16:21
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Copilot AI review requested due to automatic review settings February 27, 2026 17:44
Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

@SingleAccretion SingleAccretion left a comment

LGTM modulo SetMultiplyUsed nit.

@AndyAyersMS AndyAyersMS left a comment

LGTM overall, just a few formatting nits you can fix in a follow-up PR.

@kg kg merged commit b0bc48e into dotnet:main Mar 2, 2026
131 of 133 checks passed
kg added a commit that referenced this pull request Mar 3, 2026
SingleAccretion pushed a commit to SingleAccretion/runtime that referenced this pull request Mar 3, 2026
Development

Successfully merging this pull request may close these issues.

[Wasm RyuJIT] Local field/var stores are broken

5 participants