Fix compute->host visibility in HybridForwardPass MoE path (#2) by pekkah · Pull Request #12 · pekkah/SharpInference

pekkah · 2026-04-29T11:00:52Z

Summary

Partial fix for issue #2 (Hybrid GPU+CPU MoE path producing NaN/garbled output). The
documented repro (-g 1 --tq on Qwen3-Coder 30B-A3B) now decodes coherent text at
~14.7 t/s instead of 0 NaN tokens.

Root cause (Bug 1, fixed)

HybridForwardPass.GpuMoeFfn writes the post-RmsNorm hidden state to a host-coherent
BAR buffer (_gpuPinnedNorm) via RecordComputeCopy, then submits and waits on a
fence so the CPU expert-fallback path can read it via MapPinned. On RTX 4070 Ti,
fence completion alone did not make the compute-shader writes visible to host reads
— MapPinned returned stale data, the CPU fallback consumed bogus normPtr values,
the resulting _cpuFallbackBuf was ~1000x out of range, and the residual carried the
corruption through every later layer as the garbled tokens reported in the issue.

Adding an explicit compute -> host pipeline barrier (SHADER_WRITE -> HOST_READ on
COMPUTE_SHADER -> HOST stages) immediately before EndRecordAndSubmit fixes it. The
new helper is VulkanBackend.RecordComputeToHostBarrier().

Residual issue (Bug 2, still open under #2)

Beyond ~-g 9 GPU layers on Qwen3-Coder 30B-A3B, the prefetcher-cached GPU expert
path still produces wrong output even with the host barrier. Disabling the prefetcher
entirely (forcing all experts through CPU fallback) restores correctness across the
full -g 1..-1 range, which points at the GPU-expert MatMul reading prefetched
weights — most likely descriptor-set reuse across multiple recorded dispatches in
ComputePipeline._reusableDs. That fix is a deeper rework and intentionally out of
scope here.

The CLI guard from PR #5 therefore stays in place by default. Set
SHARPI_ALLOW_BROKEN_MOE_HYBRID=1 to bypass the guard for further work on Bug 2.

Test plan

Original issue repro: SHARPI_ALLOW_BROKEN_MOE_HYBRID=1 ... -g 1 --tq -p \"Hello\"
produces coherent decode (was: 0 NaN tokens)
Sweep -g 1..9 with prefetcher-on: all produce coherent output
Sweep -g 1..-1 with prefetcher disabled (debug only): all produce coherent output
Default behavior unchanged: guard still refuses MoE on hybrid path with the
issue Hybrid GPU+CPU path broken for MoE models (GpuMoeFfn) #2 message
Full test suite green (excluding 2 pre-existing Vulkan failures unrelated to MoE)

🤖 Generated with Claude Code

`GpuMoeFfn` writes the post-RmsNorm hidden state to a host-coherent BAR buffer (`_gpuPinnedNorm`) so the CPU expert-fallback path can read it via `MapPinned` after the mid-FFN submit. Fence completion alone did not make the compute-shader writes visible to host reads on RTX 4070 Ti — MapPinned returned stale data and the CPU fallback produced wildly out-of-range output (~1062 magnitudes), which propagated through residuals as the garbled MoE tokens reported in issue #2. Add `VulkanBackend.RecordComputeToHostBarrier` (a SHADER_WRITE -> HOST_READ pipeline barrier on COMPUTE_SHADER -> HOST stages) and call it in `GpuMoeFfn` immediately before `EndRecordAndSubmit`. With this barrier the issue's exact repro (`-g 1 --tq` on Qwen3-Coder 30B-A3B) now decodes coherent text at ~14.7 t/s instead of 0 NaN tokens. Add `SHARPI_ALLOW_BROKEN_MOE_HYBRID=1` CLI escape hatch so the not-yet-fixed prefetcher path can still be exercised for further investigation; the existing guard remains in place because the prefetcher GPU-expert path still corrupts output beyond ~-g 9 (likely descriptor-set reuse in `ComputePipeline._reusableDs` across multiple recorded dispatches; out of scope for this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pekkah and others added 2 commits April 29, 2026 14:00

README: note partial fix and SHARPI_ALLOW_BROKEN_MOE_HYBRID for issue #2

c5d9c6a

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pekkah merged commit ea9ff3f into master Apr 29, 2026
1 check passed

pekkah deleted the fix/issue-2-moe-hybrid-host-barrier branch April 29, 2026 11:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix compute->host visibility in HybridForwardPass MoE path (#2)#12

Fix compute->host visibility in HybridForwardPass MoE path (#2)#12
pekkah merged 2 commits into
masterfrom
fix/issue-2-moe-hybrid-host-barrier

pekkah commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pekkah commented Apr 29, 2026

Summary

Root cause (Bug 1, fixed)

Residual issue (Bug 2, still open under #2)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant