win: fix cuda build by dhiltgen · Pull Request #3204 · ml-explore/mlx

dhiltgen · 2026-03-04T19:18:41Z

Adjust recent CUDA changes to build on Windows.

Proposed changes

HEAD on main currently fails to build on Windows with CUDA enabled. This gets the build working.

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

mlx/backend/cuda/quantized/qmm/qmm.cpp

zcbenz

Thanks!

dhiltgen · 2026-03-07T22:40:06Z

While testing more deeply, I noticed sporadic JIT crashes on Windows CUDA and traced them to missing device initialization, which I've fixed in a separate commit. Let me know if you want that split out into a different PR.
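For readers following the threading fix: in the CUDA runtime the current device is per-host-thread state, so a thread that has never selected a device can hit problems when it reaches driver-level work such as JIT compilation. The sketch below illustrates one general pattern for guarding against that, a `thread_local` cache that selects the device the first time each OS thread enters eval. It is not taken from the PR's diff. `fake_set_device`, `ensure_device_for_this_thread`, `eval_on_any_thread`, and `run_demo` are hypothetical names, and `fake_set_device` is a stand-in for `cudaSetDevice` so the example runs without a GPU.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for cudaSetDevice() so the sketch runs without a GPU.
// It only counts how many times a thread binds itself to a device.
std::atomic<int> g_set_device_calls{0};

void fake_set_device(int device) {
  (void)device;
  g_set_device_calls.fetch_add(1);
}

// Make sure the calling OS thread has selected `device` before any
// driver-level work (such as JIT compilation) runs on it. The thread_local
// cache avoids repeating the call on every eval from the same thread.
void ensure_device_for_this_thread(int device) {
  thread_local int current_device = -1;  // -1: no device selected yet
  if (current_device != device) {
    fake_set_device(device);
    current_device = device;
  }
}

// Hypothetical eval entry point; a client may call it from any OS thread.
void eval_on_any_thread(int device) {
  ensure_device_for_this_thread(device);
  // ... JIT-compile and launch kernels here ...
}

// Runs `evals_per_thread` evals on each of `num_threads` fresh OS threads and
// returns how many times a device had to be selected.
int run_demo(int num_threads, int evals_per_thread) {
  g_set_device_calls = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([evals_per_thread] {
      for (int i = 0; i < evals_per_thread; ++i) {
        eval_on_any_thread(0);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return g_set_device_calls.load();
}
```

Calling `run_demo(4, 3)` selects the device once per thread rather than once per eval, which is the property a caller such as a Go client that migrates across OS threads would rely on.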

zcbenz · 2026-03-08T02:30:20Z

The new change looks good to me, thanks for fixing it! Still need to wait for the CI environment to get fixed before I can merge.

dhiltgen · 2026-03-08T03:49:26Z

I fixed another performance issue. I can move it to another PR as well if needed. It looks like cuDNN should be disabled by default on Windows due to how it interacts with WDDM. Tested with mlx-community/Qwen3-0.6B-4bit on an RTX 6000 Ada running Windows 11:

  ┌───────────┬─────────────────┬───────────────────┐
  │   Mode    │ Prefill (p2048) │ Generation (g128) │
  ├───────────┼─────────────────┼───────────────────┤
  │ cuDNN ON  │ 2,022 tok/s     │ 10.4 tok/s        │
  ├───────────┼─────────────────┼───────────────────┤
  │ cuDNN OFF │ 4,822 tok/s     │ 480 tok/s         │
  └───────────┴─────────────────┴───────────────────┘

zcbenz · 2026-03-08T07:15:04Z

I'm fine with disabling cuDNN SDPA on Windows, but can you check with NVIDIA whether this is something that can be fixed? Our fallback kernel is not good for prefill and long-context decoding.
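A platform-dependent default like the one discussed above is often expressed as a compile-time constant plus a runtime override, so users on other drivers or future cuDNN versions can opt back in. The following is only a sketch under that assumption and does not reflect how the PR implements it. The environment variable `HYPOTHETICAL_USE_CUDNN_SDPA` and the names `kUseCudnnSdpaDefault`, `resolve_flag`, and `use_cudnn_sdpa` are invented for illustration.

```cpp
#include <cstdlib>
#include <string>

// Platform default: off on Windows (where the thread reports a large
// slowdown under WDDM), on elsewhere.
#if defined(_WIN32)
constexpr bool kUseCudnnSdpaDefault = false;
#else
constexpr bool kUseCudnnSdpaDefault = true;
#endif

// Interpret an optional override string. A null or empty value keeps the
// default; "0", "false", and "off" disable; anything else enables.
bool resolve_flag(const char* override_value, bool default_value) {
  if (override_value == nullptr || *override_value == '\0') {
    return default_value;
  }
  const std::string v(override_value);
  return !(v == "0" || v == "false" || v == "off");
}

// HYPOTHETICAL_USE_CUDNN_SDPA is an invented variable name, not a real
// MLX setting.
bool use_cudnn_sdpa() {
  return resolve_flag(std::getenv("HYPOTHETICAL_USE_CUDNN_SDPA"),
                      kUseCudnnSdpaDefault);
}
```

Keeping the parsing in `resolve_flag` separate from the environment lookup makes the default-versus-override logic easy to test on any platform.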

zcbenz reviewed Mar 4, 2026

zcbenz approved these changes Mar 5, 2026

dhiltgen added 3 commits March 7, 2026 11:06

win: fix cuda build

95eb9ef

Adjust recent cuda changes to build on windows.

review comments

a16274b

new fixes for rebase, fix compiler unknown flag warnings

7c3b2f8

dhiltgen force-pushed the fix_win_build branch from eb2a561 to 7c3b2f8 · March 7, 2026 20:39

Fix threading bug on JIT compilation

333762d

When testing with a Go client, eval can be called on different OS threads on windows, which causes crashes from the CUDA device not being properly initialized. This manifests as sporadic crashing during JIT.

disable cuDNN on windows by default

2580ff3

cuDNN creates a new graph each time which is expensive with WDDM.

dhiltgen wants to merge 5 commits into ml-explore:main from dhiltgen:fix_win_build
