Conversation
Adjust recent CUDA changes to build on Windows.
When testing with a Go client, eval can be called from different OS threads on Windows, which causes crashes because the CUDA device is not properly initialized on those threads. This manifests as sporadic crashes during JIT compilation.
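The per-thread initialization idea described above can be illustrated with a minimal sketch. This is not the change made in this PR. `set_device`, `ensure_device_for_this_thread`, `eval_on_device`, and `run_on_threads` are hypothetical names, and `set_device` is a stand-in for a real runtime call such as `cudaSetDevice`, so the sketch builds without a GPU.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a real runtime call such as cudaSetDevice(device).
// It only counts invocations so the sketch runs without a GPU.
std::atomic<int> g_init_calls{0};

void set_device(int /*device*/) {
  g_init_calls.fetch_add(1);
}

// Ensure the calling OS thread has selected `device` before doing any work.
// The thread_local variable starts at -1 on every new thread, so the first
// call on each thread always performs the initialization.
void ensure_device_for_this_thread(int device) {
  thread_local int current_device = -1;
  if (current_device != device) {
    set_device(device);
    current_device = device;
  }
}

// Hypothetical eval entry point: initializes the device lazily, then works.
int eval_on_device(int device, int x) {
  ensure_device_for_this_thread(device);
  return x * 2;  // placeholder for the real computation
}

// Calls eval twice on each of `n` fresh threads and returns how many
// set_device calls those threads triggered in total.
int run_on_threads(int n) {
  int before = g_init_calls.load();
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([] {
      eval_on_device(0, 1);
      eval_on_device(0, 2);  // second call on the same thread is a no-op init
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return g_init_calls.load() - before;
}
```

The guard lives in the eval path rather than in one-time startup code because a caller such as a Go runtime may schedule successive calls onto different OS threads, so initialization done on one thread cannot be assumed on another.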
While testing more deeply, I noticed sporadic JIT crashes on Windows with CUDA and traced them to missing device initialization, which I've fixed in a separate commit. Let me know if you want that split out into a different PR.
The new change looks good to me, thanks for fixing it! I still need to wait for the CI environment to be fixed before I can merge.
cuDNN creates a new graph each time, which is expensive with WDDM.
I fixed another performance issue. I can move it to another PR as well if needed. It looks like cuDNN should be disabled by default on Windows because of how it interacts with WDDM. I tested with mlx-community/Qwen3-0.6B-4bit on an RTX 6000 Ada under Windows 11:
I'm fine with disabling cuDNN SDPA on Windows, but can you check with NVIDIA whether this is something that can be fixed? Our fallback kernel does not perform well for prefill and long-context decoding.
Adjust recent CUDA changes to build on Windows.
Proposed changes
HEAD on main currently fails to build on Windows with CUDA enabled. This gets the build working.
Checklist
Put an x in the boxes that apply.
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes