@CarlGao4
Contributor

@CarlGao4 CarlGao4 commented Dec 7, 2025

Fixes #1061, Fixes #1060

Current status: running CI on my fork


Changes:

  1. Updated Jimver/cuda-toolkit to 0.2.22
  2. Updated CUDA compiler to 12.8.1
  3. Fixed the build containing kernels for only a single CUDA architecture

Current supported GPUs:

  1. 6.1: Tesla P40 generation, GTX 10 Series, Quadro Pascal generation
  2. 7.0: V100
  3. 7.5: Tesla T4, GTX 16 Series, RTX 20 Series, Quadro Turing Series
  4. 8.0: A100
  5. 8.6: A40, Workstation Ampere Series, RTX 30 Series
  6. 8.9: L40, Workstation Ada Series, RTX 40 Series
  7. 9.0: H100, H200
  8. 10.0: B200
  9. 12.0: Workstation Blackwell Series, RTX 50 Series
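As a rough illustration of how the list above could map onto a build setting, here is a minimal Python sketch that joins the compute capabilities into a value in the style of CMake's `CMAKE_CUDA_ARCHITECTURES` (semicolon-separated, with `-real` for native code). The names `SUPPORTED` and `cmake_arch_list` are hypothetical, and this is not taken from the PR's actual workflow file.

```python
# Compute capabilities listed in this PR, as (major, minor) pairs.
SUPPORTED = [(6, 1), (7, 0), (7, 5), (8, 0), (8, 6), (8, 9), (9, 0), (10, 0), (12, 0)]


def cmake_arch_list(capabilities, suffix="real"):
    """Join capabilities into a CMAKE_CUDA_ARCHITECTURES-style value.

    Hypothetical helper for illustration; `suffix` is "real" (native code)
    or "virtual" (PTX only).
    """
    return ";".join(f"{major}{minor}-{suffix}" for major, minor in capabilities)


if __name__ == "__main__":
    # Prints a -D flag that could be passed to cmake.
    print("-DCMAKE_CUDA_ARCHITECTURES=" + cmake_arch_list(SUPPORTED))
```

Each extra `-real` entry adds another set of compiled kernels to the binary, so a longer list means a longer build.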

@leejet
Owner

leejet commented Dec 7, 2025

LGTM. Thanks.

@leejet leejet merged commit 0392273 into leejet:master Dec 7, 2025
@CarlGao4
Contributor Author

CarlGao4 commented Dec 7, 2025

However, now that kernels are built for multiple CUDA architectures, build time will increase significantly, so I suggest adding a caching mechanism. I've used ccache on a Linux C++ project and uploaded the cache to GitHub, but I'm not sure whether it works with Windows and nvcc.

@LostRuins
Contributor

tbh this seems a bit overkill considering GGML does not utilize any new features after Ada Lovelace.
I'd probably recommend stopping at Hopper and just building PTX for anything after.

@CarlGao4
Contributor Author

CarlGao4 commented Dec 8, 2025

But there are certainly users on the RTX 50 series (I'm using a 4060, but one of my friends is using a 5070).

@leejet leejet mentioned this pull request Dec 8, 2025
@LostRuins
Contributor

Yes, PTX targets (e.g. 80-virtual) are forward compatible: the driver JIT-compiles the PTX for whatever GPU is present at load time. It will work on the 5070 and future GPUs as well.
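The compatibility rule under discussion can be sketched in Python. This is a simplified model, assuming only the general rules: native code runs on devices with the same major version and an equal or higher minor version, and PTX can be JIT-compiled for any device at or above its target. It ignores architecture-specific targets (such as `sm_90a`) and driver version requirements. The function name `can_run` and both example configurations are hypothetical.

```python
def can_run(device, real_archs, virtual_archs):
    """Return "sass", "ptx" or None for a device of compute capability `device`.

    All capabilities are (major, minor) tuples. Simplified model only.
    """
    # Native code (SASS): same major version, built for an equal or lower minor.
    for arch in real_archs:
        if arch[0] == device[0] and arch[1] <= device[1]:
            return "sass"
    # PTX: JIT-compiled by the driver for any device at or above the PTX target.
    for arch in virtual_archs:
        if arch <= device:
            return "ptx"
    return None


# Hypothetical "stop at Hopper, PTX for anything after" configuration.
REAL = [(6, 1), (7, 0), (7, 5), (8, 0), (8, 6), (8, 9), (9, 0)]
VIRTUAL = [(9, 0)]

if __name__ == "__main__":
    print(can_run((12, 0), REAL, VIRTUAL))  # RTX 50 series falls back to PTX
    print(can_run((8, 9), REAL, VIRTUAL))   # RTX 40 series uses native code
    print(can_run((8, 9), [(9, 0)], []))    # a build with only sm_90 native code
```

Under this model, an RTX 50 series card (12.0) is served by the 9.0 PTX, while a build that ships only sm_90 native code has nothing a 4060 (8.9) can load.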

Development

Successfully merging this pull request may close these issues.

[Bug] Windows CUDA build only contains sm_90 kernel Not supported by GPU

3 participants