
fix(sgemm): destroy cublas handle to avoid alloc failed #415

Merged
DefTruth merged 1 commit into xlite-dev:main from lnxtree:fix/sgemm-cublas-handle-leak on Apr 5, 2026

lnxtree commented Apr 5, 2026

  • What
    Fix a cuBLAS handle leak in kernels/sgemm/sgemm_cublas.cu by destroying the handle after the GEMM call, and add status checks for the cuBLAS API calls.

  • Why
    Repeated benchmark calls accumulate unreleased cuBLAS handles. torch.matmul can then fail with CUBLAS_STATUS_ALLOC_FAILED when it tries to create a new handle.

  • Changes
    Added CHECK_CUBLAS macro for cuBLAS status checking
    cublas_sgemm: create -> setMathMode -> gemm -> destroy
    cublas_sgemm_tf32: create -> setMathMode -> gemm -> destroy

  • Repro error
    RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

  • Link issue
    Fixes #414: SGEMM benchmark may hit CUBLAS_STATUS_ALLOC_FAILED due to cublas handle leak in kernels/sgemm/sgemm_cublas.cu

  • Environment
    GPU: NVIDIA H200 NVL (143771 MiB)
    Driver: 580.126.09
    CUDA Toolkit (nvcc): 12.8 (Build cuda_12.8.r12.8/compiler.35583870_0)
    PyTorch: 2.5.0+cu124
    PyTorch CUDA runtime: 12.4
    CUDA available in torch: True
    Compute capability: (9, 0)
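
For illustration, the leak and the fix described under Why and Changes can be sketched without a GPU. This is only a minimal sketch under stated assumptions, not the code in sgemm_cublas.cu: MockHandle, mock_create, mock_destroy, mock_gemm, CHECK_MOCK and the limit of 4 live handles are all hypothetical stand-ins. In the real file the corresponding calls would be cublasCreate, cublasSetMathMode, the cuBLAS GEMM call and cublasDestroy, wrapped in CHECK_CUBLAS.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Mock stand-in for a handle-based library API. These names are hypothetical
// and are NOT the real cuBLAS API; they only mimic its create/destroy shape.
enum MockStatus {
  MOCK_STATUS_SUCCESS = 0,
  MOCK_STATUS_NOT_INITIALIZED = 1,
  MOCK_STATUS_ALLOC_FAILED = 2
};
struct MockHandle { bool live = false; };

constexpr int kMaxLiveHandles = 4;  // assumed resource limit, illustration only
static int g_live_handles = 0;      // handles created but not yet destroyed

MockStatus mock_create(MockHandle* h) {
  if (g_live_handles >= kMaxLiveHandles) return MOCK_STATUS_ALLOC_FAILED;
  ++g_live_handles;
  h->live = true;
  return MOCK_STATUS_SUCCESS;
}

MockStatus mock_destroy(MockHandle* h) {
  assert(h->live && g_live_handles > 0);
  h->live = false;
  --g_live_handles;
  return MOCK_STATUS_SUCCESS;
}

// Placeholder for the math-mode and GEMM calls; succeeds on a live handle.
MockStatus mock_gemm(const MockHandle& h) {
  return h.live ? MOCK_STATUS_SUCCESS : MOCK_STATUS_NOT_INITIALIZED;
}

// Status-checking macro in the spirit of CHECK_CUBLAS: throw on any failure.
#define CHECK_MOCK(call)                                                   \
  do {                                                                     \
    MockStatus status_ = (call);                                           \
    if (status_ != MOCK_STATUS_SUCCESS) {                                  \
      throw std::runtime_error(std::string(#call) + " failed, status " +   \
                               std::to_string(static_cast<int>(status_))); \
    }                                                                      \
  } while (0)

// Before the fix: the handle is created on every call and never destroyed.
void leaky_sgemm() {
  MockHandle handle;
  CHECK_MOCK(mock_create(&handle));
  CHECK_MOCK(mock_gemm(handle));
}

// After the fix: create -> gemm -> destroy on every call.
void fixed_sgemm() {
  MockHandle handle;
  CHECK_MOCK(mock_create(&handle));
  CHECK_MOCK(mock_gemm(handle));
  CHECK_MOCK(mock_destroy(&handle));
}

// Calls fn up to `attempts` times and returns how many calls succeeded
// before the first failure.
int count_successful_calls(void (*fn)(), int attempts) {
  for (int i = 0; i < attempts; ++i) {
    try {
      fn();
    } catch (const std::runtime_error&) {
      return i;
    }
  }
  return attempts;
}
```

With the assumed limit of 4, count_successful_calls(leaky_sgemm, 10) returns 4 because the fifth create fails, while count_successful_calls(fixed_sgemm, 10) returns 10 and leaves no live handles. Note that in this sketch a failing GEMM would still skip the destroy step; an RAII wrapper around the handle would be one way to cover that path.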


lnxtree commented Apr 5, 2026

Hi @DefTruth, could you please help review this PR when you have time?
It fixes a cuBLAS handle leak in SGEMM that may cause CUBLAS_STATUS_ALLOC_FAILED in torch.matmul. Thanks!


DefTruth left a comment


LGTM~ Thanks for this fix!

@DefTruth DefTruth merged commit 0b0a4f5 into xlite-dev:main Apr 5, 2026

Development

Successfully merging this pull request may close these issues.

SGEMM benchmark may hit CUBLAS_STATUS_ALLOC_FAILED due to cublas handle leak in kernels/sgemm/sgemm_cublas.cu
