    Repositories list

    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      3.7k21k6371k Updated Dec 9, 2025
    • This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang/tree/main/docs.
      HTML
      249281 Updated Dec 9, 2025
    • sgl-kernel-npu

      Public
      SGLang kernel library for NPU
      C++
      58821218 Updated Dec 9, 2025
    • sglang-jax

      Public
      JAX backend for SGL
      Python
      391906519 Updated Dec 9, 2025
    • sgl-kernel-xpu

      Public
      SGLang kernel library for Intel XPU
      Python
      1315013 Updated Dec 9, 2025
    • SpecForge

      Public
      Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
      Python
      1155344617 Updated Dec 9, 2025
    • whl

      Public
      Kernel Library Wheel for SGLang
      HTML
      31610 Updated Dec 8, 2025
    • ome

      Public
      OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
      Go
      493273015 Updated Dec 7, 2025
    • rbg

      Public
      A workload for deploying LLM inference services on Kubernetes
      Go
      34126116 Updated Dec 3, 2025
    • sgl-learning-materials

      Public
      Materials for learning SGLang
      4867800 Updated Dec 1, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      7712101 Updated Nov 30, 2025
    • genai-bench

      Public
      Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
      Python
      38237410 Updated Nov 28, 2025
    • The test files for SGLang.
      2101 Updated Nov 22, 2025
    • FlashMLA

      Public
      FlashMLA: Efficient Multi-head Latent Attention Kernels
      C++
      912000 Updated Nov 20, 2025
    • sgl-flash-attn

      Public
      Fast and memory-efficient exact attention
      Python
      2.2k1400 Updated Nov 18, 2025
    • fast-hadamard-transform

      Public
      Fast Hadamard transform in CUDA, with a PyTorch interface
      C
      49000 Updated Oct 15, 2025
    • sgl-whl

      Public
      SGLang wheels for multiple platforms
      11110 Updated Oct 13, 2025