📚 [Paper] | 🤗 [Checkpoints] | 🐠 [Blog (coming soon)]
This repository contains the reference code for the paper *Sparser, Faster, Lighter Transformer Language Models*. It includes the sparse training code and our custom CUDA kernels for sparse models, designed for H100 GPUs and leveraging the TwELL packing format.
The repository expects a CUDA 12.8+ environment:

```bash
git clone https://github.com/SakanaAI/sparser-faster-llms.git
cd sparser-faster-llms
bash scripts/install.sh
# or, using uv:
# python -m venv .venv
# source .venv/bin/activate
# bash scripts/install.sh --uv
```
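Before installing, you may want to confirm your CUDA toolkit meets the 12.8+ requirement. One pitfall is comparing dotted version strings lexicographically ("12.10" sorts before "12.8" as a string); a small helper (not part of this repo, just a sketch) that compares numerically, applied to whatever version `nvcc --version` or `torch.version.cuda` reports:

```python
def cuda_at_least(found: str, required: str = "12.8") -> bool:
    """Compare dotted version strings numerically, so "12.10" > "12.8"."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(found) >= parse(required)

print(cuda_at_least("12.10"))  # True  (lexicographic comparison would say False)
print(cuda_at_least("12.4"))   # False
```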
```
├── accelerate_configs/          # Accelerate + DeepSpeed launch configs
├── benchmark_inference.py       # Minimal torch vs TwELL inference benchmark
├── benchmark_base.py            # Small benchmark helpers
├── cfgs/                        # Hydra configs for model/data/training
├── custom_data/                 # Pretraining dataset utilities
├── custom_models/
│   ├── sparse_models.py         # Sparse model definitions
│   ├── sparse_testing_utils.py  # Sparse -> HF / TwELL conversion helpers
│   └── twell_modules/           # TwELL CUDA kernels
├── energy_utils.py              # Optional GPU energy measurement helpers
├── launch.sh                    # Main multi-GPU training entrypoint
├── load_dataset.py              # Dataset loading glue
├── scripts/install.sh           # Minimal installation script
├── train.py                     # Hydra training entrypoint
└── trainers/
    └── logging_trainer.py       # Trainer used by the public training path
```
This release includes:

- Sparse model training code
- TwELL inference kernels
- Efficient TwELL training kernels
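The TwELL packing layout itself is defined in the paper and implemented in the CUDA kernels under `custom_models/twell_modules/`. As a rough, unofficial intuition for the ELL-family packing idea it builds on (an assumption based on the name; the real TwELL layout may differ), here is a pure-Python sketch of classic ELLPACK: each row's nonzeros are compacted into fixed-width value and column-index arrays, padded to the maximum row density:

```python
# Sketch of generic ELL-style packing -- NOT the exact TwELL layout.
# Each row is compacted to width = max nonzeros per row, zero-padded.

def ell_pack(dense):
    """Pack a dense matrix (list of lists) into ELL-style arrays."""
    nnz_cols = [[j for j, v in enumerate(row) if v != 0] for row in dense]
    width = max((len(c) for c in nnz_cols), default=0)
    values, col_idx = [], []
    for row, cols in zip(dense, nnz_cols):
        pad = width - len(cols)
        values.append([row[j] for j in cols] + [0] * pad)   # padded nonzeros
        col_idx.append(cols + [0] * pad)                    # padded indices
    return values, col_idx

def ell_matvec(values, col_idx, x):
    """y = A @ x on the packed form; zero padding contributes nothing."""
    return [sum(v * x[j] for v, j in zip(vrow, crow))
            for vrow, crow in zip(values, col_idx)]

A = [[0, 2, 0, 1],
     [3, 0, 0, 0],
     [0, 0, 4, 5]]
vals, cols = ell_pack(A)
print(ell_matvec(vals, cols, [1, 1, 1, 1]))  # [3, 3, 9]
```

The payoff of such layouts is that every row has the same storage width, which maps cleanly onto fixed-shape GPU thread blocks.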
We release pretrained sparse checkpoints on the Hugging Face Hub at:
- SakanaAI/SparseLM0.5B
- SakanaAI/SparseLM1B
- SakanaAI/SparseLM1.5B
- SakanaAI/SparseLM2B
You can benchmark our kernels against the Hugging Face PyTorch reference with the benchmarking script `benchmark_inference.py`, e.g.:
```bash
python benchmark_inference.py \
  --model-path SakanaAI/SparseLM1.5B \
  --reps 500 \
  --warmup-reps 5 \
  --out-csv results/benchmark_inference/SparseLM1.5B.csv
```

We provide two implementations of the TwELL kernels: the default `twell` implementation, and a `twell-flex` variant that is expected to be slightly faster for non-uniform sparsity patterns (the overall difference is still expected to be under 0.1%). You can enable the `twell-flex` variant with the `--flex-kernels` flag, e.g.:
```bash
python benchmark_inference.py \
  --model-path SakanaAI/SparseLM1.5B \
  --reps 500 \
  --warmup-reps 5 \
  --out-csv results/benchmark_inference/SparseLM1.5B-flex.csv \
  --flex-kernels
```

To also measure GPU energy during the benchmark loop:
```bash
python benchmark_inference.py \
  --model-path SakanaAI/SparseLM1.5B \
  --reps 500 \
  --warmup-reps 5 \
  --measure-energy \
  --out-csv results/benchmark_inference/SparseLM1.5B_energy.csv
```

You can benchmark your own local sparse models by overriding `--model-path /path/to/local/checkpoint_dir`.
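The exact CSV schema written by `benchmark_inference.py` is not documented here, so inspect the header of your output file first. Assuming it contains per-repetition latency columns for both implementations (the column names `torch_ms` and `twell_ms` below are hypothetical), a post-processing sketch to compute the mean speedup might look like:

```python
import csv
import io

# Hypothetical schema: the real columns written by benchmark_inference.py
# may differ -- check the CSV header before adapting this.
sample = """rep,torch_ms,twell_ms
0,12.1,7.9
1,11.8,8.1
2,12.3,8.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
torch_ms = [float(r["torch_ms"]) for r in rows]
twell_ms = [float(r["twell_ms"]) for r in rows]

# Speedup = mean reference latency / mean TwELL latency.
speedup = (sum(torch_ms) / len(torch_ms)) / (sum(twell_ms) / len(twell_ms))
print(f"mean speedup: {speedup:.2f}x")  # mean speedup: 1.51x
```

For a real run, replace the in-memory `sample` with `open("results/benchmark_inference/SparseLM1.5B.csv")`.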
We provide simple sparse training functionality using standard PyTorch:

```bash
./launch.sh <num_gpus> <run_cfg> [zero1|offload|offload_optim] [hydra overrides...]
```

We provide premade Hydra configs to obtain sparse models at different sizes:
- `sparsity_gated_0p5b`
- `sparsity_gated_1b`
- `sparsity_gated_1p5b`
- `sparsity_gated_2b`
For training on H100 GPUs, we recommend the default settings, zero1 optimization, and no parameter offloading:
```bash
./launch.sh 8 sparsity_gated_1p5b zero1
```

The default logging functionality saves results both locally and to Weights & Biases. To disable Weights & Biases logging, modify the provided configuration files with:
```yaml
report_to: null
```

If you find our work or this repository useful and want to cite our paper, you can use the following:
```bibtex
@article{sakanaXnvidia2026sparser,
  title={Sparser, Faster, Lighter Transformer Language Models},
  author={Cetin, Edoardo and Peluchetti, Stefano and Castillo, Emilio and Naruse, Akira and Murakami, Mana and Jones, Llion},
  journal={arXiv preprint arXiv:2603.23198},
  year={2026}
}
```