
quant.cpp Roadmap

Vision

quant.cpp is the SQLite of LLM inference.

Not the fastest. Not the most feature-complete. The most embeddable, the most readable, and the only engine that compresses the KV cache 7x without quality loss.

Positioning

Need speed?        → llama.cpp
Need throughput?   → vLLM
Need to embed an LLM in your app with one file? → quant.cpp
Need 7x longer context on the same hardware? → quant.cpp
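To make the "7x longer context" claim concrete, here is a rough back-of-the-envelope sketch in C. It assumes an uncompressed FP16 cache (2 bytes per element, separate K and V tensors) and treats the 7x figure as a flat compression ratio. The function names and the model shape used in the note below are illustrative assumptions, not taken from quant.cpp.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of FP16 KV cache needed per token.
 * The first factor 2 covers K and V; the last factor 2 is sizeof(fp16). */
uint64_t kv_bytes_per_token_fp16(uint64_t n_layers, uint64_t n_kv_heads,
                                 uint64_t head_dim)
{
    return 2 * n_layers * n_kv_heads * head_dim * 2;
}

/* Tokens that fit in a fixed memory budget when the cache is stored
 * compression_ratio times smaller than FP16 (1 = uncompressed). */
uint64_t max_tokens(uint64_t budget_bytes, uint64_t bytes_per_token_fp16,
                    uint64_t compression_ratio)
{
    assert(bytes_per_token_fp16 > 0 && compression_ratio > 0);
    return budget_bytes * compression_ratio / bytes_per_token_fp16;
}
```

For a hypothetical model with 32 layers, 8 KV heads, and a head dimension of 128, each token costs 128 KiB of FP16 cache, so a 1 GiB budget holds 8,192 tokens uncompressed and 57,344 tokens at a 7x ratio.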

Direction 1: Embedding Engine ("the SQLite of LLMs")

The world's simplest way to add an LLM to a C/C++ project.

Done

quant.h single header (15K LOC, 628KB)
6-function API (load, new, generate, ask, free_ctx, free_model)
WASM build (192KB binary)
MSVC/MinGW Windows support
Zero external dependencies

In Progress

API documentation (docs/api.md)
quant.h sync with latest source
Embedding examples (minimal, chat, KV compare)

Planned

Direction 2: KV Compression Research Platform

The reference implementation for KV cache quantization research.

Done

7 quantization types (including Polar, QJL, Turbo, Uniform, TurboKV)
Delta compression (P-frame encoding)
QK-norm aware compression
Plugin architecture (3 functions to add a new type)
34 unit tests
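
The roadmap names delta compression (P-frame encoding) without describing it. The following is a generic sketch of the idea, not quant.cpp's implementation: the first vector is kept as a keyframe, and each later vector is stored as a quantized difference from the previous reconstruction. The function names, the int8 code width, and the fixed step size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cur as int8 deltas against the previous reconstruction.
 * recon is updated in place to what the decoder will see, so
 * quantization error does not accumulate from frame to frame. */
void delta_encode(const float *cur, float *recon, int8_t *codes,
                  size_t n, float step)
{
    assert(step > 0.0f);
    for (size_t i = 0; i < n; i++) {
        float d = (cur[i] - recon[i]) / step;
        int q = (int)(d >= 0.0f ? d + 0.5f : d - 0.5f); /* round to nearest */
        if (q > 127) q = 127;                            /* clip to int8 range */
        if (q < -127) q = -127;
        codes[i] = (int8_t)q;
        recon[i] += (float)q * step;
    }
}

/* Apply stored deltas to the previous reconstruction. */
void delta_decode(const int8_t *codes, float *recon, size_t n, float step)
{
    for (size_t i = 0; i < n; i++)
        recon[i] += (float)codes[i] * step;
}
```

Both sides start from the same keyframe copy. As long as a delta is not clipped, the reconstruction error per element stays within half a step.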

In Progress

"Add Your Own Type" tutorial (docs/custom-quantization.md)
arXiv tech report

Planned

llama.cpp KV type PR (ggml type registration)
vLLM KV compression plugin
Benchmarking suite (PPL across models × KV types)
Learned codebook quantization
Per-head adaptive bit allocation

Non-Goals

❌ GPU speed competition with llama.cpp (requires tensor graph IR)
❌ Batch serving (vLLM's domain)
❌ Training support
❌ 100+ model coverage

Architecture Principles

One file forward pass: tq_transformer.c contains the entire inference loop
Plugin quantization: Add types via tq_traits.c registration
Zero dependencies: libc + pthreads only (+ Metal on macOS)
CPU-first: NEON/AVX2 optimized, GPU as optional accelerator
Embeddable: quant.h works anywhere a C compiler does
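
The "plugin quantization" principle could look roughly like the sketch below: a new type supplies a small set of callbacks and registers them in a table. The roadmap says three functions are needed but does not name them, so the struct layout, the callback names (`packed_size`, `quantize`, `dequantize`), and the `kv_register_type` / `kv_find_type` helpers are hypothetical and may differ from what tq_traits.c actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical traits record: the three callbacks a new KV type supplies. */
typedef struct {
    const char *name;
    size_t (*packed_size)(size_t n);                        /* bytes for n floats */
    void (*quantize)(const float *src, uint8_t *dst, size_t n);
    void (*dequantize)(const uint8_t *src, float *dst, size_t n);
} kv_type_traits;

#define MAX_KV_TYPES 16
static kv_type_traits g_types[MAX_KV_TYPES];
static size_t g_n_types = 0;

/* Returns the new type's index, or -1 if the table is full. */
int kv_register_type(const kv_type_traits *t)
{
    assert(t && t->name && t->packed_size && t->quantize && t->dequantize);
    if (g_n_types == MAX_KV_TYPES) return -1;
    g_types[g_n_types] = *t;
    return (int)g_n_types++;
}

const kv_type_traits *kv_find_type(const char *name)
{
    for (size_t i = 0; i < g_n_types; i++)
        if (strcmp(g_types[i].name, name) == 0) return &g_types[i];
    return NULL;
}

/* Demo type: 8-bit uniform quantization over the fixed range [-1, 1]. */
static size_t u8_size(size_t n) { return n; }

static void u8_quant(const float *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = src[i] < -1.0f ? -1.0f : (src[i] > 1.0f ? 1.0f : src[i]);
        dst[i] = (uint8_t)((x + 1.0f) * 127.5f + 0.5f);
    }
}

static void u8_dequant(const uint8_t *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i] / 127.5f - 1.0f;
}

const kv_type_traits demo_u8_traits = { "demo_u8", u8_size, u8_quant, u8_dequant };
```

With a table like this, the inference loop only looks a type up by name and calls through the function pointers, so adding a type would not touch the forward pass.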
