Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -39,7 +39,7 @@ dotnet run --project src/SharpInference.Cli -c Release -- image \
--vae models/z-image-turbo/vae \
--qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
--qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
- --upscaler models/RealESRGAN_x4plus.pth \
+ --upscaler models/RealESRGAN_x4plus.safetensors \
--upscale-blend 0.8 \
-p "a serene mountain lake at sunrise" -W 512 -H 512 --steps 4 -o out.png

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Pekka Heikura

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
36 changes: 35 additions & 1 deletion README.md
@@ -5,6 +5,9 @@ Runs GGUF models on CPU (AVX2/AVX-512 SIMD) and GPU (Vulkan compute shaders or C
Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
[Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) and FLUX.1 image generation.

> **Status: spike.** A quick experiment to see how LLM tooling can be built
> from scratch in .NET. Things may be broken or not work as advertised. No warranty — see [LICENSE](LICENSE).

## Prerequisites

### Required
@@ -18,7 +21,7 @@ Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
|---------|-----------|-------|
| **Faster batched GEMM (CPU)** | [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS/releases) | Place `libopenblas.dll` in `tools/openblas/` or system PATH. Auto-detected at startup; silently skipped if absent. |
| **GPU inference (Vulkan)** | Vulkan-capable GPU + drivers | Works on AMD/Intel/NVIDIA. No extra install on Windows — just up-to-date GPU drivers. The `VULKAN_SDK` env var is used for shader recompilation only. |
- | **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll`, `cudart64_110.dll`, and `nvrtc64_11*.dll` on PATH. NVIDIA GPU only. Used for image generation pipelines. |
+ | **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll` and `cudart64_110.dll` on PATH (CUDA 11 runtime). NVRTC resolver additionally tries `nvrtc64_120_0.dll` (CUDA 12.x), then `nvrtc64_112_0.dll`, then `nvrtc64_11*.dll`. NVIDIA GPU only. Used for image generation pipelines. |
| **Image upscaling (RRDBNet)** | CUDA (above) | Real-ESRGAN ×2/×4 upscaler. Falls back to bicubic if CUDA is unavailable. |

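The NVRTC probe order described in the CUDA row above can be sketched roughly as follows. This is only an illustration in Python of the "first match wins" ordering; the actual resolver lives in the .NET code and is not shown in this diff, and the names `NVRTC_PATTERNS` and `pick_nvrtc` are hypothetical.

```python
from fnmatch import fnmatch

# Probe order as stated in the README row: CUDA 12.x first, then 11.2,
# then any other CUDA 11 build. (Assumed to be a simple first-match scan.)
NVRTC_PATTERNS = ["nvrtc64_120_0.dll", "nvrtc64_112_0.dll", "nvrtc64_11*.dll"]


def pick_nvrtc(available):
    """Return the first file name that matches the probe order, or None."""
    for pattern in NVRTC_PATTERNS:
        # Sort so the result is deterministic when a wildcard matches several files.
        for name in sorted(available):
            if fnmatch(name.lower(), pattern):
                return name
    return None


if __name__ == "__main__":
    found = ["nvrtc64_112_0.dll", "nvrtc64_120_0.dll", "cudart64_110.dll"]
    print(pick_nvrtc(found))
```

With both a CUDA 12 and a CUDA 11.2 NVRTC present, the sketch prefers the CUDA 12 file, which matches the order given in the table.
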
## Getting Models
@@ -227,6 +230,10 @@ Prints all GGUF metadata key/value pairs in a table (architecture, context lengt

## API Server

> **Note:** The ASP.NET host hasn't been exercised end-to-end — it builds and the
> endpoint handlers have unit tests, but running against real clients has not been
> validated. Expect it to need fixes.

Starts an HTTP server compatible with OpenAI and Anthropic clients. Defaults to `http://localhost:5000`.

```bash
@@ -317,6 +324,29 @@ dotnet publish src/SharpInference.Server -c Release -r win-x64
dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'
```

## Helper Scripts

The `scripts/` directory contains optional helpers for development and validation. The PowerShell scripts target Windows; the Python scripts require [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python).

| Script | Purpose |
|--------|---------|
| `download-model.ps1` | Downloads GGUF models into `models/` from Hugging Face. Accepts `-Model <name>` for any of `smollm2`, `qwen3-8b`, `llama31-70b`, `qwen3-coder-30b-a3b`, `llama4-scout`, `z-image-turbo`, `z-image-turbo-q8`, `realesrgan-x4`. Skips files already present. |
| `setup-openblas.ps1` | Downloads OpenBLAS (default `0.3.28`) and installs `libopenblas.dll` into `tools/openblas/` for the optional CPU GEMM acceleration path. |
| `setup-llamacpp.ps1` | Downloads prebuilt llama.cpp binaries into `tools/llama.cpp/`. Variants: `cpu` (default), `vulkan`, `cuda-12.4`, `cuda-13.1`. Used as the reference implementation for forward-pass validation. |
| `generate-reference-logits.ps1` | Runs llama.cpp with `--logits-all` on a fixed prompt and writes reference logits to `tests/reference-data/` for comparison against the SharpInference forward pass. Requires `setup-llamacpp.ps1` and `download-model.ps1 -Model smollm2` to have been run first. |
| `compare_tokens.py` | Python helper that tokenizes a chat prompt with `llama-cpp-python` and prints top-5 logits at each step. Used to debug divergence against Llama 4 Scout. |
| `extract_reference.py` | Python helper that prints model metadata (`n_vocab`, `n_ctx_train`, `n_embd`) and token IDs for prompt fragments. Useful when investigating tokenizer disagreements. |

Typical first-time setup on Windows:

```powershell
# From repo root
.\scripts\setup-openblas.ps1 # optional, enables OpenBLAS GEMM
.\scripts\download-model.ps1 -Model smollm2 # fetch a small test model
.\scripts\setup-llamacpp.ps1 # optional, for reference validation
.\scripts\generate-reference-logits.ps1 # optional, regenerates tests/reference-data/
```
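For the reference-validation step, a comparison between llama.cpp logits and the SharpInference forward pass might look like the sketch below. It is a minimal illustration under assumptions: the on-disk format of `tests/reference-data/` is not described here, so the function works on plain in-memory lists, and the names `top_k` and `compare_logits` and the tolerance value are hypothetical.

```python
def top_k(logits, k=5):
    """Indices of the k largest logits, highest first (ties go to the lower index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


def compare_logits(reference, candidate, k=5, atol=1e-3):
    """Summarize how closely one position's candidate logits track the reference."""
    if len(reference) != len(candidate):
        raise ValueError("vocab size mismatch")
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    return {
        "max_abs_diff": max_abs_diff,
        "within_tolerance": max_abs_diff <= atol,  # atol is an assumed threshold
        "same_argmax": top_k(reference, 1) == top_k(candidate, 1),
        "top_k_overlap": len(set(top_k(reference, k)) & set(top_k(candidate, k))),
    }


if __name__ == "__main__":
    ref = [0.1, 2.0, -1.0, 0.5]
    cand = [0.1005, 1.9995, -1.0, 0.5]
    print(compare_logits(ref, cand, k=2))
```

Checking the argmax and top-k overlap alongside the raw difference helps separate harmless numerical noise from a divergence that would change the sampled token.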

## Projects

| Project | Description |
@@ -336,3 +366,7 @@ dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'

See [docs/SharpInference-Design.md](docs/SharpInference-Design.md).

## License

Released under the [MIT License](LICENSE).

2 changes: 1 addition & 1 deletion docs/SharpInference-Design.md
@@ -2252,7 +2252,7 @@ dotnet run --project src/SharpInference.Cli -- image \
--vae models/z-image-turbo/vae \
--qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
--qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
- --upscaler models/RealESRGAN_x4plus.pth \
+ --upscaler models/RealESRGAN_x4plus.safetensors \
--upscale-blend 0.8 \
-p "photorealistic woman with red lipstick" -W 512 -H 512 -o out.png
```