pekkah · Apr 15, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -39,7 +39,7 @@ dotnet run --project src/SharpInference.Cli -c Release -- image \
   --vae models/z-image-turbo/vae \
   --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
   --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
-  --upscaler models/RealESRGAN_x4plus.pth \
+  --upscaler models/RealESRGAN_x4plus.safetensors \
   --upscale-blend 0.8 \
   -p "a serene mountain lake at sunrise" -W 512 -H 512 --steps 4 -o out.png
 

diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Pekka Heikura
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -5,6 +5,9 @@ Runs GGUF models on CPU (AVX2/AVX-512 SIMD) and GPU (Vulkan compute shaders or C
 Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
 [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) and FLUX.1 image generation.
 
+> **Status: spike.** A quick experiment to see how LLM tooling can be built
+> from scratch in .NET. Things may be broken or not work as advertised. No warranty — see [LICENSE](LICENSE).
+
 ## Prerequisites
 
 ### Required
@@ -18,7 +21,7 @@ Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
 |---------|-----------|-------|
 | **Faster batched GEMM (CPU)** | [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS/releases) | Place `libopenblas.dll` in `tools/openblas/` or system PATH. Auto-detected at startup; silently skipped if absent. |
 | **GPU inference (Vulkan)** | Vulkan-capable GPU + drivers | Works on AMD/Intel/NVIDIA. No extra install on Windows — just up-to-date GPU drivers. The `VULKAN_SDK` env var is used for shader recompilation only. |
-| **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll`, `cudart64_110.dll`, and `nvrtc64_11*.dll` on PATH. NVIDIA GPU only. Used for image generation pipelines. |
+| **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll` and `cudart64_110.dll` on PATH (CUDA 11 runtime). The NVRTC resolver additionally tries `nvrtc64_120_0.dll` (CUDA 12.x), then `nvrtc64_112_0.dll`, then `nvrtc64_11*.dll`. NVIDIA GPU only. Used for image generation pipelines. |
 | **Image upscaling (RRDBNet)** | CUDA (above) | Real-ESRGAN ×2/×4 upscaler. Falls back to bicubic if CUDA is unavailable. |
 
 ## Getting Models
@@ -227,6 +230,10 @@ Prints all GGUF metadata key/value pairs in a table (architecture, context lengt
 
 ## API Server
 
+> **Note:** The ASP.NET host hasn't been exercised end-to-end: it builds and the
+> endpoint handlers have unit tests, but it has not been validated against real
+> clients. Expect it to need fixes.
+
 Starts an HTTP server compatible with OpenAI and Anthropic clients. Defaults to `http://localhost:5000`.
 
 ```bash
@@ -317,6 +324,29 @@ dotnet publish src/SharpInference.Server -c Release -r win-x64
 dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'
 ```
 
+## Helper Scripts
+
+The `scripts/` directory contains optional helpers for development and validation. The PowerShell scripts target Windows; the Python scripts require [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python).
+
+| Script | Purpose |
+|--------|---------|
+| `download-model.ps1` | Downloads GGUF models into `models/` from Hugging Face. Accepts `-Model <name>` for any of `smollm2`, `qwen3-8b`, `llama31-70b`, `qwen3-coder-30b-a3b`, `llama4-scout`, `z-image-turbo`, `z-image-turbo-q8`, `realesrgan-x4`. Skips files already present. |
+| `setup-openblas.ps1` | Downloads OpenBLAS (default `0.3.28`) and installs `libopenblas.dll` into `tools/openblas/` for the optional CPU GEMM acceleration path. |
+| `setup-llamacpp.ps1` | Downloads prebuilt llama.cpp binaries into `tools/llama.cpp/`. Variants: `cpu` (default), `vulkan`, `cuda-12.4`, `cuda-13.1`. Used as the reference implementation for forward-pass validation. |
+| `generate-reference-logits.ps1` | Runs llama.cpp with `--logits-all` on a fixed prompt and writes reference logits to `tests/reference-data/` for comparison against the SharpInference forward pass. Requires `setup-llamacpp.ps1` and `download-model.ps1 -Model smollm2` to have been run first. |
+| `compare_tokens.py` | Python helper that tokenizes a chat prompt with `llama-cpp-python` and prints top-5 logits at each step. Used to debug divergence against Llama 4 Scout. |
+| `extract_reference.py` | Python helper that prints model metadata (`n_vocab`, `n_ctx_train`, `n_embd`) and token IDs for prompt fragments. Useful when investigating tokenizer disagreements. |
+
+Typical first-time setup on Windows:
+
+```powershell
+# From repo root
+.\scripts\setup-openblas.ps1                  # optional, enables OpenBLAS GEMM
+.\scripts\download-model.ps1 -Model smollm2   # fetch a small test model
+.\scripts\setup-llamacpp.ps1                  # optional, for reference validation
+.\scripts\generate-reference-logits.ps1       # optional, regenerates tests/reference-data/
+```
+
 ## Projects
 
 | Project | Description |
@@ -336,3 +366,7 @@ dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'
 
 See [docs/SharpInference-Design.md](docs/SharpInference-Design.md).
 
+## License
+
+Released under the [MIT License](LICENSE).
+
diff --git a/docs/SharpInference-Design.md b/docs/SharpInference-Design.md
@@ -2252,7 +2252,7 @@ dotnet run --project src/SharpInference.Cli -- image \
   --vae models/z-image-turbo/vae \
   --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
   --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
-  --upscaler models/RealESRGAN_x4plus.pth \
+  --upscaler models/RealESRGAN_x4plus.safetensors \
   --upscale-blend 0.8 \
   -p "photorealistic woman with red lipstick" -W 512 -H 512 -o out.png
 ```