From 240be373915d34282fad8044ecea9d7e410bd52b Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 15 Apr 2026 13:58:09 +0000
Subject: [PATCH 1/5] docs: add MIT LICENSE and prep for public release

- Add MIT LICENSE (required before flipping repo to public)
- README: link to LICENSE; clarify CUDA DLL requirements (NVRTC resolver
  also supports CUDA 12.x / 11.2+; cublas/cudart remain CUDA 11)
- CLAUDE.md, design doc: fix upscaler example file extension
  (.pth -> .safetensors; RRDBNet.Load only supports safetensors)
---
 CLAUDE.md                     |  2 +-
 LICENSE                       | 21 +++++++++++++++++++++
 README.md                     |  6 +++++-
 docs/SharpInference-Design.md |  2 +-
 4 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 LICENSE

diff --git a/CLAUDE.md b/CLAUDE.md
index 4ae9068..ea00f4b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -39,7 +39,7 @@ dotnet run --project src/SharpInference.Cli -c Release -- image \
   --vae models/z-image-turbo/vae \
   --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
   --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
-  --upscaler models/RealESRGAN_x4plus.pth \
+  --upscaler models/RealESRGAN_x4plus.safetensors \
   --upscale-blend 0.8 \
   -p "a serene mountain lake at sunrise" -W 512 -H 512 --steps 4 -o out.png
 
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..c3538a8
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Pekka Heikura
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
index d996cc8..d1bb382 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
 |---------|-----------|-------|
 | **Faster batched GEMM (CPU)** | [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS/releases) | Place `libopenblas.dll` in `tools/openblas/` or system PATH. Auto-detected at startup; silently skipped if absent. |
 | **GPU inference (Vulkan)** | Vulkan-capable GPU + drivers | Works on AMD/Intel/NVIDIA. No extra install on Windows — just up-to-date GPU drivers. The `VULKAN_SDK` env var is used for shader recompilation only. |
-| **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll`, `cudart64_110.dll`, and `nvrtc64_11*.dll` on PATH. NVIDIA GPU only. Used for image generation pipelines. |
+| **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll` and `cudart64_110.dll` on PATH (CUDA 11 runtime). The NVRTC resolver additionally tries `nvrtc64_120_0.dll` (CUDA 12.x), then `nvrtc64_112_0.dll`, then `nvrtc64_11*.dll`. NVIDIA GPU only. Used for image generation pipelines. |
 | **Image upscaling (RRDBNet)** | CUDA (above) | Real-ESRGAN ×2/×4 upscaler. Falls back to bicubic if CUDA is unavailable. |
 
 ## Getting Models
@@ -336,3 +336,7 @@ dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'
 
 See [docs/SharpInference-Design.md](docs/SharpInference-Design.md).
 
+## License
+
+Released under the [MIT License](LICENSE).
+
diff --git a/docs/SharpInference-Design.md b/docs/SharpInference-Design.md
index b8a667a..438b09e 100644
--- a/docs/SharpInference-Design.md
+++ b/docs/SharpInference-Design.md
@@ -2252,7 +2252,7 @@ dotnet run --project src/SharpInference.Cli -- image \
   --vae models/z-image-turbo/vae \
   --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
   --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
-  --upscaler models/RealESRGAN_x4plus.pth \
+  --upscaler models/RealESRGAN_x4plus.safetensors \
   --upscale-blend 0.8 \
   -p "photorealistic woman with red lipstick" -W 512 -H 512 -o out.png
 ```

From 9dab550acb01c38d2276796a264d1a41248dec9e Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 15 Apr 2026 14:01:29 +0000
Subject: [PATCH 2/5] docs: document helper scripts in scripts/ directory

Adds a Helper Scripts section to README describing download-model.ps1,
setup-openblas.ps1, setup-llamacpp.ps1, generate-reference-logits.ps1,
and the two Python helpers, with a typical first-time-setup snippet.
---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/README.md b/README.md
index d1bb382..9d155cb 100644
--- a/README.md
+++ b/README.md
@@ -317,6 +317,29 @@ dotnet publish src/SharpInference.Server -c Release -r win-x64
 dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'
 ```
 
+## Helper Scripts
+
+The `scripts/` directory contains optional helpers for development and validation. The PowerShell scripts target Windows; the Python scripts require [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python).
+
+| Script | Purpose |
+|--------|---------|
+| `download-model.ps1` | Downloads GGUF models into `models/` from Hugging Face. Accepts `-Model <name>` for any of `smollm2`, `qwen3-8b`, `llama31-70b`, `qwen3-coder-30b-a3b`, `llama4-scout`, `z-image-turbo`, `z-image-turbo-q8`, `realesrgan-x4`. Skips files already present. |
+| `setup-openblas.ps1` | Downloads OpenBLAS (default `0.3.28`) and installs `libopenblas.dll` into `tools/openblas/` for the optional CPU GEMM acceleration path. |
+| `setup-llamacpp.ps1` | Downloads prebuilt llama.cpp binaries into `tools/llama.cpp/`. Variants: `cpu` (default), `vulkan`, `cuda-12.4`, `cuda-13.1`. Used as the reference implementation for forward-pass validation. |
+| `generate-reference-logits.ps1` | Runs llama.cpp with `--logits-all` on a fixed prompt and writes reference logits to `tests/reference-data/` for comparison against the SharpInference forward pass. Requires `setup-llamacpp.ps1` and `download-model.ps1 -Model smollm2` to have been run first. |
+| `compare_tokens.py` | Python helper that tokenizes a chat prompt with `llama-cpp-python` and prints top-5 logits at each step. Used to debug divergence against Llama 4 Scout. |
+| `extract_reference.py` | Python helper that prints model metadata (`n_vocab`, `n_ctx_train`, `n_embd`) and token IDs for prompt fragments. Useful when investigating tokenizer disagreements. |
+
+Typical first-time setup on Windows:
+
+```powershell
+# From repo root
+.\scripts\setup-openblas.ps1                  # optional, enables OpenBLAS GEMM
+.\scripts\download-model.ps1 -Model smollm2   # fetch a small test model
+.\scripts\setup-llamacpp.ps1                  # optional, for reference validation
+.\scripts\generate-reference-logits.ps1       # optional, regenerates tests/reference-data/
+```
+
 ## Projects
 
 | Project | Description |

From 626f00e109141806fb02c1deed1f11dfa820ef9c Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 15 Apr 2026 14:04:31 +0000
Subject: [PATCH 3/5] docs: add experimental/WIP status disclaimer to README

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index 9d155cb..615b03a 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,12 @@ Runs GGUF models on CPU (AVX2/AVX-512 SIMD) and GPU (Vulkan compute shaders or C
 Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
 [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) and FLUX.1 image generation.
 
+> ⚠️ **Status: experimental / work-in-progress.** This is a personal research project
+> shared as-is. Features documented here may be incomplete, subtly broken, or behave
+> differently than advertised on your hardware. Expect rough edges, breaking changes
+> without notice, and the occasional crash. Bug reports and PRs are welcome, but
+> please don't depend on this for anything important. No warranty — see [LICENSE](LICENSE).
+
 ## Prerequisites
 
 ### Required

From 47e426845b13638dee6f1fd5e787608fdda72712 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 15 Apr 2026 14:05:30 +0000
Subject: [PATCH 4/5] docs: reframe README disclaimer as a spike

---
 README.md | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 615b03a..67a59d8 100644
--- a/README.md
+++ b/README.md
@@ -5,11 +5,8 @@ Runs GGUF models on CPU (AVX2/AVX-512 SIMD) and GPU (Vulkan compute shaders or C
 Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
 [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) and FLUX.1 image generation.
 
-> ⚠️ **Status: experimental / work-in-progress.** This is a personal research project
-> shared as-is. Features documented here may be incomplete, subtly broken, or behave
-> differently than advertised on your hardware. Expect rough edges, breaking changes
-> without notice, and the occasional crash. Bug reports and PRs are welcome, but
-> please don't depend on this for anything important. No warranty — see [LICENSE](LICENSE).
+> **Status: spike.** A quick experiment to see how LLM tooling can be built
+> from scratch in .NET. Things may be broken or not work as advertised. No warranty — see [LICENSE](LICENSE).
 
 ## Prerequisites
 

From 3554cc4662d1773cced641fffdbd3cdec610c117 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 15 Apr 2026 14:06:04 +0000
Subject: [PATCH 5/5] docs: note that the ASP.NET server host is untested
 end-to-end

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 67a59d8..2a01f98 100644
--- a/README.md
+++ b/README.md
@@ -230,6 +230,10 @@ Prints all GGUF metadata key/value pairs in a table (architecture, context lengt
 
 ## API Server
 
+> **Note:** The ASP.NET host hasn't been exercised end-to-end — it builds and the
+> endpoint handlers have unit tests, but running against real clients has not been
+> validated. Expect it to need fixes.
+
 Starts an HTTP server compatible with OpenAI and Anthropic clients. Defaults to `http://localhost:5000`.
 
 ```bash