From 240be373915d34282fad8044ecea9d7e410bd52b Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 15 Apr 2026 13:58:09 +0000
Subject: [PATCH 1/5] docs: add MIT LICENSE and prep for public release

- Add MIT LICENSE (required before flipping repo to public)
- README: link to LICENSE; clarify CUDA DLL requirements (NVRTC resolver also supports CUDA 12.x / 11.2+, cublas/cudart remain CUDA 11)
- CLAUDE.md, design doc: fix upscaler example file extension (.pth -> .safetensors; RRDBNet.Load only supports safetensors)
---
 CLAUDE.md                     |  2 +-
 LICENSE                       | 21 +++++++++++++++++++++
 README.md                     |  6 +++++-
 docs/SharpInference-Design.md |  2 +-
 4 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 LICENSE

diff --git a/CLAUDE.md b/CLAUDE.md
index 4ae9068..ea00f4b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -39,7 +39,7 @@ dotnet run --project src/SharpInference.Cli -c Release -- image \
   --vae models/z-image-turbo/vae \
   --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
   --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
-  --upscaler models/RealESRGAN_x4plus.pth \
+  --upscaler models/RealESRGAN_x4plus.safetensors \
   --upscale-blend 0.8 \
   -p "a serene mountain lake at sunrise" -W 512 -H 512 --steps 4 -o out.png
 
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..c3538a8
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Pekka Heikura
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
index d996cc8..d1bb382 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Includes an OpenAI- and Anthropic-compatible API server and native pipelines for
 |---------|-----------|-------|
 | **Faster batched GEMM (CPU)** | [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS/releases) | Place `libopenblas.dll` in `tools/openblas/` or system PATH. Auto-detected at startup; silently skipped if absent. |
 | **GPU inference (Vulkan)** | Vulkan-capable GPU + drivers | Works on AMD/Intel/NVIDIA. No extra install on Windows — just up-to-date GPU drivers. The `VULKAN_SDK` env var is used for shader recompilation only. |
-| **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll`, `cudart64_110.dll`, and `nvrtc64_11*.dll` on PATH. NVIDIA GPU only. Used for image generation pipelines. |
+| **GPU inference (CUDA)** | [CUDA Toolkit 11.x](https://developer.nvidia.com/cuda-toolkit) | Requires `cublas64_11.dll` and `cudart64_110.dll` on PATH (CUDA 11 runtime). NVRTC resolver additionally tries `nvrtc64_120_0.dll` (CUDA 12.x), then `nvrtc64_112_0.dll`, then `nvrtc64_11*.dll`. NVIDIA GPU only. Used for image generation pipelines. |
 | **Image upscaling (RRDBNet)** | CUDA (above) | Real-ESRGAN ×2/×4 upscaler. Falls back to bicubic if CUDA is unavailable. |
 
 ## Getting Models
@@ -336,3 +336,7 @@ dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'
 
 See [docs/SharpInference-Design.md](docs/SharpInference-Design.md).
 
+## License
+
+Released under the [MIT License](LICENSE).
+
diff --git a/docs/SharpInference-Design.md b/docs/SharpInference-Design.md
index b8a667a..438b09e 100644
--- a/docs/SharpInference-Design.md
+++ b/docs/SharpInference-Design.md
@@ -2252,7 +2252,7 @@ dotnet run --project src/SharpInference.Cli -- image \
   --vae models/z-image-turbo/vae \
   --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
   --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
-  --upscaler models/RealESRGAN_x4plus.pth \
+  --upscaler models/RealESRGAN_x4plus.safetensors \
   --upscale-blend 0.8 \
   -p "photorealistic woman with red lipstick" -W 512 -H 512 -o out.png
 ```

From 9dab550acb01c38d2276796a264d1a41248dec9e Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 15 Apr 2026 14:01:29 +0000
Subject: [PATCH 2/5] docs: document helper scripts in scripts/ directory

Adds a Helper Scripts section to README describing download-model.ps1, setup-openblas.ps1, setup-llamacpp.ps1, generate-reference-logits.ps1, and the two Python helpers, with a typical first-time-setup snippet.
---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/README.md b/README.md
index d1bb382..9d155cb 100644
--- a/README.md
+++ b/README.md
@@ -317,6 +317,29 @@ dotnet publish src/SharpInference.Server -c Release -r win-x64
 dotnet run --project benchmarks/SharpInference.Bench -c Release -- --filter '*'
 ```
 
+## Helper Scripts
+
+The `scripts/` directory contains optional helpers for development and validation. The PowerShell scripts target Windows; the Python scripts require [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python).
+
+| Script | Purpose |
+|--------|---------|
+| `download-model.ps1` | Downloads GGUF models into `models/` from Hugging Face. Accepts `-Model ` for any of `smollm2`, `qwen3-8b`, `llama31-70b`, `qwen3-coder-30b-a3b`, `llama4-scout`, `z-image-turbo`, `z-image-turbo-q8`, `realesrgan-x4`. Skips files already present. |
+| `setup-openblas.ps1` | Downloads OpenBLAS (default `0.3.28`) and installs `libopenblas.dll` into `tools/openblas/` for the optional CPU GEMM acceleration path. |
+| `setup-llamacpp.ps1` | Downloads prebuilt llama.cpp binaries into `tools/llama.cpp/`. Variants: `cpu` (default), `vulkan`, `cuda-12.4`, `cuda-13.1`. Used as the reference implementation for forward-pass validation. |
+| `generate-reference-logits.ps1` | Runs llama.cpp with `--logits-all` on a fixed prompt and writes reference logits to `tests/reference-data/` for comparison against the SharpInference forward pass. Requires `setup-llamacpp.ps1` and `download-model.ps1 -Model smollm2` to have been run first. |
+| `compare_tokens.py` | Python helper that tokenizes a chat prompt with `llama-cpp-python` and prints top-5 logits at each step. Used to debug divergence against Llama 4 Scout. |
+| `extract_reference.py` | Python helper that prints model metadata (`n_vocab`, `n_ctx_train`, `n_embd`) and token IDs for prompt fragments. Useful when investigating tokenizer disagreements. |
+
+Typical first-time setup on Windows:
+
+```powershell
+# From repo root
+.\scripts\setup-openblas.ps1 # optional, enables OpenBLAS GEMM
+.\scripts\download-model.ps1 -Model smollm2 # fetch a small test model
+.\scripts\setup-llamacpp.ps1 # optional, for reference validation
+.\scripts\generate-reference-logits.ps1 # optional, regenerates tests/reference-data/
+```
+
 ## Projects
 
 | Project | Description |

From 626f00e109141806fb02c1deed1f11dfa820ef9c Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 15 Apr 2026 14:04:31 +0000
Subject: [PATCH 3/5] docs: add experimental/WIP status disclaimer to README

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index 9d155cb..615b03a 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,12 @@ Runs GGUF models on CPU (AVX2/AVX-512 SIMD) and GPU (Vulkan compute shaders or C
 
 Includes an OpenAI- and Anthropic-compatible API server and native pipelines for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) and FLUX.1 image generation.
 
+> ⚠️ **Status: experimental / work-in-progress.** This is a personal research project
+> shared as-is. Features documented here may be incomplete, subtly broken, or behave
+> differently than advertised on your hardware. Expect rough edges, breaking changes
+> without notice, and the occasional crash. Bug reports and PRs are welcome, but
+> please don't depend on this for anything important. No warranty — see [LICENSE](LICENSE).
+
 ## Prerequisites
 
 ### Required

From 47e426845b13638dee6f1fd5e787608fdda72712 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 15 Apr 2026 14:05:30 +0000
Subject: [PATCH 4/5] docs: reframe README disclaimer as a spike

---
 README.md | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 615b03a..67a59d8 100644
--- a/README.md
+++ b/README.md
@@ -5,11 +5,8 @@ Runs GGUF models on CPU (AVX2/AVX-512 SIMD) and GPU (Vulkan compute shaders or C
 
 Includes an OpenAI- and Anthropic-compatible API server and native pipelines for [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) and FLUX.1 image generation.
 
-> ⚠️ **Status: experimental / work-in-progress.** This is a personal research project
-> shared as-is. Features documented here may be incomplete, subtly broken, or behave
-> differently than advertised on your hardware. Expect rough edges, breaking changes
-> without notice, and the occasional crash. Bug reports and PRs are welcome, but
-> please don't depend on this for anything important. No warranty — see [LICENSE](LICENSE).
+> **Status: spike.** A quick experiment to see how LLM tooling can be built
+> from scratch in .NET. Things may be broken or not work as advertised. No warranty — see [LICENSE](LICENSE).
 
 ## Prerequisites
 

From 3554cc4662d1773cced641fffdbd3cdec610c117 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 15 Apr 2026 14:06:04 +0000
Subject: [PATCH 5/5] docs: note that the ASP.NET server host is untested end-to-end

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 67a59d8..2a01f98 100644
--- a/README.md
+++ b/README.md
@@ -230,6 +230,10 @@ Prints all GGUF metadata key/value pairs in a table (architecture, context lengt
 
 ## API Server
 
+> **Note:** The ASP.NET host hasn't been exercised end-to-end — it builds and the
+> endpoint handlers have unit tests, but running against real clients has not been
+> validated. Expect it to need fixes.
+
 Starts an HTTP server compatible with OpenAI and Anthropic clients. Defaults to `http://localhost:5000`.
 
 ```bash