Validate Gemma 4 SWA long-context correctness against llama.cpp (cross-tool oracle) #165

Follow-up to #164 (commit 48a5622), which fixed a latent bug: Gemma 4 SWA KV caches were window-sized but indexed absolutely, so output was silently wrong once the context exceeded the 512-token sliding window. The fix (a real `pos % cacheSize` ring) is validated **internally** by two checks: chunked-vs-per-token parity on prompts whose final-token window spans a chunk boundary (so the observed logit depends on both cross-chunk and ring-wrapped reads), and a >5K-token coherence run.
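For readers who have not seen #164, the ring indexing can be illustrated with a minimal sketch. This is not sharpi's implementation: the class `RingKVCache` and its methods are hypothetical names, and plain Python values stand in for the K/V rows. It only shows why `pos % cacheSize` keeps reads correct after the position counter passes the window size.

```python
class RingKVCache:
    """Toy sliding-window cache (hypothetical, for illustration only)."""

    def __init__(self, window: int) -> None:
        self.window = window
        # Each slot holds (absolute_position, value), or None if never written.
        self.slots = [None] * window

    def write(self, pos: int, value) -> None:
        # Ring indexing: the slot is the absolute position modulo the window size.
        # Indexing with `pos` directly would run past the buffer once pos >= window.
        self.slots[pos % self.window] = (pos, value)

    def read_window(self, pos: int) -> list:
        # The token at `pos` attends to the last `window` positions, itself included.
        start = max(0, pos - self.window + 1)
        values = []
        for p in range(start, pos + 1):
            stored_pos, value = self.slots[p % self.window]
            # Sanity check: the slot must still hold the position we expect.
            assert stored_pos == p, f"slot for position {p} holds {stored_pos}"
            values.append(value)
        return values


if __name__ == "__main__":
    cache = RingKVCache(window=4)
    for pos in range(10):
        cache.write(pos, f"kv{pos}")
    # Positions 6..9 are the live window after the ring has wrapped twice.
    print(cache.read_window(9))
```

The per-slot position tag exists only so the sketch can assert that a wrapped read returns the entry it expects; a real cache would not need to store it.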

What's missing: an **independent cross-tool** check against llama.cpp at long context. It was skipped during #164 because `llama-cli` hung in interactive mode and matching chat-template tokens between the two tools is a known rabbit hole (see CLAUDE/memory).

Task: greedy-decode a >4096-token prompt on both sharpi (`sharpi-cli.exe -g -1`) and `C:\p\llama.cpp-cuda\bin\llama-cli.exe` with matched tokenization, compare the first ~20 generated tokens, and document the result. To match tokenization, verify token ids via `llama-tokenize`. Drive `llama-cli` non-interactively (pipe EOF / `-no-cnv`, confirmed effective). Low priority: the internal oracle already covers the regression; this is defense-in-depth for a correctness fix.
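The comparison step could be sketched as below, assuming the generated token ids from each tool have already been collected into two lists. How those ids are extracted from each CLI's output is not covered here, and `first_divergence` is a hypothetical helper name, not part of either tool.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int], n: int = 20) -> Optional[int]:
    """Return the index of the first mismatch within the first n tokens, or None.

    A length mismatch inside the first n tokens counts as a divergence at the
    index where the shorter sequence ends.
    """
    head_a, head_b = a[:n], b[:n]
    for i, (x, y) in enumerate(zip(head_a, head_b)):
        if x != y:
            return i
    if len(head_a) != len(head_b):
        return min(len(head_a), len(head_b))
    return None


if __name__ == "__main__":
    # Placeholder ids for illustration only, not real model output.
    sharpi_ids = [101, 7, 42, 9]
    llama_ids = [101, 7, 43, 9]
    idx = first_divergence(sharpi_ids, llama_ids)
    print("match" if idx is None else f"diverged at generated token {idx}")
```

Reporting the index of the first mismatch, rather than a plain pass/fail, makes it easier to tell an early tokenization mismatch from a late numerical drift when documenting the result.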
