SWA KV ring: bf16-append wrap + CudaHybridForwardPass batched path are latent-only / untested

Tracking the two latent traps the #164 review surfaced (commit 48a5622), plus one low-value test gap. All three are correct/inert TODAY; this is so they aren't forgotten if those paths ever activate.

**1. bf16 append ring wrap is untested (identity-only today).** `llm_kv_append_bf16` / `llm_kv_append_batched_bf16` got the `pos % max_seq_len` modulo for symmetry with the f32 kernels, but the only consumer of the bf16 KV cache is the full-context GDN-hybrid path, where `pos < max_seq_len` makes the modulo the identity. If a future **windowed** model ever uses the bf16 KV cache, the wrap path would be exercised for the first time with no test. Add a synthetic bf16 ring-wrap bit-wise test (mirror the f32 `CudaGdnBatchedTrunkTests` append tests) when that happens.
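A minimal host-side sketch of what such a synthetic ring-wrap check could look like. This is a reference model only, not the kernel: `f32_to_bf16_bits`, `kv_append_bf16_ref`, and `expected_ring` are hypothetical names, and round-to-nearest-even is an assumption about how the real kernel converts f32 to bf16. A real test would run `llm_kv_append_bf16` on the device and compare its cache against this expectation bit for bit.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern of x (round-to-nearest-even; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def kv_append_bf16_ref(cache, pos, row, max_seq_len):
    """Reference append: write one row at slot pos % max_seq_len (the ring wrap)."""
    width = len(row)
    slot = pos % max_seq_len
    cache[slot * width:(slot + 1) * width] = [f32_to_bf16_bits(v) for v in row]


def make_row(pos, width):
    """Synthetic, position-dependent row so every position is distinguishable."""
    return [pos + 0.25 * j for j in range(width)]


def expected_ring(n_tokens, max_seq_len, width):
    """After n_tokens appends, each slot must hold the LAST position mapped to it."""
    cache = [0] * (max_seq_len * width)
    for slot in range(max_seq_len):
        candidates = [p for p in range(n_tokens) if p % max_seq_len == slot]
        if candidates:
            row = make_row(candidates[-1], width)
            cache[slot * width:(slot + 1) * width] = [f32_to_bf16_bits(v) for v in row]
    return cache


def run_ring_wrap_check(n_tokens=10, max_seq_len=4, width=3):
    """Append n_tokens > max_seq_len rows and compare against the expectation bitwise."""
    cache = [0] * (max_seq_len * width)
    for pos in range(n_tokens):
        kv_append_bf16_ref(cache, pos, make_row(pos, width), max_seq_len)
    return cache, cache == expected_ring(n_tokens, max_seq_len, width)
```

Choosing `n_tokens > 2 * max_seq_len` makes every slot get overwritten at least once, so the modulo stops being the identity, which is exactly the case the current consumer never hits.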

**2. CudaHybridForwardPass SWA ring is bare-window (decode-only).** It deliberately sizes SWA caches at `min(ctx, window)` (not `SwaRingSize` = window+headroom), which is correct because Gemma 4 there is decode-only (`IsBatchedPrefillSupported` requires `!_isGemma4Like`) and per-token decode only needs ring ≥ window. If batched/chunked prefill is ever enabled for Gemma 4 in the hybrid path, the bare-window ring would overwrite a still-needed window — switch it to `SwaRingSize` at that point (there's a code comment flagging this).
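To make the overwrite hazard concrete, here is a toy model (not the real code path). It assumes a batched prefill writes a whole chunk of K/V rows into the ring before any query in that chunk reads its window; `stale_reads` is a hypothetical helper, and the sizes are made up for illustration. The actual `SwaRingSize` headroom is not stated here, so the `window + chunk - 1` bound below is a property of this model only.

```python
def stale_reads(ring_size, window, chunk, n_tokens):
    """Count (query, key) reads where the key's ring slot no longer holds that key."""
    owner = [-1] * ring_size  # which position currently lives in each slot
    stale = 0
    for start in range(0, n_tokens, chunk):
        end = min(start + chunk, n_tokens)
        # Batched append: the whole chunk lands in the ring first.
        for pos in range(start, end):
            owner[pos % ring_size] = pos
        # Then each query in the chunk attends over its sliding window.
        for q in range(start, end):
            for k in range(max(0, q - window + 1), q + 1):
                if owner[k % ring_size] != k:
                    stale += 1  # slot was overwritten by a later position
    return stale


# chunk=1 models per-token decode; chunk>1 models batched/chunked prefill.
decode_bare = stale_reads(ring_size=8, window=8, chunk=1, n_tokens=32)
prefill_bare = stale_reads(ring_size=8, window=8, chunk=4, n_tokens=32)
prefill_headroom = stale_reads(ring_size=8 + 4 - 1, window=8, chunk=4, n_tokens=32)
```

In this model the bare-window ring is clean for per-token decode but produces stale reads as soon as `chunk > 1`, while a ring of at least `window + chunk - 1` slots is clean again. That matches the reasoning above: ring ≥ window is enough only while each step appends a single token.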

**3. Pure-decode multi-wrap is untested.** Generating >4608 tokens to wrap the decode ring is impractically slow via the per-token oracle; the prefill path now covers wrapped reads observably (#164), so this is low value.

No action needed now; reference if touching SWA / bf16-KV / hybrid-Gemma paths.

SWA KV ring: bf16-append wrap + CudaHybridForwardPass batched path are latent-only / untested #166

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

SWA KV ring: bf16-append wrap + CudaHybridForwardPass batched path are latent-only / untested #166

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions