Problem
CodeWhale already has prefix-cache awareness in its DNA — the "volatile-content-last invariant" in prompts.rs, byte-stable assistant message tests in client.rs, and a cache-hit-percent footer chip. But these are best-effort conventions, not a systematic invariant enforced at the architecture level.
deepseek-reasonix takes a different approach: it treats prefix-cache stability as a hard architectural invariant, not a guideline. The result is 85–99%+ cache hit rates in real sessions. On DeepSeek's pricing (¥0.02/1M cached vs ¥1/1M uncached), cached input tokens are 50× cheaper, so the input-cost reduction approaches 50× as the hit rate approaches 100%.
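As a worked check on the pricing quoted above (a sketch that assumes cost is linear in token count and looks at input tokens only; the symbols c_hit, c_miss, and h are introduced here for illustration), the effective price per 1M input tokens at cache hit rate h, in ¥, is:

$$
c(h) = h \cdot c_{\text{hit}} + (1 - h) \cdot c_{\text{miss}}, \qquad c_{\text{hit}} = 0.02, \quad c_{\text{miss}} = 1
$$

$$
c(0.85) = 0.017 + 0.15 = 0.167 \;\Rightarrow\; \frac{1}{0.167} \approx 6.0\times, \qquad
c(0.99) = 0.0198 + 0.01 = 0.0298 \;\Rightarrow\; \frac{1}{0.0298} \approx 33.6\times
$$

Under these assumptions the 50× figure is the per-token price ratio and the upper bound on savings. The realized reduction depends strongly on the last few percent of hit rate, which is why a hard invariant matters more than a best-effort convention.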
What deepseek-reasonix does that CodeWhale could adopt
1. Byte-stable prompt construction as first-class invariant
Reasonix divides every request into three rigid zones that never shift:
Prefix (pinned): system prompt, tool definitions, persistent memory — hashed at construction, never mutated mid-session
Log (append-only): conversation history — only appended, never reordered or edited
Scratch (ephemeral): per-turn metadata — wiped at every turn boundary
Any code path that would mutate the prefix or reorder the log is rejected at the framework level, not caught in review.
This contrasts with CodeWhale's current approach where the volatile-content boundary is documented in comments but not enforced — a stray edit_file to instructions or an unlucky /compact can silently bust the cache.
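One way to picture "rejected at the framework level" is to let the type system carry the rule. The sketch below is hypothetical: the type and method names are illustrative and are not taken from Reasonix or CodeWhale. It assumes the three zones live in their own module so that private fields are reachable only through the methods shown.

```rust
// Hypothetical sketch: none of these types exist in CodeWhale or Reasonix.
// Field privacy only enforces the rules when this code sits in its own module.

/// Pinned zone: assembled once, then exposed read-only.
pub struct PinnedPrefix {
    bytes: Vec<u8>,
}

impl PinnedPrefix {
    pub fn new(system_prompt: &str, tool_definitions: &str, memory: &str) -> Self {
        let mut bytes = Vec::new();
        for part in [system_prompt, tool_definitions, memory] {
            bytes.extend_from_slice(part.as_bytes());
            bytes.push(b'\n');
        }
        Self { bytes }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

/// Log zone: the API offers append and read, with no edit or reorder.
#[derive(Default)]
pub struct AppendLog {
    entries: Vec<String>,
}

impl AppendLog {
    pub fn append(&mut self, entry: impl Into<String>) {
        self.entries.push(entry.into());
    }

    pub fn entries(&self) -> &[String] {
        &self.entries
    }
}

/// Scratch zone: per-turn metadata, cleared at every turn boundary.
#[derive(Default)]
pub struct TurnScratch {
    notes: Vec<String>,
}

impl TurnScratch {
    pub fn note(&mut self, note: impl Into<String>) {
        self.notes.push(note.into());
    }
}

pub struct Request {
    prefix: PinnedPrefix,
    log: AppendLog,
    scratch: TurnScratch,
}

impl Request {
    pub fn new(prefix: PinnedPrefix) -> Self {
        Self { prefix, log: AppendLog::default(), scratch: TurnScratch::default() }
    }

    pub fn log_mut(&mut self) -> &mut AppendLog {
        &mut self.log
    }

    pub fn scratch_mut(&mut self) -> &mut TurnScratch {
        &mut self.scratch
    }

    /// Serializes prefix, then log, then scratch, so volatile bytes come last.
    pub fn render(&self) -> Vec<u8> {
        let mut out = self.prefix.as_bytes().to_vec();
        for line in self.log.entries().iter().chain(self.scratch.notes.iter()) {
            out.extend_from_slice(line.as_bytes());
            out.push(b'\n');
        }
        out
    }

    /// Turn boundary: wipe the scratch zone, leave prefix and log untouched.
    pub fn end_turn(&mut self) {
        self.scratch.notes.clear();
    }
}
```

Because `Request` hands out no `&mut PinnedPrefix` and `AppendLog` has no method other than `append` that changes it, a code path that tries to rewrite earlier bytes fails to compile instead of waiting for review.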
2. Cross-session cache persistence
Reasonix sessions can be left running, and the prefix also stays stable across sessions because the pinned zone is constructed deterministically. Reopening a session reconstructs the identical prefix from configuration, so the first API call of a new session can still hit the cache if the previous session was running recently.
3. Cache-first cost visibility
Reasonix surfaces cache economics directly: every turn shows cache hit rate %, estimated cost with/without caching, and cumulative savings. CodeWhale's footer chip (red <40%, yellow <80%) is a good start but doesn't show the cost impact.
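The per-turn numbers described above can be derived from the two counters CodeWhale already tracks. The following is a minimal sketch, not the Reasonix implementation: `Pricing` and `TurnCacheStats` are hypothetical names, prices are passed in by the caller, and only input tokens are costed.

```rust
/// Hypothetical price table, in currency units per 1M input tokens.
pub struct Pricing {
    pub cached_per_million: f64,
    pub uncached_per_million: f64,
}

/// Per-turn counters, mirroring prompt_cache_hit_tokens / prompt_cache_miss_tokens.
pub struct TurnCacheStats {
    pub hit_tokens: u64,
    pub miss_tokens: u64,
}

impl TurnCacheStats {
    /// Cache hit rate in percent; 0.0 when the turn had no input tokens.
    pub fn hit_rate_percent(&self) -> f64 {
        let total = self.hit_tokens + self.miss_tokens;
        if total == 0 {
            return 0.0;
        }
        100.0 * self.hit_tokens as f64 / total as f64
    }

    /// Input cost actually incurred, with cached tokens billed at the cached rate.
    pub fn cost(&self, p: &Pricing) -> f64 {
        (self.hit_tokens as f64 * p.cached_per_million
            + self.miss_tokens as f64 * p.uncached_per_million)
            / 1_000_000.0
    }

    /// Input cost if every token had been a cache miss.
    pub fn cost_without_cache(&self, p: &Pricing) -> f64 {
        (self.hit_tokens + self.miss_tokens) as f64 * p.uncached_per_million / 1_000_000.0
    }

    /// Amount saved by caching on this turn.
    pub fn savings(&self, p: &Pricing) -> f64 {
        self.cost_without_cache(p) - self.cost(p)
    }
}
```

Summing `savings` across turns gives the cumulative figure; keeping prices in a separate struct means a pricing change does not touch the accounting code.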
Existing CodeWhale foundation (good news — not starting from zero)
| Component | What exists |
| --- | --- |
| prompts.rs:614 | Volatile-content-last invariant (documented) |
| client.rs:1529 | Byte-stable assistant message test |
| client.rs / ui.rs | prompt_cache_hit_tokens / prompt_cache_miss_tokens tracked per turn |
| Footer | Cache hit % chip with color thresholds |
| System prompt | Layered most-static-first for DeepSeek KV cache |
What's missing
No architectural enforcement — the volatile-content boundary is a comment, not a compile-time or runtime gate
No prefix hashing — we can't detect when the prefix has been mutated and warn
No cross-session prefix reuse — restarting CodeWhale invalidates the entire cache
No cost-equivalent visibility — cache hit % is shown but without translating to actual ¥ saved
/compact busts cache — the compaction relay intentionally rewrites the prefix, and there's no strategy to mitigate the cost
Suggested approach
Phase A — Harden existing invariants (low risk)
Add a compile-time check or runtime assertion that system prompt construction is deterministic
Warn (footer yellow) when prompt_cache_hit_tokens drops significantly between turns
Add a /cache stats command showing cumulative cache savings in ¥
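The second Phase A item could be sketched as a small detector fed with each turn's `prompt_cache_hit_tokens`. This is an illustration under assumptions: `CacheDropDetector` is a hypothetical name, and the drop threshold is a free parameter, since the issue does not define what counts as "significantly".

```rust
/// Hypothetical detector: flags a turn whose cached-token count fell sharply
/// relative to the previous turn. In an append-only session the cached prefix
/// should grow or stay flat, so a large drop suggests the prefix was rewritten.
pub struct CacheDropDetector {
    last_hit_tokens: Option<u64>,
    /// Fractional drop that triggers a warning, e.g. 0.5 for "fell by more than half".
    max_drop_fraction: f64,
}

impl CacheDropDetector {
    pub fn new(max_drop_fraction: f64) -> Self {
        Self { last_hit_tokens: None, max_drop_fraction }
    }

    /// Records this turn's hit tokens and returns true if a warning is due.
    pub fn observe(&mut self, hit_tokens: u64) -> bool {
        let warn = match self.last_hit_tokens {
            Some(prev) if prev > 0 => {
                (hit_tokens as f64) < prev as f64 * (1.0 - self.max_drop_fraction)
            }
            // First turn, or nothing was cached before: nothing to compare against.
            _ => false,
        };
        self.last_hit_tokens = Some(hit_tokens);
        warn
    }
}
```

Comparing absolute hit tokens instead of the hit percentage avoids false alarms on turns that simply append a large uncached message.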
Phase B — Prefix zone enforcement (medium)
Formally split prompt construction into PinnedPrefix / AppendLog / TurnScratch zones
Hash the pinned prefix at session start — warn if any subsequent turn sends a different prefix
Reject code paths that mutate the log instead of appending
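The hashing step in Phase B might look like the sketch below. It is not taken from either codebase: `PrefixGuard` and `PrefixDrift` are hypothetical names, and FNV-1a is chosen here only because it is short to write out and gives the same value across runs and builds. Any stable hash would serve.

```rust
/// 64-bit FNV-1a over raw bytes; stable across runs, platforms, and compiler versions.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Returned when a turn tries to send a prefix that differs from the pinned one.
#[derive(Debug, PartialEq)]
pub struct PrefixDrift {
    pub expected: u64,
    pub actual: u64,
}

/// Hypothetical guard: remembers the prefix hash taken at session start.
pub struct PrefixGuard {
    expected: u64,
}

impl PrefixGuard {
    /// Call once at session start with the fully rendered pinned prefix.
    pub fn pin(prefix: &[u8]) -> Self {
        Self { expected: fnv1a64(prefix) }
    }

    /// Call before every request; an Err means the prefix bytes changed.
    pub fn check(&self, prefix: &[u8]) -> Result<(), PrefixDrift> {
        let actual = fnv1a64(prefix);
        if actual == self.expected {
            Ok(())
        } else {
            Err(PrefixDrift { expected: self.expected, actual })
        }
    }
}
```

Whether an `Err` becomes a footer warning or a hard failure is a policy choice. Starting with a warning keeps the change low-risk while the invariant is being adopted.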
Phase C — Cross-session cache persistence (ambitious)
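When a session restarts within DeepSeek's cache retention window, reconstruct the identical pinned prefix. Retention is best-effort: DeepSeek's context-caching documentation describes unused entries being cleared after a few hours to a few days, not after a fixed ~5-15 min TTL, so the window should be treated as unknown and possibly longer than assumed here.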
This requires deterministic system prompt generation (no timestamp-dependent blocks in the pinned zone, etc.)
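Deterministic generation mostly means removing hidden sources of variation such as map iteration order and clock reads. A minimal sketch, assuming a hypothetical `PrefixConfig` whose tool definitions are keyed by name (the real CodeWhale configuration shape is not shown in this issue):

```rust
use std::collections::BTreeMap;

/// Hypothetical pinned-zone inputs. BTreeMap iterates in key order, so the
/// rendered output does not depend on the order tools were registered in.
pub struct PrefixConfig {
    pub system_prompt: String,
    pub tools: BTreeMap<String, String>,
}

/// Renders the pinned prefix from configuration only: no timestamps, no
/// random IDs, no environment lookups.
pub fn render_pinned_prefix(cfg: &PrefixConfig) -> String {
    let mut out = String::new();
    out.push_str(&cfg.system_prompt);
    out.push('\n');
    for (name, schema) in &cfg.tools {
        out.push_str(name);
        out.push_str(": ");
        out.push_str(schema);
        out.push('\n');
    }
    out
}
```

Two sessions that load the same configuration then produce byte-identical prefixes, which is the precondition for the first request after a restart to hit an existing cache entry. Anything time-dependent would move to the per-turn scratch zone.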
Related
[auto] cost_saving: existing cost-conscious routing
Questions for discussion
Would a PinnedPrefix / AppendLog / TurnScratch split be acceptable, or is it too invasive for the prompt construction pipeline?