Feature Request: Systematic prefix-cache stability, learning from deepseek-reasonix's 99%+ cache hit architecture (#2264)

## Problem

CodeWhale already has prefix-cache awareness in its DNA — the "volatile-content-last invariant" in `prompts.rs`, byte-stable assistant message tests in `client.rs`, and a cache-hit-percent footer chip. But these are **best-effort conventions**, not a systematic invariant enforced at the architecture level.

[deepseek-reasonix](https://github.com/esengine/deepseek-reasonix) takes a different approach: it treats prefix-cache stability as a **hard architectural invariant**, not a guideline. The reported result is 85–99%+ cache hit rates in real sessions. At DeepSeek's pricing (¥0.02/1M cached vs ¥1/1M uncached input tokens), a cached input token is 50× cheaper than an uncached one, so the input cost of a session approaches a ~50× reduction as the hit rate approaches 100%.

## What deepseek-reasonix does that CodeWhale could adopt

### 1. Byte-stable prompt construction as first-class invariant

Reasonix divides every request into three rigid zones that never shift:

- **Prefix (pinned)**: system prompt, tool definitions, persistent memory — hashed at construction, **never** mutated mid-session
- **Log (append-only)**: conversation history — only appended, never reordered or edited
- **Scratch (ephemeral)**: per-turn metadata — wiped at every turn boundary

Any code path that would mutate the prefix or reorder the log is **rejected at the framework level**, not caught in review.

This contrasts with CodeWhale's current approach where the volatile-content boundary is documented in comments but not enforced — a stray `edit_file` to instructions or an unlucky `/compact` can silently bust the cache.
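
The three-zone split can be sketched with ordinary Rust visibility rules. This is a minimal illustration, not Reasonix's or CodeWhale's actual code: the module name `zones`, the `Session` type, and every method name are hypothetical. The idea is that code outside the module has no way to mutate the prefix or rewrite the log, because no such method is exported.

```rust
// Hypothetical sketch: none of these types exist in CodeWhale today.
pub mod zones {
    /// Pinned zone: built once, then read-only (fields are private).
    pub struct PinnedPrefix {
        bytes: Vec<u8>,
    }

    impl PinnedPrefix {
        /// Concatenates the static parts in a fixed order.
        pub fn new(system_prompt: &str, tool_definitions: &str, memory: &str) -> Self {
            let mut bytes = Vec::new();
            for part in [system_prompt, tool_definitions, memory] {
                bytes.extend_from_slice(part.as_bytes());
                bytes.push(b'\n');
            }
            Self { bytes }
        }

        pub fn as_bytes(&self) -> &[u8] {
            &self.bytes
        }
    }

    /// Log zone: the only exported mutation is `append`.
    #[derive(Default)]
    pub struct AppendLog {
        entries: Vec<String>,
    }

    impl AppendLog {
        pub fn append(&mut self, message: impl Into<String>) {
            self.entries.push(message.into());
        }

        pub fn entries(&self) -> &[String] {
            &self.entries
        }
    }

    /// Scratch zone: per-turn metadata, wiped at each turn boundary.
    #[derive(Default)]
    pub struct TurnScratch {
        notes: Vec<String>,
    }

    impl TurnScratch {
        pub fn note(&mut self, text: impl Into<String>) {
            self.notes.push(text.into());
        }
    }

    pub struct Session {
        prefix: PinnedPrefix,
        log: AppendLog,
        scratch: TurnScratch,
    }

    impl Session {
        pub fn new(prefix: PinnedPrefix) -> Self {
            Self { prefix, log: AppendLog::default(), scratch: TurnScratch::default() }
        }

        pub fn log_mut(&mut self) -> &mut AppendLog {
            &mut self.log
        }

        pub fn scratch_mut(&mut self) -> &mut TurnScratch {
            &mut self.scratch
        }

        /// Serializes prefix, then log, then scratch (volatile content last).
        pub fn render_request(&self) -> Vec<u8> {
            let mut out = self.prefix.as_bytes().to_vec();
            for line in self.log.entries().iter().chain(self.scratch.notes.iter()) {
                out.extend_from_slice(line.as_bytes());
                out.push(b'\n');
            }
            out
        }

        /// Turn boundary: only the scratch zone is cleared.
        pub fn end_turn(&mut self) {
            self.scratch.notes.clear();
        }
    }
}
```

With this shape, a caller outside `zones` that tries to edit an earlier log entry or rebuild the prefix mid-session fails to compile, which is one way to read "rejected at the framework level". Operations such as `/compact` would need an explicit, separately named escape hatch.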

### 2. Cross-session cache persistence

In Reasonix the prefix stays stable **across sessions** because the pinned zone is constructed deterministically from configuration. Reopening a session reconstructs a byte-identical prefix, so the first API call of a new session can still hit the cache, provided the previous session ran recently enough that the provider has not yet evicted the cached entry.

### 3. Cache-first cost visibility

Reasonix surfaces cache economics directly: every turn shows cache hit rate %, estimated cost with/without caching, and cumulative savings. CodeWhale's footer chip (red <40%, yellow <80%) is a good start but doesn't show the cost impact.
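
The cost translation is simple arithmetic over the two token counters CodeWhale already tracks. The sketch below assumes the prices quoted in this issue (¥0.02 and ¥1 per million input tokens) and covers input tokens only; the `TurnCost` type and `turn_cost` function are hypothetical names, and output-token pricing is deliberately left out.

```rust
// Prices in yuan per one million input tokens, taken from this issue.
// Treat them as assumptions; real pricing varies by model and over time.
pub const CACHED_YUAN_PER_MTOK: f64 = 0.02;
pub const UNCACHED_YUAN_PER_MTOK: f64 = 1.0;

/// Hypothetical per-turn summary for a footer or `/cache stats` view.
pub struct TurnCost {
    pub hit_rate_percent: f64,
    pub with_cache_yuan: f64,
    pub without_cache_yuan: f64,
}

impl TurnCost {
    /// Yuan saved on input tokens for this turn thanks to cache hits.
    pub fn savings_yuan(&self) -> f64 {
        self.without_cache_yuan - self.with_cache_yuan
    }
}

/// Computes input-token cost from cache hit and miss token counts.
pub fn turn_cost(hit_tokens: u64, miss_tokens: u64) -> TurnCost {
    let hit = hit_tokens as f64;
    let miss = miss_tokens as f64;
    let total = hit + miss;
    let hit_rate_percent = if total == 0.0 { 0.0 } else { 100.0 * hit / total };
    TurnCost {
        hit_rate_percent,
        // Hits are billed at the cached price, misses at the full price.
        with_cache_yuan: (hit * CACHED_YUAN_PER_MTOK + miss * UNCACHED_YUAN_PER_MTOK) / 1e6,
        // Counterfactual: every input token billed at the full price.
        without_cache_yuan: total * UNCACHED_YUAN_PER_MTOK / 1e6,
    }
}
```

For example, a turn with 900,000 hit tokens and 100,000 miss tokens costs about ¥0.118 instead of ¥1.00 under these prices. That is roughly an 8.5× reduction at a 90% hit rate, which shows why the last few percentage points of hit rate matter so much.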

## Existing CodeWhale foundation (good news — not starting from zero)

| Component | What exists |
|-----------|-------------|
| `prompts.rs:614` | Volatile-content-last invariant (documented) |
| `client.rs:1529` | Byte-stable assistant message test |
| `client.rs` / `ui.rs` | `prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` tracked per turn |
| Footer | Cache hit % chip with color thresholds |
| System prompt | Layered most-static-first for DeepSeek KV cache |

## What's missing

1. **No architectural enforcement** — the volatile-content boundary is a comment, not a compile-time or runtime gate
2. **No prefix hashing** — we can't detect when the prefix has been mutated and warn
3. **No cross-session prefix reuse** — restarting CodeWhale invalidates the entire cache
4. **No cost-equivalent visibility** — cache hit % is shown but without translating to actual ¥ saved
5. **`/compact` busts cache** — the compaction relay intentionally rewrites the prefix, and there's no strategy to mitigate the cost

## Suggested approach

### Phase A — Harden existing invariants (low risk)
- Add a compile-time check or runtime assertion that system prompt construction is deterministic (for example, build the prompt twice and assert that the bytes are identical)
- Warn (footer yellow) when `prompt_cache_hit_tokens` drops significantly between turns
- Add a `/cache stats` command showing cumulative cache savings in ¥
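
The drop warning in Phase A could be as small as a monitor that remembers the previous turn's hit rate. The following is a sketch under assumptions: `HitRateMonitor` is a hypothetical type, and the threshold that counts as a "significant" drop is left as a parameter because this issue does not fix a value.

```rust
/// Hypothetical monitor fed with the per-turn cache counters.
pub struct HitRateMonitor {
    previous_percent: Option<f64>,
    /// Drop, in percentage points, that triggers a warning (caller's choice).
    drop_threshold: f64,
}

impl HitRateMonitor {
    pub fn new(drop_threshold: f64) -> Self {
        Self { previous_percent: None, drop_threshold }
    }

    /// Records one turn. Returns `Some(drop)` in percentage points when the
    /// hit rate fell by at least the threshold relative to the previous turn.
    pub fn observe(&mut self, hit_tokens: u64, miss_tokens: u64) -> Option<f64> {
        let total = hit_tokens + miss_tokens;
        if total == 0 {
            // No prompt tokens reported; leave the previous rate untouched.
            return None;
        }
        let current = 100.0 * hit_tokens as f64 / total as f64;
        let drop = self.previous_percent.map(|previous| previous - current);
        self.previous_percent = Some(current);
        drop.filter(|d| *d >= self.drop_threshold)
    }
}
```

The first turn of a session never warns because there is nothing to compare against, which also avoids flagging the unavoidable cold-start miss.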

### Phase B — Prefix zone enforcement (medium)
- Formally split prompt construction into `PinnedPrefix` / `AppendLog` / `TurnScratch` zones
- Hash the pinned prefix at session start — warn if any subsequent turn sends a different prefix
- Reject code paths that mutate the log instead of appending
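
Prefix hashing for Phase B needs only a stable hash and a comparison on each turn. The sketch below uses a hand-written 64-bit FNV-1a so the value does not depend on the standard library's hasher implementation; this choice, and the `PrefixGuard` and `PrefixChanged` names, are assumptions for illustration. FNV-1a is not collision-resistant, so a cryptographic hash would be the safer pick if the value is ever persisted or compared across machines.

```rust
/// 64-bit FNV-1a over raw bytes (standard offset basis and prime).
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Returned when a turn would send a prefix different from the pinned one.
#[derive(Debug, PartialEq, Eq)]
pub struct PrefixChanged {
    pub expected: u64,
    pub actual: u64,
}

/// Hypothetical guard created once at session start.
pub struct PrefixGuard {
    expected: u64,
}

impl PrefixGuard {
    /// Hashes the pinned prefix as it is first sent.
    pub fn pin(prefix: &[u8]) -> Self {
        Self { expected: fnv1a64(prefix) }
    }

    /// Call before every request; an `Err` means the cache will be busted.
    pub fn check(&self, prefix: &[u8]) -> Result<(), PrefixChanged> {
        let actual = fnv1a64(prefix);
        if actual == self.expected {
            Ok(())
        } else {
            Err(PrefixChanged { expected: self.expected, actual })
        }
    }
}
```

Whether an `Err` should only warn in the footer or hard-fail the request is a policy decision; warning first seems the lower-risk starting point.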

### Phase C — Cross-session cache persistence (ambitious)
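- When a session restarts within DeepSeek's cache retention window (estimated here at ~5-15 min; the actual eviction behavior should be confirmed against DeepSeek's context caching documentation), reconstruct the identical pinned prefix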
- This requires deterministic system prompt generation (no timestamp-dependent blocks in the pinned zone, etc.)
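
Deterministic reconstruction mostly means removing hidden sources of variation: iteration order and time-dependent text. Here is a minimal sketch, assuming a hypothetical `PrefixConfig` whose tool definitions live in a `BTreeMap` (which iterates in sorted key order) and a separate scratch builder that carries the current date.

```rust
use std::collections::BTreeMap;

/// Hypothetical configuration from which the pinned prefix is rebuilt.
pub struct PrefixConfig {
    pub system_prompt: String,
    /// Tool name -> definition. BTreeMap iterates in sorted key order, so
    /// insertion order cannot leak into the rendered prefix.
    pub tools: BTreeMap<String, String>,
}

/// Renders the pinned prefix from configuration only: no clock, no
/// environment, no randomness.
pub fn build_pinned_prefix(config: &PrefixConfig) -> String {
    let mut out = String::new();
    out.push_str(&config.system_prompt);
    out.push('\n');
    for (name, definition) in &config.tools {
        out.push_str(name);
        out.push_str(": ");
        out.push_str(definition);
        out.push('\n');
    }
    out
}

/// Time-dependent content belongs in the scratch zone, which is sent after
/// the pinned prefix and the log.
pub fn build_turn_scratch(current_date: &str) -> String {
    format!("Current date: {}\n", current_date)
}
```

A `HashMap` in place of the `BTreeMap` would be a subtle cache buster, because its iteration order can differ between process runs.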

## Related

- #1207 `[auto] cost_saving` — existing cost-conscious routing
- #1676 Dual mode — cost savings via Flash/Pro routing
- #2026 Whale-size naming — user-facing model/cost understanding

## Questions for discussion

1. Would a formal `PinnedPrefix` / `AppendLog` / `TurnScratch` split be acceptable, or is it too invasive for the prompt construction pipeline?
2. Cross-session cache persistence requires moving time-dependent blocks (like current date) into the scratch zone — acceptable trade-off?
3. Should this be a standalone initiative or folded into #1676 as a companion cost-saving pillar?

