Problem
After compaction runs, the context token counter falls back to a heuristic that significantly underestimates the true token count. This causes ShouldCompact() to never trigger auto-compaction, and the next API call blows past the real model context window.
Reproduction / observed flow
- Chat history grows; the API reports real input tokens (e.g. 22k).
  - ShouldCompact() sees lastInputTokens > 0 and uses the accurate count.
- Compaction runs. History is summarized/truncated.
  - GetContextStats() resets lastInputTokens = 0.
- Next turn:
  - ShouldCompact() falls back to compaction.EstimateMessageTokens(history).
- The heuristic counts only session messages and ignores:
  - System prompt
  - Registered tool schemas
  - Previous tool results
- True payload is ~20k+ tokens, but the heuristic reports ~8k.
- Auto-compaction never fires. The API call fails with "context too big".
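To make the flow above concrete, here is a minimal standalone sketch of the decision as I understand it. It is not kit's actual code: the function names, the signature, and the 16000-token trigger limit are all made up for illustration. Only the "use lastInputTokens if > 0, else fall back to the heuristic" behaviour comes from what I observed.

```go
package main

import "fmt"

// tokensForDecision mirrors the observed behaviour: prefer the API-reported
// count and fall back to the heuristic only when that count is zero.
// (Hypothetical helper, not part of kit.)
func tokensForDecision(lastInputTokens, heuristicEstimate int) int {
	if lastInputTokens > 0 {
		return lastInputTokens
	}
	return heuristicEstimate
}

// shouldCompact reports whether the chosen count reaches the trigger limit.
// (Hypothetical stand-in for ShouldCompact(); the real signature may differ.)
func shouldCompact(lastInputTokens, heuristicEstimate, limit int) bool {
	return tokensForDecision(lastInputTokens, heuristicEstimate) >= limit
}

func main() {
	const limit = 16000 // assumed trigger limit, chosen only for illustration

	// Before compaction: the API-reported 22k is used, so compaction fires.
	fmt.Println(shouldCompact(22000, 8000, limit)) // true

	// After compaction: the count was reset to 0 and the heuristic says ~8k,
	// although the real payload is ~20k, so compaction does not fire.
	fmt.Println(shouldCompact(0, 8000, limit)) // false
}
```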
Impact
Any consumer with a large system prompt or many tools will hit this after the first compaction. The user is told to "try compacting" even though compaction already happened.
Suggested fixes
Option A — Improve the heuristic
Make EstimateMessageTokens() accept or account for:
- System prompt size
- Tool schema token count
- Recent tool result tokens
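A rough sketch of what Option A could look like, under some loud assumptions: the estimateInput type and estimateRequestTokens function are hypothetical, and the 4-bytes-per-token rule is only a common rule of thumb, not whatever EstimateMessageTokens() does today.

```go
package main

import (
	"fmt"
	"strings"
)

// approxTokens applies a rough 4-bytes-per-token rule, rounding up.
// This ratio is an assumption, not kit's real heuristic.
func approxTokens(s string) int {
	return (len(s) + 3) / 4
}

// estimateInput is a hypothetical container for everything sent in a request.
type estimateInput struct {
	SystemPrompt string
	ToolSchemas  []string // serialized tool schemas
	Messages     []string // message text, including tool results
}

// estimateRequestTokens sums the system prompt, tool schemas, and messages,
// rather than counting session messages alone.
func estimateRequestTokens(in estimateInput) int {
	total := approxTokens(in.SystemPrompt)
	for _, schema := range in.ToolSchemas {
		total += approxTokens(schema)
	}
	for _, msg := range in.Messages {
		total += approxTokens(msg)
	}
	return total
}

func main() {
	in := estimateInput{
		SystemPrompt: strings.Repeat("x", 4000),
		ToolSchemas:  []string{strings.Repeat("x", 2000), strings.Repeat("x", 2000)},
		Messages:     []string{strings.Repeat("x", 800)},
	}
	messagesOnly := estimateRequestTokens(estimateInput{Messages: in.Messages})
	fmt.Println(messagesOnly, estimateRequestTokens(in)) // 200 2200
}
```

With these made-up sizes, a messages-only estimate sees 200 tokens while the full request is closer to 2200, which is the same kind of gap described in the reproduction.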
Option B — Preserve post-compaction count
Instead of zeroing lastInputTokens, subtract an estimate of the removed history and keep using the API-reported baseline.
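One possible shape for Option B, as a sketch only. adjustAfterCompaction is a hypothetical helper, and adding back the tokens of the inserted summary is my own assumption on top of the "subtract the removed history" idea.

```go
package main

import "fmt"

// adjustAfterCompaction keeps the API-reported baseline instead of zeroing it:
// it subtracts the estimated tokens of the removed history and adds the
// estimated tokens of the summary that replaced it.
// (Hypothetical helper; the summary term is an assumption.)
func adjustAfterCompaction(lastInputTokens, removedEstimate, summaryEstimate int) int {
	adjusted := lastInputTokens - removedEstimate + summaryEstimate
	// Guard against an over-large removal estimate pushing the count below
	// what the summary alone accounts for.
	if adjusted < summaryEstimate {
		adjusted = summaryEstimate
	}
	return adjusted
}

func main() {
	// Illustrative numbers: 22k reported, ~3k of history removed, ~1k summary added.
	fmt.Println(adjustAfterCompaction(22000, 3000, 1000)) // 20000
}
```

Because the system prompt and tool schemas are already included in the API-reported count, they stay included after the adjustment without needing to be estimated separately.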
Option C — Add a safety margin
When lastInputTokens == 0, add a configurable overhead (e.g. + systemTokens + toolTokens) before comparing against the window limit.
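Option C could be as small as the following sketch. effectiveTokens and the overhead parameter are hypothetical names, and the 12000-token overhead is just a number that lines up with the ~8k vs ~20k gap I observed.

```go
package main

import "fmt"

// effectiveTokens returns the count to compare against the window limit.
// When no API-reported count is available, a configurable overhead (for the
// system prompt and tool schemas) is added to the heuristic estimate.
// (Hypothetical helper, not part of kit.)
func effectiveTokens(lastInputTokens, heuristicEstimate, overhead int) int {
	if lastInputTokens > 0 {
		return lastInputTokens
	}
	return heuristicEstimate + overhead
}

func main() {
	const overhead = 12000 // assumed systemTokens + toolTokens

	fmt.Println(effectiveTokens(22000, 8000, overhead)) // 22000
	fmt.Println(effectiveTokens(0, 8000, overhead))     // 20000
}
```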
References
- pkg/kit/compaction.go: lastInputTokens reset after compaction
- internal/compaction/compaction.go: EstimateMessageTokens() implementation
- Observed in downstream project using kit@v0.82.1
Environment
- kit version: v0.82.1
- Model fallback window: 262143 (when model not in registry, compounding the issue)
- Go version: 1.24
Happy to open a PR for Option B or C if you point me toward the preferred approach.