Summary
DeepSeek models in thinking mode fail on follow-up turns with:
litellm.BadRequestError: DeepseekException - {"error":{"message":"The `reasoning_content` in the thinking mode must be passed back to the API.", ...}}
Raven already captures reasoning_content from LiteLLM responses, but the context assembly path strips that field before the next request is built. As a result, the next DeepSeek call replays the assistant message body without the required reasoning_content, and DeepSeek rejects the request.
Steps to reproduce
- Configure Raven to use a DeepSeek thinking-capable model.
- Send a first message that produces an assistant response with reasoning_content.
- Send a follow-up message in the same session.
- Observe the second LLM call fail with:
The `reasoning_content` in the thinking mode must be passed back to the API.
Expected behavior
If Raven persists an assistant message with reasoning_content / thinking_blocks, the next request should replay those fields intact for providers that require them (DeepSeek thinking mode, and likely other reasoning-capable OpenAI-compatible models).
Actual behavior
- Raven preserves reasoning when building the assistant message (raven/utils/helpers.py:76-90).
- LiteLLM request sanitization also allows reasoning_content (raven/providers/litellm_provider.py:35, _ALLOWED_MSG_KEYS).
- But the Curator history projection drops reasoning_content / thinking_blocks before the next request is assembled:
  - fast path: raven/context_engine/segments/curator.py:295-305
  - trimmed path: raven/context_engine/history_trimmer.py:32-35, 116-121
- That means the next prompt contains the assistant content but not the paired reasoning payload DeepSeek expects.
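The drop described above can be checked mechanically by comparing the stored messages with the projected history. The following is a minimal sketch and not Raven code: `find_dropped_reasoning` and `REASONING_KEYS` are hypothetical names, and the sketch assumes the projection keeps messages in the same order and count as its input.

```python
from typing import Any

# Hypothetical constant: the two reasoning fields named in this report.
REASONING_KEYS = ("reasoning_content", "thinking_blocks")


def find_dropped_reasoning(
    stored: list[dict[str, Any]],
    projected: list[dict[str, Any]],
) -> list[tuple[int, str]]:
    """Return (index, key) pairs for reasoning fields lost in projection.

    Assumes `projected` has the same length and order as `stored`.
    """
    dropped: list[tuple[int, str]] = []
    for i, (before, after) in enumerate(zip(stored, projected)):
        if before.get("role") != "assistant":
            continue  # only assistant messages carry reasoning fields
        for key in REASONING_KEYS:
            if key in before and key not in after:
                dropped.append((i, key))
    return dropped


stored = [
    {"role": "user", "content": "hi"},
    {
        "role": "assistant",
        "content": "answer",
        "reasoning_content": "think",
        "thinking_blocks": [{"thinking": "block"}],
    },
]
# Shape of the projected history reported in this issue.
projected = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "answer"},
]

print(find_dropped_reasoning(stored, projected))
# => [(1, 'reasoning_content'), (1, 'thinking_blocks')]
```

A non-empty result on a DeepSeek thinking-mode session would match the symptom reported here.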
Root cause analysis
The bug is in history projection, not in response capture:
- Raven stores reasoning fields on assistant messages: raven/utils/helpers.py:76-90
- The context engine always uses ContextAssembler / Curator history selection for main-agent turns:
  - raven/context_engine/assembler.py:61-65
  - raven/agent/loop/main.py:664-672, 737-742
- Both Curator history filters use a fixed allowlist that excludes reasoning_content and thinking_blocks:
  - raven/context_engine/segments/curator.py:295-305
  - raven/context_engine/history_trimmer.py:32-35, 116-121
- Minimal reproduction from the current tree:
    from raven.context_engine.segments.curator import CuratorSegmentBuilder

    msgs = [
        {"role": "user", "content": "hi"},
        {
            "role": "assistant",
            "content": "answer",
            "reasoning_content": "think",
            "thinking_blocks": [{"thinking": "block"}],
        },
    ]
    print(CuratorSegmentBuilder._history_from_messages(msgs))
    # => [{'role': 'user', 'content': 'hi'}, {'role': 'assistant', 'content': 'answer'}]
Expected fix direction
The history projection allowlists should preserve reasoning fields when present, at least for providers/models that require exact replay. A regression test should cover multi-turn DeepSeek thinking-mode conversations so follow-up turns keep passing.
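One possible shape for that fix, as a standalone sketch under stated assumptions: `project_history`, `BASE_KEYS`, and the `keep_reasoning` flag are hypothetical and do not mirror the actual code in curator.py or history_trimmer.py, whose allowlists may contain different keys. Whether replay should be unconditional or gated per provider is left open in this report, so the flag only illustrates the gated variant.

```python
from typing import Any

# Hypothetical allowlists; the real key sets in Raven may differ.
BASE_KEYS = frozenset({"role", "content", "tool_calls", "tool_call_id", "name"})
REASONING_KEYS = frozenset({"reasoning_content", "thinking_blocks"})


def project_history(
    messages: list[dict[str, Any]],
    keep_reasoning: bool = True,
) -> list[dict[str, Any]]:
    """Project messages onto an allowlist, optionally keeping reasoning fields."""
    allowed = (BASE_KEYS | REASONING_KEYS) if keep_reasoning else BASE_KEYS
    return [{k: v for k, v in msg.items() if k in allowed} for msg in messages]


# Regression-style check: reasoning fields must survive projection.
msgs = [
    {"role": "user", "content": "hi"},
    {
        "role": "assistant",
        "content": "answer",
        "reasoning_content": "think",
        "thinking_blocks": [{"thinking": "block"}],
        "internal_id": 7,  # hypothetical non-allowlisted key, still dropped
    },
]
history = project_history(msgs)
assert history[1]["reasoning_content"] == "think"
assert history[1]["thinking_blocks"] == [{"thinking": "block"}]
assert "internal_id" not in history[1]
```

A real regression test would call the Curator fast path and the trimmed path directly, since both apply their own allowlist.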
Environment
OS: macOS (user report)
Shell: zsh (user report)
Provider: DeepSeek
Raven source analyzed: main @ 040fc15
Logs or screenshots
User-observed error:
2026-07-03 14:41:08.742 | ERROR | raven.agent.loop.main:_run_agent_loop:1524 - LLM returned error: Error calling LLM: litellm.BadRequestError: DeepseekException - {"error":{"message":"The `reasoning_content` in the thinking mode must be passed back to the API.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}