Feature hasn't been suggested before.
Describe the enhancement you want to request
Currently, prune is a basic removal triggered once X tokens have been reached, with only a simple filter over the latest operations. This is not enough when interleaved-thinking models are debugging complex, deep problems across their large context windows: removing the outputs and results of important steps makes them appear less capable or forces them to hallucinate. By implementing summarization and a smarter pruning mechanism, while keeping the token-reduction benefit, we can improve the interaction with the LLM.
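To illustrate the idea, here is a minimal sketch of a summarize-before-prune pass. All names (`Message`, `prune_with_summaries`, the token estimate, the `keep_recent` window) are hypothetical and not the project's actual API; the `summarize` stub stands in for a real LLM summarization call.

```python
# Hypothetical sketch: replace old tool outputs with summaries instead of
# dropping them, until the history fits the token budget.
from dataclasses import dataclass

@dataclass
class Message:
    role: str        # "user", "assistant", or "tool"
    content: str
    tokens: int      # pre-computed token count for this message

def summarize(text: str, max_chars: int = 80) -> str:
    """Placeholder for an LLM summarization call; here, naive truncation."""
    return text if len(text) <= max_chars else text[:max_chars] + " [summarized]"

def prune_with_summaries(history, budget, keep_recent=4):
    """Walk the history oldest-first; summarize tool outputs (rather than
    deleting them) until the total token count is within budget. The most
    recent `keep_recent` messages are always kept verbatim."""
    total = sum(m.tokens for m in history)
    pruned = list(history)
    for i, m in enumerate(pruned):
        if total <= budget:
            break
        # Never touch the most recent messages; the model needs them verbatim.
        if i >= len(pruned) - keep_recent:
            break
        if m.role == "tool":
            summary = summarize(m.content)
            new_tokens = max(1, len(summary) // 4)  # rough token estimate
            if new_tokens < m.tokens:
                total -= m.tokens - new_tokens
                pruned[i] = Message(m.role, summary, new_tokens)
    return pruned, total
```

The key difference from a plain cutoff: the model still sees that each important step happened and what it concluded, so it is less likely to re-run work or invent results it can no longer see.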