Conversation

Force-pushed from 985fd62 to 47c662b.
> Does …?

In the decoder - yes.
```diff
 // llm_build_context
-static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= n_ctx
+static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= kv_size
```
Looks like this comment was missing a closing `)`.
I do not agree with this change (but I like the underlying intention of making the meaning of `n_ctx` clearer).

As I'm working on supporting Mamba in `llama.cpp`: with Mamba, the KV cache size is tied to the maximum number of distinct sequences processed at the same time, not the "context size".

What I propose instead (and this is what I've started doing in #5328) is to keep `n_ctx`.

TL;DR: renaming `n_ctx` to `kv_size` does not fit models like Mamba, whose cache size is unrelated to the context size.
The `n_ctx` name is causing some confusion, since its actual meaning is the size of the KV cache, while `n_ctx_train` is the training context of the model. This change fixes that, but since it is a big one and touches a lot of stuff, I'm not sure it is worth merging. Maybe sometime in the future, when the time is right.
Original PR: #5546