server: rename legacy --ctx-size to --kv-size option #5546
phymbert wants to merge 4 commits into ggml-org:master
Conversation
|
Gatekeeping intensifies.. |
@pugzly Agreed, server |
|
Taking into account @pugzly's feedback, @ggerganov, I am wondering whether this change should also be applied to the whole code base. For example, |
|
I'm new to llama.cpp, but the behavior of this option was immediately obvious from the first-run output: the "context size" gets divided by the number of slots, so it is not exactly a context size but rather the total space allocated for context. I'm just an N=1 data point, but I think the confusion could be corrected simply by updating the server example's docs to cover parallel slots and the need to raise ctx_size to slots*ctx_size. Alternatively, the code could do the multiplication itself and treat ctx_size as the size for a single slot. |
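To make the arithmetic in the comment above concrete, here is a minimal sketch of the two readings of the option. The function names are hypothetical, and the even split via integer division is an assumption based on the "divided by the number of slots" description, not a copy of the server code.

```python
def per_slot_context(ctx_size: int, n_slots: int) -> int:
    """Tokens each slot gets when ctx_size is the total shared by all slots.

    Assumption: the total is split evenly, rounding down.
    """
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    return ctx_size // n_slots


def total_for_slots(per_slot: int, n_slots: int) -> int:
    """Total size to request so that every slot gets per_slot tokens.

    This is the slots*ctx_size adjustment suggested in the comment.
    """
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    return per_slot * n_slots


if __name__ == "__main__":
    # With 4 slots, a total of 4096 leaves 1024 tokens per slot.
    print(per_slot_context(4096, 4))
    # To give each of the 4 slots 4096 tokens, request 16384 in total.
    print(total_for_slots(4096, 4))
```

Under the first reading the user must do the multiplication by hand; under the second, the code would do it and the option would describe a single slot.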
Yes, eventually |
OK reverted to draft PR, I will give it a try. |
cd99def to c8e172a
|
@ggerganov I have tried, but I have the feeling that it's a big-bang change, and I am not confident that I should be the one to bring it to master. Even though I spent some time on it, please feel free to simply close the PR; otherwise I will add any changes you request. |
|
No worries, I've moved the changes to #5568 in order to run |
That is great, but there is a year's worth of guides, tutorials, and applications built around and on top of llama.cpp, many of which may be rendered obsolete by this, for the most part, "aesthetic" change. |
|
Don't worry - when and if the change is applied, there will be deprecation notices. Plus it's actually a tiny API change (see |
Context
--ctx-size is a legacy name that predates the introduction of parallelism slots and creates confusion (see discussion #4130).
Proposed changes
Introduce the --kv-size option and deprecate the --ctx-size one.
@ggerganov Thanks for the amazing job you are doing here; I hope this small contribution will help.
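As an illustration of the proposal, a deprecated alias with a notice could look roughly like the sketch below. It uses Python's argparse purely for illustration and is not the server's actual argument parser; `parse_args`, `DEFAULT_KV_SIZE`, and the value 512 are hypothetical placeholders.

```python
import argparse
import warnings

# Hypothetical placeholder default, not taken from the PR.
DEFAULT_KV_SIZE = 512


def parse_args(argv):
    """Parse --kv-size, accepting --ctx-size as a deprecated alias."""
    parser = argparse.ArgumentParser(prog="server-sketch")
    parser.add_argument("--kv-size", type=int, default=None,
                        help="total KV cache size shared by all slots")
    parser.add_argument("--ctx-size", type=int, default=None,
                        help="deprecated: use --kv-size")
    args = parser.parse_args(argv)

    if args.ctx_size is not None:
        # Keep old command lines working, but tell the user to migrate.
        warnings.warn("--ctx-size is deprecated; use --kv-size instead",
                      DeprecationWarning, stacklevel=2)
        if args.kv_size is None:
            args.kv_size = args.ctx_size

    if args.kv_size is None:
        args.kv_size = DEFAULT_KV_SIZE
    return args


if __name__ == "__main__":
    warnings.simplefilter("always")
    print(parse_args(["--ctx-size", "2048"]).kv_size)
```

In this sketch the new name wins when both options are given, so existing guides and scripts that pass the old flag keep working while still seeing the notice.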