fix: restore MiniCPM inference after Granite Four changes#14850
Merged
CISC merged 1 commit intoggml-org:masterfrom Jul 24, 2025
Merged
fix: restore MiniCPM inference after Granite Four changes#14850CISC merged 1 commit intoggml-org:masterfrom
CISC merged 1 commit intoggml-org:masterfrom
Conversation
Member
Hm, I don't think the actual enum values matter. Which parsing is broken? |
MiniCPM models use the llm_build_granite constructor which was changed in the Granite Four PR to use hparams.rope_finetuned instead of a use_rope parameter. MiniCPM models need rope enabled by default. Fixes inference from gibberish to correct responses.
19594e5 to
2cdb760
Compare
Contributor
Author
|
sorry you are right, the enum value doesn't matter. |
CISC
approved these changes
Jul 24, 2025
gabe-l-hart
added a commit
to gabe-l-hart/llama.cpp
that referenced
this pull request
Jul 25, 2025
* origin/master: docs : update HOWTO‑add‑model.md for ModelBase and new model classes (ggml-org#14874) ggml : remove invalid portPos specifiers from dot files (ggml-org#14838) context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870) mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503) rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868) sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855) musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498) sync : ggml cmake : fix usage issues (ggml/1257) ggml-cpu : remove stdlib include from repack.cpp (ggml/1276) context : perform output reorder lazily upon access after sync (ggml-org#14853) chat : fix kimi-k2 chat template (ggml-org#14852) sycl: fixed semantics of block offset calculation (ggml-org#14814) llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850) docs: add libcurl-dev install hint for Linux distros (ggml-org#14801) metal : fix fusion across different encoders (ggml-org#14849) sycl: fix undefined variable in work group size check (ggml-org#14843) convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823) CUDA: fix overflow in FA, tune performance (ggml-org#14840) CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)
blime4
referenced
this pull request
in blime4/llama.cpp
Feb 5, 2026
MiniCPM models use the llm_build_granite constructor which was changed in the Granite Four PR to use hparams.rope_finetuned instead of a use_rope parameter. MiniCPM models need rope enabled by default. Fixes inference from gibberish to correct responses.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This commit fixes MiniCPM model inference that was broken by the Granite Four PR (#13550). The issue had two parts:
Missing LLM_KV_ATTENTION_LAYER_INDICES enum value that was removed, causing enum ordering to shift and breaking model metadata parsing
MiniCPM architecture uses llm_build_granite which was changed to use hparams.rope_finetuned instead of use_rope parameter, but MiniCPM models were not setting this flag correctly
Changes:
Fixes inference output from gibberish to correct model responses.
Tested with MiniCPM 0.5B model showing proper inference: Input: "你好"
Output: "你好,我是MiniCPM系列模型,由面壁智能和OpenBMB开源社区开发。详细信息请访问 https://github.com/OpenBMB/ [end of text]"
Make sure to read the contributing guidelines before submitting a PR