fix: restore MiniCPM inference after Granite Four changes by jk3456a · Pull Request #14850 · ggml-org/llama.cpp

jk3456a · 2025-07-24T07:09:28Z

This commit fixes MiniCPM model inference that was broken by the Granite Four PR (#13550). The issue had two parts:

Missing LLM_KV_ATTENTION_LAYER_INDICES enum value: it had been removed, which shifted the enum ordering and broke model metadata parsing
The MiniCPM architecture uses llm_build_granite, which was changed to use hparams.rope_finetuned instead of a use_rope parameter, but MiniCPM models were not setting this flag

Changes:

Restore LLM_KV_ATTENTION_LAYER_INDICES enum and string mapping
Set hparams.rope_finetuned = true for MiniCPM architecture

Fixes inference output from gibberish to correct model responses.

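Tested with the MiniCPM 0.5B model, which now produces proper output:
Input: "你好" ("Hello")
Output: "你好，我是MiniCPM系列模型，由面壁智能和OpenBMB开源社区开发。详细信息请访问 https://github.com/OpenBMB/ [end of text]"
(English: "Hello, I am a MiniCPM series model, developed by ModelBest and the OpenBMB open-source community. For details, please visit https://github.com/OpenBMB/")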


ggerganov · 2025-07-24T07:16:33Z

> Missing LLM_KV_ATTENTION_LAYER_INDICES enum value that was removed, causing enum ordering to shift and breaking model metadata parsing

Hm, I don't think the actual enum values matter. Which parsing is broken?
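To illustrate why the integer values of the key enum should not matter, here is a minimal C++ sketch. It assumes, as llama.cpp does for GGUF metadata, that each enumerator is mapped to a string name and that the model file is searched by that string, so the integers never leave the process. The enum, the key names, and the `get_key` helper are hypothetical simplifications, not llama.cpp's actual definitions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified key enum (not the real llama.cpp list).
enum kv_key {
    KV_ATTENTION_HEAD_COUNT,     // value 0
    KV_ATTENTION_LAYER_INDICES,  // value 1; deleting this line shifts the next value
    KV_ROPE_FREQ_BASE,           // value 2 (would become 1 without the line above)
};

// Each enumerator is mapped explicitly to its on-disk name, so lookups
// depend only on this table, not on the numeric enum values.
static const std::map<kv_key, const char *> KV_NAMES = {
    { KV_ATTENTION_HEAD_COUNT,    "attention.head_count"    },
    { KV_ATTENTION_LAYER_INDICES, "attention.layer_indices" },
    { KV_ROPE_FREQ_BASE,          "rope.freq_base"          },
};

// Stand-in for metadata read from a model file: string key -> value.
static const std::map<std::string, float> file_metadata = {
    { "attention.head_count", 16.0f    },
    { "rope.freq_base",       10000.0f },
};

// Resolve the enumerator to its string name, then look that name up in the file.
float get_key(kv_key k) {
    const auto name = KV_NAMES.find(k);
    assert(name != KV_NAMES.end() && "every enumerator needs a name entry");
    return file_metadata.at(name->second);
}
```

Removing `KV_ATTENTION_LAYER_INDICES` together with its name entry changes `KV_ROPE_FREQ_BASE` from 2 to 1, yet `get_key(KV_ROPE_FREQ_BASE)` still returns 10000, because only the string is matched against the file.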

MiniCPM models use the llm_build_granite constructor which was changed in the Granite Four PR to use hparams.rope_finetuned instead of a use_rope parameter. MiniCPM models need rope enabled by default. Fixes inference from gibberish to correct responses.

jk3456a · 2025-07-24T08:37:21Z

Sorry, you are right; the enum value doesn't matter.
The actual issue was that MiniCPM models use the llm_build_granite constructor, which was changed in the Granite Four PR to use hparams.rope_finetuned instead of the use_rope parameter. MiniCPM models need rope enabled by default, but weren't setting this flag.
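The failure mode described above can be sketched in C++, assuming a shared builder that gates RoPE on a hyperparameter flag instead of taking a use_rope argument. All names here (`hparams_t`, `load_hparams`, `build_query`, `apply_rope`, `arch`) are hypothetical and only mirror the roles of hparams.rope_finetuned and llm_build_granite; this is not llama.cpp's code. `apply_rope` uses the standard adjacent-pair rotation with a frequency base of 10000.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified types; the real llama.cpp structures differ.
enum class arch { granite, minicpm };

struct hparams_t {
    bool  rope_finetuned = false;   // after the refactor, this flag gates RoPE
    float rope_freq_base = 10000.0f;
};

// Per-architecture hyperparameter loading (hypothetical).
hparams_t load_hparams(arch a) {
    hparams_t hp;
    if (a == arch::minicpm) {
        hp.rope_finetuned = true;   // the fix: MiniCPM always needs RoPE
    }
    // Other architectures would read the flag from model metadata; left false here.
    return hp;
}

// Standard RoPE on one head vector: rotate each pair (x[2i], x[2i+1])
// by the angle pos * base^(-2i/d).
void apply_rope(std::vector<float> & x, int pos, float base) {
    assert(x.size() % 2 == 0 && "RoPE rotates pairs of elements");
    const int d = (int) x.size();
    for (int i = 0; i < d / 2; ++i) {
        const float theta = pos * std::pow(base, -2.0f * i / d);
        const float c = std::cos(theta), s = std::sin(theta);
        const float x0 = x[2 * i], x1 = x[2 * i + 1];
        x[2 * i]     = x0 * c - x1 * s;
        x[2 * i + 1] = x0 * s + x1 * c;
    }
}

// Shared builder: no use_rope parameter any more; it reads the flag instead.
std::vector<float> build_query(const hparams_t & hp, std::vector<float> q, int pos) {
    if (hp.rope_finetuned) {
        apply_rope(q, pos, hp.rope_freq_base);
    }
    return q;  // unchanged when the flag is false: positions are silently ignored
}
```

Because the flag defaults to false, any architecture that reuses the builder without setting it silently skips the positional rotation, which is consistent with the gibberish output reported in this PR. Setting the flag in the per-architecture loader, as the fix does for MiniCPM, restores RoPE without touching the shared builder.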


* origin/master:
  docs : update HOWTO-add-model.md for ModelBase and new model classes (ggml-org#14874)
  ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
  context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
  mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
  rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
  sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
  musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
  sync : ggml
  cmake : fix usage issues (ggml/1257)
  ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
  context : perform output reorder lazily upon access after sync (ggml-org#14853)
  chat : fix kimi-k2 chat template (ggml-org#14852)
  sycl: fixed semantics of block offset calculation (ggml-org#14814)
  llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
  docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
  metal : fix fusion across different encoders (ggml-org#14849)
  sycl: fix undefined variable in work group size check (ggml-org#14843)
  convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
  CUDA: fix overflow in FA, tune performance (ggml-org#14840)
  CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)


jk3456a force-pushed the fix/minicpm-rope-inference branch from 19594e5 to 2cdb760 Compare July 24, 2025 08:33

CISC approved these changes Jul 24, 2025


CISC merged commit 86f5623 into ggml-org:master Jul 24, 2025
47 checks passed

jk3456a deleted the fix/minicpm-rope-inference branch July 24, 2025 14:16

CISC merged 1 commit into ggml-org:master from jk3456a:fix/minicpm-rope-inference