
server : enable multi-modal prompt caching #19877

Merged

ggerganov merged 1 commit into master from gg/server-mtmd-prompt-cache on Feb 25, 2026

Conversation

@ggerganov (Member)

target #19849
cont #16391

We can now clone server_tokens, so this PR re-enables host-memory prompt caching for multi-modal cases.
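
As a rough sketch of why cloneability matters here (all type and member names below are illustrative assumptions, not the actual server code): once the token container has value semantics, a slot's prompt, including its multi-modal chunks, can be deep-copied into a host-memory cache and restored on a later matching request.

```cpp
// Minimal illustration only -- not the llama.cpp implementation.
// The hypothetical server_tokens_sketch stands in for server_tokens.
#include <string>
#include <unordered_map>
#include <vector>

struct server_tokens_sketch {
    std::vector<int>         text_tokens;   // regular text token ids
    std::vector<std::string> media_chunks;  // opaque multi-modal chunks (e.g. image data)
    // Value semantics: the implicitly generated copy constructor performs
    // the deep "clone" that previously blocked caching multi-modal prompts.
};

struct prompt_cache_sketch {
    std::unordered_map<std::string, server_tokens_sketch> entries;

    // Snapshot a processed prompt into host memory.
    void save(const std::string & key, const server_tokens_sketch & prompt) {
        entries[key] = prompt; // deep copy
    }

    // Return the cached prompt for this key, or nullptr on a miss.
    const server_tokens_sketch * load(const std::string & key) const {
        auto it = entries.find(key);
        return it == entries.end() ? nullptr : &it->second;
    }
};
```

Per the description above, the substantive change is simply that server_tokens can now be cloned, which is the operation host-memory prompt caching needs in the multi-modal case.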

@andyceo commented Feb 25, 2026

I switched to gg/server-mtmd-prompt-cache, ran git rebase master, and built the app.

I can confirm that prompt caching now works! Note that I ran with no-mmproj-offload = on and did not try without it.

VRAM usage is quite unstable for qwen3.5-27B, but that is a separate issue. Thank you!

Base automatically changed from pr/19747-alt to master on February 25, 2026 13:14
ggerganov force-pushed the gg/server-mtmd-prompt-cache branch from f94fc71 to dc4d447 on February 25, 2026 13:15
ggerganov merged commit f20469d into master on Feb 25, 2026
75 of 76 checks passed
ggerganov deleted the gg/server-mtmd-prompt-cache branch on February 25, 2026 13:15

@BVEsun commented Feb 26, 2026

Thank you very much, ggerganov, for your effort in troubleshooting this problem.

I am using the Windows 10 b8157 release with CUDA 12.4.

After this PR, text prompt processing is being handled by the CPU instead of the GPU, which is causing a significant slowdown.

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request on Mar 2, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request on Mar 3, 2026