llama : fix build_ffn without gate by ngxson · Pull Request #13336 · ggml-org/llama.cpp

ngxson · 2025-05-06T08:54:57Z

While porting these build_* functions to clip.cpp, I came across the case of an FFN that has no gate (which is quite common in vision transformers).

The current logic in llm_graph_context::build_ffn more or less assumes that the gate always exists (which makes sense, since most, if not all, modern LLMs have an up/gate/down FFN). If the gate is missing, the logic computes cur = up_state * up_state, which gives an incorrect result.
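To make the failure mode concrete, here is a toy scalar sketch. It is not the llama.cpp code: the function names, the scalar weights, and the choice of SiLU are all assumptions made for illustration. It only models the shape of the problem, where the final gating multiply runs even though no gate was supplied, so the up branch ends up multiplied with itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar stand-in for a parallel-gated FFN (not the real build_ffn).
static float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Unchecked variant: the gating multiply is applied unconditionally.
// w_gate == nullptr models a layer that has no gate tensor.
static float ffn_par_unchecked(float x, float w_up, const float * w_gate) {
    float tmp = w_up * x;                      // up projection
    float cur = w_gate ? (*w_gate) * x : tmp;  // no gate: cur falls back to the up state
    cur = silu(cur);                           // activation
    return cur * tmp;                          // no gate: up branch multiplied with itself
}

// What an ungated FFN should compute before the down projection (assumed reference).
static float ffn_no_gate_expected(float x, float w_up) {
    return silu(w_up * x);
}
```

Calling `ffn_par_unchecked(x, w_up, nullptr)` and comparing against `ffn_no_gate_expected(x, w_up)` shows the extra factor of the up state that the ungated case picks up.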

I'm not sure if the code is intended to be written this way, but I think we should either:

* add gate to the check, as proposed in this PR
* or add GGML_ASSERT(gate) to make sure the gate is always there
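The two options can be sketched on a toy scalar model. This is an illustrative sketch only, under the assumption that the gate is an optional pointer and the activation is SiLU; the names `ffn_par_checked` and `ffn_par_strict` are hypothetical, and a plain `assert` stands in for GGML_ASSERT.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar stand-in for a parallel-gated FFN (not the real build_ffn).
static float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Option 1: include the gate in the check, so the multiply only runs when a gate exists.
static float ffn_par_checked(float x, float w_up, const float * w_gate) {
    float tmp = w_up * x;                      // up projection
    float cur = w_gate ? (*w_gate) * x : tmp;  // no gate: continue with the up state
    cur = silu(cur);                           // activation
    if (w_gate) {
        cur = cur * tmp;                       // gating multiply, skipped without a gate
    }
    return cur;
}

// Option 2: refuse to run without a gate (assert used here in place of GGML_ASSERT).
static float ffn_par_strict(float x, float w_up, const float * w_gate) {
    assert(w_gate != nullptr && "gate is required");
    return silu((*w_gate) * x) * (w_up * x);
}
```

Option 1 keeps ungated FFNs (as found in vision transformers) usable through the same function, while option 2 turns a missing gate into an immediate failure instead of a silently wrong result.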

Please note that when porting to ViT, I only copied the LLM_FFN_PAR case, not LLM_FFN_SEQ.


* origin/master: (27 commits)
  llama : fix build_ffn without gate (ggml-org#13336)
  CUDA: fix bad asserts for partial offload (ggml-org#13337)
  convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
  CUDA: fix --split-mode row for MMQ (ggml-org#13323)
  gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
  CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
  sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
  server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
  mtmd : rename llava directory to mtmd (ggml-org#13311)
  clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
  convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
  SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
  mtmd : add C public API (ggml-org#13184)
  rpc : use backend registry, support dl backends (ggml-org#13304)
  ggml : activate s390x simd for Q3_K (ggml-org#13301)
  llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
  llama : build windows releases with dl backends (ggml-org#13220)
  CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
  CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
  vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
  ...

* llama : fix build_ffn without gate
* fix build on windows
* Revert "fix build on windows"
  This reverts commit fc420d3.

llama : fix build_ffn without gate

98ce93e

ngxson requested review from ggerganov and slaren May 6, 2025 08:54

fix build on windows

fc420d3

github-actions bot added the examples label May 6, 2025

slaren approved these changes May 6, 2025


Revert "fix build on windows"

8681d3d

This reverts commit fc420d3.

ngxson merged commit 2f54e34 into master May 6, 2025
45 checks passed

ngxson deleted the xsn/graph_ffn_gate_fix branch October 5, 2025 11:28

timwu pushed a commit to timwu/llama.cpp that referenced this pull request Dec 20, 2025

llama : fix build_ffn without gate (ggml-org#13336)

e63a493

* llama : fix build_ffn without gate
* fix build on windows
* Revert "fix build on windows"
  This reverts commit fc420d3.

ngxson merged 3 commits into master from xsn/graph_ffn_gate_fix
