Pull requests: ggml-org/llama.cpp
Add support for prefetching llama.cpp models from a preset file via
--prefetch
examples
server
#22417
opened Apr 27, 2026 by
stazio
fix: re-throw raised_exception in statement::execute without wrapping
jinja parser
Issues related to the jinja parser
#22409
opened Apr 26, 2026 by
rxho
ggml-webgpu: add layer norm ops
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#22406
opened Apr 26, 2026 by
Constannnnnt
Contributor
Docs/SeepSeek V4 GGUF Stanardization
documentation
Improvements or additions to documentation
#22405
opened Apr 26, 2026 by
Nottlespike
Draft
llama: allow partial seq_rm for GDN models for speculative decoding
examples
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
Nvidia GPU
Issues specific to Nvidia GPUs
server
testing
Everything test related
Vulkan
Issues specific to the Vulkan backend
fix: rpc-server cache may not work in Windows environments
examples
ggml
changes relating to the ggml tensor library for machine learning
#22394
opened Apr 26, 2026 by
unraido
server : add slot_prompt_similarity getter/setter
examples
server
#22393
opened Apr 26, 2026 by
bernardladenthin
Windows: raise stdio limit for loading many GGUF shards
#22385
opened Apr 26, 2026 by
Thireus
Contributor
Wip/deepseek v4 support
examples
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
Nvidia GPU
Issues specific to Nvidia GPUs
python
python script changes
testing
Everything test related
ggml-webgpu: add Q1_0 support
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#22374
opened Apr 25, 2026 by
SharmaRithik
Contributor
Convert argv from UTF-16 on Windows for non-ASCII -p prompts
examples
#22366
opened Apr 25, 2026 by
duzenko
Add DeepSeek V4 GGUF conversion
python
python script changes
#22359
opened Apr 25, 2026 by
nisparks
Contributor
convert : remove input_scale for dequantized fp8 modelopt
python
python script changes
#22356
opened Apr 25, 2026 by
CISC
Member
ggml : revert to -lm linking instead of find_library
ggml
changes relating to the ggml tensor library for machine learning
#22355
opened Apr 25, 2026 by
angt
Member
rpc: add ipv6 support
examples
ggml
changes relating to the ggml tensor library for machine learning
#22350
opened Apr 25, 2026 by
alphaonex86
fix: read n_embd before vocab_only early return for mmproj init
#22348
opened Apr 25, 2026 by
ChenYFan
ggml-webgpu: fast matrix-vector multiplication for i-quants
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#22344
opened Apr 25, 2026 by
SharmaRithik
Contributor
chat: preserve media markers for typed-content templates
#22342
opened Apr 25, 2026 by
AlexonOliveiraRH
2 tasks done
ggml: implement gguf_init_from_buffer
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#22341
opened Apr 24, 2026 by
giladgd
Contributor
common: fix missing exports in llama-common
examples
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
#22340
opened Apr 24, 2026 by
max-krasnyansky
Member
server: respect per-request enable_thinking toggle via extra_body
examples
server
#22336
opened Apr 24, 2026 by
pju-hoge
opencl: refactor Adreno q4_0
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend