Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 by ikawrakow · Pull Request #4996 · ggml-org/llama.cpp

ikawrakow · 2024-01-17T10:22:53Z

I missed this tweak when adding Q2_K_S.

With this change, the model size for Mistral-7B increases by only ~30 MB (0.03 bpw), while:

- Perplexity for a context of 512 on wiki.test.raw goes down from 6.9259 to 6.7116.
- The 10-shot HellaSwag score after 2000 tasks increases by 0.95 +/- 0.42.
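
As a rough cross-check of the ~30 MB figure, here is a minimal sketch of the selection rule and the resulting size delta. It is not the code from this PR: `QuantType`, `attn_v_type_for_q2_k_s`, `extra_bytes`, and the other names are hypothetical. It assumes that `attn_v` otherwise stays at the Q2_K base type in a Q2_K_S quantization, that `n_gqa` is the number of query heads divided by the number of key/value heads, and that Mistral-7B has 32 layers, an embedding size of 4096, 32 heads, and 8 KV heads of dimension 128.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two quantization types involved; these are
// not the identifiers used in llama.cpp.
enum class QuantType { Q2_K, Q4_K };

// Bits per weight of each block format: a Q2_K super-block stores 256 weights
// in 84 bytes, a Q4_K super-block stores 256 weights in 144 bytes.
constexpr double bits_per_weight(QuantType t) {
    return t == QuantType::Q4_K ? 144.0 * 8.0 / 256.0   // 4.5 bpw
                                : 84.0 * 8.0 / 256.0;   // 2.625 bpw
}

// Grouped-query attention factor: query heads per key/value head.
constexpr std::uint32_t n_gqa(std::uint32_t n_head, std::uint32_t n_head_kv) {
    return n_head / n_head_kv;
}

// The rule from the PR title, for a Q2_K_S quantization. Assumption: when the
// rule does not fire, attn_v keeps the Q2_K base type.
constexpr QuantType attn_v_type_for_q2_k_s(std::uint32_t gqa) {
    return gqa >= 4 ? QuantType::Q4_K : QuantType::Q2_K;
}

// Extra bytes needed to store n_weights weights as `to` instead of `from`.
constexpr double extra_bytes(std::uint64_t n_weights, QuantType from, QuantType to) {
    return n_weights * (bits_per_weight(to) - bits_per_weight(from)) / 8.0;
}

// Total attn_v weights under the assumed Mistral-7B shape:
// 32 layers, each with a 4096 x (8 * 128) value projection.
constexpr std::uint64_t mistral_attn_v_weights() {
    return 32ull * 4096ull * (8ull * 128ull);
}
```

Under these assumptions `n_gqa(32, 8)` is 4, so the rule fires for Mistral-7B, and `extra_bytes(mistral_attn_v_weights(), QuantType::Q2_K, QuantType::Q4_K)` comes to 31,457,280 bytes (30 MiB), in line with the reported increase.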

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4

9fd1e83

ggerganov approved these changes Jan 17, 2024


ggerganov merged commit 2b3a665 into master Jan 17, 2024

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024

llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (ggml-org#4996)

ec0b035

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

ggerganov merged 1 commit into master from ik/better_q2_k_s
