ggml : reuse quantum structs across backends#5943
Conversation
(force-pushed from 40b0a47 to 7a8c050)
I wonder if this change is worth it. @slaren what do you think? We reduce code duplication, but maybe the macros are a bit too much. Any better way to do this?
I am not sure if this would be better, but it is also possible to access the data as a …
(force-pushed from c5fa4ed to f33ab14)
Yup, that's an option, but I'm also not sure it would be better. I'm 50/50 about merging this. We can potentially adopt this pattern and use it for other stuff that can be shared by introducing more …
(force-pushed from 6075b27 to 7741456)
Avoiding code duplication is definitely good. We can always iterate on it in the future if we figure out a cleaner way to do this.
```c
#define QK4_0 32
#define QI4_0 (QK4_0 / (4 * QR4_0))
#define QR4_0 2
```
I think these CUDA-specific macros (QIX, QRX) shouldn't be propagated to the other backends. If it so happens that another backend uses these, it would be better to just duplicate them there.
(force-pushed from ed01e0c to dca5020)
Ok, will merge this after the CI is green
These thread sanitizer failures seem unrelated to our changes: https://github.com/ggerganov/llama.cpp/actions/runs/8246447769/job/22552498578?pr=5943#step:5:27 Maybe related to this: google/sanitizers#1716?
* ggml : reuse quant blocks across backends (ggml-ci)
* ggml : define helper constants only for CUDA and SYCL (ggml-ci)
* ggml : define helper quantum constants for SYCL (ggml-ci)
Trying to see if we can have the quantum structs declared in a single place (e.g. `ggml-common.h`).

- `#define GGML_COMMON_AGGR data` is needed because `nvcc` does not allow members with constructors (`half`) in an anonymous `struct`
- `ggml-common.h` is structured in such a way to allow the following usage:
- `block_q8_1` now uses `fp16` instead of `fp32` for `d` and `s`. This was already the case for CUDA, and to make things similar I decided to apply the same format on the CPU. This type is used only to quantize activations, so it is not breaking any existing models

TODO:
- Vulkan (might not be applicable here)
- Kompute (might not be applicable here)