ggml : reuse quantum structs across backends#5943
Conversation
(force-pushed from 40b0a47 to 7a8c050)
I wonder if this change is worth it. @slaren what do you think? We reduce code duplication, but maybe the macros are a bit too much. Any better way to do this?
I am not sure if this would be better, but it is also possible to access the data as a …
(force-pushed from c5fa4ed to f33ab14)
Yup, that's an option, but I'm also not sure it would be better. I'm 50/50 about merging this. We can potentially adopt this pattern and use it for other stuff that can be shared by introducing more …
(force-pushed from 6075b27 to 7741456)
Avoiding code duplication is definitely good. We can always iterate on it in the future if we figure out a cleaner way to do this.
```c
#define QK4_0 32
#define QI4_0 (QK4_0 / (4 * QR4_0))
#define QR4_0 2
```
I think these CUDA-specific macros (QIX, QRX) shouldn't be propagated to the other backends. If it so happens that another backend uses these, it would be better to just duplicate them there.
(force-pushed from ed01e0c to dca5020)
Ok, will merge this after the CI is green
These thread sanitizer failures seem unrelated to our changes: https://github.com/ggerganov/llama.cpp/actions/runs/8246447769/job/22552498578?pr=5943#step:5:27 Maybe related to this: google/sanitizers#1716?
* ggml : reuse quant blocks across backends (ggml-ci)
* ggml : define helper constants only for CUDA and SYCL (ggml-ci)
* ggml : define helper quantum constants for SYCL (ggml-ci)
Trying to see if we can have the quantum structs declared in a single place (e.g. `ggml-common.h`).

- `#define GGML_COMMON_AGGR data` is needed because `nvcc` does not allow members with constructors (`half`) in an anonymous `struct`
- `ggml-common.h` is structured in such a way to allow the following usage:
- `block_q8_1` now uses `fp16` instead of `fp32` for `d` and `s`. This was already the case for CUDA, and to make things similar I decided to apply the same format on the CPU. This type is used only to quantize activations, so it is not breaking any existing models

TODO:
- Vulkan (might not be applicable here)
- Kompute (might not be applicable here)