Closed
Description
Hello,
I know that quantize.py converts a 16-bit ggml model into a 4-bit ggml model using RTN (round-to-nearest) quantization.
Do you think it's possible to create a script that converts a 16-bit ggml model into a 4-bit GPTQ one?
Referring to this repository, it appears that the current GPTQ quantization implementation relies entirely on the GPU, which demands a significant amount of VRAM and may not be suitable for the average user.
A new script, which we could call "quantize-ggml_16bit-to-gptq.py", could be designed to use only the CPU and system RAM, making it more accessible to the general public.
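For context, the RTN scheme mentioned above can be sketched in a few lines of NumPy: weights are split into small blocks, each block gets one scale factor, and values are rounded to the nearest 4-bit integer. This is an illustrative sketch only; the function names are hypothetical, and the block size of 32 and the signed int4 range merely mirror the general shape of ggml's 4-bit formats, not their exact on-disk layout.

```python
import numpy as np

def quantize_rtn_4bit(w: np.ndarray, block_size: int = 32):
    """Block-wise round-to-nearest 4-bit quantization (illustrative sketch)."""
    blocks = w.astype(np.float32).reshape(-1, block_size)
    # One scale per block, mapping the block's largest magnitude onto the
    # int4 range [-8, 7].
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax == 0, 1.0, amax / 7.0)
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from quantized blocks."""
    return (q.astype(np.float32) * scale).reshape(-1)

# Round-trip a random weight vector and measure the worst-case error.
x = np.random.randn(64).astype(np.float32)
q, s = quantize_rtn_4bit(x)
err = np.abs(dequantize_4bit(q, s) - x).max()
```

GPTQ differs from this in that it quantizes columns one at a time and updates the remaining weights to compensate for the error, which is where the extra compute (and, in the current implementation, the GPU dependency) comes from.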