Closed
Description
Hello,
I know that quantize.py converts a 16-bit ggml model into a 4-bit ggml model using RTN (round-to-nearest) quantization.
Do you think it's possible to create a script that converts a 16-bit ggml model into a 4-bit GPTQ one?
Referring to this repository, it appears that the current GPTQ quantization implementation relies entirely on the GPU, which demands a significant amount of VRAM and may not be suitable for the average user.
A new script, which we could call "quantize-ggml_16bit-to-gptq.py", could be designed to use only the CPU and system RAM, making it more accessible to the general public.
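For context, the RTN scheme mentioned above can be sketched in a few lines of NumPy: weights are split into small blocks, each block gets one scale factor, and values are rounded to the nearest 4-bit integer. This is an illustrative sketch only; the function names are hypothetical, and the block size of 32 and the signed int4 range merely mirror the general shape of ggml's 4-bit formats, not their exact on-disk layout.

```python
import numpy as np

def quantize_rtn_4bit(w: np.ndarray, block_size: int = 32):
    """Block-wise round-to-nearest 4-bit quantization (illustrative sketch)."""
    blocks = w.astype(np.float32).reshape(-1, block_size)
    # One scale per block, mapping the block's largest magnitude onto the
    # int4 range [-8, 7].
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax == 0, 1.0, amax / 7.0)
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from quantized blocks."""
    return (q.astype(np.float32) * scale).reshape(-1)

# Round-trip a random weight vector and measure the worst-case error.
x = np.random.randn(64).astype(np.float32)
q, s = quantize_rtn_4bit(x)
err = np.abs(dequantize_4bit(q, s) - x).max()
```

GPTQ differs from this in that it quantizes columns one at a time and updates the remaining weights to compensate for the error, which is where the extra compute (and, in the current implementation, the GPU dependency) comes from.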