
Making a "quantize-ggml_16bit-to-gptq.py" script? #618

@BadisG

Description


Hello,

I know that quantize.py converts a 16-bit ggml model into a 4-bit ggml model using RTN (round-to-nearest) quantization.
Do you think it would be possible to create a script that converts a 16-bit ggml model into a 4-bit ggml model using GPTQ instead?

Referring to this repository, it appears that the current GPTQ quantization implementation relies entirely on the GPU, which demands a significant amount of VRAM and might not be suitable for the average user.

A new script, which we could call "quantize-ggml_16bit-to-gptq.py", could be designed to use only CPU and RAM resources, making it more accessible to the general public.
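For context, the RTN scheme that quantize.py already applies is simple enough to run on CPU alone. Below is a minimal, illustrative sketch of blockwise 4-bit round-to-nearest quantization in NumPy, loosely modeled on llama.cpp's Q4 block layout (one scale per block of 32 weights); the function names and the exact scale convention here are my own, not the project's on-disk format. GPTQ would replace the plain rounding step with an error-compensating update, which is what makes it heavier:

```python
import numpy as np

def quantize_q4_rtn(weights, block_size=32):
    """Blockwise 4-bit round-to-nearest quantization (illustrative sketch,
    not the exact ggml file format)."""
    w = weights.reshape(-1, block_size).astype(np.float32)
    # One scale per block: map the largest-magnitude value in the block
    # onto the symmetric int4 range [-7, 7].
    amax = np.max(np.abs(w), axis=1, keepdims=True)
    scale = amax / 7.0
    inv = 1.0 / np.where(scale == 0, 1.0, scale)  # guard all-zero blocks
    q = np.clip(np.round(w * inv), -7, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q, scale):
    """Reconstruct float32 weights from 4-bit codes and per-block scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

# Usage: quantize 64 random weights (2 blocks) and check the round-trip error.
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, s = quantize_q4_rtn(w)
w_hat = dequantize_q4(q, s)
err = float(np.max(np.abs(w - w_hat)))
```

Per-block RTN like this only needs the weights themselves in RAM, which is why it runs fine on CPU; GPTQ additionally needs calibration activations and a Hessian-based solve per layer, but in principle that too is just matrix arithmetic and could be done (slowly) on CPU.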
