convert-hf : support direct Q8_0 conversion #7234
Conversation
I didn't notice these on my first pass.
Hi, this PR breaks model conversion on my system. I was using an older version of numpy. However, I am hoping that it is possible to allow it to work with numpy 1.22 as it did before this commit, as a fallback? A lot of toolchains might still be on slightly older versions of numpy, and forcing the use of the latest version may not be ideal.
```python
if tensor_dtype == np.uint8:
    block_size, type_size = GGML_QUANT_SIZES[raw_dtype]
    if tensor_shape[-1] % type_size != 0:
        raise ValueError(f"Quantized tensor row size ({tensor_shape[-1]}) is not a multiple of {dtype.name} type size ({type_size})")
    tensor_shape = tuple(tensor_shape[:-1]) + (tensor_shape[-1] // type_size * block_size,)
```
This has broken copying of tensors on i-quants (and probably several others as well); using

```
./gguf-new-metadata.py foo.IQ4_NL.gguf bar.gguf
```

you now get

```
ValueError: Quantized tensor row size (4096) is not a multiple of IQ4_NL type size (18)
```
The issue seems to be that the `type_size` is off by 2; however, I don't see why the tensor should be reshaped in this scenario, so this should probably be re-evaluated.
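For reference, a quick check of the numbers (a sketch using `gguf-py`'s `GGML_QUANT_SIZES` table, the same lookup as in the snippet above): the divisibility check fails for a row size counted in elements but passes for one counted in bytes, which presumably means the shape reaching this code is in the wrong unit.

```python
from gguf.constants import GGML_QUANT_SIZES, GGMLQuantizationType

# IQ4_NL packs blocks of 32 elements into 18 bytes (a 2-byte f16 scale
# plus 16 bytes of 4-bit values) -- hence 18 rather than 16.
block_size, type_size = GGML_QUANT_SIZES[GGMLQuantizationType.IQ4_NL]  # (32, 18)

n_elements = 4096
print(n_elements % type_size)  # 10 -> a row size in elements fails the check

n_bytes = n_elements // block_size * type_size  # 2304 bytes for a 4096-element row
print(n_bytes % type_size)                 # 0 -> a row size in bytes passes
print(n_bytes // type_size * block_size)   # 4096 -> and maps back to elements
```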
Thanks for finding this!
I think it also breaks copying of all other quantized tensors in `gguf-new-metadata`.
Sorry about that.
I think I found a way to fix this while also simplifying what happens to the shape in the round-trip between `GGUFReader` and `GGUFWriter`. See #7483.
This adds `Q8_0` conversion to `convert-hf-to-gguf.py`, and it results in EXACTLY the same files as if converted with `./quantize` from an `f32` model.

Note that this was NOT the case for `convert.py`, because it rounds to nearest even and divides by the scale, while the reference implementation in `ggml-quants.c` rounds away from zero and multiplies by the inverse of the scale. (A sketch of the rounding difference follows.)
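As a minimal illustration (not code from this PR; the values are made up so that the quotients land exactly on rounding ties):

```python
import numpy as np

# One Q8_0 block stores 32 weights as int8 plus a single f16 scale d = max(|x|) / 127.
x = np.array([2.5, -2.5, 127.0], dtype=np.float32)  # shortened "block", for brevity
d = np.abs(x).max() / 127  # 1.0 here, so x / d lands exactly on .5 ties

# convert.py's approach: divide by the scale, round half to even (numpy's default).
q_even = np.round(x / d)                    # [  2.,  -2., 127.]

# ggml-quants.c's approach: multiply by the inverse scale, round half away from zero.
inv_d = 1.0 / d if d else 0.0
y = x * inv_d
q_away = np.trunc(y + np.copysign(0.5, y))  # [  3.,  -3., 127.]
```

On such ties the two schemes differ by one quantized step, so the resulting files can't be byte-identical even though both are valid `Q8_0` quantizations.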
Summary of changes

* Add the missing `self.gguf_writer.add_file_type(self.ftype)` for `StableLMModel`, `InternLM2Model`, `PlamoModel`, `QwenModel`, `BaichuanModel`, and `XverseModel`.
* Add `gguf-py/gguf/quants.py` to put the `Q8_0` implementation there, and also move the `bf16` conversion in there.
  * This made `bf16` conversion faster, from 40-60 MB/s on my machine to 104 MB/s.
* Make `GGUFWriter` support arbitrary quants with the `np.uint8` dtype when a `raw_dtype` is also specified (see the sketch after this list).
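A sketch of what that last point enables (the file name, tensor name, and contents below are placeholders): pre-quantized data can be handed to `GGUFWriter` as raw `np.uint8` bytes together with its quantization type.

```python
import numpy as np
from gguf import GGUFWriter, GGMLQuantizationType

writer = GGUFWriter("dummy.gguf", "llama")

# Q8_0 packs each block of 32 elements into 34 bytes (an f16 scale + 32 int8 values),
# so a row of 4096 elements becomes 4096 // 32 * 34 == 4352 bytes of raw data.
q8_0_data = np.zeros((16, 4096 // 32 * 34), dtype=np.uint8)  # placeholder contents
writer.add_tensor("blk.0.ffn_up.weight", q8_0_data, raw_dtype=GGMLQuantizationType.Q8_0)
```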
TODO:

* Rename `Model.extra_f16_tensors` to `Model.extra_quantized_tensors`
* Maybe change `Q8_0` in `convert.py` to round in the same way as the reference implementation?

Testing
To be sure this Python implementation of `Q8_0` really is working in the exact same way as the reference implementation from `ggml-quants.c`, I'm testing conversion and quantization of a bunch of different model architectures.

I recently got a big external hard drive, which makes storing the output of these tests much easier.
I'm doing pretty much this for every model architecture tested below (using the `{ftype}` templating for `--outfile` introduced in #7158); a sketch of the loop follows. I'd say there is some suspense when the checksums begin to appear. Will they match?
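Roughly, each test looks like this (a sketch only: the model directory, output paths, and file names are placeholders, not the exact commands used):

```python
# Assumes a llama.cpp checkout with the `quantize` binary built.
import hashlib
import subprocess

model_dir = "models/TinyLlama-1.1B-Chat-v1.0"  # any of the models listed below
outfile = "out/model-{ftype}.gguf"             # {ftype} is filled in by the script (#7158)

# Convert to f32, and directly to Q8_0 with the new --outtype option.
for ftype in ("f32", "q8_0"):
    subprocess.run(["python3", "convert-hf-to-gguf.py", model_dir,
                    "--outtype", ftype, "--outfile", outfile], check=True)

# Reference: quantize the f32 conversion to Q8_0 with the C implementation.
subprocess.run(["./quantize", outfile.format(ftype="f32"),
                "out/model-q8_0-ref.gguf", "q8_0"], check=True)

def sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# The direct conversion should be byte-identical to the reference quantization.
print(sha256(outfile.format(ftype="q8_0")))
print(sha256("out/model-q8_0-ref.gguf"))
```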
* (`torch.float32`) https://huggingface.co/BAAI/bge-small-en-v1.5
* (`torch.bfloat16`) https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
* (`torch.float32`) https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe
* (`torch.bfloat16`) https://huggingface.co/smallcloudai/Refact-1_6B-fim
* (`torch.float16`) https://huggingface.co/pansophic/rocket-3B
  * The original model is `f16`, but I'm quantizing from `f32` anyway for consistency with the other tests.
  * AhA! Looking at the diff of the `gguf-dump` of each model shows this is a metadata problem! The `ftype` was missing! After adding the `ftype` in 2b1e5ea: CHECKSUMS MATCH!!!
* (`torch.float16`)
  * Is it a problem with `torch.float16` models? Bloom is in `float16`, yet it still worked for it. AHA, the problem is that the `ftype` is not put in the model!!! After adding the `ftype` in 2b1e5ea: CHECKSUMS MATCH!!!!
* (`torch.float32`) https://huggingface.co/jondurbin/bagel-dpo-2.8b-v0.2
* (`torch.float16`) https://huggingface.co/bigscience/bloom-560m
* (`torch.bfloat16`) https://huggingface.co/Qwen/Qwen-1_8B-Chat
  * Like with `StableLMModel`, the `ftype` is missing. After adding the `ftype` in 2b1e5ea: checksums match!
* (`torch.bfloat16`) https://huggingface.co/internlm/internlm2-chat-1_8b
  * The `ftype` was missing, but I noticed it before first converting. After adding the `ftype` in 2b1e5ea: checksums match!
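The missing field can also be checked directly, without diffing full `gguf-dump` outputs; a small sketch using `gguf-py`'s `GGUFReader` (the path is a placeholder):

```python
from gguf import GGUFReader

reader = GGUFReader("out/model-q8_0.gguf")  # placeholder path
field = reader.get_field("general.file_type")
if field is None:
    print("ftype missing!")  # the metadata problem seen above
else:
    print(int(field.parts[field.data[-1]][0]))  # 7 corresponds to MOSTLY_Q8_0
```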