Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models)#7461
Conversation
Tested with https://huggingface.co/EleutherAI/pythia-1.4b/tree/main, seems to work. Measured PPL on wiki.test with `./perplexity -m models/pythia-1b/ggml-model-f16.gguf -f build/wikitext-2-raw/wiki.test.raw`. I guess it's normal for a 1.4B model that is a year old. Thanks for implementing this.
It seems that the perplexity is slightly higher than with the HF transformers implementation, because llama.cpp and GPTNeoXTokenizerFast produce different tokenization output.
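As a reminder of why tokenization differences shift PPL: perplexity is the exponential of the mean negative log-likelihood per token, so both the per-token probabilities and the token count enter the result. The sketch below is illustrative only and is not the actual computation used by llama.cpp's `perplexity` tool or by HF transformers.

```python
import math

def perplexity(logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    `logprobs` is a list of natural-log probabilities, one per predicted
    token. A different tokenization of the same text changes both the
    individual probabilities and the token count, so two implementations
    can report different PPL on identical input.
    """
    if not logprobs:
        raise ValueError("need at least one token")
    nll = -sum(logprobs) / len(logprobs)
    return math.exp(nll)

# Toy example: a model assigning probability 0.25 to every token
# has perplexity 4.
print(round(perplexity([math.log(0.25)] * 8), 6))  # → 4.0
```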
The tokenization differences (`diff ./build/wikitext-2-raw/wiki.test.raw.tok ./build/wikitext-2-raw/wiki.test.raw.tokcpp`):

```
245413,245414c245413,245414
< 50276
< 6285
---
> 209
> 20589
245440,245441c245440,245441
< 50276
< 6285
---
> 209
> 20589
246660,246661c246660,246661
< 50276
< 6285
---
> 209
> 20589
246687,246688c246687,246688
< 50276
< 6285
---
> 209
> 20589
```

Likely the perplexity computation used in the HF transformers differs from the one in llama.cpp. For Pythia 2.8b I get PPL
Feel free to merge this when ready - I think it works
…EOX - didn't notice it was already present
Thank you for this, @fairydreaming! I have wanted it for so long!




This pull request adds the missing pieces to support inference for GPT-NeoX-based models such as GPT-NeoX itself and the Pythia family. Fixes #742. It also adds model types for all Pythia model sizes.
The added `use_par_res` hparams field corresponds to the `use_parallel_residual` parameter from `config.json`.
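For readers unfamiliar with the flag: `use_parallel_residual` selects between two transformer block layouts. The sketch below uses toy stand-in functions to show the structural difference; the names `block_parallel`/`block_sequential` are illustrative and are not llama.cpp's actual kernels.

```python
def block_parallel(x, ln1, ln2, attn, mlp):
    # use_parallel_residual = true (GPT-NeoX / Pythia default):
    # attention and MLP both read the same block input, and their
    # outputs are summed with the residual in a single step.
    return x + attn(ln1(x)) + mlp(ln2(x))

def block_sequential(x, ln1, ln2, attn, mlp):
    # use_parallel_residual = false: the classic GPT-2-style layout,
    # where the MLP sees the post-attention residual stream.
    h = x + attn(ln1(x))
    return h + mlp(ln2(h))

# With scalar stand-ins the two layouts give different results,
# which is why the flag must be read from config.json at conversion time.
identity = lambda v: v
double = lambda v: 2 * v
print(block_parallel(1.0, identity, identity, double, double))    # 1 + 2 + 2 → 5.0
print(block_sequential(1.0, identity, identity, double, double))  # h = 3; 3 + 6 → 9.0
```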