Hi,
Thanks for open-sourcing the code.
I was wondering how it compares in throughput with existing inference frameworks such as https://github.com/huggingface/text-generation-inference and https://github.com/vllm-project/vllm. Are there any benchmarks?