TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.


© 2025 TechEcho. All rights reserved.

Qwen2-7B-Instruct with TensorRT-LLM: consistently high tokens/sec

1 point by agcat, 9 months ago

1 comment

agcat, 9 months ago
Hey community: in this deep dive I analyzed LLM inference speed benchmarks, comparing models like Qwen2-7B-Instruct, Gemma-2-9B-it, Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, and Phi-3-medium-128k-instruct across libraries like vLLM, TGI, TensorRT-LLM, Triton with the vLLM backend, DeepSpeed-MII, and CTranslate2. All benchmarks were run independently on A100 GPUs on Azure, with no sponsorship.
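The core measurement behind a tokens/sec comparison like this is simple: time a generation call and divide the number of tokens produced by the elapsed wall-clock time, averaged over several runs. A minimal sketch of that idea is below; `measure_tokens_per_sec` and `fake_backend` are hypothetical stand-ins for illustration, not the author's harness — a real benchmark would call the vLLM, TGI, or TensorRT-LLM serving endpoints instead.

```python
import time

def measure_tokens_per_sec(generate, prompt, n_runs=3):
    """Time a generation callable and report mean tokens/sec.

    `generate` is assumed to accept a prompt and return the number
    of tokens it produced. Each run is timed with a monotonic clock.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)
    return sum(rates) / len(rates)

# Stand-in backend: pretends to emit 256 tokens after a short delay.
def fake_backend(prompt):
    time.sleep(0.01)
    return 256

rate = measure_tokens_per_sec(fake_backend, "Hello")
print(f"{rate:.0f} tokens/sec")
```

In practice you would also separate time-to-first-token from steady-state decode throughput, since batching behavior differs sharply across these libraries.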