3 points by paulcjh over 1 year ago

1 comment

paulcjh over 1 year ago

Managed to get 1.8k tokens per second with a batch of 60 when running vLLM with Mistral 7B on an A100 40GB in bfloat16 mode. Pretty damn fast!

vllm==0.2.0 was released an hour or so ago, so it's pretty fresh. Let me know if you'd like anything else in there.
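The comment doesn't say exactly how the 1.8k tokens/s figure was measured. A common approach for batched generation is to divide the total number of generated tokens across the whole batch by the wall-clock time of the batched call. The sketch below assumes that method. `throughput`, `benchmark`, and the `generate` callable are hypothetical names used for illustration. In a real run, `generate` might wrap vLLM's `LLM.generate` and count the `token_ids` of each output, but that wiring is an assumption and is not shown here.

```python
import time
from typing import Callable, List, Sequence, Tuple


def throughput(total_tokens: int, elapsed_s: float, batch_size: int) -> Tuple[float, float]:
    """Return (aggregate tokens/s, average tokens/s per sequence).

    Assumes all sequences in the batch were generated concurrently
    during the measured wall-clock interval.
    """
    if elapsed_s <= 0 or batch_size <= 0:
        raise ValueError("elapsed_s and batch_size must be positive")
    aggregate = total_tokens / elapsed_s
    return aggregate, aggregate / batch_size


def benchmark(
    generate: Callable[[Sequence[str]], List[int]],  # hypothetical: returns generated-token counts per prompt
    prompts: Sequence[str],
) -> Tuple[float, float]:
    """Time one batched generate() call and report throughput."""
    start = time.perf_counter()
    counts = generate(prompts)
    elapsed = time.perf_counter() - start
    return throughput(sum(counts), elapsed, len(prompts))


# Illustrative numbers only: 18,000 tokens in 10 s with a batch of 60.
print(throughput(18000, 10.0, 60))  # (1800.0, 30.0)
```

By this arithmetic, 1.8k tokens/s spread over 60 concurrent sequences works out to roughly 30 tokens/s per sequence on average. The aggregate figure describes batch throughput. It does not describe the speed a single request would see.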


VLLM with Mistral 7B guide and benchmarks (1.8k+ tokens/s)
