Managed to get 1.8k tokens per second with a batch of 60 when running vLLM with Mistral 7B on an A100 40GB in bfloat16 mode. Pretty damn fast!<p>vllm==0.2.0 got released an hour or so ago, so it's pretty fresh. Let me know if you'd like anything else in there.
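
A rough sketch of how a throughput number like this could be measured. It is not necessarily the exact setup used above: the model ID, prompt, sampling settings, and `max_tokens` are all assumptions, and `tokens_per_second` and `run_benchmark` are hypothetical helper names. It counts only generated tokens, summed across the whole batch, divided by wall-clock time.

```python
import time


def tokens_per_second(token_counts, elapsed_s):
    """Total generated tokens across the batch divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return sum(token_counts) / elapsed_s


def run_benchmark(model="mistralai/Mistral-7B-v0.1", batch_size=60, max_tokens=256):
    # Imported lazily so the helper above can be used without vLLM or a GPU.
    from vllm import LLM, SamplingParams

    # dtype="bfloat16" matches the mode mentioned in the comment;
    # everything else here (prompt, temperature, max_tokens) is an assumption.
    llm = LLM(model=model, dtype="bfloat16")
    params = SamplingParams(temperature=0.8, max_tokens=max_tokens)
    prompts = ["Write a short story about a robot."] * batch_size

    start = time.perf_counter()
    outputs = llm.generate(prompts, params)  # one call, vLLM batches internally
    elapsed = time.perf_counter() - start

    # Each RequestOutput holds its completions; take the first one per prompt.
    counts = [len(o.outputs[0].token_ids) for o in outputs]
    return tokens_per_second(counts, elapsed)
```

Calling `run_benchmark()` on a GPU box returns the aggregate tokens/s for the batch. The result will vary with prompt length and `max_tokens`, so treat any single figure as a ballpark.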