Llama 2 chat with vLLM and tensor parallel guide

1 point by paulcjh over 1 year ago

1 comment

paulcjh over 1 year ago
Hope that you enjoy the guide. Below are also some cost/speed comparisons for running the models with vLLM:

- 7B, 1x A100, 25GB VRAM, 49 tok/s, $0.0113 /1k tok
- 13B, 1x A100, 37GB VRAM, 32 tok/s, $0.0174 /1k tok
- 70B, 2x A100, 150GB VRAM, 13 tok/s, $0.128 /1k tok
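
For anyone skimming, here is a minimal sketch of what serving the 70B chat model with tensor parallelism in vLLM looks like; the model ID, prompt, and sampling settings are illustrative assumptions, not taken from the guide. Setting tensor_parallel_size=2 shards the weights across two GPUs, matching the 2x A100 row above.

```python
# Minimal vLLM sketch (illustrative, not from the linked guide):
# Llama 2 70B chat sharded across 2 GPUs with tensor parallelism.
# Assumes vLLM is installed and two A100s are visible to the process.
from vllm import LLM, SamplingParams

# tensor_parallel_size=2 splits each layer's weights across 2 GPUs;
# it must evenly divide the model's number of attention heads.
llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed HF model ID
    tensor_parallel_size=2,
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

# Llama 2 chat models expect [INST] ... [/INST] around the user turn.
outputs = llm.generate(["[INST] Explain tensor parallelism briefly. [/INST]"], sampling)
print(outputs[0].outputs[0].text)
```

For the single-GPU 7B and 13B rows, the same call works with tensor_parallel_size=1 (the default).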