Hope you enjoy the guide. Below are also some cost/speed comparisons for running the models with vLLM:

- 7B, 1x A100, 25GB VRAM, 49 tok/s, $0.0113 /1k tok
- 13B, 1x A100, 37GB VRAM, 32 tok/s, $0.0174 /1k tok
- 70B, 2x A100, 150GB VRAM, 13 tok/s, $0.128 /1k tok
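
For anyone wanting to plug in their own numbers, here is a rough sketch of how throughput and per-token cost relate. It assumes the cost per 1k tokens is simply the hourly instance price divided by the tokens generated in an hour at the quoted speed (a single stream, no batching). The hourly prices it prints are back-calculated from the figures above, not quoted from any provider, and the function names are made up for illustration.

```python
# Figures copied from the list above: (model, tokens per second, USD per 1k tokens).
BENCHMARKS = [
    ("7B", 49, 0.0113),
    ("13B", 32, 0.0174),
    ("70B", 13, 0.128),
]


def implied_hourly_cost(tok_per_s: float, usd_per_1k_tok: float) -> float:
    """Hourly instance price implied by a throughput and a per-1k-token cost.

    Assumption: one stream generating at tok_per_s for a full hour.
    """
    tokens_per_hour = tok_per_s * 3600
    return usd_per_1k_tok * tokens_per_hour / 1000


def cost_per_1k_tokens(hourly_usd: float, tok_per_s: float) -> float:
    """Inverse of the above: per-1k-token cost for a given hourly price."""
    tokens_per_hour = tok_per_s * 3600
    return hourly_usd / tokens_per_hour * 1000


if __name__ == "__main__":
    for name, tps, cost in BENCHMARKS:
        hourly = implied_hourly_cost(tps, cost)
        print(f"{name}: roughly ${hourly:.2f}/hour implied")
```

Under that assumption the 7B and 13B rows work out to about $2/hour and the 70B row to about $6/hour. To estimate your own cost, call `cost_per_1k_tokens` with your provider's hourly price and the throughput you measure.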