vLLM with Mistral 7B guide and benchmarks (1.8k+ tokens/s)

3 points by paulcjh over 1 year ago

1 comment

paulcjh over 1 year ago
Managed to get 1.8k tokens per second with a batch of 60 when running vLLM with Mistral 7B on an A100 40GB in bfloat16 mode. Pretty damn fast!

vllm==0.2.0 got released an hour or so ago, so it's pretty fresh. Let me know if you'd like anything else in there.
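For anyone who wants to try a measurement like this, here is a minimal sketch of how aggregate throughput could be timed. It is not the script from the guide. The checkpoint name `mistralai/Mistral-7B-v0.1`, the placeholder prompt, the sampling temperature, and the 256-token generation limit are all assumptions, and `tokens_per_second` and `run_benchmark` are hypothetical helper names.

```python
import time


def tokens_per_second(token_counts, elapsed_s):
    """Aggregate throughput: total generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return sum(token_counts) / elapsed_s


def run_benchmark(model="mistralai/Mistral-7B-v0.1", batch_size=60, max_tokens=256):
    """Generate one batch with vLLM and return generated tokens per second.

    Needs a GPU and `pip install vllm`. The model name, prompt, and
    max_tokens are assumptions, not values taken from the post.
    """
    # Imported here so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model, dtype="bfloat16")
    params = SamplingParams(temperature=0.8, max_tokens=max_tokens)
    prompts = ["Write a short story about a robot."] * batch_size  # placeholder prompt

    start = time.perf_counter()
    outputs = llm.generate(prompts, params)  # one RequestOutput per prompt
    elapsed = time.perf_counter() - start

    # Count only generated tokens, not prompt tokens.
    counts = [len(out.outputs[0].token_ids) for out in outputs]
    return tokens_per_second(counts, elapsed)
```

Calling `run_benchmark()` on a GPU machine returns the aggregate figure. Note that 1.8k tokens/s across a batch of 60 works out to roughly 30 tokens/s per sequence, so the headline number describes total throughput, not single-request speed.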