TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.


Ask HN: Best practice to use Llama 3 8B on production server

3 points by andrew_zhong, about 1 year ago
The new Llama 3 8B is on par with 22B models: better, yet potentially 10x cheaper than GPT-3.5.

AI builders, if you are using Llama 3 in your backend, where do you host it, or what API do you use? (For production use cases with good speed and rate limits close to ChatGPT or Claude.)

- AWS SageMaker
- Self-host on cloud GPUs
- Replicate API (just found them, $0.05/1M tokens, legit?)
- AWS Bedrock (seems pricey)
- Others - please comment

Any feedback is welcome!
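One practical note on the options above: many Llama 3 hosts (self-hosted vLLM servers, several cloud APIs) expose an OpenAI-compatible chat-completions endpoint, so the client-side request looks the same regardless of which provider you pick. A minimal sketch of building such a payload; the model id here is a placeholder, not a specific provider's value:

```python
import json

def build_chat_request(prompt: str,
                       model: str = "meta-llama-3-8b-instruct",  # placeholder model id
                       max_tokens: int = 256,
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completions payload, the shape
    accepted by many Llama 3 hosts (e.g. a self-hosted vLLM server)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Summarize the Llama 3 8B release in one sentence.")
print(json.dumps(payload, indent=2))
```

POSTing this JSON to a host's `/v1/chat/completions` endpoint is then the only provider-specific part, which makes it easy to benchmark the hosting options against each other.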

1 comment

whereismyacc, about 1 year ago
$0.05 is per million input tokens; it's $0.25 per million output tokens.
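At the rates quoted in this thread ($0.05 per 1M input tokens, $0.25 per 1M output tokens), a quick back-of-the-envelope estimate is easy to compute. The monthly token volumes below are made-up example numbers, not from the thread:

```python
# Prices quoted in the thread, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.25

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly bill in USD for the given token volumes."""
    return ((input_tokens / 1e6) * INPUT_PRICE_PER_M
            + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M)

# Hypothetical workload: 100M input + 20M output tokens per month.
cost = monthly_cost(100_000_000, 20_000_000)
print(f"${cost:.2f}")  # 100 * 0.05 + 20 * 0.25 = $10.00
```

Because output tokens cost 5x more here, workloads that generate long completions are dominated by the output side of the bill.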