
Llama 3 8B Instruct quantized with GPTQ to fit in 10 GB of VRAM

2 points by jlaneve about 1 year ago

1 comment

jlaneve about 1 year ago
Hey HN!

We've had lots of success using quantized LLMs for inference speed and cost because you can fit them on smaller GPUs (Nvidia T4, Nvidia K80, RTX 4070, etc.). There's no need for everyone to quantize - we quantized Llama 3 8B Instruct to 8 bits using GPTQ and figured we'd share it with the community. Excited to see what everyone does with it!
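
For anyone who wants to reproduce the quantization rather than download the shared artifact, here's a minimal sketch of 8-bit GPTQ quantization. The post doesn't say which tooling was used; this assumes the Hugging Face transformers + optimum/auto-gptq stack, and the output directory name is made up:

    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # 8-bit GPTQ quantization; calibration samples are drawn from the
    # built-in "c4" dataset (requires the optimum and auto-gptq packages)
    gptq_config = GPTQConfig(bits=8, dataset="c4", tokenizer=tokenizer)

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        quantization_config=gptq_config,
    )

    # Save the quantized weights; the directory name is just an example
    model.save_pretrained("llama-3-8b-instruct-gptq-8bit")
    tokenizer.save_pretrained("llama-3-8b-instruct-gptq-8bit")

Loading the result afterwards is a plain AutoModelForCausalLM.from_pretrained on the saved directory; transformers picks up the stored GPTQ config automatically, and the 8-bit weights are what let the model fit in roughly 10 GB of VRAM.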
[Comment #40086599 not loaded]