TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Show HN: Llama 3.1 8B CPU Inference in a Browser via WebAssembly

4 pointsby om85 months ago

1 comment

om85 months ago
This is a demo of what's possible to run on edge devices using SOTA quantization. Other similar projects that try to run 8B models in browser are either using webgpu or 2 bit quantization that breaks the model. I implemented inference of AQLM quantized representation, making model that has 2 bit quantization and does not blow up.
评论 #42510839 未加载