TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Run Llama2-70B in Web Browser with WebGPU Acceleration

9 points, by ruihangl, almost 2 years ago

2 comments

brucethemoose2, almost 2 years ago
Apache TVM is super cool in theory. It's fast thanks to the autotuning, and it supports tons of backends: Vulkan, Metal, WASM + WebGPU, FPGAs, weird mobile accelerators, and such. It supports quantization, dynamism, and other cool features.

But... it isn't used much outside MLC? And MLC's implementations are basically demos.

I dunno why. AI inference communities are *dying* for fast multiplatform backends without the fuss of PyTorch.
Comment #36852431 not loaded
Comment #36852111 not loaded
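The quantization the comment mentions is the main reason a model this size is deployable at all. As a rough illustration only (this is not TVM's or MLC's actual code, and real schemes are per-group and 4-bit rather than per-tensor int8), here is a minimal sketch of symmetric integer quantization of weights:

```python
# Illustrative sketch, NOT TVM/MLC source: symmetric per-tensor int8
# quantization, the simplest form of the weight compression that makes
# large-model inference feasible on consumer hardware.

def quantize_int8(weights):
    """Map floats to int8 values with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error is at most one step (scale)."""
    return [v * scale for v in q]

weights = [0.82, -1.5, 0.03, 0.99]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Production schemes (like MLC's 4-bit formats) trade more precision for a 4x smaller footprint than even this int8 example, at the cost of per-group scales and calibration.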
ruihangl, almost 2 years ago
Running purely in the web browser. Generating 6.2 tok/s on an Apple M2 Ultra with 64GB of memory.
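Back-of-envelope memory math shows why 64GB is enough here only with aggressive quantization (the 4-bit figure below is an assumption about the weight format used, e.g. MLC's q4-style schemes; it counts raw weights only, ignoring the KV cache and runtime buffers):

```python
# Rough weight-memory estimate for a 70B-parameter model.
# Assumption: only raw weights counted; KV cache, activations, and
# runtime overhead come on top.

PARAMS = 70e9  # Llama2-70B parameter count

def weight_gib(bits_per_param):
    """Weight storage in GiB for a given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16_gib = weight_gib(16)  # ~130 GiB: cannot fit in 64GB
int4_gib = weight_gib(4)   # ~33 GiB: fits, with headroom left over
```

So fp16 weights alone are roughly double the machine's total memory, while 4-bit weights leave room for the KV cache and the rest of the system.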