TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Run Llama2-70B in Web Browser with WebGPU Acceleration

9 points, by ruihangl, almost 2 years ago

2 comments

brucethemoose2, almost 2 years ago
Apache TVM is super cool in theory. It's fast thanks to the autotuning, and it supports tons of backends: Vulkan, Metal, WASM + WebGPU, FPGAs, weird mobile accelerators, and such. It supports quantization, dynamism, and other cool features.

But... it isn't used much outside MLC? And MLC's implementations are basically demos.

I dunno why. AI inference communities are *dying* for fast multiplatform backends without the fuss of PyTorch.
Comment #36852431 not loaded
Comment #36852111 not loaded
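The quantization the comment mentions is the main reason a model this size is deployable at all. As a rough illustration only (this is not TVM's or MLC's actual code, and real schemes are per-group and 4-bit rather than per-tensor int8), here is a minimal sketch of symmetric integer quantization of weights:

```python
# Illustrative sketch, NOT TVM/MLC source: symmetric per-tensor int8
# quantization, the simplest form of the weight compression that makes
# large-model inference feasible on consumer hardware.

def quantize_int8(weights):
    """Map floats to int8 values with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error is at most one step (scale)."""
    return [v * scale for v in q]

weights = [0.82, -1.5, 0.03, 0.99]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Production schemes (like MLC's 4-bit formats) trade more precision for a 4x smaller footprint than even this int8 example, at the cost of per-group scales and calibration.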
ruihangl, almost 2 years ago
Running purely in the web browser. Generating 6.2 tok/s on an Apple M2 Ultra with 64GB of memory.
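Back-of-envelope memory math shows why 64GB is enough here only with aggressive quantization (the 4-bit figure below is an assumption about the weight format used, e.g. MLC's q4-style schemes; it counts raw weights only, ignoring the KV cache and runtime buffers):

```python
# Rough weight-memory estimate for a 70B-parameter model.
# Assumption: only raw weights counted; KV cache, activations, and
# runtime overhead come on top.

PARAMS = 70e9  # Llama2-70B parameter count

def weight_gib(bits_per_param):
    """Weight storage in GiB for a given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16_gib = weight_gib(16)  # ~130 GiB: cannot fit in 64GB
int4_gib = weight_gib(4)   # ~33 GiB: fits, with headroom left over
```

So fp16 weights alone are roughly double the machine's total memory, while 4-bit weights leave room for the KV cache and the rest of the system.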