Apache TVM is super cool in theory. It's fast thanks to autotuning, and it supports a ton of backends: Vulkan, Metal, WASM + WebGPU, FPGAs, obscure mobile accelerators, and so on. It also supports quantization, dynamic shapes, and other nice features.<p>But... it isn't used much outside of MLC? And MLC's implementations are basically demos.<p>I dunno why. AI inference communities are <i>dying</i> for fast multiplatform backends without the fuss of PyTorch.