> PowerInfer’s source code is publicly available at https://github.com/SJTU-IPADS/PowerInfer

---

Just curious - PowerInfer seems to market itself by running very large models (40B, 70B) on something like a 4090. If I have, say, a 3060 12GB and want to run something like a 7B or 13B, can I expect the same speedup of around 10x? Or does this only help that much for models that wouldn't already fit in VRAM?