This is more of a thought experiment, and I am hoping to learn about other developments in the LLM inference space that are not strictly GPUs.

Conditions:

1. You want a solution for LLM inference and LLM inference only. You don't care about any other general- or special-purpose computing.

2. The solution can use any kind of hardware you want.

3. Your only goal is to maximize (inference speed) × (model size) for 70B+ models.

4. You're allowed to build this with tech most likely available by the end of 2025.

How do you do it?
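For anyone weighing hardware options against condition 3: in single-batch autoregressive decode, every weight is read once per generated token, so the tokens/s ceiling is roughly memory bandwidth divided by model size in bytes, which makes (speed) × (size) approximately equal to effective bandwidth. A minimal back-of-the-envelope sketch (all hardware numbers below are illustrative assumptions, not vendor specs):

```python
# Roofline estimate for bandwidth-bound, batch-size-1 decode:
# upper bound on tokens/s = memory bandwidth / bytes moved per token.
# Hardware figures here are assumptions for illustration only.

def decode_tokens_per_s(params_billions: float, bytes_per_param: float,
                        mem_bw_gb_s: float) -> float:
    """Tokens/s ceiling when all weights are streamed once per token."""
    model_size_gb = params_billions * bytes_per_param  # billions of params -> GB
    return mem_bw_gb_s / model_size_gb

# Hypothetical 70B model, 8-bit weights, ~3 TB/s of memory bandwidth:
print(decode_tokens_per_s(70, 1.0, 3000))   # ~43 tokens/s ceiling
# Same model at 4-bit weights roughly doubles the ceiling:
print(decode_tokens_per_s(70, 0.5, 3000))   # ~86 tokens/s ceiling
```

Under this framing, the question largely reduces to: what non-GPU hardware gets you the most usable memory bandwidth per dollar by end of 2025?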
You wait until someone posts an answer here: https://www.reddit.com/r/LLMDevs/comments/1if0q87/you_have_roughly_50000_usd_you_have_to_build_an/

https://www.phind.com/search/cm6lxx6hw00002e6gioj41wa5