Given that there are plenty of options for every point in the README.md, one thing missing is how to guarantee that your stack doesn't miss requests from paying customers, meters usage & avoids ballooning server costs. I see a lot of YC startups trying to solve this: Lago, Paigo, etc.

I'm trying to evaluate the best serverless solutions for inference without compromising on client usage while reducing idle time on GPU boxes. So far it's down to Baseten, HF, and Banana; I'll probably end up pooling them all & routing requests between them (rough sketch below). For dedicated training boxes, Lambda, Modal, Oblivus & Runpod are the contenders.
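To make the pooling idea concrete, here's a minimal Python sketch, assuming each provider exposes a plain JSON-over-HTTP predict endpoint (the URLs and the file-based usage log below are placeholders, not the real Baseten/HF/Banana APIs): try providers in order, and record a usage event whether the call succeeds or fails, so billable requests aren't silently dropped. A real setup would use each provider's SDK and push events to a durable queue rather than a local file.

    import json
    import time
    import urllib.request

    # Hypothetical endpoints -- swap in real provider URLs and auth.
    PROVIDERS = [
        ("baseten", "https://example.com/baseten/predict"),
        ("hf",      "https://example.com/hf/predict"),
        ("banana",  "https://example.com/banana/predict"),
    ]

    def record_usage(customer_id, provider, ok, latency_s):
        # Append-only usage log; in production this would go to a
        # durable queue so billable events are never lost.
        event = {"ts": time.time(), "customer": customer_id,
                 "provider": provider, "ok": ok, "latency_s": latency_s}
        with open("usage.log", "a") as f:
            f.write(json.dumps(event) + "\n")

    def infer(customer_id, payload):
        # Try each pooled provider in order, falling back on failure.
        body = json.dumps(payload).encode()
        for name, url in PROVIDERS:
            start = time.time()
            try:
                req = urllib.request.Request(
                    url, data=body,
                    headers={"Content-Type": "application/json"})
                with urllib.request.urlopen(req, timeout=30) as resp:
                    result = json.loads(resp.read())
                record_usage(customer_id, name, True, time.time() - start)
                return result
            except Exception:
                record_usage(customer_id, name, False, time.time() - start)
        raise RuntimeError("all pooled providers failed")

The ordering could just as well be weighted by observed latency or cost per request; the point is that metering lives in the router, not in any one provider.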