Who provides the cheapest GPU inference and hosting for fine-tuned models (~7B parameters)? I already have the fine-tuned model ready; I'm just looking for a cheap place to host it and run inference.

I've looked at Replicate and Together.ai. They both offer some of the best tooling in this space, but hosting is expensive: Together costs about $1.40/hr to host a 7B model, and Replicate costs more.

Ideally, I wouldn't be charged for idle time, only active time. (Replicate already does this, but your fine-tuned model needs to be based on a limited set of base models.)

Any recommendations?
Following - we host our own models for a variety of architectures in vocal synthesis, and have tried using Replicate and Mystic as well.

Roll your own on k8s? Predibase?
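If you do go the self-hosted route, the serving side itself is fairly painless these days. A minimal sketch, assuming the fine-tuned checkpoint is in Hugging Face format and that "./my-7b-finetune" is a placeholder path, using vLLM's offline inference API:

    # Minimal self-hosted inference sketch with vLLM.
    # "./my-7b-finetune" is a hypothetical local path to a HF-format checkpoint.
    from vllm import LLM, SamplingParams

    # Load the model onto the local GPU in fp16.
    llm = LLM(model="./my-7b-finetune", dtype="float16")
    params = SamplingParams(temperature=0.7, max_tokens=256)

    # Run a single generation request.
    outputs = llm.generate(["Summarize the benefits of self-hosting:"], params)
    print(outputs[0].outputs[0].text)

The harder part is the ops around it (autoscaling, scale-to-zero so you aren't paying for idle GPUs), which is what the managed platforms are really charging for.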