I'm currently trying to solve a problem we're having: GPUs are expensive! I've been thinking of ways to cut inference costs at my company and wanted to hear your perspective.<p>Has anyone implemented something similar?
How did it go?
How much time did it save? What was the cost improvement?
I recently found this tool in the AWS samples: https://github.com/aws-samples/scalable-hw-agnostic-inference<p>Has anyone tried it, or used other approaches?
I've used GCP GPU Cloud Run to build an on-demand, auto-scaling livestream/HLS video translation --> subtitle generation pipeline with great success.<p>[edit: sorry, not inference, but a great cost-saver]