Hi HN: I wanted to share this piece I wrote on how I saved our small startup tens of thousands of dollars every month by lifting and shifting our AI data pipelines from OpenAI's API to a vLLM deployment on top of Kubernetes, running on a few nodes with T4 GPUs.

I haven't seen a lot written about the "AI DevOps" or infrastructure side of actually running an at-scale AI service. Many of the AI inference engines that offer an OpenAI-compatible API (like vLLM, llama.cpp, etc.) make it very approachable and cost effective. Today, this vLLM service handles all of our batch micro-services, which scrape content and generate text for 40,000+ repos on GitHub.

I'm happy to answer any and all questions you might have!
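For anyone curious how drop-in the swap is: because vLLM exposes an OpenAI-compatible endpoint, the migration is mostly a matter of pointing the official OpenAI client at your own cluster. Here's a minimal sketch, assuming a vLLM Service reachable at an internal hostname (the URL and model name are placeholders, not our actual setup):

    # Point the official OpenAI Python client at a vLLM deployment
    # instead of api.openai.com. base_url and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://vllm.internal.example.com/v1",  # your vLLM Service endpoint
        api_key="not-needed",  # vLLM ignores the key unless you enable auth
    )

    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # whatever model vLLM is serving
        messages=[{"role": "user", "content": "Summarize this repo's README."}],
    )
    print(response.choices[0].message.content)

The api_key can be any string by default; vLLM only checks it if you start the server with its --api-key flag.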
This was a good read. Seeing the infrastructure side of an AI story is a breath of fresh air. Too much witchcraft and hand-waving in the AI space at the moment.