Hi HN: I wanted to share this piece I wrote on how I saved our small startup tens of thousands of dollars every month by lifting and shifting our AI data pipelines from OpenAI's API to a vLLM deployment on top of Kubernetes, running on a few nodes with T4 GPUs.

I haven't seen a lot written about the "AI DevOps" or infrastructure side of actually running an at-scale AI service. Many of the AI inference engines that offer an OpenAI-compatible API (like vLLM, llama.cpp, etc.) make it very approachable and cost effective. Today, this vLLM service handles all of our batch micro-services, which scrape content and generate text for 40,000+ repos on GitHub.

I'm happy to answer any and all questions you might have!
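For anyone curious how drop-in the swap is: because vLLM exposes an OpenAI-compatible endpoint, the migration is mostly a matter of pointing the official OpenAI client at your own cluster. Here's a minimal sketch, assuming a vLLM Service reachable at an internal hostname (the URL and model name are placeholders, not our actual setup):

    # Point the official OpenAI Python client at a vLLM deployment
    # instead of api.openai.com. base_url and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://vllm.internal.example.com/v1",  # your vLLM Service endpoint
        api_key="not-needed",  # vLLM ignores the key unless you enable auth
    )

    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # whatever model vLLM is serving
        messages=[{"role": "user", "content": "Summarize this repo's README."}],
    )
    print(response.choices[0].message.content)

The api_key can be any string by default; vLLM only checks it if you start the server with its --api-key flag.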
This was a good read. Seeing the infrastructure side of an AI story is a breath of fresh air. Too much witchcraft and hand-waving in the AI space at the moment.