
Show HN: Saving Money Deploying Open Source AI at Scale with Kubernetes

3 points by jpmcb about 1 year ago
Hi HN: I wanted to share this piece I wrote on how I saved our small startup tens of thousands of dollars every month by lifting and shifting our AI data pipelines from OpenAI's API to a vLLM deployment on top of Kubernetes, running on a few nodes with T4 GPUs.

I haven't seen a lot on the "AI DevOps" or infrastructure side of actually running an at-scale AI service. Many of the AI inference engines that offer an OpenAI-compatible API (like vLLM, llama.cpp, etc.) make it very approachable and cost effective. Today, this vLLM service handles all of our batching microservices, which scrape for content and generate text across more than 40,000 repos on GitHub.

I'm happy to answer any and all questions you might have!
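Because vLLM exposes an OpenAI-compatible API, the lift-and-shift on the client side can largely come down to pointing the existing OpenAI SDK at the self-hosted endpoint. Below is a minimal sketch of that change, assuming a vLLM server reachable inside the cluster; the service URL and model name are hypothetical placeholders, not details from the post.

    # Sketch: switch the OpenAI Python SDK from OpenAI's hosted API to a
    # self-hosted vLLM server. URL and model below are hypothetical examples.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://vllm.internal.svc.cluster.local:8000/v1",  # hypothetical in-cluster vLLM endpoint
        api_key="not-needed",  # vLLM accepts any key unless one is configured
    )

    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # example open model; the post does not name one
        messages=[{"role": "user", "content": "Summarize this repository README: ..."}],
    )
    print(response.choices[0].message.content)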

1 comment

brianllamar about 1 year ago
This was a good read. Seeing the infrastructure story behind AI is a breath of fresh air. Too much witchcraft and hand-waving in the AI space at the moment.