TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Ask HN: What's your serverless stack for AI/LLM apps in production?

3 points by fazlerocks, 4 months ago
I've been building AI applications using Next.js, GPT, and Langchain. As I'm approaching production scale, I'm curious how others are handling deployment infrastructure.

Current stack:

- Next.js on Vercel
- Serverless functions for AI/LLM endpoints
- Pinecone for vector storage

Questions for those running AI in production:

1. What's your serverless infrastructure choice? (Vercel/Cloud Run/Lambda)
2. How are you handling state management for long-running agent tasks?
3. What's your approach to cost optimization with LLM API calls?
4. Are you self-hosting any components?
5. How are you handling vector store scaling?

Particularly interested in hearing from teams who've scaled beyond prototype stage. Have you hit any unexpected limitations with serverless for AI workloads?

1 comment

lunarcave, 4 months ago
I have a hosted code-first agent builder platform in production, so I field these questions a lot from our customers.

1. Probably the best is fly.io IMHO. It has a nice balance between running ephemeral containers that can support long-running tasks and quickly booting up to respond to a tool call. [1]

2. If your task is truly long-running (I'm thinking several minutes), it's probably wise to put trigger [2] or temporal [3] under it.

3. A mix of prompt caching, context shedding, and progressive context enrichment [4].

4. I'm building a platform that can be self-hosted to do a few of the above, so I can't speak to this. But most of my customers do not.

5. To start with, a simple Postgres table and pgvector is all you need. But I've recently been delighted with the DX of Upstash Vector [5]. They handle the embeddings for you and give you a text-in, text-out experience. If you want more control, and savings at higher scale, I've heard good things about marqo.ai [6].

Happy to talk more about this at length. (E-mail in the profile)

[1] https://fly.io/docs/reference/architecture/

[2] trigger.dev

[3] temporal.io

[4] https://www.inferable.ai/blog/posts/llm-progressive-context-encrichment

[5] https://upstash.com/docs/vector/overall/getstarted

[6] https://www.marqo.ai/
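The "prompt caching" half of point 3 can be sketched in a few lines: key responses by a hash of the model and the exact prompt so repeated identical requests never hit the API again. This is a minimal in-memory sketch, not any provider's built-in caching; `callModel` is a hypothetical stub standing in for a real LLM SDK call, and a production version would use a shared store (e.g. Redis) with a TTL instead of a `Map`.

```typescript
import { createHash } from "crypto";

// Hypothetical in-process cache: maps hash(model, prompt) -> completion.
const cache = new Map<string, string>();
let llmCalls = 0; // counts how often we actually hit the (stubbed) model

// Stub standing in for a real LLM API call; swap in your provider's SDK.
async function callModel(prompt: string): Promise<string> {
  llmCalls++;
  return `echo: ${prompt}`;
}

// Return a cached completion when the exact (model, prompt) pair was seen
// before, so repeated identical requests cost zero API calls.
async function cachedCompletion(model: string, prompt: string): Promise<string> {
  const key = createHash("sha256").update(`${model}\n${prompt}`).digest("hex");
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const out = await callModel(prompt);
  cache.set(key, out);
  return out;
}
```

This only helps for exact-match repeats (retries, common FAQ-style prompts); semantic caching over embeddings is a separate, fuzzier technique.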
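On point 5: the pgvector approach boils down to nearest-neighbor search over an embeddings column (`ORDER BY embedding <=> $1 LIMIT k` in SQL). A minimal in-memory sketch of the same idea, using cosine similarity and toy 3-d vectors in place of real model embeddings:

```typescript
// Toy document store: each row pairs an id with an embedding vector,
// mimicking a pgvector column. Real embeddings would come from a model.
type Row = { id: string; embedding: number[] };

const dot = (a: number[], b: number[]) =>
  a.reduce((s, x, i) => s + x * b[i], 0);
const norm = (a: number[]) => Math.sqrt(dot(a, a));
const cosine = (a: number[], b: number[]) => dot(a, b) / (norm(a) * norm(b));

// Return the k rows most similar to the query vector -- the in-memory
// analogue of pgvector's distance-ordered LIMIT k query.
function topK(rows: Row[], query: number[], k: number): Row[] {
  return [...rows]
    .sort((x, y) => cosine(y.embedding, query) - cosine(x.embedding, query))
    .slice(0, k);
}
```

A linear scan like this is fine at small scale, which is exactly the commenter's point: start with a plain Postgres table, and only reach for a dedicated vector service once row counts make exact scans (or pgvector's index options) too slow.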