I've been building AI applications using Next.js, GPT, and Langchain. As I'm approaching production scale, I'm curious how others are handling deployment infrastructure.

Current stack:
- Next.js on Vercel
- Serverless functions for AI/LLM endpoints
- Pinecone for vector storage

Questions for those running AI in production:

1. What's your serverless infrastructure choice? (Vercel/Cloud Run/Lambda)

2. How are you handling state management for long-running agent tasks?

3. What's your approach to cost optimization with LLM API calls?

4. Are you self-hosting any components?

5. How are you handling vector store scaling?

Particularly interested in hearing from teams who've scaled beyond the prototype stage. Have you hit any unexpected limitations with serverless for AI workloads?
I run a hosted, code-first agent builder platform in production, so I field these questions from our customers a lot.

1. IMHO the best fit is fly.io. It strikes a nice balance between ephemeral containers that can handle long-running tasks and machines that boot quickly enough to answer a tool call. [1]

2. If your task is truly long-running (I'm thinking several minutes), it's probably wise to put trigger [2] or temporal [3] under it; there's a rough sketch of the latter below.

3. A mix of prompt caching, context shedding, and progressive context enrichment [4] (see the second sketch below).

4. I'm building a platform that can be self-hosted to do a few of the above, so I can't speak to this impartially, but most of my customers do not self-host.

5. To start with, a simple Postgres table with pgvector is all you need (last sketch below). That said, I've recently been delighted with the DX of Upstash Vector [5]: they handle the embeddings for you and give you a text-in, text-out experience. If you want more control, and savings at higher scale, I've heard good things about marqo.ai [6].

Happy to talk more about this at length. (E-mail in my profile.)

[1] https://fly.io/docs/reference/architecture/

[2] https://trigger.dev

[3] https://temporal.io

[4] https://www.inferable.ai/blog/posts/llm-progressive-context-encrichment

[5] https://upstash.com/docs/vector/overall/getstarted

[6] https://www.marqo.ai/
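To make point 2 concrete, here's a minimal sketch of the Temporal approach in TypeScript. It's an illustration, not a drop-in: `callLLM` and `agentWorkflow` are hypothetical names, the `./activities` module is something you'd write yourself, and a separate worker process has to be running to execute the activities.

```ts
// workflow.ts — a durable agent loop; Temporal replays it if a worker dies mid-task.
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities"; // your module exporting callLLM(prompt: string)

const { callLLM } = proxyActivities<typeof activities>({
  startToCloseTimeout: "5 minutes", // each LLM call gets its own timeout...
  retry: { maximumAttempts: 3 },    // ...and its own retry policy
});

export async function agentWorkflow(input: string): Promise<string> {
  // Each await is checkpointed, so a multi-minute agent run survives restarts and deploys.
  const plan = await callLLM(`Break this task into steps:\n${input}`);
  return callLLM(`Carry out this plan and report the result:\n${plan}`);
}
```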
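And a sketch of what I mean by progressive context enrichment on point 3: start the conversation with cheap summaries, and only pull full documents into context when the model asks for them via a tool. The model name and the `getSummaries`/`getFullDocument` helpers are placeholders; the pattern is the point.

```ts
import OpenAI from "openai";
import type { ChatCompletionMessageParam } from "openai/resources/chat/completions";

const openai = new OpenAI();

// Placeholder retrieval helpers: one-line summaries up front, full text only on demand.
async function getSummaries(query: string): Promise<string> { /* vector search -> titles + summaries */ return ""; }
async function getFullDocument(docId: string): Promise<string> { /* the token-heavy payload */ return ""; }

export async function answer(query: string): Promise<string> {
  const messages: ChatCompletionMessageParam[] = [
    { role: "system", content: "Answer from the summaries. Call fetch_document only if a summary is not enough." },
    { role: "user", content: `${query}\n\nSummaries:\n${await getSummaries(query)}` },
  ];

  // Cap the enrichment loop so a confused model can't keep pulling documents forever.
  for (let turn = 0; turn < 5; turn++) {
    const res = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages,
      tools: [{
        type: "function",
        function: {
          name: "fetch_document",
          description: "Fetch the full text of a document by id",
          parameters: { type: "object", properties: { docId: { type: "string" } }, required: ["docId"] },
        },
      }],
    });

    const msg = res.choices[0].message;
    if (!msg.tool_calls?.length) return msg.content ?? "";

    // The model asked for more context: append only what it requested.
    messages.push(msg);
    for (const call of msg.tool_calls) {
      if (call.type !== "function") continue;
      const { docId } = JSON.parse(call.function.arguments);
      messages.push({ role: "tool", tool_call_id: call.id, content: await getFullDocument(docId) });
    }
  }
  return "Ran out of turns without an answer.";
}
```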
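For point 5, the "simple Postgres table with pgvector" starting point really is about this much code. A sketch with assumptions: the `pg` client, the `documents` table name, and 1536 dimensions (matching OpenAI's smaller embedding models) are all mine; the HNSW index needs pgvector 0.5+.

```ts
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// One-time setup: the extension, a table, and an HNSW index for cosine distance.
export async function setup() {
  await pool.query(`
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE TABLE IF NOT EXISTS documents (
      id bigserial PRIMARY KEY,
      content text NOT NULL,
      embedding vector(1536)
    );
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
      ON documents USING hnsw (embedding vector_cosine_ops);
  `);
}

// k-nearest-neighbour search; <=> is pgvector's cosine distance operator.
export async function similar(queryEmbedding: number[], limit = 5) {
  const { rows } = await pool.query(
    `SELECT id, content, embedding <=> $1 AS distance
       FROM documents
      ORDER BY embedding <=> $1
      LIMIT $2`,
    [`[${queryEmbedding.join(",")}]`, limit],
  );
  return rows;
}
```

When a single table like this stops being enough, that's usually the point where the hosted options in [5] and [6] start to earn their keep.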