TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Ask HN: AI infrastructure in production – what is your tech stack?

5 points | by k9294 | 2 months ago
I'm working at an AI startup, and our infrastructure has grown increasingly complex. What started as a few simple model calls has evolved into systems that limit us and slow us down.

Our current setup requires touching multiple different components whenever we add a new feature. We spend more time maintaining infrastructure than building actual products.

The real pain comes when stakeholders get excited about new capabilities. Last week, they saw Google Gemini's new native image generation and editing features and wanted them integrated ASAP. This requires:

- Adding new API integration code to handle LLMs returning images
- Updating our cost calculator to support this case
- Updating our storage system to automatically handle images in LLM responses and store them in S3
- Probably something will blow up that I haven't thought about yet

I'm spending about 70% of development time (if not more) on infrastructure-related tasks rather than product features, and this ratio is getting worse as we add more capabilities.

- How is your company handling this complexity? Any recommendations?
- Any open-source tools/frameworks that have significantly reduced your AI infrastructure complexity that more teams should know about?
- Which tools or approaches did you try that became maintenance nightmares that you'd warn others to avoid?
- Any AI-related third-party services you would recommend?

P.S. I'll share our stack in the comments; I hit HN's max character limit.
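For concreteness, the image-handling task above can be sketched roughly like this. The response-part shape (`type`/`b64_data` fields) and the dict-like `store` are hypothetical stand-ins for whatever the gateway actually returns and for S3 (real code would call boto3's `put_object`):

```python
import base64
import hashlib


def extract_images(response_parts, store):
    """Split an LLM response into text plus stored image references.

    `response_parts` is a list of dicts in a made-up shape; adapt the field
    names to your gateway. `store` is any dict-like object standing in for
    S3 -- swap the assignment for s3.put_object(Bucket=..., Key=key, Body=raw).
    """
    text_chunks, image_keys = [], []
    for part in response_parts:
        if part.get("type") == "image" and "b64_data" in part:
            raw = base64.b64decode(part["b64_data"])
            # Content-addressed key: identical generations dedupe for free.
            key = f"llm-images/{hashlib.sha256(raw).hexdigest()}.png"
            store[key] = raw
            image_keys.append(key)
        else:
            text_chunks.append(part.get("text", ""))
    return "".join(text_chunks), image_keys
```

The content-addressed key is a design choice, not a requirement: it makes retries idempotent, which matters once the job queue re-runs a failed step.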

1 comment

k9294 | 2 months ago
Can't share the company name, but it's a relatively small startup with a few thousand monthly active users.

Here's what our current stack looks like:

- LLM Access: Custom proxy layer to an OpenAI-compatible gateway for chat completions, transcriptions, image generation, and embeddings (we started with LangChain but were fighting with it more than it helped us). Now we use the OpenAI SDK + a custom API gateway that handles cost tracking, metered billing, model fallbacks, aliases, and feature flags (e.g., supports_structured_output, supports_reasoning, modality: text, image, audio, etc.).

- Workflow Orchestration: We started with a very simple task executor + background job queue to handle retries and durable executions. It has become a real limitation, and we're looking for a complete replacement, considering Temporal, trigger.dev, and Conductor. I don't see an obvious winner here; everything looks like a really sophisticated piece of tech that will have a learning curve and require rethinking some of our decisions, plus serious refactoring to support new flows.

- Observability: OTEL + https://signoz.io. We don't track LLM outputs for security reasons (consumer product, a lot of private stuff).

- Cost Tracking: We embedded cost tracking into our API proxy layer. For each request, we estimate usage and push it into ClickHouse (via Tinybird) to provide product-usage analytics, and to our metering/billing provider (Stripe meters, though we're evaluating Orb as a replacement).

- Agent Memory / Chat History / Persistence: Postgres for everything + S3 for files and images. Frankly, it looks like a chat-app schema (chat threads, messages, attachments). Relatively simple and straightforward.

- Billing: This was a real pain: a hybrid subscription with a monthly fee + pay-as-you-go for overages. Stripe offers limited support for subscriptions with usage billing and does not support automatic top-ups. Implementing a free plan + paid plan with metered usage required a lot of code and even more tests. I thought billing providers had already solved this?

- RAG: Custom document ingestion service based on LlamaIndex (document extraction, indexing, querying), with managed Qdrant for vector search. Pros: works well, quite simple, scales well. Cons: requires an API boundary layer and more work upfront, but it has been quite low-maintenance in the long run.

- Integrations (Tools, MCPs): LLM tool wrappers + user credentials management (API keys, OAuth) to connect users' CRMs, Notion, whatever. That's a big pain; we've accumulated quite some debt here because we can't decide on a layer to manage this. There's no open-source "go-to" solution.
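The cost-tracking piece described above is mostly a per-model price table plus a flat event row. A minimal sketch, with made-up prices and field names (not Stripe's or Tinybird's actual schema, and real per-token prices come from the provider's pricing page):

```python
# Hypothetical prices in USD per 1M tokens -- replace with real provider rates.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token usage and the per-model price table."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def usage_event(request_id: str, model: str, usage: dict) -> dict:
    """Build a flat row suitable for inserting into ClickHouse or feeding a
    billing meter. Field names are illustrative, not a required schema."""
    return {
        "request_id": request_id,
        "model": model,
        "input_tokens": usage["input_tokens"],
        "output_tokens": usage["output_tokens"],
        "cost_usd": estimate_cost_usd(
            model, usage["input_tokens"], usage["output_tokens"]
        ),
    }
```

Keeping the estimate in the proxy layer, as the comment describes, means every new modality (images, audio) only needs a new entry in the price table rather than changes scattered across callers.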