Can't share the company name, but it's a relatively small startup with a few thousand monthly active users.<p>Here's what our current stack looks like:<p>- LLM Access: Custom proxy layer to an OpenAI-compatible gateway for chat completions, transcriptions, image generation, and embeddings (we started with LangChain but were fighting it more than it was helping us). Now we use the OpenAI SDK + a custom API gateway that handles cost tracking, metered billing, model fallbacks, aliases, and feature flags (e.g., supports_structured_output, supports_reasoning, modality: text, image, audio, etc.).<p>- Workflow Orchestration: We started with a very simple task executor + background job queue to handle retries and durable execution. That has become a real limitation, and we are looking for a complete replacement, considering Temporal, trigger.dev, and Conductor. I don't see an obvious winner here: each is a sophisticated piece of tech with a learning curve, and adopting any of them will mean rethinking some of our decisions and serious refactoring to support new flows.<p>- Observability: OTEL + <a href="https://signoz.io">https://signoz.io</a>; we don't track LLM outputs for security reasons (consumer product, a lot of private stuff).<p>- Cost Tracking: Embedded in our API proxy layer. For each request, we estimate usage, push it into ClickHouse (via TinyBird) for product-usage analytics, and forward it to our metering/billing provider (Stripe meters, though we're evaluating Orb as a replacement).<p>- Agent Memory / Chat History / Persistence: Postgres for everything + S3 for files and images. Frankly, it looks like a chat app schema (chat threads, messages, attachments). Relatively simple and straightforward.<p>- Billing: This was a real pain - a hybrid subscription with a monthly fee + pay-as-you-go for overages. Stripe offers limited support for subscriptions with usage billing and does not support automatic top-ups.
Implementing a free plan + paid plan with metered usage required a lot of code and even more tests. I thought billing providers had already solved this?<p>- RAG: Custom document ingestion service based on LlamaIndex (document extraction, indexing, querying), with managed Qdrant for vector search. Pros: works well, quite simple, scales well. Cons: requires an API boundary layer and more work upfront, though it has been quite low-maintenance in the long run.<p>- Integrations (Tools, MCPs): LLM tool wrappers + user credential management (API keys, OAuth) to connect users' CRM, Notion, whatever. That's a big pain; we have accumulated quite a bit of debt here because we can't decide on a layer to manage this. There is no open-source "go-to" solution.
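<p>To make the cost-tracking part concrete, here's a minimal sketch of the per-request usage estimation our proxy does. The price table, event shape, and function names are all illustrative assumptions, not our real schema; the real pipeline just POSTs the resulting event to TinyBird and to the billing meter.

```python
# Sketch of per-request cost estimation in an LLM proxy layer.
# PRICES, UsageEvent, and build_event are hypothetical; real prices
# come from the provider's pricing page and your own billing schema.

from dataclasses import dataclass
from datetime import datetime, timezone

# USD per 1M tokens as (input, output) -- illustrative values only.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

@dataclass
class UsageEvent:
    user_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    ts: str

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token counts and the model's price tier."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def build_event(user_id: str, model: str,
                input_tokens: int, output_tokens: int) -> UsageEvent:
    """Build the usage event that gets pushed to analytics and billing."""
    return UsageEvent(
        user_id=user_id,
        model=model,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        cost_usd=round(estimate_cost(model, input_tokens, output_tokens), 6),
        ts=datetime.now(timezone.utc).isoformat(),
    )

# One event fans out to both sinks: analytics (TinyBird -> ClickHouse)
# and the billing meter (Stripe meters / Orb) as plain HTTP POSTs.
event = build_event("user_123", "gpt-4o-mini", input_tokens=1_000, output_tokens=500)
print(event.cost_usd)
```

Doing the estimation once at the proxy keeps analytics and billing from drifting apart, since both consume the same event.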
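<p>For the integrations pain point, the pattern we keep reimplementing is roughly this: a wrapper that resolves the calling user's credential before executing the tool. Everything here (CredentialStore, notion_search, the tool signature) is a hypothetical sketch, not a real library API; the real version sits behind encrypted storage and OAuth refresh.

```python
# Sketch of an LLM tool wrapper that injects per-user credentials.
# All names here are hypothetical; the point is the layering:
# model-facing tool -> credential lookup -> integration call.

import json
from typing import Any, Callable

class CredentialStore:
    """Toy in-memory store; a real one would be encrypted at rest
    and handle OAuth token refresh."""
    def __init__(self) -> None:
        self._creds: dict[tuple[str, str], str] = {}

    def put(self, user_id: str, provider: str, token: str) -> None:
        self._creds[(user_id, provider)] = token

    def get(self, user_id: str, provider: str) -> str:
        return self._creds[(user_id, provider)]

def make_tool(name: str, provider: str,
              fn: Callable[..., Any], store: CredentialStore):
    """Wrap an integration function into a tool-call handler that
    resolves the calling user's credential before dispatching."""
    def handler(user_id: str, arguments: str) -> Any:
        token = store.get(user_id, provider)  # per-user OAuth/API key
        args = json.loads(arguments)          # JSON arguments from the model
        return fn(token, **args)
    handler.tool_name = name
    return handler

# Hypothetical integration function standing in for a real Notion client.
def notion_search(token: str, query: str) -> list[str]:
    return [f"page matching {query!r} (auth={token[:4]}...)"]

store = CredentialStore()
store.put("user_123", "notion", "secret-abc")
tool = make_tool("notion_search", "notion", notion_search, store)
print(tool("user_123", '{"query": "roadmap"}'))
```

The wrapper is trivial; the debt comes from everything around it (token refresh, scopes, revocation), which is exactly the layer with no open-source go-to.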