Over the past two years of helping dozens of clients improve their LLM-powered products, I’ve developed a toolkit that I’m now open-sourcing.
First up: a library to bring product analytics to conversational AI.<p>One of the biggest challenges I see clients face is understanding how their assistants are performing in production. Evals are great for catching regressions, but they can’t surface the blind spots in your AI’s behavior.<p>This gets even more challenging for conversational AI products that don’t have a single “correct” answer. Different user cohorts want different experiences. That makes measurement tricky.<p>Coming from a product analytics background, my default instinct is always: “instrument the product!” However, tracking generic events like user_sent_message doesn’t tell you much.<p>What you really want are insights like:<p><pre><code> - How frequently do users request to speak with a human when interacting with a customer support agent?
- Which user journeys trigger self-reflection during a session with an AI therapist?
- What percentage of the time does an AI tutor's explanation leave the student confused?
</code></pre>
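Each of those questions maps onto a concrete event with properties. As a hypothetical illustration (this post doesn’t show the library’s actual schema, so the event name and properties below are invented), the first question could be answered by counting events shaped like this:<p><pre><code>  # Invented example event -- not the library's real schema.
  event = {
      "name": "human_handoff_requested",
      "properties": {
          "trigger_message": "Can I just talk to a person?",
          "turns_before_request": 4,
      },
  }
</code></pre>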
This new library enables these types of insights through the following workflow:<p><pre><code> - Analyzes your conversation transcripts
- Auto-generates a rich event schema
- Tags each message with relevant events and event properties
- Sends the events to your analytics tool (currently supports Amplitude and PostHog)
</code></pre>
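To make the shape of that pipeline concrete, here’s a minimal, self-contained Python sketch. To be clear, this is not the library’s actual API: the schema is hand-written rather than auto-generated, a keyword matcher stands in for the LLM that does the real tagging, and send_event just prints where you would call your analytics SDK (with PostHog’s Python SDK, roughly posthog.capture(user_id, event_name, properties)):<p><pre><code>  # Hypothetical sketch -- not the library's real API. A keyword
  # matcher stands in for the LLM that does the actual tagging.
  from dataclasses import dataclass, field

  @dataclass
  class Event:
      name: str
      properties: dict = field(default_factory=dict)

  # Steps 1-2 generate a schema like this from your transcripts;
  # it's hand-written here for illustration.
  SCHEMA = {
      "human_handoff_requested": ["speak to a human", "talk to a person"],
      "user_confused": ["don't understand", "still confused"],
  }

  def tag_message(text: str) -> list[Event]:
      # Step 3: tag one message with every schema event it matches.
      lowered = text.lower()
      return [
          Event(name, {"matched_phrase": phrase})
          for name, phrases in SCHEMA.items()
          for phrase in phrases
          if phrase in lowered
      ]

  def send_event(user_id: str, event: Event) -> None:
      # Step 4: forward to your analytics tool; printing stands in
      # for the SDK call here.
      print(f"[{user_id}] {event.name} {event.properties}")

  transcript = [
      ("user-42", "This makes no sense, I still don't understand."),
      ("user-42", "Can I just talk to a person?"),
  ]
  for user_id, message in transcript:
      for event in tag_message(message):
          send_event(user_id, event)
</code></pre>
The point of the workflow above is that the schema and tagging come from your own transcripts automatically, rather than from a hand-maintained keyword list like the one in this toy version.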
Any thoughts or feedback would be greatly appreciated!