Google's Gemini 2.0 solved the RAG problem for conversational AI. Put your knowledge base (KB) in Gemini's system prompt and have your agent make a tool call to Gemini.<p>Accuracy: You get the right answer EVERY TIME.<p>Latency: Response time is about 900 ms.<p>Cost: 300 queries per day against a 50-page KB costs $26 per month ($7 with prompt caching), on par with RAG-as-a-service providers.<p>RAG is one of the last-mile problems for real-time conversational AI. It's very difficult to get production-worthy recall from a RAG pipeline. Model-Assisted Generation (MAG) with Gemini 2.0 Flash Lite just works. Period.<p>The blog post links to an open-source demo, which we built using the open source Pipecat voice AI platform.<p>We don't have any skin in the game here; we're not making money off this.<p>Open source, not a money-making project, and about one of tech's newly released toys: it seemed like a good post for Hacker News.<p>Tom
<a href="https://x.com/tom_shapland/status/1889041960293560540" rel="nofollow">https://x.com/tom_shapland/status/1889041960293560540</a>
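For readers curious what the tool call looks like in practice, here is a minimal sketch of the approach the post describes: the entire KB goes into Gemini's system prompt and the user's question is the only message. It assumes Google's public generateContent REST endpoint; the helper name, prompt wording, and sample KB are ours, not from the demo.

```python
import json

# Public Gemini REST endpoint (model name from the post; append ?key=YOUR_API_KEY).
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-2.0-flash-lite:generateContent"
)

def build_mag_request(kb_text: str, question: str) -> dict:
    """Build the generateContent JSON body with the whole KB as the system prompt."""
    return {
        "system_instruction": {
            "parts": [{"text": "Answer strictly from this knowledge base:\n" + kb_text}]
        },
        "contents": [
            {"role": "user", "parts": [{"text": question}]}
        ],
    }

if __name__ == "__main__":
    # Illustrative KB; in the post's setup this would be ~50 pages of text.
    body = build_mag_request("Our store opens at 9am.", "When do you open?")
    print(json.dumps(body, indent=2))
    # A voice agent would POST this body to GEMINI_URL from inside its tool call,
    # e.g.: requests.post(GEMINI_URL, params={"key": API_KEY}, json=body)
```

The point of the design is that there is no retrieval step at all: every query sees the full KB, which is why recall is a non-issue and why prompt caching cuts the cost so sharply.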