Hi HN!<p>I lead product at Vectara. Yesterday we launched Vectara Chat, a set of APIs that helps organizations combine chatbots with RAG (Retrieval Augmented Generation). One of the most common things we've heard from users trying to build chatbots is that they hallucinate (hence the release of our hallucination evaluation model: <a href="https://huggingface.co/vectara/hallucination_evaluation_model" rel="nofollow">https://huggingface.co/vectara/hallucination_evaluation_mode...</a>). Working with customers on RAG systems (especially multi-turn ones), we've noticed that a lot of effort goes into "which model," "which vector DB," and so on, while incredibly important questions get missed: how do you make sure keywords -- non-semantic bits of context -- influence results? How do you prompt engineer? How do you rewrite queries appropriately to include history?<p>This is our attempt to tackle that: you give the API a conversation ID and a turn ID, and Vectara stores the history and rewrites the queries using modern LLM techniques to provide the end user with the answer to their question.<p>We've just launched this feature: we started off as an LLM-based semantic search company, launched a RAG solution in May 2023, and we see this multi-turn chat capability as the next stage of our offering. We'd love to hear feedback from the community!<p>A longer-form blog post is at <a href="https://vectara.com/blog/vectara-chat-revolutionizing-chat-development-for-the-modern-business/" rel="nofollow">https://vectara.com/blog/vectara-chat-revolutionizing-chat-d...</a>
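<p>To make the conversation-ID flow concrete, here's a rough sketch in Python. The field names (`chat`, `store`, `conversation_id`) are illustrative placeholders, not the exact Vectara request schema -- see the docs for the real one. The idea is just: the first turn has no conversation ID, the server returns one, and you pass it back on follow-up turns so the query can be rewritten against stored history.

```python
def build_chat_query(question, corpus_id, conversation_id=None):
    """Build a multi-turn chat query payload (hypothetical field names).

    Passing a conversation_id asks the server to load prior turns,
    rewrite the new query with that history, and append this turn.
    """
    payload = {
        "query": question,
        "corpus_id": corpus_id,
        "chat": {"store": True},  # persist this turn server-side
    }
    if conversation_id is not None:
        # Continue an existing conversation instead of starting a new one.
        payload["chat"]["conversation_id"] = conversation_id
    return payload

# First turn: no conversation ID yet; the server would return one.
first = build_chat_query("What is RAG?", corpus_id=1)

# Follow-up turn: reuse the ID so a query like "How does it reduce
# hallucinations?" can have "it" resolved against the stored history.
followup = build_chat_query(
    "How does it reduce hallucinations?",
    corpus_id=1,
    conversation_id="conv-123",
)
```

The point is that history storage and query rewriting happen server-side; the client only threads the conversation ID through.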