Got tired of manually parsing all my ChatGPT logs, so I built a real-time hallucination detector for my logs in production. Now, instead of manually figuring out which of my hundreds of logs were bad responses (invented new facts, refused to answer, etc.), I can just get ChatGPT to flag them for me.<p>How does it work?
Bettershot aims to detect 3 things:<p><pre><code> Was the question relevant to the data? (i.e. filter out questions like "how's the weather?" if the chatbot's purpose is to answer questions about the history of jeans)
</code></pre>
If the question is relevant, then:<p><pre><code> Did the model's response invent new information when answering the question? (i.e. information that was not in the prompt it was given)
Did the model refuse to answer the question? (e.g. "Sorry, as an AI language model...")
</code></pre>
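To make those checks concrete, here's a minimal sketch of how the three questions could be phrased as yes/no judge prompts. These are illustrative placeholders, not the exact prompts bettershot ships with (those are in the repo):<p><pre><code> # Hypothetical prompt templates -- illustrative wording only, not bettershot's actual prompts.
RELEVANCE_PROMPT = (
    "Given the context and the user's question, answer 'True' if the question "
    "can be answered from the context and 'False' otherwise.\n"
    "Context: {context}\nQuestion: {question}\nAnswer:"
)

FAITHFULNESS_PROMPT = (
    "Answer 'True' if every claim in the response is supported by the context, "
    "and 'False' if the response invents information not in the context.\n"
    "Context: {context}\nQuestion: {question}\nResponse: {response}\nAnswer:"
)

REFUSAL_PROMPT = (
    "Answer 'True' if the response refuses to answer the question "
    "(e.g. 'Sorry, as an AI language model...'), otherwise answer 'False'.\n"
    "Response: {response}\nAnswer:"
)
</code></pre>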
We do this by using ChatGPT (currently gpt-3.5-turbo-16k) to evaluate each prompt-response pair 5 times and taking the most frequent verdict as the final one (e.g. if it evaluated the pair as 'True' 4 times out of 5, it's probably a good response).<p>Check out the repo to learn more: <a href="https://github.com/ClerkieAI/bettershot">https://github.com/ClerkieAI/bettershot</a>
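For a rough idea of that majority-vote step, here's a minimal sketch using the OpenAI Python client. The helper name and prompt wiring are made up for this example and aren't bettershot's actual API:<p><pre><code> from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def majority_vote(judge_prompt: str, n: int = 5, model: str = "gpt-3.5-turbo-16k") -> str:
    """Ask the judge model the same question n times; return the most common verdict."""
    verdicts = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": judge_prompt}],
        )
        verdicts.append(resp.choices[0].message.content.strip())
    # e.g. ['True', 'True', 'False', 'True', 'True'] -> 'True'
    return Counter(verdicts).most_common(1)[0][0]
</code></pre>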