We are experimenting with ways to use ChatGPT to get better answers more reliably, reduce hallucinations, etc.<p>This little library generates multiple draft responses and then uses a second model to judge the drafts and pick a winner, which is returned to the user. Google's Bard uses this same approach.<p>With this library you can apply the pattern to gpt-3.5 and gpt-4.<p>Drafts are generated in parallel, and all drafts are evaluated with a single judging prompt.<p>This will use a lot of tokens. For example, generating 3 drafts costs roughly 3x the tokens of a single response; those drafts are then fed into the judging prompt, and the judge produces its own response, so the total comes to more than 7x.<p>Streamlit demo: <a href="https://theoremone-gptgladiator-streamlit-ui-5ljwmm.streamlit.app/" rel="nofollow">https://theoremone-gptgladiator-streamlit-ui-5ljwmm.streamli...</a>
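The core pattern (parallel drafts, one judge call) is simple enough to sketch. This is not the library's actual API — `best_of_n`, `generate`, and `judge` are hypothetical names, and the stub models below stand in for real ChatGPT calls:

```python
import concurrent.futures
import queue

def best_of_n(prompt, generate, judge, n=3):
    """Generate n drafts in parallel, then ask a judge to pick one.

    `generate(prompt) -> str` and `judge(prompt, drafts) -> int`
    (the winning index) are caller-supplied model calls.
    """
    # Drafts are independent, so they can run concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        drafts = list(pool.map(generate, [prompt] * n))
    # A single judging call sees every draft and returns the winner's index.
    return drafts[judge(prompt, drafts)]

# --- stub models for illustration only ---
# A thread-safe queue of canned "drafts" simulates a nondeterministic model.
_canned = queue.Queue()
for d in ["short", "a medium answer", "the longest draft of all"]:
    _canned.put(d)

def fake_generate(prompt):
    return _canned.get()

def fake_judge(prompt, drafts):
    # Toy heuristic: prefer the longest draft.
    return max(range(len(drafts)), key=lambda i: len(drafts[i]))

print(best_of_n("What is X?", fake_generate, fake_judge))
# prints "the longest draft of all"
```

In a real version, `generate` would call the chat completions endpoint with temperature &gt; 0 so the drafts actually differ, and `judge` would prompt a (possibly stronger) model with all drafts at once.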
It could be interesting to use this approach in a product that also lets humans pick the answer they thought was best (for users curious to see all three drafts).<p>That product could gather the human choices internally into an RLHF data set used to train future LLMs.