One complaint we heard over and over is that no one likes writing tests to evaluate LLMs.<p>We automate this by having ChatGPT generate sample queries and expected answers from a given text. These test cases can then be run through the testing framework, much like running a test suite with PyTest.
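
For illustration only, here is a minimal sketch of the generate-then-run idea in Python. It is not the framework's actual API: `QACase`, `generate_cases`, `run_cases`, the prompt wording, and the substring pass/fail check are all hypothetical names and choices made for this example. The ChatGPT call is passed in as a plain function (`ask_llm`), and stub functions stand in for both the generator and the model under test so the snippet runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QACase:
    """One generated test case: a query and the answer we expect."""
    query: str
    expected: str


# Hypothetical prompt; a real one would likely need more tuning.
PROMPT = (
    "Read the text below and write {n} question/answer pairs that can be "
    "answered from it. Reply with only a JSON list of objects with keys "
    '"query" and "expected".\n\nText:\n{text}'
)


def generate_cases(text: str, ask_llm: Callable[[str], str], n: int = 3) -> List[QACase]:
    """Ask an LLM (any callable: prompt in, JSON string out) for test cases."""
    raw = ask_llm(PROMPT.format(n=n, text=text))
    return [QACase(item["query"], item["expected"]) for item in json.loads(raw)]


def run_cases(cases: List[QACase], answer_fn: Callable[[str], str]) -> List[QACase]:
    """Return the cases whose expected answer is missing from the model's reply.

    Assumption: a case passes if the expected string appears in the answer
    (case-insensitive). A real evaluator would probably be more lenient.
    """
    failures = []
    for case in cases:
        answer = answer_fn(case.query)
        if case.expected.lower() not in answer.lower():
            failures.append(case)
    return failures


# Stub standing in for ChatGPT; it returns a fixed JSON reply.
def fake_generator(prompt: str) -> str:
    return json.dumps([
        {"query": "What is the capital of France?", "expected": "Paris"},
        {"query": "Which river runs through Paris?", "expected": "Seine"},
    ])


# Stub standing in for the LLM being evaluated.
def fake_model(query: str) -> str:
    if "capital" in query:
        return "Paris is the capital of France."
    return "I am not sure."


source_text = "Paris is the capital of France. The Seine runs through it."
cases = generate_cases(source_text, fake_generator)
failures = run_cases(cases, fake_model)
for case in failures:
    print(f"FAILED: {case.query!r} (expected {case.expected!r})")
```

To get a PyTest-style run, the generated cases could be fed to `pytest.mark.parametrize` so that each query/answer pair shows up as its own test.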