Evaluating the quality of AI agent responses used to be tricky. It required knowledge of evaluation criteria as well as third-party tools like promptfoo, Ragas, or Prometheus. Now OpenAI makes it ridiculously easy with a new API endpoint. It can grade a completion against a reference response, assess its format and tone, and it can even be prompted to apply your own criteria.