OpenAI's new evals library (<a href="https://github.com/openai/evals/">https://github.com/openai/evals/</a>) makes it easy to create and run evaluations that probe the limitations of GPT-4.<p>The CLI, however, only prints an overall metric once a run finishes, and gives no insight into what kinds of outputs and failures GPT-4 actually produced.<p>zeno-evals (<a href="https://github.com/zeno-ml/zeno-evals">https://github.com/zeno-ml/zeno-evals</a>) is a simple one-line command that takes the log output from an OpenAI evals run and lets you interactively explore the generated data.<p>Try it out with `pip install zeno-evals`