Hi HN,

This project has grown a lot recently, and I figure it's worth another submission. I use this tool for several LLM-based use cases that together serve over 100k DAU. It works pretty simply:

1) Create a list of test cases

2) Set up assertions for the metrics/guardrails you care about, such as outputting only JSON or never saying "As an AI language model"

3) Run the tests as you make changes, and integrate with CI if desired (see the sketch at the end)

This makes LLM model and prompt selection easier because it reduces the process to something we're all familiar with: developing against test cases. You can iterate with confidence and avoid regressions.

There are a bunch of startups popping up in this space, but I think it's important to have something that is local (private), on the command line (easy to slot into the development loop), and open-source.
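To make the workflow concrete, here's a rough Python sketch of the test loop. This is not the tool's actual API; call_model, the test cases, and the assertion names are all hypothetical placeholders standing in for whatever your setup looks like:

    import json

    def call_model(prompt: str) -> str:
        # Placeholder for a real LLM call (e.g., an API request).
        # Returns a canned response so the sketch runs on its own.
        return '{"sentiment": "positive"}'

    def is_json(output: str) -> bool:
        # Guardrail: the model must return valid JSON.
        try:
            json.loads(output)
            return True
        except ValueError:
            return False

    def no_ai_disclaimer(output: str) -> bool:
        # Guardrail: no "As an AI language model" boilerplate.
        return "As an AI language model" not in output

    test_cases = [
        "Classify the sentiment of: 'I love this product'",
        "Classify the sentiment of: 'Terrible experience'",
    ]

    assertions = [is_json, no_ai_disclaimer]

    for case in test_cases:
        output = call_model(case)
        failures = [a.__name__ for a in assertions if not a(output)]
        status = "PASS" if not failures else f"FAIL ({', '.join(failures)})"
        print(f"{status}: {case}")

In practice you'd declare all of this in config rather than write it by hand, and a failing assertion would fail the run (and therefore the CI job), which is what makes step 3 useful.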