Hi HN!<p>This is Arjun and Saikat, and like other product engineers, we've been excited to build with LLMs. Having powerful models available as off-the-shelf HTTP endpoints is a huge leap forward for integrating and shipping ML to end-users.<p>While building on top of LLMs, we've also experienced the pain of non-deterministic behavior – especially for applications that require smaller models. Iterating on model configuration while ensuring no regressions across hundreds of scenarios is a tricky balance.<p>To make this easier, we built Empirical. Here’s a demo video: <a href="https://www.youtube.com/watch?v=p8gSGphcOSU" rel="nofollow">https://www.youtube.com/watch?v=p8gSGphcOSU</a><p>We've focused on:<p>- Fast iteration cycles and interactivity when you need to change the prompt or add a new sample. We wanted to build something that feels like “hot reload” for LLM development<p>- A capable UI that combines objective and subjective evaluation, since eyeballing outputs makes it easier to build intuition around model behavior<p>- The ability to customize which model to test, or how to score it
— with JavaScript (or Python, if you really must)<p>- Embedded analytics for evaluation results, powered by DuckDB under the hood (more coming up on this!)<p>You can try Empirical today – with a one-line CLI command – locally or in CI/CD. And oh, Empirical is 100% open source – so file an issue and we’d be happy to make it work for your use case:<p>$ npx empiricalrun<p>GitHub: <a href="https://github.com/empirical-run/empirical">https://github.com/empirical-run/empirical</a><p>Docs: <a href="https://docs.empirical.run/" rel="nofollow">https://docs.empirical.run/</a>
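<p>To give a flavor of what a custom JavaScript scorer can look like – note this is a simplified sketch, not Empirical's exact interface (the function name and parameter shape here are illustrative; see the docs for the real API) – a scorer is just a function from a model output and the sample's expected value to a score with an explanation:

```javascript
// Hypothetical scorer sketch – illustrative shape, not Empirical's actual API.
// It receives the model output and the sample's expected value, and returns
// a score between 0 and 1 plus a human-readable message.
function containsExpected({ output, expected }) {
  const passed = output.toLowerCase().includes(expected.toLowerCase());
  return {
    score: passed ? 1 : 0,
    message: passed
      ? "output contains the expected text"
      : `output is missing "${expected}"`,
  };
}

// Example usage:
const result = containsExpected({
  output: "The capital of France is Paris.",
  expected: "Paris",
});
console.log(result.score); // 1
```

Because scorers are plain functions, you can unit-test them on their own before wiring them into an eval run.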