
Show HN: Pytest-evals – Simple LLM apps evaluation using pytest

13 points by almogbaku, 4 months ago

2 comments

almogbaku, 4 months ago
Hey HN! Creator here. I recently found myself writing evaluations for actual production LLM projects, and kept facing the same dilemma: either reinvent the wheel or use a heavyweight commercial system with tons of features I don't need right now.

Then it hit me - evaluations are just (kind of) tests, so why not write them as such using pytest?

That's why I created pytest-evals - a lightweight pytest plugin for building evaluations. It's intentionally not a sophisticated system with dashboards (and not suitable as a "robust" solution). It's minimalistic, focused, and definitely not trying to be a startup.

```python
# Predict the LLM performance for each case
@pytest.mark.eval(name="my_classifier")
@pytest.mark.parametrize("case", TEST_DATA)
def test_classifier(case: dict, eval_bag, classifier):
    # Run predictions and store results
    eval_bag.prediction = classifier(case["Input Text"])
    eval_bag.expected = case["Expected Classification"]
    eval_bag.accuracy = eval_bag.prediction == eval_bag.expected

# Now let's see how our app is performing across all cases...
@pytest.mark.eval_analysis(name="my_classifier")
def test_analysis(eval_results):
    accuracy = sum(result.accuracy for result in eval_results) / len(eval_results)
    print(f"Accuracy: {accuracy:.2%}")
    assert accuracy >= 0.7  # Ensure our performance is not degrading
```

Would love to hear your thoughts - and if you find this useful, a GitHub star would be appreciated!
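[Editor's note: for readers without the plugin, the same two-phase pattern can be approximated in plain pytest. The sketch below uses a shared session-scoped fixture to collect per-case outcomes and a final test to assert on the aggregate; `CASES` and `classifier` are illustrative stand-ins, and pytest-evals layers its own bookkeeping (and pytest-xdist compatibility, per the comment below) on top of this idea.]

```python
# Minimal plugin-free sketch of "evals as pytest tests".
# All names here are illustrative, not part of pytest-evals.
import pytest

CASES = [
    {"input": "great product", "expected": "positive"},
    {"input": "terrible support", "expected": "negative"},
]

def classifier(text: str) -> str:
    # Stand-in for a real LLM call.
    return "positive" if "great" in text else "negative"

@pytest.fixture(scope="session")
def results():
    # One shared list for the whole session; tests append into it.
    return []

@pytest.mark.parametrize("case", CASES)
def test_classifier(case, results):
    results.append(classifier(case["input"]) == case["expected"])

def test_analysis(results):
    # Declared last in the module, so pytest runs it after all cases.
    accuracy = sum(results) / len(results)
    assert accuracy >= 0.7
```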
westurner, 4 months ago
The pytest-evals README mentions that it's built on pytest-harvest, which works with pytest-xdist and pytest-asyncio.

pytest-harvest: https://smarie.github.io/python-pytest-harvest/ :

> "Store data created during your pytest tests execution, and retrieve it at the end of the session, e.g. for applicative benchmarking purposes"
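[Editor's note: a minimal sketch of the pytest-harvest pattern that quote describes, based on its documented `results_bag` and `module_results_df` fixtures; the DataFrame synthesis assumes pandas is installed.]

```python
# Sketch of the pytest-harvest flow: tests store values on the
# results_bag fixture; a later test reads everything back through
# module_results_df, a pandas DataFrame with one row per prior test.
import pytest

@pytest.mark.parametrize("x", [1, 2, 3])
def test_square(x, results_bag):
    results_bag.square = x * x  # harvested for the synthesis step

def test_synthesis(module_results_df):
    # Stored bag entries appear as columns alongside status/duration.
    print(module_results_df)
    assert "square" in module_results_df.columns
```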
[Comment #42785116 not loaded]