Launch HN: Encord (YC W21) – Unit testing for computer vision models

91 points by ulrikhansen54 over 1 year ago
Eric and Ulrik from Encord here. We build developer tooling to help computer vision (CV) teams enhance their model-building capabilities. Today we are proud to launch our model and data unit testing toolkit, Encord Active (https://encord.com/active/) [1].

Imagine you're building a device that needs to see and understand the world around it, like a self-driving car or a robot that sorts recycling. To do this, you need a vision model that processes the real world as a sequence of frames and makes decisions based on what it sees.

Bringing such models to production is hard. You can't just train a model once and expect it to keep working; you need to constantly test and improve it to make sure it understands the world correctly. For example, you don't want a self-driving car to confuse a stop sign with a billboard, or to classify a pedestrian as an unknown object [2].

This is where Encord Active comes in. It's a toolkit that helps developers "unit test", understand, and debug their vision models. We put "unit test" in quotes because while it isn't classic software unit testing, the idea is similar: to see which *parts* of your model are working well and which aren't. Here's a short video that shows the tool: https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK [3]

For instance, if you're working on a self-driving car, Encord Active can help you figure out why the car is confusing stop signs with billboards. It lets you dive into the data the model has seen and understand what's going wrong. Maybe the model hasn't seen enough stop signs at night, or maybe it gets confused when the sign is partially blocked by a tree.

Having extensive unit test coverage won't guarantee that your software (or vision model) is correct, but it helps a lot, and it is excellent at catching regressions (things that worked at one point and then stopped working later). For example, consider retraining your model on a 25% larger dataset that includes examples from a new US state with distinctly different weather conditions (e.g., California vs. Vermont). Intuitively, one might think "the more signs, the merrier." However, adding new signs can confuse the model: perhaps it suddenly becomes biased toward relying mostly on surroundings because the signs are covered in snow. This can cause the model to regress and fall below your desired performance threshold (e.g., 85% accuracy) on existing test data.

These issues are not easily solved by changing the model architecture or tuning hyperparameters (e.g., adjusting learning rates), especially as the problems you are trying to solve with the model get more complex. Rather, they are solved by training or fine-tuning the model on more of "the right" data.

Unlike purely embeddings-based data exploration and model analytics/evaluation tools, which help people discover surface-level problems without offering suggestions for solving them, Encord Active gives concrete recommendations and actionable steps to fix the identified model and data errors by automatically analyzing your model's performance. Specifically, the system detects the weakest and strongest aspects of the data distribution, serving as a guide for where to focus in subsequent iterations of your model training.
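To make the "unit test" analogy concrete, here is a minimal sketch of a per-slice regression check. This is not Encord Active's API; the `model.predict` call, the slice tags, and the eval-set format are all hypothetical, and the 0.85 threshold is just the example figure from above.

```python
# Hypothetical per-slice regression "unit test" for a vision model.
# Not Encord Active's API -- an illustration of the idea only.
from collections import defaultdict

ACCURACY_THRESHOLD = 0.85  # the example regression bar from the post

def accuracy_by_slice(model, eval_set):
    """Compute accuracy per slice tag (e.g. 'night', 'occluded', 'snow')
    over an eval set of (frame, label, slice_tag) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for frame, label, slice_tag in eval_set:
        total[slice_tag] += 1
        if model.predict(frame) == label:  # assumed model interface
            correct[slice_tag] += 1
    return {tag: correct[tag] / total[tag] for tag in total}

def test_no_slice_regresses(model, eval_set):
    """Fail if any slice drops below the bar, so retraining on new data
    (e.g. snowy signs) can't silently break previously working cases."""
    for tag, acc in accuracy_by_slice(model, eval_set).items():
        assert acc >= ACCURACY_THRESHOLD, f"slice '{tag}' regressed: {acc:.2%}"
```

Run against each retrained candidate, a check like this catches the "snowy signs" regression before the model ships, which is the point of the analogy.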
The analysis encompasses various factors: the "qualities" of the images (size, brightness, blurriness), the geometric characteristics of objects and model predictions (aspect ratio, outliers), as well as metadata and class distribution. It correlates these factors with your chosen model performance metrics, surfacing low-performing subsets for attention and giving you actionable next steps. One of our early customers, for example, reduced their dataset size by 35% yet increased their model's accuracy (in this case, the mAP score) by 20% [4], a huge improvement in this domain. This is counterintuitive to most people, as the thinking is generally "more data = better models".

If any of these experiences resonate with you, we are eager for you to try out the product and hear your opinions and feedback. We are available to answer any questions you may have!

[1] https://encord.com/active/

[2] https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg

[3] https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK

[4] https://encord.com/customers/automotus-customer-story/
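As a rough illustration of the quality-metric analysis described in the post (not Encord Active's actual implementation; the metric choices and the per-image score format are assumptions), the following sketch computes two simple image "qualities" and correlates them with a per-image model score:

```python
# Illustrative quality-metric analysis: correlate simple image qualities
# with per-image model performance. Not Encord Active's implementation.
import cv2          # pip install opencv-python
import numpy as np

def brightness(gray: np.ndarray) -> float:
    """Mean pixel intensity as a crude brightness measure."""
    return float(gray.mean())

def blurriness(gray: np.ndarray) -> float:
    """Variance of the Laplacian; low values suggest a blurry image."""
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def correlate_quality_with_score(image_paths, per_image_scores):
    """Pearson correlation between each quality metric and a per-image
    score (e.g. IoU). A strong correlation flags that quality factor
    as a weak spot worth collecting more data for."""
    grays = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in image_paths]
    for name, metric in (("brightness", brightness), ("blurriness", blurriness)):
        values = [metric(g) for g in grays]
        r = np.corrcoef(values, per_image_scores)[0, 1]
        print(f"{name:>11} vs. model score: r = {r:+.2f}")
```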

8 comments

btown over 1 year ago
This is really cool. The annotation-to-testing-to-annotation feedback loop makes a ton of sense, and I'd encourage others who may be confused by this post to look at the Automotus case study (https://encord.com/customers/automotus-customer-story/), which has a great diagram.

For those of us with similar needs for annotation and "unit testing", but on text corpora: I'm aware of https://prodi.gy/ for the annotation side, but my understanding is that the relationship between model outputs and annotation steering is out of scope for that project. Do you know of tooling (open source or paid) that integrates an "Active" component similar to what you do? Or is text a direction you want to go as well?

[I'm a fan of Vellum (YC W23) for evaluation and testing of multiple prompts (https://www.vellum.ai/blog/introducing-vellum-test-suites), but I don't believe they feed annotation workflows in an automated, full-circle way.]
adrianh over 1 year ago
I had a look at your pricing page (https://encord.com/pricing/) and was sad to see that no pricing is actually communicated there.

What could I expect to pay for my company to use the Team plan?
dontwearitout over 1 year ago
Does this include tools to evaluate performance on out-of-distribution and adversarial images?
emil_sorensen over 1 year ago
This looks promising - but how is this different from tools like Aquarium Learning or Voxel51?
annahrkhan over 1 year ago
This is amazing!!!
asong37 over 1 year ago
Congratulations Eric and Ulrik!
maciejgryka over 1 year ago
Congrats on the launch!

I haven't had a chance to try out Active yet, but having worked on a project with Erik and the team a while back, they're a great team to work with :)
kgiddens1 over 1 year ago
Congrats on the launch Eric!