Hi HN! One of the authors here.

We found pervasive errors in the test sets of 10 of the most commonly used ML benchmark datasets, so we made labelerrors.com, where anyone can examine the data labels. We think it’s neat to browse through the errors to get an intuitive sense of what kinds of things go wrong (e.g. completely mixed-up labels, like a frog labeled “cat”, or images that contain multiple things, like a bucket full of baseballs labeled “bucket”), which is why we built this errors gallery. To our surprise, there are lots of errors even in gold-standard datasets like ImageNet and MNIST.

For those who want to dig into the details, we have a blog post here: https://l7.curtisnorthcutt.com/label-errors, where we talk more about the study and its implications.

Happy to answer any questions here!
Also of interest might be cleanlab (https://github.com/cgnorthcutt/cleanlab), the open-source software we used for initially identifying potentially mislabeled data.
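For a feel of what that step looks like, here’s a minimal sketch of flagging potential label issues with cleanlab. It assumes the find_label_issues entry point from recent cleanlab releases (the exact function name has changed across versions), plus a toy scikit-learn model and made-up data standing in for a real classifier and dataset:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from cleanlab.filter import find_label_issues  # name in recent cleanlab versions

    # Toy stand-in data: features X and (possibly noisy) integer class labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    labels = rng.integers(0, 3, size=200)

    # Out-of-sample predicted probabilities via cross-validation,
    # so no example is scored by a model that was trained on it.
    pred_probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X, labels,
        cv=5, method="predict_proba",
    )

    # Indices of examples whose given label looks inconsistent with the
    # model's predicted probabilities, most suspicious first.
    issue_indices = find_label_issues(
        labels=labels,
        pred_probs=pred_probs,
        return_indices_ranked_by="self_confidence",
    )
    print(issue_indices[:10])

The out-of-sample (cross-validated) probabilities matter: if you score examples with a model that already trained on them, mislabeled points tend to look correctly labeled.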