Why aren't these data sets editable instead of static? Treat them like a collaborative wiki or something (OpenStreetMap being the closest fit) and allow everyone to submit improvements so that all may benefit.

I hope the people in this article had a way to contribute back their improvements, and did so.
The title here seems wrong. Suggested change:

"Cleaning algorithm finds 20% of errors in major image recognition datasets" -> "Cleaning algorithm finds errors in 20% of annotations in major image recognition datasets."

We don't know if the found errors represent 20%, 90%, or 2% of the total errors in the datasets.
> We then used the error spotting tool on the Deepomatic platform to detect errors and to correct them.

I'm wondering if those errors are selected based on how much they impact performance?

Anyway, this is probably a much better way of gaining accuracy on the cheap than launching 100+ models for hyperparameter tuning.
Best I can tell, they are using the ML model itself to detect the errors. Isn't this a bit of an ouroboros? The model will naturally look better, because you are only correcting cases where it was right but the label was wrong.

That's not necessarily evidence of a better model, just of a better test set.
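For concreteness, here is a minimal sketch of what model-based error spotting often looks like (a confident-disagreement heuristic for a classifier; this is an assumption, not the article's actual tool). It also illustrates the circularity: the flagged samples are, by construction, exactly those where the model contradicts the annotation.

```python
# Hypothetical sketch: flag annotations that a trained model confidently
# disagrees with. Correcting only these items naturally pulls the test set
# toward the model's own predictions.
import numpy as np

def flag_label_errors(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.9):
    """Return indices where the model confidently predicts a class
    other than the annotated one.

    probs  -- (n_samples, n_classes) softmax outputs from a trained model
    labels -- (n_samples,) annotated class indices
    """
    predicted = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    disagree = (predicted != labels) & (confidence >= threshold)
    return np.nonzero(disagree)[0]

# Example usage with dummy data:
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=10)
labels = rng.integers(0, 3, size=10)
print(flag_label_errors(probs, labels, threshold=0.6))
```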
Weird behaviour on pinch-to-zoom (MacBook): it scrolls instead of zooming, and swiping back does nothing.

Another example of why you should never mess with the defaults unless strictly necessary.
Using simple techniques, they found that popular open-source datasets like VOC or COCO contain up to 20% annotation errors. By manually correcting those errors, they got an average error reduction of 5% for state-of-the-art computer vision models.
An idea on how this could work: repeatedly re-split the dataset (so that all of it gets covered), re-train a detector on each split, and at the end of each training cycle surface the validation frames with the highest computed loss (or some other metric derived more directly from bounding boxes, such as the number of high-confidence "false" positives, which could be instances of under-labeling). That's what I do on noisy, non-academic datasets, anyway; a rough sketch follows.
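A minimal sketch of that loop, assuming you already have a training routine and a per-frame loss. `train_detector` and `frame_loss` are placeholders for whatever you use, not anything from the article:

```python
# Sketch: cross-validate over random folds and surface the validation frames
# with the highest loss as candidates for label review.
from typing import Callable, Sequence
import random

def surface_suspect_frames(
    frames: Sequence,
    train_detector: Callable[[Sequence], object],   # placeholder: returns a trained model
    frame_loss: Callable[[object, object], float],  # placeholder: loss of model on one frame
    n_folds: int = 5,
    top_k: int = 50,
    seed: int = 0,
):
    """Re-split the dataset n_folds times so every frame is validated once,
    then return the indices of the top_k highest-loss validation frames."""
    indices = list(range(len(frames)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]

    scored = []  # (loss, frame_index)
    for f in range(n_folds):
        val_idx = folds[f]
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        model = train_detector([frames[i] for i in train_idx])
        for i in val_idx:
            scored.append((frame_loss(model, frames[i]), i))

    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]
```

The same skeleton works if you swap the loss for a count of high-confidence detections with no matching ground-truth box, which is closer to the under-labeling case mentioned above.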