Many real-world datasets use multiple annotations per example to ensure higher-quality labels. CROWDLAB is a new set of algorithms that estimates 3 key quantities better than prior standard crowdsourcing algorithms like GLAD and Dawid-Skene: (1) a consensus label for each example, (2) a confidence score for the correctness of that consensus label, and (3) a quality rating for each annotator.

The blog post gives some intuition for how it works, along with benchmarking results; the math and the nitty-gritty details can be found in this paper: https://cleanlab.github.io/multiannotator-benchmarks/paper.pdf

Happy to answer any questions here about multi-annotator datasets or data-centric approaches to ML in general.
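To give a concrete starting point, here's a minimal sketch of running CROWDLAB on your own multi-annotator data via cleanlab's multiannotator module. The function name and the keys of the returned dict are based on my reading of the cleanlab docs, so double-check them against your installed version:

    import numpy as np
    import pandas as pd
    from cleanlab.multiannotator import get_label_quality_multiannotator

    # One row per example, one column per annotator; NaN marks
    # examples a given annotator did not label.
    labels_multiannotator = pd.DataFrame({
        "annotator_1": [0, 1, np.nan, 2],
        "annotator_2": [0, np.nan, 1, 2],
        "annotator_3": [1, 1, 1, np.nan],
    })

    # Out-of-sample predicted class probabilities from any classifier
    # you trained on an initial guess at the consensus labels
    # (shape: num_examples x num_classes).
    pred_probs = np.array([
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
    ])

    results = get_label_quality_multiannotator(labels_multiannotator, pred_probs)

    # (1) consensus label per example and (2) its confidence score:
    print(results["label_quality"][["consensus_label", "consensus_quality_score"]])

    # (3) a quality rating for each annotator:
    print(results["annotator_stats"])

In practice you'd use pred_probs from cross-validation on a real dataset (this toy 4-example input is just to show the shapes); the model's probabilities and the annotator votes are what CROWDLAB combines into the three estimates above.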