Crowdlab: Effective algorithms to handle data labeled by multiple annotators

2 points by anishathalye over 2 years ago

1 comment

anishathalye over 2 years ago
Many real-world datasets use multiple annotations per example to ensure higher-quality labels. CROWDLAB is a new set of algorithms that estimate 3 key quantities better than prior standard crowdsourcing algorithms like GLAD and Dawid-Skene: (1) a consensus label per example, (2) a confidence score for the correctness of the consensus label, and (3) a rating for each annotator.

The blog post gives some intuition for how it works, along with some benchmarking results, and the math and the nitty-gritty details can be found in this paper: https://cleanlab.github.io/multiannotator-benchmarks/paper.pdf

Happy to answer any questions related to multi-annotator datasets or data-centric approaches to ML in general here.
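
To make the three quantities concrete, here is a rough sketch that computes them with plain majority vote. This is only a simple baseline for illustration, not CROWDLAB itself (see the paper for the actual method). The toy `labels` data and the `majority_vote_summary` helper are made up for this example.

```python
from collections import Counter

# Hypothetical toy data: labels[i][a] is annotator a's label for example i,
# or None if that annotator did not label the example.
labels = [
    [0, 0, 1],
    [1, 1, None],
    [0, None, 0],
    [1, 1, 0],
]


def majority_vote_summary(labels):
    """Majority-vote baseline (NOT CROWDLAB) for the three quantities."""
    consensus, confidence = [], []
    for row in labels:
        given = [y for y in row if y is not None]
        counts = Counter(given)
        # Highest vote count wins; ties go to the smallest label so the
        # result is deterministic.
        label, votes = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        consensus.append(label)               # (1) consensus label
        confidence.append(votes / len(given)) # (2) fraction of votes agreeing

    ratings = []
    for a in range(len(labels[0])):
        # (3) annotator rating: agreement rate with the consensus label,
        # over the examples this annotator actually labeled.
        pairs = [(row[a], c) for row, c in zip(labels, consensus)
                 if row[a] is not None]
        ratings.append(sum(y == c for y, c in pairs) / len(pairs)
                       if pairs else None)
    return consensus, confidence, ratings


consensus, confidence, ratings = majority_vote_summary(labels)
print(consensus)   # one label per example
print(confidence)  # one score in [0, 1] per example
print(ratings)     # one score in [0, 1] per annotator
```

On this toy data the third annotator disagrees with the consensus on two of the three examples they labeled, so they get the lowest rating. A baseline like this treats every annotator as equally trustworthy when voting, which is the kind of limitation more careful methods try to address.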