Ask HN: Experience with weak labelling (e.g. Snorkel) for data annotation?

7 points by jordn over 3 years ago
I'm trying to learn from the experience of others about weak labelling.

Weak labelling: writing heuristic rules to approximately label a dataset, and using those labels as a form of 'weak' supervision for training a machine learning model. Snorkel.org were pioneers of this approach.

Would love to hear any real-world tales of trying it! What worked well and what didn't? How easy was it to get domain experts to write the rules? How did you mix ground-truth data with probabilistic labels? That sort of thing.

Context: we're building tools in this space.
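To make the definition concrete, here is a minimal plain-Python sketch (not Snorkel's API): each heuristic rule either votes for a class or abstains with -1, following Snorkel's convention, and the votes are combined into per-class vote shares (a crude probabilistic label) plus a hard majority label. The rules and the toy "is this a complaint?" task are hypothetical. Snorkel's own `LabelModel` instead learns how far to trust each rule, so plain voting here is only a simpler stand-in.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = -1   # a rule returns this when it has no opinion
NEGATIVE = 0
POSITIVE = 1

# Hypothetical heuristic rules for a toy "is this text a complaint?" task.
def lf_mentions_refund(text: str) -> int:
    return POSITIVE if "refund" in text.lower() else ABSTAIN

def lf_says_thanks(text: str) -> int:
    return NEGATIVE if "thank" in text.lower() else ABSTAIN

def lf_many_exclamations(text: str) -> int:
    return POSITIVE if text.count("!") >= 2 else ABSTAIN

LFS: List[Callable[[str], int]] = [lf_mentions_refund, lf_says_thanks, lf_many_exclamations]

def apply_lfs(texts: List[str], lfs: List[Callable[[str], int]]) -> List[List[int]]:
    """Label matrix: one row per example, one column per rule."""
    return [[lf(t) for lf in lfs] for t in texts]

def probabilistic_label(votes: List[int], n_classes: int = 2) -> Optional[List[float]]:
    """Share of non-abstaining votes per class, or None if every rule abstained."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    total = sum(counts.values())
    if total == 0:
        return None
    return [counts[c] / total for c in range(n_classes)]

def majority_label(votes: List[int], n_classes: int = 2) -> int:
    """Hard label by majority vote; abstain on ties or when no rule fired."""
    probs = probabilistic_label(votes, n_classes)
    if probs is None:
        return ABSTAIN
    best = max(probs)
    winners = [c for c, p in enumerate(probs) if p == best]
    return winners[0] if len(winners) == 1 else ABSTAIN

if __name__ == "__main__":
    texts = ["I want a refund!!", "Thanks, all good", "Where is my refund? Thank you", "Hello"]
    for text, votes in zip(texts, apply_lfs(texts, LFS)):
        print(text, votes, probabilistic_label(votes), majority_label(votes))
```

For example, `majority_label([1, 0, -1])` abstains because the two firing rules disagree, and an example on which every rule abstains gets no label at all.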

1 comment

kingcai over 3 years ago
I've used Snorkel quite a bit at work, usually combined with transformer models.

It has worked quite well for us. The public Snorkel package is a bit out of date now, as I think they're building a SaaS solution and focusing more on that. But aside from that, it's quite easy to use. Another downside is that lots of cool ideas are present in the papers but not fully implemented (not complaining though!). Also, thinking of a diverse set of heuristics can be hard.

We use Snorkel a lot for bootstrapping text classifiers. Our classification models don't require much domain expertise, as it's pretty easy to tell whether a text sample is classified correctly, so the main benefits are avoiding labeling costs and faster prototyping. We find that embedding similarity usually works well as a heuristic. I wrote up a little bit about this approach here if you're curious: https://cultivate.com/why-cultivate-uses-embeddings-for-rapid-prototyping/

Happy to answer any additional questions you have too :)
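The comment doesn't spell out how embedding similarity is turned into a heuristic, so what follows is only one plausible reading, sketched in plain Python: embed a few hand-picked seed examples for a class, then let a rule vote for that class whenever a new example's cosine similarity to some seed clears a threshold, and abstain otherwise. The function names, the 0.8 threshold and the toy 3-d vectors are assumptions; real vectors would come from a sentence-embedding model, which is not shown here.

```python
import math
from typing import Callable, List, Sequence

ABSTAIN = -1

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def make_similarity_lf(
    seed_embeddings: List[Sequence[float]], label: int, threshold: float = 0.8
) -> Callable[[Sequence[float]], int]:
    """Hypothetical helper: build a rule that votes `label` when an example's
    embedding is close to any seed example of that class, else abstains."""
    def lf(embedding: Sequence[float]) -> int:
        best = max(cosine(embedding, seed) for seed in seed_embeddings)
        return label if best >= threshold else ABSTAIN
    return lf

if __name__ == "__main__":
    # Toy 3-d vectors standing in for real sentence embeddings.
    lf_positive = make_similarity_lf([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]], label=1)
    print(lf_positive([0.95, 0.05, 0.0]))  # close to a seed -> 1
    print(lf_positive([0.0, 0.0, 1.0]))    # far from every seed -> -1
```

Because each such rule needs only a handful of seed examples per class, building several of them is cheap, and their votes can be combined like any other heuristic rule's.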