Tech Echo (科技回声)

A tech news platform built with Next.js, featuring tech news and discussions from around the world.


© 2025 Tech Echo. All rights reserved.

Ask HN: Experience with weak labelling (e.g. Snorkel) for data annotation?

7 points by jordn, over 3 years ago
I'm trying to learn from the experience of others about weak labelling.

Weak labelling: writing heuristic rules to approximately label a dataset, and using those as a form of 'weak' supervision for training a machine learning model. Snorkel.org were pioneers of this approach.

Would love to hear any real-world tales of trying it! What worked well and what didn't? How easy was it to get domain experts to write the rules? How did you mix ground-truth data with probabilistic labels? That sort of thing.

Context: we're building tools in this space.
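To make the definition above concrete, here is a minimal, stdlib-only sketch under stated assumptions: a toy "is this text a complaint?" task, three hypothetical keyword rules (`lf_refund`, `lf_thanks`, `lf_broken`), and a plain vote fraction as the probabilistic label. This is not Snorkel's API. Snorkel's `LabelModel` instead estimates how accurate each rule is rather than weighting every vote equally. Gold labels, where available, simply override the weak label, which is one simple answer to the "mixing" question, not the only one.

```python
from typing import Callable, Dict, List, Optional

# Label conventions assumed for this sketch: -1 means "this rule abstains".
ABSTAIN, NEG, POS = -1, 0, 1

# Hypothetical heuristic rules ("labeling functions"); each returns POS, NEG, or ABSTAIN.
def lf_refund(text: str) -> int:
    return POS if "refund" in text.lower() else ABSTAIN

def lf_thanks(text: str) -> int:
    return NEG if "thank" in text.lower() else ABSTAIN

def lf_broken(text: str) -> int:
    return POS if "broken" in text.lower() else ABSTAIN

LFS: List[Callable[[str], int]] = [lf_refund, lf_thanks, lf_broken]

def soft_label(text: str) -> Optional[float]:
    """Fraction of non-abstaining rules that vote POS; None if every rule abstains."""
    votes = [v for v in (lf(text) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return None
    return sum(v == POS for v in votes) / len(votes)

def build_training_labels(texts: List[str], gold: Dict[int, int]) -> List[Optional[float]]:
    """Use a gold label where one exists (by index), otherwise the weak soft label."""
    labels: List[Optional[float]] = []
    for i, text in enumerate(texts):
        if i in gold:
            labels.append(float(gold[i]))  # ground truth overrides the heuristics
        else:
            labels.append(soft_label(text))
    return labels

texts = [
    "I want a refund, the item arrived broken",
    "Thank you, works great",
    "Thanks, but I still want a refund",
    "What time do you open?",
    "Where is my order?",
]
print(build_training_labels(texts, gold={3: 0}))  # [1.0, 0.0, 0.5, 0.0, None]
```

Examples that end up as `None` (no rule fired, no gold label) would typically be dropped before training, and the remaining soft labels can be fed to a model that accepts probabilistic targets.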

1 comment

kingcai, over 3 years ago
I've used Snorkel quite a bit at work, usually combined with transformer models.

It has worked quite well for us. The public Snorkel package is a bit out of date now, as I think they're building a SaaS solution and focusing more on that. But aside from that, it's quite easy to use. Another downside is that lots of cool ideas are present in the papers but not fully implemented (not complaining, though!). Also, thinking of a diverse set of heuristics can be hard.

We use Snorkel a lot for bootstrapping text classifiers. Our classification models don't require much domain expertise, as it's pretty easy to tell whether a text sample is classified correctly, so the main advantages are just avoiding labeling costs and faster prototyping. We find that we can usually use embedding similarity as a good heuristic. I wrote up a little bit about this approach here if you're curious: https://cultivate.com/why-cultivate-uses-embeddings-for-rapid-prototyping/

Happy to answer any additional questions you have too :)
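One way to read "embedding similarity as a heuristic" is as a labeling function that votes for a class when an example's embedding is close to a few hand-picked seed examples of that class, and abstains otherwise. The sketch below assumes that interpretation; it is not taken from the linked write-up. The helper `make_similarity_lf`, the 0.8 threshold, and the toy 3-dimensional vectors are all hypothetical stand-ins; a real pipeline would compute embeddings with a sentence-embedding model.

```python
import math
from typing import Callable, List, Sequence

ABSTAIN, NEG, POS = -1, 0, 1  # same assumed conventions: -1 means abstain

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def make_similarity_lf(
    seeds: List[Sequence[float]], label: int, threshold: float
) -> Callable[[Sequence[float]], int]:
    """Hypothetical helper: vote `label` when the input is close to any seed embedding."""
    if not seeds:
        raise ValueError("need at least one seed embedding")

    def lf(embedding: Sequence[float]) -> int:
        best = max(cosine(embedding, s) for s in seeds)  # nearest seed wins
        return label if best >= threshold else ABSTAIN

    return lf

# Toy vectors standing in for real sentence embeddings.
lf_pos = make_similarity_lf([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]], POS, threshold=0.8)
print(lf_pos([0.95, 0.05, 0.0]))  # 1 (close to a positive seed)
print(lf_pos([0.0, 1.0, 0.0]))    # -1 (abstains)
```

Abstaining below the threshold, rather than voting the opposite class, keeps each rule high-precision and leaves coverage to other rules, which matters when the rules' votes are later combined.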