Show HN: Autolabel, a Python library to label and enrich text data with LLMs

153 points by nihit-desai almost 2 years ago

Hi HN! I'm excited to share Autolabel, an open-source Python library to label and enrich text datasets with any Large Language Model (LLM) of your choice.

We built Autolabel because access to clean, labeled data is a huge bottleneck for most ML/data science teams. The most capable LLMs are able to label data with high accuracy, and at a fraction of the cost and time compared to manual labeling. With Autolabel, you can leverage LLMs to label any text dataset with <5 lines of code.

We're eager for your feedback!

7 comments

bomewish almost 2 years ago
What can this do that the new ‘calling functions’ feature can’t? It seems to be roughly the same thing?
devjab almost 2 years ago
This is very interesting to me. We spent a significant amount of time "labelling" data when I worked in public sector digitalisation. Basically, the LLM part was done manually, with engines like this on top of it. Having used ChatGPT to write JSDoc documentation for a while now, and been very impressed with how good it is when it understands your code through good use of naming conventions, I'm fairly certain it'll be the future of "librarian" styled labelling of case files.

But the key issue is going to be privacy. I'm not big on LLMs, so I'm sorry if this is obvious, but can I use something like this without sending my data outside my own organisation?
viswajithiii almost 2 years ago
Thank you for open sourcing this! This seems very useful, especially because of the confidence estimation, which lets you use LLMs for the points they can do well and fall back to manual labelling for the rest.
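
A minimal sketch of that fallback idea, assuming the model returns a log probability for each token of the label it generated. This is not Autolabel's API or its actual scoring method: `label_confidence`, `route`, the geometric-mean scoring rule, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import math


def label_confidence(token_logprobs):
    """Score a label as the geometric mean of its token probabilities.

    Assumed scoring rule for illustration only; returns 0.0 when no
    log probabilities are available.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(items, threshold=0.8):
    """Split (text, label, token_logprobs) items into auto-accepted and manual-review lists."""
    auto, manual = [], []
    for text, label, logprobs in items:
        conf = label_confidence(logprobs)
        target = auto if conf >= threshold else manual
        target.append((text, label, conf))
    return auto, manual


# Hypothetical model outputs: (input text, predicted label, per-token log probabilities)
examples = [
    ("great product", "positive", [-0.01, -0.02]),
    ("it arrived", "neutral", [-0.9, -1.2]),
]

auto, manual = route(examples)
print("auto-labelled:", [(text, label) for text, label, _ in auto])
print("needs manual review:", [(text, label) for text, label, _ in manual])
```

Raising the threshold sends more rows to manual review in exchange for fewer wrong automatic labels, so in practice it would be tuned against a small hand-labelled sample.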
msp26 almost 2 years ago
> Refuel provides LLMs that can compute confidence scores for every label, if the LLM you've chosen doesn't provide token-level log probabilities.

How does this work exactly?
isawczuk almost 2 years ago
You should carefully read OpenAI's terms and conditions before using it to build custom datasets.
applgo443 almost 2 years ago
How do the confidence scores work?
voz_ almost 2 years ago
You just posted this here: https://news.ycombinator.com/item?id=36384015

It's one thing to Show HN / share, it's another thing to spam it with your ads.