Hi HN! I'm excited to share Autolabel, an open-source Python library to label and enrich text datasets with any Large Language Model (LLM) of your choice.

We built Autolabel because access to clean, labeled data is a huge bottleneck for most ML/data science teams. The most capable LLMs are able to label data with high accuracy, and at a fraction of the cost and time compared to manual labeling. With Autolabel, you can leverage LLMs to label any text dataset with <5 lines of code.

We’re eager for your feedback!
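For anyone curious what those few lines look like in practice, here is a rough sketch based on my reading of the project's README. The exact class names, config keys, and method signatures are assumptions on my part and may differ from the current release:

    # Sketch of labeling a CSV of support tickets with Autolabel.
    # Config keys and method signatures are assumed from the README
    # and may differ in the current release.
    from autolabel import LabelingAgent

    config = {
        "task_name": "TicketClassification",   # hypothetical task name
        "task_type": "classification",
        "model": {"provider": "openai", "name": "gpt-3.5-turbo"},
        "prompt": {
            "task_guidelines": "Classify each support ticket into one of the provided labels.",
            "labels": ["billing", "bug", "feature_request"],
            "example_template": "Ticket: {text}\nLabel: {label}",
        },
    }

    agent = LabelingAgent(config)
    agent.plan("tickets.csv")  # dry run: preview prompts and estimate cost
    agent.run("tickets.csv")   # label the dataset and write the results

The plan-before-run split is worth noting: it lets you sanity-check the generated prompts and the expected API cost before spending anything on labeling.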
This is very interesting to me. We spent a significant amount of time “labelling” data when I worked in public-sector digitalisation. Essentially, we did the LLM part manually and then ran engines like this on top of it. Having used ChatGPT to write JSDoc documentation for a while now, and having been very impressed with how well it understands code that follows good naming conventions, I’m fairly certain this will be the future of “librarian”-style labelling of case files.

But the key issue is going to be privacy. I’m not big on LLMs, so I’m sorry if this is obvious, but can I use something like this without sending my data outside my own organisation?
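Not affiliated with the project, but on the privacy point: since the model is configurable, it looks like you could point it at a model hosted on your own hardware so nothing leaves your network. A hedged sketch; the provider name and config keys below are my assumptions from the docs:

    # Hypothetical config pointing Autolabel at a locally run
    # Hugging Face model, so no document text leaves the organisation.
    # Provider name and config keys are assumptions and may differ.
    config = {
        "task_name": "CaseFileTagging",
        "task_type": "classification",
        "model": {
            "provider": "huggingface_pipeline",   # runs on your own machines
            "name": "google/flan-t5-large",       # any locally downloadable model
        },
        "prompt": {
            "task_guidelines": "Assign each case file the most relevant archive category.",
            "labels": ["personnel", "procurement", "citizen_inquiry"],
            "example_template": "Document: {text}\nCategory: {label}",
        },
    }

With a setup like that, the trade-off is label quality: smaller self-hosted models generally lag the largest hosted ones.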
Thank you for open sourcing this! This seems very useful, especially because of the confidence estimation, which lets you use LLMs for the data points they handle well and fall back to manual labelling for the rest.
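As an illustration of that fallback workflow, here is a small post-processing sketch. The output column names ("label", "confidence") are assumptions about the labeled file's format:

    import pandas as pd

    # Hypothetical post-processing: accept LLM labels above a confidence
    # threshold and queue the rest for manual review.
    df = pd.read_csv("tickets_labeled.csv")

    THRESHOLD = 0.9
    accepted = df[df["confidence"] >= THRESHOLD]
    needs_review = df[df["confidence"] < THRESHOLD]

    accepted.to_csv("accepted_labels.csv", index=False)
    needs_review.to_csv("manual_review_queue.csv", index=False)
    print(f"Auto-accepted {len(accepted)} rows; "
          f"{len(needs_review)} rows sent to manual review")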
> Refuel provides LLMs that can compute confidence scores for every label, if the LLM you've chosen doesn't provide token-level log probabilities.

How does this work exactly?
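For context on the case the quote contrasts with: when a provider does expose token-level log probabilities, a common baseline is to convert the log probs of the generated label tokens into an average probability and use that as the confidence. A sketch of that baseline only, not of Refuel's own method:

    import math

    def confidence_from_logprobs(token_logprobs):
        """Average per-token probability of a generated label.

        A common baseline when token-level log probabilities are
        available; not a description of Refuel's approach.
        """
        if not token_logprobs:
            return 0.0
        probs = [math.exp(lp) for lp in token_logprobs]
        return sum(probs) / len(probs)

    # Log probs for the tokens of a label like "feature_request"
    print(confidence_from_logprobs([-0.05, -0.20, -0.10]))  # ~0.89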
You just posted this here: https://news.ycombinator.com/item?id=36384015

It's one thing to do a Show HN / share; it's another thing to spam the site with your ads.