LLMs can label data as well as human annotators, but 20 times faster

55 points by nihit-desai almost 2 years ago

7 comments

rossdavidh almost 2 years ago
AHAHAHAHAHA! There is approximately a 0% chance that the big companies paying for data annotation will be far-sighted enough to avoid LLM-automated labeling of their data, for several reasons:

1) It will work well, at first, and only become low-quality after they (and their budgets) have become accustomed to paying 1/20th as much for the service.

2) Even if they pay for "human" labeling, they will go for the low-cost bid, in a far-away country, which will subcontract to an LLM service without telling them.

3) "Hey, we should pay more for this input, in order to avoid not-yet-seen quality problems in the future" has practically never won an argument in any large corporation ever. I won't say absolutely 0 times, but pretty close.

Long story short, the use of LLMs by Big Tech may be doomed. Much like how "SEO optimization" turns quickly into clickbait and link farms absent a high-urgency, high-priority effort to combat it, LLMs (and other trendy forms of AI that require lots of labeled input) will quickly turn sour and produce even less impressive results than they already do.

The current wave of "AI" hype looks set to succeed about as well as IBM Watson.
poomer almost 2 years ago
At work we were facing this dilemma. Our team is working on a model to detect fraud/scam messages; in production it needs to label ~500k messages a day at low cost. We wanted to train a basic GBT/BERT model to run locally, but we considered using GPT-4 as a label source instead of our usual human labelers.

For us, human labeling is surprisingly cheap; the main advantage of GPT-4 would be that it is much faster. Since scams are always changing, we could generate new labels regularly and continuously retrain our model.

In the end we didn't go down that route. There were several problems:

- GPT-4's accuracy wasn't as good as the human labelers'. I believe this is because scam messages are intentionally tricky and require a much more general understanding of the world than the datasets used in this article, which feature simpler labeling problems. Also, I don't trust that there was no funny business going on in generating the results for this blog, since there is a clear conflict of interest with the business that owns it.

- GPT-4 would be consistently fooled by certain types of scams, whereas human annotators work off a consensus procedure. This could probably be solved in the future when there's a larger pool of high-quality LLMs available and we can pool them for consensus (see the sketch after this comment).

- Concern that some PII accidentally gets sent to OpenAI; of course, nobody trusts that those guys will treat our customers' data with any appropriate level of ethics.
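A minimal sketch of the consensus pooling this comment speculates about: several LLMs vote on each message, and a label is accepted only when enough of them agree, mirroring the human consensus procedure. `ask_model` is a hypothetical stand-in for whatever client call each provider actually exposes, and the 75% agreement threshold is an arbitrary assumption.

```python
from collections import Counter
from typing import Optional

def ask_model(model: str, message: str) -> str:
    """Hypothetical placeholder: return 'scam' or 'ham' for `message`
    from the named provider. Replace with a real API client call."""
    raise NotImplementedError

def consensus_label(message: str, models: list,
                    min_agreement: float = 0.75) -> Optional[str]:
    # Collect one vote per model, then keep the majority label
    # only if it clears the agreement threshold.
    votes = Counter(ask_model(m, message) for m in models)
    label, count = votes.most_common(1)[0]
    if count / len(models) >= min_agreement:
        return label
    return None  # no consensus: route this message to a human annotator

# Usage sketch: anything that comes back None falls through to humans.
# labels = [consensus_label(msg, ["model-a", "model-b", "model-c", "model-d"])
#           for msg in batch]
```

The threshold trades label volume against label quality: lowering it keeps more machine labels but admits more of the cases where several models are fooled the same way.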
orangepurple almost 2 years ago
This will probably work as long as the material being annotated is similar to the material the LLM was trained on. When it encounters novel data (where the value is), it will likely perform poorly.
dimatura almost 2 years ago
I don't have experience with text/NLP problems, but some degree of automation/assistance in labeling is a fairly common practice in computer vision. If you have a task where the ML model gets you 90% of the way there, you can use that as a starting point and have a human fix the remaining 10%. (Of course, this should be done in a way that makes the overall effort lower than labeling from scratch, which is partially a UI problem.) If your model is so good that it completely outperforms humans (at least for now, before data drift kicks in), then that's a good problem to have, assuming your model evaluation is sane.
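One common way to implement the pre-label-then-correct workflow this comment describes is confidence-based routing: accept the model's prediction as a draft label when its confidence clears a threshold, and queue everything else for a human pass. This is a minimal sketch, assuming a classifier with a scikit-learn-style `predict_proba`; the 0.9 threshold is an arbitrary assumption to be tuned per task.

```python
def route_for_review(model, X, threshold: float = 0.9):
    """Split a batch into machine-draft labels and items needing human review."""
    proba = model.predict_proba(X)        # shape: (n_samples, n_classes)
    confidence = proba.max(axis=1)        # highest class probability per item
    draft_labels = proba.argmax(axis=1)   # model's best guess per item
    needs_human = confidence < threshold  # low-confidence items get a human pass
    return draft_labels, needs_human
```

Items flagged `needs_human` go to annotators pre-filled with the model's draft, so the human only corrects mistakes instead of labeling from scratch — which is what keeps the total effort below full manual annotation.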
voz_ almost 2 years ago
If an LLM labels it, does that have the same value? Isn't it just fancy regurgitation of knowns?
morelisp almost 2 years ago
How was ground truth obtained if not via human annotation?
coldtea almost 2 years ago
Only 20?