GPT-4 Outperforms Elite Crowdworkers, Saving Researchers $500k and 20k hours

147 points by mztwo about 2 years ago

18 comments

troops_h8r about 2 years ago
I don't think I see enough discussion about what this means for privacy. There was some protection in the fact that it was prohibitively expensive to get someone to listen to every single one of our phone calls, read all our emails, etc.

Worrying that this will no longer be the case.
rossdavidh about 2 years ago
So, uh, GPT-4 outperforms at labeling. What is that labeling used for?

"Employing Surge AI's top-tier human annotators at a rate of $25 per hour would have cost $500,000 for 20,000 hours of work, an excessive amount to invest in the research endeavor. Surge AI is a venture-backed startup that performs the human labeling for numerous AI companies including OpenAI, Meta, and Anthropic."

What could go wrong? Using GPT-4 to perform labeling used by OpenAI in order to train... uh, wait.
courseofaction about 2 years ago
We need new political arrangements to distribute the gains of AI or things are going to get very bad very quickly.
876978095789789 about 2 years ago
Great to see this tech and the money invested in it being used to take low-paying jobs away from people with limited options, instead of something like drug discovery or cancer biology.
AndreLock about 2 years ago
Interesting to see what the impact will be on crowdsourcing annotation companies like Scale AI, especially after reading this article: https://www.forbes.com/sites/kenrickcai/2023/04/11/how-alexandr-wang-turned-an-army-of-clickworkers-into-a-73-billion-ai-unicorn/
hnaouesteuho about 2 years ago
From reading the paper, GPT-4 also outperformed the researchers themselves in many categories, despite the researchers being the ones who created the dataset used for the comparison.

In other words, the metrics are biased in the researchers' favor, so GPT-4 would have beaten them even more often (probably a majority of the time, based on the numbers) if someone else had created the guidelines and golden labels.
fatherzine about 2 years ago
This sounds awfully close to the bootstrap loop of singularity AGI.
og_kalu about 2 years ago
NLP is solved, more or less. Either way, bespoke NLP is on its way out. It's pretty funny how buried this is in the original paper.
mztwo about 2 years ago
Buried in an arXiv paper was this nugget. Thought I'd share!
shaky-carrousel about 2 years ago
Very interesting. Until the day OpenAI has a problem in their systems and the entire world grinds to a halt. Or they set outrageous new prices. Which apparently has never happened in other fields, it seems.
Workaccount2 about 2 years ago
So if AI can generate datasets better than its own datasets... well, that's pretty damn substantial.
ftxbro about 2 years ago
If you look at the table, the GPT-4 model has better correlation with the expert ensemble than the crowd does, but only on some criteria. The GPT-4 model is closer for all of the ethics questions, but the crowd is closer for the utility level and economic impact questions.
boringuser2 about 2 years ago
Does OpenAI even have the compute to begin to meet demand?
tpoacher about 2 years ago
When an AI "outperforms" the "ground truth", it is by definition "worse", not "better".

And if your ground truth is problematic, then this is generally a problem of specification and quality control, *not* performance.
two_in_one about 2 years ago
> This breakthrough saved the researchers over $500,000 and 20,000 hours of human labor.

BTW, this is interesting. There is a lot of noise about AI's carbon footprint. Now imagine how much humans would eat and fart over 20,000 work hours. That's about 10 man-years, assuming an 8h / 5d / 50-week schedule (2,000 hours per year).
g42gregory about 2 years ago
This is a really interesting result. An immediate and direct application of LLMs, with significant financial benefits. I think LLMs will drive a tremendous productivity increase.
m3kw9 about 2 years ago
"Employing Surge AI's top-tier human annotators at a rate of $25 per hour would have cost $500,000 for 20,000 hours of work". That's a wrap for Surge AI.
naveen99 about 2 years ago
What's an elite crowdworker? Top 1% sheep? Or just the usual clickbait oxymoron?