GPT-4 Outperforms Elite Crowdworkers, Saving Researchers $500k and 20k hours

147 points by mztwo, about 2 years ago

18 comments

troops_h8r, about 2 years ago
I don't think I see enough discussion about what this means for privacy. There was some protection in the fact that it was prohibitively expensive to get someone to listen to every single one of our phone calls/read all our emails/etc.

Worrying that this will no longer be the case.
rossdavidh, about 2 years ago
So, uh, GPT-4 outperforms at labeling. What is that labeling used for?

"Employing Surge AI's top-tier human annotators at a rate of $25 per hour would have cost $500,000 for 20,000 hours of work, an excessive amount to invest in the research endeavor. Surge AI is a venture-backed startup that performs the human labeling for numerous AI companies including OpenAI, Meta, and Anthropic."

What could go wrong? Using GPT-4 to perform labeling used by OpenAI in order to train...uh, wait.
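The cost figure in the quoted passage can be checked with a short calculation. This is only a sketch of the arithmetic: the rate and hours come from the quote, while the function and constant names are made up for illustration.

```python
# Figures taken from the quoted passage; the names are illustrative only.
HOURLY_RATE_USD = 25
ANNOTATION_HOURS = 20_000


def annotation_cost(hourly_rate: float, hours: float) -> float:
    """Return the total labor cost in dollars for a given rate and hour count."""
    return hourly_rate * hours


total = annotation_cost(HOURLY_RATE_USD, ANNOTATION_HOURS)
print(f"${total:,.0f}")  # prints $500,000
```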
courseofaction, about 2 years ago
We need new political arrangements to distribute the gains of AI or things are going to get very bad very quickly.
876978095789789, about 2 years ago
Great to see this tech and the money invested in it being used to take low-paying jobs away from people with limited options, instead of something like drug discovery or cancer biology.
AndreLock, about 2 years ago
Interesting to see what the impact will be on crowdsourcing annotation companies like Scale AI, especially after reading this article: https://www.forbes.com/sites/kenrickcai/2023/04/11/how-alexandr-wang-turned-an-army-of-clickworkers-into-a-73-billion-ai-unicorn/
hnaouesteuho, about 2 years ago
From reading the paper, GPT-4 also outperformed the researchers themselves in many categories, despite the researchers being the ones who created the dataset being used to perform the comparison.

In other words, the metrics are biased in the researchers’ favor, so GPT-4 would have beaten them even more often (probably a majority of the time based on the numbers) if someone else had created the guidelines and golden labels.
fatherzine, about 2 years ago
This sounds awfully close to the bootstrap loop of singularity AGI.
og_kalu, about 2 years ago
NLP is solved, more or less. Either way, bespoke NLP is on its way out. It's pretty funny how buried this is in the original paper.
mztwo, about 2 years ago
Buried in an arXiv paper was this nugget. Thought I'd share!
shaky-carrousel, about 2 years ago
Very interesting. Until the day OpenAI has a problem in their systems and the entire world grinds to a halt. Or they impose outrageous new prices. Which apparently never happened in other fields, it seems.
Workaccount2, about 2 years ago
So if AI can generate datasets better than its own datasets...well that's pretty damn substantial.
ftxbro, about 2 years ago
If you look at the table, the GPT-4 model has better correlation with the expert ensemble than the crowd does, but only on some criteria. The GPT-4 model is closer for all of the ethics questions, but the crowd is closer for the utility level and economic impact questions.
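The per-criterion comparison described above can be sketched as follows. This is a hedged illustration, not the paper's method: it assumes Pearson correlation (the paper may report a different measure), the helper names are hypothetical, and the scores are made-up placeholders rather than values from the paper's table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists (assumes nonzero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def closer_annotator(expert, model, crowd):
    """Return 'model' if the model correlates more strongly with the expert
    scores than the crowd does, otherwise 'crowd' (ties go to 'crowd')."""
    return "model" if pearson(expert, model) > pearson(expert, crowd) else "crowd"


# Placeholder scores for a single criterion; NOT data from the paper.
expert = [1, 2, 3, 4, 5]
model = [1, 2, 3, 4, 6]
crowd = [2, 1, 4, 3, 5]

print(closer_annotator(expert, model, crowd))
```

Running the same comparison once per criterion would show the kind of split the comment describes, with the model closer on some criteria and the crowd closer on others.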
boringuser2, about 2 years ago
Does OpenAI even have the compute to begin to meet demand?
tpoacher, about 2 years ago
When an AI "outperforms" the "ground truth", it is by definition "worse", not "better".

And if your ground truth is problematic, then this is generally a problem of specification and quality control, *not* performance.
two_in_one, about 2 years ago
> This breakthrough saved the researchers over $500,000 and 20,000 hours of human labor.

BTW, this is interesting. There is a lot of noise about the AI carbon footprint. Now imagine how much humans would eat and fart over 20,000 work hours. That's about 10 man-years, assuming an 8 hours/day, 5 days/week, 50 weeks/year schedule.
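The conversion from hours to working years in this comment can be reproduced with a few lines. This is a minimal sketch using the schedule the commenter assumes; the function and constant names are illustrative.

```python
# Work schedule assumed in the comment: 8 hours/day, 5 days/week, 50 weeks/year.
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 50


def person_years(total_hours: float) -> float:
    """Convert a number of work hours into full-time working years."""
    hours_per_year = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR  # 2,000 hours
    return total_hours / hours_per_year


print(person_years(20_000))  # prints 10.0
```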
g42gregory, about 2 years ago
This is a really interesting result. An immediate and direct application of LLMs, with significant financial benefits. I think LLMs will drive a tremendous productivity increase.
m3kw9, about 2 years ago
"Employing Surge AI's top-tier human annotators at a rate of $25 per hour would have cost $500,000 for 20,000 hours of work". That's a wrap for Surge AI.
naveen99, about 2 years ago
What's an elite crowdworker? Top 1% sheep? Or just the usual clickbait oxymoron?