I don't see enough discussion about what this means for privacy. There was some protection in the fact that it was prohibitively expensive to get someone to listen to every single one of our phone calls/read all our emails/etc.<p>Worrying that this will no longer be the case.
So, uh, GPT-4 outperforms at labeling. What is that labeling used for?<p>"Employing Surge AI's top-tier human annotators at a rate of $25 per hour would have cost $500,000 for 20,000 hours of work, an excessive amount to invest in the research endeavor. Surge AI is a venture-backed startup that performs the human labeling for numerous AI companies including OpenAI, Meta, and Anthropic."<p>What could go wrong? Using GPT-4 to perform labeling used by OpenAI in order to train...uh, wait.
Great to see this tech and the money invested in it being used to take low-paying jobs away from people with limited options, instead of something like drug discovery or cancer biology.
Interesting to see what the impact will be on crowdsourcing annotation companies like Scale AI, especially after reading this article: <a href="https://www.forbes.com/sites/kenrickcai/2023/04/11/how-alexandr-wang-turned-an-army-of-clickworkers-into-a-73-billion-ai-unicorn/" rel="nofollow">https://www.forbes.com/sites/kenrickcai/2023/04/11/how-alexa...</a>
From reading the paper, GPT-4 also outperformed the researchers themselves in many categories, despite the researchers being the ones who created the dataset used to perform the comparison.<p>In other words, the metrics are biased in the researchers’ favor — so GPT-4 would have beaten them even more often (probably a majority of the time, based on the numbers) if someone else had created the guidelines and golden labels.
Very interesting. Until the day OpenAI has a problem in their systems and the entire world grinds to a halt. Or they set outrageous new prices. Which, apparently, has never happened in any other field.
If you look at the table, the GPT-4 model correlates better with the expert ensemble than the crowd does, but only on some criteria: GPT-4 is closer for all of the ethics questions, while the crowd is closer for the utility-level and economic-impact questions.
When an AI "outperforms" the "ground truth", it is by definition "worse", not "better".<p>And if your ground truth is problematic, then this is generally a problem of specification and quality control, <i>not</i> performance.
>This breakthrough saved the researchers over $500,000 and 20,000 hours of human labor.<p>BTW, this is interesting. There is a lot of noise about AI's carbon footprint. Now imagine how much humans would eat and fart over 20,000 work hours. That's about 10 person-years, assuming an 8h/day, 5d/week, 50-week schedule.
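A quick sanity check on those numbers (the cost and rate are from the article; the 8h/5d/50-week schedule is an assumption):

```python
# Back-of-the-envelope check of the $500,000 / 20,000-hour claim.
hours = 20_000
rate = 25  # USD per hour, the quoted Surge AI annotator rate
cost = hours * rate
print(cost)  # 500000 -- matches the article's figure

# Convert hours to person-years under an assumed full-time schedule.
hours_per_year = 8 * 5 * 50  # 8h/day, 5d/week, 50 weeks = 2000 h/year
person_years = hours / hours_per_year
print(person_years)  # 10.0
```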
This is a really interesting result. An immediate and direct application of LLMs, with significant financial benefits. I think LLMs will drive tremendous productivity increases.
“Employing Surge AI's top-tier human annotators at a rate of $25 per hour would have cost $500,000 for 20,000 hours of work.” That’s a wrap for Surge AI.