New AI Training Technique Is Drastically Faster, Says Google

84 points by moondistance 11 months ago

6 comments

vessenes 11 months ago
So the paper itself is pretty significant, I think, from looking at it. The general methodology seems to be: train a small model as a discriminatory scoring model on very high quality data (JEST is mostly concerned with multi-modal tasks, it seems, so think image/text caption pairs), have that model score ‘maximally learnable’ batches from a larger / lower quality dataset, then train the big model using the scoring.

This turns out to be a significant FLOPs and quality win, even accounting for the initial model training and scoring: they claim roughly 10x better quality/FLOP tradeoffs, and they show numbers significantly beating SOTA on some tasks at their model size.

The bad part, to me, is that this is some significant engineering: it requires known high quality datasets, training of the scoring model, and selection and scoring of the data for the big training run. This is not a bold new leap that’s going to be easy for hobbyists to implement; this is a practitioner’s excellent engineering showing the way forward for certain training needs.

As always, I appreciate the publishing from DeepMind; this looks like great work. It would be nice to see a company like together.ai or others get it actionized into a pipeline, though it might be a bit. It looks relatively gnarly in the details on the data and scoring side.
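To make that concrete, here is a rough sketch of learnability-based batch filtering. It is purely illustrative and not the paper's implementation: the loss-difference score, the function name, and keep_ratio are assumptions, and JEST actually scores multimodal sub-batches jointly rather than toy per-example losses like these.

```python
import torch

# Hypothetical sketch of JEST-style "learnability" selection: a small reference
# model trained on curated data helps pick the most learnable examples out of a
# large, noisier candidate pool before the big model trains on them.

def select_learnable(learner_losses: torch.Tensor,
                     reference_losses: torch.Tensor,
                     keep_ratio: float = 0.1) -> torch.Tensor:
    """Return indices of examples the learner still finds hard (high loss)
    but the curated reference model finds easy (low loss)."""
    scores = learner_losses - reference_losses        # "learnability" score
    k = max(1, int(keep_ratio * scores.numel()))
    return torch.topk(scores, k).indices

# Toy usage: 8 candidate examples, keep the top 25% most learnable.
learner_losses = torch.tensor([2.1, 0.3, 1.8, 0.9, 3.0, 0.2, 1.1, 2.5])
reference_losses = torch.tensor([0.4, 0.2, 0.5, 0.8, 0.6, 0.1, 1.0, 0.7])
print(select_learnable(learner_losses, reference_losses, keep_ratio=0.25))
```

In the toy run, the examples that are easy for the curated reference model but still hard for the learner are the ones that get selected for the big training step.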
morbicer 11 months ago
Nice. Google scientists come up with a ground-breaking idea, then Google's PM bungles the chance to bring it to market and productize it, and someone like OpenAI or Anthropic will swoop in to reap the rewards. And the cycle repeats.

DeepMind people invent transformers, and then they watch people laugh at Bard, or whatever it's called nowadays, because product and engineering lost the plot. Kodak is paging you a message from the grave; read it, Google.
eutropia 11 months ago
https://arxiv.org/pdf/2406.17711 - link to the paper
kelseyfrog 11 months ago
Great, improvements in efficiency will lead to greater resource consumption due to the Jevons Paradox [1].

1. https://en.wikipedia.org/wiki/Jevons_paradox
ricopags 11 months ago
Pretty similar to Cappy: https://arxiv.org/abs/2311.06720
swax 11 months ago
AI advancement is coming at us both ways - orders of magnitude more compute, with orders of magnitude more efficiency. Hyper exponential.