So the paper itself is pretty significant, I think, from looking at it. The general methodology seems to be: train a small model as a discriminative scoring model on very high-quality data (JEST seems mostly concerned with multi-modal tasks, so think image/text caption pairs), have that model score ‘maximally learnable’ batches drawn from a larger, lower-quality dataset, then train the big model on the selected batches (rough sketch of the idea at the end of this comment).

This turns out to be a significant FLOPs and quality win. Even accounting for the initial scoring-model training and the scoring itself, they claim roughly 10x on the quality/FLOP tradeoff, and they show numbers that significantly beat SOTA on some tasks at their model size.

The bad part, to me, is that this is some significant engineering: it requires known high-quality datasets, training of the scoring model, and selection and scoring of the data for the big training run. This is not a bold new leap that's going to be easy for hobbyists to implement; it's a practitioner's excellent engineering showing the way forward for certain training needs.

As always, I appreciate the publishing from DeepMind; this looks like great work. It would be nice to see a company like together.ai or others productize it into a pipeline. It might be a while, though: it looks relatively gnarly in the details on the data and scoring side.
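For the curious, here's a minimal sketch of the batch-scoring idea as I understand it. The function names, the keep fraction, and the exact learnability score (learner loss minus reference/scorer loss) are my assumptions for illustration, not the paper's actual code:

    import numpy as np

    def learnability_scores(learner_loss, reference_loss):
        # My reading of 'maximally learnable': high loss under the big
        # learner model but low loss under the small scoring model that
        # was trained on curated, high-quality data. The paper's exact
        # scoring may differ.
        return learner_loss - reference_loss

    def select_batch(learner_loss, reference_loss, keep_frac=0.1):
        # Score a large pool of candidate examples and keep the top
        # fraction as the actual training batch. keep_frac is a
        # made-up knob, not a number from the paper.
        scores = learnability_scores(learner_loss, reference_loss)
        k = max(1, int(len(scores) * keep_frac))
        return np.argsort(scores)[-k:]  # indices of top-k scores

    # Toy usage: fake per-example losses for a pool of 1000 pairs.
    rng = np.random.default_rng(0)
    learner = rng.uniform(0.5, 3.0, size=1000)    # big model on raw data
    scorer = rng.uniform(0.1, 2.0, size=1000)     # small curated scorer
    chosen = select_batch(learner, scorer, keep_frac=0.1)
    print(f"selected {len(chosen)} of 1000 candidates for this step")

As I read it, the actual method selects examples jointly per batch (it's joint example selection, hence the name) rather than scoring them independently like this, which is presumably where a lot of the gnarly detail lives.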