
Pre-Training GPT-4.5 [video]

4 points by waynenilsen about 1 month ago

1 comment

waynenilsen about 1 month ago
The only reason that I'm sharing this is because there is a gem at the end. From the transcript (starting around 44:26):

"...its responses. But it's incredible, it is incredible. Related to that, and sort of a last question: in some sense this whole effort, which was hugely expensive in terms of people and time and dollars and everything else, was an experiment to further validate that the scaling laws keep going, and why. And it turns out they do, and they probably keep going for a long time. I accept scaling laws like I accept quantum mechanics or something, but I still don't know why. Why should that be a property of the universe? Why are scaling laws a property of the universe?"

"I can take a stab. The fact that more compression will lead to more intelligence has this very strong philosophical grounding. So the question is why training bigger models for longer gives you more compression, and there are a lot of theories here. The one I like is that the relevant concepts are sparse in the data of the world, and in particular it's a power law, so that the hundredth most important concept appears in one out of a hundred of the documents, or whatever. So there are long tails."

"Does that mean that if we make a perfect data set and figure out very data-efficient algorithms, we can go home?"

"It means that there are potentially exponential compute wins on the table for being very sophisticated about your choice of data. But basically, when you just scoop up data passively, you're going to require 10x-ing your compute and your data to get the next constant number of things in that tail. And that tail keeps going; it's long. You can keep mining it, although as you alluded to, you can probably do a lot better."

"I think that's a good place to leave it. Thank you guys very much, that was fun."

"Yeah, thank you."
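To make the power-law claim concrete, here is a minimal sketch (my own, not from the talk), assuming the k-th most important concept appears in roughly 1 in k documents: the number of documents you need to collect a fixed number of examples of a concept grows linearly with its rank, so going 10x deeper into the tail costs roughly 10x more passively scooped data.

    # Hypothetical sketch, not from the talk: assume the k-th most important
    # concept appears in ~1/k of documents (Zipf-like). The expected number
    # of documents needed to see m examples of concept k is about m * k, so
    # each order of magnitude deeper into the tail costs ~10x more data.
    import numpy as np

    rng = np.random.default_rng(0)

    def docs_needed(rank: int, examples_needed: int = 10, trials: int = 1000) -> float:
        """Average documents sampled before a concept of this Zipf rank
        has appeared `examples_needed` times."""
        p = 1.0 / rank  # chance the concept appears in a random document
        # docs until the m-th occurrence = failures (negative binomial) + m
        failures = rng.negative_binomial(examples_needed, p, size=trials)
        return float(np.mean(failures + examples_needed))

    for rank in (10, 100, 1000):
        print(f"concept rank {rank:>4}: ~{docs_needed(rank):,.0f} documents")
    # rank 100 needs ~10x the data of rank 10; rank 1000 needs ~10x that again.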