The only reason that I'm sharing this is because there is a gem at the end. From the transcript, starting around 44:26:
…its responses, but it's incredible, it is incredible. Related to that, and sort of a last question: in some sense this whole effort, which was hugely expensive in terms of people and time and dollars and everything else, was an experiment to further validate that the scaling laws keep going, and why. And it turns out they do, and they probably keep going for a long time. I accept scaling laws like I accept quantum mechanics or something, but I still don't know why. Why should that be a property of the universe? So why are scaling laws a property of the universe?

If you want, I can take a stab. Well, the fact that more compression will lead to more intelligence has this very strong philosophical grounding. So the question is why training bigger models for longer gives you more compression, and there are a lot of theories here. The one I like is that the relevant concepts are sparse in the data of the world, and in particular it's a power law, so that the hundredth most important concept appears in one out of a hundred documents, or whatever. So there are long tails.
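(An aside from me, not the speakers, to make the power-law claim concrete: suppose the k-th most important concept appears in a fraction p_k of documents, with

    p_k \approx \frac{1}{k}, \qquad \mathbb{E}[\text{documents until concept } k \text{ appears}] \approx \frac{1}{p_k} = k.

So merely encountering a concept gets linearly more expensive as you move down the tail: the concept at rank 10k needs roughly ten times the data of the one at rank k.)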
Does that mean that if we make a perfect data set and figure out very data-efficient algorithms, I mean, we can go home?

It means that there are potentially exponential compute wins on the table from being very sophisticated about your choice of data. But basically, when you just scoop up data passively, you're going to require 10x-ing your compute and your data to get the next constant number of things in that tail. And that tail keeps going; it's long, and you can keep mining it. Although, as you alluded to, you can probably do a lot better.
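(A small numerical sketch of my own, under the assumed 1/k document frequencies above, of why passively scooped data gives roughly constant returns per 10x: each decade of concept ranks contributes about the same amount of signal per document, but seeing its rarest members takes about ten times as many documents as the previous decade. The vocabulary cutoff and the 1/k law are illustrative assumptions, not anything the speakers quantified.

    # Rough numerical illustration (mine, not the speakers') of the long-tail argument.
    # Assumption: the k-th most important concept shows up in about 1/k of all documents.
    # Then you need on the order of k documents before you see even one example of
    # concept k, while each "decade" of ranks (1-10, 11-100, ...) contributes about the
    # same total number of concept mentions per document (the sum of 1/k over a decade
    # is roughly ln(10) ~= 2.3). So each extra constant chunk of tail concepts costs
    # ~10x more passively collected data.

    prev = 1
    for decade in range(1, 7):                  # cover ranks up to 10, 100, ..., 10**6
        hi = 10 ** decade
        # Concept mentions per document contributed by ranks prev..hi:
        signal = sum(1.0 / k for k in range(prev, hi + 1))
        # Expected documents needed to see the rarest concept in this decade at least once:
        docs_needed = hi
        print(f"ranks {prev:>7}-{hi:<9,}  signal/doc: {signal:4.2f}  docs needed: ~{docs_needed:,}")
        prev = hi + 1

Running it prints one line per decade of ranks; the signal-per-document column stays near ln(10) ≈ 2.3 while the documents needed grow tenfold each step.)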
I think that's a good place to leave it. Thank you guys very much, that was fun.

Yeah, thank you.