With ML models for video and audio getting better and better, I had a thought: what if you were to train a model to perfectly replicate 1000 movies?<p>Would the model be smaller in size than all 1000 files together?<p>I’m sorry if this is a stupid question; I don’t fully understand how these work under the hood. I have a bit of theory but very little practice.<p>My curiosity was along the lines of CDs like “Greatest Hits of X”: similarly, if the model were more efficient on storage, you could have “Greatest Movies and TV Shows of X”.
> Would the model be smaller in size than the size of all 1000 together?<p>This is a good question, because you're butting up against the fundamental limits of information theory that make this field interesting.<p>For starters, I think we have to set some rules. A "lossless" representation of these 1000 movies would mean that simply prompting the system with the name of a film could generate the movie perfectly. If the model fails to reproduce that film on it's first try, or cannot <i>exactly</i> recreate the training file, it is a lossy compression.<p>So with that being said, I think we can start painting a picture of how efficient ML can be. You are processing tens of thousands of frames of visual data while attempting to lose as little of the source material as possible. Getting a <i>single movie</i> to render properly on the first time inherently relies on luck; retrying over multiple attempts is infeasible due to how long you have to wait and how much energy you pay for. You'd be bruteforcing a video-generator against a checksum that it may-or-may not hit.<p>I would argue that the efficiency of ML for storing this data relies on your tolerance for error. If you require an output equally as pristine as your input, ML is not a suitable compression medium for your data.
This is actually an excellent question! And quite an active field of research.<p>This is one paper you could read - it's about text rather than films, but the same idea:<p>"Language Modeling Is Compression" - <a href="https://arxiv.org/pdf/2309.10668" rel="nofollow">https://arxiv.org/pdf/2309.10668</a>
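To give a flavor of the core idea: a predictive model's probabilities can drive an arithmetic coder, and the resulting lossless code length is roughly the sum of -log2 of the probability the model assigned to each symbol that actually occurred. Below is a minimal sketch of that code-length calculation in Python, with a toy uniform model standing in for an LLM; it is my own illustration of the principle, not code from the paper.
<pre><code>
import math

def ideal_code_length_bits(sequence, predict_next):
    """
    Total Shannon code length, in bits, of `sequence` under a predictive model.
    `predict_next(prefix)` returns a dict mapping each possible next symbol to
    its predicted probability. An arithmetic coder driven by these probabilities
    approaches this length; a better model means a shorter (compressed) code.
    """
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        probs = predict_next(sequence[:i])
        total_bits += -math.log2(probs[symbol])
    return total_bits

# Toy model for illustration: uniform over the byte alphabet, so every byte
# costs exactly 8 bits. A real LLM assigns higher probability to the true next
# symbol, which is what makes it a good compressor.
uniform = lambda prefix: {b: 1 / 256 for b in range(256)}
data = list(b"hello hello hello")
print(ideal_code_length_bits(data, uniform))  # 17 symbols * 8 bits = 136.0
</code></pre>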
I wouldn’t aim for “perfectly”, not because it is impossible, but because it introduces an unnecessary constraint.<p>Your typical movie is already compressed with a lossy algorithm, so it does not perfectly replicate the original either. You should instead aim for a compressor that produces output with a small error, according to some metric that accounts for what viewers can actually see.<p>You should also define “size” as “size of the compressed movies plus the decompressor”, because otherwise you’ll find that the winning decompressor simply contains copies of the movies.<p>For a similar problem for text, see <a href="https://en.wikipedia.org/wiki/Hutter_Prize" rel="nofollow">https://en.wikipedia.org/wiki/Hutter_Prize</a>
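A minimal Python sketch of the two bookkeeping rules above, assuming frames arrive as NumPy arrays and the file paths are hypothetical: score reconstruction error per frame with PSNR (a crude stand-in for what viewers notice; perceptual metrics such as SSIM or VMAF track human judgment better), and count the decompressor's own size along with the compressed payload.
<pre><code>
import os
import numpy as np

def psnr(original: np.ndarray, reconstruction: np.ndarray, max_value: float = 255.0) -> float:
    """
    Peak signal-to-noise ratio between two frames (higher = closer to the
    original). A simple proxy for viewer-visible error.
    """
    mse = np.mean((original.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # bit-identical frames
    return 10.0 * np.log10((max_value ** 2) / mse)

def total_size_bytes(compressed_movie_paths, decompressor_path) -> int:
    """
    The size to report: compressed payload *plus* the decompressor itself,
    so a "decompressor" that simply embeds the movies gains nothing.
    """
    return sum(os.path.getsize(p) for p in [*compressed_movie_paths, decompressor_path])
</code></pre>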