With ML models for video and audio getting better and better, I had a thought: what if you were to train a model to perfectly replicate 1000 movies?<p>Would the model be smaller in size than all 1000 files together?<p>I’m sorry if this is a stupid question; I don’t fully understand how these work under the hood. I have a bit of theory but very little practice.<p>My curiosity was along the lines of CDs like “Greatest Hits of X”: similarly, if the model were more efficient on storage, you could have “Greatest Movies and TV Shows of X”.
> Would the model be smaller in size than the size of all 1000 together?<p>This is a good question, because you're butting up against the fundamental limits of information theory that make this field interesting.<p>For starters, I think we have to set some rules. A "lossless" representation of these 1000 movies would mean that simply prompting the system with the name of a film could generate the movie perfectly. If the model fails to reproduce that film on it's first try, or cannot <i>exactly</i> recreate the training file, it is a lossy compression.<p>So with that being said, I think we can start painting a picture of how efficient ML can be. You are processing tens of thousands of frames of visual data while attempting to lose as little of the source material as possible. Getting a <i>single movie</i> to render properly on the first time inherently relies on luck; retrying over multiple attempts is infeasible due to how long you have to wait and how much energy you pay for. You'd be bruteforcing a video-generator against a checksum that it may-or-may not hit.<p>I would argue that the efficiency of ML for storing this data relies on your tolerance for error. If you require an output equally as pristine as your input, ML is not a suitable compression medium for your data.
This is actually an excellent question! And quite an active field of research.<p>This is one paper you could read - it's about text rather than films, but the same idea:<p>"Language Modeling Is Compression" - <a href="https://arxiv.org/pdf/2309.10668" rel="nofollow">https://arxiv.org/pdf/2309.10668</a>
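To give a flavor of the core idea: a predictive model's probabilities can drive an arithmetic coder, and the resulting lossless code length is roughly the sum of -log2 of the probability the model assigned to each symbol that actually occurred. Below is a minimal sketch of that code-length calculation in Python, with a toy uniform model standing in for an LLM; it is my own illustration of the principle, not code from the paper.
<pre><code>
import math

def ideal_code_length_bits(sequence, predict_next):
    """
    Total Shannon code length, in bits, of `sequence` under a predictive model.
    `predict_next(prefix)` returns a dict mapping each possible next symbol to
    its predicted probability. An arithmetic coder driven by these probabilities
    approaches this length; a better model means a shorter (compressed) code.
    """
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        probs = predict_next(sequence[:i])
        total_bits += -math.log2(probs[symbol])
    return total_bits

# Toy model for illustration: uniform over the byte alphabet, so every byte
# costs exactly 8 bits. A real LLM assigns higher probability to the true next
# symbol, which is what makes it a good compressor.
uniform = lambda prefix: {b: 1 / 256 for b in range(256)}
data = list(b"hello hello hello")
print(ideal_code_length_bits(data, uniform))  # 17 symbols * 8 bits = 136.0
</code></pre>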
I wouldn’t aim for “perfectly”, not because it is impossible, but because it introduces an unnecessary constraint.<p>Your typical movie is already compressed with a lossy algorithm, so it does not perfectly replicate the original either. You should instead aim for a compressor that produces output with a small error, according to some metric that accounts for what viewers can actually see.<p>You should also define “size” as “size of the compressed movies plus the decompressor”, because otherwise you’ll find that the winning decompressor simply contains copies of the movies.<p>For a similar problem for text, see <a href="https://en.wikipedia.org/wiki/Hutter_Prize" rel="nofollow">https://en.wikipedia.org/wiki/Hutter_Prize</a>
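A minimal Python sketch of the two bookkeeping rules above, assuming frames arrive as NumPy arrays and the file paths are hypothetical: score reconstruction error per frame with PSNR (a crude stand-in for what viewers notice; perceptual metrics such as SSIM or VMAF track human judgment better), and count the decompressor's own size along with the compressed payload.
<pre><code>
import os
import numpy as np

def psnr(original: np.ndarray, reconstruction: np.ndarray, max_value: float = 255.0) -> float:
    """
    Peak signal-to-noise ratio between two frames (higher = closer to the
    original). A simple proxy for viewer-visible error.
    """
    mse = np.mean((original.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # bit-identical frames
    return 10.0 * np.log10((max_value ** 2) / mse)

def total_size_bytes(compressed_movie_paths, decompressor_path) -> int:
    """
    The size to report: compressed payload *plus* the decompressor itself,
    so a "decompressor" that simply embeds the movies gains nothing.
    """
    return sum(os.path.getsize(p) for p in [*compressed_movie_paths, decompressor_path])
</code></pre>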