The $5000 Compression Challenge

42 points by igul222 over 12 years ago

14 comments

haberman over 12 years ago
Poetic justice. I love this quotation from Mike:

"I think that I did not make a mistake except to believe that you truly intended to attempt to compress the data I was sending you."

Mike is happily preying upon people by taking their money to enter what he believes is an impossible contest, but the moment he is outsmarted he appeals to morals and calls into question whether Patrick was acting honestly.

I'm guessing there is some backstory on this newsgroup involving people who would claim to invent compression algorithms that do the impossible. I'm imagining that one day Mike thought "time to get these people to put their money where their mouth is."

Still, it's pretty low to pose the challenge without saying "I'm taking this bet because it's highly unlikely that you can actually succeed. This is explained in the FAQ. Are you sure you want to give me $100?"
Nate75Sanders over 12 years ago
Mike Goldman does not need to ever issue a challenge again.

It is WIDELY agreed upon by people who engage in this activity regularly (known as "prop betting") that the spirit of the law does not matter one bit.

He was way out of his element here and Patrick was well within his rights to do what he did.
ars over 12 years ago
The pigeonhole principle only applies to a *specific* compression algorithm.

If you get to custom-write the algorithm for the data, you can arrange things so that for this data file it will be smaller, yet larger for any other file, and still meet the principle.

I.e. this is not actually impossible. If you can find ANY redundancy in the data, and you can code a very small decompressor specifically for that, you can probably win.

And due to the nature of randomness, there are *always* numeric streaks and other patterns in the data. The larger the data file, the better the chance of finding some sort of pattern or streak.

The decompressor does not have to be large either - it would be perfectly legal to use a Perl script, for example (i.e. so you don't have to use space writing I/O handling code).

Edit: I suspect I may be wrong here.
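A minimal sketch of that idea in Python rather than Perl, assuming a hypothetical input file called challenge.dat: find the longest run of a single repeated byte and estimate whether cutting it out and describing it in a small fixed-size header would save anything. This is not Patrick's actual method, and the size of the decompressor script itself would also count against the total.

```python
# Scan a fixed file for its longest run of one repeated byte and estimate
# the net saving if that run were cut out and described by a small header.
# "challenge.dat" and the 9-byte header size are assumptions for illustration.

def longest_run(data: bytes):
    """Return (start, length, byte_value) of the longest run of one byte."""
    best_start, best_len, best_byte = 0, 0, 0
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        if j - i > best_len:
            best_start, best_len, best_byte = i, j - i, data[i]
        i = j
    return best_start, best_len, best_byte

with open("challenge.dat", "rb") as f:   # hypothetical input file
    data = f.read()

start, length, value = longest_run(data)
# "Compressed" form: the data with the run cut out, plus (start, length, value)
# stored in, say, a 9-byte header (4 + 4 + 1). The script that re-inserts the
# run still has to be counted on top of this.
saved = length - 9
print(f"longest run: {length} bytes of value {value} at offset {start}, net saving ~{saved} bytes")
```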
CJefferson over 12 years ago
This is an old story, but a fun one.

To me it is clear that Mike Goldman should have paid up. Patrick clearly completed the challenge placed on him.

While Patrick didn't obey the spirit of the challenge, he obeyed the letter of it, and Mike knew full well that the spirit of his challenge was impossible!
splicer over 12 years ago
Here's another approach:

Let the compressed file contain a hash (say, SHA-1) of the original file. The decompression program then generates random files of the chosen size. If a generated file's hash doesn't match the desired hash, delete it. Now run the program for a very long time. The program is *likely* to eventually reproduce the original file (along with a bunch of files that happen to have the same hash), and you win :)
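A toy version of this scheme, just to make it concrete; the file names are made up, and for any realistically sized file the search would never finish. The "compressed" file holds only the original length and a SHA-1 digest, and the "decompressor" guesses random files until one matches.

```python
import hashlib
import os

def compress(path: str, out: str) -> None:
    data = open(path, "rb").read()
    with open(out, "wb") as f:
        f.write(len(data).to_bytes(8, "big"))   # original length
        f.write(hashlib.sha1(data).digest())    # 20-byte digest

def decompress(path: str, out: str) -> None:
    blob = open(path, "rb").read()
    size = int.from_bytes(blob[:8], "big")
    digest = blob[8:28]
    while True:                                  # hopeless for any real file size
        candidate = os.urandom(size)
        if hashlib.sha1(candidate).digest() == digest:
            open(out, "wb").write(candidate)
            return

# Demo on a 2-byte "file" so the brute force actually finishes; note that the
# "compressed" output is 28 bytes, i.e. larger than the input it encodes.
open("original.dat", "wb").write(b"hi")
compress("original.dat", "compressed.bin")
decompress("compressed.bin", "recovered.dat")
assert open("recovered.dat", "rb").read() == b"hi"
```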
vanni over 12 years ago
Previous discussion:

http://news.ycombinator.com/item?id=4616704
algorias over 12 years ago
Even without doing any filesystem tricks, the challenge seems quite possible. While it's true that writing a compressor that compresses all input data is impossible, this barrier is removed because the compressor only needs to work on a single file.
RyanZAG over 12 years ago
It is definitely possible to win this challenge though.

Consider an arbitrarily long series of integers. Somewhere within this series of integers, there will be some kind of randomly created pattern, since this is a property of an infinite set. E.g. somewhere within the data set, there could be the values [1, 2, 3, ... 10] or [1, 3, 9, .. 27] or [1, 2, 4, 16, 32] - it does not matter which of these patterns exists, only that some mathematical pattern exists in the data.

Having no pattern at all in a big enough set of random data is impossible, as there is a finite number of possible data combinations for bytes [1..256][1..256] etc. I guess a data set of 256^256 bytes would guarantee a pattern, but I'm sure there is a far smaller number that would give 99% confidence.

Once you find a pattern in the data, you can remove that pattern and replace it with code that will recreate the pattern using a data offset. I.e. you remove the pattern from the data completely, and replace it with a smaller piece of code to recreate that pattern exactly and insert it into the correct position.

The key here is that once the data has been generated, it is no longer 'random data', but a fixed input. E.g. you cannot compress a random unknown string of bytes, but you can compress the string [1,2,4,16..]

The output data would have all possible mathematical patterns removed from it, and the decompression code would be just a list of mathematical functions and data offset points.
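A rough sketch of the "find a pattern, replace it with a rule plus an offset" idea, under the assumption that the pattern is an arithmetic run of byte values. The stand-in data and all names here are made up for illustration; a real entry would need more bookkeeping and would still have to beat the size of the decompression code.

```python
# Find the longest arithmetic run of byte values (e.g. 7, 9, 11, 13, ...),
# cut it out of the data, and keep only (offset, first_byte, step, length)
# so the run can be regenerated and re-inserted later.

def longest_arithmetic_run(data: bytes):
    best = (0, 0, 0, 0)                        # (offset, first byte, step, length)
    i = 0
    while i + 1 < len(data):
        step = data[i + 1] - data[i]
        j = i + 1
        while j + 1 < len(data) and data[j + 1] - data[j] == step:
            j += 1
        length = j - i + 1
        if length > best[3]:
            best = (i, data[i], step, length)
        i = j
    return best

def cut_pattern(data: bytes):
    off, first, step, length = longest_arithmetic_run(data)
    remainder = data[:off] + data[off + length:]
    return remainder, (off, first, step, length)

def reinsert_pattern(remainder: bytes, rule) -> bytes:
    off, first, step, length = rule
    run = bytes((first + k * step) % 256 for k in range(length))
    return remainder[:off] + run + remainder[off:]

data = bytes(range(256)) * 4                   # stand-in data, not the real file
smaller, rule = cut_pattern(data)
assert reinsert_pattern(smaller, rule) == data
```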
DanBC over 12 years ago
If you're interested in efficient compression with English you might like the Hutter Prize (http://prize.hutter1.net/) - a €50,000 prize if you can compress the 100MB file *enwik8* to less than the current record of about 16MB. They claim it's an AI problem. I guess it could be if you're compressing a bunch of different English texts, but this challenge has a single text. A program that provides very efficient compression of this text might not be so great for other texts.

Mike's challenge took a specially selected random file. As other people said, this was an attempt to show compression kooks that some files are not compressible (if you include the size of the decompressor).
pavanky over 12 years ago
Makes me wonder if a "decompressor" that generates hundreds of decompressed files by brute force (of which the original file is one) would be a valid solution. You could store a certain section of the original as the "compressed file".
kyle4211 over 12 years ago
I can't find anything which solidly points to this challenge being 'likely' impossible.

This all is not so simple as pointing at Kolmogorov complexity. Randomness is not so much inherent as relative to the machine on which you're running your program.

http://rjlipton.wordpress.com/2011/06/02/how-powerful-are-random-strings/
xentronium over 12 years ago
Why even bother with file concatenation? Just use a 3M-character filename for an empty file to "compress" 3 MB of data.
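A toy illustration of the filename trick, with hypothetical helper names: the file's contents are zero bytes and the payload hides in the name, i.e. in directory metadata. Real filesystems cap filename length (typically around 255 bytes), so 3 MB would actually take thousands of such files.

```python
import os

def compress_to_filename(data: bytes, directory: str) -> str:
    name = data.hex()                       # the filename carries the payload
    path = os.path.join(directory, name)
    open(path, "wb").close()                # zero-byte file
    return path

def decompress_from_filename(path: str) -> bytes:
    return bytes.fromhex(os.path.basename(path))

payload = b"hello"                          # stand-in for a (tiny) slice of the data
p = compress_to_filename(payload, ".")
assert decompress_from_filename(p) == payload
```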
kunil over 12 years ago
I don't get it. Why didn't he just write a simple algorithm and get the prize? It looks like easy money.
splicer over 12 years ago
Just post the original data file on the web, then have your "decompression" program download the file. The "compressed" file could simply contain a URL.
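A sketch of that approach with a made-up URL: the "compressed" file is just a short URL, and the "decompressor" pulls the real bytes in over the network - which is, of course, smuggling the data in from outside the challenge entirely.

```python
import urllib.request

def compress(url: str, out: str) -> None:
    with open(out, "w") as f:
        f.write(url)                         # the whole "compressed" file is one URL

def decompress(path: str, out: str) -> None:
    url = open(path).read().strip()
    with urllib.request.urlopen(url) as resp, open(out, "wb") as f:
        f.write(resp.read())                 # fetch the original bytes from the web

compress("https://example.com/original.dat", "compressed.txt")
# decompress("compressed.txt", "recovered.dat")   # needs the file to actually be hosted
```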