The $5000 Compression Challenge

104 pointsby jpabloover 12 years ago

24 comments

mm_alexover 12 years ago

surprised all the comments here are about whether he achieved compression or not. Point = Missed!the entire thing was an ingenious trolling exercise, as he says, to 'out trick the trickster'. his goal was to exploit the loose wording of the challenge, and prove a point (possibly winning $5k in the process); I found it an entertaining story along those lines.it proves nothing (new) about compression, or lack of it; he even stats that the consensus at the time was 'unanimous' that no compression had taken place. that's not where the interest in this link/story lies.thanks for the original link, OP.

adamtjover 12 years ago

The argument is over whether the challenge is identical with its spirit or with its statement. Patrick clearly failed the spirit of the challenge. He tried to exploit a loophole in the statement of the challenge. So what which is the actual challenge?Actually, it doesn't matter. Patrick failed to complete the challenge as stated, which is important because he was exploiting a loophole in the statement. Technically, his files were smaller. But technically, he was required to send Mike his files and decompresser. Instead, he put them on his server and sent Mike their location. Mike then had to get the files. It goes both ways.Finding loopholes, while clever, isn't terribly interesting.

评论 #4618102 未加载

kevinpetover 12 years ago

Look at the relevant part of the exchange:> > I meant can I send you a compressor and several compressed files whose total file size is less than the original uncompressed file and from which I can regenerate the original uncompressed file > > Patrick> Sure -- but you send me a decompressor, I don't need the compressor.Was this "compressor" vs. "decompressor" actually a slip up, or intentional to distract from the importance of what he was asking?

jereover 12 years ago

I read through the compression FAQ and found it quite interesting. I had never heard the counting argument before, but it's very intuitive.Part of the FAQ is a collection of crazies claiming that they have created novel compression techniques. One entry is about Jules Gilbert, but it ends abruptly. I googled him and found that the crazy continues....>I am afraid that I am not a fan of Google. They are gay-friendly and also they suppressed lot's of stuff about Obama before he was nominated, and again, before he was elected.<a href="http://scientopia.org/blogs/goodmath/2010/08/13/969/" rel="nofollow">http://scientopia.org/blogs/goodmath/2010/08/13/969/</a>

jpabloover 12 years ago

I wonder if Mike Goldman ever made a response after everything was published. I found very distasteful when he threatened to accuse Patrick of fraud.He should have recognized that the rules where vaguely stated and at the very least offer an apology. Plain refusing to download all his files, when he had expressly accepted multiple files at being ok was kind of weird.The google groups link in the page no longer works so I wonder if that's archived elsewhwere.

评论 #4617900 未加载

评论 #4619053 未加载

评论 #4617631 未加载

评论 #4618703 未加载

DaveListerover 12 years ago

Obviously Patrick met the requirements of the loosely worded challenge.Though I wonder if the inverse would be acceptable. Take 100 files all the same size, but smaller than the filesystem block size. These files are n100 bytes, but take up blocksize100 bytes on the disk. Then, using tar (which itself does not compress, just tacks files end to end with metadata in between), 'compress' them into a single file that takes up fewer disk blocks than before.

hermannj314over 12 years ago

Just to clarify, the guy made a contest that he knew, statistically, could not be won except if the random data file he produced just happened to be compressible (due to luck alone).In other words, he knowingly accepted money in a contest of chance not a contest of skill. So this contest (assuming either participant was in the US at the time) violated federal and state laws against unlicensed gambling operations, correct? [Assuming Mike Goldman did not possess a license to operate an online casino in 2001]

zwdrover 12 years ago

I dont think he completed the challenge. He found a loophole, so _maybe_ you could say he was successful. In exploiting the rules that is, not in solving the problem that Goldman stated. Still a clever solution, but obviously not what was asked for.

评论 #4617128 未加载

ykover 12 years ago

Perhaps a stupid question, but is the challenge not also solvable with 'honest' comp.sci? The way I think about it at the moment, is that I take a family of functions, C_k, which map the space of n elements strings to the space of [1, 2n] element strings ( the union of m element strings for 0 > m > 2n) in a essentially random way. The output is then in [1, n - sizeof(C_k) -1] with some probability p and if I try long enough, I find a compression scheme which compresses the given input.

评论 #4617881 未加载

sp332over 12 years ago

I think anyone would agree that if you sent me an uncompressed file of size 1000 bytes and I sent you a decompressor of size 400 bytes and a compressed file of size 599 bytes, I would have met the challenge in spite of the fact that these two files will take up more disk space than the original file.I don't agree, since the compressor can be tuned to the data you could just move some of the data into the "decompressor" and end up with an arbitrarily small "compressed" file.

评论 #4617092 未加载

评论 #4617045 未加载

评论 #4617015 未加载

评论 #4617091 未加载

评论 #4617046 未加载

julesover 12 years ago

Since he offers $5000 but you only need to pay $100 to participate it is quite easy to win money with the challenge. Ask for a file of 2 bits in length. If it is "11" then compress it to "1". Otherwise compress it to "0X" where X is the contents of the file. Now you have a 1 in 4 chance of winning the challenge. Since you pay $100 but win $5000, this earns good money. Plus it would be hard to argue that this is cheating.The problem is that you need to supply the decompressor as well. It would depend on how exactly he wants the decompressor be encoded, but the same strategy may still apply with modifications (although it would not work if the required encoding is as a bash script, since lg(100/5000) gives you just 2 bits to play with). [Or you could ask him to modify the challenge such that you are allowed to send him an encrypted decompressor before he sends you the data, and doesn't count towards the total file size (encrypted because otherwise he'll see your strategy and send you a file that it doesn't work on)]tl;dr: although you can't compress random with probability > 1/2, you can compress random data with probability > 100/5000.

评论 #4617597 未加载

评论 #4617347 未加载

debacleover 12 years ago

I wonder what the probability that a files is not compressible is.Even more, I wonder if the likelihood that a file is compressible increases, decreases, or is constant with relation to file size.My assumption would be that 50% of files are not compressible at a given size, and that compressibility doesn't vary with file size.

评论 #4618392 未加载

评论 #4618332 未加载

评论 #4618596 未加载

benjaminwoottonover 12 years ago

I guess I'm missing something here :-)<pre><code> ls -la /bin/tar 230464 Feb 25 2010 /bin/tar </code></pre> If tar can compress and decompress an arbitrary length file by more than 230464 bytes then doesn't it meet the challenge?Even if I've misunderstood, can anyone explain in plain english why this is a futile task?

评论 #4617067 未加载

评论 #4617362 未加载

评论 #4617076 未加载

评论 #4617052 未加载

评论 #4617075 未加载

评论 #4618642 未加载

eallocover 12 years ago

If you just have to decompress a single file, I don't see why this problem is impossible since you can tune your decompressor to that file.For example, files containing the digits of pi or e will have a uniform distribution of bytes (and are therefore incompressible by exploiting redundancy) but it is very easy to write a "decompressor" for them as there is a simple algorithm to obtain these numbers.Is there a simple algorithm for every possible number? Because then it just becomes a question of finding the algorithm that generates that number. My intuition tells me "yes" but that it may be very hard to find it. Plus, you don't need to find an algorithm for the whole file, just some subsection of it.

评论 #4617962 未加载

评论 #4619956 未加载

评论 #4617958 未加载

Tipzntrixover 12 years ago

Tsk tsk...hackers and a coding challenge are the last things you should except to be devoid of loopholes and tricks. Always be extra careful making them, and don't bet more than you can afford, or you'll come out with a rep like Mike.

Beltirasover 12 years ago

def decompress(filesize, sha1hash, openingbits, closingbits): for candidate in bits(filesize,openingbits): if sha1(candidate) == sha1hash: if candidate[-(len(closingbits)):] == closingbits: return candidate

评论 #4618393 未加载

评论 #4618244 未加载

bagosmover 12 years ago

Somebody please enlighten me. Is it that hard to find a pattern (for example fibonacci) that only has zeros up to F100, then remove those zeros?You dont have to start from the first bit, you can start from bit 11234 (information on the decompiler).So for example I skip 11134, then run +F(i) bits to the right and add a zero, then run +F(i + 1) bits to the right and add a new zero.. etc etc. The pattern has to be just long enough to cover the decompiler's size, which shouldn't be too much if scripting is allowed!I believe that discovery of such a pattern shouldn't be too hard in a single file, arbitrary large (but 3mb should be enough too)

评论 #4618663 未加载

bo1024over 12 years ago

This brings up some cool questions about compression. Can any experts help me out here?How many n-bit strings can be compressed? (I think this question is how many have a Kolmogorov complexity of less than n.)Also, what is the difference between the statements "no program can compress every string" and "not every string can be compressed by some program". I think both are true, but they're referring to different strings, so it's kind of odd....

评论 #4618757 未加载

评论 #4617591 未加载

tocommentover 12 years ago

Can anyone explain what this section is talking about? I'm completely not following it:cat $1 printf "X" cat $0 exit

评论 #4618489 未加载

评论 #4618192 未加载

评论 #4619118 未加载

tbirdzover 12 years ago

What he should've done is just create a new file whose name is a text encoding of the data in the original file, but it is 1 byte long. Then to decompress, just take the name of the file, and use that to create the data.

评论 #4617074 未加载

jimmybriteover 12 years ago

Let's all send him 100$, jackass.

lurker14over 12 years ago

This was the day that Mike learned the difference between theory and practice: In theory, there is no difference between theory and practice; but in practice, there is.When I read the challenge, and saw that arbitrary Unix programs were allowed, my first idea for a compressor was to scan the OS software for snippets of data that match the source data.Of course, that was 10 years ago. Today, in the Cloud world, I'd just write a decompressor that connects to my system over the network and retrieves the source data.

评论 #4617586 未加载

评论 #4618458 未加载

lurker14over 12 years ago

This was a great social engineering hack: The original challenge, poorly stated as it was, was not beaten. Craig started up a conversation with Mike, and tricked him into changing the rules. If Mike were careful, he would have paused to think about whether the proposed rule change would open the door to a hack.It's a reminder to all of us that correctness is an ongoing process, not a delivered product.Mike started the thread as a smart-alec Randi-esque elder, putting up money to shut up cranks. But he got blinded by his hubris.

评论 #4618325 未加载

lurker14over 12 years ago

Mike wrote a document (the challege), and offered a $5000 bug bounty. Patrick found a bug, Mike should have paid up and fixed the bug in the spec, just like Knuth pays for typos and Google pays for security bug reports.

24 comments

mm_alexover 12 years ago

adamtjover 12 years ago

评论 #4618102 未加载

kevinpetover 12 years ago

jereover 12 years ago

jpabloover 12 years ago

评论 #4617900 未加载

评论 #4619053 未加载

评论 #4617631 未加载

评论 #4618703 未加载

DaveListerover 12 years ago

hermannj314over 12 years ago

zwdrover 12 years ago

评论 #4617128 未加载

ykover 12 years ago

评论 #4617881 未加载

sp332over 12 years ago

评论 #4617092 未加载

评论 #4617045 未加载

评论 #4617015 未加载

评论 #4617091 未加载

评论 #4617046 未加载

julesover 12 years ago

评论 #4617597 未加载

评论 #4617347 未加载

debacleover 12 years ago

评论 #4618392 未加载

评论 #4618332 未加载

评论 #4618596 未加载

benjaminwoottonover 12 years ago

评论 #4617067 未加载

评论 #4617362 未加载

评论 #4617076 未加载

评论 #4617052 未加载

评论 #4617075 未加载

评论 #4618642 未加载

eallocover 12 years ago

评论 #4617962 未加载

评论 #4619956 未加载

评论 #4617958 未加载

Tipzntrixover 12 years ago

Beltirasover 12 years ago

评论 #4618393 未加载

评论 #4618244 未加载

bagosmover 12 years ago

评论 #4618663 未加载

bo1024over 12 years ago

评论 #4618757 未加载

评论 #4617591 未加载

tocommentover 12 years ago

Can anyone explain what this section is talking about? I'm completely not following it:cat $1 printf "X" cat $0 exit

评论 #4618489 未加载

评论 #4618192 未加载

评论 #4619118 未加载

tbirdzover 12 years ago

评论 #4617074 未加载

jimmybriteover 12 years ago

Let's all send him 100$, jackass.

lurker14over 12 years ago

评论 #4617586 未加载

评论 #4618458 未加载

lurker14over 12 years ago

评论 #4618325 未加载

lurker14over 12 years ago