surprised all the comments here are about whether he achieved compression or not. Point = Missed!<p>the entire thing was an ingenious trolling exercise, as he says, to 'out trick the trickster'. his goal was to exploit the loose wording of the challenge, and prove a point (possibly winning $5k in the process); I found it an entertaining story along those lines.<p>it proves nothing (new) about compression, or lack of it; he even stats that the consensus at the time was 'unanimous' that no compression had taken place. that's not where the interest in this link/story lies.<p>thanks for the original link, OP.
The argument is over whether the challenge is identical with its spirit or with its statement. Patrick clearly failed the spirit of the challenge. He tried to exploit a loophole in the statement of the challenge. So what which is the actual challenge?<p>Actually, it doesn't matter. Patrick failed to complete the challenge <i>as stated</i>, which is important because he was exploiting a loophole in the statement. Technically, his files were smaller. But technically, he was required to <i>send</i> Mike his files and decompresser. Instead, he put them on his server and sent Mike their location. Mike then had to <i>get</i> the files. It goes both ways.<p>Finding loopholes, while clever, isn't terribly interesting.
Look at the relevant part of the exchange:<p>> > I meant can I send you a compressor and several compressed files whose total file size is less than the original uncompressed file and from which I can regenerate the original uncompressed file
> > Patrick<p>> Sure -- but you send me a <i>decompressor</i>, I don't need the compressor.<p>Was this "compressor" vs. "decompressor" actually a slip up, or intentional to distract from the importance of what he was asking?
I read through the compression FAQ and found it quite interesting. I had never heard the counting argument before, but it's very intuitive.<p>Part of the FAQ is a collection of crazies claiming that they have created novel compression techniques. One entry is about Jules Gilbert, but it ends abruptly. I googled him and found that the crazy continues....<p>>I am afraid that I am not a fan of Google. They are gay-friendly and also they suppressed lot's of stuff about Obama before he was nominated, and again, before he was elected.<p><a href="http://scientopia.org/blogs/goodmath/2010/08/13/969/" rel="nofollow">http://scientopia.org/blogs/goodmath/2010/08/13/969/</a>
I wonder if Mike Goldman ever made a response after everything was published. I found very distasteful when he threatened to accuse Patrick of fraud.<p>He should have recognized that the rules where vaguely stated and at the very least offer an apology. Plain refusing to download all his files, when he had expressly accepted multiple files at being ok was kind of weird.<p>The google groups link in the page no longer works so I wonder if that's archived elsewhwere.
Obviously Patrick met the requirements of the loosely worded challenge.<p>Though I wonder if the inverse would be acceptable. Take 100 files all the same size, but smaller than the filesystem block size. These files are n<i>100 bytes, but take up blocksize</i>100 bytes on the disk. Then, using tar (which itself does not compress, just tacks files end to end with metadata in between), 'compress' them into a single file that takes up fewer disk blocks than before.
Just to clarify, the guy made a contest that he knew, statistically, could not be won except if the random data file he produced just happened to be compressible (due to luck alone).<p>In other words, he knowingly accepted money in a contest of chance not a contest of skill. So this contest (assuming either participant was in the US at the time) violated federal and state laws against unlicensed gambling operations, correct? [Assuming Mike Goldman did not possess a license to operate an online casino in 2001]
I dont think he completed the challenge. He found a loophole, so _maybe_ you could say he was successful. In exploiting the rules that is, not in solving the problem that Goldman stated. Still a clever solution, but obviously not what was asked for.
Perhaps a stupid question, but is the challenge not also solvable with 'honest' comp.sci?
The way I think about it at the moment, is that I take a family of functions, C_k, which map the space of n elements strings to the space of [1, 2n] element strings ( the union of m element strings for 0 > m > 2n) in a essentially random way. The output is then in [1, n - sizeof(C_k) -1] with some probability p and if I try long enough, I find a compression scheme which compresses the given input.
<i>I think anyone would agree that if you sent me an uncompressed file of size 1000 bytes and I sent you a decompressor of size 400 bytes and a compressed file of size 599 bytes, I would have met the challenge in spite of the fact that these two files will take up more disk space than the original file.</i><p>I don't agree, since the compressor can be tuned to the data you could just move some of the data into the "decompressor" and end up with an arbitrarily small "compressed" file.
Since he offers $5000 but you only need to pay $100 to participate it is quite easy to win money with the challenge. Ask for a file of 2 bits in length. If it is "11" then compress it to "1". Otherwise compress it to "0X" where X is the contents of the file. Now you have a 1 in 4 chance of winning the challenge. Since you pay $100 but win $5000, this earns good money. Plus it would be hard to argue that this is cheating.<p>The problem is that you need to supply the decompressor as well. It would depend on how exactly he wants the decompressor be encoded, but the same strategy may still apply with modifications (although it would not work if the required encoding is as a bash script, since lg(100/5000) gives you just 2 bits to play with). [Or you could ask him to modify the challenge such that you are allowed to send him an encrypted decompressor <i>before</i> he sends you the data, and doesn't count towards the total file size (encrypted because otherwise he'll see your strategy and send you a file that it <i>doesn't</i> work on)]<p>tl;dr: although you can't compress random with probability > 1/2, you can compress random data with probability > 100/5000.
I wonder what the probability that a files is not compressible is.<p>Even more, I wonder if the likelihood that a file is compressible increases, decreases, or is constant with relation to file size.<p>My assumption would be that 50% of files are not compressible at a given size, and that compressibility doesn't vary with file size.
I guess I'm missing something here :-)<p><pre><code> ls -la /bin/tar
230464 Feb 25 2010 /bin/tar
</code></pre>
If tar can compress and decompress an arbitrary length file by more than 230464 bytes then doesn't it meet the challenge?<p>Even if I've misunderstood, can anyone explain in plain english why this is a futile task?
If you just have to decompress a single file, I don't see why this problem is impossible since you can tune your decompressor to that file.<p>For example, files containing the digits of pi or e will have a uniform distribution of bytes (and are therefore incompressible by exploiting redundancy) but it is very easy to write a "decompressor" for them as there is a simple algorithm to obtain these numbers.<p>Is there a simple algorithm for every possible number? Because then it just becomes a question of finding the algorithm that generates that number. My intuition tells me "yes" but that it may be very hard to find it. Plus, you don't need to find an algorithm for the whole file, just some subsection of it.
Tsk tsk...hackers and a coding challenge are the last things you should except to be devoid of loopholes and tricks. Always be extra careful making them, and don't bet more than you can afford, or you'll come out with a rep like Mike.
def decompress(filesize, sha1hash, openingbits, closingbits):
for candidate in bits(filesize,openingbits):
if sha1(candidate) == sha1hash:
if candidate[-(len(closingbits)):] == closingbits:
return candidate
Somebody please enlighten me. Is it that hard to find a pattern (for example fibonacci) that only has zeros up to F100, then remove those zeros?<p>You dont have to start from the first bit, you can start from bit 11234 (information on the decompiler).<p>So for example I skip 11134, then run +F(i) bits to the right and add a zero, then run +F(i + 1) bits to the right and add a new zero.. etc etc. The pattern has to be just long enough to cover the decompiler's size, which shouldn't be too much if scripting is allowed!<p>I believe that discovery of such a pattern shouldn't be too hard in a single file, arbitrary large (but 3mb should be enough too)
This brings up some cool questions about compression. Can any experts help me out here?<p>How many n-bit strings can be compressed? (I think this question is how many have a Kolmogorov complexity of less than n.)<p>Also, what is the difference between the statements "no program can compress every string" and "not every string can be compressed by some program". I think both are true, but they're referring to different strings, so it's kind of odd....
What he should've done is just create a new file whose name is a text encoding of the data in the original file, but it is 1 byte long. Then to decompress, just take the name of the file, and use that to create the data.
This was the day that Mike learned the difference between theory and practice: In theory, there is no difference between theory and practice; but in practice, there is.<p>When I read the challenge, and saw that arbitrary Unix programs were allowed, my first idea for a compressor was to scan the OS software for snippets of data that match the source data.<p>Of course, that was 10 years ago. Today, in the Cloud world, I'd just write a decompressor that connects to my system over the network and retrieves the source data.
This was a great social engineering hack:
The original challenge, poorly stated as it was, was not beaten. Craig started up a conversation with Mike, and tricked him into changing the rules. If Mike were careful, he would have paused to think about whether the proposed rule change would open the door to a hack.<p>It's a reminder to all of us that correctness is an ongoing process, not a delivered product.<p>Mike started the thread as a smart-alec Randi-esque elder, putting up money to shut up cranks. But he got blinded by his hubris.
Mike wrote a document (the challege), and offered a $5000 bug bounty. Patrick found a bug, Mike should have paid up and fixed the bug in the spec, just like Knuth pays for typos and Google pays for security bug reports.