TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Paradoxical Compression with VDF

88 points by rdpintqogeogsaa over 3 years ago

9 comments

madars over 3 years ago
Very nice! A tl;dr with spoilers -- for a typical compression algorithm C(x), applying it over and over again C(x), C(C(x)), C(C(C(x))), ... we would quickly arrive at an input for which the compressed length increases. Here the author proposes C(x) which either equals x (no compression), or encodes a normal compression (e.g. gzip) plus a MAC'd counter. Applying C(x) on the latter case again would simply increment the counter, so an adversary who only issues a bounded number of compression requests gets "stuck" on a chain that contains no contradiction. (The attacker could try to get to y = C(x) with a large counter value without compressor's help but that requires predicting the MAC which is infeasible.) The requirement for C and D to share a secret (here, a MAC key) can be removed by replacing MAC with a VDF proof that you spent O(counter) sequential time producing the compressed output, so a computationally bounded attacker can't overflow the counter (in practice, 128 bit counter requiring ~2^128 time) and disprove the paradox.
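A rough Python sketch of that MAC-and-counter construction (illustrative only, not the author's code: HMAC stands in for the VDF proof, zlib for the inner compressor, and all names here are made up):

```python
import hmac, hashlib, os, zlib

KEY = os.urandom(32)       # shared secret (the VDF variant removes this)
MAGIC = b"PC"              # tags an output of the paradoxical compressor
CTR_LEN, MAC_LEN = 16, 32  # 128-bit counter, HMAC-SHA256 tag

def _mac(data: bytes) -> bytes:
    return hmac.new(KEY, data, hashlib.sha256).digest()

def compress(x: bytes) -> bytes:
    # Case 1: x is already one of our outputs -> just bump the counter
    # (overflowing it would take 2**128 sequential compressions).
    if x.startswith(MAGIC) and len(x) >= len(MAGIC) + CTR_LEN + MAC_LEN:
        body, tag = x[:-MAC_LEN], x[-MAC_LEN:]
        if hmac.compare_digest(tag, _mac(body)):
            ctr = int.from_bytes(body[2:2 + CTR_LEN], "big") + 1
            new_body = MAGIC + ctr.to_bytes(CTR_LEN, "big") + body[2 + CTR_LEN:]
            return new_body + _mac(new_body)  # same length as x
    # Case 2: real compression, used only when it actually wins.
    packed = zlib.compress(x, 9)
    body = MAGIC + (0).to_bytes(CTR_LEN, "big") + packed
    out = body + _mac(body)
    if len(out) < len(x):
        return out
    return x  # Case 3: no compression, output = input

def decompress(y: bytes) -> bytes:
    if y.startswith(MAGIC) and len(y) >= len(MAGIC) + CTR_LEN + MAC_LEN:
        body, tag = y[:-MAC_LEN], y[-MAC_LEN:]
        if hmac.compare_digest(tag, _mac(body)):
            ctr = int.from_bytes(body[2:2 + CTR_LEN], "big")
            if ctr > 0:  # undo one counter increment
                new_body = MAGIC + (ctr - 1).to_bytes(CTR_LEN, "big") + body[2 + CTR_LEN:]
                return new_body + _mac(new_body)
            return zlib.decompress(body[2 + CTR_LEN:])
    return y
```

An input that happens to start with `MAGIC` and carry a valid tag would misbehave, but forging such a tag is exactly the infeasible-MAC-prediction event described above.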
im3w1l over 3 years ago
Very neat, but my preference would be for it to very rarely expand the file, instead of very rarely decompressing incorrectly. Then you could actually use it in practice. It would be pretty straightforward to change it as far as I can tell.

Edit: If c+1 would wrap around, you could fall back on encoding as C0(x) || enc(0) || Mac(C0(x) || enc(0)).
a1369209993 over 3 years ago
> "infinite compression", in which it is claimed that every possible input can be reduced in size.

It's computationally intractable (in fact, I'm pretty sure it's computational-intractability-complete), but if you live in a deterministic universe, it actually *is* possible to compress any input to at most a pseudo-constant number of bits.

Just run every possible program in some Turing-complete language in parallel, and pick the shortest program that outputs your input. In the worst case, this will be a program that simulates the entire history of your universe up to the point where you were provided with the input and then reads out the simulated input. No matter how large the input is, this will only take (U+logXYZT) bits, where U is the number of bits to describe the laws of physics and initial state of the universe, and logXYZT is the number of bits to describe your position in space and time (among other things, it's logarithmic in the current age of the universe).

(Caution: there may be eldritch abominations in your universal prior[0], but as long as you stick to programs that output a fixed input and don't try to fiddle with them to decompress more data than you originally put in, they probably can't hurt you.)

[0]: https://ordinaryideas.wordpress.com/2016/11/30/what-does-the-universal-prior-actually-look-like/
Arkanosis over 3 years ago
Another implementation of paradoxical compression which, applied recursively, also allows for infinite compression (i.e. down to 0 bytes) of any file:

https://cs.fit.edu/~mmahoney/compression/barf.html

Of course, there's a catch.
DemocracyFTW over 3 years ago
> Paradoxical compression is mathematically impossible. This repository contains a description of how to nonetheless achieve it, as well as a demo implementation.

That's *AWESOME*, show me!

> Of course I am slightly cheating here

*mkay...* :-(
willvarfar over 3 years ago
Off the top of my head, how about:

Compression: gzip the file and tack a crypto hash on the end. Discard if not shorter and store the original instead.

Decompression: if it has the gzip header and the crypto hash at the end matches, unzip. Else return the source as-is.

Have I just paradoxically compressed something?
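A Python sketch of that scheme shows where the paradox fails (illustrative code, not from the thread; SHA-256 stands in for "a crypto hash"):

```python
import gzip, hashlib, zlib

HASH_LEN = 32  # SHA-256 digest size

def compress(x: bytes) -> bytes:
    out = gzip.compress(x) + hashlib.sha256(x).digest()
    return out if len(out) < len(x) else x

def decompress(y: bytes) -> bytes:
    if y[:2] == b"\x1f\x8b":  # gzip magic header
        try:
            x = gzip.decompress(y[:-HASH_LEN])
            if hashlib.sha256(x).digest() == y[-HASH_LEN:]:
                return x
        except (OSError, EOFError, zlib.error):
            pass  # not a valid gzip stream after all
    return y

# The catch: any valid "compressed blob" is itself a legal input,
# and round-tripping it silently destroys it.
blob = gzip.compress(b"x" * 1000) + hashlib.sha256(b"x" * 1000).digest()
assert compress(blob) == blob                # can't shrink it further...
assert decompress(compress(blob)) != blob    # ...but decompression "succeeds" wrongly
```

The pigeonhole escape hatch here is that an input which already looks like valid output decompresses to something else, which is exactly the misbehavior the MAC'd counter in the submission is designed to hide.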
slaymaker1907 over 3 years ago
Paradoxical compression really isn't that bad of a problem in practice, since you can just dedicate a single bit at the start of the data to indicate whether a file is compressed or not. This guarantees that a file is no more than 1 byte larger than it would be "uncompressed" (since in this case we just add a byte to indicate we have not used our regular compression algorithm).
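That flag-byte scheme is easy to make concrete (a minimal sketch, with zlib as the stand-in compressor):

```python
import zlib

def compress(x: bytes) -> bytes:
    packed = zlib.compress(x, 9)
    if len(packed) < len(x):
        return b"\x01" + packed  # 0x01 = "payload is compressed"
    return b"\x00" + x           # 0x00 = "payload is stored as-is"

def decompress(y: bytes) -> bytes:
    flag, payload = y[:1], y[1:]
    return zlib.decompress(payload) if flag == b"\x01" else payload
```

This version is honest but not paradoxical: in the worst case the output is exactly one byte longer than the input, which is the expansion the pigeonhole principle demands.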
emilfihlman over 3 years ago
> Of course I am slightly cheating here, but the cheat is conceptually interesting (and fun). The idea is that paradoxical compression might have only a small density of problematic cases (inputs where the algorithm misbehaves, e.g. by expanding the length or not decompressing to the original input) and tools from the field of cryptography can be used to "push away" these cases so that they mathematically exist but nobody can find one with nonnegligible probability. In essence, this is transforming the question from "is it mathematically possible?" to "can I claim to do it and not being detected as a fraud?". This change of model makes paradoxical compression possible.

Isn't this trivially wrong?

If you have a bitstring s and you compress it to c, where the length of s is larger than the length of c, compressing c again produces an issue.

For example, if you compress a bitstring to just 1, how do you compress 1? To 0? And how do you compress 0 then? We can use this principle to find uncompressible bitstrings of any length anywhere for any algorithm.

An idea pops into my head regarding this, though: in signal analysis there's a concept called noise shaping (which matches OP's ideas to a great degree). The idea is to push any noise out of some band and into another; it's super neat, actually.

Could this idea be expanded such that there is some singular monster string that expands to an absurd size, while others stay the same or are smaller? How about multiple, but still small in density?

My intuition sadly says no, because of what I started this comment with. It's trivial to find strings that aren't compressible.

Please let me know if and where my ideas on this go wrong!
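The counting behind that pigeonhole objection can be checked directly (a quick sketch):

```python
# There are 2**n bit strings of length n, but only 2**n - 1 strings of
# length strictly less than n (lengths 0 through n-1, including the
# empty string). So no injective compressor can shrink every length-n
# input: at least one must map to something of equal or greater length.
def strings_shorter_than(n: int) -> int:
    return sum(2**k for k in range(n))  # 2**0 + 2**1 + ... + 2**(n-1)

for n in range(1, 64):
    assert strings_shorter_than(n) == 2**n - 1 < 2**n
```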
unnouinceput over 3 years ago
Quote: "A lossless compression algorithm that can actually reduce the size of at least one input file must also necessarily expand the size of at least one other possible input file. This is a direct application of the pigeonhole principle. A compression algorithm that claims to never increase the size of any input, while at the same time reducing the size of at least some inputs (it's not the trivial do-nothing compression that, by definition, preserves the length of the input), is called paradoxical compression."

This paradoxical compression, as stated above, is incomplete. You can achieve compression of all input files without having to expand any other file, but you would expand time instead. Let me explain: any set of data can be expressed using mathematical functions/algorithms that can be saved as one file, very small compared with the original data, but to do so you'll need to invest a lot of time to find those functions/algorithms. One practical example would be a mission to Alpha Centauri which might want to have Earth's internet on board but no physical space for all that data. Let's say it takes 10 years to prepare the mission; in that case you'd also use those 10 years to compress, for argument's sake, Google's big index (or the NSA Utah center, if you like that one), and you'll have on board, on a single thumb stick, the Internet version from 10 years before launch.

Edit: Since some people have problems understanding this extreme compression, I am trying to clarify further. This method is unique to that set of data. It can be applied to only that data, only one time, and that's it. It is not a general set of algorithms that lets you compress anything else; it will fail if not applied to that data. It is not classic compression; it is more, if you like the term, a data generator. It will run a program that, at the end of its run, will generate that Internet version you started with 10 years ago. But to do that you'll need to research how to do that data generating, hence the extreme time it takes. In practice I'd envision a lot of people getting their slice of data and writing one function each, and the program running those functions one after another, generating the data.