Kolmogorov Complexity and Compression Distance (2023)

149 points by rgbimbochamp about 1 year ago

17 comments

tromp about 1 year ago

> let's assume that there exists a universal language U

Why not specify it?

> That gives us the true language-agnostic definition of Kolmogorov Complexity as follows:

Choosing the language of Turing Machines does not make the definition language agnostic.

Aiming for the simplest definition of description complexity, I instead based my definitions on the older computational model of lambda calculus in [1].

Unlike the assumed UTM above, the universal lambda machine is easy to describe in detail:

    (λ 1 1) (λ λ λ 1 (λ λ λ λ 3 (λ 5 (3 (λ 2 (3 (λ λ 3 (λ 1 2 3))) (4 (λ 4 (λ 3 1 (2 1)))))) (1 (2 (λ 1 2)) (λ 4 (λ 4 (λ 2 (1 4))) 5)))) (3 3) 2) (λ 1 ((λ 1 1) (λ 1 1)))

Furthermore, it allows almost identical definitions of several variants of descriptional complexity, namely

1) plain complexity

2) prefix complexity

3) monotone complexity

all of which have their applications in Algorithmic Information Theory [2].

[1] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d861

[2] https://homepages.cwi.nl/~paulv/kolmogorov.html
arketyp about 1 year ago

Richard von Mises (brother of the economist) formulated a definition of randomness: a sequence of data is random if no gambler can, by any betting strategy, make money betting on its outcomes. This predates the theory of computation and was later developed by Kolmogorov and others into algorithmic complexity. The modern variant would be (per Wikipedia) "considering a finite sequence random (with respect to a class of computing systems) if any program that can generate the sequence is at least as long as the sequence itself".
rhelz about 1 year ago

I think it's bootless to try to define the "minimum possible" Kolmogorov complexity. Here's why:

1. Note that Kolmogorov complexity is defined by the length of the shortest *program* which prints out the string. What counts is the *number* of instructions, not the *complexity of those instructions*.

2. So say S is a very complex string. We can always construct a Turing machine which prints out S using a zero-length program: it could just start in a state which prints out S when you turn it on, and then halts.

3. So there is no such thing as a Turing machine which prints out *every* string with a shorter program than *any* other Turing machine does, QED.

That's the bad news. The good news is that we don't even need that. For any string S and any two universal Turing machines M and N, there is always some constant C, independent of S, for which KN(S) <= KM(S) + C: C is the length of the program required to *emulate* machine M on machine N.

We are used to abstracting out constant sums and constant factors like this. The strings we are dealing with (as a species) are growing in length exponentially--that's why we went from 8-bit, to 16-bit, etc. computers. So as the length of S goes to infinity, the difference between its complexity on any two machines becomes negligible.
anonzzzies about 1 year ago

Kolmogorov complexity is a lovely subject and one of the more influential ones in my life.

THE book https://link.springer.com/book/10.1007/978-0-387-49820-1 is absolutely a thing to read. It was for me 30 years ago, and it has aged well.
wood_spirit about 1 year ago

An excellent rabbit hole to dive into is the equivalence of compression and general AI. Every programmer should make a compressor (and, separately, a ray tracer)!

See http://prize.hutter1.net/
JDEW about 1 year ago

> It has been demonstrated that KC(x) can be reasonably estimated by the number of bits required to encode x using a compressor C (such as gzip)

Talk about a cliffhanger :)

Using [0] you get 32B for Alice and 40B for Bob.

[0] It has been demonstrated that KC(x) can be reasonably estimated by the number of bits required to encode x using a compressor C (such as gzip)
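The compressor estimate quoted above is easy to reproduce. A minimal sketch using Python's zlib (the sequences here are invented for illustration; exact byte counts depend on the compressor and its settings, and on inputs as short as 20 tosses the fixed header overhead dominates, which is why longer strings show the effect more clearly):

```python
import random
import zlib

def compressed_len(data: bytes, level: int = 9) -> int:
    # Upper-bound estimate of KC(x): length of the zlib-compressed form.
    return len(zlib.compress(data, level))

random.seed(0)
patterned = b"HT" * 500                                       # highly regular toss record
irregular = bytes(random.choice(b"HT") for _ in range(1000))  # irregular toss record

# The regular sequence admits a much shorter description than the irregular one.
print(compressed_len(patterned), compressed_len(irregular))
```

Any off-the-shelf compressor gives only an upper bound on KC, never the true value, but the gap between the two sequences is already visible.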
causal about 1 year ago

Confused about how the interesting-number paradox proves KC cannot be computed.
davesque about 1 year ago

Something I've always noticed about the notion of Kolmogorov complexity is that the question of determining the lowest level of computation is problematic.

For example, in the article, the author first defines the basic idea of KC. They then correctly point out that the basic idea depends very much on the exact language that is chosen, and describe how theorists have defined the notion of universal computation. But even this adjustment doesn't escape the fact that we still depend on a system of mathematical symbols to describe the theory. And the notion of a Turing machine itself depends on other abstract concepts such as time and space, each with its own inherent conceptual complexity. What sorts of minds (i.e. brains) are required to make sense of the theory, and what physical system is required for them to operate correctly? If the definition of KC includes a notion of how complex the Turing machine is that is required to compute a string, then the further down you go, the smaller the difference in complexity should be between any one string and another. After all, they all exist in the same universe!

I guess it just goes to show how much the idea of KC lives in the realm of theory. As soon as you pose the question of complexity so abstractly, you invite in all kinds of theoretical considerations that make the meaning more slippery. That's why KC really doesn't deserve to be compared to Shannon entropy as it often is.

But let me draw a comparison anyway, like I said you shouldn't! Alice from the article could also have made a strong argument against Bob by just pointing out that the Shannon entropy of his string was lower, which is very relevant in terms of the number of heads or tails and the likelihood of seeing a particular count of them.
copx about 1 year ago

> Bob claims that since the probability of getting both his and Alice's sequence is the same (2^-20), it proves that there was no foul-play involved.

...and Bob is 100% right.

> Bob credits his excellent luck. Alice is smart and cannot be easily convinced. She gets back at Bob by claiming that probability cannot be used in this context as it reveals no information regarding the randomness of the obtained sequences. One can take a quick glance at the obtained sequences and easily point out that Alice's sequence is more random than Bob's sequence.

No, it is not. Given a perfectly random coin toss, Bob's sequence is exactly as likely as Alice's sequence, and in no way "less random", because both sequences result from the same randomness with equal probability.

A nice example of human intuition being at odds with probability math, though. Bob's result seems less likely, but it really is not. Which reminds me that I actually had to write my own computer simulation of the Monty Hall problem before I was willing to believe the correct answer. I think (most?) human brains have a bug in the "understanding probability" subroutine.
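The Monty Hall simulation mentioned above takes only a few lines; a minimal sketch, assuming the standard rules (the host always opens one non-winning, non-chosen door, and the trial count is arbitrary):

```python
import random

def monty_hall(trials: int = 100_000, switch: bool = True) -> float:
    """Return the fraction of wins when the player always switches (or always stays)."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # The host opens a door that is neither the player's pick nor the car.
        opened = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            # Switch to the one remaining closed door.
            pick = next(d for d in doors if d != pick and d != opened)
        wins += pick == car
    return wins / trials

# Switching wins about 2/3 of the time; staying wins about 1/3.
print(monty_hall(switch=True), monty_hall(switch=False))
```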
mojomark about 1 year ago

I'm going to keep reading (because I love the KC topic), but I'd appreciate anyone confirming whether the following are errors in this article:

1.) Conflating the terms "random" and "complexity". After all, a set of "randomly" drawn sample permutations from an alphabet are all equally likely; however, their "complexity" may differ, which is basically the point of the article. Yet the term "random" keeps being used to refer to permutations with more or less "complexity", which I think is going to perpetuate confusion on this topic.

2.) From the article: "Moreover, a string cannot be compressed if its KC(x) >= |x|". Shouldn't the expression accompanying this statement be KC(x) = |x|?
pizza about 1 year ago

I think maybe another way to put this is that Alice's number is in a typical set [0] of the distribution of bitstrings, whereas Bob's might not be. Depending on the tolerance, the typical set can have near-total coverage of the distribution. Another way of making this about compression: a random code that encodes typical-set strings well will probably suffer some overhead when encoding Bob's string, but most strings it will encode close to optimally.

[0] https://en.wikipedia.org/wiki/Typical_set
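A rough quantitative check of that typical-set point (the ±3 window here is an arbitrary tolerance, not the information-theoretic definition): for 20 fair tosses, the vast majority of sequences have a head count near 10, even though each individual sequence is equally likely.

```python
from math import comb

n = 20
total = 2 ** n  # all equally likely length-20 toss sequences

# Fraction of sequences whose head count lies within 3 of n/2.
near_half = sum(comb(n, k) for k in range(7, 14)) / total
print(f"{near_half:.3f}")  # about 0.885
```

So a code tuned to that bulk covers almost the whole distribution, while an extreme sequence like all-heads falls outside it.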
alfanick about 1 year ago

A side question: is this taught in the CS curricula you know? It was at my uni (a fairly good one, in a minor European country), and this experience biases me: I assume every CS graduate knows Kolmogorov complexity.
yamrzou about 1 year ago

Well, I reached the end of the article (interesting, btw), and I'm still not convinced why Bob can't claim that there was no foul-play involved and that he got his result due to excellent luck.
avmich about 1 year ago

> Another thing to note is that Kolmogorov complexity of a string cannot be computed. There cannot exist a computer that will always guarantee the Kolmogorov complexity for all the strings.

Sounds a bit puzzling. Surely for a particular programming language we can enumerate all programs, ordered by length etc., and check which is the shortest one producing the given string. So what's uncomputable here? For long strings that could take a long time, but -- ?
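The catch in that enumeration is the halting problem: over a universal language, some candidate programs never halt, and no computable bound tells the search when it may safely give up on them, so the search can only ever produce upper bounds. A toy sketch that does terminate, precisely because its invented "language" (repeat a seed string n times) is deliberately not Turing-complete:

```python
import itertools

def toy_kc(target: str, alphabet: str = "HT") -> int:
    """Length of the shortest 'program' (a seed of length k, costing k + 1)
    whose repetition reproduces target.  Exhaustive shortest-first search
    works here only because every program in this toy language halts; over a
    universal language the same loop would get stuck on non-halting candidates."""
    for k in range(1, len(target) + 1):
        for seed in itertools.product(alphabet, repeat=k):
            s = "".join(seed)
            reps, rem = divmod(len(target), len(s))
            if rem == 0 and s * reps == target:
                return len(s) + 1
    return len(target) + 1  # unreachable: the full string is always a valid seed

# A periodic string gets a short description; an aperiodic one costs its own length.
print(toy_kc("HT" * 10), toy_kc("HHTHTTHH"))
```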
robrenaud about 1 year ago

If you want something like Kolmogorov complexity for molecules, check out assembly theory. I am a CS person, but there are interesting, related ideas here.

https://en.m.wikipedia.org/wiki/Assembly_theory
Ono-Sendai about 1 year ago

Kolmogorov Complexity does not help with giving a universal measure of complexity or randomness: https://forwardscattering.org/page/0
marius_k about 1 year ago

Sometimes I wonder what the smallest program to generate human DNA would be. How many operations would it take, and how would it compare to the real-world iteration count of evolution as a whole?