Kolmogorov Complexity (KC)
Kolmogorov Complexity, denoted $K(x)$, quantifies the intrinsic algorithmic complexity of an individual object, typically a string $x$. It is defined as the length (in bits) of the shortest computer program that, when run on a fixed universal Turing machine (UTM) $U$, outputs $x$ and then halts:

$$K_U(x) = \min \{\, |p| : U(p) = x \,\}$$

Here, $U(p) = x$ means the program $p$ executed on UTM $U$ produces $x$, and $|p|$ is the length of program $p$ in bits. $K(x)$ is thus the size of the ultimate compressed version of $x$.
Invariance Theorem
The choice of the universal Turing machine does not fundamentally alter the Kolmogorov Complexity. For any two UTMs $U_1$ and $U_2$, there exists a constant $c$ (which depends on $U_1$ and $U_2$ but not on $x$) such that for all strings $x$:

$$|K_{U_1}(x) - K_{U_2}(x)| \le c$$

This means $K(x)$ is an objective, machine-independent measure of complexity, up to an additive constant. This constant represents the “overhead” of translating programs, i.e., of describing one UTM in terms of the other.
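The proof is a short “compiler” argument. Let $t$ be a program for $U_1$ that simulates $U_2$ (an interpreter). Prefixing $t$ to any shortest $U_2$-program for $x$ yields a $U_1$-program for $x$, so

$$K_{U_1}(x) \le K_{U_2}(x) + |t|.$$

By symmetry, $K_{U_2}(x) \le K_{U_1}(x) + |t'|$ for some interpreter $t'$, and taking $c = \max(|t|, |t'|)$ gives the two-sided bound.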
Uncomputability
Kolmogorov Complexity is not computable. There is no general algorithm that can take an arbitrary string $x$ as input and output $K(x)$. This is a consequence of the Halting Problem. While we can find upper bounds by compressing $x$ with a specific compressor (the length of the compressed string is an upper bound on $K(x)$, up to the constant cost of the decompressor), we can never be certain that no shorter program exists.
Meaning of KC Values
- Low $K(x)$: The string is considered simple or highly compressible. It contains regularities or patterns that can be described by a short program. For example, the string “aaaa…a” (one million ‘a’s) has low $K(x)$ because it can be generated by a program like “print ‘a’ 1,000,000 times”.
- High $K(x)$: The string is considered complex, random-like, or incompressible. Its shortest description is essentially the string itself, meaning $K(x) \ge |x| - c$, where $|x|$ is the length of $x$ and $c$ is a small constant. A truly random binary string of length $n$ would have $K(x) \approx n$.
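While $K(x)$ itself is uncomputable, any off-the-shelf compressor gives a computable upper bound, and the low-vs-high contrast above shows up immediately. A minimal sketch using Python’s standard `zlib` (the compressed length, plus the fixed size of the decompressor, upper-bounds $K(x)$):

```python
import os
import zlib

def kc_upper_bound_bits(data: bytes) -> int:
    """Compressed length in bits: a computable upper bound on K(data)
    (up to the constant cost of shipping the decompressor).
    The true K(data) may be far smaller."""
    return 8 * len(zlib.compress(data, 9))

repetitive = b"a" * 1_000_000        # highly regular: low K(x)
random_ish = os.urandom(1_000_000)   # incompressible with overwhelming probability: high K(x)

print(kc_upper_bound_bits(repetitive))  # a few thousand bits, vs 8,000,000 raw
print(kc_upper_bound_bits(random_ish))  # slightly more than 8,000,000 bits
```

Note the asymmetry this illustrates: the compressor certifies that the repetitive string is simple, but it cannot certify that the random-looking one is truly complex — a cleverer program might still exist.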
Conditional Kolmogorov Complexity
The conditional Kolmogorov Complexity $K(x \mid y)$ measures the complexity of string $x$ given that string $y$ is already known (provided as auxiliary input to the program):

$$K(x \mid y) = \min \{\, |p| : U(p, y) = x \,\}$$

This is the length of the shortest program that computes $x$ when given $y$. If $y$ contains a lot of information about $x$, then $K(x \mid y)$ will be much smaller than $K(x)$. For instance, $K(x \mid x) = O(1)$.
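The compression trick extends to the conditional case. A common practical proxy (the idea behind compression-based similarity measures) approximates the conditional complexity by the extra bits a compressor needs for $x$ once $y$ is already in its window. A sketch, again with `zlib`:

```python
import os
import zlib

def c(data: bytes) -> int:
    """Compressed size in bytes: a computable stand-in for K."""
    return len(zlib.compress(data, 9))

def cond_c(x: bytes, y: bytes) -> int:
    """Approximate K(x | y) by C(y + x) - C(y): the extra bytes the
    compressor needs for x once y is already in its window.
    (Works here because y + x fits inside zlib's 32 KB window.)"""
    return c(y + x) - c(y)

x = os.urandom(10_000)   # incompressible on its own: C(x) is large
print(c(x))              # roughly 10,000 bytes
print(cond_c(x, x))      # tiny: knowing x makes x almost free
```

This mirrors $K(x \mid x) = O(1)$: even a string that is incompressible on its own costs almost nothing to describe once it is already known.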
Significance & Role
KC provides a fundamental, objective measure of information content and randomness. It is a cornerstone in:
- Algorithmic Information Theory: Defining the theoretical limits of data compression.
- Foundations of Probability: Providing a basis for Solomonoff’s theory of universal inductive inference.
- Philosophy of Science: Offering a formalization of Occam’s Razor, where simpler explanations (shorter programs) are preferred.
Occam’s Razor as the inspiration: Both MDL and Solomonoff Induction can be seen as mathematical formalizations of Occam’s Razor. They replace the vague notion of “simplicity” or “fewest assumptions” with the precise measure of “description length” or “program length.”
Solomonoff as the theoretical ideal: Solomonoff Induction provides the most general and theoretically optimal formalization. It essentially says the “simplest” explanation (shortest program) for the data is the most probable.
MDL as a practical application: MDL takes the core idea from Solomonoff/Kolmogorov complexity and makes it applicable to real-world statistical modeling and machine learning problems by using specific (computable) ways to define and measure the description length of models and data.
Kolmogorov Complexity: The precise, theoretical, but uncomputable measure of “algorithmic simplicity” for an individual object.
Occam’s Razor:
Nature: A philosophical principle or heuristic.
Statement (paraphrased): “Among competing hypotheses, the one with the fewest assumptions should be selected.” Or, “Entities should not be multiplied beyond necessity.”
Goal: To guide towards simpler explanations, which are often considered more likely to be true, easier to test, and more generalizable.
Formality: Informal, qualitative. It doesn’t precisely define “simplicity” or “assumption.”

Minimum Description Length (MDL) Principle:
Nature: A formal, information-theoretic, and statistical principle for inductive inference and model selection.
Statement (paraphrased): “The best hypothesis (model) for a given set of data is the one that leads to the shortest overall description of the data and the model itself.”
Goal: To find a model that compresses the data best. This involves a trade-off: a more complex model might fit the data perfectly (short description of data given the model) but the model itself will have a long description. A simpler model might have a longer description of data given the model, but the model itself is short. MDL seeks the minimum of Length(Model) + Length(Data | Model).
Formality: Formal, quantitative. Description length is measured in bits (using concepts related to Kolmogorov Complexity, though often practical approximations are used).
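As one illustration of such a practical approximation, here is a toy two-part-code comparison, Length(Model) + Length(Data | Model), for choosing between two model classes for a binary string. The coding choices (a parameter-free fair-coin model versus a Bernoulli model whose parameter is quantized to 8 bits, with data coded at the Shannon bound) are illustrative assumptions, not the canonical MDL codes:

```python
import math

def two_part_code_lengths(data: str, param_bits: int = 8) -> dict:
    """Idealized two-part code lengths (in bits) for a binary string under:
    M0: fair coin, no parameters   -> L(M0) = 0,          L(data|M0) = n
    M1: Bernoulli(p), p quantized  -> L(M1) = param_bits, L(data|M1) = n * H(p_hat)
    MDL prefers whichever total is smaller."""
    n = len(data)
    p = min(max(data.count("1") / n, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return {"M0 (fair coin)": n, "M1 (Bernoulli)": param_bits + n * entropy}

biased = "1" * 900 + "0" * 100  # M1 wins: 8 model bits buy a large data saving
balanced = "10" * 500           # M0 wins: the extra parameter doesn't pay for itself
print(two_part_code_lengths(biased))
print(two_part_code_lengths(balanced))
```

The biased string rewards paying for a parameter; the balanced one does not — exactly the trade-off described above.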
Computability: While true Kolmogorov complexity is uncomputable, MDL uses computable approximations and specific coding schemes, making it a practical tool in statistics and machine learning.

Solomonoff Induction (Algorithmic Probability):
Nature: A formal, theoretical framework for inductive inference and prediction, often considered the “gold standard” for optimal prediction.
Statement (paraphrased): Given a sequence of observations, the probability of a particular continuation is the sum of probabilities of all programs (for a universal Turing machine) that generate the observed sequence and then that continuation, where the probability of a program is 2^(-length of program). Essentially, shorter programs (simpler explanations) get higher prior probability.
Goal: To provide a universal and optimal method for predicting the next element in a sequence based on prior observations, by weighting all possible explanations (programs) by their simplicity (length).
Formality: Highly formal, based on algorithmic information theory and Turing machines. It provides a Bayesian framework with a universal prior grounded in Kolmogorov complexity.
Computability: Fundamentally uncomputable due to its reliance on Kolmogorov complexity and the halting problem (we can’t know if an arbitrary program will halt, let alone its shortest description). It’s a theoretical ideal.
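Solomonoff Induction cannot be run, but its shape can be caricatured. In the toy sketch below, the only “programs” are repeating patterns (a made-up mini-language, not a universal machine), each weighted by $2^{-\text{length}}$; a prediction is the normalized weight of all programs whose output extends the observed sequence. Restricting to a language where every program trivially halts is precisely the shortcut real Solomonoff Induction cannot take:

```python
from itertools import product

def tile(pattern: str, length: int) -> str:
    """Toy 'program': repeat `pattern` until `length` symbols are emitted."""
    return (pattern * (length // len(pattern) + 1))[:length]

def toy_algorithmic_probability(observed: str, max_len: int = 8) -> dict:
    """Weight each candidate next symbol by the sum of 2**(-|p|) over toy
    programs (patterns of length <= max_len) whose tiled output matches
    observed + symbol. A vastly simplified stand-in for the universal prior."""
    scores = {"0": 0.0, "1": 0.0}
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            out = tile(p, len(observed) + 1)
            if out[:-1] == observed:
                scores[out[-1]] += 2.0 ** -n
    total = sum(scores.values())
    return {s: w / total for s, w in scores.items()}

print(toy_algorithmic_probability("010101"))  # '0' gets most of the mass
```

The short pattern “01” (weight $2^{-2}$) dominates the many long programs consistent with either continuation, so the prediction concentrates on “0” — shorter explanations get higher prior probability, as in the statement above.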