Kolmogorov Complexity (KC)
Kolmogorov Complexity, denoted $K(x)$, quantifies the intrinsic algorithmic complexity of an individual object, typically a string $x$. It is defined as the length (in bits) of the shortest computer program that, when run on a fixed universal Turing machine (UTM) $U$, outputs $x$ and then halts:

$$K_U(x) = \min \{\, |p| : U(p) = x \,\}$$

Here, $U(p) = x$ means the program $p$ executed on UTM $U$ produces $x$, and $|p|$ is the length of program $p$ in bits. $K(x)$ is thus the size of the ultimate compressed version of $x$.
Invariance Theorem
The choice of the universal Turing machine does not fundamentally alter the Kolmogorov Complexity. For any two UTMs $U_1$ and $U_2$, there exists a constant $c$ (which depends on $U_1$ and $U_2$ but not on $x$) such that for all strings $x$:

$$|K_{U_1}(x) - K_{U_2}(x)| \le c$$

This means $K(x)$ is an objective, machine-independent measure of complexity, up to an additive constant. This constant represents the “overhead” of translating programs, i.e., of describing one UTM in terms of the other.
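The proof is a short “compiler” argument. Let $t$ be a program for $U_1$ that simulates $U_2$ (an interpreter). Prefixing $t$ to any shortest $U_2$-program for $x$ yields a $U_1$-program for $x$, so

$$K_{U_1}(x) \le K_{U_2}(x) + |t|.$$

By symmetry, $K_{U_2}(x) \le K_{U_1}(x) + |t'|$ for some interpreter $t'$, and taking $c = \max(|t|, |t'|)$ gives the two-sided bound.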
Uncomputability
Kolmogorov Complexity is not computable. There is no general algorithm that can take an arbitrary string $x$ as input and output $K(x)$. This is a consequence of the Halting Problem. While we can find upper bounds by compressing $x$ with a specific compressor (the length of the compressed string is an upper bound on $K(x)$, up to the constant cost of the decompressor), we can never be certain that no shorter program exists.
Meaning of KC Values
- Low $K(x)$: The string is considered simple or highly compressible. It contains regularities or patterns that can be described by a short program. For example, the string “aaaa…a” (one million ‘a’s) has low $K(x)$ because it can be generated by a program like “print ‘a’ 1,000,000 times”.
- High $K(x)$: The string is considered complex, random-like, or incompressible. Its shortest description is essentially the string itself, meaning $K(x) \ge |x| - c$, where $|x|$ is the length of $x$ and $c$ is a small constant. A truly random binary string of length $n$ would have $K(x) \approx n$.
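While $K(x)$ itself is uncomputable, any off-the-shelf compressor gives a computable upper bound, and the low-vs-high contrast above shows up immediately. A minimal sketch using Python’s standard `zlib` (the compressed length, plus the fixed size of the decompressor, upper-bounds $K(x)$):

```python
import os
import zlib

def kc_upper_bound_bits(data: bytes) -> int:
    """Compressed length in bits: a computable upper bound on K(data)
    (up to the constant cost of shipping the decompressor).
    The true K(data) may be far smaller."""
    return 8 * len(zlib.compress(data, 9))

repetitive = b"a" * 1_000_000        # highly regular: low K(x)
random_ish = os.urandom(1_000_000)   # incompressible with overwhelming probability: high K(x)

print(kc_upper_bound_bits(repetitive))  # a few thousand bits, vs 8,000,000 raw
print(kc_upper_bound_bits(random_ish))  # slightly more than 8,000,000 bits
```

Note the asymmetry this illustrates: the compressor certifies that the repetitive string is simple, but it cannot certify that the random-looking one is truly complex — a cleverer program might still exist.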
Conditional Kolmogorov Complexity
The conditional Kolmogorov Complexity $K(x \mid y)$ measures the complexity of string $x$ given that string $y$ is already known (provided as auxiliary input to the program):

$$K(x \mid y) = \min \{\, |p| : U(p, y) = x \,\}$$

This is the length of the shortest program that computes $x$ when given $y$. If $y$ contains a lot of information about $x$, then $K(x \mid y)$ will be much smaller than $K(x)$. For instance, $K(x \mid x) = O(1)$.
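The compression trick extends to the conditional case. A common practical proxy (the idea behind compression-based similarity measures) approximates the conditional complexity by the extra bits a compressor needs for $x$ once $y$ is already in its window. A sketch, again with `zlib`:

```python
import os
import zlib

def c(data: bytes) -> int:
    """Compressed size in bytes: a computable stand-in for K."""
    return len(zlib.compress(data, 9))

def cond_c(x: bytes, y: bytes) -> int:
    """Approximate K(x | y) by C(y + x) - C(y): the extra bytes the
    compressor needs for x once y is already in its window.
    (Works here because y + x fits inside zlib's 32 KB window.)"""
    return c(y + x) - c(y)

x = os.urandom(10_000)   # incompressible on its own: C(x) is large
print(c(x))              # roughly 10,000 bytes
print(cond_c(x, x))      # tiny: knowing x makes x almost free
```

This mirrors $K(x \mid x) = O(1)$: even a string that is incompressible on its own costs almost nothing to describe once it is already known.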
Significance & Role
KC provides a fundamental, objective measure of information content and randomness. It is a cornerstone in:
- Algorithmic Information Theory: Defining the theoretical limits of data compression.
- Foundations of Probability: Providing a basis for Solomonoff’s theory of universal inductive inference.
- Philosophy of Science: Offering a formalization of Occam’s Razor, where simpler explanations (shorter programs) are preferred.
Occam’s Razor as the inspiration: Both MDL and Solomonoff Induction can be seen as mathematical formalizations of Occam’s Razor. They replace the vague notion of “simplicity” or “fewest assumptions” with the precise measure of “description length” or “program length.”
Solomonoff as the theoretical ideal: Solomonoff Induction provides the most general and theoretically optimal formalization. It essentially says the “simplest” explanation (shortest program) for the data is the most probable.
MDL as a practical application: MDL takes the core idea from Solomonoff/Kolmogorov complexity and makes it applicable to real-world statistical modeling and machine learning problems by using specific (computable) ways to define and measure the description length of models and data.
Kolmogorov Complexity: The precise, theoretical, but uncomputable measure of “algorithmic simplicity” for an individual object.
Occam’s Razor:
Nature: A philosophical principle or heuristic.
Statement (paraphrased): “Among competing hypotheses, the one with the fewest assumptions should be selected.” Or, “Entities should not be multiplied beyond necessity.”
Goal: To guide towards simpler explanations, which are often considered more likely to be true, easier to test, and more generalizable.
Formality: Informal, qualitative. It doesn’t precisely define “simplicity” or “assumption.”

Minimum Description Length (MDL) Principle:
Nature: A formal, information-theoretic, and statistical principle for inductive inference and model selection.
Statement (paraphrased): “The best hypothesis (model) for a given set of data is the one that leads to the shortest overall description of the data and the model itself.”
Goal: To find a model that compresses the data best. This involves a trade-off: a more complex model might fit the data perfectly (short description of data given the model) but the model itself will have a long description. A simpler model might have a longer description of data given the model, but the model itself is short. MDL seeks the minimum of Length(Model) + Length(Data | Model).
Formality: Formal, quantitative. Description length is measured in bits (using concepts related to Kolmogorov Complexity, though often practical approximations are used).
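As one illustration of such a practical approximation, here is a toy two-part-code comparison, Length(Model) + Length(Data | Model), for choosing between two model classes for a binary string. The coding choices (a parameter-free fair-coin model versus a Bernoulli model whose parameter is quantized to 8 bits, with data coded at the Shannon bound) are illustrative assumptions, not the canonical MDL codes:

```python
import math

def two_part_code_lengths(data: str, param_bits: int = 8) -> dict:
    """Idealized two-part code lengths (in bits) for a binary string under:
    M0: fair coin, no parameters   -> L(M0) = 0,          L(data|M0) = n
    M1: Bernoulli(p), p quantized  -> L(M1) = param_bits, L(data|M1) = n * H(p_hat)
    MDL prefers whichever total is smaller."""
    n = len(data)
    p = min(max(data.count("1") / n, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return {"M0 (fair coin)": n, "M1 (Bernoulli)": param_bits + n * entropy}

biased = "1" * 900 + "0" * 100  # M1 wins: 8 model bits buy a large data saving
balanced = "10" * 500           # M0 wins: the extra parameter doesn't pay for itself
print(two_part_code_lengths(biased))
print(two_part_code_lengths(balanced))
```

The biased string rewards paying for a parameter; the balanced one does not — exactly the trade-off described above.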
Computability: While true Kolmogorov complexity is uncomputable, MDL uses computable approximations and specific coding schemes, making it a practical tool in statistics and machine learning.

Solomonoff Induction (Algorithmic Probability):
Nature: A formal, theoretical framework for inductive inference and prediction, often considered the “gold standard” for optimal prediction.
Statement (paraphrased): Given a sequence of observations, the probability of a particular continuation is the sum of probabilities of all programs (for a universal Turing machine) that generate the observed sequence and then that continuation, where the probability of a program is 2^(-length of program). Essentially, shorter programs (simpler explanations) get higher prior probability.
Goal: To provide a universal and optimal method for predicting the next element in a sequence based on prior observations, by weighting all possible explanations (programs) by their simplicity (length).
Formality: Highly formal, based on algorithmic information theory and Turing machines. It provides a Bayesian framework with a universal prior grounded in Kolmogorov complexity.
Computability: Fundamentally uncomputable due to its reliance on Kolmogorov complexity and the halting problem (we can’t know if an arbitrary program will halt, let alone its shortest description). It’s a theoretical ideal.
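Solomonoff Induction cannot be run, but its shape can be caricatured. In the toy sketch below, the only “programs” are repeating patterns (a made-up mini-language, not a universal machine), each weighted by $2^{-\text{length}}$; a prediction is the normalized weight of all programs whose output extends the observed sequence. Restricting to a language where every program trivially halts is precisely the shortcut real Solomonoff Induction cannot take:

```python
from itertools import product

def tile(pattern: str, length: int) -> str:
    """Toy 'program': repeat `pattern` until `length` symbols are emitted."""
    return (pattern * (length // len(pattern) + 1))[:length]

def toy_algorithmic_probability(observed: str, max_len: int = 8) -> dict:
    """Weight each candidate next symbol by the sum of 2**(-|p|) over toy
    programs (patterns of length <= max_len) whose tiled output matches
    observed + symbol. A vastly simplified stand-in for the universal prior."""
    scores = {"0": 0.0, "1": 0.0}
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            out = tile(p, len(observed) + 1)
            if out[:-1] == observed:
                scores[out[-1]] += 2.0 ** -n
    total = sum(scores.values())
    return {s: w / total for s, w in scores.items()}

print(toy_algorithmic_probability("010101"))  # '0' gets most of the mass
```

The short pattern “01” (weight $2^{-2}$) dominates the many long programs consistent with either continuation, so the prediction concentrates on “0” — shorter explanations get higher prior probability, as in the statement above.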