Kolmogorov Complexity (KC)

Kolmogorov Complexity, denoted K(x), quantifies the intrinsic algorithmic complexity of an individual object, typically a string x. It is defined as the length (in bits) of the shortest computer program p that, when run on a fixed universal Turing machine (UTM) U, outputs x and then halts:

K_U(x) = min { |p| : U(p) = x }

Here, U(p) = x means that program p, executed on UTM U, produces x, and |p| is the length of p in bits. K(x) is thus the size of the ultimate compressed version of x.

Invariance Theorem

The choice of the universal Turing machine does not fundamentally alter the Kolmogorov Complexity. For any two UTMs U₁ and U₂, there exists a constant c (which depends on U₁ and U₂ but not on x) such that for all strings x:

|K_U₁(x) − K_U₂(x)| ≤ c

This means K(x) is an objective, machine-independent measure of complexity, up to an additive constant. This constant represents the "overhead" of a fixed translation program that simulates one UTM in terms of the other.

Uncomputability

Kolmogorov Complexity is not computable. There is no general algorithm that can take an arbitrary string x as input and output K(x). This is a consequence of the Halting Problem. While we can find upper bounds by compressing x with a specific compressor (the length of the compressed string, plus the constant size of the decompressor, is an upper bound on K(x)), we can never be certain that no shorter program exists.
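Such an upper bound is easy to compute in practice. A minimal sketch using Python's standard `zlib` compressor as a stand-in (the bound holds only up to the additive constant for a fixed decompression program):

```python
import zlib

def kc_upper_bound_bits(s: bytes) -> int:
    # Compressed length in bits; K(s) is at most this value plus a constant
    # (the size of a fixed program that inflates the data and prints it).
    return 8 * len(zlib.compress(s, 9))

million_as = b"a" * 1_000_000
# Writing the string literally costs 8,000,000 bits; the bound is far smaller.
print(kc_upper_bound_bits(million_as))
```

Note that this only bounds K from above: a cleverer compressor, or a shorter program found by other means, could always do better, and no procedure can certify that the bound is tight.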

Meaning of KC Values

  • Low K(x): The string is considered simple or highly compressible. It contains regularities or patterns that can be described by a short program. For example, the string “aaaa…a” (one million ‘a’s) has low K(x) because it can be generated by a program like “print ‘a’ 1,000,000 times”.
  • High K(x): The string is considered complex, random-like, or incompressible. Its shortest description is essentially the string itself, meaning K(x) ≥ |x| − c, where |x| is the length of x and c is a small constant. A truly random binary string of length n would have K(x) ≈ n.
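The contrast between the two regimes can be observed empirically. Compressed length only upper-bounds K, but the gap between patterned and random data is stark; a sketch using `zlib` and `os.urandom` as stand-ins:

```python
import os
import zlib

n = 100_000
regular = b"ab" * (n // 2)   # strongly patterned: low K
random_ = os.urandom(n)      # incompressible with overwhelming probability: high K

print(len(zlib.compress(regular, 9)))  # a few hundred bytes
print(len(zlib.compress(random_, 9)))  # about n bytes, plus container overhead
```

A compressor can certify only the first case: finding no compression for `random_` is evidence of high complexity, never proof.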

Conditional Kolmogorov Complexity

The conditional Kolmogorov Complexity K(x|y) measures the complexity of string x given that string y is already known (provided as auxiliary input to the program):

K(x|y) = min { |p| : U(p, y) = x }

This is the length of the shortest program that computes x when given y. If y contains a lot of information about x, then K(x|y) will be much smaller than K(x). For instance, K(x|x) ≤ c for a small constant c, since a fixed program can simply copy its auxiliary input.
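There is a rough empirical analogue: once a compressor has absorbed y, the extra cost of x approximates how much new information x carries. A sketch using `zlib`, with len(C(y + x)) − len(C(y)) as a crude proxy for K(x|y) (this is the heuristic behind compression-based similarity measures, not the true conditional complexity; the helper name is illustrative):

```python
import os
import zlib

def cond_cost(x: bytes, y: bytes) -> int:
    # Crude proxy for K(x|y): extra compressed bytes needed for x
    # when it is compressed in the same stream right after y.
    c = lambda b: len(zlib.compress(b, 9))
    return c(y + x) - c(y)

y = b"the quick brown fox jumps over the lazy dog " * 200
x = y[:2000]            # x occurs inside y: nearly free given y
z = os.urandom(2000)    # unrelated random data: close to full cost

print(cond_cost(x, y))  # small
print(cond_cost(z, y))  # roughly 2000
```

The proxy mirrors the theory: redundant-given-y strings cost almost nothing, while strings independent of y cost nearly their full compressed length.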

Significance & Role

KC provides a fundamental, objective measure of information content and randomness. It is a cornerstone in:

  • Algorithmic Information Theory: Defining the theoretical limits of data compression.
  • Foundations of Probability: Providing a basis for Solomonoff’s theory of universal inductive inference.
  • Philosophy of Science: Offering a formalization of Occam’s Razor, where simpler explanations (shorter programs) are preferred.
