Max Wolf's Second Brain


    occam’s razor

    May 27, 2025 · 3 min read

    occam’s razor vs solomonoff induction vs minimum description length principle vs kolmogorov complexity

    Occam’s Razor as the inspiration: Both MDL and Solomonoff Induction can be seen as mathematical formalizations of Occam’s Razor. They replace the vague notion of “simplicity” or “fewest assumptions” with the precise measure of “description length” or “program length.”
    Solomonoff as the theoretical ideal: Solomonoff Induction provides the most general and theoretically optimal formalization. It essentially says the “simplest” explanation (shortest program) for the data is the most probable.
    MDL as a practical application: MDL takes the core idea from Solomonoff/Kolmogorov complexity and makes it applicable to real-world statistical modeling and machine learning problems by using specific (computable) ways to define and measure the description length of models and data.
    Kolmogorov Complexity: The precise, theoretical, but uncomputable measure of “algorithmic simplicity” for an individual object.
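    Since true Kolmogorov complexity is uncomputable, a crude but common proxy for description length is the output size of a general-purpose compressor. A minimal sketch of that intuition, assuming zlib as the (very rough) stand-in compressor; only the comparison between the two strings is meant to carry the point:

```python
import random
import zlib

def description_length_proxy(s: str) -> int:
    """Rough upper bound on description length, in bytes, via zlib compression."""
    return len(zlib.compress(s.encode("utf-8")))

# A highly regular string admits a much shorter description than a
# pseudo-random one over the same alphabet and length.
regular = "ab" * 500
random.seed(0)
noisy = "".join(random.choice("ab") for _ in range(1000))

print(description_length_proxy(regular))  # small: the repeating pattern compresses well
print(description_length_proxy(noisy))    # noticeably larger: no short pattern to exploit
```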


    Occam’s Razor:

    Nature: A philosophical principle or heuristic.
    Statement (paraphrased): “Among competing hypotheses, the one with the fewest assumptions should be selected.” Or, “Entities should not be multiplied beyond necessity.”
    Goal: To guide towards simpler explanations, which are often considered more likely to be true, easier to test, and more generalizable.
    Formality: Informal, qualitative. It doesn’t precisely define “simplicity” or “assumption.”

    Minimum Description Length (MDL) Principle:

    Nature: A formal, information-theoretic, and statistical principle for inductive inference and model selection.
    Statement (paraphrased): “The best hypothesis (model) for a given set of data is the one that leads to the shortest overall description of the data and the model itself.”
    Goal: To find the model that compresses the data best. This involves a trade-off: a more complex model might fit the data perfectly (short description of the data given the model), but the model itself will have a long description; a simpler model might describe the data less compactly, but the model itself is short. MDL seeks the minimum of Length(Model) + Length(Data | Model); a toy numerical sketch of this trade-off is given below.
    Formality: Formal, quantitative. Description length is measured in bits (using concepts related to Kolmogorov Complexity, though often practical approximations are used).
    Computability: While true Kolmogorov complexity is uncomputable, MDL uses computable approximations and specific coding schemes, making it a practical tool in statistics and machine learning.
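    To make the Length(Model) + Length(Data | Model) trade-off concrete, here is a toy sketch of two-part MDL for choosing a polynomial degree. The coding choices (a fixed 32 bits per coefficient, a Gaussian code for the residuals) are illustrative assumptions, not a canonical MDL code:

```python
import numpy as np

def two_part_mdl_degree(x, y, max_degree=8, bits_per_param=32.0):
    """Select the polynomial degree minimizing Length(Model) + Length(Data | Model).

    Length(Model): a fixed bit cost per coefficient (a crude, illustrative choice).
    Length(Data | Model): the Gaussian code length of the residuals, in bits.
    Only the comparison between degrees matters, not the absolute bit counts.
    """
    n = len(x)
    best_degree, best_total = None, np.inf
    for degree in range(max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        sigma2 = max(np.mean(residuals ** 2), 1e-12)                # MLE noise variance
        model_bits = (degree + 1) * bits_per_param                  # cost of stating the model
        data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)    # cost of data given the model
        if model_bits + data_bits < best_total:
            best_degree, best_total = degree, model_bits + data_bits
    return best_degree

# Quadratic data with a little noise: extra polynomial terms shrink the
# residuals slightly, but not enough to pay for their own description.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.1, size=x.shape)
print(two_part_mdl_degree(x, y))  # typically prints 2
```

Practical MDL variants differ mainly in how they replace the fixed per-parameter cost with an actual code for the model.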

    Solomonoff Induction (Algorithmic Probability):

    Nature: A formal, theoretical framework for inductive inference and prediction, often considered the “gold standard” for optimal prediction.
    Statement (paraphrased): Given a sequence of observations, the probability of a particular continuation is the sum of probabilities of all programs (for a universal Turing machine) that generate the observed sequence and then that continuation, where the probability of a program is 2^(-length of program); see the formula below. Essentially, shorter programs (simpler explanations) get higher prior probability.
    Goal: To provide a universal and optimal method for predicting the next element in a sequence based on prior observations, by weighting all possible explanations (programs) by their simplicity (length).
    Formality: Highly formal, based on algorithmic information theory and Turing machines. It provides a Bayesian framework with a universal prior grounded in Kolmogorov complexity.
    Computability: Fundamentally uncomputable due to its reliance on Kolmogorov complexity and the halting problem (we can’t know if an arbitrary program will halt, let alone its shortest description). It’s a theoretical ideal.
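    Written out (one standard formulation, assuming a universal prefix Turing machine U), the prior over strings and the resulting prediction rule are:

```latex
% Algorithmic probability of a finite string x: sum over all programs p
% whose output begins with x, each weighted by 2^{-|p|} (shorter programs dominate).
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\lvert p \rvert}

% Prediction: the probability that the observed sequence x continues with y.
P(y \mid x) = \frac{M(xy)}{M(x)}
```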

