A prior distribution represents our beliefs about parameters before observing any data.
In Bayes' theorem, the prior combines with the likelihood to produce the posterior:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \propto p(x \mid \theta)\, p(\theta)$$
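As a minimal sketch of that relationship, the grid approximation below multiplies a prior by a likelihood and normalizes the product. The Beta(2, 2) prior and the hypothetical 7 heads in 10 flips are assumptions chosen purely for illustration:

```python
import numpy as np
from scipy import stats

# Grid approximation of the posterior for a coin's heads-probability theta.
theta = np.linspace(0.001, 0.999, 999)          # grid over the parameter space
prior = stats.beta.pdf(theta, 2, 2)             # assumed mildly informative prior
likelihood = stats.binom.pmf(7, 10, theta)      # likelihood of 7 heads in 10 flips
unnormalized = prior * likelihood               # numerator of Bayes' theorem

# Normalize so the posterior integrates to 1 over the grid.
posterior = unnormalized / (unnormalized.sum() * (theta[1] - theta[0]))

print(theta[np.argmax(posterior)])  # posterior mode, around 0.67 for these choices
```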
The prior encodes what we know (or assume) about parameters before the experiment. Strong priors reflect confident domain knowledge, such as a normal distribution centered around 170 cm for human heights. Weak priors express uncertainty while still providing regularization, preventing extreme parameter values when data is limited.
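To make the strong/weak distinction concrete, here is a small sketch; the 5 cm and 50 cm standard deviations are assumed values for illustration, not prescriptions:

```python
from scipy import stats

# Strong prior: confident domain knowledge, tightly centered on 170 cm.
strong = stats.norm(loc=170, scale=5)
# Weak prior: same center, but spread widely enough to barely constrain the estimate.
weak = stats.norm(loc=170, scale=50)

# Probability each prior assigns to mean heights between 160 and 180 cm.
print(strong.cdf(180) - strong.cdf(160))  # ~0.95: nearly all mass near 170 cm
print(weak.cdf(180) - weak.cdf(160))      # ~0.16: mass spread over a wide range
```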
Uninformative priors are impossible
You must choose some parametrization to write down a prior. But “uniform” in one parametrization means non-uniform in others. There’s no canonical parametrization that represents true ignorance - the choice itself encodes information about what scale you consider “natural” for the problem.
How parametrization introduces bias
Consider a coin with unknown bias $\theta$ (probability of heads). A “uniform” prior seems uninformative: $p(\theta) = 1$ for $\theta \in [0, 1]$ (constant density, integrates to 1).
But what if we reparametrize by odds instead? Let $\phi = \frac{\theta}{1 - \theta}$, so $\theta = \frac{\phi}{1 + \phi}$.
When we change variables in a probability density, we must account for how the transformation stretches/compresses space:

$$p_\phi(\phi) = p_\theta\big(\theta(\phi)\big) \left| \frac{d\theta}{d\phi} \right|$$
Computing the derivative:

$$\frac{d\theta}{d\phi} = \frac{d}{d\phi} \frac{\phi}{1 + \phi} = \frac{1}{(1 + \phi)^2}$$
So our “uniform” prior becomes:

$$p_\phi(\phi) = 1 \cdot \frac{1}{(1 + \phi)^2} = \frac{1}{(1 + \phi)^2} \quad \text{for } \phi \in [0, \infty)$$
This heavily favors small odds! Half the probability mass is on odds less than 1 (probabilities less than 0.5).
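A quick Monte Carlo check confirms this (the sample size and test points below are arbitrary choices): drawing $\theta$ uniformly and transforming to odds reproduces both the half-mass-below-1 claim and the $1/(1+\phi)^2$ density.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 1.0, size=1_000_000)  # "uniform" prior on the probability
odds = theta / (1.0 - theta)                   # transform each draw to odds

print(np.mean(odds < 1.0))    # ~0.50: half the mass sits below odds of 1
print(np.mean(odds < 0.25))   # ~0.20, matching the integral of 1/(1+phi)^2 on [0, 0.25]

# Empirical density near phi = 0.5 versus the analytic value 1/(1.5)^2 ~ 0.444.
in_bin = np.mean((odds >= 0.45) & (odds < 0.55))
print(in_bin / 0.1, 1.0 / (1.0 + 0.5) ** 2)
```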
The influence of the prior diminishes with more data - eventually the likelihood dominates. With limited data, prior choice matters significantly.
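A conjugate Beta-binomial sketch illustrates this; the true heads-probability of 0.7 and the two priors are assumptions for the example. With 10 flips the two posteriors disagree noticeably, while with 10,000 flips they are nearly identical.

```python
import numpy as np

rng = np.random.default_rng(1)
priors = {"skeptical Beta(20, 20)": (20, 20), "flat Beta(1, 1)": (1, 1)}

for n in (10, 100, 10_000):                 # increasing amounts of data
    heads = rng.binomial(n, 0.7)            # simulate n flips of a 0.7-biased coin
    for name, (a, b) in priors.items():
        # The Beta prior is conjugate to the binomial likelihood, so the
        # posterior is Beta(a + heads, b + tails); report its mean.
        post_mean = (a + heads) / (a + b + n)
        print(f"n={n:>6}  {name}: posterior mean = {post_mean:.3f}")
```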
Priors make Bayesian reasoning subjective, but that subjectivity makes assumptions explicit rather than hiding them in the choice of estimator or test statistic.