The posterior distribution represents our updated beliefs about parameters after observing data.
Bayes' theorem tells us how to compute it:
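Writing θ for the parameters and D for the observed data (notation chosen here for concreteness):

$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
$$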
The posterior combines three ingredients, tied together in the small numerical sketch after this list:
- The prior p(θ) - our beliefs about the parameters before seeing data
- The likelihood p(D | θ) - how well different parameter values explain the observed data
- The marginal likelihood p(D) - a normalizing constant ensuring the posterior integrates to 1
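To make the combination concrete, here is a minimal grid-based sketch for a hypothetical coin-flip model; the grid resolution, the data (7 heads in 10 flips), and the flat prior are all illustrative choices:

```python
import numpy as np

# Grid of candidate values for the coin's bias theta
theta_grid = np.linspace(0.001, 0.999, 999)
heads, flips = 7, 10                          # observed data D

prior = np.ones_like(theta_grid)              # flat prior p(theta), before seeing data
prior /= prior.sum()

# Likelihood p(D | theta): how well each theta explains 7 heads in 10 flips
likelihood = theta_grid**heads * (1 - theta_grid)**(flips - heads)

unnormalized = likelihood * prior             # numerator of Bayes' theorem
marginal = unnormalized.sum()                 # p(D): normalizing constant (grid sum)
posterior = unnormalized / marginal           # p(theta | D), sums to 1 over the grid

print("Posterior mean of theta:", np.sum(theta_grid * posterior))
```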
The posterior answers the key question of inference: given what we observed, which parameter values are most plausible?
Unlike the likelihood, which tells us “if θ were true, how probable is our data?”, the posterior tells us “given our data, how probable is θ?”
The posterior is often intractable to compute exactly because the marginal likelihood p(D) requires integrating over all possible parameter values. This motivates approximation methods:
- Markov chain Monte Carlo - draw samples from the posterior without ever computing the normalizing constant p(D) (a minimal sketch follows this list)
- Variational inference - approximate the posterior with a simpler distribution by minimizing the KL divergence to it
- Laplace approximation - approximate the posterior as a Gaussian centered at its mode
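As a sketch of the first approach, assuming the coin-flip model from the grid example above, a bare-bones random-walk Metropolis sampler might look like this; the step size, chain length, and burn-in are arbitrary illustrative choices:

```python
import numpy as np

def log_posterior(theta, heads=7, flips=10):
    if not 0 < theta < 1:
        return -np.inf                         # zero posterior outside (0, 1)
    # log prior (flat) + log likelihood; the marginal p(D) is omitted entirely
    return heads * np.log(theta) + (flips - heads) * np.log(1 - theta)

rng = np.random.default_rng(0)
theta = 0.5                                    # starting point of the chain
samples = []
for _ in range(10_000):
    proposal = theta + rng.normal(scale=0.1)   # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

print("Posterior mean estimate:", np.mean(samples[1000:]))  # drop burn-in
```

Because acceptance depends only on the ratio of unnormalized posterior values, the intractable marginal likelihood p(D) cancels out.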
As we collect more data, the posterior becomes increasingly concentrated around the true parameter value (assuming our model is correct). The prior’s influence diminishes - with enough data, different reasonable priors lead to similar posteriors.
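A quick way to see this is to compare posteriors under two different priors as the sample size grows. The sketch below uses the conjugate Beta-binomial model; the priors, true bias, and sample sizes are chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_theta = 0.7
priors = {"flat Beta(1,1)": (1, 1), "skeptical Beta(20,20)": (20, 20)}

for n in (10, 10_000):
    heads = rng.binomial(n, true_theta)        # simulate n coin flips
    for name, (a, b) in priors.items():
        # Beta prior + binomial likelihood gives a Beta posterior in closed form
        post = stats.beta(a + heads, b + n - heads)
        print(f"n={n:>6} {name:>22}: mean={post.mean():.3f} sd={post.std():.3f}")
```

With only 10 observations the two posterior means differ noticeably; with 10,000 both concentrate tightly around the true value.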
The posterior is the complete answer to the Bayesian inference problem: it quantifies all our uncertainty about the parameters given the data.