Likelihood function
A likelihood $L(\theta) = p(x \mid \theta)$ scores how plausible different parameter values $\theta$ make the observed data $x$.
It has the same formula as the probability distribution in $x$, but viewed as a function of $\theta$ with the data $x$ fixed. We vary $\theta$ to see how likely each value makes the observed data.
Note: While $\int p(x \mid \theta)\,dx = 1$ for each $\theta$, the likelihood need not integrate to 1 over $\theta$: in general $\int p(x \mid \theta)\,d\theta \neq 1$, i.e. it is not a probability distribution in $\theta$.
Note the equivalent and equally context-dependent notation $p(x \mid \theta)$, which leaves implicit whether $x$ or $\theta$ is the variable. Writing $L(\theta; x)$ avoids this ambiguity by making the variable explicit.
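A minimal numeric sketch of the definition above, assuming a Bernoulli model (a hypothetical choice): $L(\theta) = \theta^k (1-\theta)^{n-k}$ for $k$ successes in $n$ coin flips. It shows that probabilities sum to 1 over outcomes for fixed $\theta$, while the likelihood does not integrate to 1 over $\theta$:

```python
import numpy as np

# Hypothetical observed coin flips (1 = heads): k successes in n trials.
x = np.array([1, 0, 1, 1, 0, 1, 1])
k, n = int(x.sum()), len(x)

def likelihood(theta):
    """Same formula as p(x | theta), viewed as a function of theta."""
    return theta**k * (1 - theta)**(n - k)

# For a fixed theta, probabilities over the outcomes of one flip sum to 1 ...
theta0 = 0.6
print(theta0 + (1 - theta0))  # 1.0

# ... but the likelihood does not integrate to 1 over theta (Riemann sum):
thetas = np.linspace(0, 1, 100_001)
integral = likelihood(thetas).mean()  # approximates the integral over [0, 1]
print(integral)  # ≈ 1/168, far from 1
```

The exact value of the integral here is the Beta-function normalizer $k!\,(n-k)!/(n+1)! = 1/168$, which is just another way of seeing that $L(\theta)$ is an unnormalized function of $\theta$.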
Careful: $p(\theta \mid x)$ (the posterior) is not a mirror image of the probability distribution $p(x \mid \theta)$, regardless of whether the latter is viewed as a function of $x$ or $\theta$. They are related through Bayes' theorem:
Bayes' Theorem: Normalized likelihood = posterior
The likelihood is not normalized over $\theta$. That’s just another way of saying that it’s not a probability distribution in $\theta$, i.e. in general $\int p(x \mid \theta)\,d\theta \neq 1$.
Bayes' theorem normalizes the likelihood to get a proper probability distribution over $\theta$ (the posterior):

$$p(\theta \mid x) = \frac{p(x \mid \theta)\,p(\theta)}{p(x)}$$

Since $p(x)$ does not depend on $\theta$, we can write this as a proportionality:

$$p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)$$
→ The posterior is the prior weighted by the likelihood, up to multiplication by a constant $1/p(x)$.
→ Dropping this constant changes nothing for inference.
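The normalization step can be sketched on a grid. This assumes the same hypothetical Bernoulli setup ($k = 5$ successes in $n = 7$ trials) and a uniform prior; both choices are illustrative, not prescribed:

```python
import numpy as np

k, n = 5, 7                                 # assumed data summary
thetas = np.linspace(0, 1, 1001)
d = thetas[1] - thetas[0]                   # grid spacing

prior = np.ones_like(thetas)                # p(theta): uniform on [0, 1]
lik = thetas**k * (1 - thetas)**(n - k)     # p(x | theta), unnormalized in theta

unnorm = lik * prior                        # numerator of Bayes' theorem
evidence = unnorm.sum() * d                 # p(x): the normalizing constant
posterior = unnorm / evidence               # p(theta | x)

print(posterior.sum() * d)                  # ≈ 1.0: a proper distribution over theta
```

Dividing by the evidence is exactly the "up to a constant" step: the shape of the posterior comes entirely from `lik * prior`.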
Properties of the likelihood
Some, but not all, properties of the probability distribution $p(x \mid \theta)$ carry over to the likelihood.
Nonnegativity: $L(\theta) = p(x \mid \theta) \ge 0$.
The chain rule of probability still works: $p(x_1, x_2 \mid \theta) = p(x_1 \mid \theta)\, p(x_2 \mid x_1, \theta)$.
…
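The chain-rule property can be checked numerically. This is a hypothetical two-draw model (the repeat probability 0.8 is an arbitrary illustrative choice): $\theta$ is the probability of the first draw being 1, and the second draw repeats the first with probability 0.8:

```python
# Chain rule: p(x1, x2 | theta) = p(x1 | theta) * p(x2 | x1, theta).
theta = 0.6                                  # assumed parameter value
p_x1 = {0: 1 - theta, 1: theta}              # p(x1 | theta)
p_x2_given_x1 = {x1: {x1: 0.8, 1 - x1: 0.2}  # p(x2 | x1, theta): repeat w.p. 0.8
                 for x1 in (0, 1)}

# Build the joint from the chain-rule factors.
joint = {(a, b): p_x1[a] * p_x2_given_x1[a][b]
         for a in (0, 1) for b in (0, 1)}

# The factored joint is still a valid distribution over the data:
print(sum(joint.values()))  # 1.0
```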
Some properties don’t carry over:
Not a distribution over $\theta$.
No countable additivity over disjoint parameter sets: Likelihoods don’t add like probabilities; they multiply (for independent data): $L(\theta) = \prod_i p(x_i \mid \theta)$. The log-likelihood adds: $\log L(\theta) = \sum_i \log p(x_i \mid \theta)$.
Marginalization over $\theta$ is not defined without a prior/measure.
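The multiplicativity point above is also why one works on the log scale in practice. A sketch with an assumed Gaussian model (known $\sigma = 1$, $\theta$ the mean; all names and values are illustrative):

```python
import numpy as np

# Synthetic independent data from a hypothetical Gaussian model.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)

theta = 1.5  # candidate parameter value to score
per_point_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2

# The product of 1000 per-point likelihoods underflows float64 to 0.0 ...
print(np.prod(np.exp(per_point_loglik)))
# ... while the sum of per-point log-likelihoods stays perfectly finite:
print(per_point_loglik.sum())
```

This is the practical content of "likelihoods multiply, log-likelihoods add": sums are numerically stable where long products are not.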