Likelihood function

A likelihood $L(\theta \mid x)$ scores how plausible different parameter values $\theta$ make the observed data $x$.
It has the same formula as the probability distribution $p(x \mid \theta)$ in $x$, but viewed as a function of $\theta$ with the data $x$ held fixed: $L(\theta \mid x) = p(x \mid \theta)$.

We vary $\theta$ to see how likely different values make the observed data.

Note: While $\int p(x \mid \theta)\,dx = 1$ for each $\theta$, the likelihood need not integrate to 1 over $\theta$: in general $\int L(\theta \mid x)\,d\theta \neq 1$, i.e. it is not a probability distribution.
Note the equivalent and equally context-dependent notation $p(x \mid \theta)$, which can denote either the distribution over $x$ or the likelihood in $\theta$. Writing $L(\theta \mid x)$ avoids this ambiguity by making the variable explicit.
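A small numerical check of this asymmetry, using a hypothetical Binomial example ($n$ trials, $k$ successes; the values below are illustrative): for fixed $\theta$ the formula sums to 1 over the data, but integrated over $\theta$ it does not.

```python
from math import comb

# Hypothetical example: Binomial likelihood for k successes in n trials.
n, k = 10, 7

def likelihood(theta):
    # Same formula as the probability mass function, viewed as a function of theta.
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# As a distribution over the data x = 0..n, it sums to 1 for fixed theta:
theta = 0.3
total_over_data = sum(comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1))
print(round(total_over_data, 6))  # 1.0

# But integrated over theta (crude Riemann sum on [0, 1]) it need not be 1:
grid = [i / 10000 for i in range(10000)]
integral_over_theta = sum(likelihood(t) for t in grid) / 10000
print(round(integral_over_theta, 4))  # ~0.0909 = 1/(n+1), not 1
```

(For the Binomial, the integral over $\theta$ comes out to $1/(n+1)$ regardless of $k$, which is why normalization is needed before treating it as a distribution.)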

Careful: $p(\theta \mid x)$ (the posterior) is not a mirror image of the probability distribution $p(x \mid \theta)$, regardless of whether the latter is viewed as a function of $\theta$ or of $x$.¹ They are related through Bayes' theorem:

Bayes' theorem: normalized likelihood = posterior

The likelihood is not normalized over $\theta$. That's just another way of saying that it's not a probability distribution in $\theta$, i.e. in general $\int p(x \mid \theta)\,d\theta \neq 1$.
Bayes' theorem normalizes the likelihood to get a proper probability distribution over $\theta$ (the posterior):

$$p(\theta \mid x) = \frac{p(x \mid \theta)\,p(\theta)}{p(x)}$$

Since $p(x)$ does not depend on $\theta$, we can write this as a proportionality:

$$p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)$$

→ The posterior is the prior weighted by the likelihood, up to multiplication by a constant $1/p(x)$.
→ Dropping this constant changes nothing for inference.

Properties of the likelihood

Some but not all properties of the probability $p(x \mid \theta)$ carry over.
Nonnegativity: $L(\theta \mid x) \geq 0$.
The chain rule of probability still works: $p(x_1, x_2 \mid \theta) = p(x_1 \mid \theta)\,p(x_2 \mid x_1, \theta)$.

Some properties don’t carry over:
Not a distribution over $\theta$.
No countable additivity over disjoint parameter sets: likelihoods don't add like probabilities; they multiply (for independent data). The log-likelihood adds.
Marginalization over $\theta$ is not defined without a prior/measure.
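The multiply-vs-add point can be verified directly. A minimal sketch with a hypothetical Bernoulli model and made-up data: the product of per-observation likelihoods equals the exponential of the summed log-likelihoods.

```python
from math import log, exp

# Hypothetical Bernoulli likelihood for a single observation x in {0, 1}.
def bernoulli_lik(theta, x):
    return theta if x == 1 else 1 - theta

data = [1, 0, 1, 1, 0]   # made-up independent observations
theta = 0.6

# Likelihoods multiply across independent data...
product = 1.0
for x in data:
    product *= bernoulli_lik(theta, x)

# ...while log-likelihoods add.
log_sum = sum(log(bernoulli_lik(theta, x)) for x in data)

print(abs(product - exp(log_sum)) < 1e-12)  # True
```

Working with the summed log-likelihood is also the numerically sensible choice, since the raw product underflows quickly for large datasets.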

Footnotes

  1. ^cc8b69. I certainly would never confuse these…