Likelihood function
A likelihood scores how plausible different parameter values $\theta$ make the observed data $x$.
It has the same formula as the probability distribution $p(x \mid \theta)$ in $x$, but viewed as a function of $\theta$ with the data $x$ held fixed. We vary $\theta$ to see how likely each value makes the observed data.
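Note: While $\int p(x \mid \theta)\, dx = 1$ for each $\theta$, the likelihood need not integrate to 1 over $\theta$: $\int p(x \mid \theta)\, d\theta \neq 1$ in general, i.e. it is not a probability distribution (over $\theta$).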
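A minimal numerical sketch of this note, assuming a binomial model with hypothetical data (7 successes in 10 trials). Because the data are discrete here, the normalization over the data is a sum rather than an integral. The helper names `binom_pmf` and `integrate` are illustrative and do not come from the notes.

```python
from math import comb

def binom_pmf(k, n, theta):
    """p(k | theta): probability of k successes in n Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

n, k_obs = 10, 7  # hypothetical data: 7 successes in 10 trials

# Fixed theta, summed over all possible data: a probability distribution.
total_over_data = sum(binom_pmf(k, n, 0.3) for k in range(n + 1))

# Fixed data, integrated over theta: the likelihood, which is not normalized.
area_over_theta = integrate(lambda t: binom_pmf(k_obs, n, t), 0.0, 1.0)

print(total_over_data)   # close to 1
print(area_over_theta)   # close to 1 / (n + 1), not 1
```

For this model the integral over $\theta$ equals $1/(n+1)$ for every observed count, so the likelihood is off from a density in $\theta$ by a constant factor that depends on the model.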
Note the equivalent and equally context-dependent notation $p_\theta(x)$. Writing $L(\theta) := p(x \mid \theta)$ avoids this ambiguity by making the variable explicit.
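Careful: $p(\theta \mid x)$ (the posterior) is not a mirror image of the probability distribution $p(x \mid \theta)$, regardless of whether the latter is viewed as a function of $x$ or $\theta$. They are related through Bayes' Theorem:
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}$$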
Bayes' Theorem: Normalized (prior-weighted) likelihood = posterior
The likelihood is not normalized over $\theta$. That’s just another way of saying that it’s not a probability distribution in $\theta$, i.e. $\int p(x \mid \theta)\, d\theta \neq 1$ in general.
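Bayes' theorem weights the likelihood by the prior $p(\theta)$ and normalizes the result to get a proper probability distribution over $\theta$ (the posterior):
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}, \qquad p(x) = \int p(x \mid \theta')\, p(\theta')\, d\theta'$$
Since $p(x)$ does not depend on $\theta$, we can write this as a proportionality:
$$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$$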
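→ The posterior is the prior weighted by the likelihood, up to multiplication by a constant $1/p(x)$.
→ Dropping this constant changes nothing for inference about $\theta$ within a fixed model: it does not move the posterior mode or change ratios of posterior densities, and it can always be recovered by normalizing.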
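The proportionality can be checked on a grid. This is a rough sketch under stated assumptions: a Bernoulli likelihood for hypothetical data (7 successes in 10 trials), a hypothetical prior density $p(\theta) = 2\theta$ on $[0, 1]$, and an illustrative helper `grid_posterior` that normalizes by a Riemann sum.

```python
def likelihood(theta, k=7, n=10):
    """Bernoulli likelihood of k successes in n trials (constant factor dropped)."""
    return theta**k * (1 - theta)**(n - k)

def grid_posterior(unnormalized, grid):
    """Normalize nonnegative weights on an evenly spaced grid to integrate to 1."""
    h = grid[1] - grid[0]
    weights = [unnormalized(t) for t in grid]
    z = h * sum(weights)  # Riemann approximation of the evidence p(x)
    return [w / z for w in weights]

m = 1000
grid = [(i + 0.5) / m for i in range(m)]  # midpoints of [0, 1]

def prior(theta):
    """Hypothetical prior density on [0, 1] (assumption for this sketch)."""
    return 2 * theta

post_a = grid_posterior(lambda t: likelihood(t) * prior(t), grid)
# Rescale the likelihood by an arbitrary constant: the posterior is unchanged.
post_b = grid_posterior(lambda t: 120.0 * likelihood(t) * prior(t), grid)

max_diff = max(abs(a - b) for a, b in zip(post_a, post_b))
mode = grid[post_a.index(max(post_a))]
print(max_diff)  # essentially zero
print(mode)      # near 8/11, the mode of the exact Beta(9, 4) posterior
```

Any constant multiplying the likelihood cancels in the normalization step, which is why only the shape of the likelihood in $\theta$ matters here.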
iid factorization of the likelihood
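Independence turns the joint probability of the data $x = (x_1, \dots, x_n)$ into a product:
$$p(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} p_i(x_i \mid \theta)$$
Without it, the chain rule of probability would only give:
$$p(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1}, \theta)$$
Being identically distributed makes it the same $p$ in every factor (otherwise each $x_i$ would have its own marginal $p_i$), so for iid data:
$$L(\theta) = p(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$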
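As a small illustration of the iid factorization, the sketch below assumes a normal model with hypothetical observations. The likelihood is a product of identical per-observation densities, and its logarithm is the corresponding sum. The function names are illustrative only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a single observation under N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Product of identical per-observation densities (iid assumption)."""
    return math.prod(normal_pdf(x, mu, sigma) for x in data)

def log_likelihood(data, mu, sigma):
    """Sum of per-observation log densities; same information as the product."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

data = [4.9, 5.3, 5.1, 4.7, 5.6]  # hypothetical observations

L = likelihood(data, mu=5.0, sigma=0.5)
ell = log_likelihood(data, mu=5.0, sigma=0.5)

print(math.log(L), ell)  # the two agree up to floating-point error
```

In practice the sum form is usually preferred, since a product of many small densities can underflow.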
What does the likelihood function represent?
It scores how plausible different $\theta$ make the observed data $x$.
It’s the (joint) probability/density that drawing $n$ iid samples from $p(X \mid \theta)$, where $X$ is the random variable of a single observation, produces exactly the observed data $x = (x_1, \dots, x_n)$.
Properties of the likelihood
Some but not all properties of the probability carry over.
Nonnegativity: $L(\theta) = p(x \mid \theta) \geq 0$
Chain rule of probability still works (in the data, for fixed $\theta$)
…
Some properties don’t carry over:
Not a distribution over $\theta$.
No countable additivity over disjoint parameter sets: Likelihoods don’t add like probabilities, they multiply (for independent data). The log-likelihood adds.
Marginalization over $\theta$ is not defined without a prior $p(\theta)$/measure.
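One way to see why marginalization needs a measure: integrating the same likelihood "flatly" in two parametrizations gives different answers. The sketch below assumes a Bernoulli likelihood for hypothetical data (7 successes in 10 trials) and compares a flat measure in $\theta$ with a flat measure in the log-odds $\phi = \log\frac{\theta}{1-\theta}$. The integration range $[-40, 40]$ for $\phi$ is a truncation chosen for this example.

```python
import math

def likelihood(theta, k=7, n=10):
    """Bernoulli likelihood of k successes in n trials (constant factor dropped)."""
    return theta**k * (1 - theta)**(n - k)

def integrate(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def sigmoid(phi):
    """Inverse of the log-odds map: theta = 1 / (1 + exp(-phi))."""
    return 1.0 / (1.0 + math.exp(-phi))

# Flat measure in theta on (0, 1).
area_theta = integrate(likelihood, 0.0, 1.0)

# Same likelihood values, flat measure in the log-odds phi instead.
area_phi = integrate(lambda phi: likelihood(sigmoid(phi)), -40.0, 40.0)

print(area_theta)  # about 1/1320
print(area_phi)    # about 1/252, a different "marginal"
```

The likelihood values are identical at corresponding points, so the discrepancy comes entirely from the choice of measure. A prior fixes that choice, and its density picks up the Jacobian under reparametrization.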