Conditional probability

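The conditional probability of $Y$ given $X$ is given by:

$$P(Y \mid X) = \frac{P(X, Y)}{P(X)}$$

This is almost identical to the standard conditional probability (probability of $X$ AND $Y$ divided by the probability of $X$), but more general, as it’s a conditional density function of all values of $X$ and $Y$.
This is the same formula in density notation, where $p_{Y \mid X}(y \mid x)$ denotes the conditional density function of $Y$ given $X = x$ (defined wherever $p_X(x) > 0$):

$$p_{Y \mid X}(y \mid x) = \frac{p_{X,Y}(x, y)}{p_X(x)}$$

The chain rule of probability follows directly from this by multiplying both sides by $p_X(x)$:

$$p_{X,Y}(x, y) = p_{Y \mid X}(y \mid x)\, p_X(x)$$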
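As a small sanity check, the definition and the chain rule can be tried on a discrete joint distribution. This is only a sketch: the 2×2 table `joint` is made up for illustration and does not come from these notes.

```python
import numpy as np

# Hypothetical joint probability table P(X, Y): rows index x, columns index y.
joint = np.array([
    [0.10, 0.20],
    [0.30, 0.40],
])

# Marginal P(X): sum the joint over all values of y.
p_x = joint.sum(axis=1)

# Conditional P(Y | X) = P(X, Y) / P(X); every row sums to 1.
p_y_given_x = joint / p_x[:, None]

# Chain rule: P(X, Y) = P(Y | X) * P(X) recovers the joint table.
reconstructed = p_y_given_x * p_x[:, None]

print(p_y_given_x)
print(np.allclose(reconstructed, joint))
```

Each row of `p_y_given_x` is a proper distribution over $Y$ for one fixed value of $X$, which is what "conditioning on $X$" means in the discrete case.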

Marginalization and conditioning

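Gaussian distributions are closed under marginalization and conditioning, i.e. both operations yield another Gaussian distribution with modified parameters.
E.g. for the following bivariate distribution over $X$ and $Y$:

$$P_{X,Y} = \begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}(\mu, \Sigma) = \mathcal{N}\left( \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix} \right)$$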

Marginalization

Marginalization lets us extract partial information from multivariate probability distributions. Given a normal probability distribution $P(X, Y)$ with mean $\mu$ and covariance $\Sigma$ over vectors of random variables $X$ and $Y$, we can determine their marginalized probability distributions like this:

$$X \sim \mathcal{N}(\mu_X, \Sigma_{XX}), \qquad Y \sim \mathcal{N}(\mu_Y, \Sigma_{YY})$$

This means that each partition $X$ and $Y$ only depends on its corresponding entries in $\mu$ and $\Sigma$.

Another way to express this mathematically is that we consider every possible value of $X$ under all possible values of $Y$, averaging out $Y$’s contribution:

$$p_X(x) = \int_y p_{X,Y}(x, y)\, dy = \int_y p_{X \mid Y}(x \mid y)\, p_Y(y)\, dy$$
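In code, marginalizing a Gaussian amounts to picking out the matching entries of the mean and covariance. The sketch below assumes a made-up three-dimensional joint in which the first two entries form $X$ and the last one is $Y$; the helper name `marginal` and all numbers are illustrative only.

```python
import numpy as np

# Hypothetical joint Gaussian over (X, Y): X is 2-dimensional, Y is 1-dimensional.
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([
    [2.0, 0.3, 0.5],
    [0.3, 1.0, 0.2],
    [0.5, 0.2, 1.5],
])
x_idx = [0, 1]  # positions of the entries that belong to X


def marginal(mu, sigma, idx):
    """Return mean and covariance of the marginal over the entries in idx."""
    idx = np.asarray(idx)
    return mu[idx], sigma[np.ix_(idx, idx)]


mu_x, sigma_xx = marginal(mu, sigma, x_idx)

# Empirical check: sample from the joint and simply discard the Y column.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, sigma, size=200_000)
x_samples = samples[:, x_idx]

print(mu_x, x_samples.mean(axis=0))
print(sigma_xx)
print(np.cov(x_samples, rowvar=False))
```

Dropping the $Y$ column of the samples is the sampling counterpart of integrating $Y$ out, so the empirical mean and covariance of `x_samples` should land close to `mu_x` and `sigma_xx`.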

Conditioning

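Conditioning is used to determine the probability distribution of one variable depending on another variable, i.e. how one variable behaves when another one is known.

$$X \mid Y \sim \mathcal{N}\left(\mu_X + \Sigma_{XY}\Sigma_{YY}^{-1}(Y - \mu_Y),\; \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\right)$$

The mean gets shifted by how much the known variable $Y$ differs from its expected value, $(Y - \mu_Y)$, which is normalized by $\Sigma_{YY}^{-1}$ (think $\frac{1}{\sigma_Y^2}$), and scaled by the covariance between the two variables, $\Sigma_{XY}$. This product can be thought of as translating the normalized deviation in $Y$ to corresponding changes in $X$’s scale.
$\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$ represents the amount of variance in $X$ that can be explained by $Y$; it is subtracted from $\Sigma_{XX}$, so knowing $Y$ never increases the uncertainty about $X$.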

Conditioning is like taking a slice of the distribution at the known/given value of a variable.
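The standard Gaussian conditioning formulas (shift the mean by the scaled deviation of the known variable, shrink the covariance by the explained part) can be sketched with NumPy as follows. The function name `condition`, the index arguments, and the example numbers are assumptions made for illustration.

```python
import numpy as np


def condition(mu, sigma, x_idx, y_idx, y_obs):
    """Mean and covariance of X | Y = y_obs for a joint Gaussian N(mu, sigma)."""
    x_idx, y_idx = np.asarray(x_idx), np.asarray(y_idx)
    mu_x, mu_y = mu[x_idx], mu[y_idx]
    s_xx = sigma[np.ix_(x_idx, x_idx)]
    s_xy = sigma[np.ix_(x_idx, y_idx)]
    s_yy = sigma[np.ix_(y_idx, y_idx)]
    # gain = Sigma_XY Sigma_YY^{-1}, via a linear solve instead of an explicit inverse
    # (valid because Sigma_YY is symmetric).
    gain = np.linalg.solve(s_yy, s_xy.T).T
    mu_cond = mu_x + gain @ (np.asarray(y_obs) - mu_y)
    sigma_cond = s_xx - gain @ s_xy.T
    return mu_cond, sigma_cond


# Hypothetical bivariate example: X is entry 0, Y is entry 1, and we observe Y = 3.
mu = np.array([0.0, 1.0])
sigma = np.array([
    [1.0, 0.8],
    [0.8, 2.0],
])
mu_cond, sigma_cond = condition(mu, sigma, [0], [1], [3.0])
print(mu_cond, sigma_cond)
```

In this example the observed $Y = 3$ lies 2 above its mean, so the mean of $X$ moves from 0 to $0.8 / 2.0 \cdot 2 = 0.8$, and the variance of $X$ drops from 1 to $1 - 0.8^2 / 2.0 = 0.68$.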

Two normally distributed variables that are uncorrelated are not automatically independent: they are independent if and only if they are jointly normally distributed.

A pair $(X, Y)$ is jointly normal exactly when every linear combination $aX + bY$ is normally distributed, i.e. $(X, Y)$ follows a multivariate normal distribution.
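A common counterexample makes the difference between "each variable is normal" and "jointly normal" concrete. The construction below (a standard normal `x` multiplied by an independent random sign) is not taken from these notes; it is one well-known way to get two normal variables that are uncorrelated yet dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.standard_normal(n)
sign = rng.choice([-1.0, 1.0], size=n)  # random sign, independent of x
y = sign * x                            # y is again standard normal

# The correlation between x and y is close to zero ...
corr_xy = np.corrcoef(x, y)[0, 1]

# ... but |x| and |y| are identical, so x and y are clearly dependent.
corr_abs = np.corrcoef(np.abs(x), np.abs(y))[0, 1]

# The linear combination x + y is exactly 0 whenever the sign is -1 (about half
# the time), so it is not normally distributed and (x, y) is not jointly normal.
frac_zero = np.mean(x + y == 0.0)

print(corr_xy, corr_abs, frac_zero)
```

Because one linear combination fails to be normal, the pair is not jointly normal, which is why zero correlation does not give independence here.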
