marginalization

Marginal density

The marginal density of a variable is obtained by integrating the joint density over the other variable:
$f_{X} (x) = \int_{- \infty}^{\infty} f (x, y) d y$
Here we’re integrating out (averaging over all possible values of) $Y$ , to get the marginal density of $X$ , i.e. the density of $X$ regardless of $Y$ .
This also works for discrete distributions: $P (X = x) = \sum_{i} P (X = x, Y = y_{i})$
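The discrete sum can be sketched with NumPy as follows. The joint table is a made-up example (not from the note), with rows indexing the values of $X$ and columns indexing the values of $Y$.

```python
import numpy as np

# Hypothetical joint table P(X = x_i, Y = y_j): rows index X, columns index Y.
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])

# Summing over the Y axis marginalizes Y out and leaves P(X = x_i).
p_x = joint.sum(axis=1)
# Summing over the X axis marginalizes X out and leaves P(Y = y_j).
p_y = joint.sum(axis=0)

print("P(X):", p_x)
print("P(Y):", p_y)
```

Both marginals sum to 1 because the joint table does.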


Marginalization and conditioning

Gaussian distributions are closed under marginalization and conditioning, i.e. both operations again yield a Gaussian distribution, just with modified parameters.
E.g. for the following bivariate distribution:
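$P_{X, Y} = \begin{bmatrix} X \\ Y \end{bmatrix} \sim N (μ, Σ) = N \left( \begin{bmatrix} μ_{X} \\ μ_{Y} \end{bmatrix}, \begin{bmatrix} Σ_{XX} & Σ_{X Y} \\ Σ_{Y X} & Σ_{YY} \end{bmatrix} \right)$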

Marginalization

Marginalization lets us extract partial information from multivariate probability distributions. Given a normal probability distribution $P (X, Y)$ over vectors of random variables $X$ and $Y$ , we can determine their marginalized probability distributions like this:
$X \sim N (μ_{X}, Σ_{XX})$
$Y \sim N (μ_{Y}, Σ_{YY})$
This means that each partition $X$ and $Y$ only depends on its corresponding entries in $μ$ and $Σ$ .
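In code, marginalizing a Gaussian therefore amounts to selecting sub-blocks of the mean vector and covariance matrix. A minimal sketch, assuming the joint is stored as NumPy arrays; the helper name `marginal` and the numbers are illustrative, not from the note.

```python
import numpy as np

def marginal(mu, sigma, idx):
    """Return mean and covariance of the marginal over the components in idx."""
    idx = np.asarray(idx)
    # np.ix_ selects the rows and columns belonging to the kept components.
    return mu[idx], sigma[np.ix_(idx, idx)]

# Hypothetical 3-dimensional Gaussian: X = components 0 and 1, Y = component 2.
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([
    [2.0, 0.6, 0.3],
    [0.6, 1.0, -0.2],
    [0.3, -0.2, 0.5],
])

mu_x, sigma_xx = marginal(mu, sigma, [0, 1])
mu_y, sigma_yy = marginal(mu, sigma, [2])

print("X ~ N(", mu_x, ",", sigma_xx.tolist(), ")")
print("Y ~ N(", mu_y, ",", sigma_yy.tolist(), ")")
```

The cross-covariance block $Σ_{X Y}$ is simply dropped; it only matters for conditioning.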

Another way to express this is that we consider every possible value of $X$ under all possible values of $Y$ , e.g.: $P (X = x_{1}) = P (X = x_{1}, Y = y_{1}) + \dots + P (X = x_{1}, Y = y_{n})$ , averaging out $Y$ ’s contribution:
$p_{X} (x) = \int_{y} p_{X, Y} (x, y) d y = \int_{y} p_{X ∣ Y} (x ∣ y) p_{Y} (y) d y$
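As a numerical sanity check of this integral, the sketch below integrates a bivariate normal density over $y$ with a simple Riemann sum and compares the result with the closed-form marginal $N (μ_{X}, Σ_{XX})$. The parameter values, the evaluation point `x0`, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def joint_pdf(x, y, mu, sigma):
    """Density of a bivariate normal at (x, y); y may be an array."""
    dx, dy = x - mu[0], y - mu[1]
    det = sigma[0, 0] * sigma[1, 1] - sigma[0, 1] * sigma[1, 0]
    # Quadratic form d^T Sigma^{-1} d written out for the symmetric 2x2 case.
    quad = (sigma[1, 1] * dx**2 - 2.0 * sigma[0, 1] * dx * dy
            + sigma[0, 0] * dy**2) / det
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(det))

# Hypothetical parameters of the joint distribution of (X, Y).
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.8], [0.8, 1.0]])

x0 = 0.3
y_grid = np.linspace(-15.0, 13.0, 20001)   # wide grid centred on mu_Y
step = y_grid[1] - y_grid[0]

integrated = np.sum(joint_pdf(x0, y_grid, mu, sigma)) * step  # integrate y out
closed_form = normal_pdf(x0, mu[0], sigma[0, 0])              # N(mu_X, Sigma_XX)

print(integrated, closed_form)
```

The two printed numbers should agree to many decimal places, since the integrand is smooth and negligible at the grid boundaries.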

Conditioning

Conditioning is used to determine the probability distribution of one variable depending on another variable, i.e. how one variable behaves when another one is known.
$X ∣ Y \sim N (μ_{X} + Σ_{X Y} Σ_{YY}^{- 1} (Y - μ_{Y}), Σ_{XX} - Σ_{X Y} Σ_{YY}^{- 1} Σ_{Y X})$
$Y ∣ X \sim N (μ_{Y} + Σ_{Y X} Σ_{XX}^{- 1} (X - μ_{X}), Σ_{YY} - Σ_{Y X} Σ_{XX}^{- 1} Σ_{X Y})$
The mean gets shifted by how much the known variable differs from its expected value $(Y - μ_{Y})$ , which is normalized by $Σ_{YY}^{- 1}$ (in the scalar case, think $(Y - μ_{Y}) / σ_{Y}^{2}$ ), and scaled by the covariance between the two variables $Σ_{X Y}$ . This product can be thought of as translating the normalized deviation in $Y$ into the corresponding change on $X$ ’s scale.
→ $Σ_{X Y} Σ_{YY}^{- 1} Σ_{Y X}$ represents the amount of variance in $X$ that can be explained by $Y$ .

Conditioning is like taking a slice of the distribution at the known/given value of a variable.
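The conditioning formula for $X ∣ Y$ can be sketched in NumPy as follows. The function name `condition`, its argument layout, and the example numbers are assumptions for illustration; a linear solve is used in place of an explicit matrix inverse.

```python
import numpy as np

def condition(mu, sigma, idx_x, idx_y, y_obs):
    """Mean and covariance of X | Y = y_obs for a joint Gaussian N(mu, sigma)."""
    idx_x, idx_y = np.asarray(idx_x), np.asarray(idx_y)
    mu_x, mu_y = mu[idx_x], mu[idx_y]
    s_xx = sigma[np.ix_(idx_x, idx_x)]
    s_xy = sigma[np.ix_(idx_x, idx_y)]
    s_yy = sigma[np.ix_(idx_y, idx_y)]
    # gain = Sigma_XY Sigma_YY^{-1}, computed via a solve (Sigma_YY is symmetric).
    gain = np.linalg.solve(s_yy, s_xy.T).T
    cond_mean = mu_x + gain @ (y_obs - mu_y)   # shift by the scaled deviation of Y
    cond_cov = s_xx - gain @ s_xy.T            # remove variance explained by Y
    return cond_mean, cond_cov

# Hypothetical bivariate example: X is component 0, Y is component 1.
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])

cond_mean, cond_cov = condition(mu, sigma, [0], [1], np.array([3.0]))
print(cond_mean, cond_cov)
```

Note that the conditional covariance does not depend on the observed value of $Y$ , only the conditional mean does.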

Visualization of marginalization and conditioning

Note: There is an interactive version at distill.

The blue curve shows the entire underlying distribution of the random variable $x_{a}$ , and the red curve shows a slice of the joint distribution of $x_{a}$ and $x_{b}$ at a specific value of $x_{b}$ .

Uncorrelated, normally distributed variables are independent if and only if they are jointly normally distributed. Each variable being normal on its own is not enough.

A pair $(X, Y)$ is jointly normal exactly when every linear combination $a X + bY$ is normally distributed, i.e. the resulting distribution is a multivariate normal distribution.

Visual intuition: consider $X \sim N (0, 1)$ and $Y = W X$ , where $W$ is $\pm 1$ with equal probability, independently of $X$ .

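Individually, $X$ and $Y$ are each normally distributed, but their joint distribution is unusual: whether $X$ is positive or negative, $Y$ is either $+ X$ or $- X$ with 50/50 chance, so all probability mass lies on the two lines $y = x$ and $y = - x$ . In particular, $X + Y$ equals $0$ with probability $1 / 2$ , so it is not normally distributed and the pair is not jointly normal.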

→ $X$ and $Y$ cannot possibly be independent, even though they’re uncorrelated!
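A small simulation makes this counterexample concrete. It is a sketch under the setup above, with an arbitrary seed and sample size: the sample correlation of $X$ and $Y$ is near zero, yet $X^{2}$ and $Y^{2}$ are perfectly correlated and $X + Y$ is exactly zero about half of the time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.standard_normal(n)
w = rng.choice([-1.0, 1.0], size=n)   # random sign, drawn independently of x
y = w * x

# Linear correlation between X and Y: close to 0, so they are uncorrelated.
corr_xy = np.corrcoef(x, y)[0, 1]
# Correlation between X^2 and Y^2: 1 up to rounding, because |Y| = |X| always.
corr_sq = np.corrcoef(x**2, y**2)[0, 1]
# Fraction of samples with X + Y == 0: about one half (the cases where W = -1).
frac_zero_sum = np.mean(x + y == 0.0)

print(corr_xy, corr_sq, frac_zero_sum)
```

A normally distributed $X + Y$ could not put probability mass on a single point, which is why the pair fails to be jointly normal.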

