multivariate gaussian distribution

A multivariate gaussian distribution contains $n$ random variables, each of which is gaussian, as is their joint distribution.

EXAMPLE

A multivariate distribution with mean $μ = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and covariance matrix $Σ = \begin{bmatrix} 1 & \frac{3}{5} \\ \frac{3}{5} & 2 \end{bmatrix}$

The multivariate distribution $X$ is fully described by its mean vector $μ$ and a covariance matrix $Σ$ .

$X = \begin{bmatrix} X_{1} \\ X_{2} \\ \vdots \\ X_{n} \end{bmatrix} \sim N (μ, Σ)$
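As a minimal sketch of what "fully described by $μ$ and $Σ$" means in practice, the following draws samples from the example distribution above using NumPy and recovers both parameters from the samples. The seed and sample size are arbitrary choices made for illustration, not part of the original note.

```python
import numpy as np

# Parameters of the example distribution from the text.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

# Seed and sample size are arbitrary (assumed for reproducibility).
rng = np.random.default_rng(seed=0)
samples = rng.multivariate_normal(mu, Sigma, size=100_000)  # shape (100000, 2)

# With enough samples, the empirical mean and covariance approach mu and Sigma.
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)  # rows are samples, columns are variables

print(sample_mean)
print(sample_cov)
```

The printed estimates should be close to $μ$ and $Σ$, and get closer as the sample size grows.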

The mean vector describes the expected value of the distribution. Each of its components describes the mean of the corresponding dimension.
The covariance matrix models the variance along each dimension and how the dimensions vary together; it describes the shape of the distribution.
The $i$ -th entry on the main diagonal of the covariance matrix is the variance (the second central moment) of the $i$ -th random variable, and the off-diagonal elements contain the covariances, which describe the correlation between pairs of variables.

$Σ_{ij} = Cov (X_{i}, X_{j}) = E [(X_{i} - μ_{i}) (X_{j} - μ_{j})]$ , or in matrix form $Σ = E [(X - μ) (X - μ)^{T}] = E [X X^{T}] - μ μ^{T}$
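The two matrix forms of the covariance can be checked numerically. The sketch below uses made-up data (the mixing matrix `A`, the offset and the seed are illustrative assumptions) and replaces each expectation with an average over samples.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed

# Illustrative data: 50_000 draws of 3 correlated variables (rows are samples).
A = rng.normal(size=(3, 3))                      # hypothetical mixing matrix
offset = np.array([1.0, -2.0, 0.5])              # hypothetical mean shift
X = rng.normal(size=(50_000, 3)) @ A.T + offset

mu_hat = X.mean(axis=0)
centered = X - mu_hat

# E[(X - mu)(X - mu)^T], with the expectation replaced by a sample average.
cov_centered = centered.T @ centered / len(X)

# E[X X^T] - mu mu^T, using the same sample averages.
cov_moments = X.T @ X / len(X) - np.outer(mu_hat, mu_hat)

# The diagonal holds the per-variable variances.
variances = np.diag(cov_centered)
```

Both expressions agree up to floating-point error, and they match `np.cov(X, rowvar=False, bias=True)`, which uses the same $1/N$ normalization.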

Visualization of different covariance matrices of a bivariate gaussian distribution.

Note: There is an interactive version at distill.

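Depicted are bivariate gaussian distributions with a mean $μ$ of $0$ and the covariance matrices:

$\begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}$ , $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$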

The one in the middle is a diagonal gaussian distribution: the two variables do not correlate with each other, so the covariance matrix is zero except on the main diagonal.
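To make the link between covariance and correlation concrete, here is a small sketch that converts each of the three covariance matrices into a correlation matrix by dividing every entry by the product of the two standard deviations. The helper name `correlation_from_covariance` is made up for this example.

```python
import numpy as np

def correlation_from_covariance(Sigma):
    """Divide each covariance by the two standard deviations involved."""
    std = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(std, std)

# The three covariance matrices from the visualization.
negative = np.array([[1.0, -0.8], [-0.8, 1.0]])
diagonal = np.array([[1.0, 0.0], [0.0, 1.0]])
positive = np.array([[1.0, 0.8], [0.8, 1.0]])

for Sigma in (negative, diagonal, positive):
    rho = correlation_from_covariance(Sigma)[0, 1]
    print(rho)
```

Because all variances are $1$ here, the off-diagonal entries already are the correlation coefficients; for a matrix with other variances the division matters.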

Marginalization and conditioning

Gaussian distributions are closed under marginalization and conditioning, i.e. marginalizing or conditioning a gaussian distribution yields another (modified) gaussian distribution.
E.g. for the following bivariate distribution:

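$P_{X, Y} = \begin{bmatrix} X \\ Y \end{bmatrix} \sim N (μ, Σ) = N \left( \begin{bmatrix} μ_{X} \\ μ_{Y} \end{bmatrix}, \begin{bmatrix} Σ_{XX} & Σ_{X Y} \\ Σ_{Y X} & Σ_{YY} \end{bmatrix} \right)$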

Marginalization

Marginalization lets us extract partial information from multivariate probability distributions. Given a normal probability distribution $P (X, Y)$ over vectors of random variables $X$ and $Y$ , we can determine their marginalized probability distributions like this:
$\begin{aligned} X &\sim N (μ_{X}, Σ_{XX}) \\ Y &\sim N (μ_{Y}, Σ_{YY}) \end{aligned}$
This means that each partition $X$ and $Y$ only depends on its corresponding entries in $μ$ and $Σ$ .

Another way to express this mathematically is to consider every possible value of $X$ across all possible values of $Y$ , e.g. in the discrete case $P (X = x_{1}) = P (X = x_{1}, Y = y_{1}) + \dots + P (X = x_{1}, Y = y_{n})$ , averaging out $Y$ ’s contribution. In the continuous case the sum becomes an integral:
$p_{X} (x) = \int_{y} p_{X, Y} (x, y) d y = \int_{y} p_{X ∣ Y} (x ∣ y) p_{Y} (y) d y$
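The two views of marginalization (picking sub-blocks of $μ$ and $Σ$ versus integrating out $Y$) can be compared numerically. This is a sketch under assumptions: the helper names `marginalize` and `gaussian_pdf`, the evaluation point `x0` and the integration grid are all choices made for illustration, and the integral is approximated by a plain Riemann sum.

```python
import numpy as np

def marginalize(mu, Sigma, keep):
    """Mean and covariance of the variables whose indices are in `keep`."""
    keep = np.asarray(keep)
    return mu[keep], Sigma[np.ix_(keep, keep)]

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mu
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Sigma), diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])

# Closed form: keep only the entries belonging to X (index 0).
mu_x, Sigma_xx = marginalize(mu, Sigma, keep=[0])

# Numerical integration of the joint density over y at a fixed x0.
x0 = 0.5
y_grid = np.linspace(-15.0, 15.0, 6001)
dy = y_grid[1] - y_grid[0]
points = np.column_stack([np.full_like(y_grid, x0), y_grid])
p_integrated = gaussian_pdf(points, mu, Sigma).sum() * dy

p_closed_form = gaussian_pdf([[x0]], mu_x, Sigma_xx)[0]
```

Both `p_integrated` and `p_closed_form` approximate $p_{X} (0.5)$ and should agree to many decimal places, which illustrates that the off-diagonal block $Σ_{X Y}$ plays no role in the marginal.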

Conditioning

Conditioning is used to determine the probability distribution of one variable depending on another variable, i.e. how one variable behaves when another one is known.
$\begin{aligned} X ∣ Y &\sim N (μ_{X} + Σ_{X Y} Σ_{YY}^{- 1} (Y - μ_{Y}), Σ_{XX} - Σ_{X Y} Σ_{YY}^{- 1} Σ_{Y X}) \\ Y ∣ X &\sim N (μ_{Y} + Σ_{Y X} Σ_{XX}^{- 1} (X - μ_{X}), Σ_{YY} - Σ_{Y X} Σ_{XX}^{- 1} Σ_{X Y}) \end{aligned}$
The mean gets shifted by how much the known variable differs from its expected value $(Y - μ_{Y})$ , which is normalized by $Σ_{YY}^{- 1}$ (in the scalar case, think $(Y - μ_{Y}) / σ_{Y}^{2}$ ), and scaled by the covariance between the two variables $Σ_{X Y}$ . This product can be thought of as translating the normalized deviation in $Y$ into the corresponding change on $X$ ’s scale.
→ $Σ_{X Y} Σ_{YY}^{- 1} Σ_{Y X}$ represents the amount of variance in $X$ that can be explained by $Y$ .

Conditioning is like taking a slice of the distribution at the known value.
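The conditioning formulas translate fairly directly into code. The following is a minimal sketch, not a reference implementation: the function name `condition` and its index-based interface are assumptions, and `np.linalg.solve` is used in place of an explicit inverse of $Σ_{YY}$ .

```python
import numpy as np

def condition(mu, Sigma, unknown, known, known_values):
    """Parameters of the unknown variables given observed values of the known ones."""
    unknown = np.asarray(unknown)
    known = np.asarray(known)
    S_uu = Sigma[np.ix_(unknown, unknown)]
    S_uk = Sigma[np.ix_(unknown, known)]
    S_kk = Sigma[np.ix_(known, known)]

    # gain = S_uk @ inv(S_kk); S_kk is symmetric, so solving and transposing is equivalent.
    gain = np.linalg.solve(S_kk, S_uk.T).T

    cond_mu = mu[unknown] + gain @ (np.asarray(known_values) - mu[known])
    cond_Sigma = S_uu - gain @ S_uk.T
    return cond_mu, cond_Sigma

# Example distribution from the top of the note, conditioning X on Y = 1.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
cond_mu, cond_Sigma = condition(mu, Sigma, unknown=[0], known=[1], known_values=[1.0])
```

For this example the formulas give a conditional mean of $0 + \frac{0.6}{2} (1 - 0) = 0.3$ and a conditional variance of $1 - \frac{0.6^{2}}{2} = 0.82$ . The conditional variance is smaller than the marginal variance $1$ and does not depend on the observed value of $Y$ .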

Visualization of marginalization and conditioning

Note: There is an interactive version at distill.

The blue curve shows the entire underlying distribution of the random variable $x_{a}$ , and the red curve shows a slice of the joint distribution of $x_{a}$ and $x_{b}$ at a specific value of $x_{b}$ .

Normally distributed variables that are uncorrelated are independent if and only if they are also jointly normally distributed. Each variable being normal on its own is not enough.

A pair $(X, Y)$ is jointly normal exactly when every linear combination $a X + bY$ is normally distributed, i.e. when $(X, Y)$ follows a multivariate normal distribution.

Visual intuition: consider $X \sim N (0, 1)$ and $Y = W X$ , where $W$ is $\pm 1$ with equal probability and independent of $X$ .

Individually, $X$ and $Y$ each look normal, but their joint distribution is unusual: whether $X$ is positive or negative, $Y$ is either $+ X$ or $- X$ with a 50/50 chance, so $|Y| = |X|$ always holds.

→ $X$ and $Y$ cannot possibly be independent, even though they’re uncorrelated!
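The counterexample is easy to simulate. The sketch below (the seed and sample size are arbitrary assumptions) checks three things: $X$ and $Y$ are nearly uncorrelated, $X^{2}$ and $Y^{2}$ are perfectly correlated, and the linear combination $X + Y$ is exactly zero about half of the time, which a normal distribution cannot produce.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed
n = 200_000

X = rng.standard_normal(n)
W = rng.choice([-1.0, 1.0], size=n)  # random sign, drawn independently of X
Y = W * X

# Uncorrelated: the sample correlation is close to 0.
corr_xy = np.corrcoef(X, Y)[0, 1]

# Not independent: X**2 and Y**2 are identical, so their correlation is 1.
corr_squares = np.corrcoef(X**2, Y**2)[0, 1]

# Not jointly normal: X + Y equals 0 whenever W == -1, a point mass of about 1/2.
frac_sum_zero = np.mean(X + Y == 0.0)

print(corr_xy, corr_squares, frac_sum_zero)
```

Since the linear combination $X + Y$ is not normally distributed, the pair $(X, Y)$ is not jointly normal, which is why zero correlation does not imply independence here.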

