Covariance is just like correlation - it tells you how two random variables change together - with the difference that covariance is not necessarily bounded / normalized between $-1$ and $1$.

In fact, the covariance matrix is the unnormalized correlation matrix.

Covariance

The covariance between two variables $X$ and $Y$ is calculated as the average of the product of their deviations from their respective means $\mu_X$ and $\mu_Y$:

$$\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big]$$

Expanding the product gives

$$\operatorname{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$$

This final version gives us a nice interpretation:
Covariance is the difference between the expected value of the product of two random variables under their joint behavior vs. as if they were independent. So if they are independent, both terms are the same and the covariance is 0.
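A quick numerical check that the two forms agree (a sketch with made-up data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # y partly depends on x

# Definition: average product of deviations from the means
cov_def = np.mean((x - x.mean()) * (y - y.mean()))

# Equivalent form: E[XY] - E[X]E[Y]
cov_alt = np.mean(x * y) - x.mean() * y.mean()

print(cov_def, cov_alt)  # both ≈ 0.5, identical up to floating point
```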

Covariance can be interpreted as:

$\operatorname{Cov}(X, Y) > 0$ → when $X$ increases, $Y$ tends to increase as well, and vice versa. It’s more likely to find $X$ and $Y$ both above or both below their means, i.e. you’re more likely to sample points where both are large or both are small.
$\operatorname{Cov}(X, Y) < 0$ → when $X$ increases, $Y$ tends to decrease, and vice versa. It’s less likely to find $X$ and $Y$ both above or both below their means, i.e. you’re more likely to sample points where one is large and the other is small.
$\operatorname{Cov}(X, Y) = 0$ → no linear relationship between $X$ and $Y$, i.e. knowing $X$ tells you nothing (linearly) about how large $Y$ will be.

The covariance of a random variable and itself is its variance: $\operatorname{Cov}(X, X) = \mathbb{E}\big[(X - \mu_X)^2\big] = \operatorname{Var}(X)$

Covariance is commutative: $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$
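Both properties are easy to verify numerically (a sketch with arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)

def cov(a, b):
    """Population covariance: average product of deviations from the means."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# Cov(X, X) = Var(X)
assert np.isclose(cov(x, x), np.var(x))

# Cov(X, Y) = Cov(Y, X)
assert np.isclose(cov(x, y), cov(y, x))
```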

Limitations

Covariance doesn’t indicate the strength of the relationship, nor its causality. If two variables are independent, their covariance will be zero, but the reverse is not true, because covariance does not take non-linear relationships into account.
However, for a multivariate normal distribution, zero covariance does imply independence.
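A classic counterexample, sketched numerically: with $X$ symmetric around 0 and $Y = X^2$, the variables are completely dependent, yet their covariance is (near) zero:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)  # symmetric around 0
y = x ** 2                      # fully determined by x, but non-linearly

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # ≈ 0 despite total dependence
```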

It is also sensitive to the scale of the variables, making it difficult to compare covariances across different datasets.

![[covariance-20240731122500640.webp|center|https://distill.pub/2019/visual-exploration-gaussian-processes/]]

Covariance Matrix

Split this + all the references into separate note…


The second moment of a random vector $\mathbf{x}$ is $\mathbb{E}[\mathbf{x}\mathbf{x}^\top]$ → the second moment matrix contains all pairwise products ($\mathbb{E}[x_i x_j]$).
When centered (mean subtracted), this becomes the covariance matrix (or “central second moment”):

$$\Sigma = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^\top\big]$$

The diagonal elements represent the variance of each variable, while the off-diagonal elements represent the covariance between pairs of variables:

$$\Sigma_{ij} = \operatorname{Cov}(x_i, x_j), \qquad \Sigma_{ii} = \operatorname{Var}(x_i)$$
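Assembling the matrix element-wise and comparing against `np.cov` (a sketch; `bias=True` matches the $1/n$ average used in the definition above):

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=5_000)
x2 = 0.3 * x1 + rng.normal(size=5_000)

def cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

sigma = np.array([
    [cov(x1, x1), cov(x1, x2)],   # diagonal entries are variances
    [cov(x2, x1), cov(x2, x2)],   # off-diagonal entries are covariances
])

assert np.allclose(sigma, np.cov(x1, x2, bias=True))
```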

The covariance matrix is always positive semidefinite: for any vector $\mathbf{v}$, $\mathbf{v}^\top \Sigma \mathbf{v} = \operatorname{Var}(\mathbf{v}^\top \mathbf{x}) \ge 0$.
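Numerically, this means the eigenvalues of any covariance matrix are non-negative (a quick NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(5, 1_000))   # 5 variables, 1000 samples
sigma = np.cov(data)                 # 5x5 covariance matrix

eigvals = np.linalg.eigvalsh(sigma)  # eigvalsh: eigenvalues of a symmetric matrix
assert np.all(eigvals >= -1e-10)     # all non-negative (up to floating point)
```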

See also multivariate gaussian distribution.

Illustrated with a simple example

How much someone likes some fruit, example table (higher score means they like the fruit more):

| Apple | Banana |
| ----- | ------ |
| 1     | 1      |
| 3     | 0      |
| -1    | -1     |

The means of both random variables:

$$\mu_{\text{Apple}} = \frac{1 + 3 - 1}{3} = 1, \qquad \mu_{\text{Banana}} = \frac{1 + 0 - 1}{3} = 0$$

The covariance matrix of these two random variables has the following form:

$$\Sigma = \begin{pmatrix} \operatorname{Var}(A) & \operatorname{Cov}(A, B) \\ \operatorname{Cov}(B, A) & \operatorname{Var}(B) \end{pmatrix} = \begin{pmatrix} 8/3 & 2/3 \\ 2/3 & 2/3 \end{pmatrix}$$

Since $\operatorname{Cov}(A, B) = \operatorname{Cov}(B, A)$, we know the elements of one triangle, so we can just mirror it.
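The fruit example end-to-end (a sketch; `bias=True` gives the population $1/n$ normalization, matching the average-of-deviation-products definition):

```python
import numpy as np

apple  = np.array([1, 3, -1])
banana = np.array([1, 0, -1])

print(apple.mean(), banana.mean())        # → 1.0 0.0

sigma = np.cov(apple, banana, bias=True)  # population (1/n) covariance matrix
print(sigma)                              # [[8/3, 2/3], [2/3, 2/3]]
```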

References

gpt
cov matrix yt