VICReg

year: 2022
paper: https://arxiv.org/pdf/2105.04906.pdf
website:
code:

Builds upon Barlow Twins - Self-Supervised Learning via Redundancy Reduction

Invariance

Agreement between positive examples should be high. Cosine similarity could be used for this, but VICReg actually uses the mean squared Euclidean distance between the two embeddings.
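A minimal NumPy sketch of the invariance term, assuming `z_a` and `z_b` are batches of embeddings of shape (n, d) coming from two augmented views of the same images. The function name `invariance_loss` is made up for this note and is not taken from the paper's code.

```python
import numpy as np

def invariance_loss(z_a, z_b):
    """Batch mean of the squared Euclidean distance between paired embeddings."""
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    # Squared distance per pair (sum over dimensions), then average over the batch.
    return float(np.mean(np.sum((z_a - z_b) ** 2, axis=1)))
```

Identical views give a loss of 0, and the loss grows as the paired embeddings drift apart. Implementations that use a plain mean squared error also average over the d dimensions, which only rescales this value by a constant.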

Variance

Keep variance over a certain threshold: This term forces the embedding vectors of samples within a batch to be different.

They don’t try to maximize the standard deviation but just try to stop it from being very low with the help of this Hinge Loss:

$$v(Z) = \frac{1}{d} \sum_{j=1}^{d} \max\left(0,\ \gamma - S(z^{j}, \epsilon)\right)$$

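where $z^{j}$ is the vector of values of embedding dimension $j$ across the batch, $\gamma$ is the target standard deviation, and $S$ is the regularized standard deviation defined by:

$$S(x, \epsilon) = \sqrt{\mathrm{Var}(x) + \epsilon}$$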
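The hinge on the regularized standard deviation can be sketched in NumPy as below. This is an illustration under assumptions: `variance_loss` is a name made up for this note, the defaults `gamma=1.0` and `eps=1e-4` are assumed values, and the unbiased variance (`ddof=1`) is one possible choice of estimator.

```python
import numpy as np

def variance_loss(z, gamma=1.0, eps=1e-4):
    """Hinge loss that penalizes embedding dimensions whose std falls below gamma."""
    z = np.asarray(z, dtype=float)           # shape (n, d): batch of embeddings
    # Regularized standard deviation of each dimension over the batch.
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    # Dimensions with std >= gamma contribute 0; the rest contribute gamma - std.
    return float(np.mean(np.maximum(0.0, gamma - std)))
```

A fully collapsed batch (all embeddings equal) gets a loss close to `gamma`, while a batch whose dimensions are already spread out beyond `gamma` gets a loss of 0, so the term stops pushing once the threshold is reached.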

Covariance

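We don’t want dimensions in the embedding to be correlated: each dimension of the output embedding should hold different information. This is like removing linear dependence between dimensions, i.e. pushing the covariance matrix of the embeddings towards a diagonal matrix.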

Highly correlated dimensions in the output embedding matrix have a large positive covariance, uncorrelated dimensions have a covariance of $0$, and negatively correlated ones have a negative covariance. Hence, we want to push the off-diagonal entries of the covariance matrix towards $0$, which VICReg does by minimizing their squares.
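A small NumPy sketch of this covariance term, assuming a batch `z` of shape (n, d). The name `covariance_loss` is made up for this note. The sum of squared off-diagonal entries is divided by `d`, mirroring the $\frac{1}{d}$ scaling of the variance term.

```python
import numpy as np

def covariance_loss(z):
    """Sum of squared off-diagonal covariance entries, scaled by 1/d."""
    z = np.asarray(z, dtype=float)           # shape (n, d): batch of embeddings
    n, d = z.shape
    z = z - z.mean(axis=0)                   # center each dimension over the batch
    cov = (z.T @ z) / (n - 1)                # (d, d) covariance matrix
    off_diag = cov - np.diag(np.diag(cov))   # zero out the diagonal (the variances)
    return float(np.sum(off_diag ** 2) / d)
```

Squaring the entries means that positive and negative correlations are penalized alike, and the diagonal is left to the variance term.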

SimCLR → Same things same embeddings (SimCLR specifically maximizes difference between negative samples)
Simple Framework for Contrastive Lear → Different things different embeddings (Black and white)
VICReg → Explicitly try to regularize
Duality between contr and non-contr →
VICReg Code and tutorial: TODO READ!
https://imbue.com/open-source/2022-04-21-vicreg/

self-supervised learning
