If its input is permuted, a permutation equivariant function applies the same permutation to its output:
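$$f(\pi(x)) = \pi(f(x)),$$

where $\pi$ denotes a permutation of the element positions.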
Unlike with permutation invariance, the input and output must have the same dimensions.
MHA is permutation equivariant (without positional encodings)
One crucial characteristic of multi-head attention is that it is permutation-equivariant with respect to its inputs. This means that if we switch two input elements in the sequence, e.g. $x_1 \leftrightarrow x_2$ (neglecting the batch dimension for now), the output is exactly the same except that elements 1 and 2 are switched. Hence, multi-head attention effectively treats its input not as a sequence, but as a set of elements.
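To make this concrete, here is a minimal sketch of the property, assuming PyTorch's built-in `nn.MultiheadAttention` as a stand-in for the layer discussed here: permuting the positions of the input sequence permutes the output in exactly the same way.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dimensions; any values with embed_dim divisible by num_heads work.
embed_dim, num_heads, seq_len = 16, 4, 5
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
mha.eval()  # ensure dropout is off so the comparison is deterministic

x = torch.randn(1, seq_len, embed_dim)  # (batch, sequence, features)
perm = torch.randperm(seq_len)          # a random permutation of the positions
x_perm = x[:, perm]                     # permuted input sequence

with torch.no_grad():
    out, _ = mha(x, x, x)                      # self-attention on the original input
    out_perm, _ = mha(x_perm, x_perm, x_perm)  # self-attention on the permuted input

# Equivariance: f(pi(x)) == pi(f(x)), up to floating-point tolerance
print(torch.allclose(out_perm, out[:, perm], atol=1e-5))  # True
```

Adding positional encodings to `x` before the attention layer would break this check, which is exactly why the property only holds without them.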