An MLP is a feedforward neural network consisting of multiple fully connected layers, with nonlinear activation functions in between.

They are usually applied position wise to input features for sequential data (especially if the sequence length varies and sliding window doesn’t make sense).

MLP block with residual connection




… expansion factor. Usually

… kd key vectors
… kd value vectors
keeps only the strong matches