self-attention, but the keys and values come from a different source than the queries, not from the same input.

Cross-Attention


Queries correspond to the rows of the attention matrix.
Keys correspond to the columns of the attention matrix.

Note: Neither the number of tokens nor the dimension of the two “modalities” needs to match. However, in practice they often do (we use a single d_model, as in the image above).
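
As a quick sketch of both points above (queries index the rows, keys index the columns, and the two sources can have different numbers of tokens), here is a minimal single-head cross-attention in plain PyTorch. All names and sizes are made up for illustration; both sequences are projected into the same d_model here.

```python
# Minimal single-head cross-attention sketch (illustrative names and sizes).
import torch
from torch import nn

d_model = 64
n_query_tokens, n_context_tokens = 10, 7

x = torch.randn(1, n_query_tokens, d_model)          # source of the queries
context = torch.randn(1, n_context_tokens, d_model)  # source of the keys/values

to_q = nn.Linear(d_model, d_model)
to_k = nn.Linear(d_model, d_model)
to_v = nn.Linear(d_model, d_model)

q = to_q(x)        # queries come from one input ...
k = to_k(context)  # ... keys and values from the other source
v = to_v(context)

# Attention matrix: one row per query token, one column per key token.
scores = q @ k.transpose(-2, -1) / d_model ** 0.5
attn = scores.softmax(dim=-1)   # shape (1, 10, 7)
out = attn @ v                  # shape (1, 10, d_model)
print(attn.shape, out.shape)
```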

HuggingFace Diffusers code
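
Below is a simplified sketch of the pattern followed by the cross-attention modules in Diffusers, not the library's actual code; class and argument names are illustrative. The key/value source can have its own width (context_dim), and when no second input is passed the module falls back to ordinary self-attention.

```python
# Sketch of a Diffusers-style cross-attention module (illustrative, not the library API).
import torch
from torch import nn

class CrossAttentionSketch(nn.Module):
    def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64):
        super().__init__()
        inner_dim = heads * dim_head
        context_dim = context_dim or query_dim  # no context dim given -> self-attention setup
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_q = nn.Linear(query_dim, inner_dim, bias=False)
        self.to_k = nn.Linear(context_dim, inner_dim, bias=False)
        self.to_v = nn.Linear(context_dim, inner_dim, bias=False)
        self.to_out = nn.Linear(inner_dim, query_dim)

    def forward(self, hidden_states, encoder_hidden_states=None):
        # If no second source is given, keys/values come from the queries' own input
        # and the module degenerates to ordinary self-attention.
        context = encoder_hidden_states if encoder_hidden_states is not None else hidden_states
        b = hidden_states.shape[0]
        q, k, v = self.to_q(hidden_states), self.to_k(context), self.to_v(context)

        def split_heads(t):
            # (batch, tokens, heads * dim_head) -> (batch, heads, tokens, dim_head)
            return t.view(b, t.shape[1], self.heads, -1).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)  # (b, heads, n_q, n_ctx)
        out = attn @ v                                                 # (b, heads, n_q, dim_head)
        out = out.transpose(1, 2).reshape(b, -1, self.heads * out.shape[-1])
        return self.to_out(out)

# e.g. image tokens attending to text embeddings of a different width and length
module = CrossAttentionSketch(query_dim=320, context_dim=768)
image_tokens = torch.randn(2, 1024, 320)
text_tokens = torch.randn(2, 77, 768)
print(module(image_tokens, text_tokens).shape)  # torch.Size([2, 1024, 320])
```

This is exactly the situation in Stable Diffusion, where the U-Net's image tokens attend to the text encoder's embeddings, which have a different width and a different (fixed) number of tokens.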

Cross-Attention in the Transformer Architecture (also Stable Diffusion, …)