Associative memory is a type of memory that is retrieved by association, i.e. by similarity or relatedness to other memories. Importantly, the full memory can be reconstructed from partial information.
For example, if you see a certain piece of clothing, it might remind you of a person who frequently wears it.

Memory in the brain is associative. It is stored in reference frames based on grid cells.

This also explains why it is easier to remember things if you assign them to spatial locations, e.g. in a “memory palace” (the “method of loci”).
The brain prefers to store things in reference frames.
Recalling things means mentally moving through those reference frames. (Or “thinking”, if you will, though I am not so sure about that yet: I think this only covers the interpolative memory part. Where is the “program synthesis” part?)
One can move through reference frames physically, e.g. by walking through a room or touching an object, or mentally, by thinking about them.

Outer-product memory

The rank-1 matrix $W = v k^\top$ produced by an outer product can be interpreted as storing an association from pattern $k$ (key) to pattern $v$ (value).
$W$ is a linear map that takes a query $q$ which, if similar to $k$, returns something similar to $v$:

$$W q = v k^\top q = (k^\top q)\, v$$

If the key and the query are unit norm, $v$ is exactly scaled by the cosine similarity of $q$ and $k$.

outer product stores association from key to value
inner product measures similarity of query to key
→ Together, they implement content-based addressing, aka associative memory.
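
A minimal sketch of the single-pair case, using plain Python lists to stay dependency-free. The concrete key, value, and queries are made up for illustration, and the helper names (`outer`, `matvec`) are not from any library.

```python
def outer(v, k):
    """Rank-1 matrix W = v k^T, stored as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]

def matvec(W, q):
    """Matrix-vector product W q."""
    return [sum(w * x for w, x in zip(row, q)) for row in W]

# Hypothetical unit-norm key and an arbitrary value pattern.
k = [0.6, 0.8]
v = [1.0, 2.0, 3.0]
W = outer(v, k)            # store the association k -> v

q_same = [0.6, 0.8]        # identical to the key (cosine similarity 1)
q_orth = [0.8, -0.6]       # orthogonal to the key (cosine similarity 0)
q_near = [1.0, 0.0]        # unit norm, cosine similarity 0.6 with the key

recall_same = matvec(W, q_same)   # approximately v
recall_orth = matvec(W, q_orth)   # approximately the zero vector
recall_near = matvec(W, q_near)   # approximately 0.6 * v
```

The inner product decides how strongly the stored value comes back: the matching query recovers `v`, the orthogonal query recovers nothing, and the partially similar query recovers `v` scaled by its cosine similarity to the key.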

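For many pairs $(k_i, v_i)$, we stack keys and values as the columns of $K = [k_1, \dots, k_n]$ and $V = [v_1, \dots, v_n]$ and build a weight matrix $W = \sum_i v_i k_i^\top = V K^\top$ that stores all associations $k_i \to v_i$. Then

$$W q = V K^\top q = \sum_i (k_i^\top q)\, v_i$$

So $W q$ computes the similarity of the query to all keys ($K^\top q$), and returns a similarity-weighted sum of all values.
If the keys are orthonormal and the query is one of the stored keys $k_j$, this returns exactly the corresponding value $v_j$.
Too many non-orthogonal keys clutter recall: every key with $k_i^\top q \neq 0$ mixes its value into the result, and at most $d$ keys can be mutually orthogonal in $d$ dimensions.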
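
To make the crosstalk concrete, here is a small sketch that recalls from two stored pairs, once with orthonormal keys and once with keys 30 degrees apart. The keys and values are invented for illustration, and `recall` is a hypothetical helper that computes the sum of similarity-weighted values directly instead of building the weight matrix.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def recall(keys, values, q):
    """Similarity-weighted sum of values: sum_i (k_i . q) v_i."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        sim = dot(k, q)                           # similarity of query to this key
        out = [o + sim * x for o, x in zip(out, v)]
    return out

values = [[1.0, 0.0], [0.0, 1.0]]

# Orthonormal keys: querying with a stored key returns its value exactly.
ortho_keys = [[1.0, 0.0], [0.0, 1.0]]
clean = recall(ortho_keys, values, ortho_keys[0])

# Unit-norm keys only 30 degrees apart: the second value leaks into the recall.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
skew_keys = [[1.0, 0.0], [c, s]]
noisy = recall(skew_keys, values, skew_keys[0])
```

With the orthonormal keys, `clean` equals the first value. With the skewed keys, `noisy` is the first value plus the second value scaled by `c` (about 0.87), which is the clutter described above.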
Mitigations include:

Attention is an associative memory

In a self-attention head, a query retrieves content by matching against keys and mixing the corresponding values → content-addressable recall aka associative memory in a differentiable key-value memory.
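
As a rough illustration, the sketch below runs one query through standard scaled dot-product softmax attention. The note itself does not specify the scoring function, so softmax and the `1/sqrt(d)` scaling are assumptions here, and the keys, values, and the `attend` helper are made up for the example.

```python
import math

def attend(q, keys, values):
    """Single-query attention: softmax over key similarities, then mix values."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(k, q)) / math.sqrt(d) for k in keys]
    m = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, weights

# Hypothetical memory with three key-value slots.
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

q = [4.0, 0.0]                                    # matches the first key
out, weights = attend(q, keys, values)
```

Compared with the purely linear outer-product recall, the softmax sharpens the similarities, so almost all of the weight lands on the best-matching key and `out` is close to the first value.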

Hopfield Network
Boltzmann Machines

Linear Transformers Are Secretly Fast Weight Programmers