memory in transformers

Learned prompt/prefix tokens, aka memory tokens
Feeding the processed memory tokens back in with the next segment: Recurrent Memory Transformer (RMT)
Extending memory by keeping old KV from previous segments: Transformer-XL
RMT extended / applied to RL: Recurrent Action Transformer with Memory
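
A rough sketch of how the first three ideas differ mechanically, assuming a single attention head with no learned projections, masking, or positional encoding. The names `rmt_step` and `xl_step` are made up for illustration and are not taken from any of the papers or repos. The zero-initialised `memory` stands in for what would be learned embeddings, and the real Transformer-XL also stops gradients through the cache and uses relative positions, which this skips.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention (unmasked, for brevity)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def rmt_step(segment, memory):
    """RMT-style step: prepend memory tokens, process everything together,
    then read the processed memory slots back out for the next segment."""
    x = np.concatenate([memory, segment], axis=0)
    out = attention(x, x, x)
    n_mem = memory.shape[0]
    return out[n_mem:], out[:n_mem]  # (segment outputs, updated memory)

def xl_step(segment, cache, max_cache):
    """Transformer-XL-style step: queries come from the current segment only,
    keys/values also include cached states from earlier segments."""
    kv = np.concatenate([cache, segment], axis=0)
    out = attention(segment, kv, kv)
    start = max(0, kv.shape[0] - max_cache)  # keep only the newest max_cache rows
    return out, kv[start:]

rng = np.random.default_rng(0)
d, seg_len, n_mem = 8, 4, 2
segments = [rng.normal(size=(seg_len, d)) for _ in range(3)]

memory = np.zeros((n_mem, d))  # assumption: stands in for learned memory tokens
cache = np.zeros((0, d))       # empty cache before the first segment
for seg in segments:
    rmt_out, memory = rmt_step(seg, memory)
    xl_out, cache = xl_step(seg, cache, max_cache=6)
```

The main contrast: in the RMT-style step the memory is a fixed number of slots that get rewritten every segment, while in the Transformer-XL-style step the memory is a sliding window of raw past states whose size is capped by `max_cache`.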

Some things I want to look into / compare in more depth at some point:

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
https://github.com/lucidrains/memory-transformer-xl/tree/master (README description seems similar to Infini-attention on a skim)
https://github.com/lucidrains/memorizing-transformers-pytorch
