year: 2024/11
paper: https://arxiv.org/pdf/2411.13676
website:
code:
connections: memory token, transformer, state space model


Meta Tokens – A set of 128 learned embeddings prepended to the input sequence, functioning as a learned cache initialization that steers attention toward relevant information. They serve a dual purpose: (i) they mitigate attention drain by acting as backstop tokens that absorb and redistribute excess attention, and (ii) they encapsulate compressed world knowledge.
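The mechanics of meta tokens can be sketched as a shared, learnable embedding matrix concatenated in front of every batch element before the sequence-mixing layers. The sketch below uses NumPy for the tensor plumbing only; the function and variable names are illustrative, not from the paper's code, and in practice `meta_tokens` would be an `nn.Parameter` updated during pretraining.

```python
import numpy as np

NUM_META = 128   # number of meta tokens (from the paper)
D_MODEL = 64     # embedding width, arbitrary for this sketch


def prepend_meta_tokens(x: np.ndarray, meta_tokens: np.ndarray) -> np.ndarray:
    """Prepend shared learned meta-token embeddings to each sequence.

    x:           (batch, seq_len, d_model) input token embeddings
    meta_tokens: (num_meta, d_model) embeddings shared across all inputs;
                 once processed by attention/SSM layers they act as a
                 learned cache (KV/state) initialization
    returns:     (batch, num_meta + seq_len, d_model)
    """
    batch = x.shape[0]
    # Broadcast the same meta tokens to every sequence in the batch.
    meta = np.broadcast_to(meta_tokens, (batch,) + meta_tokens.shape)
    return np.concatenate([meta, x], axis=1)


# Usage: a batch of 2 sequences of length 10 grows to length 138.
meta_tokens = np.random.randn(NUM_META, D_MODEL) * 0.02  # stand-in for learned weights
x = np.random.randn(2, 10, D_MODEL)
out = prepend_meta_tokens(x, meta_tokens)
print(out.shape)  # (2, 138, 64)
```

Because the meta tokens sit at the front of every sequence, early attention mass that would otherwise pile onto the first content token (the attention-sink effect) can land on them instead, which is the "backstop" role described above.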