year: 2024/11
paper: https://arxiv.org/pdf/2411.13676
website:
code:
connections: memory token, transformer, state space model


Meta Tokens – A set of 128 learned embeddings prepended to the input sequence, functioning as a learned cache initialization that steers attention toward relevant information. They serve a dual purpose: (i) they mitigate attention drain by acting as backstop tokens that absorb and redistribute excess attention, and (ii) they encapsulate compressed world knowledge.
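The mechanics of meta tokens can be sketched as a shared, learnable embedding matrix concatenated in front of every batch element before the sequence-mixing layers. The sketch below uses NumPy for the tensor plumbing only; the function and variable names are illustrative, not from the paper's code, and in practice `meta_tokens` would be an `nn.Parameter` updated during pretraining.

```python
import numpy as np

NUM_META = 128   # number of meta tokens (from the paper)
D_MODEL = 64     # embedding width, arbitrary for this sketch


def prepend_meta_tokens(x: np.ndarray, meta_tokens: np.ndarray) -> np.ndarray:
    """Prepend shared learned meta-token embeddings to each sequence.

    x:           (batch, seq_len, d_model) input token embeddings
    meta_tokens: (num_meta, d_model) embeddings shared across all inputs;
                 once processed by attention/SSM layers they act as a
                 learned cache (KV/state) initialization
    returns:     (batch, num_meta + seq_len, d_model)
    """
    batch = x.shape[0]
    # Broadcast the same meta tokens to every sequence in the batch.
    meta = np.broadcast_to(meta_tokens, (batch,) + meta_tokens.shape)
    return np.concatenate([meta, x], axis=1)


# Usage: a batch of 2 sequences of length 10 grows to length 138.
meta_tokens = np.random.randn(NUM_META, D_MODEL) * 0.02  # stand-in for learned weights
x = np.random.randn(2, 10, D_MODEL)
out = prepend_meta_tokens(x, meta_tokens)
print(out.shape)  # (2, 138, 64)
```

Because the meta tokens sit at the front of every sequence, early attention mass that would otherwise pile onto the first content token (the attention-sink effect) can land on them instead, which is the "backstop" role described above.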