A chunk is whatever working memory treats as “one thing”. It’s a (familiar) pattern encoded as a unit.
Working memory capacity (the number of slots) is roughly independent of chunk complexity: to a novice, “C-E-G” is 3 chunks; to a musician, “C major chord” is 1.
A chunk can be as simple as a digit or a multi-step algorithm.
Working memory holds ~4 chunks.
Effective cognitive capacity therefore scales with chunk complexity (bits per chunk): better compression packs more information into the same ~4 slots.
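A toy calculation makes the point; the bits-per-chunk values here are invented for illustration, not measured:

```python
# Same slot count, very different effective capacity.
# Bits-per-chunk values are illustrative, not empirical.
SLOTS = 4

novice_bits_per_chunk = 3.3   # e.g. a single digit (~log2(10) bits)
expert_bits_per_chunk = 30.0  # e.g. a memorized multi-piece pattern

novice_capacity = SLOTS * novice_bits_per_chunk   # ~13 bits held at once
expert_capacity = SLOTS * expert_bits_per_chunk   # 120 bits held at once
print(novice_capacity, expert_capacity)
```

Same 4 slots; an order of magnitude more information in flight.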
Chunks are recursive
A chunk’s components are themselves chunks. Each layer compresses the previous layer’s outputs into new units.
→ Expertise compounds
→ learning order shapes representation quality
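The recursion can be sketched as a tree: each node is a chunk, its children are sub-chunks, and retrieving the root costs one slot no matter how many raw elements it compresses. The musical example is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    name: str
    parts: list["Chunk"] = field(default_factory=list)

    def leaves(self) -> int:
        # Raw elements this chunk ultimately compresses.
        return 1 if not self.parts else sum(p.leaves() for p in self.parts)

# A musician's chunk hierarchy (illustrative):
c_major = Chunk("C major chord", [Chunk("C"), Chunk("E"), Chunk("G")])
g_major = Chunk("G major chord", [Chunk("G"), Chunk("B"), Chunk("D")])
progression = Chunk("I-V progression", [c_major, g_major])

print(progression.leaves())  # 6 raw notes, but 1 working-memory slot
```

Each layer of the tree is a round of compression over the layer below, which is why expertise compounds.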
Origins of the concept and the number
Miller (1956) “The Magical Number Seven, Plus or Minus Two.” Introduced the concept of chunks and the idea that capacity increases with bits/chunk. The original estimate.
Cowan (2001) “The Magical Number 4 in Short-Term Memory” (the paper you linked). Revised the number down to ~4, arguing Miller’s higher estimates reflected subjects sneaking in recoding/chunking strategies (unconsciously grouping items, e.g. 1, 4, 9, 2 → 1492, Columbus), inflating the apparent capacity. Cowan’s claim: when you control for chunking, the true limit is about 4.
Chase and Simon (1973) “Perception in Chess.” Show a real game position for 5 seconds, remove it, ask subjects to reproduce it. Masters reproduce ~16 pieces; novices ~4. With random arrangements, both get ~4. The masters don’t have more slots; they have deeper compression trees. Where a novice sees 16 individual pieces (overflowing 4 slots), a master sees 3-4 familiar multi-piece configurations.
Simon and Gilmartin (1973) Built MAPP (Memory-Aided Pattern Perceiver), a simulation using a discrimination tree: each node is a chunk, children are its sub-patterns. The model scans a board, matches what it sees against stored patterns, and loads recognized patterns into a simulated ~7-slot STM. Calibrating “how large must the tree be to reproduce master-level recall?” yielded ~50,000–100,000 stored patterns. Consistent with the ~10 years of dedicated study typically needed to reach master level.
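A drastically simplified sketch of the MAPP idea: match the board against stored multi-piece patterns, load matches into a fixed-size STM, then reconstruct from what the slots hold. The patterns, positions, and greedy matching here are invented for illustration; the real model used a discrimination net over tens of thousands of learned patterns:

```python
# Simplified MAPP-style recall (illustrative patterns, not the real model).
STM_SLOTS = 4

# An "expert" knows multi-piece configurations; a "novice" knows none.
EXPERT_PATTERNS = [
    frozenset({("K", "g1"), ("R", "f1"), ("P", "f2"), ("P", "g2"), ("P", "h2")}),  # castled king
    frozenset({("B", "b2"), ("P", "b3")}),  # fianchetto
]

def recall(board: set, patterns: list) -> set:
    stm, remaining = [], set(board)
    for pat in patterns:                 # try big stored chunks first
        if pat <= remaining and len(stm) < STM_SLOTS:
            stm.append(pat)
            remaining -= pat
    for piece in sorted(remaining):      # leftovers cost one slot each
        if len(stm) < STM_SLOTS:
            stm.append(frozenset({piece}))
    return set().union(*stm) if stm else set()

board = set().union(*EXPERT_PATTERNS) | {("Q", "d1"), ("N", "c3")}
print(len(recall(board, EXPERT_PATTERNS)))  # expert: 9 of 9 pieces
print(len(recall(board, [])))               # novice: only 4 pieces
```

Same 4 slots in both runs; the expert’s stored patterns let each slot carry several pieces.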
Understanding bottlenecks are decompression failures
Complex ideas are hard to learn because they build on other ideas.
Learning = compression
for i in range(len(array)) → iterate
32 pieces → a familiar opening
When a complex idea eludes you, it’s usually because its parts/prerequisites haven’t yet been compressed into few enough chunks. Like reading a book that starts in English but gradually shifts to Spanish.
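Learning-as-compression can be mimicked with a greedy recoder: the more patterns you already know, the fewer units the same input occupies. The chunk dictionary here is invented for illustration:

```python
def chunk_count(seq: str, known: set) -> int:
    """Greedily recode seq using known chunks; return units consumed."""
    i, units = 0, 0
    while i < len(seq):
        # Take the longest known chunk starting at i, else a single symbol.
        best = 1
        for j in range(len(seq), i, -1):
            if seq[i:j] in known:
                best = j - i
                break
        i += best
        units += 1
    return units

digits = "149217761945"
print(chunk_count(digits, set()))                     # 12 units: raw digits
print(chunk_count(digits, {"1492", "1776", "1945"}))  # 3 units: famous years
```

Twelve digits overflow 4 slots; three year-chunks fit comfortably.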
There's debate about whether people are truly holding complex chunks as single units or sneaking in sub-chunk maintenance strategies.
Memorizing means forming chunks
You create a stable pattern that can be retrieved as a unit.
Basically anything you can think about is a chunk, and chunks are on a spectrum of compressibility.
A freshly memorized fact is a flimsy chunk. It might take many slots and cycles of effort to retrieve and reconstruct.