memorization bootstraps generalization

Before you can do any meaningful cognition with a concept, you need to memorize it.
To reason effectively about a concept, you need to fit it inside your working memory, i.e. compress it into a chunk. A chunk is a composable primitive.
Learning order shapes representation quality: memorizing the primitives first lets you reason about the concept effectively. In other words, you generalize more quickly.
Because memorization is an active, interpretative process, it can shorten the unpleasant early stages of learning: it lowers working memory load, which makes complex topics easier to learn.

It’s easier to first memorize and then generalize, rather than trying to generalize right away!

The more you bias your batch, the faster you can learn, up to a point! There is a sweet spot where you completely avoid the loss plateau phase by letting the model memorize quickly first and then generalize.
This means curriculum learning works here!
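
A minimal sketch of what "biasing the batch" could look like in practice. Everything here is an assumption for illustration, not the setup from the linked note: the function names `sample_biased_batch` and `bias_schedule`, the idea of a small fixed "core" subset of tasks, and the linear annealing of the bias are all hypothetical choices.

```python
import random


def sample_biased_batch(num_tasks, batch_size, bias, core_size, rng):
    """Draw task ids for one batch (hypothetical helper).

    With probability `bias`, a task id comes from a small fixed "core"
    subset (ids 0..core_size-1) that repeats often enough to be memorized.
    Otherwise it is drawn uniformly from all `num_tasks` tasks.
    """
    batch = []
    for _ in range(batch_size):
        if rng.random() < bias:
            batch.append(rng.randrange(core_size))
        else:
            batch.append(rng.randrange(num_tasks))
    return batch


def bias_schedule(step, total_steps, start_bias=0.9):
    """Linearly anneal the bias towards 0 (assumed curriculum shape):
    memorize the core tasks first, then broaden to the full distribution."""
    frac = min(step / total_steps, 1.0)
    return start_bias * (1.0 - frac)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 500, 1000):
        bias = bias_schedule(step, total_steps=1000)
        batch = sample_biased_batch(
            num_tasks=10_000, batch_size=8, bias=bias, core_size=4, rng=rng
        )
        print(step, round(bias, 2), batch)
```

Early batches are dominated by the few core tasks, later ones approach uniform sampling. Where the sweet spot for `start_bias` lies would have to be found empirically.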

Related note: general/General-Purpose-In-Context-Learning-by-Meta-Learning-Transformers

On a personal note, I think school gave me an aversion to rote learning / memorization, which I’m now consciously trying to reverse, because it really is harder to learn new things well/deeply/quickly (mid- to long-term) if you don’t memorize the vocab/chords/terms/methods. Forcing yourself to memorize sucks for a shorter period of time than trying to talk/play/understand without having those primitives.
It’s most obvious to me with instruments: I always avoided learning anything by heart unless I absolutely had to in order to progress (or I just stopped progressing…).
