year: 2022
paper: arxiv
website:
code:
connections: in-context learning, bayesian inference, pretraining
TLDR
Proposes that in-context learning is implicit Bayesian inference: during pretraining, each document is generated by a latent concept, and next-token prediction trains the model to infer a posterior over these latent concepts from context. At inference time, the prompt examples narrow down which concept is active, and the model predicts accordingly. ICL thus emerges as a natural consequence of the pretraining objective, not a separate learned skill.
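
The mechanism can be sketched with a toy mixture model (my own illustration, not the paper's code): each latent concept is a distribution over tokens, a prompt is scored under every concept, and prediction marginalizes over the resulting posterior. The concept/vocabulary sizes and the i.i.d.-token assumption are simplifications for the sketch; the paper uses HMM-generated documents.

```python
import numpy as np

# Toy sketch: pretraining data is a mixture over latent concepts,
# where each concept defines a distribution over tokens.
rng = np.random.default_rng(0)
n_concepts, vocab = 3, 5
concept_token_probs = rng.dirichlet(np.ones(vocab), size=n_concepts)
prior = np.ones(n_concepts) / n_concepts  # uniform prior over concepts

def posterior_over_concepts(prompt_tokens):
    """P(concept | prompt) via Bayes' rule, assuming i.i.d. tokens given the concept."""
    log_lik = np.sum(np.log(concept_token_probs[:, prompt_tokens]), axis=1)
    log_post = np.log(prior) + log_lik
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def predict_next_token(prompt_tokens):
    """Marginal next-token distribution: sum_c P(token | c) * P(c | prompt)."""
    return posterior_over_concepts(prompt_tokens) @ concept_token_probs

# More prompt examples from one concept sharpen the posterior on it,
# which is the paper's account of why ICL improves with more demonstrations.
true_concept = 1
short = rng.choice(vocab, size=2, p=concept_token_probs[true_concept])
long_ = rng.choice(vocab, size=20, p=concept_token_probs[true_concept])
print(posterior_over_concepts(short))
print(posterior_over_concepts(long_))
```

A transformer trained on next-token prediction over such a mixture is pushed toward computing exactly this marginal, which is why no separate ICL objective is needed.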