In-context learning (ICL) occurs when models adapt their behavior based on examples provided in the prompt, without weight updates. One-shot and few-shot prompting are forms of ICL, while zero-shot uses only instructions without examples.
ICL differs from few-shot learning as a training paradigm - ICL is an emergent capability where models learn patterns from prompt examples, while few-shot learning explicitly trains models to learn from limited examples.
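To make the distinction concrete, a minimal sketch of the prompts involved (the sentiment task and labels are illustrative, not from any source):

```python
# Zero-shot: instruction only, no examples in the prompt.
zero_shot = (
    "Classify the sentiment of the review as positive or negative.\n"
    "Review: The plot dragged on forever.\n"
    "Sentiment:"
)

# Few-shot (ICL): same instruction plus in-prompt examples.
# The model picks up the input->label pattern at inference time,
# with no weight updates.
few_shot = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Loved every minute of it.\n"
    "Sentiment: positive\n"
    "Review: Total waste of money.\n"
    "Sentiment: negative\n"
    "Review: The plot dragged on forever.\n"
    "Sentiment:"
)
```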
Training a network to be better at in-context learning is essentially meta-learning - learning to learn.
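A minimal sketch of that idea, assuming PyTorch; the task distribution (random-slope linear regression) and all sizes are illustrative. The outer loop updates the weights (meta-learning), while each individual task must be picked up purely from the in-context examples via the hidden state:

```python
import torch
import torch.nn as nn

# Meta RNN: the weights are the meta variables (updated by the outer loop);
# the hidden state is where per-task "learning" happens, with no gradient steps.
rnn = nn.GRU(input_size=2, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 1)
opt = torch.optim.Adam([*rnn.parameters(), *readout.parameters()], lr=1e-3)

for step in range(1000):                # outer loop: meta-training
    a = torch.randn(16, 1, 1)           # a fresh task (slope) per sequence
    x = torch.randn(16, 10, 1)          # 10 in-context examples per task
    y = a * x
    # Feed (x_t, y_{t-1}) pairs; the RNN must infer the slope from context.
    prev_y = torch.cat([torch.zeros(16, 1, 1), y[:, :-1]], dim=1)
    inp = torch.cat([x, prev_y], dim=-1)
    h, _ = rnn(inp)
    pred = readout(h)
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```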
![[Language-Models-are-Few-Shot-Learners#^260d49]]
Different approaches:
The variable ratio problem:
Meta RNNs are simple, but they have many more meta variables than learned variables, making them overparametrized and prone to overfitting (see the rough count after this list).
Learned learning rules / Fast Weight Networks have far fewer meta variables than learned variables, but introduce a lot of complexity in the meta-learning network etc.
→ A variable-sharing and sparsity principle can be used to unify these approaches into a simple framework: VSML.
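A rough count to make the ratio concrete for a plain meta RNN (sizes are illustrative): the meta variables are the recurrent and input weights, while the only per-task learned variables are the activations carried in the hidden state.

```python
# Rough count of meta vs. learned variables for a plain meta RNN.
# Sizes are illustrative; the point is the O(H^2) vs. O(H) ratio.
H, D_in = 256, 16

meta_vars = H * H + H * D_in + H   # recurrent + input weights + bias (meta-learned)
learned_vars = H                   # hidden state, updated in-context ("learned")

print(f"meta variables:    {meta_vars}")     # 69888
print(f"learned variables: {learned_vars}")  # 256
print(f"ratio:             {meta_vars / learned_vars:.0f}x")  # ~273x
```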
Related papers:
Using Fast Weights to Attend to the Recent Past
HyperNetworks