Few-shot learning is a training paradigm in which the model is trained to fulfill tasks given only a few examples at test time.
In contrast to in-context learning (ICL), few-shot learning is an explicit training (meta-learning) paradigm, while ICL is an emergent capability (from scale, …) that arises without the model being meta-trained for it.

Few-Shot Classification

Training a classifier

We train a classifier with parameters $\theta$ on a dataset $\mathcal{D}$ to output the probability of a datapoint belonging to the class $y$ given the feature vector $\mathbf{x}$, $P_\theta(y \mid \mathbf{x})$.
The optimal parameters maximize the probability of the true labels:

$$\theta^* = \arg\max_\theta \mathbb{E}_{(\mathbf{x}, y) \in \mathcal{D}}\left[\log P_\theta(y \mid \mathbf{x})\right]$$

When training with mini-batches $B$:

$$\theta^* = \arg\max_\theta \mathbb{E}_{B \subset \mathcal{D}}\left[\sum_{(\mathbf{x}, y) \in B} \log P_\theta(y \mid \mathbf{x})\right]$$
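A minimal sketch of this mini-batch objective in PyTorch (the `model`, `loader`, and `optimizer` objects here are assumptions for illustration): minimizing cross-entropy is equivalent to maximizing $\sum \log P_\theta(y \mid \mathbf{x})$ over each batch.

```python
import torch.nn.functional as F

def train_epoch(model, loader, optimizer):
    """One epoch of maximum-likelihood training: minimizing cross-entropy
    maximizes sum of log P_theta(y | x) over each mini-batch B."""
    for x, y in loader:                    # mini-batch B
        logits = model(x)                  # unnormalized log P_theta(. | x)
        loss = F.cross_entropy(logits, y)  # = -1/|B| * sum log P_theta(y | x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```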

In few-shot classification, we sample two disjoint sets each epoch, both containing only datapoints whose labels belong to $L$, where $L$ is a subset of the labels (tasks) of $\mathcal{D}$: a support set $S^L$ and a training batch $B^L$.
The support set is part of the model input. The goal is that the model learns to generalize to other datasets by learning how to learn from the support set.
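A sketch of this episodic sampling (standard library only; the `data_by_label` mapping and the `n_way` / `k_shot` / `q_queries` parameter names are assumptions, not from the original):

```python
import random

def sample_episode(data_by_label, n_way, k_shot, q_queries):
    """Sample one episode: a label subset L, a support set S^L, and a
    disjoint training batch B^L, both drawn only from classes in L."""
    L = random.sample(list(data_by_label), n_way)  # subset of labels (tasks)
    support, batch = [], []
    for label in L:
        xs = random.sample(data_by_label[label], k_shot + q_queries)
        support += [(x, label) for x in xs[:k_shot]]  # S^L
        batch += [(x, label) for x in xs[k_shot:]]    # B^L, disjoint from S^L
    return support, batch
```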

How would such a network represent never-before-seen classes? It would need to output embeddings instead of a fixed set of class logits.
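One way to realize this (a sketch in the style of prototypical networks, which is an assumption here rather than something stated above): embed the support set, then classify a query by comparing its embedding to per-class mean embeddings, so the class set never has to be fixed at training time.

```python
import torch
import torch.nn.functional as F

def classify_with_support(embed, support_x, support_y, query_x, n_way):
    """Classify queries against never-seen classes via embeddings:
    compare each query embedding to per-class mean embeddings (prototypes)."""
    z_support = embed(support_x)              # (n_support, d)
    z_query = embed(query_x)                  # (n_query, d)
    # prototype = mean embedding of each class in the support set
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                         # (n_way, d)
    # P(class | query) via softmax over negative euclidean distances
    dists = torch.cdist(z_query, prototypes)  # (n_query, n_way)
    return F.softmax(-dists, dim=-1)
```

Because the classes are represented only through support-set embeddings, swapping in a support set of entirely new classes requires no retraining.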