Few-shot learning is a training paradigm in which the model is trained to fulfill tasks given only a few examples at test time.
In contrast to in-context learning (ICL), few-shot learning is an explicit training (meta-learning) paradigm, while ICL is an emergent capability (from scale, …) that arises without the model being meta-trained for it.

Few-Shot Classification

Training a classifier

We train a classifier with parameters $\theta$ on a dataset $\mathcal{D}$ to output the probability of a datapoint belonging to the class $y$ given the feature vector $\mathbf{x}$, $P_\theta(y \mid \mathbf{x})$.
The optimal parameters maximize the probability of the true labels:

$$\theta^* = \arg\max_\theta \mathbb{E}_{(\mathbf{x}, y) \in \mathcal{D}}\left[\log P_\theta(y \mid \mathbf{x})\right]$$

When training with mini-batches $B$:

$$\theta^* = \arg\max_\theta \mathbb{E}_{B \subset \mathcal{D}}\left[\sum_{(\mathbf{x}, y) \in B} \log P_\theta(y \mid \mathbf{x})\right]$$
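A minimal sketch of this mini-batch objective in PyTorch (the `model`, `loader`, and `optimizer` objects here are assumptions for illustration): minimizing cross-entropy is equivalent to maximizing $\sum \log P_\theta(y \mid \mathbf{x})$ over each batch.

```python
import torch.nn.functional as F

def train_epoch(model, loader, optimizer):
    """One epoch of maximum-likelihood training: minimizing cross-entropy
    maximizes sum of log P_theta(y | x) over each mini-batch B."""
    for x, y in loader:                    # mini-batch B
        logits = model(x)                  # unnormalized log P_theta(. | x)
        loss = F.cross_entropy(logits, y)  # = -1/|B| * sum log P_theta(y | x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```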

In few-shot classification, we sample two disjoint sets each epoch, both containing only datapoints whose labels belong to $L$, where $L$ is a subset of the labels (tasks) of $\mathcal{D}$: a support set $S^L$ and a training batch $B^L$.
The support set is part of the model input. The goal is that the model learns to generalize to other datasets by learning how to learn from the support set.
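A sketch of this episodic sampling (standard library only; the `data_by_label` mapping and the `n_way` / `k_shot` / `q_queries` parameter names are assumptions, not from the original):

```python
import random

def sample_episode(data_by_label, n_way, k_shot, q_queries):
    """Sample one episode: a label subset L, a support set S^L, and a
    disjoint training batch B^L, both drawn only from classes in L."""
    L = random.sample(list(data_by_label), n_way)  # subset of labels (tasks)
    support, batch = [], []
    for label in L:
        xs = random.sample(data_by_label[label], k_shot + q_queries)
        support += [(x, label) for x in xs[:k_shot]]  # S^L
        batch += [(x, label) for x in xs[k_shot:]]    # B^L, disjoint from S^L
    return support, batch
```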

How would such a network represent never-before-seen classes? It would need to output embeddings instead of a fixed set of class logits.
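One way to realize this (a sketch in the style of prototypical networks, which is an assumption here rather than something stated above): embed the support set, then classify a query by comparing its embedding to per-class mean embeddings, so the class set never has to be fixed at training time.

```python
import torch
import torch.nn.functional as F

def classify_with_support(embed, support_x, support_y, query_x, n_way):
    """Classify queries against never-seen classes via embeddings:
    compare each query embedding to per-class mean embeddings (prototypes)."""
    z_support = embed(support_x)              # (n_support, d)
    z_query = embed(query_x)                  # (n_query, d)
    # prototype = mean embedding of each class in the support set
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                         # (n_way, d)
    # P(class | query) via softmax over negative euclidean distances
    dists = torch.cdist(z_query, prototypes)  # (n_query, n_way)
    return F.softmax(-dists, dim=-1)
```

Because the classes are represented only through support-set embeddings, swapping in a support set of entirely new classes requires no retraining.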