Supervised Learning (SL)
Supervised learning aims to learn a function $f: \mathcal{X} \rightarrow \mathcal{Y}$ mapping points in domain $\mathcal{X}$ to domain $\mathcal{Y}$, given training examples $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ reflecting this mapping, where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$.
In modern ML, this function is typically a large ANN $f_\theta$ with parameters denoted $\theta$.
The defining feature of SL is that learning proceeds by optimizing a loss function $\mathcal{L}(f_\theta(x), y)$ that measures the error between the model's prediction $f_\theta(x)$ and the true value $y$ paired with $x$. In general, each training example $(x, y)$ can be seen as a sample from a ground-truth distribution $p^\star(y \mid x)$, because the true data-generating function can be stochastic. Therefore, the goal of SL can be seen as learning an approximator $f_\theta$ that produces samples consistent with $p^\star(y \mid x)$, for example, by using a loss function that encourages $f_\theta$ to deterministically predict the mean or mode of $p^\star(y \mid x)$, or one that matches the distribution of outputs of $f_\theta$ to $p^\star(y \mid x)$. Crucially, this definition of SL assumes the model is trained once on a static, finite training dataset $\mathcal{D}$. We should thus not expect $f_\theta$ to accurately model data that differs significantly from its training data.
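To make this concrete, the sketch below is a minimal illustration (not code from the original text; the toy data, network size, and hyperparameters are all invented) of SL on a static dataset: a tiny two-layer network trained with an MSE loss. Because the toy ground-truth mapping is stochastic, minimizing MSE drives $f_\theta$ toward the conditional mean of $p^\star(y \mid x)$; a cross-entropy loss over discrete outputs would instead match the model's output distribution to $p^\star(y \mid x)$.

```python
import numpy as np

# Minimal SL sketch: fit f_theta to a static, finite dataset D = {(x_i, y_i)}.
rng = np.random.default_rng(0)

# Static training set, sampled once and never updated.
# Ground truth is stochastic: y = sin(x) + noise, i.e. a sample from p*(y | x).
N = 512
X = rng.uniform(-3.0, 3.0, size=(N, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(N, 1))

# A small two-layer ANN with parameters theta = (W1, b1, W2, b2).
H = 32
W1 = rng.normal(scale=0.5, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    # Forward pass: prediction f_theta(x).
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2

    # MSE loss: its minimizer is the conditional mean of p*(y | x).
    loss = np.mean((y_hat - Y) ** 2)

    # Backward pass (manual gradients for this tiny model).
    d_yhat = 2.0 * (y_hat - Y) / N
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = (d_yhat @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Gradient-descent update of theta.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# The model interpolates well near its training data, but nothing constrains
# its behaviour on inputs far outside [-3, 3] -- the out-of-distribution
# caveat noted above.
print(f"final training loss: {loss:.4f}")
```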
Nevertheless, with massive amounts of training data, large models can exhibit impressive generality, and recent scaling laws suggest that test performance should improve further with even more data. Given the benefits of data scale, contemporary state-of-the-art SL models are trained on internet-scale, offline datasets, typically harvested via web crawling.
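A common empirical form for such data scaling laws (following, e.g., Kaplan et al., 2020; the notation here is assumed rather than taken from this text) models test loss as a power law in dataset size $D$:

$$
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad \alpha_D > 0,
$$

where $D_c$ and $\alpha_D$ are empirically fitted constants, so test loss is predicted to keep decreasing, with diminishing returns, as the training set grows.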
While such datasets may capture an impressive amount of information about the world, they inevitably fall short of containing all the relevant information a model may need when deployed in the wild. All finite, offline datasets share two key shortcomings: incompleteness, as the set of all facts about the world is infinite, and stationarity, as such datasets are by definition fixed in time.
For example, our virtual assistant, if trained on a static conversational corpus, would soon see its predictions grow irrelevant as its model falls out of date with culture, world events, and even language usage itself. Indeed, all ML systems deployed in an open-world setting, with real users and peers, must continually explore and train on new data, or risk fading into irrelevance. What data should the system designer (or the system itself) collect next for further training? This is the complementary, and equally important, problem of exploration that sits beneath all ML systems in deployment, one that has been considered at length in the field of RL.¹
Footnotes
1. The entire callout is adapted from: *General intelligence requires rethinking exploration*. ↩