Supervised Learning (SL)
Supervised learning aims to learn a function $f: \mathcal{X} \rightarrow \mathcal{Y}$ mapping points in domain $\mathcal{X}$ to domain $\mathcal{Y}$, given training examples $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ reflecting this mapping, where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$.
In modern ML, this function is typically a large ANN $f_\theta$ with parameters denoted $\theta$.
The defining feature of SL is that learning proceeds by optimizing a loss function $\mathcal{L}(f_\theta(x), y)$ that measures the error between the model's prediction $f_\theta(x)$ and the true value $y$ paired with $x$. In general, each training example $(x, y)$ can be seen as a sample from a ground-truth distribution $p^\star(y \mid x)$, because the true data-generating function can be stochastic. Therefore, the goal of SL can be seen as learning an approximator $f_\theta$ that produces samples consistent with $p^\star(y \mid x)$, for example, by using a loss function that encourages $f_\theta$ to deterministically predict the mean or mode of $p^\star(y \mid x)$, or one that matches the distribution of outputs of $f_\theta$ to $p^\star(y \mid x)$. Crucially, this definition of SL assumes the model is trained once on a static, finite training dataset $\mathcal{D}$. We should thus not expect $f_\theta$ to accurately model data that differs significantly from its training data.
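To make this concrete, the sketch below is a minimal illustration (not code from the original text; the toy data, network size, and hyperparameters are all invented) of SL on a static dataset: a tiny two-layer network trained with an MSE loss. Because the toy ground-truth mapping is stochastic, minimizing MSE drives $f_\theta$ toward the conditional mean of $p^\star(y \mid x)$; a cross-entropy loss over discrete outputs would instead match the model's output distribution to $p^\star(y \mid x)$.

```python
import numpy as np

# Minimal SL sketch: fit f_theta to a static, finite dataset D = {(x_i, y_i)}.
rng = np.random.default_rng(0)

# Static training set, sampled once and never updated.
# Ground truth is stochastic: y = sin(x) + noise, i.e. a sample from p*(y | x).
N = 512
X = rng.uniform(-3.0, 3.0, size=(N, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(N, 1))

# A small two-layer ANN with parameters theta = (W1, b1, W2, b2).
H = 32
W1 = rng.normal(scale=0.5, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    # Forward pass: prediction f_theta(x).
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2

    # MSE loss: its minimizer is the conditional mean of p*(y | x).
    loss = np.mean((y_hat - Y) ** 2)

    # Backward pass (manual gradients for this tiny model).
    d_yhat = 2.0 * (y_hat - Y) / N
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = (d_yhat @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Gradient-descent update of theta.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# The model interpolates well near its training data, but nothing constrains
# its behaviour on inputs far outside [-3, 3] -- the out-of-distribution
# caveat noted above.
print(f"final training loss: {loss:.4f}")
```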
Nevertheless, with massive amounts of training data, large models can exhibit impressive generality, and recent scaling laws suggest that test performance should improve further with even more data. Given the benefits of data scale, contemporary state-of-the-art SL models are trained on internet-scale, offline datasets, typically harvested via web crawling.
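A common empirical form for such data scaling laws (following, e.g., Kaplan et al., 2020; the notation here is assumed rather than taken from this text) models test loss as a power law in dataset size $D$:

$$
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad \alpha_D > 0,
$$

where $D_c$ and $\alpha_D$ are empirically fitted constants, so test loss is predicted to keep decreasing, with diminishing returns, as the training set grows.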
While such datasets may capture an impressive amount of information about the world, they inevitably fall short of containing all the relevant information a model may need when deployed in the wild. All finite, offline datasets share two key shortcomings: incompleteness, as the set of all facts about the world is infinite, and stationarity, as such datasets are by definition fixed in time.
For example, our virtual assistant, if trained on a static conversational corpus, would soon see its predictions grow irrelevant as its model falls out of date with culture, world events, and even language usage itself. Indeed, all ML systems deployed in an open-world setting, with real users and peers, must continually explore and train on new data, or risk fading into irrelevance. What data should the system designer (or the system itself) collect next for further training? This is the complementary, and equally important, problem of exploration that sits beneath all ML systems in deployment, one that has been considered at length in the field of RL.¹
Footnotes
1. The entire callout is adapted from: *General intelligence requires rethinking exploration*. ↩