Epistemic uncertainty (also known as model uncertainty or knowledge uncertainty) refers to the uncertainty that arises from a lack of knowledge or data. It's the uncertainty that can, in principle, be reduced with more data or a better model.
Think of it as the model saying, “I’m not sure about this prediction because I haven’t seen enough examples like this, or my internal understanding (my learned parameters/weights) isn’t well-constrained for this type of input.”
In contrast to aleatoric uncertainty:
Aleatoric Uncertainty (Data Uncertainty): This is the inherent randomness or noise in the data itself.
It’s the uncertainty that cannot be reduced even with infinite data. For example, if you’re predicting a coin flip, even with perfect knowledge of the coin, there’s inherent randomness. Or, if your sensor measuring temperature has some inherent noise, that’s aleatoric.
Using the variance of a model's predictions to quantify epistemic uncertainty
Since we can’t directly ask the model “how much do you not know?”, we use techniques that produce a distribution of predictions for the same input, rather than a single point estimate. The variance of this distribution is then taken as a proxy for epistemic uncertainty.
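Concretely, the quantification step is the same no matter how the samples are produced (the techniques below differ only in how they generate them). A minimal NumPy sketch, with made-up prediction values standing in for N stochastic forward passes on one input:

```python
import numpy as np

# Hypothetical example: 8 stochastic predictions for the same input,
# e.g. from an ensemble or from repeated MC-dropout passes.
predictions = np.array([0.71, 0.68, 0.74, 0.70, 0.52, 0.69, 0.73, 0.66])

mean_prediction = predictions.mean()  # the point estimate to report
epistemic_proxy = predictions.var()   # spread across passes ~ epistemic uncertainty

print(f"prediction: {mean_prediction:.3f} +/- {np.sqrt(epistemic_proxy):.3f}")
```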
Common ways this variance is generated:
Ensembles: Train multiple networks with the same architecture but different random weight initializations, or on different bootstrapped samples of the training data. For a new input, pass it through every member of the ensemble and take the variance of their predictions (see the first sketch after this list).
Monte Carlo dropout: Train a neural network with dropout layers. At test/inference time, keep dropout active and pass the same input through the network multiple times. Each pass will have a different set of neurons “dropped out,” effectively creating slightly different sub-networks, and the variance across passes is the uncertainty proxy (second sketch below).
Bayesian neural networks: Instead of learning single point estimates for the weights, BNNs learn a probability distribution over each weight. To make a prediction, you sample multiple sets of weights from these distributions, run the input through the network for each sample, and get a distribution of outputs. The variance of this output distribution reflects epistemic uncertainty (and often aleatoric too, depending on how the output layer is modeled). A toy sketch follows below.
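A minimal deep-ensemble sketch in PyTorch for the first technique; the architecture, sizes, and ensemble count here are illustrative assumptions, and training is elided:

```python
import torch
import torch.nn as nn

def make_model():
    # Each member shares the architecture but gets its own random init.
    return nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

ensemble = [make_model() for _ in range(5)]
# ... train each member independently here (own init and/or bootstrap sample) ...

x = torch.randn(1, 4)  # a single new input
with torch.no_grad():
    preds = torch.stack([m(x) for m in ensemble])  # shape: (5, 1, 1)

mean = preds.mean(dim=0)       # ensemble prediction
epistemic = preds.var(dim=0)   # disagreement between members ~ epistemic uncertainty
```

A matching MC-dropout sketch, under the same assumptions. The essential detail is keeping dropout active at inference, here via `model.train()` (note this also switches modules like BatchNorm to training mode, so enabling only the dropout layers is safer in a real model):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 32), nn.ReLU(),
    nn.Dropout(p=0.2),           # kept active at inference for MC dropout
    nn.Linear(32, 1),
)
# ... train the model as usual ...

model.train()  # keeps dropout active; each pass samples a different sub-network
x = torch.randn(1, 4)
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(30)])  # 30 stochastic passes

mean = preds.mean(dim=0)
epistemic = preds.var(dim=0)   # variance across dropout masks ~ epistemic uncertainty
```

And a toy Bayesian-layer sketch, again only illustrative: each weight gets a learned Gaussian (a mean plus a softplus-transformed rho for the standard deviation), and every forward pass samples a fresh set of weights. Training would need an ELBO objective (data likelihood plus a KL term to a weight prior), which is omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a learned Gaussian distribution over each weight."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -3.0))

    def forward(self, x):
        # Reparameterization: sample weights from N(mu, sigma^2) on every call.
        w_sigma = F.softplus(self.w_rho)
        b_sigma = F.softplus(self.b_rho)
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return F.linear(x, w, b)

layer = BayesianLinear(4, 1)
# ... train with an ELBO loss (omitted) ...

x = torch.randn(1, 4)
with torch.no_grad():
    preds = torch.stack([layer(x) for _ in range(50)])  # 50 weight samples

epistemic = preds.var(dim=0)  # spread over weight samples ~ epistemic uncertainty
```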
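In all three sketches the final step is identical to the NumPy example above: stack the per-pass predictions and read off their mean as the point estimate and their variance as the epistemic-uncertainty proxy.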
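The practical trade-off between them: ensembles usually give the best-calibrated uncertainty but multiply training cost by the ensemble size, MC dropout reuses a single trained model at the price of many inference passes, and BNNs make the weight distributions explicit but are the most involved to train.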