Bias refers to the error introduced by approximating a real-world problem with a simplified model. A high-bias model makes strong assumptions about the data and tends to underfit - it’s too simple to capture the underlying patterns. Think of a linear regression trying to fit clearly non-linear data.
Variance refers to how much the model’s predictions would change if we trained it on different training data. A high-variance model is very sensitive to the training data and tends to overfit - it learns the noise in the training data rather than the true underlying patterns. Think of a high-degree polynomial that passes through every training point but oscillates wildly between them.
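To see both failure modes concretely, here’s a minimal sketch (numpy only; the degrees, sample size, and noise level are illustrative choices, nothing canonical): a degree-1 polynomial underfits a noisy sine wave, while a high-degree one fits the training points almost perfectly yet strays far from the true function between them.

```python
# Minimal sketch: underfitting (high bias) vs. overfitting (high variance).
# All numbers (degrees, sample size, noise) are illustrative choices.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 20))
y = np.sin(x) + rng.normal(0, 0.2, size=x.shape)  # noisy non-linear data

grid = np.linspace(0, 2 * np.pi, 200)  # dense grid for checking the fit
for degree in (1, 15):
    model = Polynomial.fit(x, y, deg=degree)  # least-squares polynomial fit
    train_mse = np.mean((model(x) - y) ** 2)
    true_mse = np.mean((model(grid) - np.sin(grid)) ** 2)  # error vs. noise-free truth
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, MSE vs. truth {true_mse:.3f}")
# Typical result: degree 1 is bad everywhere (bias); degree 15 is near-perfect
# on the training points but much worse between them (variance).
```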
In a nutshell
→ As we make our model more complex (reducing bias), it becomes more sensitive to the training data (increasing variance).
→ As we make our model simpler (reducing variance), it makes stronger assumptions (increasing bias).
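You can estimate both terms empirically: retrain the same model class on many freshly drawn training sets and look at how its predictions behave at fixed test points. A minimal numpy sketch (degrees, training-set size, and trial count are again arbitrary illustrative choices):

```python
# Empirical bias/variance estimate: train many models of the same class on
# independently sampled training sets, then measure, at fixed test points,
# bias^2 = (mean prediction - truth)^2 and variance = spread of predictions.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
true_fn = np.sin
x_test = np.linspace(0, 2 * np.pi, 50)

def sample_train_set(n=20, noise=0.2):
    x = rng.uniform(0, 2 * np.pi, n)
    return x, true_fn(x) + rng.normal(0, noise, size=n)

for degree in (1, 3, 9):
    # Predictions of 200 independently trained models at the test points.
    preds = np.array([
        Polynomial.fit(*sample_train_set(), deg=degree)(x_test)
        for _ in range(200)
    ])
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree {degree}: bias^2 {bias_sq:.4f}, variance {variance:.4f}")
# As the degree grows, bias^2 shrinks while variance grows - the tradeoff above.
```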
“Model” can also refer to different things, like the return estimator in reinforcement learning:
Bias-variance tradeoff in policy optimization
→ Using actual sampled returns (vanilla policy gradient, e.g. REINFORCE) gives high variance, because each return sums up many random events along the trajectory (and credit assignment across timesteps is unclear), but no bias, because we’re averaging real samples of the quantity we want to estimate.
→ Using a value function (as in actor-critic) reduces variance, because we bootstrap from a simpler, more stable estimate, but introduces bias, because the learned value function generally doesn’t match the true returns (both estimates are contrasted in the sketch after this list).
→ While variance is annoying - it can make training unstable and demands more samples - bias can be a bigger problem: your model might learn the wrong thing, and even with infinite samples never find the optimal policy (it fails to converge, or converges to a suboptimal solution).
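To make the contrast concrete, here’s a minimal sketch with made-up numbers: `rewards` is one episode’s reward sequence and `values` stands in for a learned (imperfect) critic V(s_t) - both are assumptions of this illustration, not output of a real agent.

```python
# Minimal sketch: Monte Carlo returns vs. one-step bootstrapped targets.
import numpy as np

gamma = 0.99
rewards = np.array([0.0, 0.0, 1.0, 0.0, 2.0])        # r_0 .. r_{T-1} (made up)
values = np.array([0.5, 0.6, 0.9, 0.4, 1.0, 0.0])    # imperfect critic V(s_0) .. V(s_T)

# (1) Monte Carlo return G_t = sum_k gamma^k * r_{t+k}: unbiased, but it sums
# many random rewards, so it has high variance.
G = np.zeros_like(rewards)
running = 0.0
for t in reversed(range(len(rewards))):
    running = rewards[t] + gamma * running
    G[t] = running

# (2) One-step bootstrapped target r_t + gamma * V(s_{t+1}): only one random
# reward, so low variance - but biased whenever V is wrong.
td_target = rewards + gamma * values[1:]

print("Monte Carlo returns:", G)
print("TD(0) targets:      ", td_target)
```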