Helps to prevent Overfitting
Overfitting means high variance: variance tells us how much the prediction changes when the training data changes (sensitivity to the training data).
High variance can come from lots of parameters / flexibility (without regularization).
When overfitting, the NN exaggerates the importance of single datapoints → high weights.
Regularization limits this flexibility by lowering model weights.
- Experience shows that trained NNs often have small weights.
- Smaller weights lead to less extreme outputs (in classification, less extreme probabilities), which is desirable for an untrained model.
- It’s a known property of prediction models that adding a component to the loss function that prefers small weights often improves prediction performance. This approach is also known as regularization or weight decay in non-Bayesian NNs. [1]
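As a sketch of that statement: the usual training loss gets an extra term that prefers small weights, scaled by a penalty strength $\lambda$ (the symbols $\mathcal{L}_{\text{data}}$, $\Omega$ and $\lambda$ are just illustrative notation, not from the original notes):

$$\mathcal{L}_{\text{reg}}(w) = \mathcal{L}_{\text{data}}(w) + \lambda \, \Omega(w)$$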
L1 & L2 regularization add the weights to the cost calculation to penalize large weights.
The penalty strength is a hyperparameter between 0 and 1 (see the sketch after the two variants below).
L1 regularization (lasso): penalizes the sum of absolute weight values → tends to push weights to exactly zero (sparsity).
L2 regularization (ridge / weight decay): penalizes the sum of squared weight values → shrinks all weights towards zero.
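A minimal sketch of adding either penalty to the loss by hand (assuming PyTorch; the toy model, data and the λ value are made up for illustration):

```python
import torch
import torch.nn as nn

# toy model and data, only for illustration
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

lam = 0.01          # penalty hyperparameter (lambda), assumed value
mse = nn.MSELoss()

def regularized_loss(kind="l2"):
    data_loss = mse(model(x), y)
    if kind == "l1":
        # L1: sum of absolute weight values -> pushes weights towards exactly 0 (sparsity)
        penalty = sum(p.abs().sum() for p in model.parameters())
    else:
        # L2: sum of squared weight values -> shrinks all weights towards 0 (weight decay)
        penalty = sum((p ** 2).sum() for p in model.parameters())
    return data_loss + lam * penalty

loss = regularized_loss("l2")
loss.backward()     # gradients now include the regularization term
```

In practice the L2 variant is often available directly in the optimizer (e.g. the `weight_decay` argument of `torch.optim.SGD`).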
Dropout: randomly deactivate (drop) a fraction of the units during training, so the network cannot rely too heavily on any single unit.
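A minimal sketch, assuming PyTorch (layer sizes and the drop probability are arbitrary illustrative choices):

```python
import torch
import torch.nn as nn

# small net with a dropout layer between the hidden and output layer
net = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of the activations during training
    nn.Linear(64, 1),
)

x = torch.randn(4, 10)
net.train()              # dropout active -> outputs vary between calls
print(net(x))
net.eval()               # dropout disabled at inference -> deterministic outputs
print(net(x))
```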
Early Stopping
Stop training once the validation loss stops improving (usually after it hasn’t improved for a few epochs, the “patience”).
(One could argue this is not the best approach) why? (→ choosing when to stop based on the validation set is itself a mild form of overfitting to the validation data)
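A minimal sketch of the stopping logic (`train_one_epoch`, `validate` and `model` are hypothetical placeholders, not real library calls):

```python
import random

def train_one_epoch(model):      # placeholder: one pass over the training data
    pass

def validate(model):             # placeholder: pretend validation loss
    return random.random()

model = None                     # placeholder model
best_val_loss = float("inf")
patience, epochs_without_improvement = 5, 0

for epoch in range(100):
    train_one_epoch(model)
    val_loss = validate(model)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0   # reset; usually also checkpoint the best model here
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"stopping early after epoch {epoch}")
            break
```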
Data Augmentation
Just add more data, or artificially generate data by transforming existing samples (rotating, skewing, different colors, …).
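A minimal sketch, assuming torchvision; the chosen transforms and their parameters are just illustrative:

```python
from PIL import Image
import torchvision.transforms as T

# augmentation pipeline: every call produces a slightly different version of the image
augment = T.Compose([
    T.RandomRotation(degrees=15),                    # rotate
    T.RandomAffine(degrees=0, shear=10),             # skew
    T.ColorJitter(brightness=0.3, saturation=0.3),   # vary colors
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

img = Image.new("RGB", (64, 64))   # placeholder image so the snippet runs
augmented = augment(img)           # a new randomly transformed tensor each call
print(augmented.shape)             # torch.Size([3, 64, 64])
```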