Helps to prevent Overfitting

Overfitting means high variance: variance measures how much the prediction changes when you change the training data (sensitivity).
High variance can come from having many parameters / too much flexibility (without regularization).


When overfitting, the NN exaggerates the importance of individual datapoints, which shows up as large weights.
Regularization limits this flexibility by keeping the model weights small.

  • Experience shows that trained NNs often have small weights.
  • Smaller weights lead to less extreme outputs (in classification, less extreme probabilities), which is desirable for an untrained model.
  • It’s a known property of prediction models that adding a component to the loss function that prefers small weights often improves prediction performance. This approach is also known as regularization or weight decay in non-Bayesian NNs (sketched below). 1
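
A minimal sketch of that loss component, assuming an L2 (weight-decay) penalty with strength λ:

$$
\mathcal{L}_{\text{reg}}(w) = \underbrace{\mathcal{L}_{\text{data}}(w)}_{\text{original loss}} + \lambda \sum_i w_i^2
$$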

YT src

L1 & L2 regularization add the weights to the cost calculation to penalize large weights: L1 adds the absolute values, L2 the squared values.
The penalty is scaled by a hyperparameter between 0 and 1.
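
A minimal sketch of how this looks in code, assuming PyTorch; `l1_lambda` / `l2_lambda` stand for the penalty hyperparameter:

```python
# Minimal sketch (assumes a PyTorch model): add L1/L2 penalties on the weights to the data loss.
def regularized_loss(data_loss, model, l1_lambda=0.0, l2_lambda=0.01):
    l1_penalty = sum(p.abs().sum() for p in model.parameters())    # L1: sum of |w|
    l2_penalty = sum((p ** 2).sum() for p in model.parameters())   # L2: sum of w²
    return data_loss + l1_lambda * l1_penalty + l2_lambda * l2_penalty
```

In PyTorch the L2 penalty is usually applied via the optimizer’s `weight_decay` argument instead of adding it to the loss by hand.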

L1 regularization
L2 regularization
dropout

Early Stopping

Stop training once the validation loss stops improving.
(One could argue this is not the best approach) why? (→ because choosing the stopping point based on the validation set is itself a mild form of overfitting, this time to the validation data)
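
A minimal sketch of the loop, assuming hypothetical helpers `train_one_epoch` and `eval_val_loss`, with the usual `patience` hyperparameter:

```python
# Minimal early-stopping sketch; train_one_epoch / eval_val_loss are hypothetical helpers.
def train_with_early_stopping(model, max_epochs=100, patience=5):
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)               # one pass over the training data
        val_loss = eval_val_loss(model)      # loss on the held-out validation set
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0   # improved -> reset the patience counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # no improvement for `patience` epochs -> stop
    return model
```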

Data Augmentation

Just add more data, or artificially generate data by transforming existing samples (rotating, skewing, different colors, …).
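
A minimal sketch, assuming torchvision; the specific transforms and parameters are illustrative:

```python
from torchvision import transforms

# Randomly transform each image so the model sees slightly different data every epoch.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),            # rotating
    transforms.RandomAffine(degrees=0, shear=10),     # skewing
    transforms.ColorJitter(brightness=0.2, hue=0.1),  # different colors
    transforms.ToTensor(),
])
# Pass `augment` as the dataset's `transform` argument.
```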

Footnotes

  1. https://stats.stackexchange.com/a/263523/372225