Helps to prevent Overfitting

Overfitting means high variance: variance measures how much the prediction changes when you change the training data (sensitivity).
High variance can come from having many parameters / too much flexibility (without regularization).


When overfitting, the NN exaggerates the importance of individual datapoints, which shows up as large weights.
Regularization limits this flexibility by keeping the model weights small.

  • Experience shows that trained NNs often have small weights.
  • Smaller weights lead to less extreme outputs (in classification, less extreme probabilities), which is desirable for an untrained model.
  • It’s a known property of prediction models that adding a component to the loss function that prefers small weights often improves prediction performance. This approach is also known as regularization or weight decay in non-Bayesian NNs (sketched below). 1
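
A minimal sketch of that loss component, assuming an L2 (weight-decay) penalty with strength λ:

$$
\mathcal{L}_{\text{reg}}(w) = \underbrace{\mathcal{L}_{\text{data}}(w)}_{\text{original loss}} + \lambda \sum_i w_i^2
$$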

YT src

L1 & L2 regularization add the weights to the cost calculation to penalize large weights: L1 adds the absolute values, L2 the squared values.
The penalty is scaled by a hyperparameter between 0 and 1.
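
A minimal sketch of how this looks in code, assuming PyTorch; `l1_lambda` / `l2_lambda` stand for the penalty hyperparameter:

```python
# Minimal sketch (assumes a PyTorch model): add L1/L2 penalties on the weights to the data loss.
def regularized_loss(data_loss, model, l1_lambda=0.0, l2_lambda=0.01):
    l1_penalty = sum(p.abs().sum() for p in model.parameters())    # L1: sum of |w|
    l2_penalty = sum((p ** 2).sum() for p in model.parameters())   # L2: sum of w²
    return data_loss + l1_lambda * l1_penalty + l2_lambda * l2_penalty
```

In PyTorch the L2 penalty is usually applied via the optimizer’s `weight_decay` argument instead of adding it to the loss by hand.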

L1 regularization
L2 regularization
dropout

Early Stopping

Stop training once the validation loss stops improving.
(One could argue this is not the best approach) why? (→ because choosing the stopping point based on the validation set is itself a mild form of overfitting, this time to the validation data)
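
A minimal sketch of the loop, assuming hypothetical helpers `train_one_epoch` and `eval_val_loss`, with the usual `patience` hyperparameter:

```python
# Minimal early-stopping sketch; train_one_epoch / eval_val_loss are hypothetical helpers.
def train_with_early_stopping(model, max_epochs=100, patience=5):
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)               # one pass over the training data
        val_loss = eval_val_loss(model)      # loss on the held-out validation set
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0   # improved -> reset the patience counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # no improvement for `patience` epochs -> stop
    return model
```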

Data Augmentation

Just add more data, or artificially generate data by transforming existing samples (rotating, skewing, different colors, …).
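
A minimal sketch, assuming torchvision; the specific transforms and parameters are illustrative:

```python
from torchvision import transforms

# Randomly transform each image so the model sees slightly different data every epoch.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),            # rotating
    transforms.RandomAffine(degrees=0, shear=10),     # skewing
    transforms.ColorJitter(brightness=0.2, hue=0.1),  # different colors
    transforms.ToTensor(),
])
# Pass `augment` as the dataset's `transform` argument.
```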

Footnotes

  1. https://stats.stackexchange.com/a/263523/372225