The generalizatino error or risk is the exptected loss on future data for a given model with parameters , defined as:

where is the loss function, and is the true (unknown) joint distribution of inputs and outputs .

We estimate the generalization error using the empirical risk (or training error) computed on a finite training dataset :

This works under the assumption that the training data is drawn iid from the same distribution as future data, and by the law of large numbers, the empirical risk converges to the true risk as .

If we train plain empirical risk minimization (ERM), we run the risk of overfitting, where the model fits the training data well (low empirical risk) but performs poorly on unseen data (high generalization error) .
→ Split data into training and validation/test sets to estimate generalization error, e.g. using cross-validation.