Statistics computed over the entire dataset (as opposed to sample statistic).
E.g. in batch normalization:
During the inference phase, the network processes one input at a time, and it is not possible to compute the mean and variance of a mini-batch. Therefore, during the training process, a running average of the mean and variance is kept to estimate the population statistics. These population statistics are then used to normalize the activations during the inference phase.
E.g. in Welfords method:
you can divide M2
either by n_samples
or by n_samples - 1
.
Dividing by n_samples - 1
is known as Bessel’s correction , to get the unbiased sample variance.
Dividing by n_samples
is used for calculating the population variance. If you have the entire population (all data), you can divide by n_samples
to get the exact variance and standard deviation of the population.