If we don't have access to the entire dataset (the population), but only to a sample, we need to apply Bessel's correction when estimating the variance or the standard deviation: divide the sum of squared deviations by $n-1$ instead of $n$.
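A minimal sketch in Python using only the standard library: `statistics.pvariance` divides by $n$ (population variance), while `statistics.variance` divides by $n-1$ (sample variance with Bessel's correction). The data values are made up for illustration.

```python
import statistics

# Hypothetical sample values, chosen only for illustration
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)
mean = sum(sample) / n
squared_deviations = sum((x - mean) ** 2 for x in sample)

# Divide by n: treats the data as the whole population
population_variance = squared_deviations / n
# Divide by n - 1: Bessel's correction for a sample
sample_variance = squared_deviations / (n - 1)

# The standard library implements both conventions
assert abs(population_variance - statistics.pvariance(sample)) < 1e-12
assert abs(sample_variance - statistics.variance(sample)) < 1e-12

print(population_variance)  # 4.0
print(sample_variance)      # 32 / 7
print(statistics.stdev(sample))  # square root of the corrected variance
```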
Why we need Bessel's correction
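The sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the value that minimizes the sum of squared deviations from the sample. For any other value $a \neq \bar{x}$:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - a)^2$$
This means $\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - \mu)^2$, where $\mu$ is the true population mean.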
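The minimizing property follows from expanding the squares around $\bar{x}$; the cross term vanishes because the deviations from $\bar{x}$ sum to zero:

```latex
\sum_{i=1}^{n} (x_i - a)^2
  = \sum_{i=1}^{n} \big((x_i - \bar{x}) + (\bar{x} - a)\big)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2
    + 2(\bar{x} - a)\underbrace{\sum_{i=1}^{n} (x_i - \bar{x})}_{=\,0}
    + n(\bar{x} - a)^2
```

So the sum around $a$ exceeds the sum around $\bar{x}$ by exactly $n(\bar{x} - a)^2 \ge 0$, with equality only when $a = \bar{x}$. Setting $a = \mu$ shows that the squared deviations around $\bar{x}$ never exceed those around the population mean.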
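The sample therefore clusters at least as tightly around its own mean as around the population mean: $\bar{x}$ "overfits" the sample, so the average squared deviation around $\bar{x}$ systematically underestimates the variance. Additionally, the deviations must sum to zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ (since $n\bar{x} = \sum_{i=1}^{n} x_i$). This constraint means only $n-1$ deviations are free to vary: if you know $n-1$ of them, the last one is determined.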
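A small sketch of the zero-sum constraint, using a hypothetical sample and exact `fractions.Fraction` arithmetic so no rounding hides the result: the deviations add up to exactly zero, and the last deviation can be recovered from the other $n-1$.

```python
from fractions import Fraction

# Hypothetical sample; Fractions keep the arithmetic exact
sample = [Fraction(v) for v in (3, 7, 8, 10, 12)]
n = len(sample)
mean = sum(sample) / n

deviations = [x - mean for x in sample]
print(sum(deviations))  # 0

# Knowing any n - 1 deviations pins down the remaining one
known = deviations[:-1]
recovered_last = -sum(known)
print(recovered_last == deviations[-1])  # True
```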
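For independent samples from a population with variance $\sigma^2$, $\mathbb{E}\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = (n-1)\sigma^2$. Dividing by $n-1$ instead of $n$ therefore exactly compensates for the lost degree of freedom, so that $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ satisfies $\mathbb{E}[s^2] = \sigma^2$ (unbiased).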
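The unbiasedness claim can be checked exactly rather than by simulation. This sketch assumes a hypothetical population of four equally likely values drawn i.i.d. (with replacement); it enumerates every ordered sample of size $n = 3$, so averaging over them gives the exact expectation of each estimator.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population; samples are drawn i.i.d. (with replacement)
population = [Fraction(v) for v in (1, 2, 3, 4)]
n = 3  # sample size

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def squared_deviations(sample):
    """Sum of squared deviations from the sample's own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


# Every ordered sample of size n is equally likely, so averaging over
# all of them gives the exact expectation of each estimator.
samples = list(product(population, repeat=n))
mean_biased = sum(squared_deviations(s) / n for s in samples) / len(samples)
mean_bessel = sum(squared_deviations(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)       # 5/4
print(mean_biased)  # 5/6, i.e. (n - 1) / n * sigma2
print(mean_bessel)  # 5/4, equal to sigma2
```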
Practical impact
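The uncorrected (divide-by-$n$) variance is smaller than the corrected one by a factor of $(n-1)/n$, i.e. by $1/n$ in relative terms:
Small samples (n < 30): the correction matters a lot (10% difference at n = 10)
Medium samples (n = 30-100): still noticeable (3.3% at n = 30, 1% at n = 100)
Large samples (n > 100): essentially negligible (below 1%)
As $n \to \infty$: $\frac{n}{n-1} \to 1$, so the correction vanishes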
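The percentages above can be reproduced from the ratio $(n-1)/n$ alone. In this sketch, `uncorrected_shortfall` is a hypothetical helper name used only for illustration.

```python
def uncorrected_shortfall(n: int) -> float:
    """Relative amount by which the divide-by-n variance falls below
    the divide-by-(n - 1) variance computed from the same data."""
    return 1 - (n - 1) / n  # algebraically equal to 1 / n


for n in (10, 30, 100, 1000):
    print(f"n={n:>5}: uncorrected is {uncorrected_shortfall(n):.1%} smaller")
```

This prints 10.0%, 3.3%, 1.0% and 0.1% for the four sample sizes, matching the shrinking $1/n$ gap.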