sample statistic

When we only have a sample rather than the entire dataset (as we would for a population statistic), we need to apply Bessel’s correction when estimating the variance or the standard deviation: divide by $(n - 1)$ instead of $n$.

$$s^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} (X_{i} - \overset{ˉ}{X})^{2}, \qquad s = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} (X_{i} - \overset{ˉ}{X})^{2}} = \sqrt{s^{2}}$$
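As a minimal sketch of the formula, the snippet below computes $s^{2}$ and $s$ by hand and compares them with Python's standard `statistics` module, where `variance` divides by $(n - 1)$ and `pvariance` divides by $n$. The helper name `sample_variance` and the data are made up for illustration.

```python
import math
import statistics


def sample_variance(xs):
    """Sample variance: squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample, mean = 5

s2 = sample_variance(data)  # 32 / 7, with Bessel's correction
s = math.sqrt(s2)           # sample standard deviation

print(s2, statistics.variance(data))  # both divide by n - 1
print(statistics.pvariance(data))     # divides by n, so it is smaller (32 / 8)
```

For this sample the sum of squared deviations is 32, so the corrected variance is 32/7 while the uncorrected one is 32/8.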

Why we need Bessel's correction

The sample mean $\overset{ˉ}{X}$ is the value that minimizes the sum of squared deviations from the sample points. For any other value $c \neq \overset{ˉ}{X}$ :
$\sum (X_{i} - c)^{2} > \sum (X_{i} - \overset{ˉ}{X})^{2}$
In particular, $\sum (X_{i} - \overset{ˉ}{X})^{2} \leq \sum (X_{i} - μ)^{2}$ where $μ$ is the true population mean, with equality only when $\overset{ˉ}{X} = μ$ .
The sample therefore clusters more tightly around its own mean than around the population mean (the sample mean is “overfit” to the sample), so dividing the sum of squared deviations from $\overset{ˉ}{X}$ by $n$ systematically underestimates the variance.
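The inequality can be checked with a short worked step. For any constant $c$, writing $X_{i} - c = (X_{i} - \overset{ˉ}{X}) + (\overset{ˉ}{X} - c)$ and using the fact that the deviations from the sample mean sum to zero, the cross term drops out:

$$\sum_{i = 1}^{n} (X_{i} - c)^{2} = \sum_{i = 1}^{n} (X_{i} - \overset{ˉ}{X})^{2} + 2 (\overset{ˉ}{X} - c) \sum_{i = 1}^{n} (X_{i} - \overset{ˉ}{X}) + n (\overset{ˉ}{X} - c)^{2} = \sum_{i = 1}^{n} (X_{i} - \overset{ˉ}{X})^{2} + n (\overset{ˉ}{X} - c)^{2}$$

The extra term $n (\overset{ˉ}{X} - c)^{2}$ is never negative and is zero only at $c = \overset{ˉ}{X}$. Setting $c = μ$ shows that the sum of squares around the sample mean falls short of the sum around the population mean by exactly $n (\overset{ˉ}{X} - μ)^{2}$.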

Additionally, the deviations must sum to zero: $\sum (X_{i} - \overset{ˉ}{X}) = 0$ (since $\overset{ˉ}{X} = \frac{1}{n} \sum X_{i}$ ). This constraint means only $(n - 1)$ deviations are free to vary - if you know $(n - 1)$ of them, the last is determined.
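The constraint is easy to see on a small made-up sample. In this sketch, the last deviation is recovered from the first $(n - 1)$ alone:

```python
data = [3.0, 7.0, 8.0, 12.0]  # made-up sample
mean = sum(data) / len(data)

# Deviations of each point from the sample mean
deviations = [x - mean for x in data]

# They sum to zero (up to floating-point rounding)
total = sum(deviations)

# so the last deviation is fixed once the first n - 1 are known
recovered_last = -sum(deviations[:-1])

print(deviations)
print(total, recovered_last, deviations[-1])
```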

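These are two views of the same shortfall: $E [\sum (X_{i} - \overset{ˉ}{X})^{2}] = (n - 1) σ^{2}$ rather than $n σ^{2}$ . Dividing by $(n - 1)$ compensates for it exactly, making $E [s^{2}] = σ^{2}$ (unbiased).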
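Unbiasedness can be verified exactly on a toy case instead of by simulation. The sketch below assumes a fair six-sided die as the population (an illustrative choice, not part of the note) and enumerates every equally likely sample of size 3 drawn with replacement, using exact fractions. The helper name `mean_sum_of_squares` is made up.

```python
from fractions import Fraction
from itertools import product


def mean_sum_of_squares(values, n):
    """Exact average of sum((x - sample_mean)^2) over all equally likely
    samples of size n drawn with replacement from `values`."""
    total = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        m = Fraction(sum(sample), n)  # sample mean as an exact fraction
        total += sum((x - m) ** 2 for x in sample)
        count += 1
    return total / count


die = [1, 2, 3, 4, 5, 6]  # assumed toy population
mu = Fraction(sum(die), len(die))
sigma2 = sum((x - mu) ** 2 for x in die) / len(die)  # population variance, 35/12

n = 3
expected_ss = mean_sum_of_squares(die, n)

print(expected_ss / (n - 1) == sigma2)          # dividing by n - 1 hits sigma^2 exactly
print(expected_ss / n == sigma2 * (n - 1) / n)  # dividing by n is low by a factor (n - 1)/n
```

Using `Fraction` avoids floating-point noise, so the equality checks are exact rather than approximate.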

Practical impact

Small samples (n < 30): correction is crucial (dividing by $n$ gives a variance 10% lower than dividing by $(n - 1)$ at n=10)
Medium samples (n = 30-100): still noticeable (3.3% lower at n=30, 1% at n=100)
Large samples (n > 100): essentially negligible (under 1%)
As $n \to \infty$ : $\frac{n}{n - 1} \to 1$ , correction vanishes
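The percentages above correspond to $1 - \frac{n - 1}{n} = \frac{1}{n}$, the relative amount by which the divide-by-$n$ variance falls below the corrected one. A small sketch using the standard `statistics` module on arbitrary made-up data shows this; the helper name `shortfall` is hypothetical.

```python
import statistics


def shortfall(sample):
    """Relative amount by which the divide-by-n variance is below the divide-by-(n - 1) one."""
    return 1 - statistics.pvariance(sample) / statistics.variance(sample)


results = {}
for n in (10, 30, 100):
    sample = [float(i * i % 17) for i in range(n)]  # arbitrary made-up data
    results[n] = shortfall(sample)
    print(f"n={n:>3}  shortfall={results[n]:.1%}  1/n={1 / n:.1%}")
```

The shortfall depends only on $n$, not on the data, because the two estimates differ by the constant factor $\frac{n - 1}{n}$.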

Max Wolf's Second Brain
