A p-value is the probability of observing data as extreme as (or more extreme than) what you got, assuming the null hypothesis is true.

The fundamental misinterpretation

You test a coin and get 14 heads in 20 flips (one-sided p = 0.057). This means “if the coin were fair, you’d see results this extreme 5.7% of the time”, NOT “there’s a 5.7% chance the coin is fair”.
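
A minimal sketch of where that 0.057 comes from, using only the standard library (it’s the one-sided tail; a two-sided test would roughly double it):

    from math import comb

    # P(X >= 14) for X ~ Binomial(20, 0.5): the chance a fair coin
    # produces a result at least as extreme as 14 heads in 20 flips.
    n, k = 20, 14
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    print(f"p = {p_value:.4f}")  # p = 0.0577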

The actual probability the coin is fair depends on your prior beliefs:

  • Random quarter from someone’s pocket → probably still ~98% chance it’s fair
  • Coin from a magic shop → maybe only ~2% chance it’s fair

Same data, same p-value, completely different conclusions! To get P(null|data) you’d need Bayes’ theorem and a prior.
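
Here is a minimal sketch of that calculation. The alternative hypothesis (a coin biased to land heads 70% of the time) is an illustrative assumption, and the exact posteriors depend on it; the point is how strongly the prior drives the answer:

    from math import comb

    def posterior_fair(prior_fair, p_biased=0.7, n=20, k=14):
        # P(fair | 14 heads in 20 flips) via Bayes' theorem.
        # The biased alternative P(heads) = 0.7 is an illustrative choice.
        like_fair = comb(n, k) * 0.5**n
        like_biased = comb(n, k) * p_biased**k * (1 - p_biased)**(n - k)
        evidence = prior_fair * like_fair + (1 - prior_fair) * like_biased
        return prior_fair * like_fair / evidence

    print(f"pocket quarter  (prior 0.98): {posterior_fair(0.98):.2f}")   # ~0.90
    print(f"magic-shop coin (prior 0.02): {posterior_fair(0.02):.3f}")   # ~0.004

The informal ~98% and ~2% figures above would come out of the same machinery under different priors and alternatives; the mechanism, not the exact numbers, is the point.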

Common pitfalls

Arbitrary thresholds: The 0.05 cutoff is a historical convention, not a mathematically meaningful boundary.
P-hacking: Running many tests until p < 0.05, then reporting only the “significant” ones. This invalidates the entire interpretation, because the 5% false-positive rate applies per test, not per batch of tests (the simulation after this list shows it in action).
Binary thinking: Treating p=0.049 as fundamentally different from p=0.051, when they’re practically identical.
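
A quick simulation makes the p-hacking point concrete, reusing the same one-sided 20-flip test as above: test enough perfectly fair coins and some will clear the 0.05 bar by luck alone.

    import random
    from math import comb

    def one_sided_p(heads, flips):
        # P(at least this many heads | fair coin)
        return sum(comb(flips, i) for i in range(heads, flips + 1)) / 2**flips

    random.seed(0)  # arbitrary; counts vary with other seeds

    # Flip 100 perfectly fair coins 20 times each, then report only the
    # "significant" ones: textbook p-hacking on pure noise.
    significant = 0
    for _ in range(100):
        heads = sum(random.random() < 0.5 for _ in range(20))
        if one_sided_p(heads, 20) < 0.05:
            significant += 1
    print(f"{significant} of 100 fair coins look 'biased' at p < 0.05")
    # Each fair coin clears the bar ~2% of the time (the discrete test is
    # conservative), so a few false positives are nearly guaranteed.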

In 2016 the American Statistical Association took the unusual step of releasing an official statement warning about p-value misuse:

  1. P-values can indicate incompatibility between data and the null hypothesis
  2. P-values do NOT measure the probability the hypothesis is true
  3. Scientific conclusions shouldn’t be based solely on whether p < 0.05
  4. Proper inference requires full reporting (no cherry-picking)
  5. P-values don’t measure effect size or importance (see the sketch after this list)
  6. By itself, a p-value provides limited information
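
Point 5 deserves a worked example. Below, a coin biased by a trivial 0.1 percentage point, flipped a million times, produces a “significant” p-value; the sketch uses expected counts and a normal approximation rather than a random draw:

    from math import sqrt, erfc

    # A nearly-fair coin (P(heads) = 0.501) flipped 1,000,000 times.
    n, p_true, p_null = 1_000_000, 0.501, 0.5
    heads = n * p_true  # expected count: 501,000

    # One-sided p-value via the normal approximation to the binomial.
    z = (heads - n * p_null) / sqrt(n * p_null * (1 - p_null))
    p_value = 0.5 * erfc(z / sqrt(2))
    print(f"effect: {p_true - p_null:+.3f}, z = {z:.1f}, p = {p_value:.3f}")
    # effect: +0.001, z = 2.0, p = 0.023: "significant", yet utterly trivial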