A p-value is the probability of observing data as extreme as (or more extreme than) what you got, assuming the null hypothesis is true.
The fundamental misinterpretation
You test a coin, get 14 heads in 20 flips (p=0.057). This means “if the coin were fair, you’d see results this extreme 5.7% of the time”, NOT “there’s a 5.7% chance the coin is fair”.
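To see where that 0.057 comes from, here is a minimal sketch in Python computing the exact one-sided binomial tail P(X ≥ 14) for a fair coin (the `binom_tail` helper is just for illustration):

```python
from math import comb

def binom_tail(heads: int, flips: int, p: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) for X ~ Binomial(flips, p)."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# If the coin were fair, 14 or more heads in 20 flips happens ~5.8% of the time
print(binom_tail(14, 20))  # 0.0576...
```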
The actual probability the coin is fair depends on your prior beliefs:
- Random quarter from someone’s pocket → probably still ~98% chance it’s fair
- Coin from a magic shop → maybe only ~2% chance it’s fair
Same data, same p-value, completely different conclusions! To get P(null | data) you’d need Bayes’ theorem and a prior.
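Here is a sketch of that calculation under one hypothetical modeling choice (not from the original example): the alternative to “fair” is a biased coin whose bias is completely unknown, i.e. uniform on [0, 1], which makes the marginal likelihood of any head count exactly 1/(flips + 1):

```python
from math import comb

def posterior_fair(prior_fair: float, heads: int = 14, flips: int = 20) -> float:
    """P(fair | data) via Bayes' theorem, assuming the only alternative
    is a coin with completely unknown bias (uniform prior on [0, 1])."""
    like_fair = comb(flips, heads) * 0.5 ** flips
    like_biased = 1 / (flips + 1)  # marginal likelihood under a uniform bias
    num = prior_fair * like_fair
    return num / (num + (1 - prior_fair) * like_biased)

print(posterior_fair(0.98))  # pocket quarter: ~0.97, still almost surely fair
print(posterior_fair(0.02))  # magic-shop coin: ~0.016, almost surely biased
```

Plugging in the two priors from the list reproduces the contrast: the same 14-of-20 data leaves the pocket quarter almost certainly fair and the magic-shop coin almost certainly biased.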
Common pitfalls
Arbitrary thresholds: The 0.05 cutoff is pure convention, usually traced back to Fisher; it is not a property of nature.
P-hacking: Running many tests until p < 0.05, then reporting only the “significant” ones. This inflates the false-positive rate far beyond the nominal 5% and invalidates the usual interpretation of any single p-value.
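A quick simulation shows how fast this breaks down. Under a true null, a continuous p-value is uniform on [0, 1], so the chance that at least one of 20 independent null tests clears the 0.05 bar is 1 - 0.95^20 ≈ 64%:

```python
import random

random.seed(1)

# Under a true null hypothesis, a (continuous) p-value is uniform on [0, 1].
# Simulate a researcher running 20 independent null tests and keeping the best.
trials = 100_000
false_alarms = sum(
    min(random.random() for _ in range(20)) < 0.05
    for _ in range(trials)
)
print(false_alarms / trials)  # ~0.64, versus the nominal 0.05
```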
Binary thinking: Treating p=0.049 as fundamentally different from p=0.051, when they represent practically identical evidence.

The American Statistical Association took the unusual step of releasing an official statement warning about p-value misuse. Its key points:
- P-values can indicate incompatibility between data and the null hypothesis
- P-values do NOT measure the probability the hypothesis is true
- Scientific conclusions shouldn’t be based solely on whether p < 0.05
- Proper inference requires full reporting (no cherry-picking)
- P-values don’t measure effect size or importance (see the sketch after this list)
- By itself, a p-value provides limited information
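To make the effect-size point concrete, here is a hypothetical two-sample z-test where a trivial true difference (0.01 standard deviations, a number chosen purely for illustration) produces an extremely small p-value simply because the samples are huge:

```python
import random
from statistics import NormalDist, fmean

random.seed(0)
n = 1_000_000
a = [random.gauss(0.00, 1.0) for _ in range(n)]  # control group
b = [random.gauss(0.01, 1.0) for _ in range(n)]  # true effect: 0.01 SD

mean_a, mean_b = fmean(a), fmean(b)
var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)

# Two-sample z-test (a fine approximation at this sample size)
z = (mean_b - mean_a) / ((var_a / n + var_b / n) ** 0.5)
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

cohens_d = (mean_b - mean_a) / (((var_a + var_b) / 2) ** 0.5)
print(f"p = {p:.1e}, Cohen's d = {cohens_d:.3f}")  # tiny p, negligible effect
```

A p-value this small looks “highly significant”, yet a 0.01-SD difference would be invisible in practice. Reporting the effect size alongside the p-value is exactly what the ASA principle is asking for.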