TL;DR: Compression doesn’t cause generality; weakness does. Evolution finds modular patterns because finite space forces compression. You find general patterns by not optimizing for specifics.

The Weakness Principle

W-maxing (Weakness Maximization)

Optimal adaptation requires choosing the weakest correct hypothesis: the one that constrains future behavior the least while still fitting everything observed.

Weakness = size of a hypothesis’s extension (how many possible futures it allows)
Simplicity = brevity of description (Occam’s razor)

Bennett proves that maximizing weakness is necessary and sufficient for optimal learning.
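
To make the selection rule concrete, here is a minimal sketch of w-maxing, assuming hypotheses are represented extensionally as the sets of situations they permit. The toy domain and the names (`w_max`, `things_that_bounce`) are illustrative assumptions, not Bennett’s implementation:

```python
# A hypothesis is modeled as its extension: the set of situations it
# permits. Weakness is the size of that set. This representation is an
# illustrative assumption, not the paper's formalism.

def w_max(hypotheses: list[set], observed: set) -> set:
    """Among hypotheses consistent with the data, return the weakest.

    Consistent means the extension covers every observed situation;
    weakest means the largest extension (most futures still allowed).
    """
    consistent = [h for h in hypotheses if observed <= h]
    return max(consistent, key=len)  # weakness = |extension|

# Toy domain: situations are integers 0..99.
things_that_bounce = set(range(0, 100, 2))  # broad rule, extension of 50
only_these_balls = {4, 12, 20}              # narrow rule, extension of 3

observed = {4, 12}  # both hypotheses fit the data so far
assert w_max([things_that_bounce, only_these_balls], observed) == things_that_bounce
```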

Occam’s Razor Fails: Simplicity is neither necessary nor sufficient for generalization.
In the experiments, the weakest hypothesis was sometimes complex, and the simplest hypothesis was sometimes overly specific.
Weakness always won.

Why does simplicity seem to work? Because finite space forces a correlation between weakness and simplicity.

The universe has spatial limits → vocabularies for describing hypotheses are finite → weak constraints, to fit in that space, often take simple forms.
But this is correlation, not causation! Simplicity doesn’t cause generalization; weakness does.
Simplicity is just a side effect when space is limited.

The Experiments

Bennett tested learning binary arithmetic with limited examples:

| Training Examples | W-maxing Success | Simplicity Success |
| --- | --- | --- |
| 6 | 11% | 10% |
| 10 | 27% | 13% |
| 14 | 68% | 24% |

W-maxing showed 110-500% better generalization: choosing general patterns beats choosing simple ones.
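
The mechanics of the comparison can be sketched in a few lines. This is not a reproduction of the paper’s experiment: the hand-written hypothesis pool, the representation of a hypothesis as the set of (input, output) decisions it entails, and the use of description length as a proxy for simplicity are all assumptions for illustration.

```python
# A toy sketch of the two selection rules, NOT the paper's experiment.
# A hypothesis is (description, set of (input, output) decisions it
# entails); "simplicity" is proxied here by description length.

N = 16
target = {(n, n % 2) for n in range(N)}  # the task: the parity of n

pool = [
    ("parity of n",     {(n, n % 2) for n in range(N)}),           # weak, general
    ("zeros",           {(n, 0) for n in range(N) if n % 2 == 0}), # short, specific
    ("memorized pairs", {(0, 0), (2, 0), (5, 1)}),                 # very specific
]

train = {(0, 0), (2, 0)}                         # two observed decisions
fits = [(d, h) for d, h in pool if train <= h]   # hypotheses that stay correct

weakest = max(fits, key=lambda dh: len(dh[1]))   # w-maxing: largest extension
simplest = min(fits, key=lambda dh: len(dh[0]))  # Occam: shortest description

held_out = target - train
for label, (desc, h) in [("w-maxing", weakest), ("simplicity", simplest)]:
    right = len(held_out & h)
    print(f"{label} picks {desc!r}: {right}/{len(held_out)} held-out decisions")
```

Here the shortest description (“zeros”) fits both observations but commits to far fewer future decisions, so it generalizes worse: the same pattern as in the table above.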

Swiss Army Knife vs Scalpel

Weak hypothesis: “Things bounce” (applies in many situations)
Strong hypothesis: “Balls bounce exactly 5.2 inches on Tuesdays” (very specific)

The weak hypothesis is like a Swiss Army knife: less precise but useful everywhere. When you have limited data, betting on generality beats betting on simplicity.

As tasks get harder (or more numerous), fewer hypotheses fit → eventually only one remains. We can skip straight to choosing the weakest hypothesis rather than waiting for data to force convergence.
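
The convergence can be seen directly with the same extensional representation as in the sketches above (again, the pool and names are illustrative): each new observation filters the pool, and the consistent set shrinks toward one survivor.

```python
# Each observation filters the hypothesis pool; w-maxing just picks the
# weakest survivor now rather than waiting for the data to finish filtering.
N = 16
pool = {
    "parity of n": {(n, n % 2) for n in range(N)},
    "always zero": {(n, 0) for n in range(N)},
    "zeros only":  {(n, 0) for n in range(N) if n % 2 == 0},
}

observations = [(0, 0), (2, 0), (5, 1), (7, 1)]
for k in range(len(observations) + 1):
    seen = set(observations[:k])
    fits = [name for name, h in pool.items() if seen <= h]
    print(f"after {k} observations: {fits}")
# After 0..2 observations all three fit; (5, 1) eliminates the constant
# rules, leaving only "parity of n".
```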

It’s like solving a general problem that includes your specific case: often easier than solving just the specific case!

Takeaway

For AI: Stop optimizing for compression/simplicity. Optimize for generality.
For Biology: Life looks complex because general solutions need complex implementations in finite space.
For Science: We should prefer theories that constrain the least while still explaining the phenomena.

Link to original