Evolution as Backstop for Reinforcement Learning

Read/listen to the thing again and take deeper notes

The essay proposes a multi-level optimization view

A slow, sample-inefficient, unbiased “outer loss” (e.g., death, bankruptcy, reproductive fitness) trains and constrains a fast, sample-efficient but potentially mis-specified “inner loss” (e.g., neural control, corporate planning, policy gradients), yielding robust performance despite inner imperfections.

Corporations

Internal planning optimizes complex logistics and pricing, but bankruptcy/profit acts as an outer signal that ultimately selects firms and cultures of decision-making.
But firms don’t replicate like organisms, so selection among firms is slow and limited → the outer backstop matters.

Multicellular life

Genes implement a developmental program and brain that enable fast in-lifetime learning, while evolution selects across generations, suppressing inner defection (e.g., cancer) and shaping pain/reward as ground-truth signals.

RL and meta-learning

Population-based training and two-tier setups use win/loss as an outer objective to sculpt dense inner rewards; if inner rewards mislead learning, outer selection mutates or replaces them, combining evolution’s doggedness with gradients’ speed for more reliable progress.
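The two-tier idea can be made concrete with a toy sketch. This is not the essay's or any library's implementation; the names `outer_fitness`, `inner_learning`, `evolve_inner_reward`, the quadratic rewards, and the value `TRUE_OPTIMUM` are all assumptions chosen for illustration. Each learner climbs a dense inner reward by gradient ascent, and a slow outer loop scores only the final outcome, keeps the inner rewards whose learners did well, and mutates them.

```python
import random

TRUE_OPTIMUM = 3.0  # hypothetical ground truth, visible only to the outer loop


def outer_fitness(x):
    """Sparse ground-truth outcome (stand-in for win/loss), seen once per lifetime."""
    return -(x - TRUE_OPTIMUM) ** 2


def inner_learning(w, steps=50, lr=0.1):
    """Fast gradient ascent on the dense inner reward r_w(x) = -(x - w)**2."""
    x = 0.0
    for _ in range(steps):
        x += lr * 2.0 * (w - x)  # gradient of the inner reward with respect to x
    return x


def evolve_inner_reward(generations=40, pop_size=20, sigma=0.3, seed=0):
    """Outer loop: select and mutate inner-reward parameters by final outcome only."""
    rng = random.Random(seed)
    # Start with inner rewards that may be badly mis-specified.
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Score each inner reward only by the outcome its learner reaches.
        ranked = sorted(
            population,
            key=lambda w: outer_fitness(inner_learning(w)),
            reverse=True,
        )
        survivors = ranked[: pop_size // 2]
        # Replace the losers with mutated copies of the survivors.
        children = [w + rng.gauss(0.0, sigma) for w in survivors]
        population = survivors + children
    return max(population, key=lambda w: outer_fitness(inner_learning(w)))


best_w = evolve_inner_reward()
print(f"evolved inner-reward target: {best_w:.2f}")
```

The outer loop never sees gradients and the inner loop never sees the true objective. A misleading inner reward is not fixed by the learner itself; it is simply outcompeted and replaced.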

Curiosity and pain

An intrinsic curiosity drive alone would interact badly with a total absence of painful pain: after all, what is more novel or harder to predict than the strange and unique states which can be reached by self-injury or recklessness?
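A tiny sketch of that interaction, with made-up numbers: the states, their novelty scores, and the `pain_weight` parameter are hypothetical and only illustrate the argument. An agent that maximizes novelty alone picks the injurious state, while subtracting a pain term redirects it to harmless exploration.

```python
# Hypothetical states: (name, novelty as prediction error, damage to the agent)
STATES = [
    ("walk a familiar path", 0.1, 0.0),
    ("explore a new room", 0.6, 0.0),
    ("touch the hot stove", 0.9, 1.0),
]


def choose(states, pain_weight):
    """Pick the state with the highest intrinsic reward: novelty minus weighted pain."""
    return max(states, key=lambda s: s[1] - pain_weight * s[2])[0]


curiosity_only = choose(STATES, pain_weight=0.0)
curiosity_with_pain = choose(STATES, pain_weight=1.0)
print(curiosity_only)       # touch the hot stove
print(curiosity_with_pain)  # explore a new room
```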

Planned economy bad?

5 says: “This framing explains why non-market planning can work well locally yet still relies on market/evolutionary ‘backstops,’ because outer processes operate on ground-truth outcomes while inner processes use proxies that can go off course without outer correction.”
→ But right now we are already optimizing for a global proxy while planning locally because that is more efficient, and the local planning follows the same proxy.
→ Socialist planning wouldn’t be central planning according to proxies, but democratically optimizing human preferences.

gwern
RL
evolution

Max Wolf's Second Brain
