Read/listen to the thing again and take deeper notes
The essay proposes a multi-level optimization view:
A slow, sample-inefficient, unbiased “outer loss” (e.g., death, bankruptcy, reproductive fitness) trains and constrains a fast, sample-efficient but potentially mis-specified “inner loss” (e.g., neural control, corporate planning, policy gradients), yielding robust performance despite inner imperfections.
Corporations
Internal planning optimizes complex logistics and pricing, but bankruptcy/profit acts as an outer signal that ultimately selects firms and cultures of decision-making.
But firms don’t replicate like organisms, so selection is slow and limited → outer backstop matters.
Multicellular life
Genes implement a developmental program and brain that enable fast in-lifetime learning, while evolution selects across generations, suppressing inner defection (e.g., cancer) and shaping pain/reward as ground-truth signals.
RL and meta learning
Population-based training and two-tier setups use win/loss as an outer objective to sculpt dense inner rewards; if inner rewards mislead learning, outer selection mutates or replaces them, combining evolution’s doggedness with gradients’ speed for more reliable progress.
An intrinsic curiosity drive alone would interact badly with a total absence of painful pain: after all, what is more novel or harder to predict than the strange and unique states which can be reached by self-injury or recklessness?
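To make the two-tier idea concrete for myself, here is a minimal toy sketch (not from the essay). Everything in it is an assumption for illustration: the names `true_outcome`, `inner_learn`, and `evolve` are hypothetical, the "win/loss" signal is a simple quadratic score, and the inner reward is a one-parameter proxy whose target `w` may be mis-specified. The inner loop follows its proxy quickly with gradients; the outer loop only sees ground-truth outcomes and selects and mutates the proxies.

```python
import random


def true_outcome(x):
    # Outer objective (assumed toy form): ground truth, seen once per "lifetime".
    # The best possible behaviour is x = 3.
    return -(x - 3.0) ** 2


def inner_learn(w, steps=50, lr=0.1):
    # Inner loop: fast gradient ascent on the proxy reward -(x - w)^2.
    # The agent never sees true_outcome; it only follows its proxy target w.
    x = 0.0
    for _ in range(steps):
        grad = -2.0 * (x - w)  # derivative of the proxy reward w.r.t. x
        x += lr * grad
    return x


def evolve(pop_size=8, generations=200, sigma=0.5, seed=0):
    # Outer loop: slow selection over proxy rewards, scored by true outcomes.
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]

    def fitness(w):
        return true_outcome(inner_learn(w))

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]  # misleading proxies die out
        children = [w + rng.gauss(0.0, sigma) for w in survivors]  # mutation
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best_w = evolve()
    print("selected proxy target:", best_w)
    print("behaviour it produces:", inner_learn(best_w))
```

An agent with a bad proxy (say `w = -5`) still learns fast, just toward the wrong place; only the outer selection step can notice that, because it is the only part that touches `true_outcome`.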
Planned economy bad?
5 says: “This framing explains why non-market planning can work well locally yet still relies on market/evolutionary ‘backstops,’ because outer processes operate on ground-truth outcomes while inner processes use proxies that can go off course without outer correction.”
→ But right now we optimize for a global proxy and plan locally only because that is more efficient; the local planning still follows the same proxy.
→ Socialist planning wouldn’t be central planning according to proxies, but democratic optimization of human preferences.