Optimal decision making, at its core, requires considering counterfactuals.

If I did X instead of Y, would the outcome be better? You can answer that question with a learned simulator, a value function, a reward model, and so on. In the end they are pretty much all the same, as long as you have some mechanism for telling which counterfactual is better.
→ The key is not necessarily running really good simulations, but answering counterfactuals well.
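To make the "it's all the same" point concrete, here is a minimal sketch, assuming a toy setting: both a value function and a simulator-plus-reward-model are wrapped behind the same interface, "score action `a` in state `s`", and a single comparison function picks the better counterfactual without caring which mechanism produced the score. Everything here is hypothetical: the names `better_action`, `q_evaluator`, `make_rollout_evaluator`, the 1-D world, the goal position, and the hand-written `Q` table and `model` stand in for components that would normally be learned.

```python
from typing import Callable, Dict, Tuple

State = int
Action = str
# A counterfactual evaluator answers: "how good is taking action a in state s?"
Evaluator = Callable[[State, Action], float]


def better_action(evaluate: Evaluator, state: State, x: Action, y: Action) -> Action:
    """Return whichever of x or y the evaluator scores higher (ties go to x)."""
    return x if evaluate(state, x) >= evaluate(state, y) else y


# Mechanism 1 (hypothetical): a value function, here a hand-filled lookup table
# standing in for a learned Q(s, a).
Q: Dict[Tuple[State, Action], float] = {(0, "X"): 1.0, (0, "Y"): 0.5}


def q_evaluator(state: State, action: Action) -> float:
    return Q[(state, action)]


# Mechanism 2 (hypothetical): a simulator plus reward model on a 1-D line.
# "X" moves right, "Y" moves left; reward 1.0 only on landing at GOAL.
GOAL = 3


def model(state: State, action: Action) -> Tuple[State, float]:
    next_state = state + (1 if action == "X" else -1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def default_policy(state: State) -> Action:
    return "X"  # after the counterfactual first step, just keep moving right


def make_rollout_evaluator(horizon: int = 5, gamma: float = 0.9) -> Evaluator:
    """Score (s, a) by simulating a, then following default_policy, summing discounted reward."""
    def evaluate(state: State, action: Action) -> float:
        total, discount = 0.0, 1.0
        s, a = state, action
        for _ in range(horizon):
            s, r = model(s, a)
            total += discount * r
            discount *= gamma
            a = default_policy(s)
        return total
    return evaluate


rollout_evaluator = make_rollout_evaluator()

# Both mechanisms answer the same counterfactual question the same way.
print(better_action(q_evaluator, 0, "X", "Y"))        # X
print(better_action(rollout_evaluator, 0, "X", "Y"))  # X
```

The rollout scores X higher because doing X first reaches the goal in three simulated steps, while doing Y first needs five, so its reward is discounted more. The design choice being illustrated is the shared `Evaluator` signature: swapping the table lookup for the simulation changes how the counterfactual is answered, not the decision logic built on top of it.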

Simulating with a learned model seems to be how the brain works through counterfactuals (mostly during sleep).
