year: 2018
paper: https://sci-hub.st/https://www.nature.com/articles/s41593-018-0200-7
website:
code:
connections: predictive coding, karl friston
There are still swathes of computational neuroscience that concern themselves almost exclusively with learning and ignore the inference problem (for example, reinforcement learning). Conversely, vanilla predictive processing can often overlook the experience-dependent learning that accompanies evidence accumulation, as well as the Bayesian model selection (a.k.a. structure learning) of models per se. This polarization may reflect differences in conceptual lineage: predictive coding takes its lead from perceptual psychology, while reinforcement learning is a legacy of behaviorism. The same dialectic is seen in machine learning, with deep learning on the one hand and problems of data assimilation and uncertainty quantification on the other. There have been heroic attempts to bridge this gap (for example, amortization procedures in machine learning that, effectively, learn how to infer). However, these attempts do not appear to reflect the way that the brain has gracefully integrated perception and learning within the same computational anatomy. This may be important if we aspire to create artificial intelligence along neuromimetic lines. In short, perhaps the insight afforded by Rao and Ballard, that learning and perception are two sides of the same coin, may still have something important to tell us.
The theme that runs through this legacy is inferring and learning the best explanation for our sensorium.
In other words, the brain is in the game of optimizing neuronal dynamics and connectivity to maximize the evidence for its model of the world. So what form does this evidence take?
→ For a statistician, it is just Bayesian model evidence: the probability of observing some data given a model of how those data were generated.
→ In machine learning, the evidence comprises a variational bound on log-evidence.
→ In engineering, it is the cost function associated with Kalman filters.
→ For an information theorist, it would be the efficiency or minimum description length.
→ Finally, in the realm of predictive coding, the evidence is taken as the (precision-weighted) prediction error.
→ Crucially, these are all the same thing, which, in my writing, is variational free energy.
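To make the equivalence concrete, here is a minimal numerical sketch (my own illustration, not code from the paper) using a one-dimensional linear-Gaussian model: the exact negative log evidence, the variational free energy evaluated at the exact Gaussian posterior, and the precision-weighted (squared) prediction error all coincide, up to constants that do not depend on the data. All variable names are illustrative.

```python
# A minimal sketch, assuming a one-dimensional linear-Gaussian model:
#   x ~ N(mu0, s0^2),  y | x ~ N(x, s^2)
# It shows that -log evidence, variational free energy (at the exact
# posterior) and precision-weighted prediction error are "the same thing"
# up to data-independent constants.
import numpy as np

mu0, s0, s = 0.0, 1.0, 0.5   # prior mean, prior s.d., sensory s.d.
y = 1.3                      # an observation

# 1) Exact log evidence: p(y) = N(mu0, s0^2 + s^2)
var_y = s0**2 + s**2
log_evidence = -0.5 * np.log(2 * np.pi * var_y) - 0.5 * (y - mu0)**2 / var_y

# 2) Variational free energy F = E_q[log q(x) - log p(y, x)] with a
#    Gaussian q; at the exact posterior, F equals -log p(y).
post_prec = 1 / s0**2 + 1 / s**2
post_var = 1 / post_prec
post_mu = post_var * (mu0 / s0**2 + y / s**2)

def free_energy(q_mu, q_var):
    # Expected negative log-likelihood and log-prior under q, minus entropy of q
    expected_nll = 0.5 * np.log(2 * np.pi * s**2) \
        + 0.5 * ((y - q_mu)**2 + q_var) / s**2
    expected_nlp = 0.5 * np.log(2 * np.pi * s0**2) \
        + 0.5 * ((q_mu - mu0)**2 + q_var) / s0**2
    neg_entropy = -0.5 * np.log(2 * np.pi * np.e * q_var)
    return expected_nll + expected_nlp + neg_entropy

F = free_energy(post_mu, post_var)

# 3) Precision-weighted prediction error: the data-dependent part of
#    -log p(y) is 0.5 * pi * (y - mu0)^2, with pi the marginal precision.
pi = 1 / var_y
pwpe = 0.5 * pi * (y - mu0)**2

print(f"-log evidence          : {-log_evidence:.4f}")
print(f"free energy (exact q)  : {F:.4f}")     # matches -log evidence
print(f"0.5 * precision * err^2: {pwpe:.4f}")  # same, minus the constant term
```

In richer, nonlinear models the free energy is only a bound on the log evidence, but the data-dependent term that drives belief updating has the same form: a precision-weighted prediction error.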
Precision
In predictive coding, precision corresponds to the best estimate of the reliability or inverse variance of prediction errors. Heuristically, only precise prediction errors matter for belief updating, where estimating the precision is like estimating the error variance in statistics (i.e., a small standard error corresponds to high precision). Technically, getting the precision right corresponds to optimizing the Kalman gain in Bayesian or Kalman filters.
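As a toy illustration of this point (again my own sketch, not from the paper), a scalar Kalman-style update makes the role of precision explicit: the gain applied to the prediction error is just the sensory precision relative to the total precision, so imprecise prediction errors barely move the belief.

```python
# A minimal sketch (illustrative names, not from the paper): a scalar
# Kalman-style update in which the gain on the prediction error is set by
# the precision of the sensory signal relative to the prior belief.
def precision_weighted_update(prior_mu, prior_prec, y, sensory_prec):
    """Update a Gaussian belief with one observation y.

    The gain is sensory_prec / (prior_prec + sensory_prec): precise
    prediction errors drive large belief updates; imprecise ones are
    largely ignored.
    """
    prediction_error = y - prior_mu
    gain = sensory_prec / (prior_prec + sensory_prec)
    post_mu = prior_mu + gain * prediction_error
    post_prec = prior_prec + sensory_prec
    return post_mu, post_prec

# Same prediction error, different sensory precision
print(precision_weighted_update(0.0, 1.0, y=2.0, sensory_prec=4.0))   # large update
print(precision_weighted_update(0.0, 1.0, y=2.0, sensory_prec=0.05))  # barely moves
```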
…
In short, most of the interesting bits of predictive coding are about getting the precision right: selecting newsworthy, uncertainty-resolving prediction errors.