year: 2020/03
paper: https://arxiv.org/pdf/2003.11642
website:
code: https://github.com/Multi-Agent-Networks/NaRLA
connections: biologically inspired, multi-agent networks, spiking neural networks, Jordan Ott, the neuron as an agent


Key Points

  1. We frame individual neurons as reinforcement learning agents
  2. Each neuron attempts to maximize its own rewards
  3. Neurons compete with their neighbors while simultaneously cooperating
  4. Networks of independent agents solve high-level reinforcement learning tasks
  5. While maximizing their own self-interest, neurons learn to cooperate for the betterment of the collective
  6. Biologically motivated, local reward schemes improve network dynamics
  7. Biological exactness regarding ions, channels, and proteins is not as important as the competition, reward, and computation paradigms they induce

Neuron Competition

Neurons in cortex are individual cells defined by a semi-permeable lipid
bilayer. This membrane encapsulates the cell and isolates its contents from
other neurons and the extracellular material. Neurons make connections to each other via synaptic junctions, where chemical or electrical information can be exchanged.
These individual actors compete with their neighbors for resources, through inhibition, and for activity expression. For neurons to remain healthy, they must receive adequate resources from the extracellular solution as well as from homeostatic cells (i.e., glial cells).
As in nature, resources may often be scarce, requiring trade-offs in their allocation.

Neuron-Level Rewards

Neurons may be rewarded based on a variety of events and interactions with other agents in their environment. These rewards arrive either as global, task-based performance signals or as local signals computed within neighborhoods of the multi-agent network. The rewards proposed here are derived from observations about the function of cortical circuits.
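A minimal sketch of the global variant, assuming the task signal is simply broadcast to every neuron-agent (the function name is illustrative, not taken from the NaRLA code base); the local signals are sketched individually under each reward below.

```python
import numpy as np

def broadcast_task_reward(task_reward, num_neurons):
    # Global scheme: every neuron-agent shares the same task-level performance signal.
    return np.full(num_neurons, float(task_reward))

print(broadcast_task_reward(1.0, 4))  # [1. 1. 1. 1.]
```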
Activity: In order for neurons to communicate information to one another, they must become active. Under this formulation, the goal of a neuron is to fire, thereby encoding information in its spikes. As a result, neurons that fire receive a positive reward, whereas neurons that remain silent receive a negative reward.
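A minimal sketch of such an activity reward, assuming binary spikes and symmetric ±1 rewards (the exact values in the paper's implementation may differ):

```python
import numpy as np

def activity_reward(spikes):
    # +1 for neurons that fired at this time step, -1 for neurons that stayed silent.
    spikes = np.asarray(spikes, dtype=float)
    return np.where(spikes > 0, 1.0, -1.0)

print(activity_reward([1, 0, 1, 1, 0]))  # [ 1. -1.  1.  1. -1.]
```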
Sparsity: The amount of activity, or lack thereof, is an important property of cortical networks. Layers that are too sparse do not carry enough information and are highly susceptible to noise: if only a few neurons are active, the semantic meaning of the code can be flipped by a handful of erroneously active neurons. Layers that are not sparse enough suffer a similar problem, since a population in which every neuron is active cannot encode meaningful information. Consequently, neurons optimize for a specific sparsity level: neurons that become active are rewarded when the population meets the target sparsity and penalized when it violates it.
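A toy sketch of a sparsity reward; the target level, tolerance, and the choice to reward only active neurons are assumptions for illustration, not values from the paper:

```python
import numpy as np

def sparsity_reward(spikes, target=0.5, tolerance=0.1):
    # Reward active neurons when the layer's firing rate is close to the target
    # sparsity level; penalize them when the layer is too dense or too sparse.
    spikes = np.asarray(spikes, dtype=float)
    firing_rate = spikes.mean()
    meets_target = abs(firing_rate - target) <= tolerance
    return spikes * (1.0 if meets_target else -1.0)

print(sparsity_reward([1, 0, 1, 0]))  # layer at target -> [ 1.  0.  1.  0.]
print(sparsity_reward([1, 1, 1, 1]))  # too dense       -> [-1. -1. -1. -1.]
```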
Prediction: In order to learn high-level stimuli, neurons must be temporally aware, and this temporal awareness manifests as learning sequences. Rather than learning sequences only at the network level, we propose learning sequences within the network: agents must identify which neurons will become active next. Consequently, a neuron receives a reward when its activity predicts the activity of its postsynaptic contacts, and a penalty when it predicts a postsynaptic neuron that remains inactive.
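One way such a prediction reward could be computed, sketched with binary connectivity; the matrix form and per-contact ±1 scoring are assumptions made for this example:

```python
import numpy as np

def prediction_reward(pre_spikes_t, post_spikes_t1, connectivity):
    # For each presynaptic neuron active at time t: +1 for every connected
    # postsynaptic neuron that fires at t+1, -1 for every connected postsynaptic
    # neuron that stays silent. Silent presynaptic neurons get no reward here.
    pre = np.asarray(pre_spikes_t, dtype=float)
    post = np.asarray(post_spikes_t1, dtype=float)
    connected = (np.asarray(connectivity) > 0).astype(float)
    hits = connected @ post            # correctly predicted postsynaptic spikes
    misses = connected @ (1.0 - post)  # connected contacts that stayed silent
    return pre * (hits - misses)

conn = np.array([[1, 0],   # presynaptic neuron 0 -> postsynaptic neuron 0
                 [1, 1]])  # presynaptic neuron 1 -> postsynaptic neurons 0 and 1
print(prediction_reward([1, 1], [1, 0], conn))  # [ 1.  0.]
```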
Activity Trace: The rewards described thus far lead to an obvious attractor in the network dynamics. The same group of neurons will always become active, thus achieving the correct sparsity level and correctly predicting the next active neuron downstream. In order to remedy this, we introduce an activity trace. This reward penalizes neurons that are always active or always silent. Biological neurons can only fire so often in a short period, as local resources and the health of the cell serve as activity constraints. Similarly, neurons that never fire are unlikely to be maintained and kept healthy. This reward enforces diversity in the coding of activity across the population.
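A small sketch of an activity trace as an exponential moving average, with thresholds that penalize saturated or collapsed neurons; the decay rate and thresholds are assumptions, not values from the paper:

```python
import numpy as np

def update_trace(trace, spikes, decay=0.9):
    # Exponential moving average of each neuron's recent activity.
    return decay * trace + (1.0 - decay) * np.asarray(spikes, dtype=float)

def trace_penalty(trace, low=0.1, high=0.9):
    # Penalize neurons that are (nearly) always active or (nearly) always silent.
    return np.where((trace > high) | (trace < low), -1.0, 0.0)

trace = np.zeros(3)
for _ in range(30):
    trace = update_trace(trace, [1, 0, 1])  # neurons 0 and 2 always fire, neuron 1 never does
print(trace_penalty(trace))  # [-1. -1. -1.] -- always-on and always-off are both penalized
```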
Neurons must learn to balance these rewards, maximizing their own self-interest while making trade-offs for the good of the population. Based on the activity reward alone, a neuron would want to become active at every time step; however, doing so violates the sparsity and activity-trace rewards and decreases the task-level reward. As a result, agents must learn to trade these rewards off against one another, ultimately finding the optimal times to become active so that they acquire rewards while encoding useful information about the stimuli (see the sketch below). With neurons, multi-agent networks, and reward schemes introduced, we now describe the experimental settings.
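Putting it together, a neuron's per-step reward could be a weighted sum of the terms sketched above (reusing the toy helpers from the previous sketches; the coefficients are assumptions, and the paper may combine the terms differently):

```python
import numpy as np

def total_reward(spikes, post_spikes_next, connectivity, trace,
                 coeffs=(1.0, 1.0, 1.0, 1.0)):
    # Weighted sum of the four toy reward terms defined in the sketches above.
    a, s, p, t = coeffs
    return (a * activity_reward(spikes)
            + s * sparsity_reward(spikes)
            + p * prediction_reward(spikes, post_spikes_next, connectivity)
            + t * trace_penalty(trace))

conn = np.array([[1, 0], [1, 1]])
print(total_reward([1, 1], [1, 0], conn, trace=np.array([0.5, 0.5])))  # [1. 0.]
```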

No environment-specific reward - shift towards agent-internal rewards

By moving the locality of reward generation inside the agent, the agent is no longer strictly required to directly maximize the designated external reward. Instead, it can assess stimuli, assign value to them, and take actions that achieve its own intrinsic goals.