year: 2022
paper: https://royalsocietypublishing.org/doi/10.1098/rsfs.2022.0072, https://arxiv.org/pdf/2211.08522
website:
code: https://github.com/LPioL/scalefreecognition/tree/main
connections: michael levin, goal, scaling, homeostasis
All intelligences are collective intelligences: they are made up of parts that are themselves biological agents.
The term “goal” is not used in a meta-cognitive, self-aware, complex sense, but in the sense that a system can expend effort to reduce its error from a specified homeostatic setpoint.
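A minimal sketch of this deflationary sense of “goal”: a loop that spends effort to shrink its error from a setpoint. The proportional gain, the effort measure and all names here are illustrative assumptions, not taken from the paper or its code.

```python
def homeostatic_step(value: float, setpoint: float, gain: float = 0.5) -> tuple[float, float]:
    """One step of a minimal proportional homeostatic loop (illustrative, not from the paper).

    Returns the corrected value and the effort spent on the correction.
    """
    error = setpoint - value
    correction = gain * error   # push the value towards the setpoint
    effort = abs(correction)    # assumption: effort grows with the size of the correction
    return value + correction, effort


def run_loop(value: float, setpoint: float, steps: int = 10) -> list[float]:
    """Run the loop and record the absolute error from the setpoint after each step."""
    errors = []
    for _ in range(steps):
        value, _effort = homeostatic_step(value, setpoint)
        errors.append(abs(setpoint - value))
    return errors


print(run_loop(value=0.0, setpoint=10.0, steps=3))  # → [5.0, 2.5, 1.25]
```

Nothing in the loop is “aware” of the goal; it only keeps acting in whatever direction reduces the error.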
Goal of this paper
Make individual cells with simple goals cooperate to reach a bigger goal (morphogenetic formation of three different stripes, aka the “French flag”), using only local communication, reinforcement learning, evolution, and top-down rewards w.r.t. the morphogenetic goal.
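To make the target concrete, here is a sketch of a French flag goal pattern on a 2D grid. The integer encoding (0 = blue, 1 = white, 2 = red), the vertical stripe orientation and the grid sizes are assumptions for illustration.

```python
BLUE, WHITE, RED = 0, 1, 2  # hypothetical integer encoding of the three cell states


def french_flag_target(height: int, width: int) -> list[list[int]]:
    """Target pattern: three vertical stripes of (nearly) equal width, blue | white | red."""
    if width < 3:
        raise ValueError("need at least 3 columns for three stripes")
    # 3 * col // width maps columns 0..width-1 onto stripe indices 0, 1, 2
    row = [3 * col // width for col in range(width)]
    return [row[:] for _ in range(height)]


for r in french_flag_target(2, 6):
    print(r)  # → [0, 0, 1, 1, 2, 2] (printed twice)
```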
Paper to soup terminology translation & explanation of biological terms
- morphogen “molecules” : (simple form of) messages / communication system; their concentration within a cell defines the cell state (“changes cell state”)
- gap junction : synapse / edge
- stress and stress reduction “molecules” : (scalar) messages (that “don’t change cell state”, but change the stress level)
- “ontogenetic”: relating to ontogeny, the development of an organism
- “homeostasis”: the maintenance of a stable internal environment
- “gene-regulatory networks”: a system in the cell that controls the activity of genes (i.e. the cell state / behaviour)
Cell model details summary
Cells are modelled by an ANN.
Each cell can contain the following:
- the energy state, necessary for survival and for state change
- morphogens that can go through gap junctions to define the cell state
- stress and stress-reduction molecules, which can also move between cells via gap junctions, and which define the stress state
Neural network input:
- morphogens
- energy at time t and t-1
- stress at time t and t-1
- internal state at time t
- size of the cell collective the cell is part of (number of cells of the same state connected by open gap junctions in the tissue)
- the error, in percent, between the size of the collective the cell is part of and the target size
- the perception of the cell neighbourhood (geometrical frustration, or how similar the cell is to its neighbours)
- a bias set at 0.5
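The input list above can be sketched as a flat vector, in the same order. Field names, the sign of the size error and the encoding of the neighbourhood perception as a similarity fraction are hypothetical; the repo may encode these differently.

```python
from dataclasses import dataclass


@dataclass
class CellObservation:
    """What a cell perceives at time t (hypothetical names, not from the repo)."""
    morphogen: float             # morphogen level inside the cell
    energy: float                # energy at time t
    energy_prev: float           # energy at time t-1
    stress: float                # stress at time t
    stress_prev: float           # stress at time t-1
    state: int                   # internal state at time t
    collective_size: int         # same-state cells connected through open gap junctions
    target_size: int             # target size of that collective
    neighbour_similarity: float  # assumed: fraction of neighbours in the same state


BIAS = 0.5  # constant bias input


def input_vector(obs: CellObservation) -> list[float]:
    """Flatten an observation into the ANN input vector (10 values)."""
    # assumed sign convention: positive when the collective is smaller than the target
    size_error_pct = 100.0 * (obs.target_size - obs.collective_size) / obs.target_size
    return [
        obs.morphogen,
        obs.energy, obs.energy_prev,
        obs.stress, obs.stress_prev,
        float(obs.state),
        float(obs.collective_size),
        size_error_pct,
        obs.neighbour_similarity,
        BIAS,
    ]


obs = CellObservation(3.0, 50.0, 49.2, 10.0, 12.0, 1, 8, 10, 0.75)
print(input_vector(obs)[7])  # → 20.0 (collective is 20 % below target)
```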
Outputs of the ANN:
- number of morphogen molecules to send to neighbours
- number of stress molecules to send to neighbours and to be applied to the cell itself
- the number of stress-reduction molecules to send to its neighbours and to be applied to the cell itself
- the opening of gap junctions in the four directions
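A sketch of the per-cell network mapping the 10 inputs to the outputs listed above. The one-hidden-layer architecture, the hidden size, ReLU for molecule counts and a sigmoid for gap-junction openings are all assumptions; the paper's network (and whether openings are continuous or binary) may differ.

```python
import math
import random

N_INPUTS = 10   # morphogen ... bias, as in the input list
N_HIDDEN = 8    # hypothetical hidden-layer size
N_OUTPUTS = 7   # morphogen, stress, stress reduction, 4 gap junctions
DIRECTIONS = ("up", "down", "left", "right")


def init_weights(rng: random.Random):
    """Random weights for a one-hidden-layer network (architecture is an assumption)."""
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    w2 = [[rng.gauss(0.0, 1.0) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
    return w1, w2


def _relu(v: float) -> float:
    return max(0.0, v)  # molecule counts cannot be negative


def _sigmoid(v: float) -> float:
    # numerically stable logistic function, maps to [0, 1]
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def forward(x, w1, w2) -> dict:
    """Map an input vector to the cell's actions for this step."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    raw = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return {
        "morphogen_out": _relu(raw[0]),
        "stress_out": _relu(raw[1]),
        "stress_reduction_out": _relu(raw[2]),
        "gap_junctions": {d: _sigmoid(v) for d, v in zip(DIRECTIONS, raw[3:])},
    }


w1, w2 = init_weights(random.Random(0))
print(forward([0.0] * 9 + [0.5], w1, w2)["gap_junctions"])
```

In an evolutionary setup, `w1` and `w2` would be the genome that mutation and selection act on.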
The level of stress of one cell is bounded between 0 and 100.
Cell
Each cell contains an ANN (representing the computational functions carried out by gene-regulatory networks and pathways operating within cells) which controls the opening and closure of gap junctions in four directions: up, down, left, right.
Each cell has one type of morphogen molecule that can go through the gap junctions, depending on their opening, and that triggers different kinds of genomic expression leading to three different cell states: blue, white, or red. These states depend only on the level of morphogen inside the cells.
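Since the state depends only on the morphogen level, it can be sketched as a threshold lookup. The two thresholds and which colour sits at the low end are hypothetical; the note does not give the actual values.

```python
LOW_THRESHOLD = 5.0    # hypothetical value
HIGH_THRESHOLD = 10.0  # hypothetical value


def cell_state(morphogen_level: float) -> str:
    """Map the intracellular morphogen level to one of the three cell states."""
    if morphogen_level < LOW_THRESHOLD:
        return "blue"
    if morphogen_level < HIGH_THRESHOLD:
        return "white"
    return "red"


print([cell_state(m) for m in (0.0, 7.0, 12.0)])  # → ['blue', 'white', 'red']
```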
We also imposed an energy cost on state change. Each cell also has a stress molecule and an antistress counterpart that can be sent to its neighbours. Cells need to keep their energy level > 0 in order to survive.
Stress system
The model does not enforce the use of the stress system: it is completely evolved since the number of stress and stress-reduction molecules to be transferred is chosen by the embedded neural network. In addition, evolution can decide to make the stress molecule instructive or not. We have chosen to refer to it as the stress molecule but, before evolution, it is just another communication system for cell signalling that does not directly affect the cell state.
We implemented a communication system that enables diffusion of a stress molecule through the tissue to allow other cells to feel stress that was not caused by their own internal state.
When the ANN of a cell makes the decision to send stress, that cell will diffuse a molecule to its neighbours, equally increasing its own stress level and that of its neighbours. In the same manner, the ANN can also cause the cell to send a stress-reduction molecule to the neighbouring cells that will decrease the cell stress levels. Importantly, whether or not (and how) the stress system is used to solve the patterning problem is not determined in the code. Evolution can produce agents that do or do not use this set of signals as a communication system. In other words, before evolution, what we call the stress system here is just another communication system in the tissue where a specific cell signalling will increase (or decrease) the level of a molecule inside the cell.
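A sketch of one round of stress signalling on a grid: a sender raises its own stress and that of each up/down/left/right neighbour by the same amount, stress reduction lowers it the same way, and the result is clamped to the 0 to 100 range noted above. Ignoring gap-junction gating and clamping once per round (rather than after every transfer) are simplifying assumptions.

```python
STRESS_MIN, STRESS_MAX = 0.0, 100.0


def neighbours(r: int, c: int, height: int, width: int):
    """Up, down, left, right neighbours that lie inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < height and 0 <= cc < width:
            yield rr, cc


def stress_step(stress, send_stress, send_relief):
    """Apply one round of stress / stress-reduction signalling and clamp to [0, 100]."""
    height, width = len(stress), len(stress[0])
    new = [row[:] for row in stress]
    for r in range(height):
        for c in range(width):
            delta = send_stress[r][c] - send_relief[r][c]
            if delta == 0:
                continue
            # the sender itself and its neighbours receive the same change
            for rr, cc in [(r, c), *neighbours(r, c, height, width)]:
                new[rr][cc] += delta
    return [[min(STRESS_MAX, max(STRESS_MIN, v)) for v in row] for row in new]


zeros = [[0.0] * 3 for _ in range(3)]
send = [[0.0] * 3 for _ in range(3)]
send[1][1] = 10.0
print(stress_step(zeros, send, zeros))
# → [[0.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 0.0]]
```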
Multi-Agent RL with Reward uncertainty - Two homeostatic loops
We tied the single-cell and the anatomical homeostatic loops. Each cell has only one goal, to survive, and that corresponds to being in the appropriate state in order to receive energy (with the other members of the collective). On the other hand, the collective/tissue has a morphogenetic goal, which is to reach the French flag.
What they mean by “tied the loops” is that the goal of the individual cell (survival) depends on achieving the larger goal: assembling into the French flag formation.
At each step, each cell receives a reward in the form of an amount of energy proportional to how close its sub-collective (one stripe, and therefore bigger than the immediate neighbourhood of one cell) is to reaching the appropriate anatomical goal. Each cell therefore faces reward uncertainty, because the reward depends not only on that cell’s own behaviour but also on the behaviour of the (sub)-collective; in other words, the reward is affected by the decisions of distant cells. The cells’ environment is therefore highly uncertain. In a sense, this scheme can be understood as a problem of multi-agent reinforcement learning under reward uncertainty.
Goal-directed systems have the ability to focus on relevant information and ignore distracting information. To do so, they rely on selective attention and/or interference suppression. Selective attention would rely on top-down biasing mechanisms as proposed by Desimone & Duncan (Neural Mechanisms of Selective Visual Attention). In our case, the top-down biasing mechanism is represented by the energy reward that ties the two homeostatic levels. We also imposed an energy cost for communication and state changes (0.8 and 0.25 per step, respectively).
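A sketch of one cell's energy bookkeeping per step, using the 0.8 (communication) and 0.25 (state change) costs from the paragraph above. Assumptions: the reward is linear in a stripe score in [0, 1] scaled by a hypothetical `max_reward`, and each cost is charged once per step when the action happens at all (not per molecule).

```python
COMMUNICATION_COST = 0.8   # per step, from the text
STATE_CHANGE_COST = 0.25   # per step, from the text


def energy_step(energy: float, stripe_score: float, max_reward: float,
                communicated: bool, changed_state: bool) -> tuple[float, bool]:
    """Update a cell's energy for one step; return (new_energy, still_alive)."""
    energy += max_reward * stripe_score  # assumed: reward linear in how well the stripe is formed
    if communicated:
        energy -= COMMUNICATION_COST
    if changed_state:
        energy -= STATE_CHANGE_COST
    return energy, energy > 0  # survival requires energy > 0


print(energy_step(1.0, 0.0, 2.0, communicated=True, changed_state=True))
```

With a badly formed stripe (score 0), a cell that both signals and switches state ends below zero and dies, which is how the collective goal feeds back into individual survival.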
Each cell received energy according to its location on the tissue, and its energy reward was proportional to how well the other cells of one stripe of the French flag were resolving the French flag pattern (in other words, evolution gives partial credit for imperfect primary axial patterning, selecting for embryos with optimal morphogenesis).
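A sketch of the stripe-wise partial credit: for each target stripe, the fraction of its cells already in the right state, plus a tissue-level score as the mean over stripes. Using the mean as the evolutionary fitness is an assumption; the note only says credit is partial and per stripe.

```python
def stripe_scores(grid, target):
    """Fraction of cells in each target stripe whose current state matches the target."""
    totals, correct = {}, {}
    for grid_row, target_row in zip(grid, target):
        for state, goal in zip(grid_row, target_row):
            totals[goal] = totals.get(goal, 0) + 1
            correct[goal] = correct.get(goal, 0) + (state == goal)
    return {stripe: correct[stripe] / totals[stripe] for stripe in sorted(totals)}


def fitness(grid, target):
    """Assumed tissue-level partial credit: mean of the per-stripe scores."""
    scores = stripe_scores(grid, target)
    return sum(scores.values()) / len(scores)


target = [[0, 0, 1, 1, 2, 2]]
grid = [[0, 0, 1, 0, 2, 2]]      # one cell of the middle stripe is wrong
print(stripe_scores(grid, target))  # → {0: 1.0, 1: 0.5, 2: 1.0}
```

A cell at `(r, c)` would then be rewarded according to `stripe_scores(grid, target)[target[r][c]]`, i.e. by its location, as described above.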
Todo
- NEAT
- Information / Analysis measures
- Implement
He talks about MARL, but in the code they don’t use any kind of RL?