year: 2022
paper: https://royalsocietypublishing.org/doi/10.1098/rsfs.2022.0072, https://arxiv.org/pdf/2211.08522
website:
code: https://github.com/LPioL/scalefreecognition/tree/main
connections: michael levin, goal, scaling, homeostasis


All intelligences are collective intelligences: they are made up of parts that are themselves biological agents.

The term “goal” is not used in a metacognitive, self-aware, complex sense; it only means that a system can expend effort to reduce its error relative to a specified homeostatic setpoint.
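
To make this minimal sense of “goal” concrete, here is a sketch of a proportional homeostatic loop: the system measures its error from a setpoint and spends effort in proportion to that error. The function name, the gain, and the proportional rule are illustrative assumptions, not the paper's model.

```python
def homeostatic_loop(state: float, setpoint: float, gain: float = 0.5,
                     steps: int = 10) -> tuple[float, float]:
    """Run a minimal proportional homeostatic loop (illustrative assumption).

    Returns (final_state, total_effort), where effort is the summed size of
    the corrections the system made to reduce its error from the setpoint.
    """
    total_effort = 0.0
    for _ in range(steps):
        error = setpoint - state      # deviation from the homeostatic setpoint
        correction = gain * error     # effort spent in proportion to the error
        state += correction
        total_effort += abs(correction)
    return state, total_effort


final_state, effort = homeostatic_loop(state=0.0, setpoint=1.0)
print(f"final state {final_state:.4f}, effort spent {effort:.4f}")
```

A system already at its setpoint spends no effort; the further it is pushed away, the more effort it expends to return.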

Goal of this paper

Make individual cells with simple goals cooperate to reach a bigger goal (a morphogenetic pattern of three stripes, a.k.a. the “French flag”) using only local communication, reinforcement learning, evolution, and top-down rewards with respect to the morphogenetic goal.
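
The target pattern can be written down as a grid of stripe labels. A minimal sketch, assuming three vertical stripes of near-equal width on a rectangular grid and the labels 0/1/2 for blue/white/red (grid size, stripe orientation, and labels are assumptions, not taken from the paper or the repo):

```python
BLUE, WHITE, RED = 0, 1, 2


def french_flag(height: int, width: int) -> list[list[int]]:
    """Target pattern: three vertical stripes (blue | white | red).

    Columns are split into thirds; when the width is not a multiple of 3,
    the stripes differ in width by at most one column.
    """
    if width < 3:
        raise ValueError("need at least 3 columns for three stripes")
    # 3 * col // width maps columns 0..width-1 onto stripe indices 0, 1, 2.
    row = [3 * col // width for col in range(width)]
    return [row[:] for _ in range(height)]


for row in french_flag(3, 9):
    print(row)
```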

Paper to soup terminology translation & explanation of biological terms

  • morphogen “molecules”: (a simple form of) messages / communication system; their concentration within a cell defines the cell state (they “change cell state”)
  • gap junction: synapse / edge
  • stress and stress-reduction “molecules”: (scalar) messages that “don’t change cell state” but change the stress level
  • “ontogenetic”: relating to ontogeny, the development of an individual organism
  • “homeostasis”: the maintenance of a stable internal environment
  • “gene-regulatory networks”: a system in the cell that controls the activity of genes (i.e. the cell state / behaviour)

Cell

Each cell contains an ANN (representing the computational functions carried out by gene-regulatory networks and pathways operating within cells) which controls the opening and closure of gap junctions in four directions: up, down, left, right.
Each cell has one type of morphogen molecule that can pass through the gap junctions, depending on how open they are, and that triggers different kinds of genomic expression leading to three different cell states: blue, white, or red. These states depend only on the level of morphogen inside the cells.
We also imposed an energy cost on state change. Each cell also has a stress molecule and an antistress counterpart that can be sent to its neighbours.
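
A sketch of one cell's local machinery as described above: a stand-in for the ANN that outputs an opening for each of the four gap junctions, morphogen diffusing through a junction in proportion to its opening, a state read off from the morphogen level alone, and the energy cost of a state change (0.25, the value quoted further down in these notes). The single-layer network, the min-of-both-sides junction rule, the diffusion rate, and the thresholds are all hypothetical choices for illustration; the actual implementation in the scalefreecognition repo may differ.

```python
import math

BLUE, WHITE, RED = 0, 1, 2
DIRECTIONS = ("up", "down", "left", "right")
STATE_CHANGE_COST = 0.25  # energy per state change (value quoted in these notes)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def junction_openings(inputs, weights, biases):
    """Hypothetical stand-in for the cell's ANN: one linear layer plus a sigmoid.

    inputs: the cell's observations (e.g. morphogen, stress, energy)
    weights: one weight row per direction; biases: one bias per direction
    Returns an opening in [0, 1] for each of the four gap junctions.
    """
    openings = {}
    for direction, w_row, b in zip(DIRECTIONS, weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        openings[direction] = _sigmoid(z)
    return openings


def junction_flux(m_src, m_dst, open_src, open_dst, rate=0.5):
    """Morphogen moving from src to dst through their shared gap junction.

    Assumes the junction conducts only as far as its less open side allows and
    that molecules diffuse down the gradient (a negative value means dst -> src).
    """
    return rate * min(open_src, open_dst) * (m_src - m_dst)


def state_from_morphogen(level, low=1.0, high=2.0):
    """The state depends only on the morphogen level (thresholds are hypothetical)."""
    if level < low:
        return BLUE
    if level < high:
        return WHITE
    return RED


def update_state(old_state, level, energy):
    """Recompute the cell state and charge the energy cost if it changed."""
    new_state = state_from_morphogen(level)
    if new_state != old_state:
        energy -= STATE_CHANGE_COST
    return new_state, energy


openings = junction_openings([1.5, 0.0, 1.0], [[0.0, 0.0, 0.0]] * 4, [0.0] * 4)
flux = junction_flux(2.0, 0.5, openings["right"], 1.0)
state, energy = update_state(BLUE, 1.5, energy=1.0)
print(openings, flux, state, energy)
```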

Cells need to keep their energy level > 0 in order to survive.

Stress system

The model does not enforce the use of the stress system: it is completely evolved since the number of stress and stress-reduction molecules to be transferred is chosen by the embedded neural network. In addition, evolution can decide to make the stress molecule instructive or not. We have chosen to refer to it as the stress molecule but, before evolution, it is just another communication system for cell signalling that does not directly affect the cell state.


We implemented a communication system that enables diffusion of a stress molecule through the tissue to allow other cells to feel stress that was not caused by their own internal state.
When the ANN of a cell makes the decision to send stress, that cell will diffuse a molecule to its neighbours, equally increasing its own stress level and that of its neighbours. In the same manner, the ANN can also cause the cell to send a stress-reduction molecule to the neighbouring cells that will decrease the cell stress levels. Importantly, whether or not (and how) the stress system is used to solve the patterning problem is not determined in the code. Evolution can produce agents that do or do not use this set of signals as a communication system. In other words, before evolution, what we call the stress system here is just another communication system in the tissue where a specific cell signalling will increase (or decrease) the level of a molecule inside the cell.
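
The stress signalling can be sketched on a 2-D grid: whatever amount of stress (or stress-reduction) molecule a cell's ANN decides to emit changes the sender and its four neighbours by the same amount. The function and argument names are hypothetical, the emitted amounts would come from the evolved network, and clamping stress at zero and treating stress reduction symmetrically to stress are assumptions.

```python
def stress_step(stress, send_stress, send_relief):
    """Apply one round of stress signalling on a 2-D grid (illustrative sketch).

    stress: current stress level of each cell
    send_stress: amount of stress molecule each cell's ANN chose to emit
    send_relief: amount of stress-reduction molecule each cell chose to emit
    Each emission changes the sender and its up/down/left/right neighbours by
    the same amount; resulting levels are clamped at zero (assumption).
    """
    h, w = len(stress), len(stress[0])
    new = [row[:] for row in stress]
    for r in range(h):
        for c in range(w):
            delta = send_stress[r][c] - send_relief[r][c]
            if delta == 0:
                continue
            for rr, cc in ((r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < h and 0 <= cc < w:  # ignore neighbours off the tissue
                    new[rr][cc] += delta
    return [[max(0.0, v) for v in row] for row in new]


zeros = [[0.0] * 3 for _ in range(3)]
emit = [[0.0] * 3 for _ in range(3)]
emit[1][1] = 1.0  # only the centre cell sends stress
for row in stress_step(zeros, emit, zeros):
    print(row)
```

Because a cell's stress level can rise purely from its neighbours' emissions, a cell can “feel” stress that was not caused by its own internal state.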

Multi-Agent RL with Reward uncertainty - Two homeostatic loops

We tied the single-cell and the anatomical homeostatic loops. Each cell has only one goal, to survive, which corresponds to being in the appropriate state in order to receive energy (together with the other members of the collective). The collective/tissue, on the other hand, has a morphogenetic goal: reaching the French flag.

What they mean by “tied the loops” is that the goal of the individual cell (survival) depends on achieving the larger goal: assembling into the French flag formation.

At each step, each cell receives a reward in the form of an amount of energy proportional to how close its corresponding sub-collective (one stripe, and therefore bigger than the immediate neighbourhood of one cell) is to reaching the appropriate anatomical goal. Each cell therefore faces reward uncertainty, because the reward depends not only on that cell’s own behaviour but also on the behaviour of the (sub-)collective; in other words, the reward is affected by the decisions of distant cells, so the cells’ environment is highly uncertain. In a sense, this scheme can be understood as a problem of multi-agent reinforcement learning under reward uncertainty.

Goal-directed systems have the ability to focus on relevant information and ignore distracting information. To do so, they rely on selective attention and/or interference suppression. Selective attention would rely on top-down biasing mechanisms as proposed by Desimone & Duncan (Neural Mechanisms of Selective Visual Attention). In our case, the top-down biasing mechanism is represented by the reward in energy that ties the two homeostatic levels. We also imposed an energy cost for communication and state changes (0.8 and 0.25 per step, respectively).
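
Putting the reward and the two costs together, one cell's per-step energy bookkeeping might look like the sketch below. `stripe_score` stands for how close the cell's stripe is to its target, which the cell cannot control on its own; that dependence on other cells is the reward uncertainty. The costs 0.8 and 0.25 and the survival condition (energy > 0) come from these notes; applying the communication cost as a flat charge on any step where the cell signals and the `reward_scale` constant are assumptions.

```python
COMM_COST = 0.8           # energy per step spent communicating (from the excerpt)
STATE_CHANGE_COST = 0.25  # energy per state change (from the excerpt)


def energy_step(energy, stripe_score, communicated, changed_state, reward_scale=1.0):
    """Update one cell's energy for a single step; return (new_energy, alive).

    stripe_score: in [0, 1], how close the cell's stripe (sub-collective) is to
        its target; it depends on other cells, hence the reward uncertainty.
    reward_scale: hypothetical constant mapping the score to energy units.
    """
    energy += reward_scale * stripe_score  # top-down reward tied to the collective
    if communicated:
        energy -= COMM_COST                # assumption: flat charge when signalling
    if changed_state:
        energy -= STATE_CHANGE_COST
    return energy, energy > 0              # cells must keep energy > 0 to survive


energy = 1.0
for score in (0.9, 0.4, 0.0):  # the stripe's progress is also set by other cells
    energy, alive = energy_step(energy, score, communicated=True, changed_state=False)
    print(f"score {score:.1f} -> energy {energy:.2f}, alive={alive}")
```

The same behaviour by the same cell can keep it alive or kill it, depending only on how well the rest of its stripe is doing.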

Each cell received energy according to its location on the tissue, and its energy reward was proportional to how well the other cells in its stripe of the French flag were resolving the pattern (in other words, evolution gives partial credit for imperfect primary axial patterning, selecting for embryos with optimal morphogenesis).
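
One way to read “partial credit” per stripe: score each stripe by the fraction of its cells in the correct state, then give every cell the score of its own stripe. A sketch under that assumption (the paper's exact closeness measure may differ; the function names are hypothetical):

```python
def stripe_scores(states, target):
    """Fraction of correctly coloured cells in each target stripe (partial credit).

    states, target: 2-D grids of colour labels; the target defines the stripes.
    Returns {stripe_label: fraction_correct}.
    """
    correct, total = {}, {}
    for row_s, row_t in zip(states, target):
        for s, t in zip(row_s, row_t):
            total[t] = total.get(t, 0) + 1
            correct[t] = correct.get(t, 0) + (s == t)
    return {label: correct[label] / total[label] for label in total}


def cell_rewards(states, target):
    """Each cell is rewarded with its stripe's score, not its own correctness."""
    scores = stripe_scores(states, target)
    return [[scores[t] for t in row] for row in target]


target = [[0, 1, 2], [0, 1, 2]]
states = [[0, 1, 2], [1, 1, 0]]
print(cell_rewards(states, target))
```

A correctly coloured cell and a wrongly coloured cell in the same stripe receive the same reward, which is how the collective goal reaches down to the individual cell.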

Todo

  • NEAT
  • Information / Analysis measures
  • Implement

He talks about MARL (multi-agent reinforcement learning), but the code doesn’t seem to use any kind of RL?