BraiNCA - brain-inspired neural cellular automata and applications to morphogenesis and motor control

year: 2026
paper: brainca-brain-inspired-neural-cellular-automata-and-applications-to-morphogenesis-and-motor-control
website:
code: https://github.com/LPioL/BraiNCA | https://github.com/MaxWolf-01/brainca
connections: NCA, Benedikt Hartl, levin

Notation

$c_{i}^{t} \in R^{C}$ … cell state of cell $i$ at time $t$
$ρ_{i}^{t}$ … optional conditioning input (positional encoding, prev action, …)
$s_{i}^{t} = [c_{i}^{t}; ρ_{i}^{t}] \in R^{S}$ … extended state
$N_{i}$ … local neighborhood indices
$L_{i}$ … long-range neighborhood indices
$a_{ij}^{t} \in R^{A}$ … raw local attention score
$b_{ij}^{t} \in R^{A}$ … raw long-range attention score
$α_{ij}^{t}$ … normalized local attention weight
$β_{ij}^{t}$ … normalized long-range attention weight
$f_{attn}^{(N)}$ … local attention MLP, $R^{2 S} \to R^{A}$
$f_{attn}^{(L)}$ … long-range attention MLP, $R^{2 S} \to R^{A}$
$h_{k}^{t} \in R^{C}$ … GRU hidden state (values aggregated by attention); in the code this is the cell state $c_{k}^{t}$
$n_{i}^{t}$ … aggregated local signal
$l_{i}^{t}$ … aggregated long-range signal
$z_{i}^{t}$ … interaction vector
$f_{Z}$ … composition operator for $z$
$m_{i}^{t} \in R^{M}$ … message MLP output, GRU input
$f_{msg}$ … message MLP
$u_{t}$ … system-level feedback (e.g. one-hot prev action)
$o_{t}$ … external observation
$f_{GRU}$ … GRU cell
$r_{i}^{t} \in R^{C}$ … refinement residual
$f_{refine}$ … refinement MLP
$ℓ_{i}^{t}$ … per-cell action logits (lunar lander)
$w_{i}^{t}$ … per-cell fire probability
$R_{r}$ … cells in action region $r$
$L_{r}^{t}$ … region-level action logit
$π (a_{t} = r ∣ o_{t})$ … action policy
$N$ … number of cells
$C$ … cell state dim
$S$ … extended state dim
$A$ … attention score dim
$M$ … message dim
$T$ … final timestep

Limitations / Ideas / Obvious next steps

GRU, not QKV-attention-memory

With graph attn there’s also no any2any between the msgs per node (not sure if it matters - prlly not if u factor efficiency and diffusion over time)

Hidden state is public / the same as the public message, not compressed / …

Long-range connectivity is task-specific/engineered and fixed (Zipf’s law… tho it’s more complicated than that and different per task and… you get the point, it could be much simpler, bitter lesson pilled)

All cells get (the same) observations…

All cells are active every step

Arrangement affects performance and learning speed - let the cells arrange themselves

No growth and death; weights are the genome, evolved slowly over many generations of growth, growth happening during rollouts (i.e. genome only meta learns). This prlly (definitely) needs some pre-training to have any chance of working. Perhaps… pre-pre-training the NCA on CAs…

Unclarities

Why does the morpho experiment use additive fusion $[c; n + l]$ for the interaction vector while lunar lander concatenates $[c; n; l]$? Not motivated in the paper.

Why don't they do all-to-all comms in the attn agg? Efficiency? Unnecessary? (same info is spread out through timesteps) - maybe full attn is only worth it for internal states

Lunar lander readout: paper prose says "$f_{refine}$ output is split into two heads" but the equations write both heads as reading from $h_{i}^{t}$ directly. $f_{act}$: $M \to M \to 2$, but $h_{i}^{t} \in R^{C}$ and $C \neq M$, which suggests the prose is right and the equations are wrong. And it also makes more sense that way…

BraiNCA

Init:
$c_{i}^{0} \sim N (0, 0.1 \cdot I)$ (morpho) or $c_{i}^{0} = 0$ (lunar)
Obs $o_{t}$ and optional system-feedback $u_{t - 1}$ broadcast to all cells (last action taken by lander, one-hot; empty in morpho).
Connections
Repeat for $T$ steps (morpho: $T = 35$ total; lunar: $n_{steps} = 3$ per env step), $i \in \{1, \dots, N\}$ cells ( $N = 256$ ):
$$
\begin{aligned}
s_{i}^{t} &= [c_{i}^{t}; ρ_{i}^{t}] \\
a_{ij}^{t} &= f_{attn}^{(N)} ([s_{i}^{t}; s_{j}^{t}]), \quad α_{ij}^{t} = \mathrm{softmax}_{j} (a_{ij}^{t}), \quad n_{i}^{t} = \sum_{k \in N_{i}} α_{ik}^{t} h_{k}^{t} \\
b_{ij}^{t} &= f_{attn}^{(L)} ([s_{i}^{t}; s_{j}^{t}]), \quad β_{ij}^{t} = \mathrm{softmax}_{j} (b_{ij}^{t}), \quad l_{i}^{t} = \sum_{k' \in L_{i}} β_{ik'}^{t} h_{k'}^{t} \\
z_{i}^{t} &= f_{Z} (s_{i}^{t}, n_{i}^{t}, l_{i}^{t}) \\
m_{i}^{t} &= f_{msg} (z_{i}^{t}; u_{t}, o_{t}) \\
h_{i}^{t} &= f_{GRU} (m_{i}^{t}, h_{i}^{t - 1}) \\
\tilde{h}_{i}^{t} &= h_{i}^{t} + f_{refine} (h_{i}^{t}) \\
c_{i}^{t + 1} &= \begin{cases} \tilde{h}_{i}^{t} & \text{(morphogenesis)} \\ W_{C} \tilde{h}_{i}^{t} & \text{(lunar lander)} \end{cases} \\
ℓ_{i}^{t} &= f_{act} (\tilde{h}_{i}^{t}) \quad \text{(lunar lander only)}
\end{aligned}
$$

Lunar lander

cells are placed in regions, quadrants or somatotopic (T-shaped, like the body of a lander, left right thrusts up top, main thruster down bottom, noop action regions bottom left and right). The noop region is excluded from LR connections, which is not motivated.

softmax over logits $(ℓ_{noop}, ℓ_{fire})$ → fire probability $w_{i}$

majority vote, confidence weighted

Within each region, compute a weighted average of fire logits, weighted by the fire probability/confidence $w_{i}$:

$L_{r} = \frac{\sum_{i \in R_{r}} w_{i} \cdot ℓ_{i, fire}}{\sum_{i \in R_{r}} w_{i}}, \quad r \in \{NOOP, LEFT, MAIN, RIGHT\}$

softmax over $L_{r}$ to get action dist to sample from
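
The readout steps above (per-cell softmax, confidence-weighted region average, softmax over regions) can be sketched in plain Python. This is a toy illustration, not the authors' code: the function names, the two-cells-per-region layout, and the logit values are all made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def region_policy(cell_logits, regions):
    """cell_logits: one (noop_logit, fire_logit) pair per cell.
    regions: region name -> list of cell indices (hypothetical layout).
    Returns region name -> action probability."""
    # w_i: per-cell fire probability from the 2-way softmax.
    w = [softmax([noop, fire])[1] for noop, fire in cell_logits]
    names = list(regions)
    region_logits = []
    for name in names:
        idx = regions[name]
        # L_r: fire logits averaged with weights w_i (w_i > 0, so den > 0).
        num = sum(w[i] * cell_logits[i][1] for i in idx)
        den = sum(w[i] for i in idx)
        region_logits.append(num / den)
    # Softmax over the four region logits gives the action distribution.
    return dict(zip(names, softmax(region_logits)))

# Toy example: 8 cells, 2 per region (made-up numbers).
cell_logits = [(0.0, -1.0), (0.5, -0.5),   # NOOP region
               (0.0, 0.2), (1.0, -2.0),    # LEFT region
               (-1.0, 2.0), (-0.5, 1.5),   # MAIN region
               (0.3, 0.0), (0.0, 0.0)]     # RIGHT region
regions = {"NOOP": [0, 1], "LEFT": [2, 3], "MAIN": [4, 5], "RIGHT": [6, 7]}
policy = region_policy(cell_logits, regions)
```

Because the weights are the cells' own fire probabilities, a cell that is unsure (low $w_{i}$) contributes little to its region's logit, which is what makes the vote confidence weighted.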

Detailed

Extended state $s_{i}^{t} = [c_{i}^{t}; ρ_{i}^{t}]$

$ρ_{i}^{t}$ bundles all non-cell-state inputs. What it contains is task-dependent:
Morphogenesis: $ρ$ is empty, so $s = c$ and $S = C = 9$ … 3 visible channels (cell type logits) + 6 hidden channels. (Code uses $C = 16$: 3 visible + 13 hidden.)
Lunar lander: $ρ_{i}^{t} = [ℓ_{i}^{t - 1}; ρ_{i}]$ where $ℓ_{i}^{t - 1} \in \{0, 1\}$ is the previous fire/noop decision, and $ρ_{i} \in R^{2}$ is the normalized grid position ¹ (static).
$S = 15$ … extended state dim, $C + 1 + 2 = 12 + 1 + 2$. The 1 is the previous fire/noop decision, the 2 is the positional encoding.
$C = 12$ … latent dim (no "visible" channels, the readout is via a separate action head).

Neighbor scoring (graph attention)

$f_{attn}$: Linear($2 S$, 64) → GELU → Linear(64, $A$), applied to the concatenated pair $[s_{i}^{t}; s_{j}^{t}]$, with $A = 1$ (a scalar score per neighbour). Each cell scores each of its neighbours independently, and local and long-range use separate $f_{attn}$. In the official implementation, the local scorer uses GELU but the long-range one uses ReLU, even though the paper says GELU for both.

\begin{align*}
a_{ij}^t &= f_{\text{attn}}^{(\mathcal{N})}([\mathbf{s}_i^t;\mathbf{s}_j^t]), \quad \alpha_{ij}^t = \text{softmax}_j(a_{ij}^t), \quad \mathbf{n}_i^t = \sum_{k\in\mathcal{N}_i}\alpha_{ik}^t\,\mathbf{h}_k^t \\
b_{ij}^t &= f_{\text{attn}}^{(\mathcal{L})}([\mathbf{s}_i^t;\mathbf{s}_j^t]), \quad \beta_{ij}^t = \text{softmax}_j(b_{ij}^t), \quad \mathbf{l}_i^t = \sum_{k'\in\mathcal{L}_i}\beta_{ik'}^t\,\mathbf{h}_{k'}^t
\end{align*}

Scoring uses the extended states $\mathbf{s}$; the aggregated values are the hiddens $\mathbf{h}_k$, i.e. the cell states $\mathbf{c}_k$ (see GRU note).

Grid edges: the original impl. zero-pads, so a boundary cell's missing neighbours enter the softmax as zero vectors instead of being dropped. That gives every cell a distance-from-edge signal, which for morphogenesis is its only positional information (besides the init, which is supposed to be random according to the paper, contrary to the code; non-random would be a silly memorization task, so all my final experiments use random inits). I first tried masking the edges instead, but morphogenesis convergence was much more fragile (lunar has positional encodings).
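
To make the zero-pad vs. mask difference concrete, here is a small sketch for a single boundary cell. It assumes the morphogenesis case where $s = c = h$ (scoring and values use the same vectors), and it swaps the two-layer scorer MLP for a single linear layer; `score`, `aggregate` and the `zero_pad` flag are names invented for this example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score(s_i, s_j, w):
    """Stand-in scalar scorer: one linear layer on [s_i; s_j].
    (The notes describe a 2-layer MLP; this is a simplification.)"""
    return sum(wk * xk for wk, xk in zip(w, s_i + s_j))

def aggregate(s_i, neighbours, w, zero_pad):
    """neighbours: list of state vectors, with None for off-grid slots.
    zero_pad=True keeps off-grid slots as zero vectors in the softmax;
    zero_pad=False drops them (masking) before normalising."""
    dim = len(s_i)
    if zero_pad:
        states = [n if n is not None else [0.0] * dim for n in neighbours]
    else:
        states = [n for n in neighbours if n is not None]
    alphas = softmax([score(s_i, s_j, w) for s_j in states])
    return [sum(a * s[d] for a, s in zip(alphas, states)) for d in range(dim)]

# A boundary cell with two real neighbours and two off-grid slots.
s_i = [1.0, 1.0]
neighbours = [[1.0, 1.0], [1.0, 1.0], None, None]
w = [0.0] * 4  # all-zero scorer -> uniform attention, easy to check by hand

padded = aggregate(s_i, neighbours, w, zero_pad=True)    # [0.5, 0.5]
masked = aggregate(s_i, neighbours, w, zero_pad=False)   # [1.0, 1.0]
```

With padding, half of the attention mass lands on zero vectors, so the aggregate shrinks with the number of missing neighbours; that shrinkage is the edge signal. With masking, the weights renormalise over the real neighbours and the boundary cell sees the same aggregate as an interior cell in a uniform field.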

"Interaction vector" (cell input)

$\mathbf{z}_{i}^{t}=f_{Z}(\mathbf{s}_{i}^{t},\mathbf{n}_{i}^{t},\mathbf{l}_{i}^{t})$

Morphogenesis vanilla: $z_{i}^{t} = [c_{i}^{t}; n_{i}^{t}]$
Morphogenesis long-range: $z_{i}^{t} = [c_{i}^{t}; n_{i}^{t} + l_{i}^{t}]$
Lunar lander long-range: $z_{i}^{t} = [c_{i}^{t}; n_{i}^{t}; l_{i}^{t}]$
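
The three compositions differ only in how $n$ and $l$ are fused, which changes the input width of the message MLP. A minimal sketch on plain Python lists (the `mode` names are made up here; the dims are the paper values $C = 9$ and $C = 12$ from the notes):

```python
def interaction_vector(c, n, l=None, mode="vanilla"):
    """Compose z from cell state c and aggregated signals n, l (plain lists).
    The mode names are invented for this sketch."""
    if mode == "vanilla":    # morphogenesis, local only: [c; n]
        return [*c, *n]
    if l is None:
        raise ValueError("long-range modes need l")
    if mode == "add":        # morphogenesis long-range: [c; n + l]
        return [*c, *(ni + li for ni, li in zip(n, l))]
    if mode == "concat":     # lunar lander long-range: [c; n; l]
        return [*c, *n, *l]
    raise ValueError(f"unknown mode: {mode}")

C_MORPHO, C_LUNAR = 9, 12  # paper dims quoted in the notes
zeros = lambda k: [0.0] * k

widths = {
    "vanilla": len(interaction_vector(zeros(C_MORPHO), zeros(C_MORPHO))),
    "add": len(interaction_vector(zeros(C_MORPHO), zeros(C_MORPHO),
                                  zeros(C_MORPHO), mode="add")),
    "concat": len(interaction_vector(zeros(C_LUNAR), zeros(C_LUNAR),
                                     zeros(C_LUNAR), mode="concat")),
}
# widths == {"vanilla": 18, "add": 18, "concat": 36}
```

Both morphogenesis variants come out at $2C$, while the lunar lander concat gives $3C$ before $u_{t}$ and $o_{t}$ are appended. Whether keeping the width fixed is the reason for additive fusion is a guess; the paper doesn't say.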

"Message MLP"

$m_{i}^{t} = f_{msg} (z_{i}^{t}; u_{t}, o_{t})$
In morphogenesis, $u_{t}$ and $o_{t}$ are empty so it’s just $f_{msg} (z_{i}^{t})$ . The code makes it a stack of 1×1 convs, $2 C \to 64 \to 32 \to C$ , so $m$ comes out $C$ -dim and the GRU input matches its hidden width. The paper never states these dims.
In lunar lander, they’re simply concatted to $z_{i}^{t}$ , all cells receive the same obs.

GRU update + readout

The paper’s GRU takes $h_{i}^{t - 1}$ as its recurrent input, so it reads like $h$ is a second state that persists on its own, next to $c$. The code keeps no separate $h$: it feeds $c_{i}^{t}$ in as the hidden; wherever the paper writes $h^{t - 1}$ the implementation uses $c^{t}$. That includes the $h_{k}$ that attention aggregates, which are just the neighbours' cell states $c_{k}$. As implemented:

\begin{align*}
\mathbf{h}_{i}^{t} &= f_{\textnormal{GRU}}(\mathbf{m}_{i}^{t},\;\mathbf{c}_{i}^{t}) \\
\tilde{\mathbf{h}}_i^t &= \mathbf{h}_i^t + f_{\textnormal{refine}}(\mathbf{h}_i^t)
\end{align*}

No $\mathbf{c}_t$ skip connection is needed because the gate already holds the old state. $\mathbf{h} = (1-\mathbf{z})\tilde{\mathbf{h}} + \mathbf{z}\,\mathbf{c}_t$ is an interpolation (here $\mathbf{z}$ is the GRU update gate and $\tilde{\mathbf{h}}$ the GRU candidate, not the interaction vector or the refined state), so it's bounded, and $\mathbf{c}_{t+1} = \mathbf{h} + f_\text{refine}(\mathbf{h})$ doesn't blow up. If you take the paper literally instead (a separate $\mathbf{h}$) and add a $\mathbf{c}_t$ skip connection on top, which is what our first reimpl did, you get $\mathbf{c}_{t+1} = \mathbf{c}_t$ plus an $O(1)$ term that never goes to zero, and it blows up.

$f_\text{refine}$: the "refinement MLP" (two dense layers, $C$ units, GELU), a [[ResNet|resblock]] on the GRU output.
Morphogenesis: $\mathbf{c}_i^{t+1} = \tilde{\mathbf{h}}_i^t$
Lunar lander: $\mathbf{c}_i^{t+1} = W_C \, \tilde{\mathbf{h}}_i^t$ (state projection) and $\boldsymbol{\ell}_i^t = f_\text{act}(\tilde{\mathbf{h}}_i^t)$ (action logits).
$f_\text{act}$: two-layer MLP (GELU), outputs 2D (noop, fire).
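
A scalar toy makes the bounded-vs-blow-up argument easy to check. Everything here is a stand-in chosen for illustration: a fixed update gate of 0.5, a constant message `m`, a `tanh` candidate, and a bounded `refine` function instead of the real MLP. It only shows the mechanism, not the actual training dynamics.

```python
import math

def gru_like(m, prev, gate=0.5):
    """Scalar GRU-style update with a fixed gate (toy assumption):
    interpolate between a tanh candidate and the previous hidden."""
    cand = math.tanh(m + prev)              # candidate, always in (-1, 1)
    return (1 - gate) * cand + gate * prev

def refine(h):
    """Bounded stand-in for the refinement MLP residual."""
    return 0.5 * math.tanh(h)

def rollout(steps, literal_with_skip, m=1.0):
    """Return the cell state after `steps` updates.
    literal_with_skip=False: code variant, c itself is the GRU hidden.
    literal_with_skip=True: separate h plus an extra c skip connection."""
    c, h = 0.1, 0.0
    for _ in range(steps):
        if literal_with_skip:
            h = gru_like(m, h)              # h persists on its own
            c = c + h + refine(h)           # O(1) added to c every step
        else:
            h = gru_like(m, c)              # gate already carries old c
            c = h + refine(h)
    return c

bounded = rollout(200, literal_with_skip=False)
drifting = rollout(200, literal_with_skip=True)
```

In the code variant, $|c| \le 2$ is an invariant of this toy: if $|c| \le 2$ then $|h| \le 0.5 + 0.5 \cdot 2 = 1.5$ and $|c_{next}| \le 1.5 + 0.5 = 2$. In the skip variant, $h$ settles near a positive fixed point, so $c$ gains a roughly constant amount every step and grows without bound as the rollout gets longer.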
