NCA – Summary from: Neural cellular automata - applications to biology and beyond classical AI
Neural Cellular Automata (NCA) represent a powerful framework for modeling biological self-organization, extending classical rule-based systems with trainable, differentiable (or evolvable) update rules that capture the adaptive, self-regulatory dynamics of living matter. By embedding Artificial Neural Networks (ANNs) as local decision-making centers and interaction rules between localized agents, NCA can simulate processes across molecular, cellular, tissue, and system-level scales, offering a multi-scale competency architecture perspective on evolution, development, regeneration, aging, morphogenesis, and robotic control. These models not only reproduce canonical, biologically inspired target patterns but also generalize to novel conditions, demonstrating robustness to perturbations and the capacity for open-ended adaptation and reasoning through embodiment. Given their immense success in recent developments, we here review the current literature on NCAs that is relevant primarily for biological or bioengineering applications. Moreover, we emphasize that beyond biology, NCAs display robust and generalizing goal-directed dynamics without centralized control, e.g., in controlling or regenerating composite robotic morphologies or even on cutting-edge reasoning tasks such as ARC-AGI-1. In addition, the same principle of iterative state refinement is reminiscent of modern generative AI, such as probabilistic diffusion models. Their governing self-regulatory behavior is constrained to fully localized interactions, yet their collective behavior scales into coordinated system-level outcomes. We thus argue that NCAs constitute a unifying, computationally lean paradigm that not only bridges fundamental insights from multiscale biology with modern generative AI, but also has the potential to design truly bio-inspired collective intelligence capable of hierarchical reasoning and control.
NCA
A CA comprises a discrete (continuous version: Lenia), typically 2D grid of cells $i$, each maintaining a numerical, vector-valued state $s_i$. The state of each cell evolves over discrete timesteps $t$ via local transition rules, by considering the cell's own state and all the states of its neighbourhood:
$$s_i^{t+1} = f\big(s_i^t, \{s_j^t\}_{j \in \mathcal{N}(i)}\big)$$
where $\mathcal{N}(i)$ is the neighbourhood of cell $i$, often the Moore neighbourhood.
Even hardcoded update functions give rise to elaborate, complex behaviors, but the framework allows any local[^1] update function (deterministic, stochastic, etc.).
NCAs extend CAs by using learnable – often differentiable – update rules, realized by an ANN with parameters $\theta$, that enable each cell of the NCA to self-regulate its continuous, vector-valued state based on local perception of its neighbourhood on the grid (which, for cognitive tasks, really shouldn't be a grid, but a self-organizing graph).
In vanilla NCAs, all cells use the same ANN architecture for their updates (parameter sharing), and still give rise to asymmetric and potentially incredibly rich cell-state dynamics.
Most basic form: perception CNN → dense feed-forward → state delta $\Delta s_i$;
$\theta$ are the weights & biases, usually on the order of $10^3$ to $10^4$ parameters.
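A minimal sketch of this vanilla setup in NumPy – fixed identity/Sobel perception kernels plus a small shared per-cell MLP, loosely following the Growing NCA recipe. All sizes and names here are illustrative placeholders, not taken from any particular paper:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 16, 16, 8      # grid size, channels per cell
HIDDEN = 32              # hidden width of the per-cell MLP

# Fixed perception kernels: identity + Sobel x/y
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32) / 8.0
sobel_y = sobel_x.T
kernels = np.stack([identity, sobel_x, sobel_y])

# Shared per-cell MLP parameters (the same "ANN" runs in every cell)
W1 = rng.normal(0, 0.1, (3 * C, HIDDEN)).astype(np.float32)
b1 = np.zeros(HIDDEN, dtype=np.float32)
W2 = np.zeros((HIDDEN, C), dtype=np.float32)  # zero-init => initial delta is zero

def perceive(state):
    """Depthwise 3x3 convolution of each channel with each kernel (wrap-around)."""
    feats = []
    for k in kernels:
        acc = np.zeros_like(state)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                acc += k[dy + 1, dx + 1] * np.roll(state, (dy, dx), axis=(0, 1))
        feats.append(acc)
    return np.concatenate(feats, axis=-1)  # (H, W, 3*C)

def nca_step(state):
    """One synchronous NCA update: perceive -> per-cell MLP -> residual delta."""
    p = perceive(state)
    h = np.maximum(p @ W1 + b1, 0.0)  # ReLU
    return state + h @ W2             # residual state update

state = rng.normal(size=(H, W, C)).astype(np.float32)
new_state = nca_step(state)
assert new_state.shape == (H, W, C)
```

Note the zero-initialized output layer: the first update is a no-op, a common trick so the NCA starts from an identity mapping before training.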
Transclude of Growing-Neural-Cellular-Automata#^b1adaf
Asynchronicity & Robustness
To increase robustness, stochastic cell updates have been introduced – via an update probability – that limit how reliably cells can regulate their own state, by discarding proposed updates at stochastically chosen cells and time steps. This explicitly implements an asynchronous update process across the NCA's grid that requires cells to generalize across cellular neighborhoods by learning to distinguish between signal and noise. This avoids overfitting cellular policies to local cell-state dynamics and, in turn, promotes global patterns as collective attractor states, enabling robust morphogenesis and self-repair.
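The stochastic mask itself is tiny – a sketch (the `p_update` name and shapes are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def async_step(state, proposed_delta, p_update=0.5):
    """Apply a proposed per-cell update only at stochastically chosen cells.

    Each cell fires independently with probability p_update, so across many
    steps the grid evolves asynchronously rather than in lockstep.
    """
    h, w, _ = state.shape
    mask = (rng.random((h, w, 1)) < p_update).astype(state.dtype)
    return state + mask * proposed_delta

state = np.zeros((8, 8, 4), dtype=np.float32)
delta = np.ones_like(state)
out = async_step(state, delta, p_update=0.5)
# Each cell is either fully updated or left untouched
assert set(np.unique(out)) <= {0.0, 1.0}
```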
Notably, this renders NCAs closely related to denoising diffusion models. Explicitly adding noise to the update function has proven effective at enhancing learning and generalization capabilities.
The possibilities this opens for distributed computing are quite exciting to me.
Do we still need to artificially add noise if we have truly asynchronous distributed computing? (i.e., if the substrate is naturally unreliable – is adding additional noise redundant / counterproductive?)
What about other kinds of stochasticity, like occasionally receiving messages outside the local neighbourhood?
Breaking the locality assumption to create a small-world network rather than a regular grid – like we have in the brain.
Levin did mention it in Neural cellular automata - applications to biology and beyond classical AI.
Pool poisoning?
NCAs can still be difficult to train, as solutions can collapse into trivial attractors (especially with gradient-based methods, via pool poisoning).
Answer: Pool-based training samples diverse intermediate states during morphogenesis to prevent overfitting to a single developmental trajectory. The NCA is trained on randomly sampled states from this pool to learn robust self-repair. However, gradient descent can exploit this: if the loss landscape admits a trivial solution where all cells converge to a homogeneous state (a point attractor), the pool gradually fills with these degenerate samples. Once poisoned, the pool no longer provides useful training signal—every sample reinforces the collapse rather than diverse pattern formation.
This is fundamentally a mode collapse problem. The NCA's update rule can satisfy the loss function by finding the nearest fixed point in state space rather than learning the intended morphogenetic attractor basin. Stochastic updates (the asynchronous update mask) and explicit noise injection help by requiring the policy to be robust across perturbed neighborhoods, but they don't fundamentally prevent gradient descent from finding trivial solutions.
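The pool mechanics (sample a batch, re-seed the worst sample, write rollouts back) can be sketched like this – gradient step omitted, and the re-seeding heuristic is the standard Growing NCA guard against poisoning, not a complete fix:

```python
import numpy as np

rng = np.random.default_rng(0)

POOL_SIZE, BATCH, H, W, C = 64, 8, 16, 16, 8

def seed_state():
    """Single 'seed' cell in the centre, as in Growing NCA."""
    s = np.zeros((H, W, C), dtype=np.float32)
    s[H // 2, W // 2, :] = 1.0
    return s

pool = np.stack([seed_state() for _ in range(POOL_SIZE)])

def train_iteration(pool, nca_rollout, loss_fn):
    """One pool-based training step (sketch; the gradient update is omitted).

    nca_rollout: maps a batch of states to developed states.
    Replacing the worst sample with a fresh seed keeps the pool anchored
    to the true initial condition; once degenerate fixed-point states
    dominate the pool, every sampled batch reinforces the collapse.
    """
    idx = rng.choice(POOL_SIZE, BATCH, replace=False)
    batch = pool[idx].copy()
    losses = loss_fn(batch)
    batch[np.argmax(losses)] = seed_state()   # re-seed the worst sample
    developed = nca_rollout(batch)
    # ... gradient step on loss_fn(developed) would go here ...
    pool[idx] = developed                     # write rollout results back
    return pool

# Dummy rollout/loss just to exercise the pool mechanics
pool = train_iteration(pool, nca_rollout=lambda b: b * 0.9,
                       loss_fn=lambda b: b.reshape(BATCH, -1).sum(axis=1))
assert pool.shape == (POOL_SIZE, H, W, C)
```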
→ Evolutionary methods for NCAs avoid this and are more robust: Evolutionary Implications of Self-Assembling Cybernetic Materials with Collective Problem-Solving Intelligence at Multiple Scales
I don't get all the explicitly hierarchical approaches.
I mean as an inductive bias, sure, but like isn’t the entire point / hope that NCA interaction dynamics lead to higher order coordination / loops by themselves?
Is there something fundamentally missing for that from NCAs, or is it just a matter of evolutionary-scale compute / time?
I like Claude's answer here, but yeah, my hypothesis is also that more flexible graph communication structures need to be allowed to develop:
The pragmatic answer: current training limitations. We lack the evolutionary timescales and selection pressures that biological systems had. The French flag experiment shows that when you properly tie cellular and tissue-level homeostatic loops through reward structure, hierarchy emerges – individual cells learn to coordinate into collectives without explicit hierarchical architecture. Similarly, NCAs pretrained to operate at criticality develop long-range correlations and scale-free dynamics that naturally support hierarchical information processing.
The deeper issue is timescale separation. Biological systems exploit multiple characteristic timescales – fast molecular interactions, slower cellular decisions, even slower tissue remodeling – that create natural hierarchical structure through the dynamical system rather than the architecture. EngramNCA approximates this by having GenePropCA operate on private genes at potentially different rates than GeneCA's public state updates. Explicitly hierarchical NCAs (stacked layers at different resolutions) are essentially admitting: "we can't afford to evolve the timescale separation, so we'll hard-code the spatial scale separation as an inductive bias."
But you’re fundamentally correct: if we had evolutionary-scale compute, proper multi-objective selection pressures (survival, reproduction, evolvability), and allowed for sufficient architectural diversity (not just fixed grids), hierarchy should emerge. The paper notes that NCAs “might undergo a similar transition from multicellular communication pathways observed in basal tissue to scale-free activation patterns observed in neuronal networks”—this is the dream. Explicit hierarchy is a scaffold we use because we’re impatient, not because it’s necessary in principle. See also Emergence of Hierarchical Layers in a Single Sheet of Self-Organizing Spiking Neurons for evidence that flat architectures can develop functional hierarchy through learning alone.
read Evolving Hierarchical Neural Cellular Automata to better answer the above
Todo
Learning Global Rules from Local Patches - Scaling Neural Cellular Automata Training
AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer – skim / summary
Recurrent Neural Cellular Automata with Self-Attention for Multi-agent System – skim / summary
Attention-based Neural Cellular Automata
https://github.com/maxencefaldor/cax
Variational Neural Cellular Automata
Selection for short-term empowerment accelerates the evolution of homeostatic neural cellular automata
https://distill.pub/selforg/2021/textures/
https://google-research.github.io/self-organising-systems/isonca/
Stateless (recurrent) NCA? (What was my point here? NCAs are already recurrent, no? And this suggestion doesn't make them stateless?) So instead of outputting a state delta, we output a hidden state and feed it recurrently back into the network.
The picture I have in mind is appending memory latents to the context recursively: [mem window, messages, (obs)] → [hidden] → [next memory, …]!!!
This is essentially what EngramNCA implements through its separation of private and public cell states. The private "gene" channels act as internal recurrent memory that is updated by GenePropCA, while the public channels evolve through GeneCA. The key insight is that you need two separate update functions operating at potentially different timescales: one for observable states (analogous to fast synaptic updates) and one for hidden memory (analogous to slower molecular/epigenetic changes). But why use two separate nets instead of a context window?
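A toy sketch of the two-timescale idea (my own simplification, not EngramNCA's actual architecture): public channels update every step, private memory channels only every few steps:

```python
import numpy as np

H, W = 8, 8
PUB, PRIV = 4, 4   # public (observable) vs private ("gene"/memory) channels

def two_timescale_step(public, private, t, pub_update, priv_update, priv_every=4):
    """One recurrent-memory NCA step (sketch).

    Public channels update every step (fast timescale); private memory
    channels update only every `priv_every` steps (slow timescale).
    Both update functions see both state parts, so memory can condition
    observable dynamics and vice versa.
    """
    public = public + pub_update(public, private)
    if t % priv_every == 0:
        private = private + priv_update(public, private)
    return public, private

# Dummy update functions just to show the asymmetry in update counts
pub = np.zeros((H, W, PUB), dtype=np.float32)
priv = np.zeros((H, W, PRIV), dtype=np.float32)
pub_u = lambda p, q: np.ones_like(p)
priv_u = lambda p, q: np.ones_like(q)

for t in range(8):
    pub, priv = two_timescale_step(pub, priv, t, pub_u, priv_u, priv_every=4)

# Public advanced 8 times; private only at t = 0 and t = 4
assert pub[0, 0, 0] == 8.0 and priv[0, 0, 0] == 2.0
```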
!!! I should take a look at Recurrent Memory Transformer
NCAs might be perfect candidates for modelling cortical columns
This might stand in stark contrast to biological neurons, especially to cortical columns in the human neocortex. The neocortex is formed by an integrated 2D grid of copies of the same neuronal circuit, the cortical columns, which have been argued to be capable of learning arbitrary concepts of our reality (objects, animals, other human beings, mental constructs, etc.). More importantly, these cortical columns might literally “model” the learned concepts and thus represent interactable reference frames or world models for relevant features of our Umwelt.
Remembering would thus trigger models of past experiences, and thinking would translate not only into a navigation process through an associative conceptual space, but dynamically construct a network of interacting world models that are relevant under a certain context. NCAs might be excellent candidates to model such an architecture: they maintain a trainable ANN in each cell of an integrated grid which are in principle capable of representing reference frames of arbitrary concepts. In turn, NCAs might not only be excellent models for biological self-organization as a multi-scale competency architecture, but even for higher-level cognitive processes of the human neocortex such as active perception, raising fascinating questions about the parallels of morphogenesis and cognition.
Biological agents – across scales – are embodied agential multi-agent networks with world models performing active inference (and NCA is a good model for that).
Viewed through this lens, biological agents at every level function not merely as components or effectors, but as embodied representations of their respective environment, while pursuing developmental goals and systemic integrity. For example, a bistable enzyme may be an embodied model of ligand-specific decision-making; its conformational switch encodes a dynamic inference about molecular context. The transcriptome may be seen as an embodied intracellular model of cell-fate specification within the morphogenetic field. Bioelectric fields – coordinated by collective cellular behavior – serve as coarse-grained, spatiotemporal models guiding cellular collectives toward large-scale pattern formation and target system-level outcomes. Even animals can be understood as living world models (JEPA / Recurrent World Models Facilitate Policy Evolution): while their bodies are evolutionarily fitted to the challenges of their Umwelt, their brains do not simply simulate the environment but embody its structure through continuous interaction. Neural circuits are not merely predictive tools; they function as dynamic, embodied representations of environmental regularities, integrating perception, action, and anticipation in real time (Thousand Brains theory, NOW model?); this might even scale to swarms or groups as distributed models of environmental stability and multi-agential coordination.
Crucially, these modeling aspects are not invented anew at every scale, but are themselves regularities – and if this process applies at every scale, it necessarily includes repurposing lower-level organizational units as building blocks to form higher-level embodied models. Evolution is tinkering, i.e., life does not design from scratch; it reuses, refactors, and repurposes across scales. Biological systems are not just composites of passive parts but layers within layers of repurposable agential substrates – each unit simultaneously acting as an agent, a model, but also as a signal. Proteins become signals in GRNs; the GRN state informs cell identity; cellular states coalesce into tissue-scale morphogenetic patterns, etc., all structured by self-modeling dynamics.
Thus, biological organization might fundamentally be rooted in self-modelling of scale-dependent (relevant) environmental contexts through repurposable embodiments – scale-bridging representations that allow nested agents to maintain their physiological integrity against diffusive, entropic processes; this has been formalized on thermodynamic grounds via the variational free energy principle, or active inference. This suggests that biology might be understood as models-within-models rather than merely layers-within-layers of organization – a multi-scale competency architecture where agency and embodied representation are inseparable. NCAs provide a computational framework that mirrors this hierarchical representational architecture: flexible local rules generate global structure through embodied learning and coordination, offering a tractable way to explore how scale-bridging models emerge from simple, repurposable units and providing insight into the dynamics of nested biological self-modeling.
References
- (YT, Emergent Garden) What are neural cellular automata?
- Random code : https://github.com/erikhelmut/neural-cellular-automata/tree/main
- Attention-based Neural Cellular Automata
- Levin BP thread on differentiable self-organizing systems (with full code in notebooks!)
Footnotes
- [^1]: Well, you define the neighbourhood function however you want (spatial/functional/… relations between nodes, potentially changing across time). So you can have non-local interactions if you want (how far do hormones travel / neuron-like long-range connections / …). But all-to-all communication or, e.g., unique encodings for every node in the graph is not really in the spirit of CAs – you add a global supervision signal – yet as long as you have a finite subset of the graph communicating locally, it's still a CA. And in general, with GNCA you can have arbitrary graph structures / break free of the grid / use any geometry you like while still satisfying the local interaction constraint of CAs. https://youtu.be/ilrl_opwpEw?t=3814 ↩