Neural cellular automata - applications to biology and beyond classical AI

year: 2025/09

paper: https://arxiv.org/pdf/2509.11131
website:
code:
connections: Benedikt Hartl, michael levin, NCA, multi-scale, TAME

read these references (preprint version)

The concept of embedding spaces for morphogenetic processes was introduced through the NCA Manifold framework [52], enabling the representation of different developmental trajectories in structured latent spaces. Tesfaldet et al. incorporated self-attention mechanisms directly into the cellular update rules, allowing individual cells to selectively focus on relevant neighborhood features [53]. Information-theoretic principles were integrated into NCA design (2022) by introducing empowerment as an auxiliary objective function to encourage coordinated cellular behaviors that enhance system robustness and adaptability [54, 55]. The stochastic modeling of emergent dynamics was advanced by Palm et al. through their Variational Neural Cellular Automata framework, which employs variational inference to capture the probabilistic nature of pattern formation processes [56]. Multi-scale emergent phenomena were addressed by Pande and Grattarola (2023), who proposed hierarchical NCA architectures capable of systematically modeling intercellular behaviors across the different resolution levels of hierarchically stacked NCAs [11].

Currently, the focus for scaling intelligence is on scaling training data, model parameters, and computational resources. This approach is incredibly resource intensive, and so far mimics but a fraction of the capabilities that biology achieves seemingly effortlessly. Several prominent AI researchers have criticized the costs associated with exploiting scaling laws for commercializing large language models (LLMs) and ask for the development of new AI architectures beyond transformers. While LeCun promotes a world-model based approach, NCA architectures and their self-regulatory dynamics during inference represent a potential solution to this architectural gap by implementing a truly biologically inspired multiscale competency architecture.

Indeed, biological evolution modularly repurposed minimal collective goal-directed behavior (homeostatic loops) into ever higher-level problem-solving agents (composed of competent sub-agents) through competency amplification and an internal motivation for novelty, rather than data accumulation. By design, the NCA architecture implements the scaling of intelligence from local cellular communication pathways to system-level behavior and potentially to integrated world models.

Current AI architectures treat neurons just as filters, incapable of forming memories themselves.

NCAs might be perfect candidates for modelling cortical columns

This might stand in stark contrast to biological neurons, especially to cortical columns in the human neocortex. The neocortex is formed by an integrated 2D grid of copies of the same neuronal circuit, the cortical columns, which have been argued to be capable of learning arbitrary concepts of our reality (objects, animals, other human beings, mental constructs, etc.). More importantly, these cortical columns might literally “model” the learned concepts and thus represent interactable reference frames or world models for relevant features of our Umwelt.
Remembering would thus trigger models of past experiences, and thinking would translate not only into a navigation process through an associative conceptual space, but dynamically construct a network of interacting world models that are relevant under a certain context. NCAs might be excellent candidates to model such an architecture: they maintain a trainable ANN in each cell of an integrated grid which are in principle capable of representing reference frames of arbitrary concepts. In turn, NCAs might not only be excellent models for biological self-organization as a multi-scale competency architecture, but even for higher-level cognitive processes of the human neocortex such as active perception, raising fascinating questions about the parallels of morphogenesis and cognition.
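
To make "a trainable ANN in each cell of an integrated grid" concrete for myself, here is a minimal sketch of one NCA update step in NumPy. It is not code from the paper: the names `perceive` and `nca_step`, the 4-neighbour perception, the toroidal wrap-around, the layer sizes, and the `fire_rate` are all assumptions chosen for illustration. As in standard NCAs, every cell applies the same small network (shared weights) to its own state and that of its neighbours.

```python
import numpy as np

def perceive(state):
    """Stack each cell's own state with its 4 von Neumann neighbours (toroidal wrap)."""
    shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    neigh = [np.roll(state, s, axis=(0, 1)) for s in shifts]
    return np.concatenate(neigh, axis=-1)  # shape (H, W, 5 * C)

def nca_step(state, w1, b1, w2, b2, rng, fire_rate=0.5):
    """One asynchronous update: every cell runs the same two-layer MLP (assumed architecture)."""
    p = perceive(state)
    hidden = np.maximum(p @ w1 + b1, 0.0)   # ReLU hidden layer
    delta = hidden @ w2 + b2                # proposed state change, shape (H, W, C)
    # Each cell only updates with probability fire_rate (no global clock).
    mask = rng.random(state.shape[:2] + (1,)) < fire_rate
    return state + delta * mask

rng = np.random.default_rng(0)
H, W, C, HID = 16, 16, 8, 32                # hypothetical grid and layer sizes
w1 = rng.normal(0.0, 0.1, (5 * C, HID))
b1 = np.zeros(HID)
w2 = np.zeros((HID, C))                     # zero-initialised output layer: no change at start
b2 = np.zeros(C)

state = np.zeros((H, W, C))
state[H // 2, W // 2, :] = 1.0              # single seed cell in the centre
new_state = nca_step(state, w1, b1, w2, b2, rng)
```

With the zero-initialised output layer the untrained rule leaves the grid unchanged; training (by gradients or evolution) would shape `w1`, `b1`, `w2`, `b2` so that repeated steps grow or maintain a target pattern.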

→ And what’s a better fit for “dynamically construct a network of interacting world models that are relevant under a certain context” than transformers?

Biological agents – across scales – are embodied agential multi-agent networks with world models performing active inference (and NCA is a good model for that).

Viewed through this lens, biological agents at every level function not merely as components or effectors, but as embodied representations of their respective environment, while pursuing developmental goals and systemic integrity. For example, a bistable enzyme may be an embodied model of ligand-specific decision-making; its conformational switch encodes a dynamic inference about molecular context. The transcriptome may be seen as an embodied intracellular model of cell-fate specification within the morphogenetic field. Bioelectric fields – coordinated by collective cellular behavior – serve as coarse-grained, spatiotemporal models guiding the embodying cellular collectives toward large-scale pattern formation of target system-level outcomes. Even animals can be understood as living world models (JEPA/Recurrent World Models Facilitate Policy Evolution): while their bodies are evolutionarily fitted for the challenges of their Umwelt, their brains do not simply simulate the environment but embody its structure through continuous interaction. Neural circuits are not merely predictive tools, they function as dynamic, embodied representations of environmental regularities, integrating perception, action, and anticipation in real time (thousand brains theory, NOW model?); this might even scale to swarms or groups as distributed models of environmental stability and multi-agential coordination.

Crucially, these modeling aspects are not invented anew at every scale, but are regularities themselves – and if this process applies at every scale, this necessarily includes repurposing lower-level organizational units as building blocks to form higher-level embodied models. Evolution is tinkering, i.e., life does not design from scratch, it reuses, refactors, and repurposes across scales. Biological systems are not just composites of passive parts but are layers within layers of repurposable agential substrates – each unit simultaneously acting as an agent, a model, but also as signal. Proteins become signals in GRNs; the GRN state informs cell identity; cellular states coalesce into tissue-scale morphogenetic patterns, etc., all structured by self-modeling dynamics.
Thus, biological organization might fundamentally be rooted in self-modelling of scale-dependent (relevant) environmental contexts through repurposable embodiments – scale-bridging representations that allow nested agents to maintain their physiological integrity against diffusive, entropic processes; this has been formalized on thermodynamic grounds via the variational free energy principle, or active inference. This suggests that biology might be understood as models-within-models rather than merely layers-within-layers of organization – a multi-scale competency architecture where agency and embodied representation are inseparable.

NCAs provide a computational framework that mirrors this hierarchical representational architecture: flexible local rules generate global structure through embodied learning and coordination, offering a tractable way to explore how scale-bridging models emerge from simple, repurposable units and providing insight into the dynamics of nested biological self-modeling.

Future work stuff

Future work: Criticality, hierarchy, and grid vs graph-like organization?

Future studies will show whether the dynamics in the computational medium of NCAs will undergo a similar transition from multicellular communication pathways observed in basal tissue to scale-free activation patterns observed in neuronal networks. This transition might involve explicit hierarchical architectures (Evolving Hierarchical Neural Cellular Automata), or graph-like organization comprising short- and long-range connections (small-world topology) as opposed to the fixed grid layout of current NCAs, and might bring about novel understandings of the multiplicity of the computational power of different types of biological matter. For instance, recent promising studies argue that even vanilla NCAs that are pretrained to exhibit critical dynamics in their cell-state expressions represent more efficient initial conditions for learning downstream tasks. Criticality in cortical brain networks is hypothesized to maximize computational capacity by sustaining long-range correlations, power-law distributed neuronal avalanches, and efficient signal propagation across scales. Such a near-critical state offers the exact mixture of structured signaling and operational flexibility necessary for optimal learning and information processing, balancing sensitivity to input with stability in network dynamics. It has been argued that criticality is a universal setpoint for brain functions. However, cognition is a spectrum and does not seem to be restricted to neuronal dynamics; the latter are only speed-optimized for agents navigating our 3D Umwelt, but the governing bioelectric dynamics are much more ancient: understanding biology as a multiscale competency architecture increasingly blurs the line between collective biological coordination and cognitive emergence. Criticality may not be merely a dynamical regime but a functional scaffold enabling self-regulatory agency across scales (from ion channels to tissues to behavior).

Hybrid? Training approaches: Differentiable learning is efficient, EA are more robust & transferable etc.

Increasing the capabilities of NCAs might even call for hybrid training paradigms, combining differentiable learning with evolutionary strategies. Differentiable learning in NCAs, i.e., via backpropagation, can be highly efficient for learning task-specific dynamics (such as target pattern formation) with high precision. Evolutionary algorithms, on the other hand, not only offer a much greater diversity of potential solutions but also bring forth NCAs that are naturally highly robust against noise and corruptions, and that show enhanced evolvability and transferability to novel problems. Moreover, different ANN architectures – feed forward, circuit based, attention, recurrent, world models, etc. – might also strongly influence what the NCA can learn. A promising way to combine the advantages of different learning paradigms (gradient-based and evolution) might be the recent conditional diffusion evolution approach. This approach offers unprecedented quality-diversity scores, and through conditional sampling even allows biasing different evolutionary lineages towards a multitude of target outcomes (HADES).
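
A toy sketch of what "hybrid" could mean in the simplest case: each generation, every individual takes a few gradient steps, then an evolutionary step keeps the best half and refills the population with mutated copies. This is my own illustration on a quadratic stand-in objective, not the paper's method and not conditional diffusion evolution or HADES; `hybrid_search`, the loss, and all hyperparameters are assumptions. For an actual NCA, `loss` would be the rollout error against the target pattern and `grad` would come from backpropagation through the rollout.

```python
import numpy as np

def loss(theta, target):
    """Stand-in objective (assumption): squared distance to a target parameter vector."""
    return float(np.sum((theta - target) ** 2))

def grad(theta, target):
    """Analytic gradient of the stand-in objective; broadcasts over a population."""
    return 2.0 * (theta - target)

def hybrid_search(target, pop_size=16, generations=20, grad_steps=3,
                  lr=0.1, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, (pop_size, target.size))
    for _ in range(generations):
        # Gradient phase: a few descent steps for every individual.
        for _ in range(grad_steps):
            pop = pop - lr * grad(pop, target)
        # Evolutionary phase: keep the best half, refill with mutated copies.
        fitness = np.array([loss(ind, target) for ind in pop])
        elite = pop[np.argsort(fitness)[: pop_size // 2]]
        children = elite + rng.normal(0.0, sigma, elite.shape)
        pop = np.concatenate([elite, children])
    fitness = np.array([loss(ind, target) for ind in pop])
    return pop[np.argmin(fitness)], float(fitness.min())

best, best_loss = hybrid_search(np.array([1.0, -2.0, 0.5]))
```

The gradient phase supplies precision, while the mutation and selection phase keeps a diverse population around; on a real, non-convex NCA objective the balance between `grad_steps` and `sigma` would matter far more than it does here.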

Explicit hierarchy … but how to decouple scales?; Multi-agent RL techniques

As already discussed above, the integrated multi-scale nature of biological organization poses significant challenges from a modelling and computational perspective. Hierarchical NCAs [10, 11] try to overcome this obstacle by imposing a fixed architecture of hierarchically stacked NCA layers of varying resolution (lower levels implementing the dynamics, while successively higher levels exert control over lower levels). It remains unclear, however, how to best decouple different scales of organization, or how to identify sub-groups of cells at certain scales that group together under different contexts. NCAs could learn from hierarchical reasoning approaches [124], from other multi-agent RL techniques such as [125-128], or from the recently proposed TAG framework for multi-agent hierarchical RL [129]. Other promising candidates are self-supervised techniques such as Joint Embedding Predictive Architectures (JEPA), which are closely related to nested world model architectures [96].
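
A minimal sketch of the "stacked layers of varying resolution" idea, to fix the data flow in my head: the fine grid is pooled into a coarse grid, the coarse level runs its own local dynamics, and its state is broadcast back down as a control signal. This is not the HNCA implementation from [10, 11]; the function names, the pooling factor `k`, the `coupling` weight, and the use of a fixed neighbourhood average in place of a learned update rule are all assumptions.

```python
import numpy as np

def pool(fine, k):
    """Average-pool a (H, W, C) grid into (H // k, W // k, C); H and W must be divisible by k."""
    H, W, C = fine.shape
    return fine.reshape(H // k, k, W // k, k, C).mean(axis=(1, 3))

def upsample(coarse, k):
    """Nearest-neighbour upsampling: broadcast each coarse cell to its k x k block."""
    return np.repeat(np.repeat(coarse, k, axis=0), k, axis=1)

def local_mean(grid):
    """Mean over a cell and its 4 neighbours (toroidal wrap); stand-in for a learned rule."""
    shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return sum(np.roll(grid, s, axis=(0, 1)) for s in shifts) / 5.0

def hierarchical_step(fine, k=4, coupling=0.5):
    coarse = local_mean(pool(fine, k))   # higher level: dynamics on the coarse grid
    control = upsample(coarse, k)        # top-down signal delivered to every fine cell
    fine_update = local_mean(fine)       # lower level: local dynamics on the fine grid
    return (1.0 - coupling) * fine_update + coupling * control

rng = np.random.default_rng(0)
fine = rng.random((16, 16, 2))
out = hierarchical_step(fine)
```

The fixed `pool`/`upsample` pairing is exactly the kind of hard-wired scale separation the paragraph above questions: which cells belong to which coarse unit is decided by the grid layout, not learned from context.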

The last part talks about how diffusion models have an explicit hierarchy, guiding denoising hierarchically through conditioning on time; this temporal parameter enables structured multi-scale learning (shown by this paper), which is missing from NCAs.
Learning this temporal structure end-to-end from spatiotemporal patterns alone is hard, and it is a significant challenge for capturing hierarchical organization through collective interactions.

In turn, future NCA architectures could benefit from hybrid designs, enabling them to emulate diffusion-like hierarchical denoising while preserving their strengths in distributed, spatially aware computation. Such models may not only improve generative quality but also reveal how biological systems achieve complex morphogenesis through internalized temporal inference and embodied cognition via self-modeling across scales.

Idea

→ Supplying clocks like brain rhythms?
Maybe not accurate for models of morphogenesis, but maybe for brains.
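
A sketch of what "supplying clocks" could look like: a few global oscillators whose sin/cos phases are appended as extra read-only channels to every cell's state, loosely analogous to how diffusion models condition on a timestep embedding. Purely my own illustration; `clock_features`, `add_clock`, and the chosen periods are hypothetical.

```python
import numpy as np

def clock_features(t, periods=(4, 16, 64)):
    """Sin/cos phases of global oscillators at step t (periods are assumed values)."""
    phases = 2.0 * np.pi * t / np.asarray(periods, dtype=float)
    return np.concatenate([np.sin(phases), np.cos(phases)])  # shape (2 * len(periods),)

def add_clock(state, t, periods=(4, 16, 64)):
    """Broadcast the clock to every cell and append it as extra channels."""
    H, W, _ = state.shape
    clock = np.broadcast_to(clock_features(t, periods), (H, W, 2 * len(periods)))
    return np.concatenate([state, clock], axis=-1)

state = np.zeros((8, 8, 4))
with_clock = add_clock(state, t=0)   # 4 state channels + 6 clock channels
```

The update rule would then read `add_clock(state, t)` instead of `state`, while only the original channels get updated; slow and fast oscillators give the cells an explicit handle on coarse and fine time scales.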

Limitations / Challenges / Future work required

NCAs can still be difficult to train, as solutions can collapse into trivial attractors (especially with gradient-based methods via pool-poisoning) or get trapped in suboptimal solutions that lack feature precision (predominantly in the case of neuroevolution training).
They suffer from limitations in storage capacity (of multiple system-level target outcomes, or functional modalities) and in scaling; simulating realistic organ-level complexity at unicellular resolutions is currently infeasible.
Relations of NCAs with reinforcement learning, active inference, or neuroevolution techniques remain underexplored. Moreover, their descriptive power, especially of biological systems, is still largely oversimplified: NCAs treat cells largely as homogeneous units and their numerical states and interactions are at best abstractions (or model composites) of most biological processes. And while progress has been made, it is difficult to integrate (and identify) NCA dynamics with molecular pathways, gene regulatory networks, or biomechanical forces.
Training NCAs is mostly concerned with the accuracy of the final system-level outcomes, largely neglecting energetic, metabolic, or other physical/chemical/biological constraints necessary for developing viable morphologies; the learned update rules are still largely opaque black boxes and difficult to interpret at the parameter and interaction level; update dynamics don’t follow physical or biologically relevant ODEs; it is unclear how to implement reasonable multiscale coupling principles in HNCAs; in robotics, real-world experimental realizations are in their infancy; and improving our theoretical understanding of stability, conditional dynamics, universality, criticality, and hierarchical phase-transitions still requires significant effort.

Different training runs may lead to the same system-level outcome implemented by completely different micro-level dynamics. This limits modularity and compatibility across multiple NCAs that operate in the same environment but have developed communication strategies that are not necessarily compatible, and thus limits the analogy to biology, where everything speaks the same language of biochemistry.

Current NCAs don't interface well → Ground in LLM/pretrained representations?

They mention that JEPA integrated as a hierarchical NCA might be a good fit for this…

Read Latent neural cellular automata for resource-efficient image restoration

![[neural-cellular-automata-applications-to-biology-and-beyond-classical-ai-img-0.jpeg]]
![[neural-cellular-automata-applications-to-biology-and-beyond-classical-ai-img-2.jpeg]]
![[neural-cellular-automata-applications-to-biology-and-beyond-classical-ai-img-3.jpeg]]

Max Wolf's Second Brain
