Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings

year: 2023
paper: https://www.nature.com/articles/s42256-023-00748-9
website: https://www.jachterberg.com/seRNN
code: https://github.com/8erberg/spatially-embedded-RNN
connections: spatial organization, communicability, regularization, strength matrix, sparsity, biologically inspired

Conference Talk (25min + 25min discussion)

quickly look thru recent cites: https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2466081976576789482&as_sdt=5

Spatial embedding promotes a specific form of modularity with low entropy and heterogeneous spectral dynamics

Regularization method based on spatial graph structure.

Structural prior / nudge towards communication: Prefer to keep those connections around that you use a lot to communicate.

Summary

Spatially-embedded Recurrent Neural Networks (seRNNs) link structural and functional neuroscience by optimizing networks for task performance while constraining them to exist in 3D space with “metabolic costs.” This joint optimization causes diverse brain-like features (modularity, small-worldness, functional clustering, mixed selectivity, and energy efficiency) to emerge together in a “sweet spot” of the parameter space.

How seRNNs work

seRNNs add a spatial regularization term to the RNN loss:
$L = L_{\text{Task}} + λ \lVert W ⊙ D ⊙ C \rVert$

$W$ … weight magnitudes (element-wise, like L1 regularization)

$D$ … Euclidean distance matrix between units (5×5×4 grid)

$C = e^{S^{-1/2} |W| S^{-1/2}}$ … weighted communicability (LOW for core connections, HIGH for peripheral)

$λ$ … regularization strength

Minimizing $\lVert W ⊙ D ⊙ C \rVert$ penalizes most strongly the connections that are large in all three terms, so those are pruned first. The weighted communicability formulation assigns low values to core connections supporting efficient global paths and high values to redundant peripheral connections. Because the penalty on a connection is a product, it stays small whenever any one factor is small: a connection survives if it has a small weight, spans a short distance, or is part of the efficient communication backbone.
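As a rough illustration of how the three terms combine, here is a minimal NumPy sketch of the penalty. It is not the authors' implementation: the function names, the random `W`, and `lam=0.05` are hypothetical, node strength is assumed to be the row sum of $|W|$, the norm is assumed to be the sum of absolute entries, and the matrix exponential is approximated with a truncated Taylor series to avoid extra dependencies.

```python
import numpy as np

def grid_distances(shape=(5, 5, 4)):
    """Pairwise Euclidean distances between units placed on a 3D integer grid."""
    axes = [np.arange(n) for n in shape]
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    coords = coords.reshape(-1, len(shape)).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def matrix_exp_taylor(A, n_terms=30):
    """Truncated Taylor series for e^A (assumption: adequate for the
    normalized matrices used here; a library routine is the usual choice)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k          # term now holds A^k / k!
        result = result + term
    return result

def weighted_communicability(W, eps=1e-12):
    """C = exp(S^{-1/2} |W| S^{-1/2}), with S the diagonal strength matrix."""
    A = np.abs(W)
    strength = A.sum(axis=1)         # assumption: strength = row sum of |W|
    inv_sqrt = 1.0 / np.sqrt(strength + eps)
    normalized = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return matrix_exp_taylor(normalized)

def sernn_penalty(W, D, lam):
    """lam * sum of |W ⊙ D ⊙ C| (assumed L1-style norm)."""
    C = weighted_communicability(W)
    return lam * np.abs(W * D * C).sum()

# Hypothetical usage: 100 units on the 5x5x4 grid, random recurrent weights.
rng = np.random.default_rng(0)
D = grid_distances()
W = rng.normal(scale=0.1, size=(100, 100))
penalty = sernn_penalty(W, D, lam=0.05)
```

In training, this scalar would be added to the task loss and differentiated with an autodiff framework; the NumPy version only shows the forward computation.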

Structural finding: higher modularity and small-worldness than L1 regularization.

These features all arise in unison.

Brain features emerge from a single optimization process when networks are trained with spatial and communication constraints.

Emergent brain-like properties

Structure:

Modularity (Q ≈ 0.3-0.7): Dense within-module connections, sparse between-module connections

Small-worldness: High clustering with short path lengths

Homophilic wiring: Similar nodes preferentially connect (validated across macro/micro scales)
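One common way to turn “high clustering with short path lengths” into a number is a small-worldness index. The form below is the standard ratio definition, given here as an assumption about what the measure looks like rather than as the paper's exact procedure; $\bar{c}$ (mean clustering coefficient), $\bar{d}$ (mean shortest path length), and the subscript rand (averages over matched random networks) are notation introduced for this sketch:

$$σ = \frac{\bar{c} / \bar{c}_{\text{rand}}}{\bar{d} / \bar{d}_{\text{rand}}}$$

Values of $σ > 1$ indicate more clustering than a random network at comparable path length.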

Function:

Functional clustering: Persistent info (goals) clusters spatially; transient info (choices) distributes

Mixed selectivity: Units respond to multiple variables (vs pure selectivity in L1 networks)

Energy efficiency: Lower neural activity for same performance

These co-emerge from the same optimization pressure—networks can’t achieve modularity without small-worldness.

The spatial organization creates a division of labor in this navigation-like task: a spatially clustered “core” maintains the goal location (which must be remembered throughout the trial), while distributed units process the choice directions (which only appear later and need immediate processing). The network’s physical structure literally determines its computational strategy—persistent information clusters, transient information distributes.

The "sweet spot" phenomenon

Brain-like features only coexist within a critical parameter window ($λ \in [0.01, 0.1]$). Within it, networks are simultaneously:

Task-accurate

Sparse but connected

Modular and small-world

Mixed-selective

Energy-efficient

Outside this range: too sparse → task failure; too dense → no modularity. Physical constraints dictate this unique viable solution.

Note

Dense networks (weak regularization) preferentially maintain goal information.
Sparse networks (strong regularization) focus on current choice inputs.

Experimental Details

The study compared 1000 seRNNs against 1000 standard L1-regularized RNNs, all with 100 hidden units. Networks were trained on a simple but cognitively demanding task:

Goal presentation (20 steps): Show a location on a 2×2 grid

Delay period (10 steps): No input, must maintain goal in memory

Choice presentation (20 steps): Show two directional options (e.g., “left” and “right” arrows) simultaneously. The network processes these inputs over 20 time steps, integrating them with the remembered goal location

Decision: After step 50, read out which direction the network chooses. Correct choice moves toward the remembered goal
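The trial timeline above can be sketched as an input array. This is only an illustration of the 20/10/20-step structure: the one-hot channel layout (four goal channels followed by four direction channels), the function name `make_trial`, and the example indices are assumptions, not the encoding used in the paper.

```python
import numpy as np

GOAL_STEPS, DELAY_STEPS, CHOICE_STEPS = 20, 10, 20

def make_trial(goal, choices, n_goals=4, n_directions=4):
    """Build one trial's input sequence of shape (50, n_goals + n_directions).

    goal:    index (0..3) of the goal cell on the 2x2 grid.
    choices: the two direction indices presented simultaneously.
    The one-hot channel layout is an assumed encoding.
    """
    total = GOAL_STEPS + DELAY_STEPS + CHOICE_STEPS
    x = np.zeros((total, n_goals + n_directions))
    x[:GOAL_STEPS, goal] = 1.0                  # goal presentation
    choice_start = GOAL_STEPS + DELAY_STEPS     # delay period stays all-zero
    for c in choices:
        x[choice_start:, n_goals + c] = 1.0     # both options shown together
    return x

# Hypothetical trial: goal in cell 2, options are directions 0 and 1.
trial = make_trial(goal=2, choices=(0, 1))
```

The decision would be read from the network's output after the last of the 50 steps.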

This task requires two fundamental cognitive abilities: working memory (maintaining the goal) and integration (combining remembered information with new sensory input). Despite its simplicity, it captures the essence of many real-world cognitive tasks.

Networks started fully connected and learned through weight pruning. The key innovation was comparing networks matched for overall sparsity but differing in how they achieved it—seRNNs through spatial/communication constraints versus L1 through simple weight minimization.

Weighted communicability

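$C = e^{S^{-1/2} |W| S^{-1/2}}$ quantifies information flow between two units over all walks connecting them, with longer walks down-weighted factorially (a walk of length $k$ contributes with weight $1/k!$). $S$ is the diagonal matrix of node strengths; the degree normalization $S^{-1/2}$ prevents hub dominance. The term implements a small-world prior by favoring short communication paths.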
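Expanding the matrix exponential as its power series makes the path weighting explicit. The shorthand $\tilde{W} = S^{-1/2} |W| S^{-1/2}$ is introduced here for readability and is not notation from the paper:

$$C = e^{\tilde{W}} = \sum_{k=0}^{\infty} \frac{\tilde{W}^{k}}{k!} = I + \tilde{W} + \frac{\tilde{W}^{2}}{2!} + \frac{\tilde{W}^{3}}{3!} + \dots$$

Entry $(\tilde{W}^{k})_{ij}$ sums the weights of all walks of length $k$ from unit $i$ to unit $j$, so direct and two-step routes dominate $C_{ij}$ while long detours contribute very little.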

Modularity measure

The modularity $Q = \frac{1}{l} \sum_{i, j \in N} \left( a_{i, j} - \frac{k_{i} k_{j}}{l} \right) δ_{m_{i} m_{j}}$ compares actual connections to what you’d expect by chance:

$a_{i, j}$ = actual connection between nodes i and j

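$\frac{k_{i} k_{j}}{l}$ = expected connection between nodes i and j if wiring were random with node degrees preserved (where $k_{i}$ is node i’s degree and $l$ is the total number of connections, counted as the sum of all $a_{i, j}$, so each undirected connection counts twice)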

$δ_{m_{i} m_{j}}$ = 1 if nodes are in same module, 0 otherwise

Q > 0 means more within-module connections than random. Q ≈ 0.3-0.7 is typical for real brain networks.

Modules are discovered algorithmically by finding node groupings that maximize Q.
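For a fixed module assignment, the formula for $Q$ can be evaluated directly. The sketch below assumes an undirected, binary adjacency matrix and uses a hypothetical toy graph (two triangles joined by a single edge); it scores a given partition and does not perform the search over partitions.

```python
import numpy as np

def modularity(A, modules):
    """Q = (1/l) * sum_ij (a_ij - k_i k_j / l) * [m_i == m_j]
    for a symmetric adjacency matrix A and one module label per node."""
    A = np.asarray(A, dtype=float)
    modules = np.asarray(modules)
    l = A.sum()                      # each undirected edge counted twice
    k = A.sum(axis=1)                # node degrees
    same_module = modules[:, None] == modules[None, :]
    return ((A - np.outer(k, k) / l) * same_module).sum() / l

# Toy graph: triangle {0,1,2}, triangle {3,4,5}, bridge between nodes 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

q_two_modules = modularity(A, [0, 0, 0, 1, 1, 1])   # 5/14, about 0.357
q_one_module = modularity(A, [0, 0, 0, 0, 0, 0])    # 0: no structure beyond chance
```

A community detection algorithm would search over label vectors like the ones passed here and keep the assignment with the highest $Q$.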
