year: 2024/06
paper: https://arxiv.org/pdf/2406.09787
website:
code: https://github.com/erwanplantec/LNDP
connections: neuroevolution, structural plasticity, synaptic plasticity, self-organization, ITU Copenhagen, spontaneous activity, indirect encoding, Sebastian Risi
Importantly, all nodes share the same GRU parameters. Synapses are also modeled as GRUs with shared parameters.
Spontaneous Activity
Spontaneous activity in this paper refers to a pre-experience developmental phase where input neurons generate activity patterns without environmental input. The authors model SA using a learnable Ornstein-Uhlenbeck stochastic process. SA serves several purposes:
It drives pre-experience network development, allowing the network to self-organize before interacting with the environment.
It enables the network to develop “innate skills” before environmental interaction.
It helps differentiate input neurons, as the network learns to recognize specific activation patterns.
Claude’s high-level explanation of the OU process seems at least somewhat wrong in this case; need to look into the code.
The input layer of neurons creates its own patterns of activity.
They use mathematical models (like the Ornstein-Uhlenbeck process, a stochastic differential equation that describes the velocity of a massive Brownian particle under the influence of friction) to create this activity, which has some temporal and spatial structure rather than being completely random / white noise.
This activity spreads through the network in waves.
This creates patterns of activity across the whole input layer.
The connections between neurons change based on this activity.
If two neurons are active at the same time, the connection between them gets stronger.
In the next layer (the output layer), neurons compete with each other.
Only the most active neuron in this layer gets to update its connections.
This helps create distinct groups or “pools” of input neurons that connect to each output neuron.
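The mechanism described above (OU-driven spontaneous activity on the input layer, Hebbian strengthening, winner-take-all on the output layer) can be illustrated with a toy sketch. All sizes, learning rates, and OU parameters below are made-up placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder hyperparameters (not from the paper): n_in input neurons,
# n_out output neurons, OU parameters theta (mean reversion), mu, sigma.
n_in, n_out, steps, dt = 8, 3, 200, 0.1
theta, mu, sigma = 0.5, 0.0, 0.3

W = rng.normal(0.0, 0.1, size=(n_out, n_in))  # input -> output weights
x = np.zeros(n_in)                            # OU state per input neuron
lr = 0.05

for _ in range(steps):
    # Euler-Maruyama step of the Ornstein-Uhlenbeck process:
    # dx = theta * (mu - x) dt + sigma dW
    x = x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.normal(size=n_in)

    # Output layer responds; winner-take-all: only the most active
    # output neuron updates its incoming weights (Hebbian rule).
    y = W @ x
    winner = np.argmax(y)
    W[winner] += lr * y[winner] * x          # co-active -> strengthened
    W[winner] /= np.linalg.norm(W[winner])   # normalize to prevent blow-up
```

Over many steps the winner rows of `W` drift toward recurring OU activity patterns, which is the “pools of input neurons per output neuron” effect described above.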
Structural plasticity
The number of nodes is a constant, sampled once from a truncated normal distribution with fixed hyperparameters (mean, standard deviation, bounds).
The structural plasticity in this model is limited to adding or removing synapses (edges) between existing neurons.
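A minimal sketch of this setup, with placeholder hyperparameters (the actual values were lost from this note): the node count is drawn once from a truncated normal, and structural plasticity then only toggles edges on that fixed neuron set.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_n_nodes(mean, std, n_min, n_max):
    """Rejection-sample an integer node count from a truncated normal.
    The concrete hyperparameter values used below are placeholders."""
    while True:
        n = rng.normal(mean, std)
        if n_min <= n <= n_max:
            return int(round(n))

n = sample_n_nodes(mean=32, std=8, n_min=16, n_max=64)

# Structural plasticity acts only on edges: flip entries of the adjacency
# matrix between the fixed set of n neurons (no neurons added/removed).
adj = (rng.random((n, n)) < 0.2).astype(np.int8)
i, j = rng.integers(n, size=2)
adj[i, j] ^= 1  # add the synapse if absent, prune it if present
```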
Node structure
There is a difference between the “activations” and the “state” of a node.
Node states are computed based on the complete graph state.
Node activations: TODO
Input features:
Edge structure:
Input features:
- Pre- and post-synaptic neuron states
- Last received reward
There is essentially no directionality here? The edge receives both the pre- and the post-synaptic state as input. THERE ISN’T, but I don’t get why yet.
Role of the GRU at the synapse:
a) Memory: It allows synapses to maintain a state over time, potentially capturing temporal patterns in the network’s activity.
b) Adaptive weight changes: The GRU updates the synaptic state based on the pre- and post-synaptic node states and the reward signal. This state determines the synaptic weight, allowing for complex, history-dependent weight changes.
c) Integration of global and local information: By taking the reward signal as input, the GRU can modulate synaptic changes based on both local (node states) and global (reward) information.
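A minimal numpy sketch of such a synapse-GRU with shared parameters. The gate weight names, the input layout `[pre, post, reward]`, and the linear readout to a scalar weight are my assumptions; the real implementation is in the linked repo:

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_e = 8, 4                 # node/edge state dims (dh, de from the paper)
d_in = 2 * d_h + 1              # [pre-synaptic state, post-synaptic state, reward]

# One shared parameter set for ALL synapses (weight names are mine).
Wz = rng.normal(0, 0.1, (d_e, d_in + d_e))
Wr = rng.normal(0, 0.1, (d_e, d_in + d_e))
Wh = rng.normal(0, 0.1, (d_e, d_in + d_e))
w_out = rng.normal(0, 0.1, d_e)  # linear readout: edge state -> synaptic weight

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(e, x):
    """One GRU update of an edge state e given input x."""
    xe = np.concatenate([x, e])
    z = sigmoid(Wz @ xe)                           # update gate
    r = sigmoid(Wr @ xe)                           # reset gate
    h = np.tanh(Wh @ np.concatenate([x, r * e]))   # candidate state
    return (1 - z) * e + z * h

pre, post = rng.normal(size=d_h), rng.normal(size=d_h)
reward = np.array([1.0])
e = np.zeros(d_e)
e = gru_step(e, np.concatenate([pre, post, reward]))
weight = w_out @ e              # history-dependent synaptic weight
```

Because `e` persists across steps, the same local rule can produce different weight trajectories depending on the history of activity and reward, which is exactly points a)–c) above.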
Replication (date of creation: 2024/06)
- replicate results
Modification ideas:
- add spatial structure: distance regularization & 3D lattice (update: 2025/06: would be made obsolete by abolishing edges, see below)
- decentralize information: don’t share weights + don’t give them full graph knowledge → how much worse? (update 2025/06: that’s a bad idea: weight explosion, worse generalization, harder to optimize; local memory / context should be enough … see biological neurons/humans: same hardware, different context → wildly different function)
Here is where it is starting to get soupy / totally different to the paper except for the base structure:
decentralize optimization: each neuron is an independent reinforcement learner
→ decentralize rewards … this is where it gets very tricky (and where the least prior research has been done). This step kind of depends on the previous one (no need: the neurons need to figure out how to communicate reward as part of their message passing). All the concepts of energy regularization and activation sparsity would come into play here.
- structural plasticity for nodes
- excitatory vs. inhibitory neurons? (update: can this be learned? is it necessary?)
- Astrocytes? (update: don’t they primarily provide structure / are a biological implementation detail?)
More remarks on replication / roadmap to soup:
Nodes are the “what”, and Edges are the “who” for sending information.
Especially as the number of edges grows large (up to N² edge states), the edges pose a significant overhead.
One obvious optimization is having a common stem, a little bit like a dendrite.
…
Or remove edges altogether and do an efficient PKM match between the Q and K matrices.
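The edge-free idea could look roughly like this: each node emits a query and a key, and a dense score matrix plus top-k selection replaces explicit edge objects. A real PKM (product-key memory, Lample et al. 2019) would additionally factorize the keys into two sub-key sets for sublinear lookup; this sketch only does the dense version, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_h, d_k, top_k = 16, 8, 4, 3    # illustrative sizes, not from the paper

H = rng.normal(size=(n, d_h))       # node states
Wq = rng.normal(0, 0.1, (d_h, d_k)) # query projection (shared across nodes)
Wk = rng.normal(0, 0.1, (d_h, d_k)) # key projection (shared across nodes)

Q, K = H @ Wq, H @ Wk
scores = Q @ K.T                    # (n, n) affinities, no edge objects stored

# Keep only the top-k keys per query: a sparse, emergent "adjacency".
idx = np.argsort(scores, axis=1)[:, -top_k:]
mask = np.zeros_like(scores, dtype=bool)
np.put_along_axis(mask, idx, True, axis=1)
```

Routing then follows `mask` each step, so "who talks to whom" is recomputed from node states instead of being stored as N² edge states.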
References
https://claude.ai/chat/61f782ff-1742-4328-8459-7405fedad851
Comments on edge and structural features:
**Structural features** are the in-degree, out-degree, and total degree, as well as a one-hot encoding indicating if the node is an input, hidden, or output node.
Moreover, we introduce **edge features** in the attention layer as proposed in Dwivedi and Bresson (2021). We also augment edge features with structural features which are 2 bits indicating if there is a **forward** or **backward** connection between nodes and a bit indicating if the edge is a self-loop.
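The structural features quoted above can be computed directly from the adjacency matrix. A small sketch; the toy adjacency and the node-type assignment are mine:

```python
import numpy as np

adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [1, 0, 1]])          # toy directed adjacency (row -> col)

in_deg  = adj.sum(axis=0)            # edges arriving at each node
out_deg = adj.sum(axis=1)            # edges leaving each node
tot_deg = in_deg + out_deg

# One-hot node type; which node is input/hidden/output is my choice here.
node_type = np.eye(3)[[0, 1, 2]]     # node 0: input, 1: hidden, 2: output

def edge_bits(i, j):
    """Structural bits for edge (i, j): forward, backward, self-loop."""
    forward   = adj[i, j]            # i -> j connection exists
    backward  = adj[j, i]            # j -> i connection exists
    self_loop = int(i == j)
    return forward, backward, self_loop
```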
Confusing nomenclature mixups:
Node states can be used to define neuron parameters such as biases.

| symbol | meaning | dim |
| --- | --- | --- |
| dh | node features ($h_t$) | 8 |
| de | edge features ($e_t$) | 4 |

$h_t \in \mathcal{H}^N$ and $e_t \in \mathcal{E}^{N^2}$ are the node and edge states respectively, with $\mathcal{H} \equiv \mathbb{R}^{d_h}$ and $\mathcal{E} \equiv \mathbb{R}^{d_e}$. Edge states are masked by the adjacency matrix, i.e. set to zero wherever the adjacency matrix is 0.
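The adjacency masking of the edge states, as a numpy sketch (N is an arbitrary choice; dh=8 and de=4 follow the dims above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_h, d_e = 5, 8, 4               # dh = 8, de = 4 as in the paper

h = rng.normal(size=(N, d_h))       # node states  h_t in H^N
e = rng.normal(size=(N, N, d_e))    # edge states  e_t in E^(N^2)
adj = rng.random((N, N)) < 0.3      # boolean adjacency matrix

# Edge states are masked by the adjacency matrix: zeroed where no edge.
e_masked = e * adj[:, :, None]
```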
Moreover, we introduce edge features in the attention layer as proposed in Dwivedi and Bresson (2021). We also augment edge features with structural features (this, in the implementation, is misnamed as `get_node_features`).
`get_edge_features` does not make sense to me at all yet.