Transferred notes from 2023-09-01
From Short clip from Bach X Levin (Jan 23):
Consciousness: (self-)reflexive attention as a tool to form coherence in a representation. - Joscha Bach
The NN in the brain is like a complex system, a social net, a society of agents wanting to get fed but needing to collaborate or they all die.
So they have to form an organization that distributes rewards among each other.
This gives us a search space.
Mostly given by the minimal agent that is able to learn how to distribute rewards efficiently (while using these rewards to make us do something useful).
But this is not fully decentralized!
democratic centralism:
Hierarchical form of governance is emergent, also in the brain.
See the dopaminergic system. There are centralized structures that distribute rewards in a top-down manner.
Not a control from outside like in traditional ML, but an emergent pattern amongst RL agents.
Every regulation has an optimal layer where it needs to take place. Some stuff needs to be decided very high up. Some stuff needs to be optimally regulated very low down, depending on the incentives.
Game-theoretically, a government is an agent that imposes an offset on your payoff matrix to make your Nash equilibrium compatible with the globally best outcome.
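A minimal sketch of this offset idea on the classic prisoner's dilemma (my own toy example, not from the talk; the payoff numbers and the "defection tax" are assumptions): brute-forcing the pure-strategy Nash equilibria before and after the offset.

```python
# Sketch: a "government" as an offset on the payoff matrix, shifting the
# Nash equilibrium onto the socially best outcome. Pure strategies only.
from itertools import product

# payoff[(a, b)] = (payoff to player 1, payoff to player 2)
# Actions: "C" = cooperate, "D" = defect. Standard prisoner's dilemma.
PD = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def pure_nash(payoff, actions=("C", "D")):
    """All pure-strategy Nash equilibria: no player gains by deviating alone."""
    eqs = []
    for a, b in product(actions, repeat=2):
        p1, p2 = payoff[(a, b)]
        if all(payoff[(a2, b)][0] <= p1 for a2 in actions) and \
           all(payoff[(a, b2)][1] <= p2 for b2 in actions):
            eqs.append((a, b))
    return eqs

def with_defection_tax(payoff, tax=3):
    """The 'government' offset: subtract `tax` from any defector's payoff."""
    return {(a, b): (p1 - tax * (a == "D"), p2 - tax * (b == "D"))
            for (a, b), (p1, p2) in payoff.items()}

print(pure_nash(PD))                      # [('D', 'D')] - mutual defection
print(pure_nash(with_defection_tax(PD)))  # [('C', 'C')] - the social optimum
```

The tax value (3) is just large enough here that defecting stops being a best response; the globally best outcome (C, C) becomes the unique equilibrium.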
Potential of Social media to be a global brain (“emergent collective intelligence in real time”) and motor in planning.
Neural Darwinism among different forms of organization in the brain, until you have a model of a self-organizing agent that discovers that what it is computing drives the behaviour of an agent in the real world (discovered from a first-person perspective).
genetic algorithm for architecture search / etc. in the GWS in the brain?
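The GA-for-architecture-search question above can be sketched at its most basic: a toy genetic algorithm where genomes are bitstrings standing in for architecture choices. Everything here is my invention (the fitness function is a trivial stand-in, not a real architecture score); it only illustrates the selection/crossover/mutation loop.

```python
# Toy genetic algorithm over bitstring "architectures" (illustrative only).
import random

random.seed(0)
GENOME_LEN, POP, GENS = 12, 20, 40

def fitness(genome):
    # Stand-in score: prefer architectures with many enabled modules.
    return sum(genome)

def mutate(genome, rate=0.1):
    # Flip each bit with probability `rate`.
    return [b ^ (random.random() < rate) for b in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]              # truncation selection, elites kept
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(POP - len(parents))]

best = max(pop, key=fitness)
print(fitness(best))  # typically climbs toward GENOME_LEN
```

For real architecture search the genome would encode neuron types and wiring, and fitness would be a trained model's performance; the loop itself stays this simple.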
From the full Michael Levin Λ Joscha Bach: Collective Intelligence
Weight-sharing e.g. like in convnets … is it necessary to have some special types of neurons @ the visual cortex?
The master class for soup would of course be to somehow search the entire architecture space for the neuron-agents. But in a first instance, we probably have to experiment with a few select types (inhibitory, excitatory) and complexities (extremely simple, shallow NN, deep transformer).
Synaptic pruning does not hinder the learning ability of the brain; it only optimizes computational efficiency.
Bach
Neurons are not learning a local function over their neighbours; they are learning how to respond to the shape of an incoming activation front - the spatio-temporal pattern in their neighbourhood.
“Densely enough connected, so that the neighbourhood is just a space around them”. And in this space they basically interpret this according to a certain topology, e.g. 2.5D, 2D, 1D convolution, “or whatever the type of function they want to compute”. And they learn how to fire in response to those patterns and thereby modulate the patterns.
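A minimal sketch of this "respond to the shape of an activation front" idea, in my own framing (not Bach's math): a neuron stores a spatio-temporal primitive and fires when the recent activity window of its neighbourhood correlates with it.

```python
# Neuron responding to a spatio-temporal pattern, not a static local function.
# A "window" is neighbourhood activity over the last T steps:
# rows = time steps, columns = neighbours.

def match(window, primitive):
    """Correlation between the observed activity window and a stored primitive."""
    return sum(w * p for wrow, prow in zip(window, primitive)
                     for w, p in zip(wrow, prow))

def fires(window, primitive, threshold=2.5):
    return match(window, primitive) > threshold

# Stored primitive: a wave sweeping left-to-right across 3 neighbours in 3 steps.
sweep = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]

moving = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # matching front -> fire
static = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]   # static blob -> no fire

print(fires(moving, sweep), fires(static, sweep))  # True False
```

The same static amount of input activity yields opposite responses depending on its shape in time - which is the point of the note above.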
Neurons store computational primitives (responses to incoming activations), which can be distributed to other neurons via RNA, if they find them useful.
Neurons of the same type will gain the knowledge to apply the same computational primitives (not certain how much this aspect is actually utilized, but it is heavily underexplored in AI and ties into the entire evolutionary-architecture-search aspect of soup).
Caterpillar brains undergo a complete reassembly when they transform from living in a 2D world to a 3D one - a complete restructuring/remodeling - but (core?) memories stay intact nonetheless.
memories can move from the brain to other places!
They taught planaria to recognize where their food was and then cut off their brains - as the brains regenerated, the information moved back into the brain!
DALL-E vs Baby: Moving vs. Static World
A baby couldn’t learn from being shown 600 million static, utterly disconnected images in a dark room (and remember most of them!). Our statistical models way surpass the brain in that respect.
For us, information stays the same but gets transmogrified over time, so we only need to learn the transmogrification. We mostly need to learn change.
→ Learning in a moving, dynamic environment is much easier, as it imposes constraints on the world (world model).
Cats, for example, can track moving objects much better than static objects.
In moving vs. static, the semantics of features change:
Static: Scene is composed of features / objects, which are in a static relationship, based on which you need to interpret the scene (ambiguous, hard, …). The features are classifiers of the scene.
Dynamic: Features become change operators, tracking the transformation of a scene. They tell you how your world model needs to change. To process that, you need controllers at different hierarchical, spatial and temporal levels, which can turn on/off/change at the level of the scene. → Features become self-organizing, self-stabilizing entities that shift themselves around to communicate with other features in the organism, until they negotiate a valid interpretation of reality.
A similar thing also happens in the spatial domain (i.e. in biological development, as opposed to neuroscience, which usually deals with the temporal domain): the delta of voltage gradients modifies the controllers for cells (gene expressions etc.).
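The static-vs-dynamic contrast can be made concrete with a toy (entirely my invention - the "frames" and function names are illustrative): instead of classifying a frame, dynamic features describe the frame-to-frame delta and tell the world model how to update itself.

```python
# "Features as change operators": features describe the transformation
# between frames, not the frame itself.

def delta_features(prev, curr):
    """Per-cell change operators: +1 appeared, -1 disappeared, 0 unchanged."""
    return [c - p for p, c in zip(prev, curr)]

def apply_to_world_model(model, deltas):
    """The world model only needs to learn the transformation, not the scene."""
    return [m + d for m, d in zip(model, deltas)]

frame_t0 = [0, 1, 1, 0, 0]   # an "object" occupying cells 1-2
frame_t1 = [0, 0, 1, 1, 0]   # it moved one cell to the right

change = delta_features(frame_t0, frame_t1)
print(change)                                  # [0, -1, 0, 1, 0]
print(apply_to_world_model(frame_t0, change))  # [0, 0, 1, 1, 0] = frame_t1
```

The delta is sparse and structured ("something shifted right") even when the scene itself is ambiguous - which is the constraint a moving world imposes.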
Open question: Encoding and Decoding
How do molecules communicate information - without a pre-existing shared evolutionary decoding?
Or even: How do the neurons in your brain decode memory-information, i.e. messages from your past self to your future self?
I would say that the soup simply learns how to deal with multi-spatial and temporal patterns over time, but yeah, I don't know. Especially how the encoding of the input signal works is the biggest question mark for me. (Update: isn't that the easiest part? Pre-training encoders from data to leverage patterns? The fixed-incoming-spatial-structure note below supports this assumption? And regarding memory encoding/decoding, isn't this possible because the hardware and the resulting algorithms are shared / co-evolved? As one part of the picture, at least.)
noisy samples → feature hierarchies → objects → scene interpretation / object controllers
darkness? → (scary) shapes, things, etc. spawn into our brain because we have no data to disprove those hypotheses (at least / especially as a child, when the world model is still wonky).
visual information is concentrated at visual nerves (fixed incoming spatial structure).
connectome dead end
A lot of resources are dedicated towards precisely mapping the neural connections of brains…
But it might very well be (and I think so too) that it doesn't matter at all - that what matters is just the density of the neurons, arranged stochastically, and how they communicate with each other.
They are acting as if we can get the entire story out of the brain as if it were circuitry (nothing in nature is rigid like that).
Neuroscience seems stuck in that regard, with respect to how the brain processes information.
None of the recent advances in AI were due to advances in neuroscience, but much more to statistical insights into information processing.
This podcast was also literally recorded >1yr ago. Sutton has been saying this stuff for years. How is nobody researching this? (it is time for soup)
There is incredible non-genetic memory, e.g. in deer: you carve something into their antlers, the antlers drop off, and as they re-grow every year over the following ~5 years, they re-grow with a tougher spot or something at this exact location. So the skull not only has to remember that exact location for years, but the bone cells also need to know exactly where to put it.
Planaria: The animal with the “worst genome” (all sorts of mutations, different number of chromosomes, …) has the best anatomical fidelity, is immortal, incredible at regenerating, very resistant to cancer, etc.
This, again, goes completely against the standard view of genomics, which tells you that a good genome is what determines your fitness and abilities in all ways.
Evolution vs. Competency
Competency of the individual can compensate for subpar hardware.
So in a way, having a really good algorithm hinders evolution in generating the best hardware, since evolution doesn't know whether the hardware is doing well or whether the self-organization of the sub-units is just too good (analogy: if the error-correction algorithm is super good, you don't need to focus much on improving the storage medium).
→ Emphasis is really put on the correct algorithm, which can make up for all kinds of bad HW.
→ Again: soup
“Best effort computing”: Don’t rely on the other neurons around you working perfectly, but make an effort to be better than random.
→ Evaluating by stacking probabilities with high error tolerance / evolving a system which learns to measure the unreliability of its components until it becomes deterministic enough (or maybe there is a continuum, maybe a phase shift from one to the other between organisms).
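The "better than random is enough" claim has a clean probabilistic form (a Condorcet-style argument; my example, not the talk's): components that are each only slightly better than chance, stacked with a majority vote, become nearly deterministic.

```python
# Best-effort computing: majority vote over unreliable components.
from math import comb

def majority_correct(n, p):
    """P(majority of n independent components is right, each right w.p. p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n // 2) + 1, n + 1))

# Each component is only 60% reliable, yet the ensemble approaches certainty:
for n in (1, 11, 101, 1001):
    print(n, majority_correct(n, 0.6))
```

This is the sense in which a system can "measure the unreliability of its components": the required redundancy n is a direct function of how far above chance each unit sits.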
Spatially, for information processing between neurons:
Absolute voltage doesn’t matter, but the difference between regions.
Cells have the competency to recruit other cells autonomously to get a job done.
Competency Definition
Levin: Given a problem space → Ability to navigate in this problem space to get towards a goal.
Two magnets vs. Romeo & Juliet. Both try to get together, but the degree of flexible problem solving is incredibly different (can avoid local optima, have memory of where they've been, look further than the local environment, …).
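The magnets-vs-Romeo contrast can be put in one toy landscape (my sketch, not from the talk; the landscape and names are invented): the "magnet" is pure greedy ascent and stalls at a local optimum, while the "Romeo" remembers where it has been and so can cross the valley.

```python
# Two competency levels navigating the same 1D problem space.

def height(x):
    # Local optimum at x=5 (height 3), global optimum at x=15 (height 10).
    return {5: 3, 15: 10}.get(x, -abs(x - 15) * 0.1)

def greedy(start, steps=30):
    """'Magnet': follows the local gradient, stops when no neighbour is better."""
    x = start
    for _ in range(steps):
        best = max((x - 1, x, x + 1), key=height)
        if best == x:
            break                     # stuck at a local optimum
        x = best
    return x

def with_memory(start, steps=30):
    """'Romeo': remembers visited states, may move downhill, never loops."""
    x, visited, best = start, {start}, start
    for _ in range(steps):
        options = [n for n in (x - 1, x + 1) if n not in visited]
        if not options:
            break
        x = max(options, key=height)
        visited.add(x)
        if height(x) > height(best):
            best = x
    return best

print(greedy(3), with_memory(3))  # 5 15 - memory escapes the local optimum
```

Same problem space, same local sensing; the only added competency is memory of past states, and it changes which goal is reachable.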
Simple Controller / Reactive System vs. Agent
Bach: Goals can be emergent.
Agent == controller for future states
The difference between reactive systems and agents: a thermostat doesn't have a goal by itself, only a target value and a deviation from it, to which it reacts once a threshold is passed. Agents are proactive: they also try to minimize future deviations (an integral over a timespan).
The ability to create counterfactuals, causal simulations, e.g. possible future universes, reasoning over alternative past trajectories, etc. needs to be present.
For this, you need a Turing machine - a computer. Cells have this ability.
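A minimal contrast between the two (my sketch; the drift model, numbers, and names are all assumptions): the thermostat reacts to the current deviation, the agent picks the action minimizing the summed predicted future deviation under a simple forward model - i.e. it simulates counterfactual futures.

```python
# Reactive controller vs. proactive agent (controller for future states).

TARGET = 20.0

def predict(temp, action, drift=-0.5, steps=5):
    """Tiny forward model: room cools by `drift` per step; heater adds `action`."""
    traj = []
    for _ in range(steps):
        temp = temp + drift + action
        traj.append(temp)
    return traj

def reactive(temp, threshold=1.0):
    """Thermostat: only acts once the current deviation passes a threshold."""
    return 1.0 if TARGET - temp > threshold else 0.0

def proactive(temp, actions=(0.0, 0.5, 1.0)):
    """Agent: minimizes the integral of predicted future deviations."""
    cost = lambda a: sum(abs(TARGET - t) for t in predict(temp, a))
    return min(actions, key=cost)

# At the setpoint the thermostat does nothing and lets the room drift;
# the agent already counteracts the predicted cooling.
print(reactive(20.0), proactive(20.0))  # 0.0 0.5
```

The proactive agent's `predict` call is the "counterfactual simulation" from the note above, in its most stripped-down form.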
Full emergence or encoded representations?
If a system were to be fully emergent, e.g. there are local rules that always lead to the formation of the correct planaria, then, in order to change the resulting system, you would need to understand / correctly manipulate those local rules to spit out what you like (hard).
However, Levin found that there is a distributed encoded representation of 1 head, 1 tail, … within the cells!
In fact, planaria can store at least 2 different representations: you can inject a voltage gradient to change the pattern to say 2 heads, 0 tails, while the planarian is still intact with one head - but if it were cut, it would grow 2.
So one important ability is to also store counterfactual states (= something that is not true now but may be true in future or may have been true under some conditions, etc.).
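As a data structure, Levin's two-head result amounts to this (my framing, names invented): the stored target pattern is a counterfactual - it can differ from current anatomy and is only read out on regeneration.

```python
# Counterfactual state storage: target pattern != current anatomy.

class Planarian:
    def __init__(self):
        self.anatomy = {"heads": 1, "tails": 1}   # what is true now
        self.target  = {"heads": 1, "tails": 1}   # what should be true

    def inject_gradient(self, heads, tails):
        """Rewrite the stored representation without touching anatomy."""
        self.target = {"heads": heads, "tails": tails}

    def cut_and_regenerate(self):
        """Regeneration reads the stored target, not the current anatomy."""
        self.anatomy = dict(self.target)

worm = Planarian()
worm.inject_gradient(heads=2, tails=0)
print(worm.anatomy)   # {'heads': 1, 'tails': 1} - still one head
worm.cut_and_regenerate()
print(worm.anatomy)   # {'heads': 2, 'tails': 0} - the counterfactual realized
```

The interesting part is exactly the gap between `anatomy` and `target`: a state that "is not true now but may be true in the future".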
Bach’s definition of intelligence
Intelligence is the ability to make models.
This definition also accounts for the fact that many very intelligent people are not good at getting things done.
Intelligence and Goal-Rationality are orthogonal to each other.
→ Excessive intelligence is often a prosthesis for bad regulation.
Levin is obviously also working on soup / creating a model of where autopoiesis (self-organization) / collective agents first come from.
They already have a model of cells rewarding each other with neurotransmitters and things like that, to keep copies of themselves nearby to reduce surprise - because the least surprising thing is a copy of yourself.
An embryo is an embryo because there is 1 cell that decides it's gonna take the lead and break the symmetry (local activation, long-range inhibition) - and in biological systems, every cell has to decide where it ends and the outside begins.
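The local-activation/long-range-inhibition symmetry breaking can be shown in a few lines (my simplification; the update rule and constants are invented, not Levin's model): near-identical cells plus noise, self-amplification minus population-wide suppression, and a single leader emerges.

```python
# Symmetry breaking: local activation, long-range inhibition.
import random

random.seed(1)

# Ten nearly identical cells, differing only by tiny noise.
cells = [1.0 + random.uniform(-0.01, 0.01) for _ in range(10)]

for _ in range(800):
    total = sum(cells)
    # Local activation: each cell amplifies itself.
    # Long-range inhibition: every other cell's activity suppresses it.
    # Clamping at 0 is absorbing: a silenced cell stays silenced.
    cells = [max(0.0, 1.05 * c - 0.01 * (total - c)) for c in cells]

leaders = [c for c in cells if c > 0]
print(len(leaders))  # a single leader remains: symmetry is broken
```

Differences between cells are amplified multiplicatively while the population mean is suppressed, so the initially invisible noise decides which cell "takes the lead".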
Levin:
One part of AGI is this plasticity about where you end and where the outside world begins. You have to make this decision as an organism: make a model of your boundaries, your structure; you are energy- and time-limited, … All of these are self-constructions from the very beginning.
What gives rise to a lot of the plasticity and intelligence in nature is that you cannot assume that the hardware is what it is, and that you also always face a completely different environment (e.g. bacteria in a sugar vat could go up the gradient to the source, maximizing it, or work in metabolic space to break down adjacent molecules into sugar or something).
I think some of these constraints are crucial to take up for the self-organizing aspects (namely: energy, time, spatial, …), but others, like being basically hardware-agnostic, are assumptions or generalizations we cannot make or afford with silicon hardware, where we do need to exploit structure to be efficient enough and where there is no such incredible flexibility as in nature.
Soup - How?
Minimal agent to learn how to distribute rewards efficiently, while doing something useful.
How can we get a system that is looking for the right incentive architecture?
Emergence of GI in a bunch of cells / units:
- each is an agent, able to behave with an expectation of minimizing future target value deviations
- agents are connected to each other, communicating messages of multiple types (sending and recognizing to a certain degree of reliability)
Open questions:
- a type in the language (like different neurotransmitters) is simply one more / fewer bit in a general latent message
- the rewards / proxy rewards also need to come from the connected agents, which are themselves adaptive
- enough agents
- how deterministic do the units need to be?
- how much memory do they need / what state can they store?
- how deep in time does their recollection need to go / how far forward in time do they need to be able to form expectations?
- how big is the latent dimension the agents communicate in?
(personal question) Is “exchange of functions (‘RNA’)” necessary and - if yes - how to implement it?
The conditions that are necessary are relatively simple. If you just wait for long enough for the system to percolate, compound agency will emerge from the system through competition.
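One way (entirely my sketch - every class, field, and constant here is an assumption, not an existing design) to turn the spec above into a skeleton: units are reward-distributing agents exchanging typed latent messages, where "types" are just extra dimensions of the message vector and rewards flow peer-to-peer rather than from an outside trainer.

```python
# Skeleton of a "soup" unit per the spec above (illustrative only).
import random

random.seed(0)
LATENT_DIM = 4   # open question above: size of the shared message space

class SoupAgent:
    def __init__(self, agent_id):
        self.id = agent_id
        self.neighbors = []            # connected agents
        self.budget = 1.0              # reward it can redistribute
        self.memory = []               # bounded recollection (open question)
        self.weights = [random.uniform(-1, 1) for _ in range(LATENT_DIM)]

    def receive(self, message):
        self.memory = (self.memory + [message])[-10:]   # finite depth in time

    def emit(self):
        """A typed latent message; the 'type' is just part of the vector."""
        return [random.gauss(w, 0.1) for w in self.weights]

    def distribute_reward(self):
        """Rewards flow peer-to-peer, not from an external trainer."""
        if not self.neighbors:
            return
        share = self.budget / len(self.neighbors)
        for n in self.neighbors:
            n.budget += share
        self.budget = 0.0

agents = [SoupAgent(i) for i in range(5)]
for a in agents:                       # fully connect the soup
    a.neighbors = [b for b in agents if b is not a]
for a in agents:                       # one round of message passing
    for n in a.neighbors:
        n.receive(a.emit())
for a in agents:                       # one round of reward redistribution
    a.distribute_reward()
print(len(agents[0].memory), sum(a.budget for a in agents))
```

What's deliberately missing is the whole point of the research question: the learning rule by which `weights` and the reward-routing adapt so that compound agency percolates out of the competition.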