Aim: New theory of living things
A comprehensive framework for understanding origins and adaptation across all scales:
- New theory of development focusing on shape and form
- New theory of evolution centered on adaptation and optimization
- New theory of cognition exploring problem solving, agency, learning, and intelligence
- Top-down causation: viewing organisms as actors (not products) of adaptation
- Fundamentally scale-invariant – causes/adaptation/cognition/agency operate at all scales (acting down as well as up)
- A theory of biological organisation and dynamics (program and data intertwined)
- Starting from nothing but physics
Things that seem discontinuous might be continuous in higher dimensional spaces.
Learning = Reverse Engineering
The problem of biology and mind can be understood as deep reverse-engineering; adaptation requires this capability.
Reverse Engineering: Given an example of a system that works, create a new system that does the same thing.
- e.g., Reverse engineer Atari game code.
- e.g., Biology: Required changes to phenotype → find changes to genotype that produce them.
- e.g., Mind: Required changes to behaviour → find changes to the mental model that produce them.
Shallow vs. Deep Reverse Engineering
Reverse engineering is easy when it’s shallow.
- Shallow: The desired outcome directly shapes the substrate. The forces are reversible and propagate directly.
- Examples:
- Leaf imprint in clay: Leaf (example) pushed into clay (substrate) creates a mold (new system).
- Jelly in a mold.
- River carving a canyon.
- Pushing a pylon: Force applied → structure accommodates. Parts bend locally in response to propagated stress.
- Mobile adjusting to weight change: Forces transmitted find new equilibrium.
- “Just push it where you want it to go.”
- Accommodation to stress/forcing is easy with reversibility. The push on each part is directed and specific; the bits that need to change do change, and in the right direction.
Conditions for Easy Reversibility (Shallow Systems)
- Relationships are smooth, monotonic, and differentiable (well-defined gradients).
- No discontinuities, no bifurcations.
- Decomposable into independently additive components.
- No hidden constraints/structure.
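A minimal sketch of why such systems are easy to reverse (the cubic example function is mine, not from the talk): with a smooth, monotonic, differentiable relationship, the error always points in the right direction, so “pushing” converges directly on the answer.

```python
# Hypothetical illustration: inverting a smooth, monotonic system.
def f(x):            # a smooth, monotonic "substrate"
    return x**3 + x

def df(x):           # derivative is never zero, so the push is always directed
    return 3 * x**2 + 1

def invert(target, x=0.0, steps=50):
    """Newton's method: push x until f(x) matches the target."""
    for _ in range(steps):
        x -= (f(x) - target) / df(x)   # directed, specific push
    return x

print(invert(10.0))   # ~2.0, since f(2) = 10
```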
Reversibility vs. Cognition
Systems with the above smooth/reversible properties are not programs or ‘machines’ of interest. They are not cognitive.
Cognition requires non-linearities, discontinuities, logic, branching, hidden state. These systems are not easily reversible.
The Cognitive Reverse-Engineering Problem
There appears to be an intrinsic opposition:
- Learnability: Needs continuous space → gradients, directional error; naturally reversible, but limited relations (no folds).
- Cognition: Needs discontinuous space (cf. hidden state) → arbitrary complexity; arbitrary relations (with branching), but not reversible.
This suggests complex cognitive systems might require search/trial-and-error rather than direct induction via gradients.
Example: Code cannot be easily “pushed” into the desired form like clay. Modifying it often requires search or complete rewriting (cf. catastrophic forgetting). Where does the code (the program) come from? A small sketch below makes the point concrete.
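A minimal sketch (my toy example, not from the talk): a discontinuous, branch-like relation gives gradient descent nothing to push on, because the step function’s gradient is zero almost everywhere; only search makes progress.

```python
# Hypothetical illustration: gradients stall on a discontinuous relation.
import random

def step(x, threshold):
    return 1.0 if x > threshold else 0.0    # branching / discontinuity

target_threshold = 0.37
data = [(x / 100.0, step(x / 100.0, target_threshold)) for x in range(100)]

def loss(threshold):
    return sum((step(x, threshold) - y) ** 2 for x, y in data)

# Numerical gradient at threshold = 0.905: exactly zero -> no direction to push.
eps = 1e-6
grad = (loss(0.905 + eps) - loss(0.905 - eps)) / (2 * eps)
print("gradient:", grad)          # 0.0 -- gradient descent goes nowhere

# Random search (trial and error) still works: try thresholds, keep the best.
best = min((random.random() for _ in range(1000)), key=loss)
print("search found:", round(best, 2))   # ~0.37
```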
Induction (in this context)
- Core Idea: The process of inferring or generating general rules, hidden states, or deep causal structures from specific, often limited, observable data or experiences. It’s about going beyond direct, surface-level information. The speaker often equates learning (especially deep learning) with induction.
- Necessity: Required when the relationship between observables (inputs) and desired outcomes is under-determined – meaning the surface data isn’t enough to specify the internal structure needed. This is crucial for:
- Dealing with hidden states or unobservable factors.
- Learning non-linearly separable functions (like XOR, 10:51).
- Contrast with Direct Processes: Shallow reverse engineering (like clay imprints) doesn’t require induction. Deep reverse engineering (learning complex programs/structures) does require induction.
- Mechanism: The speaker proposes it arises “spontaneously” from the interaction (stress/forcing) between the environment and a suitably “impressionable” substrate, without needing explicit design or search.
- Outcome: Leads to the formation of general classes or concepts from specific instances.
Deep Learning Excursion: Bridging the Gap?
How does deep learning handle this tension?
- Focus on:
- Easily learnable functions (linearly separable)
- Difficult-to-learn functions (non-linearly separable, requiring hidden state)
- Concept of folding feature space = finding hidden order = symmetry breaking
- Recursion
Folding (in this context)
- Core Idea: A process, particularly in deep learning (ANNs) but analogous in biology, where the system transforms its internal representation of the input space to handle complexity, especially non-linear relationships that cannot be solved by simple linear separation.
- Mechanism: It involves creating internal representations (hidden states) through non-linear transformations, discontinuities, or bifurcations. This effectively “folds” the input space so that previously inseparable patterns become separable by later stages.
- Driving Force: Driven by “stress” (error between actual and desired output), which is pushed back (e.g., via backpropagation) to deform the internal structure (weights/activations).
- Contrast with Shallow: Unlike simple, reversible deformations (like a leaf imprint in clay, 1:14), deep folding creates complex internal organization.
- Key Distinction: Useful folding is subtle and elastic (maintains integrity, can spring back if stress is removed, 6:26-7:01), unlike “catastrophic folding” (like crushing origami or a pylon collapsing) which makes the structure inert and unable to push back.
- Outcome: Creates abstractions, mapping many instances to one concept (compression) and allowing one concept to unfold into many instances (decompression).
The folding of the feature space (especially in ANNs via backpropagation) is the mechanism that performs or enables the induction required to learn complex, under-determined relationships and infer hidden structures from experience.
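A minimal illustration (mine, not the talk’s): one non-linear fold of a 1-D input line turns a pattern that no single threshold can separate into one that a single threshold can, and maps many input points onto one folded point (compression).

```python
# Hypothetical illustration: folding the input line with f(x) = |x|.
points = [-2, -1, 1, 2]
labels = [1, 0, 0, 1]     # outer points vs inner points: not separable on x

folded = [abs(x) for x in points]      # fold the line at x = 0
# After folding: [2, 1, 1, 2] -- a single threshold (1.5) now separates
# the classes, and -2/+2 (and -1/+1) each collapse onto one point.
print([int(fx > 1.5) for fx in folded])   # [1, 0, 0, 1] matches labels
```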
Shallow Learning (Single perceptron)
- Uses monotonic but non-linear activation function (e.g., sigmoid), which is differentiable/reversible.
- Can learn linearly separable functions (e.g., A AND NOT B) via a single decision boundary.
- Learning (e.g., gradient descent, backpropagation) uses error to “push” weights in the right direction to reduce mismatch.
- This is analogous to shallow reverse engineering (pushing clay).
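A minimal sketch of this shallow case (assumed setup; the talk shows no code): a single sigmoid unit learns A AND NOT B by gradient descent, with the error directly pushing each weight in the right direction.

```python
# Hypothetical illustration: one sigmoid unit learning A AND NOT B.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 0)]   # A AND NOT B
w, b, lr = [0.0, 0.0], 0.0, 1.0

for _ in range(5000):
    for (a_in, b_in), y in data:
        out = sigmoid(w[0] * a_in + w[1] * b_in + b)
        push = (y - out) * out * (1 - out)   # error scaled by sigmoid slope
        w[0] += lr * push * a_in             # directed, local weight nudges
        w[1] += lr * push * b_in
        b += lr * push

print([round(sigmoid(w[0] * a_in + w[1] * b_in + b)) for (a_in, b_in), _ in data])
# -> [0, 0, 1, 0]: one linear boundary suffices; no hidden state needed.
```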
Deep Learning (Multi-Layer Perceptron - MLP)
- Still uses reversible activation functions and backpropagation.
- Backprop apportions error “push” back through layers based on contribution to error.
- This process alleviates internal “stress” (mismatch with environment/target).
- A deep network is still reversible in this sense and can compute the same (linearly separable) functions as a shallow network.
Deep Learning & Hidden State (Handling Non-Linear Separability)
- Challenge: Non-linearly separable functions (e.g., XOR) cannot be solved by a single linear boundary. Learning gets “frustrated,” oscillating as fixing one error creates another.
- Solution: Symmetry Breaking / Folding Feature Space
- Conflicting error signals “push” hard enough to cause the internal representation (decision boundary in hidden layers) to bifurcate or split.
- The network effectively creates multiple decision boundaries using hidden nodes.
- Example (XOR): Hidden node H1 learns one boundary (e.g., B AND NOT A), H2 learns the complementary boundary (A AND NOT B). They become mirrors of each other.
- Complementarity: Hidden nodes (H1, H2) take their meaning from each other, defined by the internal symmetry breaking, not solely by the external input-output task.
- The output layer combines these hidden representations (e.g., H1 OR H2).
- Abstraction: This folding creates a many-to-one mapping (multiple input instances map to one internal concept) and allows one-to-many mapping (one internal concept unfolds to multiple instances). Learning = compression; Generation = decompression.
- Deep learning is a transformative process, not just search. It’s like taking an impression, but a deep one, creating folds by resolving stress through internal symmetry breaking (the walnut-in-a-vice analogy).
- Symmetry breaking moves into an additional degree of freedom / orthogonal dimension (like a buckling strut). The direction is determined by internal dynamics, not the external force alone. (A sketch of the XOR case follows.)
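A minimal numpy sketch (my code, not the talk’s; hyperparameters arbitrary): a 2-2-1 sigmoid network trained by backpropagation on XOR. With most seeds the two hidden nodes break symmetry into complementary boundaries that the output node then combines.

```python
# Hypothetical illustration: a 2-2-1 MLP learning XOR via backprop.
# Note: tiny sigmoid XOR nets can occasionally stall in a local optimum;
# if the output hasn't converged, re-run with a different seed.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 2)), np.zeros(2)   # input  -> hidden
W2, b2 = rng.normal(0, 1, (2, 1)), np.zeros(1)   # hidden -> output
sig = lambda z: 1 / (1 + np.exp(-z))
lr = 2.0

for _ in range(20000):
    H = sig(X @ W1 + b1)                  # hidden activations
    out = sig(H @ W2 + b2)
    d_out = (out - y) * out * (1 - out)   # output-layer "stress"
    d_H = (d_out @ W2.T) * H * (1 - H)    # stress pushed back a layer
    W2 -= lr * H.T @ d_out;  b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_H;    b1 -= lr * d_H.sum(0)

print(out.round(2).ravel())               # ~[0, 1, 1, 0]
print(sig(X @ W1 + b1).round())           # hidden nodes: complementary patterns
```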
Under-determination = induction
The forcing from the outside (the learning task/data) under-determines the internal structure required to solve it (especially for complex tasks).
This under-determination is what makes the process deep and requires induction (inferring hidden structure).
The specific way the internal structure (e.g., hidden node functions, symmetry breaking direction) resolves the stress is not prescribed by the external forcing alone.
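A tiny illustration (my example): XOR’s truth table, i.e., the external forcing, under-determines the internal decomposition; at least two different hidden structures satisfy it equally well.

```python
# Hypothetical illustration: two internal structures, one external behaviour.
cases = [(0, 0), (0, 1), (1, 0), (1, 1)]

def xor_v1(a, b):            # mirrored half-planes, then OR
    h1 = a and not b
    h2 = b and not a
    return h1 or h2

def xor_v2(a, b):            # OR and NAND, then AND
    h1 = a or b
    h2 = not (a and b)
    return h1 and h2

print([int(bool(xor_v1(a, b))) for a, b in cases])   # [0, 1, 1, 0]
print([int(bool(xor_v2(a, b))) for a, b in cases])   # [0, 1, 1, 0]
# Same outside behaviour; which decomposition a learner settles on is
# decided by its internal symmetry breaking, not by the task alone.
```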
Symmetrical Interaction & Recursion
The learner-environment relationship is symmetrical: Environment changes organism (learning), but organism also acts on/changes environment (behaviour). Both push on each other. Mutual co-construction.
This interaction continues until dynamic equilibrium is reached (stress alleviated, surprise minimized, cf. Karl Friston).
The internal components (e.g., hidden nodes H1, H2) also co-define each other symmetrically.
This suggests a recursive process: Who/what is pushing the learner and environment together? Can that process be pushed back on? (Meta, meta, meta…).
Multi-scale Autonomy
In biological systems, changes to deep/macro structure (evolution) don’t break all the details (development, physiology).
Details accommodate and adjust autonomously at their own scale. Each level “knows what to do” locally.
This works across vast physical and temporal scales. How?
Recursive Reverse Engineering
Macro-scale stress causes coarse-grained changes in internal structure.
These changes create stress at the next level down, causing smaller changes there.
… Recursively down to the micro-scale (e.g., gene expression).
Each ‘grain’ at each level performs its own local symmetry breaking / stress accommodation.
Stress is passed down (top-down), but the response is locally determined and underdetermined by the level above. Lower levels also push back up, constraining higher levels.
Requires simultaneous top-down and bottom-up engagement across all scales.
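A loose analogy only (my sketch; the talk proposes no algorithm here): a multi-resolution fit in which each finer grain accommodates only the residual “stress” left unresolved by the coarser level above it, with no grain seeing the global picture.

```python
# Hypothetical illustration: stress passed down, absorbed locally per scale.
import numpy as np

x = np.linspace(0, 1, 400)
target = np.sin(6 * x) + 0.3 * np.sin(60 * x)   # coarse + fine structure

def block_means(signal, blocks):
    """Each block accommodates locally: it just fits its own mean."""
    out = np.zeros_like(signal)
    for chunk in np.array_split(np.arange(len(signal)), blocks):
        out[chunk] = signal[chunk].mean()
    return out

residual = target.copy()
print("initial stress:", round(float(np.abs(residual).mean()), 3))
for blocks in (4, 16, 64):                       # coarse -> fine grains
    layer = block_means(residual, blocks)        # local accommodation
    residual = residual - layer                  # leftover stress passed down
    print(blocks, "grains ->", round(float(np.abs(residual).mean()), 3))
# Mean |stress| shrinks level by level as each grain absorbs its share.
```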
Non-Catastrophic Folding is Key
For this recursive process to work, each step/fold must be shallow and subtle, staying within the elastic/reversible limit.
It must not be a catastrophic failure (like the collapsed pylon).
This maintains internal integrity and allows the system (and its parts) to “push back” and “spring back” (unfold) when stress is released.
Catastrophic Folds → Inert Information
Contrast with origami: Folds on folds create an inert structure. Deep ANNs might be like this – compressing feature space catastrophically.
They can’t easily spring back or be reversed/controlled from the top level alone due to the massive under-determination and collapsed concept space.
Modifying them becomes exponentially difficult (catastrophic forgetting). You might have to flatten and start again, or just add more folds on top.
Clay sculpture: Pushing to change the macro-pose (turn head) can obliterate fine details if not done carefully. Ideally, fine details adjust themselves autonomously.
Folding Process, Not Just Folded Structure
The goal isn’t a static fold-of-a-fold structure.
It’s a folding process acting on a folding process recursively across scales.
This creates a radically fluid space of possibilities, opening new dimensions as needed.
Learning Deep Model vs. Recursive Learning Process
ANNs: Weights in layer N+1 process outputs of layer N. They aren’t ‘meta-weights’ controlling the organization of layer N’s weights.
Needed: A system where layer N+1 does organize/rewire layer N, recursively.
This implies a totally dynamic topology.
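One existing gesture in this direction (a hedged sketch, not the talk’s design) is the “hypernetwork” idea, where an upper level emits the lower level’s weights, so nudging the upper level genuinely rewires the lower computation rather than merely reading its outputs:

```python
# Hypothetical illustration: upper level generates lower-level wiring.
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(0, 1, 4)               # upper-level state ("layer N+1")
G = rng.normal(0, 0.5, (4, 2 * 3))    # generator: upper state -> lower weights

def lower_layer(x, z):
    W = (z @ G).reshape(2, 3)         # lower weights are produced, not fixed
    return np.tanh(x @ W)

x = np.array([0.5, -1.0])
print(lower_layer(x, z))              # behaviour of the generated layer
print(lower_layer(x, z + 0.1))        # nudging N+1 rewires layer N
```

This hand-wires only a single level pair, though; making every level simultaneously generate, and push back on, its neighbours is exactly the open problem the next lines raise.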
How to design such a system?
You don’t… ‘Design’ implies top-down control seeing all layers.
This system relies on local, underdetermined responses and symmetry breaking at each level, trusted by the levels above. Lower levels push back up, constraining higher levels. All levels are engaged simultaneously.
How could top/bottom levels communicate effectively without losing control or causing catastrophic failure?
Folding space (many-to-one) is required, but folding means losing control of the deeper dimensions.
How can top ‘reach’ deep? How can deep ‘push back’ effectively?
If not fully engaged, leads to catastrophic failure or loss of control.
Is it possible to have the whole stack engaged upward and downward simultaneously?
The Real Problem of Biology and Mind = Recursive reversible inductive engineering:
How can you naturally and easily induce deep causal structure (programs) from experience/contact with the world, spontaneously (via transformation, not search)?
In such a way that:
- Subsequent change (re-programming, deep program adjustment) remains easy.
- Contact with the world can change deep structure, and all the details accommodate and adjust autonomously without being broken.
- All the details know what to do – even for situations they have never seen before.
Desiderata for a Physical Theory of Life and Mind (Summary List)
- Direct induction in a suitably impressionable substrate.
- Deep induction – same internal causal architecture as the world it models.
- Embedded and empowered – not just a movie (same sensitivities and effects).
- Substrate independence.
- Scale invariance (coupling across scales = retains embedding and empowerment; same sensitivities and effects).
- Abstraction: Many instances ↔ One concept (many-to-one / one-to-many).
- Under-determination: Induce general class: various small instances fold into one large concept (constrained space) / Generate specific instances: one large concept unfolds into various small instances (de-constraining).
- Agential behaviour in unfolded space = projection of ordinary behaviour in folded space
- Multiple points in high-dim unfolded space (explicate order) correspond to one point in the lower-dim folded space (implicate order); entities experience causes in both spaces simultaneously; movements in either space are ordinary. Spooky ‘inner motives’ = a straight line in curved (folded) space.
- To the well-informed (chess master), the next move is the obvious move / forced move. No search required. Looks magical to the uninformed. But in the conditional mental representation, it’s just least action.
- A prime directive – a meaningful direction to life.
- Life suggests directionality contrary to physics: life != entropy increase
- Natural selection suggests organisms “trying to” maximise (inclusive) fitness.
- Some movements are better than others? Otherwise, just stuff happens… No intelligent action without goals, just different attractors; no adaptation without value, just different stuff.
- Possibilities: “What persists exists” (stability, robustness)? → Order for free, self-organization → Adaptation – for survival/reproduction? → Problem solving (conflict, frustration)? → Creative? (not the same as “what persists exists”, polar opposite, prime mover)
- Agential, recursive, linguistic competence
- Agential matter: concepts DO the work of what they mean, execute their own function, have causal power. Not inert information that needs a separate “interpreter”. Linguistic competence, self-error-correction, …
- Explains biology (as we think we know it).
- Arises spontaneously from nothing! (i.e., from physics/geometry/interplay of space and time, without special design).