year: 2024/12
paper: https://arxiv.org/abs/2412.17799
website: https://pub.sakana.ai/asal/ | Yt: “Automating the search for artificial life with foundation models” by Akarsh Kumar
code: https://github.com/SakanaAI/asal
connections: sakana AI, IDSIA, ALIFE, open-ended, The Platonic Representation Hypothesis, CA, NCA, computational irreducibility, cma-es, Akarsh Kumar, Louis Kirsch, Kenneth O. Stanley, david ha
Motivation
We need to automate simulation design (because hand-crafting the periodic table with all pairwise interactions such that you get life after running it long enough is infeasible).
This requires some way to define what’s interesting / novel.
LLM/FM feature spaces increasingly align with human representations (?) as you scale them (cf. The Platonic Representation Hypothesis).
Many naive things become possible in/with CLIP’s/LLM representation space, such as
- specifying goal states in natural language
- stopping simulation/training once change in latent space plateaus (magnitude of CLIP-vector change; see the sketch after this list)
- (heuristic metrics that fall short in other cases)
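A minimal sketch of the plateau check, assuming a hypothetical `clip_embed` helper that returns unit-norm CLIP image embeddings (not from the paper/repo):

```python
import numpy as np

def plateaued(frames, clip_embed, window=10, tol=1e-3):
    """Stop once the CLIP embedding of the rendered frames stops moving.

    frames: recent rendered images, most recent last
    clip_embed: image -> unit-norm CLIP vector (assumed helper)
    """
    if len(frames) < window + 1:
        return False
    embs = np.stack([clip_embed(f) for f in frames[-(window + 1):]])
    deltas = np.linalg.norm(np.diff(embs, axis=0), axis=1)  # step-to-step movement
    return float(deltas.mean()) < tol
```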
![[automating-the-search-for-artificial-life-with-foundation-models-img-0.jpeg]]
![[automating-the-search-for-artificial-life-with-foundation-models-img-1.jpeg]]
![[automating-the-search-for-artificial-life-with-foundation-models-img-4.jpeg]]
CLIP encodes image(s) & the text goal-prompt; cosine similarity is the fitness/objective.
They evolve interaction rules, e.g. parametrized through neural nets.
The boids use weight-sharing: simple rules + local interaction → complex behavior
- Init rule (sample the initial state)
- Step rule (advance the simulation one step)
- Render rule (turn the state into an image)
→ the rendered frames are encoded by the FM and scored against the objective
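Roughly, a substrate is just these three parametrized functions. A minimal interface sketch (names are mine; the actual repo is JAX-based and structured differently):

```python
import numpy as np

class Substrate:
    """Minimal substrate interface; all three rules are driven by parameters theta."""

    def init(self, theta, rng):
        """Sample an initial simulation state s_0."""
        raise NotImplementedError

    def step(self, theta, state):
        """Advance the simulation one step: s_{t+1} = Step_theta(s_t)."""
        raise NotImplementedError

    def render(self, theta, state):
        """Render the state into an RGB image the foundation model can encode."""
        raise NotImplementedError

def rollout(sub, theta, steps, rng):
    """Run a simulation for `steps` steps and return the rendered frames."""
    state = sub.init(theta, rng)
    frames = []
    for _ in range(steps):
        state = sub.step(theta, state)
        frames.append(sub.render(theta, state))
    return frames
```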
Objectives
They optimize CLIP-space similarity/diversity, depending on the search mode.
A simulation $S_\theta$ samples an initial state $s_0 \sim \mathrm{Init}_\theta$, runs $s_t = \mathrm{Step}_\theta(s_{t-1})$ for $T$ steps, and renders an image $\mathrm{Render}_\theta(s_t)$; $E_{\mathrm{img}}$ / $E_{\mathrm{txt}}$ are the foundation-model encoders and $\mathrm{sim}(\cdot,\cdot)$ is the cosine similarity in that space.
Supervised Target (find a specific phenomenon): choose simulation parameters $\theta$ to maximize the alignment between the FM image embedding of the rendered simulation frame and the text-prompt embedding at the prompted timesteps $t \in T_{\mathrm{prompt}}$:
maximize $\sum_{t \in T_{\mathrm{prompt}}} \mathrm{sim}\big(E_{\mathrm{img}}(\mathrm{Render}_\theta(s_t)),\ E_{\mathrm{txt}}(\mathrm{prompt}_t)\big)$
Optimized with sep-CMA-ES for Lenia/Boids/Particle Life, and with Adam/BPTT for temporal sequences in NCA.
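A sketch of the supervised-target objective plus a plain (mu, lambda) evolution strategy as a stand-in for sep-CMA-ES, reusing the hypothetical `rollout` helper above; `fm_img` / `z_prompt` are assumed to be unit-norm CLIP encodings (all names are mine, not the repo's):

```python
import numpy as np

def supervised_target_fitness(theta, sub, fm_img, z_prompt, prompt_steps, rng):
    """Mean cosine similarity between FM embeddings of the rendered frames at the
    prompted timesteps and the goal-prompt embedding (higher is better)."""
    frames = rollout(sub, theta, steps=max(prompt_steps) + 1, rng=rng)
    return float(np.mean([fm_img(frames[t]) @ z_prompt for t in prompt_steps]))

def evolve(sub, fm_img, z_prompt, prompt_steps, dim, gens=100, pop=32, sigma=0.1, seed=0):
    """Simple (mu, lambda) ES over theta; the paper uses sep-CMA-ES instead."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    for _ in range(gens):
        cand = mean + sigma * rng.standard_normal((pop, dim))
        fit = np.array([supervised_target_fitness(c, sub, fm_img, z_prompt,
                                                  prompt_steps, rng) for c in cand])
        elites = cand[np.argsort(-fit)[:pop // 4]]   # keep the top quarter
        mean = elites.mean(axis=0)                   # shift the search distribution
    return mean
```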
Open-Endedness (keep producing new stuff): choose $\theta$ to minimize each frame’s similarity to the nearest past frame from the same run, i.e. maximize historical novelty in the FM space:
minimize $\sum_{t} \max_{t' < t} \mathrm{sim}\big(E_{\mathrm{img}}(\mathrm{Render}_\theta(s_t)),\ E_{\mathrm{img}}(\mathrm{Render}_\theta(s_{t'}))\big)$
Optimized with brute-force over Life-like CA rules; they don’t run this on Lenia/Boids/NCA
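A sketch of the historical-novelty score being minimized, again assuming a hypothetical `fm_img` that returns unit-norm image embeddings:

```python
import numpy as np

def open_endedness_loss(frames, fm_img, eval_steps):
    """For each evaluated frame, cosine similarity to its most similar *earlier*
    frame in the same run; minimizing this keeps the run producing new stuff."""
    embs = np.stack([fm_img(f) for f in frames])      # unit-norm embeddings
    total = 0.0
    for t in eval_steps:
        if t == 0:
            continue
        total += float((embs[:t] @ embs[t]).max())    # nearest past frame
    return total
```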
Illumination (cover diverse behaviors): choose a set $\{\theta_1, \dots, \theta_N\}$ to minimize each simulation’s similarity to its nearest neighbor in the set, i.e. maximize diversity in FM space:
minimize $\sum_{i} \max_{j \neq i} \mathrm{sim}(z_i, z_j)$, where $z_i = E_{\mathrm{img}}(\mathrm{Render}_{\theta_i}(s_T))$
Optimized with a custom genetic algorithm on Lenia and Boids to maximize nearest-neighbor diversity.
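A sketch of the illumination objective over a population of parameter vectors, reusing the hypothetical `rollout` / `fm_img` helpers from above:

```python
import numpy as np

def illumination_loss(thetas, sub, fm_img, steps, rng):
    """Sum over the set of each simulation's similarity to its nearest neighbour in
    FM space; minimizing this spreads the set out and covers diverse behaviors."""
    embs = np.stack([fm_img(rollout(sub, th, steps, rng)[-1]) for th in thetas])
    sims = embs @ embs.T                              # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)                   # ignore self-similarity
    return float(sims.max(axis=1).sum())              # nearest-neighbour similarity
```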
What’s the current limiting factor for this kind of evolution reaching the complexity we find in nature?
Bottlenecks:
- Expressivity of the substrates (the simulations being searched over by current research…)
- Search algorithms (need better search algorithms than cma-es or gradient descent)
- Scale … we’ve invested billions into training LLMs, but many orders of magnitude less in ALIFE simulations.
Open-ended curriculum learning for agentic AI:
Compare an LLM summary of the agent’s current knowledge / interaction history / capabilities against the goal-prompt.
Goal prompt can (needs to) also be meta, with an LLM automatically increasing difficulty. Memorization → Generalization, as per GPICL
EDIT: Simply plug this into an OMNI-EPIC-style system; that lets you merge the objective setup above into natural-language interestingness/success judges. Or rather: OMNI-X’s outer loop covers open-endedness and illumination, whereas during the rollout step you use the similarity metric to train/evolve networks.
EDIT2: Open-Ended Evolution of Artificial Life using Video-Language Models did this! (sorta / partially; iiuc, they don’t really have an archive of prompts that they do the prompt evolution against?)
→ Keeping it confined to VLMs inherently limits the simulation space.
EDIT3: Toward Artificial Open-Ended Evolution within Lenia using Quality-Diversity also sorta did this, but again not via language.
You can quantitatively analyse ALIFE stuff:
linearly interpolating boid network parameters and comparing image similarity (sketch below)
“more is different” in Particle Life
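A sketch of the interpolation analysis, assuming the same hypothetical `rollout` / `fm_img` helpers as above; it sweeps a line in parameter space and reports FM-space similarity of the resulting final frame to both endpoints:

```python
import numpy as np

def interpolation_similarity(theta_a, theta_b, sub, fm_img, steps, rng, n=11):
    """Linearly interpolate between two parameter vectors (e.g. boid networks) and
    measure how the final rendered frame relates to the two endpoints in FM space."""
    z_a = fm_img(rollout(sub, theta_a, steps, rng)[-1])
    z_b = fm_img(rollout(sub, theta_b, steps, rng)[-1])
    results = []
    for alpha in np.linspace(0.0, 1.0, n):
        theta = (1 - alpha) * theta_a + alpha * theta_b   # parameter-space interpolation
        z = fm_img(rollout(sub, theta, steps, rng)[-1])
        results.append((float(alpha), float(z @ z_a), float(z @ z_b)))
    return results  # (alpha, sim to endpoint A, sim to endpoint B)
```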
The DINO vision-model series is a good successor to CLIP.
Auto-PicBreeder