year: 2024/11
paper: https://arxiv.org/abs/2411.10125
website: https://claire-labo.github.io/EvoTune/
code: https://github.com/CLAIRE-Labo/EvoTune/tree/main?tab=readme-ov-file (“coming soon” :( )
connections: FunSearch, evolutionary search, RL, DPO, algorithm discovery
In a nutshell
EvoTune combines evolutionary search with RL to discover algorithms more efficiently than pure search-based approaches like FunSearch.
The core loop (a minimal sketch follows the list):
- Evolutionary search phase: LLM generates candidate programs from prompts containing previous high-scoring programs
- Programs are evaluated, scored, and stored in an island-based database
- RL training phase: every few search iterations, the LLM is fine-tuned with DPO on preference pairs constructed from the accumulated programs
- The updated LLM becomes better at generating high-scoring programs in subsequent iterations
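A minimal sketch of the alternating loop. The helper names, the database interface, and the schedule constants are my placeholders, not the authors' API:

```python
# Sketch of the EvoTune outer loop: alternate LLM-driven evolutionary search
# with periodic DPO fine-tuning on the programs accumulated so far.
# All callables and the database interface are assumptions for illustration.

def evotune_loop(generate_fn, evaluate_fn, finetune_fn, database,
                 num_iterations=1000, finetune_every=50, samples_per_prompt=4):
    """Alternate an evolutionary search phase with an RL (DPO) training phase."""
    for it in range(num_iterations):
        # --- Search phase: prompt the LLM with high-scoring programs from the database.
        prompt = database.sample_prompt()
        for program in generate_fn(prompt, n=samples_per_prompt):
            score = evaluate_fn(program)      # e.g. None if the program fails to run
            database.add(program, score)

        # --- Training phase: periodically fine-tune the LLM on preference pairs
        # built from the accumulated programs (forward KL-regularized DPO).
        if (it + 1) % finetune_every == 0:
            pairs = database.build_preference_pairs()
            generate_fn = finetune_fn(generate_fn, pairs)
    return database
```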
Treating the LLM as a trainable search operator (not just a static generator) allows it to learn patterns from successful discoveries. The method uses forward KL-regularized DPO to maintain output diversity.
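For reference, a sketch of what a forward KL-regularized DPO objective could look like, assuming the regularizer is a Monte Carlo forward-KL estimate on completions sampled from the reference model; the paper's exact formulation and coefficients may differ:

```python
# Assumed form of a forward KL-regularized DPO loss (not taken from the paper's code).
# All log-probs are summed over the tokens of each completion.

import torch
import torch.nn.functional as F

def dpo_forward_kl_loss(policy_chosen_logp: torch.Tensor, policy_rejected_logp: torch.Tensor,
                        ref_chosen_logp: torch.Tensor, ref_rejected_logp: torch.Tensor,
                        policy_logp_on_ref_samples: torch.Tensor, ref_logp_on_ref_samples: torch.Tensor,
                        beta: float = 0.1, alpha: float = 0.05) -> torch.Tensor:
    """DPO preference loss plus a forward KL(pi_ref || pi_theta) penalty."""
    # Standard DPO: implicit rewards are log-ratios against the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    dpo_loss = -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

    # Forward KL estimate: E_{y ~ pi_ref}[log pi_ref(y|x) - log pi_theta(y|x)].
    # Penalizes losing probability mass on regions the reference still covers,
    # i.e. discourages mode collapse and helps maintain output diversity.
    forward_kl = (ref_logp_on_ref_samples - policy_logp_on_ref_samples).mean()

    return dpo_loss + alpha * forward_kl
```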
Extra details
Island-based program database
Programs are clustered into separate “islands” that evolve independently. Within each island, programs are grouped by score into clusters. Sampling procedure (a code sketch follows the list):
- Select island uniformly
- Sample clusters from island using softmax over scores
- Pick shortest program from each cluster
- Construct prompt with program-score pairs + task description
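A rough sketch of this sampling procedure, assuming a simple islands → score-clusters → programs layout; the data layout, parameter names, and prompt format are my guesses:

```python
# Sketch of prompt sampling from the island-based program database.
# Each island is a dict mapping cluster score -> list of program strings.

import math
import random

def sample_prompt(islands, task_description, num_clusters=2, temperature=1.0):
    """Build a prompt from program-score pairs drawn from one island."""
    # 1. Select an island uniformly at random.
    island = random.choice(islands)

    # 2. Sample clusters with probability proportional to softmax(score / temperature)
    #    (with replacement, for simplicity).
    scores = list(island.keys())
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    chosen_scores = random.choices(scores, weights=weights, k=min(num_clusters, len(scores)))

    # 3. From each sampled cluster, pick the shortest program (a bias toward concise code).
    examples = [(min(island[s], key=len), s) for s in chosen_scores]

    # 4. Assemble the prompt: task description followed by program-score pairs
    #    (ordered by ascending score here; the exact prompt format is an assumption).
    examples.sort(key=lambda e: e[1])
    blocks = [f"# score: {score}\n{program}" for program, score in examples]
    return task_description + "\n\n" + "\n\n".join(blocks)
```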
Preference dataset construction
From the batch of outputs generated per prompt (see the code sketch after this list):
- Valid programs divided into higher/lower scoring halves
- Random pairing creates preference triplets
- Failed programs paired with any valid program
- Additional filtering: exclude pairs whose score difference falls below a dynamically adjusted threshold
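A sketch of how the preference triplets could be built from one prompt's batch; helper names are assumptions, and the threshold is passed in as a plain parameter here even though the paper adjusts it dynamically:

```python
# Sketch of preference-pair construction from one prompt's batch of outputs.
# outputs: list of (program, score) tuples, with score=None for failed programs.

import random

def build_preference_pairs(prompt, outputs, min_score_gap=0.0):
    valid = sorted([(p, s) for p, s in outputs if s is not None], key=lambda x: x[1])
    failed = [p for p, s in outputs if s is None]
    pairs = []

    # Split valid programs into lower- and higher-scoring halves and pair them randomly.
    half = len(valid) // 2
    lower, higher = valid[:half], valid[half:]
    random.shuffle(lower)
    random.shuffle(higher)
    for (rej, rej_score), (cho, cho_score) in zip(lower, higher):
        # Filtering: drop pairs whose score gap is below the threshold
        # (dynamically adjusted in the paper; a fixed parameter in this sketch).
        if cho_score - rej_score >= min_score_gap:
            pairs.append({"prompt": prompt, "chosen": cho, "rejected": rej})

    # Failed programs are always dispreferred: pair each with a random valid program.
    for bad in failed:
        if valid:
            good, _ = random.choice(valid)
            pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})

    return pairs
```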
Ideas / Thoughts / Comments
The Bitter Lesson: Combining search (exploration via evolution) with learning (pattern extraction via RL).
Neither alone is sufficient: pure search is inefficient, pure learning on fixed data limits exploration.
The island structure resembles OMNI-EPIC’s task-specific agents - each island maintains its own evolutionary trajectory while the shared LLM accumulates meta-knowledge across all islands.