year: 2024/11
paper: https://arxiv.org/abs/2411.10125
website: https://claire-labo.github.io/EvoTune/
code: https://github.com/CLAIRE-Labo/EvoTune/tree/main?tab=readme-ov-file (“coming soon” :( )
connections: FunSearch, evolutionary search, RL, DPO, algorithm discovery


In a nutshell

EvoTune combines evolutionary search with RL to discover algorithms more efficiently than pure search-based approaches like FunSearch.

The core loop (a code sketch follows the list):

  1. Evolutionary search phase: LLM generates candidate programs from prompts containing previous high-scoring programs
  2. Programs are evaluated, scored, and stored in an island-based database
  3. RL training phase: Periodically (every few search iterations), the LLM is fine-tuned using DPO on preference pairs constructed from the accumulated programs
  4. The updated LLM becomes better at generating high-scoring programs in subsequent iterations

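A minimal sketch of that loop, assuming nothing beyond the notes above: the helper names (`sample_programs`, `evaluate`, `build_dpo_pairs`, `dpo_finetune`) are hypothetical placeholders and the LLM/evaluator are stubbed out, so this shows the control flow rather than the authors' implementation.

```python
import random

def sample_programs(llm, parent_programs, n=4):
    # Stub: the real system prompts the LLM with previous high-scoring programs.
    return [f"program_derived_from({p})" for p in parent_programs[:n]]

def evaluate(program):
    # Stub: the real evaluator scores a candidate program on the target task.
    return random.random()

def build_dpo_pairs(database):
    # Pair higher-scoring programs (chosen) with lower-scoring ones (rejected).
    pairs = []
    for island in database:
        ranked = sorted(island, key=lambda x: x[1], reverse=True)
        for hi, lo in zip(ranked, ranked[1:]):
            pairs.append({"chosen": hi[0], "rejected": lo[0]})
    return pairs

def dpo_finetune(llm, pairs):
    # Stub: the real phase runs (forward KL-regularized) DPO updates on the LLM.
    return llm

def evotune(llm, num_islands=4, iterations=20, train_every=5):
    # Island-based database: each island keeps its own (program, score) history.
    database = [[("seed_program", 0.0)] for _ in range(num_islands)]
    for it in range(1, iterations + 1):
        # 1) Evolutionary search phase: generate candidates per island.
        for island in database:
            parents = [p for p, _ in sorted(island, key=lambda x: x[1], reverse=True)]
            for prog in sample_programs(llm, parents):
                # 2) Evaluate, score, and store each candidate.
                island.append((prog, evaluate(prog)))
        # 3) RL training phase: periodically fine-tune the LLM with DPO,
        #    so that 4) later sampling comes from an improved generator.
        if it % train_every == 0:
            llm = dpo_finetune(llm, build_dpo_pairs(database))
    return llm, database
```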
Treating the LLM as a trainable search operator (not just a static generator) allows it to learn patterns from successful discoveries. The method uses forward KL-regularized DPO to maintain output diversity.
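On the forward KL-regularized DPO: the exact objective isn't spelled out in these notes, but a plausible form (my assumption, not a formula taken from the paper) is the standard DPO loss plus a forward-KL penalty anchored to the reference policy:

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right] + \lambda\, \mathrm{KL}\!\left(\pi_{\text{ref}} \,\|\, \pi_\theta\right)
$$

The forward direction (reference distribution first) is mass-covering: the fine-tuned policy is penalized for assigning near-zero probability to outputs the reference model still produces, which is why it helps preserve generation diversity.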

Extra details

refactor the slop

Ideas / Thoughts / Comments

The Bitter Lesson: Combining search (exploration via evolution) with learning (pattern extraction via RL).
Neither alone is sufficient: pure search is sample-inefficient, and pure learning on fixed data limits exploration.

The island structure resembles OMNI-EPIC’s task-specific agents - each island maintains its own evolutionary trajectory while the shared LLM accumulates meta-knowledge across all islands.