year: 2024/11
paper: https://arxiv.org/abs/2411.10125
website: https://claire-labo.github.io/EvoTune/
code: https://github.com/CLAIRE-Labo/EvoTune/tree/main?tab=readme-ov-file (“coming soon” :( )
connections: FunSearch, evolutionary search, RL, DPO, algorithm discovery
In a nutshell
EvoTune combines evolutionary search with RL to discover algorithms more efficiently than pure search-based approaches like FunSearch.
The core loop (a minimal sketch follows the list):
- Evolutionary search phase: LLM generates candidate programs from prompts containing previous high-scoring programs
- Programs are evaluated, scored, and stored in an island-based database
- RL training phase: every few search iterations, the LLM is fine-tuned with DPO on preference pairs constructed from the accumulated programs
- The updated LLM becomes better at generating high-scoring programs in subsequent iterations
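A minimal sketch of the alternating loop. The helper names, the database interface, and the schedule constants are my placeholders, not the authors' API:

```python
# Sketch of the EvoTune outer loop: alternate LLM-driven evolutionary search
# with periodic DPO fine-tuning on the programs accumulated so far.
# All callables and the database interface are assumptions for illustration.

def evotune_loop(generate_fn, evaluate_fn, finetune_fn, database,
                 num_iterations=1000, finetune_every=50, samples_per_prompt=4):
    """Alternate an evolutionary search phase with an RL (DPO) training phase."""
    for it in range(num_iterations):
        # --- Search phase: prompt the LLM with high-scoring programs from the database.
        prompt = database.sample_prompt()
        for program in generate_fn(prompt, n=samples_per_prompt):
            score = evaluate_fn(program)      # e.g. None if the program fails to run
            database.add(program, score)

        # --- Training phase: periodically fine-tune the LLM on preference pairs
        # built from the accumulated programs (forward KL-regularized DPO).
        if (it + 1) % finetune_every == 0:
            pairs = database.build_preference_pairs()
            generate_fn = finetune_fn(generate_fn, pairs)
    return database
```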
Treating the LLM as a trainable search operator (not just a static generator) allows it to learn patterns from successful discoveries. The method uses forward KL-regularized DPO to maintain output diversity.
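For reference, a sketch of what a forward KL-regularized DPO objective could look like, assuming the regularizer is a Monte Carlo forward-KL estimate on completions sampled from the reference model; the paper's exact formulation and coefficients may differ:

```python
# Assumed form of a forward KL-regularized DPO loss (not taken from the paper's code).
# All log-probs are summed over the tokens of each completion.

import torch
import torch.nn.functional as F

def dpo_forward_kl_loss(policy_chosen_logp: torch.Tensor, policy_rejected_logp: torch.Tensor,
                        ref_chosen_logp: torch.Tensor, ref_rejected_logp: torch.Tensor,
                        policy_logp_on_ref_samples: torch.Tensor, ref_logp_on_ref_samples: torch.Tensor,
                        beta: float = 0.1, alpha: float = 0.05) -> torch.Tensor:
    """DPO preference loss plus a forward KL(pi_ref || pi_theta) penalty."""
    # Standard DPO: implicit rewards are log-ratios against the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    dpo_loss = -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

    # Forward KL estimate: E_{y ~ pi_ref}[log pi_ref(y|x) - log pi_theta(y|x)].
    # Penalizes losing probability mass on regions the reference still covers,
    # i.e. discourages mode collapse and helps maintain output diversity.
    forward_kl = (ref_logp_on_ref_samples - policy_logp_on_ref_samples).mean()

    return dpo_loss + alpha * forward_kl
```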
Extra details
Island-based program database
Programs are clustered into separate “islands” that evolve independently. Within each island, programs are grouped by score into clusters. Sampling procedure (a code sketch follows the list):
- Select island uniformly
- Sample clusters from island using softmax over scores
- Pick shortest program from each cluster
- Construct prompt with program-score pairs + task description
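A rough sketch of this sampling procedure, assuming a simple islands → score-clusters → programs layout; the data layout, parameter names, and prompt format are my guesses:

```python
# Sketch of prompt sampling from the island-based program database.
# Each island is a dict mapping cluster score -> list of program strings.

import math
import random

def sample_prompt(islands, task_description, num_clusters=2, temperature=1.0):
    """Build a prompt from program-score pairs drawn from one island."""
    # 1. Select an island uniformly at random.
    island = random.choice(islands)

    # 2. Sample clusters with probability proportional to softmax(score / temperature)
    #    (with replacement, for simplicity).
    scores = list(island.keys())
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    chosen_scores = random.choices(scores, weights=weights, k=min(num_clusters, len(scores)))

    # 3. From each sampled cluster, pick the shortest program (a bias toward concise code).
    examples = [(min(island[s], key=len), s) for s in chosen_scores]

    # 4. Assemble the prompt: task description followed by program-score pairs
    #    (ordered by ascending score here; the exact prompt format is an assumption).
    examples.sort(key=lambda e: e[1])
    blocks = [f"# score: {score}\n{program}" for program, score in examples]
    return task_description + "\n\n" + "\n\n".join(blocks)
```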
Preference dataset construction
From the batch of outputs generated per prompt (see the code sketch after this list):
- Valid programs divided into higher/lower scoring halves
- Random pairing creates preference triplets
- Failed programs paired with any valid program
- Additional filtering: exclude pairs whose score difference falls below a dynamically adjusted threshold
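A sketch of how the preference triplets could be built from one prompt's batch; helper names are assumptions, and the threshold is passed in as a plain parameter here even though the paper adjusts it dynamically:

```python
# Sketch of preference-pair construction from one prompt's batch of outputs.
# outputs: list of (program, score) tuples, with score=None for failed programs.

import random

def build_preference_pairs(prompt, outputs, min_score_gap=0.0):
    valid = sorted([(p, s) for p, s in outputs if s is not None], key=lambda x: x[1])
    failed = [p for p, s in outputs if s is None]
    pairs = []

    # Split valid programs into lower- and higher-scoring halves and pair them randomly.
    half = len(valid) // 2
    lower, higher = valid[:half], valid[half:]
    random.shuffle(lower)
    random.shuffle(higher)
    for (rej, rej_score), (cho, cho_score) in zip(lower, higher):
        # Filtering: drop pairs whose score gap is below the threshold
        # (dynamically adjusted in the paper; a fixed parameter in this sketch).
        if cho_score - rej_score >= min_score_gap:
            pairs.append({"prompt": prompt, "chosen": cho, "rejected": rej})

    # Failed programs are always dispreferred: pair each with a random valid program.
    for bad in failed:
        if valid:
            good, _ = random.choice(valid)
            pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})

    return pairs
```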
Ideas / Thoughts / Comments
The Bitter Lesson: Combining search (exploration via evolution) with learning (pattern extraction via RL).
Neither alone is sufficient: pure search is inefficient, pure learning on fixed data limits exploration.
The island structure resembles OMNI-EPIC’s task-specific agents - each island maintains its own evolutionary trajectory while the shared LLM accumulates meta-knowledge across all islands.