year: 2026/03
paper: https://arxiv.org/abs/2603.19461 | hyperagents
website:
code: https://github.com/facebookresearch/Hyperagents
connections: Gödel Machine, Darwin Gödel Machine, self-referential, coding agent, Jenny Zhang, Jeff Clune, meta learning, open-endedness
https://x.com/jennyzhangzt/status/2036099940456206759?s=20
TLDR
Extends the DGM. DGM’s self-improvement only worked because coding is a special case: the task (solve coding problems) and the self-modification skill (edit your own code) are the same. Better coder = better self-modifier. This alignment doesn’t hold for other domains (a better poetry agent isn’t better at editing its own code).
A hyperagent is a single editable program containing both a task agent (solves the task) and a meta agent (proposes modifications). Both are part of the same codebase, both are editable. The meta agent can modify the task agent’s code, but also its own code.
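A minimal sketch of this structure, assuming nothing about the paper's actual code: `Hyperagent`, `solve`, and `self_modify` are my illustrative names, and the "code" is just strings standing in for real source files. The point is that both the task agent and the meta agent live in one editable object, and a self-modification step can rewrite either one, including the meta agent itself.

```python
from dataclasses import dataclass

@dataclass
class Hyperagent:
    # One codebase, two editable parts (names are illustrative, not the paper's API):
    task_agent_src: str   # code that solves the domain task
    meta_agent_src: str   # code that proposes edits to the codebase

    def solve(self, task: str) -> str:
        # A real system would execute task_agent_src; here we just tag the output.
        return f"[task-agent output for: {task}]"

    def self_modify(self, eval_log: str) -> "Hyperagent":
        # The meta agent may rewrite EITHER part, including its own code.
        new_task = self.task_agent_src + f"\n# patch from log: {eval_log}"
        new_meta = self.meta_agent_src + "\n# meta agent may also patch itself"
        return Hyperagent(new_task, new_meta)
```

Self-modification here returns a new hyperagent rather than mutating in place, which matches the archive-of-variants picture from DGM-style systems.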
In DGM, the mechanism that decides what to edit (the instruction generator: it reads eval logs and proposes changes) was hardcoded and never modified. Hyperagents make this mechanism explicit and domain-general by putting the instruction generator inside the editable codebase. The paper calls this metacognitive self-modification.
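The outer loop this implies can be sketched as follows. This is a hedged toy version under my own assumptions: hyperagents are dicts of callables, `evaluate` returns a (score, eval_log) pair, and, crucially, `instruction_generator` and `meta_agent` are looked up *inside* the sampled hyperagent rather than hardcoded in the loop, so later edits can rewrite them too.

```python
import random

def self_improve(archive, evaluate, iterations=50):
    """archive: list of hyperagents (dicts of callables/state).
    evaluate: hyperagent -> (score, eval_log)."""
    for _ in range(iterations):
        parent = random.choice(archive)              # open-ended exploration over the archive
        _, log = evaluate(parent)
        # Metacognitive step: the instruction generator lives inside the
        # editable codebase, so it is itself subject to modification.
        instruction = parent["instruction_generator"](log)
        child = parent["meta_agent"](parent, instruction)
        score, _ = evaluate(child)
        if score is not None:                        # keep any valid child, DGM-style
            archive.append(child)
    return archive
```

Keeping every valid child in a growing archive (rather than hill-climbing on the single best agent) is the open-ended-exploration part the ablations show is necessary.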
Results
Works beyond coding: paper review (0 → 0.71), robotics reward design (0.06 → 0.37), Olympiad math grading. Ablations confirm both self-improvement and open-ended exploration remain necessary (same as DGM).
DGM (original, hardcoded for coding) scores ~0 on non-coding domains. DGM-custom (manually adapted per domain) does better but requires human engineering for each domain. DGM-H (the hyperagent variant) matches or beats DGM-custom without any domain-specific engineering.
Meta-level transfer
Meta-improvements transfer across domains: hyperagents evolved on paper review + robotics, transferred to math grading (a never-seen domain), generate much better task agents than starting from scratch (imp@50 of 0.63 vs ~0 for DGM transfer agents). DGM's transfer agents show no improvement because their gains were domain-specific customizations, not general meta-skills.
Improvements compound across runs: starting a new DGM-H run from a transferred hyperagent reaches higher performance than starting fresh.
What it discovered at the meta-level
Performance tracking: logging metrics across iterations, computing improvement trends, identifying which generations worked and which over-corrected.
Persistent memory: storing causal hypotheses, recording what worked and why, consulting this memory during future modifications.
These are general-purpose meta-skills, not domain-specific tricks. That’s why they transfer.
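The two discovered meta-skills above can be sketched as a small data structure. This is my toy rendering, not the paper's code: the field names and the `MetaMemory` class are invented, but the shape (a metrics trace for performance tracking, plus a persistent store of causal hypotheses consulted before the next modification) follows the description.

```python
class MetaMemory:
    """Toy sketch of the discovered meta-skills (names are illustrative)."""

    def __init__(self):
        self.scores = []      # performance tracking: metric per iteration
        self.hypotheses = []  # persistent memory: what changed, its effect, and why

    def record(self, score, change, why):
        # Log the metric and attach a causal hypothesis to the change that produced it.
        delta = score - self.scores[-1] if self.scores else 0.0
        self.scores.append(score)
        self.hypotheses.append({"change": change, "delta": delta, "why": why})

    def trend(self, window=3):
        # Improvement trend over recent iterations (flags over-corrections when negative).
        recent = self.scores[-window:]
        return recent[-1] - recent[0] if len(recent) > 1 else 0.0

    def what_worked(self):
        # Consulted before proposing the next modification.
        return [h for h in self.hypotheses if h["delta"] > 0]
```

Because nothing in this structure mentions the task domain, the same memory transfers unchanged from paper review or robotics to math grading, which is the point of the transfer results above.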