year: 2024/03
paper: https://arxiv.org/pdf/2403.13187 | Evolutionary Optimization of Model Merging Recipes
website: https://sakana.ai/evolutionary-model-merge/
code: https://github.com/SakanaAI/evolutionary-model-merge
connections: model merging, sakana AI, cma-es, neural architecture search, PMA
Two-space merging approach
Parameter Space (PS): Optimizes layer-wise merging coefficients using TIES-Merging with DARE. CMA-ES evolves the sparsification densities and weight-mixing coefficients for each layer independently (a merge sketch follows this list).
Data Flow Space (DFS): Evolves the inference path through layers drawn from different models. Tokens traverse non-sequential paths (e.g., layer i in model A → layer j in model B). An evolved indicator array I determines which layers are included, and a scaling matrix W rescales hidden states to handle distribution shifts between models (sketched after the results below).
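A minimal sketch of the PS merge step, assuming PyTorch and simplified to DARE's drop-and-rescale plus weighted task-vector addition (the full recipe also applies TIES-Merging's sign consensus). The names `dare_sparsify` and `merge_layer` are illustrative; `densities` and `weights` stand in for the per-layer parameters CMA-ES evolves:

```python
import torch

def dare_sparsify(task_vector: torch.Tensor, density: float) -> torch.Tensor:
    """DARE: randomly drop a (1 - density) fraction of the task vector's
    entries, then rescale survivors by 1/density to preserve its
    expected contribution."""
    mask = torch.bernoulli(torch.full_like(task_vector, density))
    return task_vector * mask / density

def merge_layer(base: torch.Tensor, experts: list[torch.Tensor],
                densities: list[float], weights: list[float]) -> torch.Tensor:
    """Merge one layer's weight tensor: each expert contributes a
    sparsified, scaled task vector (expert - base) added onto the base.
    `densities` and `weights` are the per-layer values CMA-ES evolves."""
    merged = base.clone()
    for expert, density, weight in zip(experts, densities, weights):
        merged = merged + weight * dare_sparsify(expert - base, density)
    return merged

# e.g. merge_layer(base_w, [math_w, ja_w], densities=[0.5, 0.5], weights=[0.6, 0.4])
```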
Results that surprised the community
Their 7B Japanese Math LLM outperformed some 70B Japanese models on Japanese benchmarks despite being 10× smaller. The merged model achieved 52.0% on MGSM-JA (Japanese grade-school math) vs 30.0% for the best source model.
Evolution discovered that certain layers were actively harmful: removing layer 30 from MetaMath-13B improved performance by 2%. This “subtraction” capability is unique to DFS merging.
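A sketch of how a DFS forward pass might decode the evolved configuration; `slots`, `indicator`, and `scales` are illustrative names, and a per-step scalar stands in for the paper's scaling matrix W. Switching a slot's indicator to 0 is exactly the layer-removal ("subtraction") case above:

```python
import torch
from typing import Callable

def dfs_forward(hidden: torch.Tensor,
                slots: list[tuple[int, Callable]],
                indicator: list[int],
                scales: list[float]) -> torch.Tensor:
    """Walk the evolved inference path. `slots` is the repeated layer
    sequence of length T = total layers x repetitions, as
    (model_id, layer) pairs; model_id only records which source model a
    layer came from. indicator[t] in {0, 1} decides whether slot t is
    used at all, so evolution can remove a harmful layer entirely.
    scales[t] rescales the hidden state to absorb distribution shift
    when crossing between models."""
    for t, (model_id, layer) in enumerate(slots):
        if indicator[t] == 0:   # slot switched off: the "subtraction" case
            continue
        hidden = layer(scales[t] * hidden)
    return hidden
```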
Why evolution works for merging
Unlike traditional NAS, which must train each candidate architecture, model merging evaluates every candidate immediately, with no gradient training. Evolution can therefore explore a vast combinatorial space cheaply:
- PS: optimizes a continuous space of 2N parameters per layer for N source models (a sparsification density and a mixing weight per model)
- DFS: searches a discrete space of 2^T layer-inclusion patterns, where T = total layers across models × number of repetitions
CMA-ES handles the continuous optimization in PS while indicator arrays enable discrete search in DFS.
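A minimal CMA-ES loop over the PS parameters using the `cma` package's ask/tell interface. `evaluate_merge`, `N_MODELS`, and `N_LAYERS` are hypothetical; the paper scores candidates on the target benchmark (e.g. MGSM-JA accuracy), replaced here by a placeholder objective so the sketch runs:

```python
import numpy as np
import cma

N_MODELS, N_LAYERS = 3, 32
DIM = 2 * N_MODELS * N_LAYERS  # one density + one weight per model, per layer

def evaluate_merge(x: np.ndarray) -> float:
    """Hypothetical fitness: decode x into per-layer (density, weight)
    pairs, rebuild the merged model, and score it on the target task.
    A placeholder objective keeps the sketch self-contained and runnable."""
    densities, weights = x[:DIM // 2], x[DIM // 2:]
    return float(-np.sum((densities - 0.7) ** 2) - np.sum((weights - 0.3) ** 2))

es = cma.CMAEvolutionStrategy(DIM * [0.5], 0.2)  # start mid-range, step size 0.2
while not es.stop():
    candidates = es.ask()
    # CMA-ES minimizes, so negate the (higher-is-better) fitness.
    es.tell(candidates, [-evaluate_merge(np.clip(x, 0.0, 1.0)) for x in candidates])
print(es.result.xbest)  # best merging recipe found
```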