year: 2024/03
paper: https://arxiv.org/pdf/2403.13187 | Evolutionary Optimization of Model Merging Recipes
website: https://sakana.ai/evolutionary-model-merge/
code: https://github.com/SakanaAI/evolutionary-model-merge
connections: model merging, sakana AI, cma-es, neural architecture search, PMA
Two-space merging approach
Parameter Space (PS): Optimizes layer-wise merging coefficients using TIES-Merging with DARE. CMA-ES evolves the sparsification densities and weight-mixing coefficients for each layer independently (a merge sketch follows this list).
Data Flow Space (DFS): Evolves the inference path through layers drawn from different models. Tokens traverse non-sequential paths (e.g., layer i in model A → layer j in model B). An evolved indicator array I determines which layers are included, and a scaling matrix W rescales hidden states to handle distribution shifts between models (sketched after the results below).
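A minimal sketch of the PS merge step, assuming PyTorch and simplified to DARE's drop-and-rescale plus weighted task-vector addition (the full recipe also applies TIES-Merging's sign consensus). The names `dare_sparsify` and `merge_layer` are illustrative; `densities` and `weights` stand in for the per-layer parameters CMA-ES evolves:

```python
import torch

def dare_sparsify(task_vector: torch.Tensor, density: float) -> torch.Tensor:
    """DARE: randomly drop a (1 - density) fraction of the task vector's
    entries, then rescale survivors by 1/density to preserve its
    expected contribution."""
    mask = torch.bernoulli(torch.full_like(task_vector, density))
    return task_vector * mask / density

def merge_layer(base: torch.Tensor, experts: list[torch.Tensor],
                densities: list[float], weights: list[float]) -> torch.Tensor:
    """Merge one layer's weight tensor: each expert contributes a
    sparsified, scaled task vector (expert - base) added onto the base.
    `densities` and `weights` are the per-layer values CMA-ES evolves."""
    merged = base.clone()
    for expert, density, weight in zip(experts, densities, weights):
        merged = merged + weight * dare_sparsify(expert - base, density)
    return merged

# e.g. merge_layer(base_w, [math_w, ja_w], densities=[0.5, 0.5], weights=[0.6, 0.4])
```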
Results that surprised the community
Their 7B Japanese Math LLM outperformed some 70B Japanese models on Japanese benchmarks despite being 10× smaller. The merged model achieved 52.0% on MGSM-JA (Japanese grade-school math) vs 30.0% for the best source model.
Evolution discovered that certain layers were actively harmful: removing layer 30 from MetaMath-13B improved performance by 2%. This “subtraction” capability is unique to DFS merging.
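A sketch of how a DFS forward pass might decode the evolved configuration; `slots`, `indicator`, and `scales` are illustrative names, and a per-step scalar stands in for the paper's scaling matrix W. Switching a slot's indicator to 0 is exactly the layer-removal ("subtraction") case above:

```python
import torch
from typing import Callable

def dfs_forward(hidden: torch.Tensor,
                slots: list[tuple[int, Callable]],
                indicator: list[int],
                scales: list[float]) -> torch.Tensor:
    """Walk the evolved inference path. `slots` is the repeated layer
    sequence of length T = total layers x repetitions, as
    (model_id, layer) pairs; model_id only records which source model a
    layer came from. indicator[t] in {0, 1} decides whether slot t is
    used at all, so evolution can remove a harmful layer entirely.
    scales[t] rescales the hidden state to absorb distribution shift
    when crossing between models."""
    for t, (model_id, layer) in enumerate(slots):
        if indicator[t] == 0:   # slot switched off: the "subtraction" case
            continue
        hidden = layer(scales[t] * hidden)
    return hidden
```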
Why evolution works for merging
Unlike traditional NAS, which must train each candidate architecture, model merging evaluates every candidate immediately, with no gradient training. Evolution can therefore explore a vast combinatorial space cheaply:
- PS: optimizes a continuous space of 2N parameters per layer for N source models (a sparsification density and a mixing weight per model)
- DFS: searches a discrete space of 2^T layer-inclusion patterns, where T = total layers across models × number of repetitions
CMA-ES handles the continuous optimization in PS while indicator arrays enable discrete search in DFS.
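A minimal CMA-ES loop over the PS parameters using the `cma` package's ask/tell interface. `evaluate_merge`, `N_MODELS`, and `N_LAYERS` are hypothetical; the paper scores candidates on the target benchmark (e.g. MGSM-JA accuracy), replaced here by a placeholder objective so the sketch runs:

```python
import numpy as np
import cma

N_MODELS, N_LAYERS = 3, 32
DIM = 2 * N_MODELS * N_LAYERS  # one density + one weight per model, per layer

def evaluate_merge(x: np.ndarray) -> float:
    """Hypothetical fitness: decode x into per-layer (density, weight)
    pairs, rebuild the merged model, and score it on the target task.
    A placeholder objective keeps the sketch self-contained and runnable."""
    densities, weights = x[:DIM // 2], x[DIM // 2:]
    return float(-np.sum((densities - 0.7) ** 2) - np.sum((weights - 0.3) ** 2))

es = cma.CMAEvolutionStrategy(DIM * [0.5], 0.2)  # start mid-range, step size 0.2
while not es.stop():
    candidates = es.ask()
    # CMA-ES minimizes, so negate the (higher-is-better) fitness.
    es.tell(candidates, [-evaluate_merge(np.clip(x, 0.0, 1.0)) for x in candidates])
print(es.result.xbest)  # best merging recipe found
```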