year: 2025/09
paper: evolution-strategies-at-scale-llm-fine-tuning-beyond-reinforcement-learning
website:
code:
connections: evolution strategies, LLM, fine-tuning, cognizant


Check out the paper in detail.

We also released three intriguing follow-up works in this new direction:

(1) Quantized Evolution Strategies (QES) extends ES to post-training of quantized LLMs. With memory usage as frugal as low-precision inference, QES achieves a high-precision optimization trajectory in the quantized parameter space. (arXiv: https://arxiv.org/abs/2602.03120, alphaXiv: https://alphaxiv.org/abs/2602.03120)

(2) The “Blessing of Dimensionality” paper tries to explain why ES needs a population size of only ~30 to fine-tune billions of parameters. It finds that larger models may have lower intrinsic dimensionality, which makes parameter-space search in ES easier. (arXiv: https://arxiv.org/abs/2602.00170, alphaXiv: https://alphaxiv.org/abs/2602.00170)

(3) Evolution Strategy for Metacognitive Alignment (ESMA) uses ES to fine-tune LLMs to know what they know: the fine-tuning objective is the alignment between whether the LLM answers a question correctly and whether the LLM believes it can answer that question correctly, strengthening the metacognitive alignment of LLMs. (arXiv: https://arxiv.org/abs/2602.02605, alphaXiv: https://arxiv.org/abs/2602.02605)
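The surprising claim above (a population of ~30 suffices for billions of parameters) refers to the standard ES update: sample Gaussian perturbations of the weights, score each perturbed model with a scalar reward, and move the weights along a fitness-weighted average of the noise. A minimal sketch on a toy objective, with illustrative hyperparameters and rank-based fitness shaping that are my assumptions, not details taken from the paper:

```python
import numpy as np

def es_step(theta, fitness, pop_size=30, sigma=0.1, lr=0.05, rng=None):
    """One ES update: sample Gaussian perturbations of theta, rank the
    resulting fitness values, and step along the rank-weighted noise
    average (a black-box estimate of the search gradient)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((pop_size, theta.size))        # noise directions
    scores = np.array([fitness(theta + sigma * e) for e in eps])
    ranks = scores.argsort().argsort()                       # 0 = worst
    weights = ranks / (pop_size - 1) - 0.5                   # centered rank weights
    grad_est = weights @ eps / (pop_size * sigma)            # gradient estimate
    return theta + lr * grad_est

# Toy stand-in for an LLM reward: maximize -||theta - 3||^2.
rng = np.random.default_rng(0)
reward = lambda x: -np.sum((x - 3.0) ** 2)
theta = np.zeros(5)
for _ in range(500):
    theta = es_step(theta, reward, rng=rng)
```

Note that only scalar rewards flow back to the optimizer, which is why ES needs no backpropagation through the model at all.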