Motivation

Backpropagation cannot be applied to every problem, and it has limitations even where it can. In RL, for instance, the credit assignment problem makes gradient estimates difficult: rewards are often delayed, sparse, or noisy, and policies can get trapped in local optima.
In such settings, black-box optimization methods like genetic algorithms or evolution strategies provide a gradient-free alternative.

See also: Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Black-box optimization

You want to minimize an objective, but you don’t know its internal form.
You can only query a box: give it an input and it returns an output (e.g., a fitness/loss/…). The box may be expensive or noisy.

Problem. Minimize $f(x)$ over $x \in \mathcal{X}$. The function $f$ is unknown to the algorithm.
Oracle. On a query $x_t$, an oracle returns a random output $y_t$.
Typical cases include:
Value-only: $y_t = f(x_t) + \varepsilon_t$,
Value+gradient: $y_t = (f(x_t), \nabla f(x_t))$, possibly noisy.

Algorithm. Choose $x_t$ adaptively from past data $(x_1, y_1), \ldots, (x_{t-1}, y_{t-1})$, then output $\hat{x}$.

A common goal/criterion is: find $\hat{x}$ with suboptimality $f(\hat{x}) - \min_{x \in \mathcal{X}} f(x) \le \varepsilon$ (where $\varepsilon > 0$) using as few oracle calls as possible (sample efficiency).
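
As a concrete illustration of this query loop, here is a minimal sketch in Python: local random search against a hypothetical noisy value-only oracle. The quadratic inside `oracle` is only a stand-in for the unknown $f$; the algorithm never sees it, and all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def oracle(x):
    """Hypothetical noisy black box: the algorithm only sees the returned
    value, never the quadratic stand-in for the unknown f inside."""
    return float(np.sum(x ** 2)) + 0.1 * rng.normal()

def random_search(dim=5, budget=200, sigma=0.5):
    """Minimal adaptive query loop: propose x_t near the best point so far,
    query the oracle (value only), keep the incumbent, output x_hat."""
    x_best = rng.normal(size=dim)       # initial query x_1
    y_best = oracle(x_best)             # first oracle call
    for _ in range(budget - 1):
        x_t = x_best + sigma * rng.normal(size=dim)  # adaptive proposal
        y_t = oracle(x_t)
        if y_t < y_best:                # greedy update on noisy values
            x_best, y_best = x_t, y_t
    return x_best, y_best               # output x_hat and its (noisy) value

x_hat, y_hat = random_search()
print(f"best value after 200 oracle calls: {y_hat:.3f}")
```

Note that the total oracle-call count is exactly the budget, which is the quantity the sample-efficiency criterion above counts.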

Is black-box optimization the same as zeroth-order optimization?

No. BBO is a modeling assumption about access to information, not a specific algorithm family.
The box might return just values, or values and gradients, but you can't exploit an explicit formula.
BBO is about what you know (only queries); zeroth-order (gradient-free) optimization describes what you use.
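
To make the distinction concrete: even when the box is value-only, you can still use gradient-like information by estimating it from queries. Below is a sketch of a Gaussian-smoothing (evolution-strategies-style) gradient estimator, assuming an `oracle(x)` callable like the one above; the function name and parameters are illustrative assumptions, not a fixed API.

```python
import numpy as np

def es_gradient_estimate(oracle, x, sigma=0.1, n_samples=50, rng=None):
    """Estimate grad f(x) from value-only queries via Gaussian smoothing:
    grad f_sigma(x) = E[(f(x + sigma*u) - f(x - sigma*u)) * u / (2*sigma)]
    for u ~ N(0, I); the antithetic pair (u, -u) reduces variance."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        grad += (oracle(x + sigma * u) - oracle(x - sigma * u)) * u / (2 * sigma)
    return grad / n_samples
```

Each estimate costs $2 \cdot$ `n_samples` oracle calls; this is the kind of estimator underlying the ES paper referenced above.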

Variants of black-box optimization