Introduction to Bayesian Optimization
Bayesian Optimization (BO) concerns gradient-free, black-box optimization. The goal
is to solve a general optimization problem with no known structure (e.g. convexity or linearity)
to exploit, and where we do not have access to any of the function \(f(\cdot)\)’s derivatives.
\[\max_{x \in X} f(x)\]
Several other assumptions are also commonly made, including:
- Evaluating the function is expensive. For instance, \(f(\cdot)\) might be the outcome
of an economic policy and the input the available economic levers (e.g. tax credits).
- It is easy to assess membership in the feasible set \(X\).
- The function \(f\) is continuous. This assumption is needed for a common modeling approach
in the field (Gaussian process regression).
Most approaches to Bayesian Optimization have two components:
- A statistical model of the objective function, often called the surrogate function
- A method for deciding where to sample next, often called the acquisition function.
There are a variety of acquisition functions, but the surrogate is almost always a
Gaussian process regression model.
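One widely used acquisition function is expected improvement (EI), which scores a candidate point by how much it is expected to improve on the best observation so far. As a sketch of its standard closed form under a GP posterior, writing \(f_n^*\) for the best value observed so far and \(\mu_n(x)\), \(\sigma_n(x)\) for the posterior mean and standard deviation:
\[\mathrm{EI}_n(x) = \mathbb{E}\big[\max(f(x) - f_n^*, 0)\big] = (\mu_n(x) - f_n^*)\,\Phi(z) + \sigma_n(x)\,\varphi(z), \qquad z = \frac{\mu_n(x) - f_n^*}{\sigma_n(x)},\]
where \(\Phi\) and \(\varphi\) are the standard normal CDF and density.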
The pseudo-code is then:
- Place a GP prior on \(f\)
- For as many iterations as you can afford:
  - Update the posterior on \(f\) using all available data
  - Choose a point \(x_n\) as the maximizer of some acquisition function
  - Observe \(y_n = f(x_n)\)
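To make this concrete, here is a minimal sketch of the loop in Python. It assumes scikit-learn's GaussianProcessRegressor as the surrogate and uses expected improvement as the acquisition function, maximized by simple random search over candidate points; the function names (`bayes_opt`, `expected_improvement`), the toy objective, bounds, and budgets are illustrative placeholders rather than anything prescribed above.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(X_cand, gp, f_best):
    """EI at candidate points, given the fitted GP and the best observed value."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive variance
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(f, lower, upper, n_init=5, n_iter=20, n_cand=2000, seed=0):
    """Maximize f over the box [lower, upper] with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    d = len(lower)
    # Initial design: a handful of random evaluations of the expensive objective.
    X = rng.uniform(lower, upper, size=(n_init, d))
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)  # update the posterior on f using all available data
        # Choose the next point as the (approximate) maximizer of the acquisition.
        X_cand = rng.uniform(lower, upper, size=(n_cand, d))
        ei = expected_improvement(X_cand, gp, y.max())
        x_next = X_cand[np.argmax(ei)]
        y_next = f(x_next)  # observe y_n = f(x_n)
        X = np.vstack([X, x_next])
        y = np.append(y, y_next)
    best = np.argmax(y)
    return X[best], y[best]


if __name__ == "__main__":
    # Toy stand-in objective (a real use case would have an expensive f).
    f = lambda x: -np.sum((x - 0.5) ** 2)
    x_best, y_best = bayes_opt(f, lower=np.array([-2.0, -2.0]), upper=np.array([2.0, 2.0]))
    print("best x:", x_best, "best y:", y_best)
```

In practice the acquisition function is usually maximized with a multi-start gradient-based optimizer rather than random search, and the choice of GP kernel and its hyperparameters matters a great deal; the random-search version above is only meant to mirror the pseudo-code step by step.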