Introduction to Bayesian Optimization
Bayesian Optimization (BO) concerns gradient-free, black-box optimization. The goal
is to solve a general optimization problem with no known structure (e.g. convexity or linearity)
to exploit, and where we do not have access to any of the function \(f(\cdot)\)’s derivatives.
\[\max_{x \in X} f(x)\]
Several other assumptions are also commonly made, including:
- Evaluating the function is expensive. For instance, \(f(\cdot)\) might be the outcome
of an economic policy and the input the available economic levers (e.g. tax credits).
- It is easy to assess membership in the feasible set \(X\).
- The function \(f\) is continuous. This assumption is needed for a common modeling approach
in the field (Gaussian process regression).
Most approaches to Bayesian Optimization have two components:
- A statistical model of the objective function, often called the surrogate function
- A method for deciding where to sample next, often called the acquisition function.
There are a variety of acquisition functions, but the surrogate is almost always a
Gaussian process regression model.
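One widely used acquisition function is expected improvement (EI), which scores a candidate point by how much it is expected to improve on the best observation so far. As a sketch of its standard closed form under a GP posterior, writing \(f_n^*\) for the best value observed so far and \(\mu_n(x)\), \(\sigma_n(x)\) for the posterior mean and standard deviation:
\[\mathrm{EI}_n(x) = \mathbb{E}\big[\max(f(x) - f_n^*, 0)\big] = (\mu_n(x) - f_n^*)\,\Phi(z) + \sigma_n(x)\,\varphi(z), \qquad z = \frac{\mu_n(x) - f_n^*}{\sigma_n(x)},\]
where \(\Phi\) and \(\varphi\) are the standard normal CDF and density.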
The pseudo-code is then:
- Place a GP prior on \(f\)
- For as many iterations as you can afford:
  - Update the posterior on \(f\) using all available data
  - Choose a point \(x_n\) as the maximizer of some acquisition function
  - Observe \(y_n = f(x_n)\)
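To make this concrete, here is a minimal sketch of the loop in Python. It assumes scikit-learn's GaussianProcessRegressor as the surrogate and uses expected improvement as the acquisition function, maximized by simple random search over candidate points; the function names (`bayes_opt`, `expected_improvement`), the toy objective, bounds, and budgets are illustrative placeholders rather than anything prescribed above.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(X_cand, gp, f_best):
    """EI at candidate points, given the fitted GP and the best observed value."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive variance
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(f, lower, upper, n_init=5, n_iter=20, n_cand=2000, seed=0):
    """Maximize f over the box [lower, upper] with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    d = len(lower)
    # Initial design: a handful of random evaluations of the expensive objective.
    X = rng.uniform(lower, upper, size=(n_init, d))
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)  # update the posterior on f using all available data
        # Choose the next point as the (approximate) maximizer of the acquisition.
        X_cand = rng.uniform(lower, upper, size=(n_cand, d))
        ei = expected_improvement(X_cand, gp, y.max())
        x_next = X_cand[np.argmax(ei)]
        y_next = f(x_next)  # observe y_n = f(x_n)
        X = np.vstack([X, x_next])
        y = np.append(y, y_next)
    best = np.argmax(y)
    return X[best], y[best]


if __name__ == "__main__":
    # Toy stand-in objective (a real use case would have an expensive f).
    f = lambda x: -np.sum((x - 0.5) ** 2)
    x_best, y_best = bayes_opt(f, lower=np.array([-2.0, -2.0]), upper=np.array([2.0, 2.0]))
    print("best x:", x_best, "best y:", y_best)
```

In practice the acquisition function is usually maximized with a multi-start gradient-based optimizer rather than random search, and the choice of GP kernel and its hyperparameters matters a great deal; the random-search version above is only meant to mirror the pseudo-code step by step.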