
Bayesian Optimization (BO) concerns gradient-free, assumption-free optimization. The goal is to solve a general optimization problem with no known structure (e.g. convexity or linearity) to exploit, and where we do not have access to any of the function \(f(\cdot)\)'s derivatives.

\[\max_{x \in X} f(x)\]

Many other assumptions are also often applied, including:

- Evaluating the function is expensive. For instance, \(f(\cdot)\) might be the outcome of an economic policy and the input the available economic levers (e.g. tax credits).
- Membership in the feasible set \(X\) is easy to assess.
- The function \(f\) is continuous. This assumption is necessary for the most common modeling approach in the field, Gaussian process regression.

Most approaches to Bayesian Optimization have two components:

- A statistical model of the objective function, often called the *surrogate function*
- A method for deciding where to sample next, often called the *acquisition function*

There are a variety of different acquisition functions, but the surrogate is almost always built with Gaussian process regression.
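For instance, one widely used acquisition function is *expected improvement*, \(\mathrm{EI}(x) = \mathbb{E}[\max(f(x) - f^*, 0)]\), which has a closed form under a Gaussian posterior. A minimal sketch (function names and the \(\xi\) exploration parameter are my own choices) using scipy:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_y, xi=0.01):
    """Closed-form expected improvement (for maximization).

    mu, sigma: GP posterior mean and standard deviation at a candidate point.
    best_y:    best objective value observed so far.
    xi:        small exploration offset (illustrative default).
    """
    sigma = np.maximum(sigma, 1e-9)  # guard against division by zero
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```

Candidates with high posterior mean or high posterior uncertainty both score well, which is the explore/exploit trade-off the acquisition function is meant to encode.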

The pseudo-code is then:


- Place a GP prior on \(f\)
- For as many iterations as you can afford:
    - Update the posterior on \(f\) using all available data
    - Choose a point \(x_n\) as the maximizer of some acquisition function
    - Observe \(y_n = f(x_n)\)
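The loop above can be sketched end-to-end. This is a minimal one-dimensional illustration, not a production implementation: a hand-rolled GP posterior with a squared-exponential kernel, an upper-confidence-bound acquisition (one of several common choices), and a fixed candidate grid in place of a proper inner optimizer. The length scale, UCB coefficient, and function names are all illustrative.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential (RBF) kernel between 1-D point arrays.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard GP regression equations via a Cholesky factorization.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v * v, axis=0)  # prior variance rbf(x, x) = 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def bayes_opt(f, n_iter=20, n_init=3, seed=0):
    """Maximize f over [0, 1] with the GP-surrogate loop above."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(0, 1, n_init)            # a few initial random evaluations
    ys = np.array([f(x) for x in xs])
    grid = np.linspace(0, 1, 200)             # candidate set for the acquisition
    for _ in range(n_iter):
        mu, sigma = gp_posterior(xs, ys, grid)      # update the posterior
        x_next = grid[np.argmax(mu + 2.0 * sigma)]  # maximize UCB acquisition
        xs = np.append(xs, x_next)
        ys = np.append(ys, f(x_next))               # observe y_n = f(x_n)
    best = np.argmax(ys)
    return xs[best], ys[best]
```

Running `bayes_opt(lambda x: -(x - 0.7) ** 2)` should return a point near the true maximizer at \(x = 0.7\) after a handful of evaluations.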