Bayesian optimization is a method for optimizing input variables $x$ for an objective function $f(x)$. It is used when we don't know the shape of $f$ but can evaluate it at any point, i.e., calculate $f(x)$ for some $x$. If we did know the shape, then ⛰️ Gradient Descent would be more suitable.

If we had infinite resources, we could evaluate $f(x)$ for all $x$ and find the actual shape of $f$. However, this is often infeasible.

Instead, Bayesian optimization builds a probability model $\hat{f}$ of the objective function, called the surrogate model, from a smaller set of evaluations $\mathcal{D}$. To do so, it requires two key components:

  1. Surrogate model: a probability model $p(f \mid \mathcal{D})$ that estimates the objective given data $\mathcal{D}$. A common choice is a 🎲 Gaussian Process (sketched below).
  2. Acquisition (selection) function: how we select which $x$ to evaluate next to best improve our surrogate model.
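
To make the surrogate concrete, here is a minimal sketch of fitting a Gaussian Process to a handful of evaluations, using scikit-learn's `GaussianProcessRegressor`. The toy objective `f` and the RBF kernel are illustrative assumptions, not part of the method itself:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    # Stand-in for the expensive black-box objective.
    return -(x - 2.0) ** 2 + 1.0

# A small history buffer D of (x, f(x)) evaluations.
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = f(X).ravel()

# Fit the GP surrogate on D.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X, y)

# The surrogate is cheap to query anywhere. Unlike a point estimator,
# it returns both a mean prediction and an uncertainty estimate.
mu, sigma = gp.predict(np.array([[2.0]]), return_std=True)
print(f"predicted f(2.0) = {mu[0]:.3f} +/- {sigma[0]:.3f}")
```

The uncertainty estimate is the key property: it is what the acquisition function uses to decide where evaluating $f$ would be most informative.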

To perform Bayesian optimization, we repeat the following steps multiple times (a runnable sketch follows the list):

  1. Use the acquisition function to find the best $x_i$ to evaluate.
  2. Evaluate $y_i = f(x_i)$.
  3. Add $(x_i, y_i)$ to a history buffer $\mathcal{D}$.
  4. Fit a new surrogate model on $\mathcal{D}$.
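
Below is a runnable sketch of this loop under stated assumptions: a toy 1-D objective, scikit-learn's GP as the surrogate, and an upper confidence bound (UCB) acquisition maximized by grid search over candidate points (real implementations optimize the acquisition function more carefully):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    # Toy "expensive" objective; in practice this is the black box.
    return -(x - 2.0) ** 2 + 1.0

candidates = np.linspace(0.0, 5.0, 200).reshape(-1, 1)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(2, 1))  # seed D with random evaluations
y = f(X).ravel()

for _ in range(10):
    # Step 4: fit a fresh surrogate on the history buffer D.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
    gp.fit(X, y)

    # Step 1: acquisition. UCB scores points by mean + exploration bonus,
    # so it favors candidates that look good or are highly uncertain.
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(mu + 2.0 * sigma)].reshape(1, 1)

    # Steps 2-3: evaluate f and append (x_i, y_i) to D.
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

# Estimate the optimum from the best evaluation seen in D.
x_best = X[np.argmax(y)]
print(f"best x ~ {x_best[0]:.3f}, f(x) = {y.max():.3f}")
```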

At the end, we can find the global maximum from $\mathcal{D}$ (the pair with the highest $y_i$), thereby giving an estimate of the optimal $x^*$.

Acquisition Functions