Response surface methods are a mix between ♟️ Reinforcement Learning and ✋ Active Learning that aim to find some $x^*$ that minimizes $f(x)$ (for an unknown function $f$). To do so, we repeatedly query $f$ and improve our guess for $x^*$.

Unlike active learning, our goal is to minimize $f$ instead of fitting it over the entire data. This makes the problem much more like reinforcement learning, specifically the 📖 Contextual Bandit, where $x$ corresponds to an action and $f(x)$ is the reward (or loss, in our minimization case).

Training

Given a set of datapoints $\{(x_i, y_i)\}$, fit a model $\hat{f} \approx f$, known as the response surface (analogous to a model of the world). Then, repeat the following.

  1. Pick the next $x$ using gradient descent on $\hat{f}$. This is similar to the exploitation step in standard reinforcement learning.
  2. Measure the corresponding $y = f(x)$, then use the pair $(x, y)$ to update $\hat{f}$ (see the sketch after this list).
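
A minimal sketch of this loop, assuming a noisy one-dimensional objective and a quadratic least-squares fit as the response surface; the example function `f` and all names here are illustrative, not part of any particular library.

```python
import numpy as np

# Unknown objective we can only query point-by-point (illustrative example).
def f(x):
    return (x - 2.0) ** 2 + 0.1 * np.random.randn()

# Fit a quadratic response surface f_hat(x) = a*x^2 + b*x + c by least squares.
def fit_surface(xs, ys):
    X = np.vstack([np.array(xs) ** 2, xs, np.ones(len(xs))]).T
    return np.linalg.lstsq(X, np.array(ys), rcond=None)[0]  # (a, b, c)

# Exploitation step: gradient descent on the fitted surface, not on f itself.
def descend(coeffs, x0, lr=0.1, steps=50):
    a, b, _ = coeffs
    x = x0
    for _ in range(steps):
        x -= lr * (2 * a * x + b)  # derivative of the surrogate w.r.t. x
    return x

# Initial design: a few random queries of the true objective.
rng = np.random.default_rng(0)
xs = list(rng.uniform(-5, 5, size=5))
ys = [f(x) for x in xs]

x = xs[-1]
for _ in range(20):
    coeffs = fit_surface(xs, ys)   # refit the response surface f_hat
    x = descend(coeffs, x)         # pick the next x by descending f_hat
    y = f(x)                       # measure the corresponding y = f(x)
    xs.append(x); ys.append(y)     # use (x, y) to update the model

print(f"best x ≈ {xs[np.argmin(ys)]:.3f}")
```

In practice the quadratic surrogate would be replaced by a richer differentiable model, and the pure exploitation step above is usually tempered with some exploration so the surface stays accurate away from the current minimum.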