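The cross entropy method (CEM) is a stochastic optimization algorithm that addresses the general optimization problem

$$A^* = \arg\max_{A} J(A)$$

via random selection. $J$ can be any objective, but in the reinforcement learning setting, where $A = (a_1, \dots, a_T)$ is a sequence of actions, we commonly have

$$J(A) = \sum_{t=1}^{T} r(s_t, a_t), \qquad s_{t+1} = f(s_t, a_t)$$

for some world model $f$.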
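To make the reinforcement learning objective concrete, here is a minimal sketch of evaluating an action sequence by rolling it through a model. The names `rollout_return`, `f`, `r`, and the toy 1-D dynamics are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def rollout_return(actions, s0, f, r):
    """Sum the rewards along the trajectory that the model f predicts for `actions`."""
    s = s0
    total = 0.0
    for a in actions:
        total += r(s, a)   # reward for taking action a in state s
        s = f(s, a)        # model-predicted next state
    return total

# Hypothetical toy model: a 1-D point whose position shifts by the action.
f = lambda s, a: s + a
r = lambda s, a: -(s ** 2)   # reward is highest at the origin

value = rollout_return(np.array([-1.0, -1.0, 0.0]), s0=2.0, f=f, r=r)
print(value)  # -5.0 (rewards -4, -1, and 0 along the trajectory 2 -> 1 -> 0)
```

Any optimizer that only needs to query the objective on candidate action sequences can treat a function like this as a black box.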

First, we’re motivated by the naive, completely stochastic approximation algorithm (sometimes called “random shooting”):

  1. Pick $A_1, \dots, A_N$ from some distribution.
  2. Choose $A_i$ with $i = \arg\max_i J(A_i)$.
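The two steps above can be sketched in a few lines of NumPy. This is only an illustration: the uniform sampling range, the sample count, and the quadratic toy objective `J` are assumptions chosen for the example.

```python
import numpy as np

def random_shooting(J, horizon, n_samples=1000, low=-1.0, high=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: draw N candidate action sequences from a fixed distribution
    # (uniform here, as an assumption; any distribution would do).
    candidates = rng.uniform(low, high, size=(n_samples, horizon))
    # Step 2: keep the candidate with the highest objective value.
    scores = np.array([J(A) for A in candidates])
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Hypothetical objective with its maximum at A = (0.5, 0.5, 0.5).
J = lambda A: -np.sum((A - 0.5) ** 2)
best_A, best_score = random_shooting(J, horizon=3)
print(best_A, best_score)
```

Every sample is drawn from the same distribution, so information from good candidates is never reused.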

The cross entropy method notes that the candidates $A_i$ can be picked from an “informed guess.” That is, rather than sampling the $A_i$ from a fixed distribution, we can iteratively improve the distribution they’re chosen from: we’ll repeat random shooting multiple times and update our sampling distribution based on results from the previous iterations.

Formally, the CEM algorithm is as follows.

  1. Sample $A_1, \dots, A_N$ from $p(A)$, which is typically a Gaussian.
  2. Evaluate $J(A_1), \dots, J(A_N)$.
  3. Pick the elites $A_{i_1}, \dots, A_{i_M}$ with the highest value ($M < N$).
  4. Refit $p(A)$ to the elites and repeat.
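The loop above might look as follows in NumPy. This is a sketch under stated assumptions: a diagonal Gaussian initialized at zero mean and unit variance, a fixed iteration count, and a small floor on the standard deviation are all choices made for the example rather than requirements of the method.

```python
import numpy as np

def cem(J, dim, n_samples=100, n_elites=10, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)   # assumed initial diagonal Gaussian
    for _ in range(n_iters):
        # 1. Sample N candidates from the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, dim))
        # 2. Evaluate the objective on every candidate.
        scores = np.array([J(A) for A in samples])
        # 3. Keep the M < N highest-scoring candidates (the elites).
        elites = samples[np.argsort(scores)[-n_elites:]]
        # 4. Refit the Gaussian to the elites.
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6   # small floor so std never reaches zero
    return mean

# Hypothetical objective with its maximum at A = (0.5, 0.5, 0.5).
J = lambda A: -np.sum((A - 0.5) ** 2)
solution = cem(J, dim=3)
print(solution)
```

On this toy objective the returned mean should land close to 0.5 in every coordinate, because each refit concentrates the sampling distribution around the best candidates seen so far.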

Note that though this method is efficient, it only works in low dimensions. Moreover, it only supports open-loop planning and doesn’t incorporate any environment feedback to replan.