🎲 Cross Entropy Method

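The cross entropy method (CEM) is a stochastic optimization algorithm that addresses the general optimization problem

$$A^* = \arg\max_A J(A)$$

via random sampling. $J$ can be any objective, but in the reinforcement learning setting, where $A = (a_1, \dots, a_T)$ is a sequence of actions, we commonly have

$$J(A) = \sum_{t=1}^{T} r(s_t, a_t), \qquad s_{t+1} = f(s_t, a_t)$$

for some world model $f$.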
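To make the reinforcement learning objective concrete, here is a minimal sketch that scores an action sequence by rolling it through a model and summing rewards. The names `rollout_return`, `toy_model`, and `toy_reward` are hypothetical, and the 1-D dynamics are only an assumed toy problem, not something this note specifies.

```python
def rollout_return(actions, state, model, reward):
    """Sum rewards along an open-loop rollout of `model` from `state`."""
    total = 0.0
    for action in actions:
        total += reward(state, action)
        state = model(state, action)  # the world model predicts the next state
    return total


# Hypothetical toy problem: a 1-D point that should be driven toward the origin.
def toy_model(state, action):
    return state + action


def toy_reward(state, action):
    # Penalize distance from the origin, plus a small cost on action magnitude.
    return -(state ** 2) - 0.1 * action ** 2


score = rollout_return([-1.0, -1.0, 0.0], 2.0, toy_model, toy_reward)
```

Any function that maps an action sequence to a number can play the same role; the planner only ever needs to evaluate it.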

First, we’re motivated by the naive, completely stochastic approximation algorithm (sometimes called “random shooting”):

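1. Pick candidates $A_1, \dots, A_N$ from some distribution.
2. Choose the candidate $A_i$ with the highest objective value, $i = \arg\max_i J(A_i)$.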
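The two steps of random shooting can be sketched in a few lines. This is an illustration under assumptions: the uniform sampling range of $[-1, 1]$, the function name `random_shooting`, and the quadratic `toy_objective` are all hypothetical choices made for the example.

```python
import random


def random_shooting(objective, horizon, num_samples, rng):
    """Sample candidates from a fixed distribution and keep the best one."""
    candidates = [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]  # assumed sampling range
        for _ in range(num_samples)
    ]
    return max(candidates, key=objective)


def toy_objective(actions):
    # Hypothetical objective, maximized when every action equals 0.5.
    return -sum((a - 0.5) ** 2 for a in actions)


best = random_shooting(toy_objective, horizon=3, num_samples=500, rng=random.Random(0))
```

The sampling distribution never changes here, so any improvement comes only from drawing more candidates.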

The cross entropy method notes that the candidates $A_1, \dots, A_N$ can be picked from an “informed guess.” That is, rather than drawing the $A_i$ from a fixed distribution, we can iteratively improve the distribution they’re chosen from: we repeat random shooting multiple times and update the sampling distribution based on the results of the previous iterations.

Formally, the CEM algorithm is as follows.

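1. Sample $A_1, \dots, A_N$ from $p(A)$, which is typically a Gaussian.
2. Evaluate $J(A_1), \dots, J(A_N)$.
3. Pick the $M$ elites $A_{i_1}, \dots, A_{i_M}$ with the highest value ($M < N$).
4. Refit $p(A)$ to the elites and repeat.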
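The loop above can be sketched as follows, assuming a diagonal Gaussian sampling distribution that starts at zero mean and unit standard deviation. The function name `cem`, its default hyperparameters, the small floor added to the standard deviation, and the quadratic `toy_objective` are illustrative assumptions rather than part of the method's definition.

```python
import random
import statistics


def cem(objective, dim, num_samples=100, num_elites=10, iterations=20, seed=0):
    """Cross entropy method with a diagonal Gaussian sampling distribution."""
    rng = random.Random(seed)
    mean = [0.0] * dim  # assumed initial guess
    std = [1.0] * dim
    for _ in range(iterations):
        # 1. Sample candidates from the current distribution.
        samples = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(num_samples)
        ]
        # 2-3. Evaluate every candidate and keep the highest-scoring elites.
        elites = sorted(samples, key=objective, reverse=True)[:num_elites]
        # 4. Refit the Gaussian to the elites, one dimension at a time.
        for d in range(dim):
            column = [elite[d] for elite in elites]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6  # floor keeps std positive
    return mean


def toy_objective(actions):
    # Hypothetical objective, maximized when every action equals 0.5.
    return -sum((a - 0.5) ** 2 for a in actions)


solution = cem(toy_objective, dim=3)
```

Returning the final mean is one common choice; returning the best candidate seen across all iterations is another reasonable option.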

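Note that though this method is simple and its evaluations are easy to run in parallel, it scales poorly with dimension and in practice works well only for low-dimensional problems. Moreover, as described here it only supports open-loop planning: the whole action sequence is chosen up front, and no environment feedback is used to replan.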
