Rollout algorithms are decision-time planning methods that decide the next action by estimating action values via ๐ช Monte Carlo Control. That is, using the environment dynamics, we simulate multiple trajectories that start at our current state and end at a terminal state; we can then estimate value as the average return.
Rather than optimizing a universal policy, we optimize a rollout policy
and modifying