🎲 Entropy Regularization

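Entropy regularization in reinforcement learning promotes exploration by rewarding a stochastic policy for having higher 🔥 Entropy, which makes it more likely to reach new states by chance. The objective is thus

$$
\pi^* = \arg\max_\pi \mathbb{E}_{\tau \sim \pi} \left[ \sum_{t=0}^\infty \gamma^t \Big( R(s_t, a_t, s_{t+1}) + \alpha H(\pi(\cdot \mid s_t)) \Big) \right]
$$

where $\alpha$ is a temperature hyperparameter that controls the degree of exploration we desire.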
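
As a rough illustration of the objective, the sketch below computes the entropy-regularized discounted return of a single short trajectory. It assumes a discrete action space, so the entropy is that of a categorical distribution; the function names `entropy` and `regularized_return` and the toy numbers are made up for this example.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(nz * np.log(nz)))

def regularized_return(rewards, action_probs, alpha, gamma):
    """Discounted sum of (reward + alpha * policy entropy) over a trajectory.

    rewards[t]      : reward received at step t
    action_probs[t] : the policy's action distribution at the state visited at step t
    """
    total = 0.0
    for t, (r, p) in enumerate(zip(rewards, action_probs)):
        total += gamma**t * (r + alpha * entropy(p))
    return total

# Toy trajectory (hypothetical numbers): same rewards, two different policies.
rewards = [1.0, 0.0, 2.0]
uniform = [[0.5, 0.5]] * 3   # maximally random over two actions
greedy = [[1.0, 0.0]] * 3    # deterministic, zero entropy

ret_uniform = regularized_return(rewards, uniform, alpha=0.1, gamma=0.9)
ret_greedy = regularized_return(rewards, greedy, alpha=0.1, gamma=0.9)
print(ret_uniform, ret_greedy)
```

With identical rewards, the uniform policy scores higher than the deterministic one, and the gap grows with the temperature; at a temperature of zero the two returns coincide.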

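Our value functions follow a similar form:

$$
V^\pi(s) = \mathbb{E}_{\tau \sim \pi} \left[ \sum_{t=0}^\infty \gamma^t \Big( R(s_t, a_t, s_{t+1}) + \alpha H(\pi(\cdot \mid s_t)) \Big) \;\middle|\; s_0 = s \right]
$$

$$
Q^\pi(s, a) = \mathbb{E}_{\tau \sim \pi} \left[ \sum_{t=0}^\infty \gamma^t R(s_t, a_t, s_{t+1}) + \alpha \sum_{t=1}^\infty \gamma^t H(\pi(\cdot \mid s_t)) \;\middle|\; s_0 = s, a_0 = a \right]
$$

Note that for the action-value, we only consider action entropy after the initial action $a$, since that action is already fixed. Thus, our values are related by

$$
V^\pi(s) = \mathbb{E}_{a \sim \pi} \left[ Q^\pi(s, a) \right] + \alpha H(\pi(\cdot \mid s))
$$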
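
The relation between the two values can be checked numerically for a single state. This is a minimal sketch assuming discrete actions; `state_value` is a hypothetical helper name, and the Q-values are arbitrary.

```python
import numpy as np

def state_value(q_values, probs, alpha):
    """V(s) = E_{a ~ pi}[Q(s, a)] + alpha * H(pi(.|s)) for one state."""
    q_values = np.asarray(q_values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    nz = probs > 0  # skip zero-probability actions (0 * log 0 = 0)
    policy_entropy = -np.sum(probs[nz] * np.log(probs[nz]))
    return float(probs @ q_values + alpha * policy_entropy)

q = [1.0, 3.0]                                   # arbitrary action-values
v_random = state_value(q, [0.5, 0.5], alpha=0.2)  # expected Q plus entropy bonus
v_greedy = state_value(q, [0.0, 1.0], alpha=0.2)  # no bonus: policy is deterministic
print(v_random, v_greedy)
```

For a deterministic policy the entropy term vanishes and the state value reduces to the Q-value of the chosen action.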

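Moreover, we also have a variant of the 🔔 Bellman Equation:

$$
Q^\pi(s, a) = \mathbb{E}_{s' \sim P,\, a' \sim \pi} \left[ R(s, a, s') + \gamma \Big( Q^\pi(s', a') + \alpha H(\pi(\cdot \mid s')) \Big) \right] = \mathbb{E}_{s' \sim P} \left[ R(s, a, s') + \gamma V^\pi(s') \right]
$$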
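
One way to see this Bellman variant at work is to iterate it as a backup for a fixed policy on a small tabular MDP. The sketch below is only illustrative: it assumes a finite MDP with known dynamics, a strictly positive policy, and rewards already averaged over next states (`R[s, a]`); the function name and the toy MDP are invented for the example.

```python
import numpy as np

def soft_policy_evaluation(P, R, pi, alpha, gamma, iters=500):
    """Iterate the entropy-regularized Bellman backup for a fixed policy.

    P[s, a, s2] : transition probabilities
    R[s, a]     : expected immediate reward (assumption: averaged over s2)
    pi[s, a]    : policy probabilities, assumed strictly positive
    """
    H = -np.sum(pi * np.log(pi), axis=1)         # entropy of pi(.|s), per state
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = np.sum(pi * Q, axis=1) + alpha * H   # V(s) = E_a[Q(s, a)] + alpha * H
        Q = R + gamma * (P @ V)                  # Q(s, a) = R + gamma * E_s2[V(s2)]
    V = np.sum(pi * Q, axis=1) + alpha * H
    return Q, V

# Toy MDP with two states and two actions (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
pi = np.array([[0.7, 0.3],
               [0.4, 0.6]])
alpha, gamma = 0.1, 0.9

Q, V = soft_policy_evaluation(P, R, pi, alpha, gamma)
print(Q)
print(V)
```

Because the backup is a contraction for a discount below one, the iterates settle on values that satisfy both the Bellman equation and the relation between the state value and the action-value.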
