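In entropy-regularized reinforcement learning, the agent receives a bonus reward at each time step proportional to the entropy of its policy at that step. This encourages exploration by favoring stochastic policies, and changes the RL problem to

$$\pi^* = \arg\max_{\pi} \mathop{\mathbb{E}}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t \Big( R(s_t, a_t, s_{t+1}) + \alpha H\big(\pi(\cdot \mid s_t)\big) \Big)\right],$$

where $\alpha > 0$ is the trade-off coefficient between reward and entropy, and $H(P) = \mathbb{E}_{x \sim P}[-\log P(x)]$ is the entropy of a distribution $P$.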
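A minimal sketch of the entropy bonus and the entropy-regularized discounted return along one trajectory. It assumes a finite trajectory and a discrete action set, so that $H(P) = -\sum_x P(x)\log P(x)$ (in nats). The function names `entropy` and `entropy_regularized_return` are hypothetical, chosen for illustration.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Entropy H(P) = E_{x~P}[-log P(x)] of a discrete distribution, in nats."""
    # Zero-probability actions contribute nothing (convention 0 * log 0 = 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_return(
    rewards: Sequence[float],
    policy_probs: Sequence[Sequence[float]],
    gamma: float,
    alpha: float,
) -> float:
    """Sum over t of gamma^t * (r_t + alpha * H(pi(.|s_t))) for a finite trajectory.

    rewards[t] plays the role of R(s_t, a_t, s_{t+1}); policy_probs[t] is the
    (assumed discrete) action distribution pi(.|s_t) at step t.
    """
    if len(rewards) != len(policy_probs):
        raise ValueError("need one action distribution per reward")
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, policy_probs)):
        total += gamma ** t * (r + alpha * entropy(probs))
    return total
```

For example, with `rewards=[1.0, 1.0]`, a uniform policy over two actions at the first step and a deterministic one at the second, `gamma=0.9` and `alpha=0.1`, the return is $1 + 0.1\log 2 + 0.9$. Only the stochastic step earns an entropy bonus.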
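The value function is defined analogously, with the entropy bonus included at every time step:

$$V^{\pi}(s) = \mathop{\mathbb{E}}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t \Big( R(s_t, a_t, s_{t+1}) + \alpha H\big(\pi(\cdot \mid s_t)\big) \Big) \,\middle|\, s_0 = s\right]$$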
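For the action-value function, the entropy bonus is counted only from the second time step onward, because the initial action $a_0 = a$ is given rather than sampled from $\pi$:

$$Q^{\pi}(s, a) = \mathop{\mathbb{E}}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t, s_{t+1}) + \alpha \sum_{t=1}^{\infty} \gamma^t H\big(\pi(\cdot \mid s_t)\big) \,\middle|\, s_0 = s, a_0 = a\right]$$

With these definitions, the two value functions are related by

$$V^{\pi}(s) = \mathop{\mathbb{E}}_{a \sim \pi}\left[Q^{\pi}(s, a)\right] + \alpha H\big(\pi(\cdot \mid s)\big).$$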
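In the entropy-regularized setting, the state value equals the policy-weighted action value plus the entropy bonus at the current state, $V^{\pi}(s) = \mathbb{E}_{a \sim \pi}[Q^{\pi}(s, a)] + \alpha H(\pi(\cdot \mid s))$. A minimal sketch for a single state with a discrete action set follows. The function name `soft_state_value` and its list-based inputs are hypothetical.

```python
import math
from typing import Sequence


def soft_state_value(
    q_values: Sequence[float], probs: Sequence[float], alpha: float
) -> float:
    """V(s) = E_{a~pi}[Q(s, a)] + alpha * H(pi(.|s)) for one state.

    q_values[i] is Q(s, a_i) and probs[i] is pi(a_i | s), assuming a
    discrete action set.
    """
    if len(q_values) != len(probs):
        raise ValueError("q_values and probs must have the same length")
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    # Entropy of pi(.|s) in nats; zero-probability actions contribute 0.
    policy_entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return expected_q + alpha * policy_entropy
```

A deterministic policy gets no bonus, so `soft_state_value([1.0, 3.0], [0.0, 1.0], 0.2)` is just `3.0`. Spreading probability across actions raises $V^{\pi}(s)$ by $\alpha$ times the policy entropy.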
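The Bellman equation for $Q^{\pi}$ also takes an entropy-regularized form:

$$Q^{\pi}(s, a) = \mathop{\mathbb{E}}_{s' \sim P,\; a' \sim \pi}\left[R(s, a, s') + \gamma \Big( Q^{\pi}(s', a') + \alpha H\big(\pi(\cdot \mid s')\big) \Big)\right] = \mathop{\mathbb{E}}_{s' \sim P}\left[R(s, a, s') + \gamma V^{\pi}(s')\right]$$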
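Repeatedly applying the entropy-regularized Bellman backup for $Q^{\pi}$ converges to $Q^{\pi}$ when $\gamma < 1$, because the backup is a $\gamma$-contraction. The sketch below is a tabular soft policy evaluation that relies on this. It assumes a small finite MDP given as explicit transition lists, with no terminal states, and a fixed stochastic policy. The names `soft_policy_evaluation` and `Transitions` and the data layout are hypothetical.

```python
import math
from typing import List, Tuple

# transitions[s][a] is a list of (probability, next_state, reward) triples,
# i.e. P(s'|s, a) together with R(s, a, s'). Hypothetical layout.
Transitions = List[List[List[Tuple[float, int, float]]]]


def entropy(probs: List[float]) -> float:
    """Entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_policy_evaluation(
    transitions: Transitions,
    policy: List[List[float]],
    gamma: float,
    alpha: float,
    tol: float = 1e-10,
    max_iters: int = 10_000,
) -> List[List[float]]:
    """Iterate Q(s,a) <- E_{s'}[R + gamma * (E_{a'~pi}[Q(s',a')] + alpha*H(pi(.|s')))]."""
    n_states = len(transitions)
    q = [[0.0] * len(transitions[s]) for s in range(n_states)]
    for _ in range(max_iters):
        # Soft state value under the current estimate: E_{a~pi}[Q] + alpha * H.
        v = [
            sum(p * qa for p, qa in zip(policy[s], q[s])) + alpha * entropy(policy[s])
            for s in range(n_states)
        ]
        # Bellman backup written in its equivalent form E_{s'}[R + gamma * V(s')].
        new_q = [
            [
                sum(prob * (r + gamma * v[s2]) for prob, s2, r in transitions[s][a])
                for a in range(len(transitions[s]))
            ]
            for s in range(n_states)
        ]
        delta = max(
            abs(new_q[s][a] - q[s][a]) for s in range(n_states) for a in range(len(q[s]))
        )
        q = new_q
        if delta < tol:
            break
    return q
```

As a check, take one state with two self-looping actions that pay rewards 1 and 0 under a uniform policy. Here $V^{\pi} = (0.5 + \alpha \log 2)/(1 - \gamma)$ and $Q^{\pi}(s, a) = r_a + \gamma V^{\pi}$, which is what the iteration converges to.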