Deep Q-Learning[1] is a landmark Reinforcement Learning algorithm that uses Q-Learning to train a Convolutional Neural Network, termed a Deep Q-Network (DQN). This takes advantage of the powerful function approximation abilities of CNNs and was originally used to achieve groundbreaking performance on Atari games.
On top of replacing the tabular Q-function with a neural network $Q(s, a; \theta)$, Deep Q-Learning introduces an experience replay buffer $\mathcal{D}$: transitions are stored as they are collected, and updates are computed on mini-batches sampled from it, which decorrelates consecutive samples and lets past experience be reused.
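As a concrete illustration, below is a minimal replay buffer sketch in Python; the `ReplayBuffer` class and its method names are illustrative choices, not code from the original DQN implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling breaks the temporal correlation between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```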
The entire algorithm is as follows:
- Take some action $a_t$ with Epsilon-Greedy and observe the transition $(s_t, a_t, r_t, s_{t+1})$, add it to $\mathcal{D}$.
- Sample a mini-batch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $\mathcal{D}$.
- Compute the targets $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta)$.
- Update parameters $\theta$ by gradient descent on the squared TD error $\left(y_j - Q(s_j, a_j; \theta)\right)^2$, averaged over the mini-batch (sketched in code below).
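The four steps might look roughly like the following in PyTorch, assuming a discrete action space, a Gym-style `env.step` that returns `(next_state, reward, done, info)`, and the `ReplayBuffer` sketched above; `q_net`, `optimizer`, and the hyperparameter values are placeholders rather than the reference implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def dqn_training_step(env, state, q_net, buffer, optimizer,
                      epsilon=0.1, gamma=0.99, batch_size=32):
    # 1. Take an action with epsilon-greedy and store the transition in the buffer.
    if np.random.rand() < epsilon:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
            action = int(q_values.argmax(dim=1).item())
    next_state, reward, done, _ = env.step(action)  # classic Gym API assumed
    buffer.add(state, action, reward, next_state, float(done))

    if len(buffer) >= batch_size:
        # 2. Sample a mini-batch of transitions.
        states, actions, rewards, next_states, dones = map(
            np.array, zip(*buffer.sample(batch_size)))
        states = torch.as_tensor(states, dtype=torch.float32)
        actions = torch.as_tensor(actions, dtype=torch.int64)
        rewards = torch.as_tensor(rewards, dtype=torch.float32)
        next_states = torch.as_tensor(next_states, dtype=torch.float32)
        dones = torch.as_tensor(dones, dtype=torch.float32)

        # 3. Compute targets y_j = r_j + gamma * max_a' Q(s_{j+1}, a');
        #    bootstrapping is cut off at terminal states via (1 - dones).
        with torch.no_grad():
            targets = rewards + gamma * (1 - dones) * q_net(next_states).max(dim=1).values

        # 4. Update parameters by gradient descent on the squared TD error.
        predictions = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = F.mse_loss(predictions, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return next_state, done
```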
Target Network
Unfortunately, Deep Q-Learning satisfies the function approximation, bootstrapping, and off-policy sampling conditions of the Deadly Triad, making it prone to unstable or divergent training. One way to mitigate this instability is to use a target network $Q(s, a; \theta^-)$, a copy of the Q-network whose parameters $\theta^-$ are held fixed and only periodically synchronized with $\theta$.
In the algorithm above, the bootstrap target then becomes $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^-)$, with $\theta^-$ copied from $\theta$ only every fixed number of steps. Because the regression target no longer shifts with every update, training is considerably more stable.
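As a sketch of how the target network fits in, assuming a PyTorch Q-network like the one above (the helper function names here are hypothetical):

```python
import copy
import torch

def make_target_net(q_net):
    # theta^- starts as a copy of theta and is frozen between periodic syncs.
    target_net = copy.deepcopy(q_net)
    for p in target_net.parameters():
        p.requires_grad_(False)
    return target_net

def compute_targets(target_net, rewards, next_states, dones, gamma=0.99):
    # Bootstrap targets y_j = r_j + gamma * max_a' Q(s_{j+1}, a'; theta^-).
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * (1 - dones) * next_q

def sync_target(q_net, target_net):
    # Periodically copy theta -> theta^- (e.g. every few thousand environment steps).
    target_net.load_state_dict(q_net.state_dict())
```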
Dueling Architecture
Another improvement to the original DQN is a modification of the architecture. Whereas the original predicts Q-values for each action with a single output head, the dueling architecture splits the network into two streams: one estimating the state value $V(s)$ and one estimating the advantage $A(s, a)$ of each action, which are then combined into Q-value estimates.
While the definition of the value functions gives $Q(s, a) = V(s) + A(s, a)$, this decomposition is unidentifiable since a value for $V(s)$ cannot be recovered uniquely from $Q(s, a)$ alone: adding a constant to the value stream and subtracting it from every advantage leaves the Q-values unchanged. The dueling architecture therefore aggregates the streams as

$$Q(s, a) = V(s) + \left( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \right).$$

With an optimal policy, the average should instead be a max operator (since the advantage of the greedy action is zero, so $V^*(s) = \max_{a} Q^*(s, a)$), but subtracting the mean makes optimization more stable in practice.
With this computation, the dueling architecture can be plugged into any algorithm that uses Q-value estimates and offers the benefit of generalizing across actions.
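As one possible realization, here is a minimal PyTorch sketch of a dueling head using the mean-subtracted aggregation; the layer sizes and names are illustrative, not those of the original paper's network.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Q-network whose head splits into a state-value stream and an advantage stream."""

    def __init__(self, obs_dim, num_actions, hidden=128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_stream = nn.Linear(hidden, 1)                 # V(s)
        self.advantage_stream = nn.Linear(hidden, num_actions)   # A(s, a)

    def forward(self, obs):
        h = self.features(obs)
        value = self.value_stream(h)          # shape (batch, 1)
        advantage = self.advantage_stream(h)  # shape (batch, num_actions)
        # Subtract the mean advantage so V and A are identifiable;
        # replacing the mean with a max gives the alternative aggregation discussed above.
        return value + advantage - advantage.mean(dim=1, keepdim=True)
```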