Deep deterministic policy gradient (DDPG) combines Deterministic Policy Gradient with Deep Q-Learning. While DQN works in a discrete action space, we can modify it for continuous actions by using a deterministic actor-critic that removes the need for the intractable max operation. DDPG also borrows the replay buffer and target networks from DQN to stabilize training.
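Concretely, the deterministic actor (written $\mu_\theta$, defined below) stands in for the argmax in the Bellman target:

$$\max_a Q(s, a) \approx Q\big(s, \mu_\theta(s)\big),$$

which costs only a single forward pass through the actor, even when $a$ is continuous.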
Formally, our setup is below:
- We'll have the critic $Q_\phi(s, a)$ and actor $\mu_\theta(s)$, along with targets $Q_{\phi'}$ and $\mu_{\theta'}$.
- Our exploration policy $\mu'$ is a noisy version of our deterministic one, $\mu'(s) = \mu_\theta(s) + \mathcal{N}$ (sketched in code below).
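A minimal sketch of this setup, assuming PyTorch; the network sizes and the Pendulum-like dimensions `obs_dim = 3, act_dim = 1` are illustrative choices, not from the original:

```python
import copy

import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1  # illustrative, e.g. a Pendulum-like task

# Critic Q_phi(s, a): takes concatenated (state, action), returns a scalar value
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

# Actor mu_theta(s): maps a state to an action; tanh bounds actions to [-1, 1]
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())

# Targets Q_phi' and mu_theta' start as exact copies of the online networks
critic_target = copy.deepcopy(critic)
actor_target = copy.deepcopy(actor)

def explore(state, noise_std=0.1):
    """Exploration policy mu': the deterministic action plus Gaussian noise."""
    with torch.no_grad():
        return actor(state) + noise_std * torch.randn(act_dim)
```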
A single time step of the algorithm is as follows:
- For state $s_t$, select action $a_t = \mu_\theta(s_t) + \mathcal{N}_t$. Execute $a_t$ and store $(s_t, a_t, r_t, s_{t+1})$ in $\mathcal{D}$.
- Sample a minibatch $\{(s_i, a_i, r_i, s_{i+1})\}_{i=1}^N$ from $\mathcal{D}$. Each tuple's target will be $y_i = r_i + \gamma\, Q_{\phi'}\big(s_{i+1}, \mu_{\theta'}(s_{i+1})\big)$.
- Update the critic by minimizing $L(\phi) = \frac{1}{N} \sum_i \big( y_i - Q_\phi(s_i, a_i) \big)^2$.
- Update the actor with the DPG gradient $\nabla_\theta J \approx \frac{1}{N} \sum_i \nabla_a Q_\phi(s_i, a)\big|_{a = \mu_\theta(s_i)}\, \nabla_\theta \mu_\theta(s_i)$.
- Update the targets with an extremely small $\tau$: $\phi' \leftarrow \tau \phi + (1 - \tau)\, \phi'$ and $\theta' \leftarrow \tau \theta + (1 - \tau)\, \theta'$ (see the sketch after this list).
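Putting the step together, here is a minimal sketch of one update, assuming the networks from the setup sketch above and a replay buffer that yields batched tensors `(s, a, r, s_next)` with `r` shaped `(batch, 1)`; the `gamma`, `tau`, and learning-rate values are illustrative, and terminal-state masking is omitted for brevity:

```python
import torch
import torch.nn.functional as F

gamma, tau = 0.99, 0.005  # illustrative discount and Polyak factor
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def update(batch):
    s, a, r, s_next = batch  # minibatch tensors sampled from the replay buffer D

    # Targets y_i = r_i + gamma * Q_phi'(s_{i+1}, mu_theta'(s_{i+1}))
    with torch.no_grad():
        q_next = critic_target(torch.cat([s_next, actor_target(s_next)], dim=1))
        y = r + gamma * q_next

    # Critic update: minimize the mean squared error against the targets
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = F.mse_loss(q, y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: ascend the DPG gradient by minimizing -Q_phi(s, mu_theta(s))
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft target updates with the small tau
    with torch.no_grad():
        for net, target in ((critic, critic_target), (actor, actor_target)):
            for p, p_targ in zip(net.parameters(), target.parameters()):
                p_targ.mul_(1 - tau).add_(tau * p)
```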