Experience replay is a technique used in โ™Ÿ๏ธ Reinforcement Learning, usually with ๐Ÿš€ Q-Learning.

One problem with the standard Q-learning algorithm is that consecutive transition tuples $(s, a, r, s')$ are heavily correlated with each other since we collect them from the same trajectory. If we train our network on them in this order, we overfit to our current trajectory.

The solution to this is a replay buffer $\mathcal{D}$ that stores past transitions. When training, instead of using the most recent tuple, we randomly sample a batch from $\mathcal{D}$; we also periodically update the buffer with our most recent data. This disrupts the correlation, reducing variance and allowing our Q-function to generalize better. Moreover, it improves sample efficiency, since each transition can be used for multiple updates.
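
As a concrete illustration, here is a minimal sketch of a uniform replay buffer in Python; the class and method names are my own and not tied to any particular library:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of past transitions (s, a, r, s', done)."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Called after every environment step to keep the buffer fresh.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions from the same trajectory.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```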

Prioritized Experience Replay

Prioritized Experience Replay1 notes that in the original formulation, all transitions are equally likely to be sampled, regardless of their โ€œusefulnessโ€ or โ€œimportance.โ€ It would be more efficient to instead sample the important transitions more frequently.

One straightforward notion of “importance” is the transition’s TD error (from ⌛️ Temporal Difference Learning); the higher the error, the more crucial its update will be. Thus, we prioritize experience replay’s stochastic sampling by assigning higher probability to those transitions. For transition $i$, the probability we select it is

$$P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha},$$

where $\alpha$ controls the degree of prioritization ($\alpha = 0$ makes it uniform) and $p_i$ is the priority, defined as either

$$p_i = |\delta_i| + \epsilon \qquad \text{or} \qquad p_i = \frac{1}{\text{rank}(i)},$$

where $\delta_i$ is the TD error, $\epsilon$ is a small positive constant, and $\text{rank}(i)$ is the rank of transition $i$ when the buffer is sorted by $|\delta_i|$.
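
A small sketch of the proportional variant, turning TD errors into sampling probabilities; the function name is my own, and $\alpha = 0.6$ is the value reported in the paper for this variant:

```python
import numpy as np

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional priorities p_i = |delta_i| + eps, turned into
    probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    priorities = np.abs(td_errors) + eps
    scaled = priorities ** alpha      # alpha = 0 recovers uniform sampling
    return scaled / scaled.sum()

# Example: transitions with larger TD error are sampled more often.
td_errors = np.array([0.1, 2.0, 0.5, 0.05])
probs = sampling_probabilities(td_errors)
batch_idx = np.random.choice(len(td_errors), size=2, p=probs)
```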

However, since our ultimate goal in sampling is to estimate an expectation, we need to avoid biasing our prioritized estimate; we correct for this with 🪆 Importance Sampling weights

$$w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^{\beta},$$

where $N$ is the number of transitions in the buffer and $\beta$ controls how strongly the weights correct the bias; $\beta$ can be annealed from an initial $\beta_0$ to $1$ throughout training. In our weight update, we use $w_i \delta_i$ instead of just $\delta_i$.
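
A sketch of the correction step under the same assumptions as above (function names are my own); normalizing by $\max_i w_i$ and annealing from $\beta_0 = 0.4$ follow the paper’s proportional-variant setup:

```python
import numpy as np

def importance_weights(probs, batch_idx, beta):
    """w_i = (1/N * 1/P(i))^beta for the sampled indices, normalized
    by max(w) to keep update magnitudes bounded."""
    N = len(probs)
    w = (N * probs[batch_idx]) ** (-beta)
    return w / w.max()

def beta_schedule(step, total_steps, beta_0=0.4):
    # Anneal beta linearly from beta_0 toward 1 over the course of training.
    return min(1.0, beta_0 + (1.0 - beta_0) * step / total_steps)

# Example: weights for a batch drawn with the probabilities computed earlier.
probs = np.array([0.05, 0.60, 0.25, 0.10])
batch_idx = np.array([1, 2])
beta = beta_schedule(step=1_000, total_steps=10_000)
w = importance_weights(probs, batch_idx, beta)
# The TD update then scales each transition's error: w * delta, not delta alone.
```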

Footnotes

  1. Prioritized Experience Replay (Schaul et al., 2016)