Experience replay is a technique used in Reinforcement Learning, usually with Q-Learning.
One problem with the standard Q-learning algorithm is that our tuples $(s, a, r, s')$ are experienced sequentially, so consecutive updates are highly correlated, and each transition is used only once before being thrown away.

The solution to this is a replay buffer: store recent transitions in a fixed-size memory and, at each update, train on a random minibatch drawn from it. Sampling at random breaks the correlation between consecutive transitions and lets each one be reused across several updates.
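A minimal sketch of such a buffer, assuming a DQN-style setup with uniform sampling (the class and method names here are illustrative, not taken from any particular library):

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # deque evicts the oldest transition automatically once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```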
Prioritized Experience Replay
Prioritized Experience Replay [1] notes that in the original formulation, all transitions are equally likely to be sampled, regardless of their “usefulness” or “importance.” It would be more efficient to instead sample the important transitions more frequently.
One straightforward notion of “importance” is the transition’s TD error (from Temporal Difference Learning); the higher the error, the more crucial its update will be. Thus, we prioritize experience replay’s stochastic sampling by assigning higher probability to those transitions. For transition $i$, the probability of being sampled is

$$P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha},$$

where $p_i > 0$ is the priority of transition $i$ and the exponent $\alpha$ controls how strong the prioritization is ($\alpha = 0$ recovers uniform sampling). A common choice is the proportional variant

$$p_i = |\delta_i| + \epsilon,$$

where $\delta_i$ is the transition’s TD error and $\epsilon$ is a small positive constant that keeps every transition’s sampling probability nonzero.
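To make the sampling rule concrete, here is a small sketch of the proportional variant (the function name and the linear scan over priorities are illustrative; real implementations typically use a sum-tree so sampling is $O(\log N)$):

```python
import numpy as np


def prioritized_sample(td_errors, batch_size, alpha=0.6, eps=1e-6):
    """Sample indices with probability P(i) = p_i^alpha / sum_k p_k^alpha,
    where p_i = |delta_i| + eps (proportional prioritization)."""
    priorities = np.abs(td_errors) + eps
    probs = priorities ** alpha
    probs /= probs.sum()
    indices = np.random.choice(len(probs), size=batch_size, p=probs)
    return indices, probs


# Transitions with larger TD error are sampled more often.
td_errors = np.array([0.5, 2.0, 0.1, 1.2])
indices, probs = prioritized_sample(td_errors, batch_size=2)
```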
However, since our ultimate goal for sampling is to estimate an expectation, prioritized sampling introduces bias, which we correct with Importance Sampling weights

$$w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^{\beta},$$

where $N$ is the size of the replay buffer and $\beta$ controls how fully the bias is corrected ($\beta = 1$ compensates for the non-uniform probabilities exactly). In practice the weights are normalized by $1 / \max_i w_i$ so they only ever scale updates downward.
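A short sketch of this correction, with illustrative names and a toy minibatch (not tied to any specific library):

```python
import numpy as np


def importance_weights(probs, indices, buffer_size, beta=0.4):
    """Compute w_i = (1 / (N * P(i)))^beta, normalized by max_i w_i."""
    weights = (buffer_size * probs[indices]) ** (-beta)
    weights /= weights.max()  # normalized weights only scale updates down, for stability
    return weights


# Toy example: four transitions with the given sampling probabilities,
# of which transitions 1 and 3 were drawn for the current minibatch.
probs = np.array([0.15, 0.50, 0.05, 0.30])
indices = np.array([1, 3])
weights = importance_weights(probs, indices, buffer_size=4)
# The more rarely sampled transition (index 3) receives the larger weight.
```

Each sampled transition's TD update is then multiplied by its weight $w_i$, undoing the bias introduced by non-uniform sampling.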