The successor representation framework is an extension of value estimation that decouples rewards from future state occupancy. That is, it observes that

$$
V^\pi(s) = \sum_{s'} M^\pi(s, s')\, r(s'),
$$

where

$$
M^\pi(s, s') = \mathbb{E}_\pi\!\left[\, \sum_{t=0}^{\infty} \gamma^t\, \mathbb{1}[s_t = s'] \;\middle|\; s_0 = s \right]
$$

is the expected discounted number of visits to state $s'$ when starting from $s$ and following policy $\pi$.
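This factorization can be checked numerically on a small tabular example. Assuming a hypothetical 3-state Markov chain under a fixed policy (the transition matrix and rewards below are made up for illustration), the successor representation has the closed form $M = (I - \gamma P)^{-1}$, and $V = Mr$ matches the values obtained by iterating the Bellman equation. A minimal NumPy sketch:

```python
import numpy as np

# Hypothetical 3-state chain under a fixed policy (illustrative numbers).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])  # reward received in each state
gamma = 0.9

# Successor representation: M[s, s'] is the expected discounted number of
# visits to s' starting from s. In matrix form, M = sum_t (gamma P)^t
# = (I - gamma P)^{-1}.
M = np.linalg.inv(np.eye(3) - gamma * P)

# The factorization: V = M r.
V = M @ r

# Sanity check against iterative policy evaluation, V <- r + gamma P V.
V_iter = np.zeros(3)
for _ in range(500):
    V_iter = r + gamma * P @ V_iter
assert np.allclose(V, V_iter)
```

Note that each row of $M$ sums to $1/(1-\gamma)$, the total discounted "mass" of future visits, which is a quick consistency check on the matrix.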
The advantage of this factorization is that the environment dynamics are now decoupled from the actual rewards. In fact, both can be learned separately from experience: the reward can be optimized with gradient descent, and the expected visitations with Temporal Difference Learning,

$$
M(s_t, s') \leftarrow M(s_t, s') + \alpha \left( \mathbb{1}[s_t = s'] + \gamma\, M(s_{t+1}, s') - M(s_t, s') \right),
$$

with $\alpha$ the learning rate and $\gamma$ the discount factor.
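A sketch of this learning scheme on a toy tabular chain (the transition matrix, rewards, step sizes, and step counts are all illustrative assumptions, not a prescribed recipe): each row of $M$ is updated with the TD rule above, the reward estimate with a gradient step on the squared prediction error, and the resulting product $Mr$ is compared against the exact values $(I - \gamma P)^{-1} r$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain under a fixed policy (illustrative numbers).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r_true = np.array([0.0, 0.0, 1.0])
gamma = 0.9
n = len(P)

M = np.zeros((n, n))  # successor-representation estimate
r = np.zeros(n)       # reward estimate

s = 0
for t in range(200_000):
    # Crude step-size schedule: large steps first, small steps to settle.
    alpha = 0.1 if t < 150_000 else 0.01
    s_next = rng.choice(n, p=P[s])
    indicator = np.eye(n)[s]  # 1[s_t = s'] for every s'
    # TD update on the whole row: target = 1[s_t = s'] + gamma M(s_{t+1}, s').
    M[s] += alpha * (indicator + gamma * M[s_next] - M[s])
    # Gradient-descent step on the squared reward-prediction error.
    r[s] += alpha * (r_true[s] - r[s])
    s = s_next

V_td = M @ r                                           # value via the SR factorization
V_exact = np.linalg.solve(np.eye(n) - gamma * P, r_true)
print(np.max(np.abs(V_td - V_exact)))                  # small approximation error
```

Because the two estimates are decoupled, changing `r_true` (e.g. moving the reward to another state) only requires relearning the cheap reward vector; the learned $M$ can be reused as-is, which is the main appeal of the representation.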