The successor representation framework is an extension of value estimation that decouples rewards from future state occupancies. That is, it observes that

$$V^\pi(s) = \sum_{s'} M^\pi(s, s')\, r(s')$$

where $r$ contains the reward for each state and $M^\pi$ contains the expected discounted future state occupancies. Combining them together, we get the expected future reward, the exact definition of a value function.
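To make the factorization concrete, here is a minimal sketch with made-up numbers for a three-state environment (the matrix `M` and vector `r` below are purely illustrative, not derived from any particular MDP):

```python
import numpy as np

# Illustrative successor representation for 3 states:
# M[s, s'] = expected discounted number of future visits to s' starting from s.
M = np.array([
    [1.0, 0.9, 0.81],
    [0.0, 1.0, 0.9],
    [0.0, 0.0, 1.0],
])

# Reward attached to each state.
r = np.array([0.0, 0.0, 1.0])

# The value function is just the matrix-vector product: V(s) = sum_{s'} M[s, s'] * r[s'].
V = M @ r
print(V)  # -> [0.81, 0.9, 1.0]
```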

The advantage of this factorization is that the environment dynamics are now decoupled from the actual rewards. In fact, both can be learned separately from experience: the reward function can be learned by regression with gradient descent, and the expected discounted visitations with ⌛️ Temporal Difference Learning,

$$M(s_t, s') \leftarrow M(s_t, s') + \alpha \Big( \mathbb{1}[s_t = s'] + \gamma\, M(s_{t+1}, s') - M(s_t, s') \Big)$$
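A minimal tabular sketch of this update, assuming we can sample transitions $(s_t, s_{t+1})$ from a fixed policy; the toy chain, step size, discount, and number of sweeps below are placeholders:

```python
import numpy as np

def td_update_sr(M, s, s_next, alpha=0.1, gamma=0.95):
    """One TD update of the successor representation for an observed transition s -> s_next."""
    # Indicator vector: 1 at the current state, 0 elsewhere.
    indicator = np.zeros(M.shape[0])
    indicator[s] = 1.0
    # TD error: immediate occupancy plus discounted successor row of the next state,
    # minus the current estimate.
    td_error = indicator + gamma * M[s_next] - M[s]
    M[s] += alpha * td_error
    return M

# Toy example: a 3-state chain 0 -> 1 -> 2 -> 2 under a fixed policy.
n_states = 3
M = np.zeros((n_states, n_states))
transitions = [(0, 1), (1, 2), (2, 2)]
for _ in range(2000):
    for s, s_next in transitions:
        M = td_update_sr(M, s, s_next)
print(np.round(M, 2))
```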

With $M$ learned, we can recompute values whenever the reward changes in our environment, thus enabling transfer.
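As a sketch of that transfer, assume a fixed policy on a small chain; here `M` is obtained from the closed-form identity $M = (I - \gamma P)^{-1}$ for the policy's transition matrix $P$ (a standard fact about the successor representation, used here only for illustration and not stated above). Values for a changed reward then follow from a single matrix-vector product, with no re-learning:

```python
import numpy as np

gamma = 0.95
# Transition matrix of a fixed policy on a 3-state chain: 0 -> 1 -> 2 -> 2.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])

# Successor representation of this policy (closed form, for illustration only).
M = np.linalg.inv(np.eye(3) - gamma * P)

# Original task: reward only in state 2.
r_old = np.array([0.0, 0.0, 1.0])
V_old = M @ r_old

# The reward changes (e.g., the goal moves to state 1): values are recomputed instantly.
r_new = np.array([0.0, 1.0, 0.0])
V_new = M @ r_new

print(np.round(V_old, 2), np.round(V_new, 2))
```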