The Bellman equations define a recursive property of the value functions:
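In the standard notation (assumed here, not defined in this section), $\pi(a \mid s)$ is the policy, $p(s', r \mid s, a)$ the transition dynamics, and $\gamma$ the discount factor; the Bellman expectation equations then read:

$$
v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma\, v_\pi(s')\big]
$$

$$
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\Big[r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')\Big]
$$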
The recursion relates the value of each state to the values of its successor states, and a policy's value functions are always solutions to its Bellman equations.
Bellman Optimality Equation
Value functions allow us to objectively rank policies, and therefore we can define an optimal policy.
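One standard way to make this precise (the ordering and the $*$ notation follow the usual convention and are assumed here): a policy $\pi$ is at least as good as $\pi'$ if its value is no worse in every state, and the optimal value functions are the best achievable values:

$$
\pi \ge \pi' \iff v_\pi(s) \ge v_{\pi'}(s) \ \text{for all } s,
\qquad
v_*(s) = \max_\pi v_\pi(s),
\qquad
q_*(s, a) = \max_\pi q_\pi(s, a)
$$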
We can apply the Bellman equations above to the optimal value functions, but since an optimal policy always chooses the best action, the expectation over actions collapses to a max. Thus, we have the optimality equations:
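A sketch of these equations in the same notation as above (again assumed, not defined in this section):

$$
v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma\, v_*(s')\big]
$$

$$
q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma \max_{a'} q_*(s', a')\big]
$$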
An exhaustive search over the entire MDP tree would allow us to solve this system of equations for the optimal value functions, which in turn gives us the optimal policy. However, this is almost never possible in practice due to computational cost, non-Markovian states, or unknown dynamics; many algorithms therefore aim to approximate the solution, which is why the optimality equation shows up all across reinforcement learning.
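To make the "solve the system of equations" idea concrete, here is a minimal value-iteration sketch on a toy, fully known MDP. The `transitions` table, its probabilities and rewards, and the helper names are all made up for illustration; the point is just the repeated Bellman optimality backup.

```python
# A toy, fully known MDP for illustration: 3 states, 2 actions.
# transitions[s][a] is a list of (probability, next_state, reward) tuples.
# The numbers are invented purely to demonstrate the update rule.
transitions = {
    0: {0: [(1.0, 1, 0.0)],                1: [(1.0, 2, 0.0)]},
    1: {0: [(0.8, 2, 1.0), (0.2, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    2: {0: [(1.0, 2, 0.0)],                1: [(1.0, 0, 5.0)]},
}
gamma = 0.9  # discount factor


def value_iteration(transitions, gamma, tol=1e-8):
    """Repeatedly apply the Bellman optimality backup until the values stop changing."""
    v = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            # v*(s) = max_a  sum_{s', r} p(s', r | s, a) [r + gamma * v*(s')]
            best = max(
                sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
                for outcomes in transitions[s].values()
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def greedy_policy(v, transitions, gamma):
    """Extract a policy by acting greedily with respect to the converged values."""
    return {
        s: max(
            transitions[s],
            key=lambda a: sum(p * (r + gamma * v[s2]) for p, s2, r in transitions[s][a]),
        )
        for s in transitions
    }


v_star = value_iteration(transitions, gamma)
print(v_star, greedy_policy(v_star, transitions, gamma))
```

When the dynamics are unknown or the state space is too large for such a sweep, methods like Q-learning estimate the same fixed point from sampled experience instead.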