The Bellman equations define a recursive property of the value functions.
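Writing the policy as $\pi(a \mid s)$, the one-step dynamics as $p(s', r \mid s, a)$, and the discount factor as $\gamma$ (the symbols are assumed here), the state-value form is

$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma\, v_\pi(s')\big],$$

and the action-value form is

$$q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\Big[r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')\Big].$$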

The recursion relates the value of a state to the values of its successor states, and a policy's value functions are always solutions to the corresponding Bellman equations.

Bellman Optimality Equation

Value functions allow us to objectively rank policies, and therefore we can define an optimal policy as one that gives maximum value. Using this ranking, we get the optimal state-value function $v_*$ and optimal action-value function $q_*$, defined for all $s \in \mathcal{S}$ and $a \in \mathcal{A}$.
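Concretely, in the notation above, these are the maxima over all policies:

$$v_*(s) = \max_\pi v_\pi(s), \qquad q_*(s, a) = \max_\pi q_\pi(s, a).$$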

We can apply the Bellman equations above to them, but since our policy is optimal, it always chooses the best action. Thus, we have the optimality equations.
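In the same notation as before, taking the maximum over actions instead of averaging under the policy gives

$$v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma\, v_*(s')\big],$$

$$q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma \max_{a'} q_*(s', a')\big].$$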

An exhaustive search over the entire MDP tree would allow us to solve the system of equations for the optimal value functions, which would give us the optimal policy. However, this is almost never possible due to computational costs, non-Markovian states, or unknown dynamics; many algorithms instead aim to approximate the solution, and thus the optimality equation shows up all across reinforcement learning.
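To make this concrete, below is a minimal sketch of value iteration on a small tabular MDP, which approximates $v_*$ by repeatedly applying the Bellman optimality backup until the values stop changing. The arrays `P` and `R` and the tiny two-state MDP are hypothetical, chosen only to illustrate the update.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Approximate v_* by iterating the Bellman optimality backup.

    P: transition probabilities, shape (S, A, S'), P[s, a, s'] = p(s' | s, a)
    R: expected immediate rewards, shape (S, A)
    """
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    while True:
        # q(s, a) = R[s, a] + gamma * sum_{s'} P[s, a, s'] * v[s']
        q = R + gamma * (P @ v)        # shape (S, A)
        v_new = q.max(axis=1)          # v(s) = max_a q(s, a)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    policy = q.argmax(axis=1)          # greedy policy w.r.t. the approximate q_*
    return v, q, policy

# Hypothetical 2-state, 2-action MDP, just to exercise the code.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
v_star, q_star, pi_star = value_iteration(P, R)
print(v_star, pi_star)
```

Each sweep replaces $v(s)$ with $\max_a \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma\, v(s')\big]$, so a fixed point of the loop is exactly a solution of the Bellman optimality equation above.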