Value iteration is a reinforcement learning algorithm that estimates the Q-function and value function using the environment's dynamics. The algorithm performs the following two steps for every state $s$ and action $a$:

- Set $Q(s, a) \leftarrow r(s, a) + \gamma \, \mathbb{E}_{s' \sim p(s' \mid s, a)}[V(s')]$
- Set $V(s) \leftarrow \max_a Q(s, a)$
An alternative formulation merges the two steps, repeating a single update instead:

$$V(s) \leftarrow \max_a \left( r(s, a) + \gamma \, \mathbb{E}_{s' \sim p(s' \mid s, a)}[V(s')] \right)$$
This method implicitly finds a deterministic optimal policy, since the second step assumes that our policy takes the best action, following $\pi(s) = \arg\max_a Q(s, a)$.
We can interpret value iteration as a simpler version of ♻️ Policy Iteration where we merge the policy improvement step into policy evaluation. It also follows the same convergence guarantee for the tabular case; the intuition is that each update contracts the distance between our current value function $V$ and the optimal value function $V^\star$.
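As a concrete illustration, here is a minimal NumPy sketch of the tabular algorithm. The MDP encoding (a transition tensor `P` and a reward matrix `R`) and the stopping threshold are illustrative assumptions, not part of the notes above:

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-6):
    """Tabular value iteration on a toy MDP.

    P: transition probabilities, shape (S, A, S), P[s, a, s2] = p(s2 | s, a)
    R: rewards, shape (S, A), R[s, a] = r(s, a)
    """
    num_states, num_actions, _ = P.shape
    V = np.zeros(num_states)
    while True:
        # Step 1: Q(s, a) <- r(s, a) + gamma * E_{s'}[V(s')]
        Q = R + gamma * (P @ V)  # shape (S, A)
        # Step 2: V(s) <- max_a Q(s, a)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # The implicit deterministic policy: pi(s) = argmax_a Q(s, a)
    return V_new, Q.argmax(axis=1)
```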
Fitted Value Iteration
While the above assumes the tabular case, where $V(s)$ can be stored exactly for every discrete state, most interesting problems have state spaces far too large to enumerate. Fitted value iteration instead represents the value function with a neural network $V_\phi$ and repeats the following two steps over sampled states $s_i$ (see the sketch after this list):

- Set $y_i \leftarrow \max_{a_i} \left( r(s_i, a_i) + \gamma \, \mathbb{E}[V_\phi(s_i')] \right)$
- Set $\phi \leftarrow \arg\min_\phi \frac{1}{2} \sum_i \left\| V_\phi(s_i) - y_i \right\|^2$
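Here is a minimal PyTorch sketch of one round of these two steps. The network architecture, the 4-dimensional state, the learning rate, and the use of a single sampled next state per action in place of the expectation are all illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical value network V_phi for a 4-dimensional continuous state.
value_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
gamma = 0.99

def fitted_value_iteration_step(states, rewards, next_states):
    """One round of fitted value iteration.

    states:      shape (N, 4)    -- sampled states s_i
    rewards:     shape (N, A)    -- r(s_i, a) for each candidate action a
    next_states: shape (N, A, 4) -- one sampled s' per (s_i, a), standing in
                                    for the expectation E[V_phi(s')]
    """
    # Step 1: y_i = max_a ( r(s_i, a) + gamma * V_phi(s') )
    with torch.no_grad():
        next_values = value_net(next_states).squeeze(-1)  # (N, A)
        y = (rewards + gamma * next_values).max(dim=1).values
    # Step 2: regress V_phi(s_i) onto the targets y_i
    loss = 0.5 * ((value_net(states).squeeze(-1) - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```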
Unfortunately, when we introduce a neural network, our convergence guarantee no longer holds. Our neural network effectively restricts the space of possible value functions, and projecting each Bellman backup onto this restricted class is no longer a contraction, so repeated updates may oscillate or diverge.
Fitted Q-Iteration
Value iteration and fitted value iteration both require knowing the transition probabilities $p(s' \mid s, a)$ to compute the expectation over next states. Fitted Q-iteration drops this requirement: it learns the Q-function directly from sampled transitions $(s_i, a_i, s_i', r_i)$, repeating the following two steps (sketched in code below):
- Set $y_i \leftarrow r(s_i, a_i) + \gamma \, \mathbb{E}[V_\phi(s_i')]$, where $V_\phi(s_i') = \max_{a'} Q_\phi(s_i', a')$
- Set $\phi \leftarrow \arg\min_\phi \frac{1}{2} \sum_i \left\| Q_\phi(s_i, a_i) - y_i \right\|^2$
Note that in the first step, we approximate the expectation $\mathbb{E}[V_\phi(s_i')]$ using a single sample $s_i'$ from the environment.
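A minimal PyTorch sketch of one round follows. The 4-dimensional state, the two discrete actions, and the network/optimizer settings are illustrative assumptions; the `max` over the Q-network's outputs is the single-sample target described above:

```python
import torch
import torch.nn as nn

NUM_ACTIONS = 2  # assumed discrete action space

# Hypothetical Q-network Q_phi that outputs one value per action.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def fitted_q_iteration_step(states, actions, rewards, next_states):
    """One round of fitted Q-iteration on sampled transitions (s, a, r, s').

    states, next_states: shape (N, 4); actions: LongTensor (N,); rewards: (N,)
    """
    # Step 1: y_i = r_i + gamma * max_a' Q_phi(s_i', a'),
    # a single-sample estimate of E[V_phi(s_i')]
    with torch.no_grad():
        y = rewards + gamma * q_net(next_states).max(dim=1).values
    # Step 2: regress Q_phi(s_i, a_i) onto the targets y_i
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = 0.5 * ((q_sa - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```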