Least-squares TD is a ✏️ Linear Function Approximation solution to the standard temporal difference formula. This solution has an analytical closed form, but we derive it by first analyzing the gradient update step. For ease of notation, let the encoding of state $s_t$ be $\mathbf{x}_t \doteq \mathbf{x}(s_t)$ and consider the following semi-gradient TD(0) update:

$$\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \left( R_{t+1} + \gamma \mathbf{w}_t^\top \mathbf{x}_{t+1} - \mathbf{w}_t^\top \mathbf{x}_t \right) \mathbf{x}_t$$

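To make the update concrete, here is a minimal sketch of one such step in Python; the function name, NumPy usage, and default step sizes are illustrative assumptions rather than anything from the original text.

```python
import numpy as np

def td0_linear_update(w, x_t, r_next, x_next, alpha=0.01, gamma=0.99):
    """One semi-gradient TD(0) step for a linear value function v(s) = w @ x(s).

    w      : current weight vector, shape (d,)
    x_t    : feature vector x_t of the current state
    r_next : reward R_{t+1}
    x_next : feature vector x_{t+1} of the next state
    """
    td_error = r_next + gamma * (w @ x_next) - (w @ x_t)  # TD error delta_t
    return w + alpha * td_error * x_t                      # step along the feature vector x_t
```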
In steady state (expectation over all possible TD updates), we expect

$$\mathbb{E}\left[ \mathbf{w}_{t+1} \mid \mathbf{w}_t \right] = \mathbf{w}_t + \alpha \left( \mathbf{b} - \mathbf{A} \mathbf{w}_t \right),$$

where $\mathbf{b} \doteq \mathbb{E}\left[ R_{t+1} \mathbf{x}_t \right]$ and $\mathbf{A} \doteq \mathbb{E}\left[ \mathbf{x}_t \left( \mathbf{x}_t - \gamma \mathbf{x}_{t+1} \right)^\top \right]$ follow from the equation above. Once $\mathbf{w}_t$ converges to $\mathbf{w}_{TD}$, there are no updates, so

$$\mathbf{b} - \mathbf{A} \mathbf{w}_{TD} = \mathbf{0}.$$

Solving, we derive the equation

$$\mathbf{w}_{TD} = \mathbf{A}^{-1} \mathbf{b},$$

which is called the TD fixed point: the parameters to which linear semi-gradient TD converges.

It's possible to arrive at this solution through both gradient descent and calculating the $\mathbf{A}$ matrix and $\mathbf{b}$ vector directly. The latter is least-squares TD, and we approximate both quantities by sampling; at time step $t$ in our episode, we have

$$\widehat{\mathbf{A}}_t \doteq \sum_{k=0}^{t-1} \mathbf{x}_k \left( \mathbf{x}_k - \gamma \mathbf{x}_{k+1} \right)^\top + \varepsilon \mathbf{I}, \qquad \widehat{\mathbf{b}}_t \doteq \sum_{k=0}^{t-1} R_{k+1} \mathbf{x}_k,$$

where the $\varepsilon \mathbf{I}$ term (for some small $\varepsilon > 0$) keeps $\widehat{\mathbf{A}}_t$ invertible. This gives us

$$\mathbf{w}_t \doteq \widehat{\mathbf{A}}_t^{-1} \widehat{\mathbf{b}}_t.$$
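A minimal batch sketch of this estimator in Python, assuming NumPy and a list of sampled transitions given as `(x_t, r_next, x_next)` feature tuples (the names and structure are illustrative, not from the original text):

```python
import numpy as np

def lstd(transitions, d, gamma=0.99, eps=1e-3):
    """Least-squares TD: accumulate A_hat and b_hat from samples, then solve for w.

    transitions : iterable of (x_t, r_next, x_next) with feature vectors of shape (d,)
    d           : feature dimension
    eps         : scale of the eps*I regularizer that keeps A_hat invertible
    """
    A_hat = eps * np.eye(d)
    b_hat = np.zeros(d)
    for x_t, r_next, x_next in transitions:
        A_hat += np.outer(x_t, x_t - gamma * x_next)  # x_k (x_k - gamma x_{k+1})^T
        b_hat += r_next * x_t                          # R_{k+1} x_k
    return np.linalg.solve(A_hat, b_hat)               # w_t = A_hat^{-1} b_hat
```

In practice the inverse is usually maintained incrementally (e.g., via the Sherman-Morrison formula), so each step costs $O(d^2)$ rather than re-solving a $d \times d$ system from scratch.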