When we need function approximation in the case of large, complex state spaces, we want to find a parameterization that balances complexity and expressiveness. On the least complex end of the spectrum is linear approximation, which simply applies an inner product between the input and parameters; for a value function estimate, we have

$$\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s),$$

where $\mathbf{x}(s)$ is some feature vector of state $s$ and $\mathbf{w}$ is our parameter vector.

While linear approximation is often impractical for complex problems, it does offer useful properties for theoretical analysis. The gradient of our function is

$$\nabla \hat{v}(s, \mathbf{w}) = \mathbf{x}(s),$$

so our gradient update is

$$\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \big[ U_t - \hat{v}(S_t, \mathbf{w}_t) \big] \mathbf{x}(S_t),$$

where $U_t$ is some estimate of the true value $v_\pi(S_t)$. In this setting, there is only one optimum, so methods that guarantee convergence will always arrive at the optimal linear approximation.
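The update above can be sketched in a few lines; this is a minimal illustration (the function names, toy features, and target value are my own, not from the source), showing that for a linear approximator the gradient term is just the feature vector:

```python
import numpy as np

def linear_value(w, x_s):
    """Linear value estimate: inner product of weights and state features."""
    return w @ x_s

def gradient_update(w, x_s, target, alpha):
    """One gradient step toward a target U_t.

    Since the gradient of the linear estimate w.r.t. w is just x(s),
    the update is w + alpha * (U_t - v_hat) * x(s).
    """
    return w + alpha * (target - linear_value(w, x_s)) * x_s

# Toy usage: two-dimensional features for a single state, target value 1.0.
w = np.zeros(2)
x_s = np.array([1.0, 0.5])
for _ in range(100):
    w = gradient_update(w, x_s, target=1.0, alpha=0.1)
# The estimate for this state converges toward the target.
```

With a fixed target this is just stochastic gradient descent on a squared error, which is why the single-optimum convergence argument goes through.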

Feature Construction

Since linear function approximation only uses the inner product, the expressiveness of our state feature vector is crucial to generalization.

  1. Polynomial features across state dimensions allow us to capture interactions between components of the state.
  2. Fourier basis decomposes signals into periodic functions.
  3. Coarse coding produces a binary in-or-out vector with respect to some basis; the feature for some component is $1$ if the state is in the designated region and $0$ otherwise.
  4. Tile coding is a specific type of coarse coding that breaks the state space into multiple tile grids.
  5. Radial basis functions, commonly used in 🥢 Generalized Linear Models, are a soft version of coarse coding that measures similarity with Gaussian kernel functions.