EM is a general algorithm for training latent variable models; it alternates between the expectation (E) and maximization (M) steps, iteratively improving the model.

In each iteration, we first estimate the hidden variables given the current model, then re-estimate the model given those hidden variables. The formal procedure is as follows.

Given training data $X$ and a model with hidden variables $Z$ and parameters $\theta$, iterate until convergence.

  1. E-step estimates $P(Z \mid X, \theta)$, the probability of the hidden variables taking each possible value, to find the expected value of each hidden variable in $Z$.
  2. M-step estimates $\theta$ using the probabilities of $Z$ calculated above, and finds the MLE or MAP estimate of the parameters.
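
A minimal NumPy sketch of these two steps for a one-dimensional Gaussian mixture may make the loop concrete; the choice of model, the function name `em_gmm_1d`, and all hyperparameters here are illustrative assumptions, not something specified above.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iters=50, seed=0):
    """Fit a 1-D Gaussian mixture by EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize theta = (mixture weights, means, variances).
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    for _ in range(n_iters):
        # E-step: responsibilities r[i, j] = P(z_i = j | x_i, theta),
        # computed in log space for numerical stability.
        log_p = (np.log(w)
                 - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form MLE of theta given the expected assignments.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy usage: recover two well-separated clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
print(em_gmm_1d(x))
```

The E-step only computes expectations under the current parameters; all parameter updates happen in the M-step, which is what lets each half of the loop be solved in closed form here.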

At its core, EM is an instance of variational inference using the 🧬 Evidence Lower Bound (ELBO):

  1. E-step estimates $q(Z)$, an approximation of the actual posterior $P(Z \mid X, \theta)$; EM sets $q$ to this posterior exactly, which makes the ELBO tight to the true log-likelihood.
  2. M-step maximizes the ELBO with respect to $\theta$, holding $q$ fixed.
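
Written out (using $\mathcal{L}$ for the ELBO and $q$ for the variational distribution, a standard but here assumed notation), the decomposition behind these two steps is:

$$
\log P(X \mid \theta) = \underbrace{\mathbb{E}_{q(Z)}\!\left[\log \frac{P(X, Z \mid \theta)}{q(Z)}\right]}_{\mathcal{L}(q,\,\theta)\ \text{(ELBO)}} + \mathrm{KL}\big(q(Z) \,\big\|\, P(Z \mid X, \theta)\big)
$$

Because the KL term is nonnegative and zero exactly when $q(Z) = P(Z \mid X, \theta)$, the E-step makes the bound tight, and the M-step's increase of $\mathcal{L}$ is then also an increase of $\log P(X \mid \theta)$.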