A mixture model uses multiple component distributions to fit a complex overall distribution. We'll focus on a mixture of two Normal distributions, which looks like this:

$$p(x) = \pi \, \mathcal{N}(x \mid \mu_1, \sigma_1^2) + (1 - \pi) \, \mathcal{N}(x \mid \mu_2, \sigma_2^2)$$

This is akin to two 🛎️ Normal Models where we randomly draw from each one according to the mixing probability $\pi$: with probability $\pi$ a draw comes from the first component, and with probability $1 - \pi$ from the second.

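To make the sampling story concrete, here is a minimal sketch of drawing from this mixture: flip a $\pi$-weighted coin for each datapoint, then sample the chosen component. The function name and the example parameter values are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, pi, mu1, sigma1, mu2, sigma2):
    """Draw n samples from a two-component Normal mixture."""
    # For each draw, pick the first component with probability pi.
    from_first = rng.random(n) < pi
    return np.where(
        from_first,
        rng.normal(mu1, sigma1, n),
        rng.normal(mu2, sigma2, n),
    )

# Example: 1,000 draws from a mixture with arbitrary example parameters.
x = sample_mixture(1000, pi=0.3, mu1=-2.0, sigma1=1.0, mu2=3.0, sigma2=1.5)
```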
The likelihood for the full parameter set $\theta = (\pi, \mu_1, \sigma_1, \mu_2, \sigma_2)$ is

$$L(\theta \mid x) = \prod_{i=1}^{n} \left[ \pi \, \mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + (1 - \pi) \, \mathcal{N}(x_i \mid \mu_2, \sigma_2^2) \right]$$

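This observed-data likelihood is straightforward to evaluate numerically on the log scale. A minimal sketch, assuming SciPy's `norm.pdf` for the Normal density (the function name is my own):

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, pi, mu1, sigma1, mu2, sigma2):
    """Observed-data log likelihood of the two-component mixture."""
    # Each datapoint's density is the pi-weighted sum of the two components.
    density = pi * norm.pdf(x, mu1, sigma1) + (1 - pi) * norm.pdf(x, mu2, sigma2)
    return np.log(density).sum()
```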
This likelihood is very complicated because each datapoint could have come from either Normal component. To make this simpler, we can introduce indicator variables

$$z_i = \begin{cases} 1 & \text{if } x_i \text{ came from the first component} \\ 0 & \text{if } x_i \text{ came from the second component} \end{cases}$$

Incorporating $z_i$, we can rewrite our likelihood as

$$L(\theta \mid x, z) = \prod_{i=1}^{n} \left[ \pi \, \mathcal{N}(x_i \mid \mu_1, \sigma_1^2) \right]^{z_i} \left[ (1 - \pi) \, \mathcal{N}(x_i \mid \mu_2, \sigma_2^2) \right]^{1 - z_i}$$

However, we only observe $x$ and not $z$; luckily, the 🎉 Expectation Maximization algorithm serves to "fill in" this missing data. The steps are as follows:

  1. Start with initial values for $\theta = (\pi, \mu_1, \sigma_1, \mu_2, \sigma_2)$.
  2. Expectation: calculate the expected value of each $z_i$ given the current parameters, $\hat{z}_i = E[z_i \mid x_i, \theta] = \dfrac{\pi \, \mathcal{N}(x_i \mid \mu_1, \sigma_1^2)}{\pi \, \mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + (1 - \pi) \, \mathcal{N}(x_i \mid \mu_2, \sigma_2^2)}$.
  3. Maximization: plug $\hat{z}_i$ into the likelihood and find new values of $\theta$ that maximize it; note that it can be easier to instead maximize the log likelihood.
  4. Go back to step 2 and repeat until convergence (see the sketch after this list).
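
Putting the E-step and M-step together, here is a minimal NumPy/SciPy sketch of the loop. The function name, the starting values it expects, and the fixed iteration count standing in for a proper convergence check are my own assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm

def em_two_normals(x, pi, mu1, sigma1, mu2, sigma2, n_iter=100):
    """EM point estimates for a two-component Normal mixture."""
    for _ in range(n_iter):  # a real stopping rule would watch the log likelihood
        # E-step: expected value of each z_i given the current parameters.
        p1 = pi * norm.pdf(x, mu1, sigma1)
        p2 = (1 - pi) * norm.pdf(x, mu2, sigma2)
        z_hat = p1 / (p1 + p2)

        # M-step: closed-form maximizers of the expected complete-data
        # log likelihood, i.e. z_hat-weighted means and variances.
        pi = z_hat.mean()
        mu1 = np.average(x, weights=z_hat)
        mu2 = np.average(x, weights=1 - z_hat)
        sigma1 = np.sqrt(np.average((x - mu1) ** 2, weights=z_hat))
        sigma2 = np.sqrt(np.average((x - mu2) ** 2, weights=1 - z_hat))
    return pi, mu1, sigma1, mu2, sigma2
```

For this model the M-step has a closed form: $\pi$ becomes the mean responsibility $\hat{z}_i$, and the component means and variances become responsibility-weighted averages, which is why no numerical optimizer appears in the sketch.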

The EM algorithm converges to a fixed set of values, providing only point estimates for our parameters. TBD: how to obtain a full posterior distribution.