Gaussian mixture models structure datapoints into $k$ groups, or clusters. Each point $x_i$ is assigned a probability distribution over clusters $P(z_i \mid x_i)$, where $z_i$ is the cluster assignment for $x_i$; in other words, for each point $x_i$, we maintain the probabilities of belonging to each cluster.
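As a minimal illustration with made-up numbers (the array name and values are assumptions, not from the source), these soft assignments can be stored as a matrix of per-point cluster probabilities:

```python
import numpy as np

# Hypothetical soft assignments for 3 points and k = 2 clusters (made-up numbers).
# Row i holds P(z_i = j | x_i) for each cluster j, so each row sums to 1.
responsibilities = np.array([
    [0.9, 0.1],   # point 0 almost certainly belongs to cluster 0
    [0.2, 0.8],   # point 1 leans toward cluster 1
    [0.5, 0.5],   # point 2 is ambiguous
])
assert np.allclose(responsibilities.sum(axis=1), 1.0)
```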

Info

Gaussian mixture models are a form of soft clustering as opposed to the hard clustering in 🎒 K-Means Clustering.

Each cluster, represented by a 👑 Gaussian distribution, is defined by a centroid $\mu_j$, covariance matrix $\Sigma_j$, and size $\pi_j$; size is the probability a sample is drawn from mixture $j$, and the sum of all cluster sizes is $\sum_j \pi_j = 1$.

In a generative sense, our data is generated from $k$ Gaussians,

$$p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)$$

where $\sum_j \pi_j = 1$ and $\mathcal{N}(x \mid \mu_j, \Sigma_j)$ gives us the probability of drawing $x$ from the $j$th Gaussian.
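A minimal sketch of this generative story in NumPy; the parameter values here are illustrative assumptions, not anything prescribed by the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for k = 2 mixtures in 2 dimensions (made up).
pi = np.array([0.3, 0.7])                       # mixture sizes, sum to 1
mu = np.array([[0.0, 0.0], [4.0, 4.0]])         # centroids
sigma = np.array([np.eye(2), 0.5 * np.eye(2)])  # covariance matrices

def sample(n):
    """Draw n points: choose mixture j with probability pi_j, then draw x ~ N(mu_j, Sigma_j)."""
    z = rng.choice(len(pi), size=n, p=pi)                                # latent cluster assignments
    x = np.array([rng.multivariate_normal(mu[j], sigma[j]) for j in z])  # per-point Gaussian draws
    return x, z

X, z = sample(500)
```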

We always optimize centroids $\mu_j$, but sizes $\pi_j$ and covariances $\Sigma_j$ can be held constant or calculated during optimization. If preset, $\pi_j = \frac{1}{k}$. The covariance can be fully flexible, or restricted to diagonal or spherical.
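These restrictions map onto, for example, the `covariance_type` option of scikit-learn's `GaussianMixture` (a rough sketch; the data here is a random placeholder):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))  # placeholder data; use real points in practice

# covariance_type: "full" = fully flexible per mixture, "diag" = diagonal, "spherical" = one variance per mixture
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
gmm.fit(X)
```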

Info

Note that this equation is incredibly similar to 👶 Naive Bayes. If Naive Bayes lets each $P(x_d \mid z)$ be a Gaussian distribution (instead of discrete), we get a Gaussian mixture with independent features (diagonal covariance $\Sigma_j$).

Training

Gaussian mixtures use the 🎉 Expectation Maximization algorithm to optimize their mixtures. We first find the mixture distribution $P(z_i = j \mid x_i)$ for each point, then recalculate the parameters of each mixture. This optimization is equivalent to maximizing the log likelihood

$$\log L = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j).$$
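A sketch of this log likelihood in NumPy/SciPy, using the same parameter names as above (the function name is an assumption):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mu, sigma):
    """log L = sum_i log sum_j pi_j * N(x_i | mu_j, Sigma_j)."""
    # densities[i, j] = pi_j * N(x_i | mu_j, Sigma_j)
    densities = np.stack(
        [p * multivariate_normal.pdf(X, mean=m, cov=s) for p, m, s in zip(pi, mu, sigma)],
        axis=1,
    )
    return np.log(densities.sum(axis=1)).sum()
```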

Algorithm

Given training data $\{x_i\}_{i=1}^{n}$, randomly choose $\mu_j$, $\Sigma_j$, and $\pi_j$ (assuming we let sizes and variances vary).

Alternate until convergence.

  1. For each data point $x_i$, estimate the responsibilities $r_{ij} = P(z_i = j \mid x_i) = \frac{\pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l} \pi_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$.
  2. Calculate new parameters for each mixture:
     $$\pi_j = \frac{1}{n} \sum_{i} r_{ij}, \qquad \mu_j = \frac{\sum_{i} r_{ij} x_i}{\sum_{i} r_{ij}}, \qquad \Sigma_j = \frac{\sum_{i} r_{ij} (x_i - \mu_j)(x_i - \mu_j)^\top}{\sum_{i} r_{ij}}$$
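A minimal NumPy sketch of one such iteration, under the fully-flexible covariance setting (function and variable names are assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, sigma):
    """One EM iteration: the E-step computes responsibilities, the M-step re-estimates parameters."""
    n, d = X.shape
    k = len(pi)

    # E-step: r[i, j] = P(z_i = j | x_i), proportional to pi_j * N(x_i | mu_j, Sigma_j).
    r = np.stack(
        [pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j]) for j in range(k)],
        axis=1,
    )
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate sizes, centroids, and covariances from the responsibilities.
    nk = r.sum(axis=0)                 # effective number of points assigned to each mixture
    pi_new = nk / n
    mu_new = (r.T @ X) / nk[:, None]
    sigma_new = np.empty((k, d, d))
    for j in range(k):
        diff = X - mu_new[j]
        sigma_new[j] = (r[:, j, None] * diff).T @ diff / nk[j]
    return pi_new, mu_new, sigma_new
```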

Info

Intuitively, the E-step calculates cluster assignments, and the M-step finds the most likely parameters based on those assignments.

Prediction

Given point $x$, calculate $P(z = j \mid x)$ for each mixture $j$; this gives a probability distribution over clusters, which is a soft classification (to get the hard classification, take the class with the highest probability).
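For instance, with a fitted scikit-learn `GaussianMixture` (`gmm` from the sketch above) and new points `X_new`, both classifications are available directly:

```python
# Assumes `gmm` is a fitted GaussianMixture and X_new is an (m, d) array of new points.
probs = gmm.predict_proba(X_new)   # soft classification: P(z = j | x) for each mixture j
labels = probs.argmax(axis=1)      # hard classification: cluster with the highest probability
# Equivalently, gmm.predict(X_new) returns the hard assignments directly.
```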