Theory

K-Means groups datapoints into distinct groups, called clusters. Each cluster is defined by its centroid . We optimize the clusters using ๐ŸŽ‰ Expectation Maximization where we treat the cluster assignment as our hidden state .

We implicitly minimize the loss function

where when belongs in cluster ; in other words, we minimize the distance between each point and the centroid of the cluster it belongs to. This can also be interpreted as reconstruction loss if we approximate points by their centroid.

Info

For a softer clustering, we can turn to ๐Ÿ“ผ Gaussian Mixture Models that model the probability of belonging to cluster rather than a simple or .

E-step finds the best for each point, assigning each point to its closest centroid. M-step finds the best for each cluster, recalculating the centroid to be the average of the clusterโ€™s points.

Model

K-Means tracks the centroids of the clusters; cluster assignment can be calculated for each point by finding its closest centroid.

Training

Given training data , pick centroids at random, usually points from that are spread out.

Alternate until convergence.

  1. Assign each point to its nearest centroid.
  2. Recalculate each centroid to be the mean of its assigned points.

Prediction

Given point , its cluster assignment is the centroid closest to .