SimCLR is a contrastive learning framework that learns a latent representation by maximizing similarity between different augmentations of the same sample. The framework has four components:

  1. A composition of data augmentations that produces two views $\tilde{x}_i$ and $\tilde{x}_j$ of the same data sample $x$.
  2. A base encoder $f(\cdot)$ that extracts the representation $h_i = f(\tilde{x}_i)$.
  3. A projection head $g(\cdot)$ that maps representations to the contrastive space, $z_i = g(h_i)$. Though it's possible to optimize the contrastive loss directly on the representations $h_i$, doing so may discard information (like color or orientation) that isn't crucial for measuring similarity but is still valuable in the representation.
  4. A contrastive loss that, for a given view $\tilde{x}_i$, identifies the corresponding positive view $\tilde{x}_j$ among a set of candidates that also contains negative samples.
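The four components above can be sketched end to end. This is a minimal toy illustration, not the paper's architecture: the noise-based `augment`, the one-layer ReLU `base_encoder`, and the linear `projection_head` are hypothetical stand-ins for the real augmentation pipeline, ResNet encoder, and MLP head.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x):
    """Produce one stochastic view of x (toy stand-in: additive noise)."""
    return x + 0.1 * rng.standard_normal(x.shape)

def base_encoder(x, W_f):
    """Base encoder f(.): maps a view to its representation h."""
    return np.maximum(W_f @ x, 0.0)  # single ReLU layer as a placeholder

def projection_head(h, W_g):
    """Projection head g(.): maps h into the contrastive space z."""
    return W_g @ h

x = rng.standard_normal(8)           # one data sample
W_f = rng.standard_normal((16, 8))   # encoder weights (untrained)
W_g = rng.standard_normal((4, 16))   # projection weights (untrained)

# Two augmented views of the same sample -> two projections z_i, z_j,
# which the contrastive loss will pull together.
z_i = projection_head(base_encoder(augment(x), W_f), W_g)
z_j = projection_head(base_encoder(augment(x), W_f), W_g)
```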

Specifically, we'll first define similarity as cosine similarity,

$$\mathrm{sim}(u, v) = \frac{u^\top v}{\|u\|\,\|v\|}.$$
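Cosine similarity is just the dot product of L2-normalized vectors, which is a one-liner:

```python
import numpy as np

def sim(u, v):
    """Cosine similarity: dot product of the L2-normalized vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Identical directions score 1, orthogonal directions score 0, and opposite directions score -1, regardless of vector magnitude.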

Then, we'll perform the following for multiple iterations:

  1. Sample a minibatch of $N$ examples, then augment them to form $2N$ views. For each positive pair $(z_i, z_j)$, we'll treat the other $2(N-1)$ views as negative samples.
  2. Optimize a temperature-scaled InfoNCE loss,

$$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)},$$

where $\tau$ is the temperature and the indicator $\mathbb{1}_{[k \neq i]}$ excludes the anchor itself from the denominator.
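The loss can be sketched directly from the formula. This is an unoptimized numpy sketch for clarity (a real implementation would vectorize and batch this); it assumes the common convention that consecutive rows `z[2k]` and `z[2k+1]` are the two views of the same example.

```python
import numpy as np

def sim(u, v):
    """Cosine similarity of two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def nt_xent_loss(z, tau=0.5):
    """Temperature-scaled InfoNCE over 2N projections.

    z: array of shape (2N, d); z[2k] and z[2k+1] form a positive pair.
    """
    n2 = len(z)
    # Pairwise similarities scaled by the temperature tau.
    s = np.array([[sim(z[i], z[k]) / tau for k in range(n2)] for i in range(n2)])
    loss = 0.0
    for i in range(n2):
        j = i + 1 if i % 2 == 0 else i - 1  # index of i's positive pair
        # Denominator sums over all k != i (the indicator in the formula).
        denom = sum(np.exp(s[i, k]) for k in range(n2) if k != i)
        loss += -np.log(np.exp(s[i, j]) / denom)
    return loss / n2  # average over all 2N anchors
```

Because the positive term also appears in the denominator, each per-anchor term is a softmax cross-entropy that is minimized by making $z_i$ most similar to its positive $z_j$ among all candidates.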