Inception Score

Inception score assesses the samples from a generative model trained on labeled datasets. The score combines a sharpness metric and a diversity metric, each of which requires a classifier that can accurately predict the label of a given sample.

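Sharpness measures the classifier's confidence in its predictions on generated samples; specifically, it looks for low entropy in the classifier's predictions. Writing $p$ for the generative model's distribution and $c(y \mid x)$ for the classifier's predicted probability of label $y$ given sample $x$,

$$S = \exp\left(\mathbb{E}_{x \sim p}\left[\sum_y c(y \mid x) \log c(y \mid x)\right]\right)$$

Diversity measures the entropy of the classifier's marginal predictive distribution $c(y) = \mathbb{E}_{x \sim p}[c(y \mid x)]$, which should be high when the samples cover many classes, so

$$D = \exp\left(-\mathbb{E}_{x \sim p}\left[\sum_y c(y \mid x) \log c(y)\right]\right)$$

Our inception score is the product of the two metrics,

$$IS = D \times S$$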

The higher the score, the better the sample quality.
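
As a minimal sketch of this computation, assume the classifier's predicted class probabilities for the generated samples are already collected in an (N, K) array. The function name `inception_score` and the `eps` smoothing term are illustrative choices, not part of the original description.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Inception score from classifier outputs (illustrative sketch).

    probs: (N, K) array; row i holds c(y | x_i), the classifier's predicted
           label distribution for generated sample x_i (rows sum to 1).
    eps:   small constant (an assumption here) to avoid log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)

    # Marginal predictive distribution c(y), averaged over the samples.
    marginal = probs.mean(axis=0)

    # log S = E_x[ sum_y c(y|x) log c(y|x) ]  (negative mean entropy)
    log_sharpness = np.mean(np.sum(probs * np.log(probs + eps), axis=1))

    # log D = -E_x[ sum_y c(y|x) log c(y) ]   (entropy of the marginal)
    log_diversity = -np.sum(marginal * np.log(marginal + eps))

    # IS = D * S, computed in log space for numerical stability.
    return float(np.exp(log_sharpness + log_diversity))
```

With this definition, a classifier that outputs the uniform distribution for every sample scores about 1, while fully confident predictions spread evenly over K classes score about K.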

Frechet Inception Distance (FID)

FID evaluates the samples from a generative model by comparing the distribution of the generated samples with the true data distribution.

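To do this, we first run our generated samples and true samples through a classifier like InceptionNet to get their features, or activations, from one of its later layers. We can then fit a multivariate Gaussian, one for the features of the generated samples $x_g$ and one for the features of the true samples $x_t$, to get $\mathcal{N}(\mu_g, \Sigma_g)$ and $\mathcal{N}(\mu_t, \Sigma_t)$. From here, we compare the two distributions using the standard Frechet distance

$$FID = \lVert \mu_t - \mu_g \rVert^2 + \mathrm{Tr}\left(\Sigma_t + \Sigma_g - 2\left(\Sigma_t \Sigma_g\right)^{1/2}\right)$$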

The lower our FID, the closer the distributions and thus the better our samples.
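
The Gaussian fit and distance can be sketched as follows, assuming the classifier features have already been extracted into two arrays of shape (N, d). The function name is hypothetical, and the trace of the matrix square root is computed here from the eigenvalues of the covariance product, which is one possible approach rather than the only one.

```python
import numpy as np


def frechet_inception_distance(feats_true, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets (sketch).

    feats_true, feats_gen: (N, d) arrays of classifier activations for the
    true and generated samples (feature extraction is assumed done elsewhere).
    """
    feats_true = np.asarray(feats_true, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)

    # Fit one multivariate Gaussian per feature set.
    mu_t, mu_g = feats_true.mean(axis=0), feats_gen.mean(axis=0)
    sigma_t = np.atleast_2d(np.cov(feats_true, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((sigma_t sigma_g)^{1/2}) equals the sum of square roots of the
    # eigenvalues of sigma_t @ sigma_g, which are real and non-negative in
    # exact arithmetic; clip small negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma_t @ sigma_g)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))

    diff = mu_t - mu_g
    return float(diff @ diff + np.trace(sigma_t) + np.trace(sigma_g)
                 - 2.0 * trace_sqrt)
```

Identical feature sets give a distance of approximately zero, and shifting every feature by a constant adds the squared length of that shift, since the covariances are unchanged.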