The difference between Cross Entropy and Entropy is known as KL divergence, which can be written in several common equivalent forms below:
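A standard way to write these equivalent forms (writing $p$ for the true distribution and $q$ for the predicted one; these symbols are chosen here for illustration, not taken from the original note):

$$
D_{KL}(p \,\|\, q) \;=\; H(p, q) - H(p) \;=\; \sum_{x} p(x) \log \frac{p(x)}{q(x)} \;=\; \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q(x)}\right]
$$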
This value can be interpreted as the expected number of extra bits needed to transmit a sample from the true distribution when we encode it with a code optimized for our predicted distribution instead.
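As a quick numeric sanity check, here is a minimal sketch (assuming two made-up discrete distributions `p` and `q` over three outcomes) that computes entropy, cross entropy, and their difference in bits:

```python
import math

# Hypothetical true distribution p and predicted distribution q over 3 outcomes.
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

# Entropy of p in bits: H(p) = -sum p(x) log2 p(x)
entropy = -sum(px * math.log2(px) for px in p)

# Cross entropy in bits: H(p, q) = -sum p(x) log2 q(x)
cross_entropy = -sum(px * math.log2(qx) for px, qx in zip(p, q))

# KL divergence is the gap: the expected extra bits per symbol
# when coding samples from p with a code built for q.
kl = cross_entropy - entropy

print(f"H(p)       = {entropy:.4f} bits")
print(f"H(p, q)    = {cross_entropy:.4f} bits")
print(f"D_KL(p||q) = {kl:.4f} bits")
```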
Info
Note that this value is non-symmetric and non-negative, and it does not satisfy the triangle inequality, so it is not a true distance metric.
We commonly see KL divergence or cross entropy used as loss functions in classification problems (for example, in Logistic Regression). The truth label for a single datapoint is a one-hot distribution, where the correct class has probability 1 and every other class has probability 0; since the entropy of a one-hot distribution is 0, minimizing cross entropy is equivalent to minimizing KL divergence.
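A minimal sketch of this, assuming a hypothetical 3-class problem with made-up predicted probabilities (the names `y_true` and `y_pred` are illustrative, not from the original note):

```python
import math

# Hypothetical one-hot truth label for a single datapoint (class 1 is correct).
y_true = [0.0, 1.0, 0.0]
# Hypothetical predicted class probabilities (e.g. a softmax / logistic output).
y_pred = [0.2, 0.7, 0.1]

# Cross-entropy loss: -sum over classes of p(true) * log q(pred).
# With a one-hot label this reduces to the negative log of the predicted
# probability of the correct class (the negative log-likelihood).
cross_entropy = -sum(t * math.log(q) for t, q in zip(y_true, y_pred) if t > 0)

# The entropy of a one-hot distribution is 0, so here D_KL(p || q) == H(p, q).
print(f"loss = {cross_entropy:.4f} nats")  # -log(0.7) ≈ 0.3567
```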
We can also apply KL divergence to Information Gain. Rather than computing it as the difference in entropies before and after knowing a feature $X$, we can equivalently compute it as the KL divergence between the joint distribution and the product of the marginals: $IG(Y, X) = D_{KL}\big(p(x, y) \,\|\, p(x)\,p(y)\big) = H(Y) - H(Y \mid X)$.
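A small sketch under assumed values (a made-up 2×2 joint distribution over a binary feature X and binary label Y) showing that the two computations agree:

```python
import math

# Hypothetical joint distribution p(x, y) over binary X (rows) and binary Y (columns).
joint = [[0.3, 0.1],
         [0.2, 0.4]]

px = [sum(row) for row in joint]                              # marginal p(x)
py = [sum(joint[x][y] for x in range(2)) for y in range(2)]   # marginal p(y)

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Information gain as an entropy difference: H(Y) - H(Y | X).
h_y_given_x = sum(
    px[x] * entropy([joint[x][y] / px[x] for y in range(2)])
    for x in range(2)
)
ig_entropy = entropy(py) - h_y_given_x

# Information gain as the KL divergence between the joint and the product of marginals.
ig_kl = sum(
    joint[x][y] * math.log2(joint[x][y] / (px[x] * py[y]))
    for x in range(2) for y in range(2)
    if joint[x][y] > 0
)

print(f"H(Y) - H(Y|X)            = {ig_entropy:.4f} bits")
print(f"D_KL(p(x,y) || p(x)p(y)) = {ig_kl:.4f} bits")  # same value
```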