If we're given more "information" about a distribution, 🔥 entropy decreases. Specifically, if the distribution of $X$ has some correlation with $Y$, then learning the value of $Y$ narrows the range of outcomes of $X$ and thereby decreases entropy.

Averaging over the probabilities of all the values of $Y$ we could learn, we get the equation for conditional entropy,

$$H(X \mid Y) = \sum_y p(y)\, H(X \mid Y = y) = -\sum_{x,\, y} p(x, y) \log p(x \mid y).$$

The decrease in entropy caused by our knowing $Y$ is known as the information gain,

$$IG = H(X) - H(X \mid Y).$$
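As a minimal sketch (not from the original), the two quantities above can be computed directly from a joint distribution represented as a dict mapping $(x, y)$ pairs to probabilities; the function and variable names here are illustrative, not an established API.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a distribution given as {value: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def conditional_entropy(joint):
    """H(X|Y) from a joint distribution joint[(x, y)] = p(x, y)."""
    # Marginal p(y), summing p(x, y) over x.
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # H(X|Y) = -sum_{x,y} p(x, y) log p(x|y), with p(x|y) = p(x, y) / p(y).
    return -sum(p * math.log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0)

def information_gain(joint):
    """IG = H(X) - H(X|Y): the entropy of X removed by learning Y."""
    p_x = {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return entropy(p_x) - conditional_entropy(joint)

# X fully determined by Y: knowing Y removes all uncertainty about X,
# so H(X|Y) = 0 and the information gain equals H(X) = 1 bit.
joint = {(0, 0): 0.5, (1, 1): 0.5}
print(conditional_entropy(joint))  # 0.0
print(information_gain(joint))     # 1.0
```

When $X$ and $Y$ are independent, knowing $Y$ narrows nothing: $H(X \mid Y) = H(X)$ and the information gain is zero.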