Entropy is the average level of "surprise" or "uncertainty" in a probability distribution. Intuitively, the surprise of an individual event is inversely correlated with the probability of the event happening (we would be surprised if an improbable event happened), so we can quantify surprise as

$$\log \frac{1}{p(x)}$$

Note that we apply the log to change the bound from $[1, \infty)$ to $[0, \infty)$.

By combining all events together with an expectation, we get the entropy equation

$$H(X) = \mathbb{E}_{x \sim p}\left[\log \frac{1}{p(x)}\right] = -\sum_x p(x) \log p(x)$$

Another way to interpret this is as the expected number of bits needed to encode $X$, or the number of questions needed to guess $X$. Tying it all together: the more average surprise we have, the more uncertain we are about the results; the more uncertainty there is, the more bits we need to encode the distribution.
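As a minimal sketch of the definition (the example distributions are made up for illustration), entropy in bits can be computed directly as the expected surprise:

```python
import numpy as np

def surprise(p_x: float) -> float:
    """Surprise of a single event with probability p_x: log2(1 / p_x)."""
    return np.log2(1.0 / p_x)

def entropy(p: np.ndarray) -> float:
    """Entropy in bits: the expected surprise under the distribution p."""
    p = p[p > 0]  # drop zero-probability events (0 * log 0 is taken as 0)
    return float(np.sum(p * np.log2(1.0 / p)))

# A fair 4-sided die needs 2 bits on average; a heavily skewed one needs fewer.
print(entropy(np.array([0.25, 0.25, 0.25, 0.25])))  # 2.0
print(entropy(np.array([0.9, 0.05, 0.03, 0.02])))   # ~0.62
```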

Info

Uncertainty is a measure of how spread out a distribution is. A distribution with high entropy or uncertainty would be roughly uniform.

Below is a graph of the entropy of a binary variable $X$. Entropy is highest when the probability is $\frac{1}{2}$, and as the probability goes toward either extreme, entropy decreases since $X$ becomes less random.
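A small sketch that reproduces this curve numerically (the probability grid is arbitrary):

```python
import numpy as np

def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary variable X with P(X = 1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # no randomness at the extremes
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"p = {p:.2f}  H(X) = {binary_entropy(p):.3f}")
# Entropy peaks at 1 bit when p = 0.5 and falls toward 0 at either extreme.
```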

Conditional Entropy

Just like with a standard distribution, entropy can be measured for a conditional distribution $p(y \mid x)$. If we know some value $x$, then the entropy of the conditional is

$$H(Y \mid X = x) = -\sum_y p(y \mid x) \log p(y \mid x)$$

Conditional entropy is the expectation of this entropy over all $x$,

$$H(Y \mid X) = \mathbb{E}_{x \sim p(x)}\big[H(Y \mid X = x)\big] = -\sum_x p(x) \sum_y p(y \mid x) \log p(y \mid x)$$
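As a sketch of this expectation (the joint table below is a hypothetical example), conditional entropy can be computed from a joint distribution $p(x, y)$:

```python
import numpy as np

def conditional_entropy(joint: np.ndarray) -> float:
    """H(Y | X) in bits, where joint[x, y] = p(x, y)."""
    p_x = joint.sum(axis=1)  # marginal p(x)
    h = 0.0
    for x in range(joint.shape[0]):
        if p_x[x] == 0:
            continue
        p_y_given_x = joint[x] / p_x[x]  # conditional p(y | x)
        nz = p_y_given_x > 0
        h_given_x = -np.sum(p_y_given_x[nz] * np.log2(p_y_given_x[nz]))  # H(Y | X = x)
        h += p_x[x] * h_given_x  # weight by p(x) for the expectation
    return float(h)

# Hypothetical joint: Y is fully determined when X = 0, a fair coin flip when X = 1.
joint = np.array([[0.50, 0.00],
                  [0.25, 0.25]])
print(conditional_entropy(joint))  # 0.5: only the X = 1 branch contributes 1 bit
```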
More intuitively, we can derive