Mutual information measures the dependence between two random variables $X$ and $Y$. Formally,

$$I(X; Y) = D_{\mathrm{KL}}\!\left(P_{(X, Y)} \,\|\, P_X \otimes P_Y\right)$$
where $D_{\mathrm{KL}}$ is the ✂️ KL Divergence and $P_X \otimes P_Y$ is the product of the marginal distributions. Intuitively, if $X$ and $Y$ are independent, $P_{(X, Y)} = P_X \otimes P_Y$, so the mutual information is zero; conversely, the more dependent they are, the higher the mutual information.
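
As a concrete illustration, here is a minimal NumPy sketch that computes the mutual information of a small discrete joint distribution directly as the KL divergence between the joint and the product of the marginals. The joint table `p_xy` is made up purely for this example:

```python
import numpy as np

# Toy joint distribution p(x, y) over two binary variables (rows: x, columns: y).
# The numbers are illustrative, not from any real data.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)        # marginal p(x)
p_y = p_xy.sum(axis=0)        # marginal p(y)
p_indep = np.outer(p_x, p_y)  # product of marginals p(x) * p(y)

# I(X; Y) = D_KL( p(x, y) || p(x) p(y) ), here in nats.
mi = np.sum(p_xy * np.log(p_xy / p_indep))
print(mi)  # > 0, since X and Y are dependent in this toy example
```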

Using the definition of KL divergence, we can also write

$$I(X; Y) = \sum_{x, y} p_{(X, Y)}(x, y) \log \frac{p_{(X, Y)}(x, y)}{p_X(x)\, p_Y(y)} = \sum_{x, y} p_{(X, Y)}(x, y) \log \frac{p_{X \mid Y}(x \mid y)}{p_X(x)} = H(X) - H(X \mid Y)$$
Following the last expression, another interpretation of mutual information is the difference between the 🔥 Entropy of $X$ and the entropy of $X$ after knowing $Y$ (the conditional entropy $H(X \mid Y)$). If $Y$ carries some information about $X$, our uncertainty about $X$ decreases once we know $Y$, so the mutual information is high.
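
To see this entropy view numerically, a short sketch like the following (reusing the same toy `p_xy` as above, repeated so the snippet is self-contained) computes $H(X) - H(X \mid Y)$, which gives the same value as the KL-divergence form:

```python
import numpy as np

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Entropy of X: H(X) = -sum_x p(x) log p(x)
h_x = -np.sum(p_x * np.log(p_x))

# Conditional entropy H(X | Y) = -sum_{x,y} p(x, y) log p(x | y)
p_x_given_y = p_xy / p_y               # each column divided by p(y)
h_x_given_y = -np.sum(p_xy * np.log(p_x_given_y))

mi = h_x - h_x_given_y                 # matches the KL-divergence computation
print(mi)
```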