Mutual information measures the dependence between two random variables. Formally,

$$I(X; Y) = D_{\mathrm{KL}}\left(P_{(X, Y)} \,\|\, P_X \otimes P_Y\right)$$

where $D_{\mathrm{KL}}$ is the KL Divergence. Intuitively, if $X$ and $Y$ are independent, then $P_{(X, Y)} = P_X \otimes P_Y$ and the mutual information is zero; conversely, the more dependent they are, the higher the mutual information is.
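To make the definition concrete, here is a minimal Python sketch (not part of the original note; the joint table and variable names are made-up examples) that computes $I(X; Y)$ directly as the KL divergence between a toy joint distribution and the product of its marginals.

```python
import numpy as np

# Toy joint distribution p(x, y) over two binary variables
# (made-up numbers; rows index x, columns index y).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])

# Marginals p(x) and p(y).
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Product of marginals p(x)p(y): the joint X and Y would have if independent.
p_x_p_y = np.outer(p_x, p_y)

# I(X; Y) = D_KL( p(x, y) || p(x)p(y) ) = sum_{x,y} p(x, y) log [ p(x, y) / (p(x)p(y)) ]
mi = np.sum(p_xy * np.log(p_xy / p_x_p_y))
print(mi)  # positive, since this toy table makes X and Y dependent
```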
Using the definition of KL divergence, we can also write

$$I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = \sum_{x, y} p(x, y) \log \frac{p(x \mid y)}{p(x)} = H(X) - H(X \mid Y)$$
Following the last expression, another interpretation of mutual information is the difference between the 🔥 Entropy of $X$ and the entropy of $X$ after knowing $Y$. If $Y$ has some information about $X$, our entropy for $X$ would decrease after knowing $Y$, so our mutual information would be high.
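Continuing the same toy example, this sketch (again just an illustration, using the made-up joint table from above) checks that $H(X) - H(X \mid Y)$ gives the same number as the KL-based form.

```python
import numpy as np

# Same made-up joint table as above (rows index x, columns index y).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# H(X): entropy of the marginal of X.
h_x = -np.sum(p_x * np.log(p_x))

# H(X | Y) = -sum_{x,y} p(x, y) log p(x | y), with p(x | y) = p(x, y) / p(y).
p_x_given_y = p_xy / p_y          # each column divided by p(y)
h_x_given_y = -np.sum(p_xy * np.log(p_x_given_y))

print(h_x - h_x_given_y)  # matches the KL-based value computed earlier
```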