Abstract

The maximum likelihood estimate (MLE) is how we estimate the parameters of a probability distribution given only the observed data.

Given historical data $X = \{x_1, \dots, x_n\}$, we can estimate the parameters $\theta$ of the distribution that generated this data. With the maximum likelihood estimate (MLE), we find the $\hat{\theta}$ that maximizes the likelihood of generating $X$:

$$\hat{\theta} = \arg\max_\theta \, p_\theta(X) = \arg\max_\theta \prod_{i=1}^{n} p_\theta(x_i) = \arg\max_\theta \sum_{i=1}^{n} \log p_\theta(x_i).$$

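As a quick sanity check, here is a minimal sketch of this idea (assuming NumPy and SciPy, and a Gaussian model that is just a hypothetical choice for illustration): it recovers the parameters by numerically minimizing the negative log-likelihood.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical "historical data": 1,000 draws from a Gaussian with unknown parameters.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

def negative_log_likelihood(params):
    mu, log_sigma = params                 # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))

# Maximizing the likelihood is the same as minimizing the negative log-likelihood.
result = optimize.minimize(negative_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                   # ~= sample mean and sample standard deviation
```
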
Another way to interpret this estimate is as minimizing the dissimilarity between the empirical data distribution $\hat{p}_{\text{data}}$ and the model distribution $p_\theta$. Measuring dissimilarity with ✂️ KL Divergence, we have

$$D_{\text{KL}}\!\left(\hat{p}_{\text{data}} \,\|\, p_\theta\right) = \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\!\left[\log \hat{p}_{\text{data}}(x)\right] - \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\!\left[\log p_\theta(x)\right],$$

but since $\mathbb{E}_{x \sim \hat{p}_{\text{data}}}[\log \hat{p}_{\text{data}}(x)]$ is a constant with respect to the model parameters, we can minimize this divergence by maximizing $\mathbb{E}_{x \sim \hat{p}_{\text{data}}}[\log p_\theta(x)]$, which is equivalent to our objective above.
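
This equivalence can also be checked numerically. The sketch below (a hypothetical three-outcome example, assuming NumPy) confirms that the KL divergence from the empirical distribution to a model equals the negative average log-likelihood minus the constant entropy of the data distribution.

```python
import numpy as np

# Hypothetical empirical distribution of a three-outcome variable, from observed counts.
counts = np.array([30, 50, 20])
p_data = counts / counts.sum()

# A candidate model distribution over the same three outcomes.
q_model = np.array([0.25, 0.50, 0.25])

kl = np.sum(p_data * (np.log(p_data) - np.log(q_model)))    # D_KL(p_data || q_model)
avg_log_likelihood = np.sum(p_data * np.log(q_model))       # E_{x ~ p_data}[log q_model(x)]
entropy = -np.sum(p_data * np.log(p_data))                  # constant w.r.t. the model

print(np.isclose(kl, -avg_log_likelihood - entropy))        # True
```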

Example

We'll illustrate this concept with a coin-flip example. Let $X$ be a set of coin-flip results, with $n_H$ heads and $n_T$ tails, and let $\theta$ be the probability of the coin landing heads. Our optimal parameter $\hat{\theta}$ is calculated as follows:

$$
\begin{aligned}
\hat{\theta} &= \arg\max_\theta \log\!\left( \theta^{n_H} (1 - \theta)^{n_T} \right) \\
&= \arg\max_\theta \left[ n_H \log \theta + n_T \log(1 - \theta) \right].
\end{aligned}
$$

Setting the derivative with respect to $\theta$ to zero, $\frac{n_H}{\theta} - \frac{n_T}{1 - \theta} = 0$, gives $\hat{\theta} = \frac{n_H}{n_H + n_T}$.
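
To sanity-check the closed form, here is a small sketch (assuming NumPy and hypothetical counts of 7 heads and 3 tails) that compares a grid search over the log-likelihood with $n_H / (n_H + n_T)$.

```python
import numpy as np

n_heads, n_tails = 7, 3                            # hypothetical flip counts

def log_likelihood(theta):
    return n_heads * np.log(theta) + n_tails * np.log(1 - theta)

# Brute-force search over theta in (0, 1) versus the closed-form MLE.
thetas = np.linspace(0.001, 0.999, 999)
theta_grid = thetas[np.argmax(log_likelihood(thetas))]
theta_closed_form = n_heads / (n_heads + n_tails)

print(theta_grid, theta_closed_form)               # both ~= 0.7
```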

Note

Note that in the equation above, we find the $\arg\max$ of the log-likelihood instead of the likelihood. This works because the logarithm is monotonically increasing, so it preserves the maximizer, and computing the log-likelihood simplifies the math and avoids numerical underflow when multiplying many small probabilities.
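
A small sketch (assuming NumPy and hypothetical per-observation likelihoods) of why working in log space matters numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.uniform(0.1, 0.9, size=2000)     # hypothetical per-observation likelihoods

raw_likelihood = np.prod(probs)              # product of 2,000 small numbers underflows to 0.0
log_likelihood = np.sum(np.log(probs))       # the same quantity in log space stays finite

print(raw_likelihood, log_likelihood)
```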