Abstract
Maximum likelihood estimation (MLE) is how we estimate a model's parameters given only the data.
Given historical data $\mathcal{D} = \{x_1, \ldots, x_n\}$ drawn i.i.d. from an unknown distribution, the maximum likelihood estimate is

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} p_\theta(x_i) = \arg\max_{\theta} \sum_{i=1}^{n} \log p_\theta(x_i).$$
Another way to interpret this estimate is as minimizing the dissimilarity between the empirical data distribution $\hat{p}_{\text{data}}$ and the model distribution $p_\theta$. Measuring dissimilarity with the KL divergence, we have

$$D_{\mathrm{KL}}(\hat{p}_{\text{data}} \,\|\, p_\theta) = \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\left[\log \hat{p}_{\text{data}}(x)\right] - \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\left[\log p_\theta(x)\right],$$
but since the first term does not depend on $\theta$, minimizing the KL divergence is equivalent to maximizing the second term, $\mathbb{E}_{x \sim \hat{p}_{\text{data}}}\left[\log p_\theta(x)\right]$, which is the average log-likelihood of the data under the model. Minimizing KL divergence to the empirical distribution and maximum likelihood estimation therefore select the same $\hat{\theta}$.
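This equivalence is easy to check numerically. The following sketch uses an assumed biased-coin dataset and a grid of candidate parameters; both the dataset and the grid are illustrative choices, not part of the derivation above.

```python
import numpy as np

# Minimal numerical sketch (dataset and grid assumed for illustration):
# for a Bernoulli model, the theta minimizing KL(p_data || p_theta) is the
# same theta that maximizes the average log-likelihood, because the entropy
# of the empirical distribution does not depend on theta.
data = np.array([1, 1, 1, 0, 1, 0, 1, 0, 1, 1])  # 7 heads, 3 tails
p_hat = data.mean()                              # empirical P(heads) = 0.7

thetas = np.linspace(0.01, 0.99, 99)             # candidate parameters

# KL(p_data || p_theta) between two Bernoulli distributions
kl = (p_hat * np.log(p_hat / thetas)
      + (1 - p_hat) * np.log((1 - p_hat) / (1 - thetas)))

# Average log-likelihood of the data under each candidate theta
avg_ll = p_hat * np.log(thetas) + (1 - p_hat) * np.log(1 - thetas)

print(thetas[np.argmin(kl)])     # theta minimizing the KL divergence
print(thetas[np.argmax(avg_ll)]) # theta maximizing the avg log-likelihood
```

Both searches land on the same $\theta \approx 0.7$, the empirical fraction of heads.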
Example
We'll illustrate this concept with a coin-flip example. Let $\theta$ be the probability of heads. If we observe $n_H$ heads and $n_T$ tails, the likelihood of the data is

$$L(\theta) = \theta^{n_H} (1 - \theta)^{n_T},$$

so the maximum likelihood estimate is

$$\hat{\theta} = \arg\max_{\theta} \left[ n_H \log \theta + n_T \log(1 - \theta) \right].$$

Setting the derivative of the log-likelihood to zero gives $\hat{\theta} = \frac{n_H}{n_H + n_T}$, the observed fraction of heads.
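The coin-flip estimate can be verified by brute force. This sketch simulates flips with an assumed true bias of 0.6 and a grid of candidate parameters (both are illustrative choices), then confirms that the grid search recovers the observed fraction of heads.

```python
import numpy as np

# Sketch under assumed conditions: simulate 100 flips of a coin with true
# P(heads) = 0.6, then grid-search for the theta that maximizes the
# log-likelihood  n_H * log(theta) + n_T * log(1 - theta).
rng = np.random.default_rng(seed=0)
flips = rng.random(100) < 0.6            # True marks a head
n_heads = int(flips.sum())
n_tails = 100 - n_heads

thetas = np.linspace(0.001, 0.999, 999)  # candidate parameters
log_lik = n_heads * np.log(thetas) + n_tails * np.log(1 - thetas)

theta_mle = thetas[np.argmax(log_lik)]
print(theta_mle)                         # matches the observed fraction
print(n_heads / 100)
```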
Note
Note that in the equation above, we find the $\arg\max$ of the log-likelihood instead of the likelihood. This works because log is monotonically increasing, so both have the same maximizer; computing the log-likelihood also simplifies the math (products become sums) and avoids numerical underflow when multiplying many small probabilities.