Abstract
The evidence lower bound (ELBO) is a lower bound on the log probability of our data, $\log p_\theta(x)$ (the evidence). It is often used to find an accurate variational distribution $q(z)$ or to maximize the likelihood $p_\theta(x)$.
Consider a latent variable model $p_\theta(x, z)$ involving observed data $x$ and a latent variable $z$, where the evidence (marginal likelihood) is

$$p_\theta(x) = \int p_\theta(x, z)\, dz.$$
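As a concrete toy instance of such a model, the sketch below uses a two-component Gaussian mixture: the latent $z$ picks a component, and the marginal likelihood sums the joint over $z$. The mixture weights, means, and standard deviations are arbitrary values chosen purely for illustration.

```python
import numpy as np
from scipy.stats import norm

# A concrete latent variable model: a two-component Gaussian mixture.
# z picks a component, and the marginal likelihood sums the joint
# p(x, z) over the latent z. All parameter values are illustrative.

weights = np.array([0.4, 0.6])   # prior p(z)
means = np.array([-2.0, 3.0])    # p(x | z) = Normal(x; mean_z, std_z)
stds = np.array([1.0, 0.5])

def marginal_likelihood(x):
    # p(x) = sum_z p(z) * p(x | z); tractable here, but the integral is
    # intractable for the rich, continuous latents the ELBO is built for.
    return np.sum(weights * norm.pdf(x, loc=means, scale=stds))

print(f"p(x=0.5) = {marginal_likelihood(0.5):.6f}")
```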
Inequality Proof
The inequality derivation is as follows. We start with the log-likelihood and introduce a variational distribution $q(z)$ over the latent variable:

$$
\begin{aligned}
\log p_\theta(x) &= \mathbb{E}_{q(z)}\left[\log p_\theta(x)\right] \\
&= \mathbb{E}_{q(z)}\left[\log \frac{p_\theta(x, z)}{p_\theta(z \mid x)}\right] \\
&= \mathbb{E}_{q(z)}\left[\log \left(\frac{p_\theta(x, z)}{q(z)} \cdot \frac{q(z)}{p_\theta(z \mid x)}\right)\right] \\
&= \underbrace{\mathbb{E}_{q(z)}\left[\log \frac{p_\theta(x, z)}{q(z)}\right]}_{\text{ELBO}} + \mathrm{KL}\!\left(q(z) \,\big\|\, p_\theta(z \mid x)\right) \\
&\geq \mathbb{E}_{q(z)}\left[\log \frac{p_\theta(x, z)}{q(z)}\right] = \text{ELBO},
\end{aligned}
$$

where the last step uses the fact that the KL divergence is always non-negative.
From the second-to-last equation, we observe that the gap between the ELBO and the evidence is exactly the KL divergence between our approximate posterior $q(z)$ and the true posterior $p_\theta(z \mid x)$.
Then, rewriting the equality, we get

$$\text{ELBO} = \log p_\theta(x) - \mathrm{KL}\!\left(q(z) \,\big\|\, p_\theta(z \mid x)\right).$$
From the above expression, we can see that maximizing the ELBO can achieve two objectives: maximizing the likelihood and minimizing the KL divergence.
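Before looking at each objective, here is a minimal numerical check of the identity on a toy model with a binary latent variable; all probabilities are made up for illustration. It verifies that the ELBO plus the KL term recovers $\log p_\theta(x)$, so the ELBO never exceeds the evidence.

```python
import numpy as np

# Numerical check of  log p(x) = ELBO + KL(q(z) || p(z|x))
# for a toy model with a binary latent z. Probabilities are made up.

p_z = np.array([0.3, 0.7])           # prior p(z)
p_x_given_z = np.array([0.2, 0.05])  # likelihood p(x | z) at one observed x
q_z = np.array([0.6, 0.4])           # an arbitrary variational distribution q(z)

p_xz = p_z * p_x_given_z             # joint p(x, z)
log_evidence = np.log(p_xz.sum())    # log p(x) = log sum_z p(x, z)

elbo = np.sum(q_z * (np.log(p_xz) - np.log(q_z)))     # E_q[log p(x,z) - log q(z)]
posterior = p_xz / p_xz.sum()                          # true posterior p(z | x)
kl = np.sum(q_z * (np.log(q_z) - np.log(posterior)))  # KL(q || p(z|x))

print(f"log p(x)  = {log_evidence:.6f}")
print(f"ELBO      = {elbo:.6f}")       # always <= log p(x), since KL >= 0
print(f"ELBO + KL = {elbo + kl:.6f}")  # recovers log p(x)
```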
- Maximizing the likelihood is useful when we want to find the optimal parameters $\theta$ for the observed variables we want to model.
- Minimizing the KL divergence, equivalent to finding the best approximation $q(z)$ to the true posterior $p_\theta(z \mid x)$, is commonly used in variational inference (a small sketch follows this list).
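As a rough illustration of that variational-inference use, the sketch below reuses the toy binary-latent model from the check above (again with made-up probabilities) and searches over candidate distributions $q(z)$ for the one with the highest ELBO. Because maximizing the ELBO minimizes $\mathrm{KL}(q(z) \,\|\, p_\theta(z \mid x))$, the best candidate should match the true posterior.

```python
import numpy as np

# Variational inference sketch: search over q(z) = [q, 1 - q] for the
# distribution with the highest ELBO. Because maximizing the ELBO
# minimizes KL(q || p(z|x)), the winner should match the true posterior.
# The toy probabilities below mirror the previous check.

p_z = np.array([0.3, 0.7])
p_x_given_z = np.array([0.2, 0.05])
p_xz = p_z * p_x_given_z             # joint p(x, z)

def elbo(q0):
    q = np.array([q0, 1.0 - q0])
    return np.sum(q * (np.log(p_xz) - np.log(q)))

grid = np.linspace(1e-4, 1.0 - 1e-4, 10_001)
best_q0 = max(grid, key=elbo)

print(f"best q(z=0) by ELBO:     {best_q0:.4f}")
print(f"true posterior p(z=0|x): {p_xz[0] / p_xz.sum():.4f}")
```

A richer variational family or a gradient-based optimizer would replace the grid search in practice, but the objective being maximized is the same ELBO.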