Abstract

The evidence lower bound (ELBO) is a lower bound on the log probability of our data, $\log p_\theta(x)$. It is often used to find an accurate variational distribution or to maximize the likelihood.

Consider a latent variable model with joint distribution $p_\theta(x, z)$ over observed data $x$ and latents $z$. The evidence (or variational) lower bound (ELBO) lower bounds the evidence, defined here as the log likelihood of $x$:

$$\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z)}\right] \;=:\; \mathrm{ELBO}(\theta, \phi),$$

where $q_\phi(z)$ is a distribution with parameters $\phi$ that we use to approximate the posterior $p_\theta(z \mid x)$. Note that another common form can be derived by rearranging terms,

$$\mathrm{ELBO}(\theta, \phi) = \mathbb{E}_{q_\phi(z)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z) \,\|\, p_\theta(z)\big).$$
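
As a quick sanity check on the bound, here is a minimal Python sketch for a toy model with a single binary latent $z$ and one fixed observation $x$ (the joint probabilities are illustrative numbers assumed for this example, not from the text). It computes the exact log evidence by enumeration and an ELBO estimate both exactly and by Monte Carlo sampling from $q$:

```python
import math
import random

# Toy joint p(x, z) for one fixed observation x and binary latent z
# (illustrative numbers, not from the text).
p_joint = {0: 0.2, 1: 0.5}                 # p(x, z=0), p(x, z=1)
evidence = math.log(sum(p_joint.values()))  # exact log p(x) by enumeration

# An arbitrary variational distribution q(z), not the true posterior.
q = {0: 0.5, 1: 0.5}

# Exact ELBO = E_q[log p(x, z) - log q(z)], by enumeration.
elbo_exact = sum(q[z] * (math.log(p_joint[z]) - math.log(q[z])) for z in q)

# Monte Carlo estimate of the same expectation, sampling z ~ q.
random.seed(0)
samples = [1 if random.random() < q[1] else 0 for _ in range(100_000)]
elbo_mc = sum(math.log(p_joint[z]) - math.log(q[z]) for z in samples) / len(samples)

print(f"log p(x)     = {evidence:.4f}")
print(f"ELBO (exact) = {elbo_exact:.4f}")
print(f"ELBO (MC)    = {elbo_mc:.4f}")
assert elbo_exact <= evidence  # the bound holds
```

With these numbers the exact ELBO is strictly below the evidence because $q$ differs from the true posterior; the Monte Carlo estimate converges to the exact value as the sample count grows.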


Inequality Proof

The inequality derivation is as follows. We start with the log likelihood and introduce a variational distribution $q_\phi(z)$ to approximate the posterior:

$$\begin{aligned}
\log p_\theta(x) &= \mathbb{E}_{q_\phi(z)}\big[\log p_\theta(x)\big] \\
&= \mathbb{E}_{q_\phi(z)}\!\left[\log \frac{p_\theta(x, z)}{p_\theta(z \mid x)}\right] \\
&= \mathbb{E}_{q_\phi(z)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z)} \cdot \frac{q_\phi(z)}{p_\theta(z \mid x)}\right] \\
&= \mathrm{ELBO}(\theta, \phi) + D_{\mathrm{KL}}\big(q_\phi(z) \,\|\, p_\theta(z \mid x)\big) \\
&\geq \mathrm{ELBO}(\theta, \phi),
\end{aligned}$$

where the last step follows because the KL divergence is nonnegative.
From the second-to-last equation, we observe that the gap between the evidence and the ELBO is exactly the KL divergence between our approximate posterior $q_\phi(z)$ and the true posterior $p_\theta(z \mid x)$.
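
This identity can be checked numerically. The sketch below uses a toy model with a binary latent and one fixed observation (illustrative numbers assumed, not from the text), computes the true posterior by enumeration, and verifies that the evidence minus the ELBO equals $D_{\mathrm{KL}}(q \,\|\, p(z \mid x))$:

```python
import math

# Toy model: binary latent z, one fixed observation x
# (illustrative numbers, not from the text).
p_joint = {0: 0.2, 1: 0.5}                            # p(x, z) for z = 0, 1
p_x = sum(p_joint.values())                           # evidence p(x)
posterior = {z: p / p_x for z, p in p_joint.items()}  # true p(z | x)

# An arbitrary approximate posterior q(z).
q = {0: 0.5, 1: 0.5}

elbo = sum(q[z] * math.log(p_joint[z] / q[z]) for z in q)
kl = sum(q[z] * math.log(q[z] / posterior[z]) for z in q)

# The gap between the evidence and the ELBO is exactly the KL divergence.
print(f"log p(x) - ELBO = {math.log(p_x) - elbo:.6f}")
print(f"KL(q || p(z|x)) = {kl:.6f}")
assert abs((math.log(p_x) - elbo) - kl) < 1e-12
```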

Then, rewriting the equality, we get

$$\mathrm{ELBO}(\theta, \phi) = \log p_\theta(x) - D_{\mathrm{KL}}\big(q_\phi(z) \,\|\, p_\theta(z \mid x)\big).$$

From the above expression, we can see that maximizing the ELBO serves two objectives: maximizing the likelihood and minimizing the KL divergence.

  1. Maximizing the likelihood is useful when we want to find the optimal model parameters $\theta$ for the observed variables $x$ we want to model.
  2. Minimizing the KL divergence, which is equivalent to finding the best approximation $q_\phi(z) \approx p_\theta(z \mid x)$, is commonly used in variational inference.
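
To illustrate the second point, here is a minimal variational-inference sketch on a toy model with a binary latent (illustrative numbers assumed, not from the text). A simple grid search over the single parameter of $q$ maximizes the ELBO, and the maximizer recovers the true posterior, at which point the bound is tight:

```python
import math

# Toy model: binary latent z, one fixed observation x
# (illustrative numbers, not from the text).
p_joint = {0: 0.2, 1: 0.5}                       # p(x, z) for z = 0, 1
p_x = sum(p_joint.values())                      # evidence p(x)
posterior1 = p_joint[1] / p_x                    # true p(z=1 | x) = 5/7

def elbo(q1):
    """ELBO for the variational distribution q(z=1) = q1, by enumeration."""
    q = {0: 1.0 - q1, 1: q1}
    return sum(q[z] * math.log(p_joint[z] / q[z]) for z in q)

# Grid search for the q that maximizes the ELBO (a stand-in for
# gradient-based optimization in real variational inference).
best_q1 = max((i / 1000 for i in range(1, 1000)), key=elbo)

print(f"true p(z=1|x) = {posterior1:.3f}, best q(z=1) = {best_q1:.3f}")
assert abs(best_q1 - posterior1) < 1e-2  # q matches the true posterior
```

At the optimum the KL term vanishes, so the ELBO equals $\log p_\theta(x)$, which is exactly the tightness condition derived above.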