Abstract
Diffusion breaks data down over multiple time steps and trains a model to iteratively reconstruct it. In doing so, the model learns how to transition samples from pure noise back to the data distribution.
Theory
Diffusion iteratively destroys data with Gaussian noise and learns a way to reverse time and recreate the original image. If we successfully learn such a model, then we can generate new images from random noise. This idea is also closely related to Score SDEs.
Forward Process
Diffusion models destroy the training distribution of images $x_0 \sim q(x_0)$ by adding Gaussian noise over $T$ steps according to the forward process

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),$$

where $\beta_t \in (0, 1)$ is a variance schedule that controls how much noise is added at step $t$, for $t = 1, \dots, T$.
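As a quick illustration, here is a minimal PyTorch sketch of a single forward step; the function name is ours and `beta_t` is assumed to come from a predefined variance schedule.

```python
import math
import torch

def forward_step(x_prev, beta_t):
    """One forward step: q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = torch.randn_like(x_prev)                       # epsilon ~ N(0, I)
    return math.sqrt(1.0 - beta_t) * x_prev + math.sqrt(beta_t) * noise
```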
Reverse Process
Our goal is to learn the reverse process

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right),$$

which removes a little noise at each step. Starting from pure noise $x_T \sim \mathcal{N}(0, \mathbf{I})$ and applying these learned transitions $T$ times takes us back to a sample from the data distribution.
Optimization
To find optimal parameters for $p_\theta$, we maximize the evidence lower bound (ELBO) on the data log-likelihood, or equivalently minimize the negative ELBO, which decomposes into three terms:

$$\mathbb{E}_q\Big[\underbrace{-\log p_\theta(x_0 \mid x_1)}_{\text{reconstruction}} \;+\; \underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{\text{prior matching}} \;+\; \sum_{t=2}^{T} \underbrace{D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{\text{denoising matching}}\Big].$$
Each term has its own function.
- The first is a reconstruction term for getting back the original image.
- The second is the distance between the distribution of our final noisy latent $x_T$ and the standard Gaussian prior $p(x_T) = \mathcal{N}(0, \mathbf{I})$; it has no trainable parameters.
- The third is a denoising matching term at each time step.
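Both KL terms compare Gaussians, so they can be evaluated in closed form. As a quick reference (a standard identity, not specific to this derivation), for diagonal Gaussians in $d$ dimensions,

$$D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1, \sigma_1^2 \mathbf{I})\,\big\|\,\mathcal{N}(\mu_2, \sigma_2^2 \mathbf{I})\big) = \frac{d}{2}\left(\frac{\sigma_1^2}{\sigma_2^2} - 1 - \log\frac{\sigma_1^2}{\sigma_2^2}\right) + \frac{\|\mu_1 - \mu_2\|^2}{2\sigma_2^2},$$

which is why the denoising term below reduces to a squared distance between means.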
De-Noising Objective
The third term contains $q(x_{t-1} \mid x_t, x_0)$, the ground-truth denoising distribution, which we need in closed form before we can compute its KL divergence from $p_\theta(x_{t-1} \mid x_t)$.

First, we'll find $q(x_t \mid x_0)$. Writing one forward step with the reparameterization trick gives $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_{t-1}$ with $\epsilon_{t-1} \sim \mathcal{N}(0, \mathbf{I})$, and we can keep substituting the previous step into this expression.
Info
Note that to combine the two independent Gaussians $\mathcal{N}(0, \sigma_1^2 \mathbf{I})$ and $\mathcal{N}(0, \sigma_2^2 \mathbf{I})$, our new Gaussian's variance is the sum of the two variances. In other words, the combination of $\mathcal{N}(0, \sigma_1^2 \mathbf{I})$ and $\mathcal{N}(0, \sigma_2^2 \mathbf{I})$ is $\mathcal{N}(0, (\sigma_1^2 + \sigma_2^2)\mathbf{I})$.
Applying this pattern all the way down, we get

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\mathbf{I}\right), \qquad \text{i.e.}\quad x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,$$

where $\alpha_t = 1 - \beta_t$, $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, and $\epsilon \sim \mathcal{N}(0, \mathbf{I})$.
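This closed form is what makes training practical: we can jump to any $x_t$ in one shot instead of simulating $t$ forward steps. A minimal PyTorch sketch, assuming a linear variance schedule (the schedule, names, and shapes here are illustrative):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear variance schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)      # \bar{alpha}_t = prod_s alpha_s

def q_sample(x0, t, noise=None):
    """Draw x_t ~ q(x_t | x_0) for a batch of images x0 and 0-indexed timesteps t."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)    # broadcast over image dimensions
    return torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
```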
Then, plugging $q(x_t \mid x_0)$ into Bayes' rule,

$$q(x_{t-1} \mid x_t, x_0) = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)},$$

and expanding the Gaussian densities gives another Gaussian,

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t \mathbf{I}\right),$$

where

$$\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})\,x_t + \sqrt{\bar\alpha_{t-1}}\,\beta_t\, x_0}{1-\bar\alpha_t}, \qquad \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t.$$
Plugging this back into the KL term, we get

$$D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) = \frac{1}{2\sigma_t^2}\,\big\|\tilde\mu_t(x_t, x_0) - \mu_\theta(x_t, t)\big\|^2 + C,$$

where $\sigma_t^2$ is the fixed variance of the reverse step and $C$ collects terms that do not depend on $\theta$. So the model only needs to make its predicted mean $\mu_\theta(x_t, t)$ match the ground-truth mean $\tilde\mu_t(x_t, x_0)$.
However, we can also replace $x_0$ using the closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, which rearranges to

$$x_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon}{\sqrt{\bar\alpha_t}},$$

that says we can also predict the original image directly from $x_t$ and the noise. Finally, we can express the mean in terms of a noise prediction:

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right).$$

Thus, predicting the original image $x_0$ is equivalent to predicting the noise $\epsilon$ that was added to it.
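A small sketch of that equivalence, assuming the `alpha_bars` buffer defined as above (names are illustrative):

```python
import torch

def predict_x0_from_eps(x_t, t, eps, alpha_bars):
    """Recover the x_0 implied by a noise prediction:
    x_0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)."""
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)    # per-example \bar{alpha}_t, broadcast to image shape
    return (x_t - torch.sqrt(1.0 - a_bar) * eps) / torch.sqrt(a_bar)
```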
Model
The model itself takes in the noisy image $x_t$ and the timestep $t$, and outputs a prediction of the noise, $\epsilon_\theta(x_t, t)$, with the same shape as the input image.
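As a toy illustration of this interface (real implementations typically use a U-Net with time embeddings; this tiny network is only an assumed stand-in):

```python
import torch
import torch.nn as nn

class TinyEpsModel(nn.Module):
    """Deliberately small stand-in for the noise-prediction network epsilon_theta(x_t, t)."""
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_t, t):
        # Crudest possible conditioning: append the normalized timestep as an extra channel.
        t_map = (t.float() / 1000.0).view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, t_map], dim=1))
```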
Training
We train our model to minimize a simplified loss defined as

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where $t$ is drawn uniformly from $\{1, \dots, T\}$, $x_0$ is a training image, and $\epsilon \sim \mathcal{N}(0, \mathbf{I})$ is the noise used to construct $x_t$.
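A minimal sketch of one training step under the same assumed linear schedule (the `model` here is any network with the $\epsilon_\theta(x_t, t)$ interface above):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, x0):
    """One step of the simplified DDPM objective on a batch of clean images x0."""
    t = torch.randint(0, T, (x0.shape[0],))          # uniform timestep (0-indexed here)
    eps = torch.randn_like(x0)                       # target noise
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps
    return ((eps - model(x_t, t)) ** 2).mean()       # ||eps - eps_theta(x_t, t)||^2
```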
Prediction
To sample an image, we first generate random noise $x_T \sim \mathcal{N}(0, \mathbf{I})$. Then, for $t = T, \dots, 1$, we repeat the following:

- Let $z \sim \mathcal{N}(0, \mathbf{I})$ if $t > 1$, else $z = 0$.
- Move one step back in time with

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z,$$

where $\sigma_t^2$ is typically set to $\beta_t$ or $\tilde\beta_t$.
Note
The formula in the second step is derived from the equation for $\mu_\theta(x_t, t)$ in terms of $x_t$ and $\epsilon_\theta(x_t, t)$. Specifically, sampling $x_{t-1} \sim p_\theta(x_{t-1} \mid x_t)$ with the reparameterization trick gives $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$, and substituting the expression for $\mu_\theta$ yields the update above (recall that $1 - \alpha_t = \beta_t$).
Finally, after running all $T$ steps, the resulting $x_0$ is our generated image.
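Putting the loop together, a rough sketch of ancestral sampling (assuming the same linear schedule and the $\sigma_t^2 = \beta_t$ choice; names are illustrative):

```python
import torch

@torch.no_grad()
def sample(model, shape, T=1000):
    """Generate images by running the learned reverse process from pure noise."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                                        # x_T ~ N(0, I)
    for t in reversed(range(T)):                                  # t = T-1, ..., 0 (0-indexed)
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = model(x, t_batch)                                   # predicted noise
        x = (x - (betas[t] / torch.sqrt(1.0 - alpha_bars[t])) * eps) / torch.sqrt(alphas[t])
        x = x + torch.sqrt(betas[t]) * z                          # add sigma_t * z
    return x                                                      # final x_0
```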