In stochastic models like the Variational Autoencoder, we sometimes have a sampling process in the middle of our model. However, during training, backpropagation cannot flow through a random node, because the randomness is itself determined by parameters we're seeking to optimize.

Let our sampling process be

$$z \sim q_\theta(z) = \mathcal{N}\left(z;\ \mu_\theta,\ \sigma_\theta^2\right).$$
We cannot backpropagate through $z$ directly, but we can instead reparameterize $z$ as follows:

$$z = \mu_\theta + \sigma_\theta \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$
This essentially moves the stochastic process to a separate, parameter-free part of our computation graph, allowing the gradient to flow through $\mu_\theta$ and $\sigma_\theta$. The following shows the difference between the two computations.
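As a minimal sketch in PyTorch (the objective `f` below is a stand-in chosen purely for illustration):

```python
import torch

# Parameters of q_theta(z) (e.g., encoder outputs) that we want to optimize.
mu = torch.tensor([0.5], requires_grad=True)
sigma = torch.tensor([1.2], requires_grad=True)

def f(z):
    # Stand-in downstream objective; in a VAE this would be the decoder loss.
    return (z ** 2).sum()

# Direct sampling: the sample is a fresh random tensor with no autograd
# history, so there is no path from f(z) back to mu and sigma.
z_direct = torch.normal(mu.detach(), sigma.detach())

# Reparameterized sampling: randomness enters only through eps, and z is
# a deterministic, differentiable function of mu and sigma.
eps = torch.randn_like(mu)
z = mu + sigma * eps

f(z).backward()
print(mu.grad, sigma.grad)  # gradients now flow through the sample
```

PyTorch's `torch.distributions` exposes exactly this split: `.sample()` cuts the graph, while `.rsample()` draws a reparameterized sample that gradients can flow through.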

Formal Justification

Consider the gradient of an expectation,

$$\nabla_\theta\, \mathbb{E}_{z \sim q}\left[ f_\theta(z) \right].$$
If the density $q$ does not depend on $\theta$ and we can differentiate $f_\theta$, we can move the gradient inside the expectation:

$$\nabla_\theta\, \mathbb{E}_{z \sim q}\left[ f_\theta(z) \right] = \mathbb{E}_{z \sim q}\left[ \nabla_\theta f_\theta(z) \right],$$

which is easy to estimate with Monte Carlo samples of $z$.
However, if our density is parameterized by $\theta$, writing the expectation as an integral and applying the product rule gives:

$$\nabla_\theta\, \mathbb{E}_{z \sim q_\theta}\left[ f_\theta(z) \right] = \int \nabla_\theta\, q_\theta(z)\, f_\theta(z)\, dz + \mathbb{E}_{z \sim q_\theta}\left[ \nabla_\theta f_\theta(z) \right]$$
The first term requires the gradient of the density $q_\theta(z)$ itself and is not an expectation under $q_\theta$, so we can't estimate it by simply drawing samples of $z$. However, if we reparameterize $z = g_\theta(\epsilon)$ with $\epsilon \sim p(\epsilon)$ independent of $\theta$, the density no longer depends on $\theta$, and we can differentiate through $g_\theta$ and calculate the entire gradient:

$$\nabla_\theta\, \mathbb{E}_{\epsilon \sim p}\left[ f_\theta(g_\theta(\epsilon)) \right] = \mathbb{E}_{\epsilon \sim p}\left[ \nabla_\theta\, f_\theta(g_\theta(\epsilon)) \right].$$

For the Gaussian above, $g_\theta(\epsilon) = \mu_\theta + \sigma_\theta \odot \epsilon$.
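As a quick numerical sanity check of this identity (a sketch; the choice $q_\theta = \mathcal{N}(\theta, 1)$ and $f(z) = z^2$ is a toy assumption, for which $\mathbb{E}[f(z)] = \theta^2 + 1$ and the true gradient is $2\theta$):

```python
import torch

theta = torch.tensor(1.5, requires_grad=True)

# Reparameterize z ~ N(theta, 1) as z = theta + eps, eps ~ N(0, 1).
eps = torch.randn(100_000)   # eps is independent of theta
z = theta + eps              # g_theta(eps)
estimate = (z ** 2).mean()   # Monte Carlo estimate of E[f(z)]
estimate.backward()          # differentiates through g_theta

print(theta.grad)  # ~3.0, matching the closed-form gradient 2 * theta
```

This same mechanism is what makes the ELBO of a Variational Autoencoder trainable end-to-end with ordinary stochastic gradient descent.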