In stochastic models like the Variational Autoencoder, we sometimes have a sampling process in the middle of our model. However, during training, backpropagation cannot flow through a random node, because the randomness is itself determined by parameters we're seeking to optimize.
Let our sampling process be

$$z \sim \mathcal{N}\big(\mu_\phi(x), \sigma_\phi^2(x)\big),$$

where the mean $\mu$ and standard deviation $\sigma$ are produced by a network with parameters $\phi$.
We cannot backpropagate through the draw of $z$, since sampling is not a differentiable operation with respect to $\mu$ and $\sigma$. The reparameterization trick instead expresses $z$ as a deterministic function of the parameters and an auxiliary noise variable:

$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$
This essentially offsets the stochastic process into $\epsilon$, a separate input node of our computation graph, allowing the gradient to flow through $\mu$ and $\sigma$ as usual.
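As a minimal numerical sketch of the trick (the test function $f(z) = z^2$ and the parameter values here are illustrative assumptions, not from the original), we can estimate the gradient of $\mathbb{E}[z^2]$ with respect to $\mu$ and $\sigma$ by differentiating through the reparameterized sample and compare against the analytic answers $2\mu$ and $2\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 1.5, 0.8                 # hypothetical parameters we want gradients for
eps = rng.standard_normal(100_000)   # parameter-free noise: eps ~ N(0, 1)
z = mu + sigma * eps                 # reparameterized sample: z ~ N(mu, sigma^2)

# For f(z) = z^2 we know E[f(z)] = mu^2 + sigma^2,
# so dE/dmu = 2*mu = 3.0 and dE/dsigma = 2*sigma = 1.6.
# Pathwise (reparameterized) gradient estimate: df/dz * dz/dparam.
grad_mu = np.mean(2 * z * 1.0)       # dz/dmu = 1
grad_sigma = np.mean(2 * z * eps)    # dz/dsigma = eps

print(grad_mu, grad_sigma)           # should be close to 3.0 and 1.6
```

Because $\epsilon$ carries all the randomness, each sample of $z$ is an ordinary differentiable function of $\mu$ and $\sigma$, which is exactly what automatic differentiation needs.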
Formal Justification
Consider the gradient of an expectation under a parameterized distribution:

$$\nabla_\phi \, \mathbb{E}_{q_\phi(z)}\big[f(z)\big].$$
If we can differentiate through $f$ and write $z = g_\phi(\epsilon)$ with $\epsilon$ drawn from a fixed distribution $p(\epsilon)$, the expectation no longer depends on $\phi$ through the density, so the gradient moves inside:

$$\nabla_\phi \, \mathbb{E}_{q_\phi(z)}\big[f(z)\big] = \mathbb{E}_{p(\epsilon)}\big[\nabla_\phi f(g_\phi(\epsilon))\big].$$
However, if our density $q_\phi(z)$ itself depends on $\phi$ and we differentiate the expectation directly, the product rule gives two terms:

$$\nabla_\phi \int q_\phi(z)\, f(z)\, dz = \int \nabla_\phi q_\phi(z)\, f(z)\, dz + \int q_\phi(z)\, \nabla_\phi f(z)\, dz.$$
The first term requires the gradient of the density $q_\phi(z)$ itself, which is not an expectation we can estimate by simply sampling $z \sim q_\phi$. Using the log-derivative identity $\nabla_\phi q_\phi(z) = q_\phi(z)\, \nabla_\phi \log q_\phi(z)$, it can be rewritten as the score-function (REINFORCE) estimator, but that estimator typically has much higher variance than the reparameterized one.
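The variance gap can be seen empirically. The following sketch (again using the illustrative choice $f(z) = z^2$ with $q_\phi = \mathcal{N}(\mu, \sigma^2)$, where $\nabla_\mu \log q(z) = (z - \mu)/\sigma^2$) compares the score-function estimator against the pathwise one; both are unbiased for $\nabla_\mu \mathbb{E}[z^2] = 2\mu$, but their per-sample variances differ sharply:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 1.5, 0.8, 100_000     # hypothetical parameter values
z = mu + sigma * rng.standard_normal(n)

# Score-function (REINFORCE) estimator: f(z) * d log q(z) / dmu
score = (z - mu) / sigma**2
grad_sf = z**2 * score

# Pathwise (reparameterized) estimator: df/dz * dz/dmu = 2z * 1
grad_rp = 2 * z

# Both means should be close to the true gradient 2*mu = 3.0,
# but the score-function samples are far more spread out.
print(grad_sf.mean(), grad_rp.mean())
print(grad_sf.var(), grad_rp.var())
```

This is one common motivation for preferring the reparameterized (pathwise) gradient whenever the sampling process can be rewritten as a differentiable transform of fixed noise.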