In stochastic models like the Variational Autoencoder, we sometimes have a sampling process in the middle of our model. However, during training, backpropagation cannot flow through a random node, because the randomness is itself determined by parameters we're seeking to optimize.
Let our sampling process be

$$z \sim \mathcal{N}\big(\mu_\phi(x), \sigma_\phi^2(x)\big),$$

where the mean $\mu$ and standard deviation $\sigma$ are produced by a network with parameters $\phi$.
We cannot backpropagate through the draw of $z$, since sampling is not a differentiable operation with respect to $\mu$ and $\sigma$. The reparameterization trick instead expresses $z$ as a deterministic function of the parameters and an auxiliary noise variable:

$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$
This essentially offsets the stochastic process into $\epsilon$, a separate input node of our computation graph, allowing the gradient to flow through $\mu$ and $\sigma$ as usual.
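As a minimal numerical sketch of the trick (the test function $f(z) = z^2$ and the parameter values here are illustrative assumptions, not from the original), we can estimate the gradient of $\mathbb{E}[z^2]$ with respect to $\mu$ and $\sigma$ by differentiating through the reparameterized sample and compare against the analytic answers $2\mu$ and $2\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 1.5, 0.8                 # hypothetical parameters we want gradients for
eps = rng.standard_normal(100_000)   # parameter-free noise: eps ~ N(0, 1)
z = mu + sigma * eps                 # reparameterized sample: z ~ N(mu, sigma^2)

# For f(z) = z^2 we know E[f(z)] = mu^2 + sigma^2,
# so dE/dmu = 2*mu = 3.0 and dE/dsigma = 2*sigma = 1.6.
# Pathwise (reparameterized) gradient estimate: df/dz * dz/dparam.
grad_mu = np.mean(2 * z * 1.0)       # dz/dmu = 1
grad_sigma = np.mean(2 * z * eps)    # dz/dsigma = eps

print(grad_mu, grad_sigma)           # should be close to 3.0 and 1.6
```

Because $\epsilon$ carries all the randomness, each sample of $z$ is an ordinary differentiable function of $\mu$ and $\sigma$, which is exactly what automatic differentiation needs.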
Formal Justification
Consider the gradient of an expectation under a parameterized distribution:

$$\nabla_\phi \, \mathbb{E}_{q_\phi(z)}\big[f(z)\big].$$
If we can differentiate through $f$ and write $z = g_\phi(\epsilon)$ with $\epsilon$ drawn from a fixed distribution $p(\epsilon)$, the expectation no longer depends on $\phi$ through the density, so the gradient moves inside:

$$\nabla_\phi \, \mathbb{E}_{q_\phi(z)}\big[f(z)\big] = \mathbb{E}_{p(\epsilon)}\big[\nabla_\phi f(g_\phi(\epsilon))\big].$$
However, if our density $q_\phi(z)$ itself depends on $\phi$ and we differentiate the expectation directly, the product rule gives two terms:

$$\nabla_\phi \int q_\phi(z)\, f(z)\, dz = \int \nabla_\phi q_\phi(z)\, f(z)\, dz + \int q_\phi(z)\, \nabla_\phi f(z)\, dz.$$
The first term requires the gradient of the density $q_\phi(z)$ itself, which is not an expectation we can estimate by simply sampling $z \sim q_\phi$. Using the log-derivative identity $\nabla_\phi q_\phi(z) = q_\phi(z)\, \nabla_\phi \log q_\phi(z)$, it can be rewritten as the score-function (REINFORCE) estimator, but that estimator typically has much higher variance than the reparameterized one.
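The variance gap can be seen empirically. The following sketch (again using the illustrative choice $f(z) = z^2$ with $q_\phi = \mathcal{N}(\mu, \sigma^2)$, where $\nabla_\mu \log q(z) = (z - \mu)/\sigma^2$) compares the score-function estimator against the pathwise one; both are unbiased for $\nabla_\mu \mathbb{E}[z^2] = 2\mu$, but their per-sample variances differ sharply:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 1.5, 0.8, 100_000     # hypothetical parameter values
z = mu + sigma * rng.standard_normal(n)

# Score-function (REINFORCE) estimator: f(z) * d log q(z) / dmu
score = (z - mu) / sigma**2
grad_sf = z**2 * score

# Pathwise (reparameterized) estimator: df/dz * dz/dmu = 2z * 1
grad_rp = 2 * z

# Both means should be close to the true gradient 2*mu = 3.0,
# but the score-function samples are far more spread out.
print(grad_sf.mean(), grad_rp.mean())
print(grad_sf.var(), grad_rp.var())
```

This is one common motivation for preferring the reparameterized (pathwise) gradient whenever the sampling process can be rewritten as a differentiable transform of fixed noise.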