The log derivative trick allows us to estimate the gradient

$$\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}\left[f(x)\right]$$

by rewriting it as a valid expectation.

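Note that simply expanding this expectation as an integral, and moving the gradient inside it, gives us the following:

$$\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}\left[f(x)\right] = \nabla_\theta \int p_\theta(x) \, f(x) \, dx = \int \nabla_\theta p_\theta(x) \, f(x) \, dx$$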

However, $\nabla_\theta p_\theta(x)$ isn't a valid probability density (it can be negative, and it integrates to zero rather than one), so we can't directly approximate the integral as an average via 🤔 Monte Carlo Sampling.
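As a quick numerical sanity check of this claim, here is a minimal sketch assuming a one-dimensional Gaussian $p_\theta = \mathcal{N}(\mu, 1)$ with $\theta = \mu$ (an illustrative choice, not something fixed by the text). It shows that $\nabla_\mu p_\theta(x)$ is negative for $x < \mu$ and integrates to roughly zero, while $p_\theta$ itself integrates to one.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def grad_mu_pdf(x, mu, sigma=1.0):
    """Analytic derivative of the Gaussian density with respect to mu."""
    return gaussian_pdf(x, mu, sigma) * (x - mu) / sigma**2

def integrate(fn, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(n))

mu = 0.5
negative_value = grad_mu_pdf(mu - 1.0, mu)                  # a point left of the mean
grad_integral = integrate(lambda x: grad_mu_pdf(x, mu))     # should be about 0
pdf_integral = integrate(lambda x: gaussian_pdf(x, mu))     # should be about 1

print(negative_value < 0)              # True: not non-negative
print(abs(grad_integral) < 1e-9)       # True: does not integrate to one
print(abs(pdf_integral - 1.0) < 1e-9)  # True: the density itself does
```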

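Instead, we observe that, since $\nabla_\theta \log p_\theta(x) = \nabla_\theta p_\theta(x) / p_\theta(x)$,

$$\nabla_\theta p_\theta(x) = p_\theta(x) \, \nabla_\theta \log p_\theta(x)$$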

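which allows us to rewrite the gradient as an expectation under $p_\theta$ and approximate it with a sample average:

$$\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}\left[f(x)\right] = \int p_\theta(x) \, \nabla_\theta \log p_\theta(x) \, f(x) \, dx = \mathbb{E}_{x \sim p_\theta}\left[f(x) \, \nabla_\theta \log p_\theta(x)\right] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i) \, \nabla_\theta \log p_\theta(x_i)$$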

where $x_1, \dots, x_N$ are samples drawn from $p_\theta$.
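To make the sample-average estimate concrete, here is a minimal sketch assuming $p_\theta = \mathcal{N}(\mu, 1)$ with $\theta = \mu$ and $f(x) = x^2$ (both illustrative choices, not taken from the text). For this Gaussian, $\nabla_\mu \log p_\theta(x) = x - \mu$, and since $\mathbb{E}[x^2] = \mu^2 + 1$ the exact gradient is $2\mu$, which gives a reference value to compare against. The function name `score_function_grad` is hypothetical.

```python
import random

def score_function_grad(f, mu, n_samples, rng):
    """Monte Carlo estimate of d/dmu E_{x ~ N(mu, 1)}[f(x)] via the log derivative trick.

    Each sample contributes f(x_i) * grad_mu log p(x_i), and for N(mu, 1)
    the score is grad_mu log p(x) = x - mu.
    """
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, 1.0)  # draw x_i ~ p_theta
        score = x - mu          # grad_mu log p_theta(x_i)
        total += f(x) * score
    return total / n_samples

rng = random.Random(0)  # fixed seed for reproducibility
mu = 1.5
estimate = score_function_grad(lambda x: x * x, mu, 200_000, rng)
exact = 2.0 * mu  # d/dmu (mu^2 + 1)
print(estimate, exact)
```

Note that `f` is only evaluated, never differentiated, which is why the trick is useful when $f$ is a black box; the trade-off is that this estimator can have fairly high variance, so many samples may be needed.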