The log derivative trick allows us to estimate the gradient
by rewriting it as a valid expectation.
Note that simply expanding out this expectation gives us the following:
However,
Instead, we observe that
which allows us to rewrite the expectation below:
where