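The natural gradient is a ⛰️ Gradient Descent procedure that uses a different constraint for the optimization step. Gradient descent finds the step $\Delta\theta$ that minimizes the loss $L$ within a small Euclidean ball around the current parameters $\theta$,
$$\Delta\theta^* = \arg\min_{\|\Delta\theta\|_2 \le \epsilon} L(\theta + \Delta\theta),$$
whereas natural gradient descent bounds the change in the model's distribution $p_\theta$ instead:
$$\Delta\theta^* = \arg\min_{D_{KL}(p_\theta \,\|\, p_{\theta + \Delta\theta}) \le \epsilon} L(\theta + \Delta\theta).$$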
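That is, the constraint for standard gradient descent is defined by the Euclidean distance between parameter vectors, while the natural gradient's constraint is defined by the KL divergence between the distributions those parameters induce. The same parameter step can produce very different changes in the distribution depending on where we are in parameter space, and only the second constraint accounts for this.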
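To see why Euclidean distance in parameter space is a poor measure of how much the model changes, here is a small sketch (assuming NumPy and a univariate Gaussian model, neither of which comes from the original text) that applies the same step of 0.1 to the mean of a narrow and a wide Gaussian and compares the resulting KL divergences. The helper `kl_gaussian` is hypothetical and uses the standard closed form for two univariate Gaussians.

```python
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) in nats."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

step = 0.1  # identical Euclidean step size in parameter space

# Shift only the mean by `step`, once for a narrow and once for a wide Gaussian.
kl_narrow = kl_gaussian(0.0, 0.01, step, 0.01)
kl_wide = kl_gaussian(0.0, 10.0, step, 10.0)

print(f"narrow (sigma=0.01): KL = {kl_narrow:.4g}")
print(f"wide   (sigma=10):   KL = {kl_wide:.4g}")
```

For equal variances the divergence reduces to $\delta^2 / (2\sigma^2)$, so the identical step costs about 50 nats for the narrow Gaussian and about $5 \times 10^{-5}$ nats for the wide one.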
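To solve this, we first need to make an approximation. It's difficult to use the ⚖️ KL Divergence constraint directly in our optimization, but for a small step $\Delta\theta$ we can substitute its second-order Taylor expansion around $\theta$,
$$D_{KL}(p_\theta \,\|\, p_{\theta+\Delta\theta}) \approx \frac{1}{2}\Delta\theta^\top F \Delta\theta,$$
where $F = \mathbb{E}_{x \sim p_\theta}\left[\nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^\top\right]$ is the Fisher information matrix. The zeroth- and first-order terms vanish because the divergence is zero, and at its minimum, when $\Delta\theta = 0$.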
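The quadratic approximation can be checked numerically. The sketch below assumes a categorical distribution parameterized by softmax logits $\theta$ (an illustrative choice, not from the original), builds $F$ directly from the expectation in its definition using the score $\nabla_\theta \log p_k = e_k - p$, and compares the exact KL divergence with $\frac{1}{2}\Delta\theta^\top F \Delta\theta$ for shrinking steps. The function names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def kl_categorical(p, q):
    """Exact KL(p || q) for two categorical distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def fisher_from_definition(theta):
    """F = E_{k ~ p}[score_k score_k^T], with score_k = grad log p_k = e_k - p."""
    p = softmax(theta)
    n = len(theta)
    F = np.zeros((n, n))
    for k in range(n):
        score = np.eye(n)[k] - p
        F += p[k] * np.outer(score, score)
    return F

rng = np.random.default_rng(0)
theta = rng.normal(size=4)
F = fisher_from_definition(theta)
direction = rng.normal(size=4)

for scale in [1e-1, 1e-2, 1e-3]:
    d = scale * direction
    exact = kl_categorical(softmax(theta), softmax(theta + d))
    quadratic = 0.5 * d @ F @ d
    print(f"|d|={np.linalg.norm(d):.1e}  KL={exact:.3e}  0.5*d^T F d={quadratic:.3e}")
```

The gap between the two columns shrinks faster than the KL itself, as expected from a remainder that is third order in $\Delta\theta$. For this model the expectation simplifies to $F = \operatorname{diag}(p) - pp^\top$, which is singular (adding a constant to every logit leaves $p$ unchanged), so a damped matrix such as $F + \delta I$ is a common substitute before inverting it.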
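We can approximate the loss with a first-order Taylor expansion around $\theta$:
$$L(\theta + \Delta\theta) \approx L(\theta) + \nabla_\theta L(\theta)^\top \Delta\theta.$$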
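Then, solving the Lagrangian
$$\mathcal{L}(\Delta\theta, \lambda) = L(\theta) + \nabla_\theta L(\theta)^\top \Delta\theta + \lambda\left(\frac{1}{2}\Delta\theta^\top F \Delta\theta - \epsilon\right)$$
by setting its gradient with respect to $\Delta\theta$ to zero, $\nabla_\theta L(\theta) + \lambda F \Delta\theta = 0$, we get the gradient step
$$\Delta\theta^* = -\frac{1}{\lambda} F^{-1} \nabla_\theta L(\theta) = -\alpha\, \tilde{\nabla}_\theta L(\theta),$$
where $\lambda$ is the multiplier on the KL constraint and $\tilde{\nabla}_\theta L(\theta) = F^{-1}\nabla_\theta L(\theta)$, the gradient preconditioned by the inverse Fisher information matrix, is the natural gradient. The second equality holds if we decide to set the learning rate to $\alpha = 1/\lambda$ rather than solving for the $\lambda$ that makes the constraint tight.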
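As a sketch of the resulting update, the code below fits a univariate Gaussian to samples by natural gradient descent on the mean negative log-likelihood. It assumes NumPy, a parameterization $(\mu, \log\sigma)$ in which the Fisher information is $\operatorname{diag}(1/\sigma^2, 2)$, and a fixed learning rate $\alpha = 0.5$; the data, step count, and helper names (`nll_grad`, `fisher`, `natural_gradient_descent`) are illustrative assumptions.

```python
import numpy as np

def nll_grad(mu, log_sigma, x):
    """Gradient of the mean negative log-likelihood of N(mu, sigma^2) w.r.t. (mu, log_sigma)."""
    var = np.exp(2 * log_sigma)
    d_mu = -np.mean(x - mu) / var
    d_log_sigma = 1.0 - np.mean((x - mu) ** 2) / var
    return np.array([d_mu, d_log_sigma])

def fisher(log_sigma):
    """Fisher information of N(mu, sigma^2) in (mu, log_sigma); it does not depend on mu."""
    var = np.exp(2 * log_sigma)
    return np.diag([1.0 / var, 2.0])

def natural_gradient_descent(x, alpha=0.5, steps=200):
    params = np.array([0.0, 0.0])  # start at mu = 0, sigma = 1
    for _ in range(steps):
        mu, log_sigma = params
        g = nll_grad(mu, log_sigma, x)
        # Natural gradient F^{-1} g, computed with a linear solve instead of an explicit inverse.
        params = params - alpha * np.linalg.solve(fisher(log_sigma), g)
    return params

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=1000)
mu, log_sigma = natural_gradient_descent(x)
print(mu, x.mean())
print(np.exp(2 * log_sigma), x.var())
```

Because $F^{-1}$ rescales the mean's gradient by $\sigma^2$, the mean update simplifies to $\mu \leftarrow \mu + \alpha(\bar{x} - \mu)$ regardless of the current $\sigma$, whereas plain gradient descent would move $\mu$ by an amount proportional to $1/\sigma^2$. The iterates converge to the maximum-likelihood estimates: the sample mean and the biased sample variance.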