The gradient is a generalization of the Derivative to functions of several variables. We find the gradient by varying one variable at a time, keeping the others constant, to obtain the partial derivatives.
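As a minimal sketch of this idea (the function $f(x, y) = x^2 y$ below is an illustrative choice, not from the text), we can compute each partial derivative by differentiating with respect to one argument while holding the other fixed, here with JAX:

```python
import jax
import jax.numpy as jnp

# Toy function of two variables (chosen only for illustration).
def f(x, y):
    return x**2 * y

# Partial derivative w.r.t. x: hold y constant, differentiate in x.
df_dx = jax.grad(f, argnums=0)   # analytically 2xy
# Partial derivative w.r.t. y: hold x constant, differentiate in y.
df_dy = jax.grad(f, argnums=1)   # analytically x**2

x, y = 2.0, 3.0
gradient = jnp.array([df_dx(x, y), df_dy(x, y)])
print(gradient)                  # [12., 4.]
```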
The rules from univariate differentiation still apply, but order matters since we're dealing with matrices and vectors. The following states them more concretely in terms of partial derivatives.
Product rule: $\frac{\partial}{\partial x}\big(f(x)\,g(x)\big) = \frac{\partial f}{\partial x}\,g(x) + f(x)\,\frac{\partial g}{\partial x}$.
Sum rule: $\frac{\partial}{\partial x}\big(f(x) + g(x)\big) = \frac{\partial f}{\partial x} + \frac{\partial g}{\partial x}$.
Chain rule: $\frac{\partial}{\partial x}(g \circ f)(x) = \frac{\partial}{\partial x}\big(g(f(x))\big) = \frac{\partial g}{\partial f}\,\frac{\partial f}{\partial x}$.
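A quick numerical sanity check of the sum and product rules (the functions $f = \sin$ and $g(x) = x^2$ are arbitrary choices for illustration):

```python
import jax
import jax.numpy as jnp

f = jnp.sin
g = lambda x: x**2
x = 1.5

# Product rule: d/dx [f(x) g(x)] = f'(x) g(x) + f(x) g'(x)
lhs = jax.grad(lambda x: f(x) * g(x))(x)
rhs = jax.grad(f)(x) * g(x) + f(x) * jax.grad(g)(x)
print(jnp.allclose(lhs, rhs))  # True

# Sum rule: d/dx [f(x) + g(x)] = f'(x) + g'(x)
lhs = jax.grad(lambda x: f(x) + g(x))(x)
rhs = jax.grad(f)(x) + jax.grad(g)(x)
print(jnp.allclose(lhs, rhs))  # True
```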
Info
Note that if we write gradients as row vectors, the chain rule for compositions of multivariate functions becomes a matrix multiplication of their Jacobians.
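A short sketch of this (the functions $f$ and $g$ below are made-up examples): multiplying the Jacobians of $g$ and $f$ gives the Jacobian of the composition $g \circ f$.

```python
import jax
import jax.numpy as jnp

def f(x):               # f: R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

def g(y):               # g: R^2 -> R^2
    return jnp.array([y[0] + y[1], y[0] * y[1]])

x = jnp.array([1.0, 2.0, 3.0])

J_f = jax.jacobian(f)(x)        # shape (2, 3); rows are gradients of f's outputs
J_g = jax.jacobian(g)(f(x))     # shape (2, 2)

# Chain rule as matrix multiplication of Jacobians.
J_chain = J_g @ J_f             # shape (2, 3)
J_direct = jax.jacobian(lambda x: g(f(x)))(x)
print(jnp.allclose(J_chain, J_direct))  # True
```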
We can compute gradients in higher dimensions as well. For example, the gradient of a matrix $A \in \mathbb{R}^{m \times n}$ with respect to a matrix $B \in \mathbb{R}^{p \times q}$ is a tensor $J$ with shape $(m \times n) \times (p \times q)$, and $J_{ijkl} = \frac{\partial A_{ij}}{\partial B_{kl}}$.
However, we can take advantage of the fact that there is an isomorphism from the matrix space $\mathbb{R}^{m \times n}$ to the vector space $\mathbb{R}^{mn}$, which allows us to flatten matrices into vectors and compute the Jacobian as a matrix, just like above.
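A rough sketch of both facts (the function $F$ and the shapes are arbitrary illustrations): the Jacobian of a matrix-valued function of a matrix comes out as a four-dimensional tensor, and flattening inputs and outputs first gives the equivalent $(mn) \times (pq)$ Jacobian matrix.

```python
import jax
import jax.numpy as jnp

m, n, p, q = 4, 2, 2, 3
W = jnp.arange(m * p, dtype=jnp.float32).reshape(m, p)
V = jnp.arange(q * n, dtype=jnp.float32).reshape(q, n)

def F(X):                      # F: R^{p x q} -> R^{m x n}
    return W @ X @ V

B = jnp.ones((p, q))

# Jacobian of F at B: one partial derivative per (output entry, input entry).
J = jax.jacobian(F)(B)
print(J.shape)                 # (4, 2, 2, 3), i.e. (m, n, p, q)

# Same computation after identifying R^{p x q} with R^{pq} and R^{m x n} with R^{mn}.
def F_vec(x_flat):
    return F(x_flat.reshape(p, q)).reshape(-1)

J_mat = jax.jacobian(F_vec)(B.reshape(-1))
print(J_mat.shape)             # (8, 6), i.e. (m*n, p*q)

# The flattened Jacobian is just a reshaping of the 4D tensor.
print(jnp.allclose(J_mat.reshape(m, n, p, q), J))  # True
```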
The following are some useful gradients used in machine learning.