In deep learning, our output $\boldsymbol{y}$ goes through multiple layers of function composition,

$$\boldsymbol{y} = (f_K \circ f_{K-1} \circ \cdots \circ f_1)(\boldsymbol{x}),$$

where $\boldsymbol{x}$ is the input and each layer $f_i$ has its own parameters $\boldsymbol{\theta}_i$.

With loss function $L(\boldsymbol{\theta})$ and parameters $\boldsymbol{\theta} = \{\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_K\}$ applied to $\boldsymbol{x}$ to get $\boldsymbol{y}$, we perform ⛰️ Gradient Descent by iteratively computing the ❄️ Gradient layer-by-layer:

$$\frac{\partial L}{\partial \boldsymbol{\theta}_K} = \frac{\partial L}{\partial f_K}\frac{\partial f_K}{\partial \boldsymbol{\theta}_K}, \qquad \frac{\partial L}{\partial \boldsymbol{\theta}_i} = \frac{\partial L}{\partial f_K}\frac{\partial f_K}{\partial f_{K-1}} \cdots \frac{\partial f_{i+1}}{\partial f_i}\frac{\partial f_i}{\partial \boldsymbol{\theta}_i},$$

so the factors $\partial f_{i+1} / \partial f_i$ computed at layer $i+1$ are reused when moving back to layer $i$. A minimal sketch of this computation is shown below.
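The following sketch illustrates the layer-by-layer gradient computation for a two-layer network with a squared-error loss. The layer sizes, the tanh activation, and all variable names are illustrative assumptions, not anything fixed by the equations above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed two-layer network: f1(x) = tanh(A1 @ x + b1), f2(h) = A2 @ h + b2.
x = rng.normal(size=3)          # input
y_target = rng.normal(size=2)   # regression target
A1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
A2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

# Forward pass, keeping intermediate values for the backward pass.
z1 = A1 @ x + b1
h1 = np.tanh(z1)                # f1(x)
y = A2 @ h1 + b2                # f2(f1(x))
loss = 0.5 * np.sum((y - y_target) ** 2)

# Backward pass: gradients are computed layer by layer, last layer first,
# reusing dL/df2 when moving back to the first layer (chain rule above).
dL_dy = y - y_target             # dL/df2
dL_dA2 = np.outer(dL_dy, h1)     # dL/dtheta2
dL_db2 = dL_dy

dL_dh1 = A2.T @ dL_dy            # dL/df1 = (dL/df2)(df2/df1)
dL_dz1 = dL_dh1 * (1 - h1 ** 2)  # tanh'(z1) = 1 - tanh(z1)^2
dL_dA1 = np.outer(dL_dz1, x)     # dL/dtheta1
dL_db1 = dL_dz1

# One gradient-descent step on each parameter block.
lr = 0.1
A2 -= lr * dL_dA2; b2 -= lr * dL_db2
A1 -= lr * dL_dA1; b1 -= lr * dL_db1
```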

Automatic Differentiation

Backpropagation is a specific instance of a general technique called automatic differentiation. A computation graph over input variables $x_1, \dots, x_d$, intermediate variables $x_{d+1}, \dots, x_{D-1}$, and output variable $x_D$ can be expressed as

$$x_i = g_i\!\left(x_{\mathrm{Pa}(x_i)}\right), \qquad i = d+1, \dots, D,$$

where $g_i(\cdot)$ is an elementary function and $x_{\mathrm{Pa}(x_i)}$ are the parent nodes of $x_i$.
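As a small illustrative example (not from the text above), take $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$ with inputs $x_1, x_2$. The intermediate variables are $x_3 = x_1 x_2$ and $x_4 = \sin(x_1)$, and the output is $x_5 = x_3 + x_4$, with parents $\mathrm{Pa}(x_3) = \{x_1, x_2\}$, $\mathrm{Pa}(x_4) = \{x_1\}$, and $\mathrm{Pa}(x_5) = \{x_3, x_4\}$.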

Then, for the function $f$ that takes in the inputs $x_1, \dots, x_d$ and produces the output $x_D$, we have $\frac{\partial f}{\partial x_D} = 1$. For the remaining variables, we apply the chain rule,

$$\frac{\partial f}{\partial x_i} = \sum_{x_j :\, x_i \in \mathrm{Pa}(x_j)} \frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial x_i} = \sum_{x_j :\, x_i \in \mathrm{Pa}(x_j)} \frac{\partial f}{\partial x_j}\frac{\partial g_j}{\partial x_i},$$

summing over all nodes $x_j$ that have $x_i$ as a parent.
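Below is a minimal sketch of this recursion applied in reverse-mode to the small graph from the example above. The function $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$ and the dictionary-based graph representation are assumptions made for illustration.

```python
import math

# Assumed example graph for f(x1, x2) = x1*x2 + sin(x1):
#   x3 = x1 * x2,  x4 = sin(x1),  x5 = x3 + x4 (output).
x1, x2 = 2.0, 3.0

# Forward pass through the elementary functions g_i.
x3 = x1 * x2
x4 = math.sin(x1)
x5 = x3 + x4

# Local partial derivatives dg_j/dx_i for every edge x_i -> x_j.
local = {
    ("x1", "x3"): x2, ("x2", "x3"): x1,    # d(x1*x2)/dx1, d(x1*x2)/dx2
    ("x1", "x4"): math.cos(x1),            # d(sin x1)/dx1
    ("x3", "x5"): 1.0, ("x4", "x5"): 1.0,  # d(x3+x4)/dx3, d(x3+x4)/dx4
}
children = {"x1": ["x3", "x4"], "x2": ["x3"], "x3": ["x5"], "x4": ["x5"]}

# Reverse pass: df/dx_D = 1, then visit nodes in reverse topological order
# and accumulate df/dx_i = sum over children x_j of (df/dx_j) * (dg_j/dx_i).
grad = {"x5": 1.0}
for node in ["x4", "x3", "x2", "x1"]:
    grad[node] = sum(grad[c] * local[(node, c)] for c in children[node])

print(grad["x1"], grad["x2"])  # expect x2 + cos(x1) = 3 + cos(2), and x1 = 2
```

Each node's derivative is touched once and combines the already-computed derivatives of its children, which is what makes reverse-mode (and hence backpropagation) efficient when there is a single output and many inputs.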