ResNet (residual network) tackles the vanishing/exploding gradient problem that plagues deep 🕸️ Multilayer Perceptrons. It introduces the residual block, pictured below, which makes it easier for the network to learn an identity mapping when the extra network depth is unneeded.

In the residual block, we save the input and apply element-wise addition between the layers' output and the original input via the shortcut connection on the right.

If our desired transformation for these two layers is $H(x)$, the residual block instead encourages them to learn the residual $F(x) = H(x) - x$; if $H(x)$ is the identity mapping, then it is much easier for the network to drive the layers toward zero than to learn the identity from scratch.
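A minimal sketch of such a block, assuming a PyTorch-style API, equal input/output channel counts, and no downsampling (the class name and layer choices are illustrative, not the exact ResNet configuration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 conv layers form F(x); the shortcut carries x unchanged.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # saved input for the shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # F(x)
        out = out + identity              # element-wise addition: F(x) + x
        return self.relu(out)
```

If the two conv layers are unneeded, the optimizer only has to push their weights toward zero and the block collapses to the identity via the shortcut.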

ResNeXt

ResNeXt models extend the residual idea by placing multiple stacked-layer paths in parallel inside the block, alongside the shortcut connection, forming the structure below.

Instead of expanding the depth of the network, ResNeXt increases the cardinality of each block, akin to a "network-in-neuron" design. This kind of parallel computation can also be implemented as a grouped convolution.
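A minimal sketch of a ResNeXt-style bottleneck block, again assuming PyTorch; the names, channel counts, and default cardinality here are illustrative assumptions, and the parallel paths are collapsed into a single grouped convolution via the `groups` argument of `nn.Conv2d`:

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, channels: int, cardinality: int = 32, bottleneck_width: int = 4):
        super().__init__()
        inner = cardinality * bottleneck_width   # total width across all parallel paths
        self.reduce = nn.Conv2d(channels, inner, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(inner)
        # One grouped 3x3 conv behaves like `cardinality` small paths run in parallel.
        self.grouped = nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                                 groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(inner)
        self.expand = nn.Conv2d(inner, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.grouped(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + identity)  # same shortcut addition as ResNet
```

Raising `cardinality` adds more parallel paths without deepening the network, which is the trade-off the ResNeXt authors argue scales better than stacking additional layers.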