ResNet (residual network) tackles the vanishing/exploding gradient problem that plagues deep 🕸️ Multilayer Perceptrons. It introduces the residual block, pictured below, which makes it easier for the network to learn an identity mapping when the extra network depth is unneeded.

In the residual block, we save the input and apply element-wise addition between the layers' output and the original input via the shortcut connection on the right.

If our desired transformation for these two layers is $H(x)$, the residual block instead encourages them to learn the residual $F(x) = H(x) - x$; if $H(x)$ is the identity mapping, then it is much easier for the network to drive the layers toward zero than to learn the identity from scratch.
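A minimal sketch of such a block, assuming a PyTorch-style API, equal input/output channel counts, and no downsampling (the class name and layer choices are illustrative, not the exact ResNet configuration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 conv layers form F(x); the shortcut carries x unchanged.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # saved input for the shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # F(x)
        out = out + identity              # element-wise addition: F(x) + x
        return self.relu(out)
```

If the two conv layers are unneeded, the optimizer only has to push their weights toward zero and the block collapses to the identity via the shortcut.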

ResNeXt

ResNeXt models extend the residual idea by placing multiple stacked-layer paths in parallel inside the block, alongside the shortcut connection, forming the structure below.

Instead of expanding the depth of the network, ResNeXt increases the cardinality of each block, akin to a "network-in-neuron" design. This kind of parallel computation can also be implemented as a grouped convolution.
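A minimal sketch of a ResNeXt-style bottleneck block, again assuming PyTorch; the names, channel counts, and default cardinality here are illustrative assumptions, and the parallel paths are collapsed into a single grouped convolution via the `groups` argument of `nn.Conv2d`:

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, channels: int, cardinality: int = 32, bottleneck_width: int = 4):
        super().__init__()
        inner = cardinality * bottleneck_width   # total width across all parallel paths
        self.reduce = nn.Conv2d(channels, inner, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(inner)
        # One grouped 3x3 conv behaves like `cardinality` small paths run in parallel.
        self.grouped = nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                                 groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(inner)
        self.expand = nn.Conv2d(inner, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.grouped(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + identity)  # same shortcut addition as ResNet
```

Raising `cardinality` adds more parallel paths without deepening the network, which is the trade-off the ResNeXt authors argue scales better than stacking additional layers.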