ResNet (residual network) tackles the vanishing/exploding gradient problem that plagues deep multilayer perceptrons. It introduces the residual block, pictured below, which makes it easier for the network to learn an identity mapping when the extra depth is not needed.
In the residual block, we save the input x through a shortcut connection and add it back to the output of the stacked layers. If our desired transformation for these two layers is H(x), the layers only need to learn the residual F(x) = H(x) - x, because the block's output is F(x) + x. Driving F(x) toward zero then recovers the identity mapping, which is far easier than learning the identity through several nonlinear layers.
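To make this concrete, here is a minimal PyTorch-style sketch of a residual block (the class name `BasicResidualBlock` and the specific layer choices are illustrative assumptions, not the exact architecture from the original paper):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 conv layers wrapped by a shortcut connection (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                 # save the input for the shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))              # F(x): the learned residual
        out = out + identity                         # output is F(x) + x
        return self.relu(out)

# Usage: pass a feature map whose channel count matches the block
block = BasicResidualBlock(64)
y = block(torch.randn(1, 64, 32, 32))
```

Because the shortcut carries x unchanged, the block only has to learn the correction F(x); if the depth is unneeded, the weights can simply shrink toward zero.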
ResNeXt
ResNeXt models extend the residual idea by placing multiple stacks of layers in parallel between the input and the point where the shortcut connection is added back, forming the structure below.
Instead of increasing the depth of the network, ResNeXt increases the cardinality of each block, an approach akin to "network-in-neuron." This kind of parallel computation is also called grouped convolution, as sketched below.
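Here is a minimal sketch of a ResNeXt-style block, assuming PyTorch; the parallel paths are expressed through the `groups` argument of `nn.Conv2d`, and the class name `ResNeXtBlock` and the default widths are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Bottleneck block whose middle 3x3 conv is grouped, so each group acts as
    one parallel path; cardinality = number of groups (illustrative sketch)."""
    def __init__(self, channels, cardinality=32, bottleneck_width=4):
        super().__init__()
        inner = cardinality * bottleneck_width       # e.g. 32 * 4 = 128 inner channels
        self.reduce = nn.Conv2d(channels, inner, kernel_size=1, bias=False)
        self.grouped = nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                                 groups=cardinality, bias=False)  # grouped convolution
        self.expand = nn.Conv2d(inner, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(inner)
        self.bn2 = nn.BatchNorm2d(inner)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                 # shortcut connection, as in ResNet
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.grouped(out))) # parallel paths via groups
        out = self.bn3(self.expand(out))
        return self.relu(out + identity)

# Usage: cardinality controls how many parallel paths share the block
block = ResNeXtBlock(256, cardinality=32)
y = block(torch.randn(1, 256, 32, 32))
```

Raising `cardinality` adds more parallel paths of the same shape, which is the axis ResNeXt scales instead of stacking further depth.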