Deep Learning is a subfield of Machine Learning that focuses on neural networks, which model complex non-linear functions by composing parameterized layers into specialized structures.
Models
There are many distinct models that capture different assumptions about the data. As such, each has its own advantages and disadvantages when modeling a discriminative or generative distribution.
- Multilayer Perceptron for general data, usually tabular, with no dependencies between features (a minimal sketch follows this list).
- Convolutional Neural Network with spatial translation (2D) equivariance for vision.
- Recurrent Neural Network for sequential (1D) data such as text or audio.
- Graph Neural Network for graph structures like molecules or social networks.
- Transformer for capturing inter- and intra-sequence (1D) dependencies.
- Neural ODE for modeling continuously evolving hidden states, often used for density estimation.
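To make the simplest model concrete, here is a minimal sketch of a multilayer perceptron, assuming PyTorch; all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Minimal multilayer perceptron: stacked affine layers with non-linear
# activations, mapping tabular features to class logits.
mlp = nn.Sequential(
    nn.Linear(16, 64),  # 16 input features (illustrative size)
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 3),   # 3 output classes (illustrative size)
)

x = torch.randn(8, 16)  # a batch of 8 feature vectors
logits = mlp(x)         # shape: (8, 3)
```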
Enhancements
Building on top of these basic structures, there are general architectures and techniques that can typically be applied to a variety of domains.
- ResNets introduce residual (skip) connections that ease gradient flow, allowing for increased model depth and complexity.
- Autoencoders form an encoder-decoder framework for dimensionality reduction and latent space modeling.
- Attention computes weighted combinations of hidden states to capture dependencies, focusing the model on the most relevant inputs (see the sketch after this list).
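A minimal sketch of scaled dot-product attention, the weighting scheme used in Transformers, assuming PyTorch; tensor shapes are illustrative.

```python
import math
import torch

def attention(q, k, v):
    # Scaled dot-product attention: each query attends over all keys,
    # and the softmax weights mix the corresponding values.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # rows sum to 1
    return weights @ v

q = torch.randn(2, 5, 32)  # (batch, queries, dim)
k = torch.randn(2, 7, 32)  # (batch, keys, dim)
v = torch.randn(2, 7, 32)  # (batch, keys, dim)
out = attention(q, k, v)   # (2, 5, 32)
```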
Domains
A variety of domains have their own specialized literature. The two biggest are computer vision and natural language processing, dealing with 2D/3D and 1D data respectively.
Computer Vision
Computer vision tasks deal with image data, generally using some form of the Convolutional Neural Network. The following are some landmark vision techniques.
Object Detection
- YOLO performs object detection in a single pass through a CNN.
- Faster R-CNN uses a region proposal network and a detection head to decouple the localization and classification tasks.
- DETR predicts a fixed number of bounding boxes as a set via a transformer.
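All three detectors lean on intersection-over-union (IoU) to score box overlap, whether for matching predictions to ground truth or for non-maximum suppression. A minimal sketch, assuming corner-format (x1, y1, x2, y2) boxes:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    # Intersection side lengths, clamped to zero for disjoint boxes.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143
```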
Segmentation
- Fully Convolutional Networks are commonly used for dense predictions; one landmark paper successfully trains them for semantic segmentation (a toy sketch follows this list).
- Mask R-CNN augments Faster R-CNN with a mask branch for instance segmentation.
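A toy sketch of the fully convolutional idea, assuming PyTorch: a convolutional backbone downsamples, a 1x1 convolution produces per-pixel class logits, and the logits are upsampled back to input resolution. The backbone and class count are illustrative; the actual FCN paper uses learned transposed-convolution upsampling with skip connections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    # Toy fully convolutional network for dense per-pixel prediction.
    def __init__(self, num_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        h = self.backbone(x)         # (B, 32, H/4, W/4)
        logits = self.classifier(h)  # (B, C, H/4, W/4)
        return F.interpolate(logits, size=x.shape[-2:],
                             mode="bilinear", align_corners=False)

x = torch.randn(1, 3, 64, 64)
print(TinyFCN()(x).shape)  # torch.Size([1, 5, 64, 64])
```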
Synthetic Generation
- Generative Adversarial Networks train a generator and a discriminator against each other on an adversarial loss, pushing the generator to produce realistic samples (see the training-step sketch after this list).
- StyleGAN improves on the original generator architecture by adapting ideas from style transfer, which results in significantly stronger disentanglement.
- CycleGAN is a GAN variant designed for unpaired domain transfer.
- NCSN and Score SDE model the score of the data distribution at multiple noise levels and sample by following the score uphill.
- Diffusion Probabilistic Models generate samples from a data distribution by learning to reverse a gradual noising process.
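A minimal single training step of the adversarial game, assuming PyTorch; the networks, sizes, and stand-in "real" data are purely illustrative.

```python
import torch
import torch.nn as nn

# Toy GAN on 2D points: the generator maps noise to samples, the
# discriminator scores how "real" a sample looks.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0  # stand-in "real" data
fake = G(torch.randn(64, 8))

# Discriminator step: push real toward 1, fake toward 0 (fake detached
# so no generator gradients flow here).
loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: try to make the discriminator label fakes as real.
loss_g = bce(D(fake), torch.ones(64, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```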
Reconstruction
- NeRF is a 3D modeling technique that models the density and color of points in space; images are then produced via volume rendering (see the compositing sketch after this list).
- Occupancy Networks model the occupancy (existence probability) of each point in space, allowing 3D shapes to be reconstructed at arbitrary resolution.
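The volume rendering step composites samples along a ray into a single pixel color; here is a sketch of the discrete compositing rule from the NeRF paper, assuming PyTorch, with an illustrative sample count.

```python
import torch

def composite(sigmas, colors, deltas):
    # Discrete volume rendering: alpha_i = 1 - exp(-sigma_i * delta_i);
    # transmittance T_i = prod_{j<i} (1 - alpha_j) is the chance the ray
    # survives to sample i; the pixel is the weight-averaged color.
    alphas = 1.0 - torch.exp(-sigmas * deltas)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
    weights = trans * alphas
    return (weights[:, None] * colors).sum(dim=0)

sigmas = torch.rand(16)          # densities along one ray (illustrative)
colors = torch.rand(16, 3)       # RGB predicted at each sample
deltas = torch.full((16,), 0.1)  # spacing between consecutive samples
pixel = composite(sigmas, colors, deltas)  # one RGB value
```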
Miscellaneous
- Group Equivariant CNN generalizes the convolution operation to be equivariant to translations, rotations, and reflections.
- PointNet and PointNet++ process unordered point clouds and guarantee permutation invariance.
- Vision Transformers adapt the transformer architecture to images by converting input images into a sequence of patches (see the patch-embedding sketch after this list).
- ConvNeXt is a pure CNN model designed to mimic techniques from vision transformers.
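The patch conversion in Vision Transformers is commonly implemented as a strided convolution; a minimal sketch, assuming PyTorch, with illustrative patch size and embedding dimension.

```python
import torch
import torch.nn as nn

# Patchify an image: split into non-overlapping P x P patches and linearly
# project each one, producing the token sequence a transformer consumes.
# A convolution with kernel size = stride = P implements the projection.
P, dim = 16, 192  # patch size and embedding dim (illustrative)
to_tokens = nn.Conv2d(3, dim, kernel_size=P, stride=P)

img = torch.randn(1, 3, 224, 224)
tokens = to_tokens(img)                     # (1, 192, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 192): 196 patch tokens
```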
Natural Language Processing
Natural language processing works with sequential data, commonly sentences or audio, using Recurrent Neural Networks or Transformers.
- Seq2Seq is an encoder-decoder architecture for sequence-to-sequence tasks such as translation (a skeleton sketch follows this list).
- Long Short-Term Memory and Gated Recurrent Units are RNN architectures designed to maintain memory of past information in long sequences.
- Temporal Convolutional Networks apply 1D convolutions for sequence modeling.
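A skeleton of the Seq2Seq idea with LSTMs, assuming PyTorch: the encoder compresses the source sequence into its final state, which initializes the decoder, and teacher forcing feeds the target sequence as decoder input. Vocabulary and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Encoder-decoder skeleton for sequence-to-sequence modeling.
src_emb = nn.Embedding(1000, 64)   # source vocabulary of 1000 tokens
tgt_emb = nn.Embedding(1200, 64)   # target vocabulary of 1200 tokens
encoder = nn.LSTM(64, 128, batch_first=True)
decoder = nn.LSTM(64, 128, batch_first=True)
readout = nn.Linear(128, 1200)     # per-step target-vocabulary logits

src = torch.randint(0, 1000, (4, 9))  # batch of 4 source sequences
tgt = torch.randint(0, 1200, (4, 7))  # teacher-forced target inputs

_, state = encoder(src_emb(src))        # state = (hidden, cell)
dec_out, _ = decoder(tgt_emb(tgt), state)
logits = readout(dec_out)               # (4, 7, 1200)
```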