Machine learning involves algorithms that "learn" from data. They take data as input and produce a model as output.
## Problems
There are three main forms of learning, depending on the problem type.
- Supervised Learning deals with modeling the mapping from inputs to outputs (see the sketch after this list).
- Unsupervised Learning finds patterns in unlabeled inputs.
- Reinforcement Learning trains an agent to act in an environment.
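To ground the supervised case, here is a minimal numpy sketch; the synthetic dataset, true weights, and noise scale are all illustrative assumptions rather than anything prescribed above:

```python
import numpy as np

# Hypothetical synthetic data: 100 labeled samples with a linear ground truth.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # inputs
true_w = np.array([2.0, -1.0, 0.5])               # unknown "real world" weights
y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy labels

# "Learning": data goes in, a model (here, a weight vector) comes out.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # close to true_w
```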
Across these three problems, we also have a multitude of more specific scenarios:
- Semi-supervised learning uses both labeled and unlabeled data, essentially combining unsupervised and supervised methods.
- Self-supervised learning frames unlabeled data as a supervised problem by extracting "pretext" tasks from the existing data (for example, in-painting; sketched after this list).
- Active Learning extends semi-supervised learning by allowing the model to select which data samples to label next.
- Representation Learning tackles learning latent embeddings of the data: summaries of the data with semantically-meaningful information.
- Generative Modeling learns generative models that can "sample" from the data distribution, creating synthetic data that resembles its input.
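As a concrete (and entirely hypothetical) illustration of the self-supervised idea, this sketch builds an in-painting-style pretext task: hide one feature and predict it from the rest, so the labels come from the data itself:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))                                  # unlabeled data
X[:, 0] = 0.5 * X[:, 1:].sum(axis=1) + rng.normal(scale=0.1, size=500)  # hidden structure

# Pretext task: mask feature 0 and "in-paint" it from the remaining features.
inputs, targets = X[:, 1:], X[:, 0]      # labels extracted from the data itself
w, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
print(w.round(2))  # recovers the 0.5 dependency with no human labels
```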
Models
Theoretically, models can be thought of as probability distributions. They can be either generative or discriminative; the two are contrasted in the sketch below the list.
- Generative models capture the entire shape of the data distribution and predict via probabilistic inference.
- Discriminative Models directly draw boundaries in the data space.
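A toy numpy contrast of the two, with made-up one-dimensional Gaussian classes; the equal-prior, equal-variance setup is a simplification for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x0 = rng.normal(-1.0, 1.0, 200)   # class 0 samples
x1 = rng.normal(+1.0, 1.0, 200)   # class 1 samples

# Generative: model p(x | y) per class, then predict via Bayes' rule.
mu0, mu1 = x0.mean(), x1.mean()
def generative_predict(x):
    # Equal priors and variances: pick the class whose Gaussian is likelier.
    return int((x - mu1) ** 2 < (x - mu0) ** 2)

# Discriminative: draw a boundary directly (here, the midpoint threshold).
boundary = (mu0 + mu1) / 2
def discriminative_predict(x):
    return int(x > boundary)

print(generative_predict(0.3), discriminative_predict(0.3))
```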
With simple data, Linear Factor Models can capture basic distributions. However, as our data becomes more complex, we can't directly model the joint distribution and must incorporate assumptions via Probabilistic Graphical Models.
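One member of the linear-factor family can be sketched with plain numpy; here, hypothetical data generated from two latent factors is summarized by an SVD (PCA-style), recovering the low-rank structure:

```python
import numpy as np

rng = np.random.default_rng(7)
Z = rng.normal(size=(300, 2))                       # 2 latent factors
A = rng.normal(size=(2, 6))                         # loading matrix
X = Z @ A + rng.normal(scale=0.1, size=(300, 6))    # observed data

X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
print(S.round(1))  # two large singular values: the data is nearly 2-factor
```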
Either way, models are usually designed for one problem type.
## Priors
Classical machine learning models generally have statistical and mathematical roots and rely largely on the smoothness prior: the assumption that the function we learn should be smooth within a small region.
In more complex problems, classical methods run into trouble. As the dimensionality of the problem space increases, we hit the Curse of Dimensionality: the smoothness prior is no longer sufficient to generalize in high-dimensional problem spaces, as there aren't enough samples to cover the space. Instead, we rely on the Manifold Hypothesis, which states that samples lie on a low-dimensional subspace of the full problem space.
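The curse is easy to see numerically. In this sketch (the sample count and dimensions are arbitrary choices), pairwise distances between uniform random points concentrate as the dimension grows, so a "small region" stops containing useful neighbors:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(n, d))                  # n random points in the unit cube
    sq = (X ** 2).sum(axis=1)                     # squared norms
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # pairwise squared distances
    dists = np.sqrt(np.clip(d2, 0, None))[np.triu_indices(n, k=1)]
    # As d grows, min and mean distances converge: everything is "far",
    # and smoothness over small neighborhoods stops helping.
    print(d, dists.min().round(2), dists.mean().round(2), dists.max().round(2))
```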
To address this challenge, Deep Learning uses biologically-inspired neural network architectures that are orders of magnitude more complex than classical methods. This introduces stronger priors and lets us model more flexible functions that can capture higher-dimensional inputs.
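To give a flavor of that flexibility, here is a minimal two-layer network trained on XOR, a function no linear model can represent; the layer size, learning rate, and iteration count are arbitrary picks for this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR labels

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)    # hidden layer (8 units)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)    # output layer

for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                      # hidden activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))          # sigmoid output probability
    g_out = (p - y) / len(X)                      # cross-entropy gradient at the logit
    g_h = (g_out @ W2.T) * (1 - h ** 2)           # backpropagate through tanh
    for param, grad in ((W2, h.T @ g_out), (b2, g_out.sum(0)),
                        (W1, X.T @ g_h), (b1, g_h.sum(0))):
        param -= 0.5 * grad                       # gradient-descent step

print(p.round(2).ravel())  # typically approaches [0, 1, 1, 0]
```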
## Optimization
Training machine learning models involves optimizing their parameters, also called weights.
- Gradient Descent gradually moves down a convex loss function (sketched after this list). Natural Gradient is a variant that moves in probability space rather than parameter space.
- Greedy Search performs feature selection when the loss is non-convex.
- Expectation Maximization optimizes models with hidden (latent) variables, common in unsupervised learning.
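A minimal gradient-descent sketch on a convex loss (mean squared error for linear regression); the synthetic data and step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=200)

w = np.zeros(2)
lr = 0.1                                     # step size (hypothetical choice)
for _ in range(100):
    grad = 2 / len(X) * X.T @ (X @ w - y)    # gradient of MSE w.r.t. w
    w -= lr * grad                           # move downhill on the convex loss
print(w)  # approaches [3, -2]
```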
More generally, mathematical techniques like Unconstrained Optimization and Constrained Optimization are used to optimize functions that satisfy certain conditions.
In practice, Overfitting is a common problem where models learn noise in the training data that's not part of the real world. The solution is two-fold; a combined sketch follows the list.
- Regularization Penalties in loss functions apply weight shrinkage or selection.
- Validation methods find hyperparameters that optimize model complexity.
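Both halves can be sketched together: a ridge (L2) penalty applies weight shrinkage, and a held-out validation set picks the penalty strength. The dataset and the lambda grid here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 10))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=120)  # only 2 real signals
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

def fit_ridge(X, y, lam):
    # Weight shrinkage: penalize ||w||^2 by adding lam * I to the normal equations.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Validation: pick the penalty strength that generalizes best to held-out data.
for lam in (0.0, 0.1, 1.0, 10.0, 100.0):
    w = fit_ridge(X_train, y_train, lam)
    val_mse = np.mean((X_val @ w - y_val) ** 2)
    print(lam, round(val_mse, 3))
```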
AutoML is a modern solution that automates both processes by automatically building an ensemble that maximizes performance for a given problem.
## Operations
Finally, there are some more real-world practices to keep in mind.
- Imputation is required to address missing data (a minimal sketch follows this list).
- Classification Metrics are necessary to measure a model's performance.
- Explainability analyzes the patterns our model learned and what the model actually tells us about the world.
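A minimal sketch of one common imputation strategy, mean imputation; the tiny matrix is hypothetical:

```python
import numpy as np

# Mean imputation: fill missing entries (NaN) with each column's observed mean.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
col_means = np.nanmean(X, axis=0)               # per-column mean, ignoring NaNs
X_filled = np.where(np.isnan(X), col_means, X)  # replace only the missing cells
print(X_filled)
```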