Machine learning involves algorithms that "learn" from data. They take data as input and produce a model as output.

Problems

There are three main forms of learning, depending on the problem type (the first two are sketched in code below).

  1. 🎓 Supervised Learning deals with modeling a mapping from inputs to outputs.
  2. 🔍 Unsupervised Learning finds patterns in unlabeled inputs.
  3. ♟️ Reinforcement Learning trains an agent to act in an environment.

Across these three problems, we also have a multitude of more specific scenarios:

  1. Semi-supervised learning uses both labeled and unlabeled data, essentially combining unsupervised and supervised methods.
  2. Self-supervised learning frames unlabeled data as a supervised problem by extracting "pretext" tasks from the existing data (for example, in-painting; see the sketch after this list).
  3. ✋ Active Learning extends semi-supervised learning by allowing the model to select which data samples to label next.
  4. 📖 Representation Learning tackles learning latent embeddings of the data: summaries of the data with semantically-meaningful information.
  5. 🎨 Generative Modeling learns generative models that can "sample" from the data distribution, creating synthetic data that resembles its input.
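
As a toy illustration of the self-supervised idea, the sketch below fabricates a "pretext" task from unlabeled data by masking one feature and regressing it from the rest; the masking scheme is an assumption made for this example, not something prescribed by the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # unlabeled data: no targets given

# Pretext task: hide the last column and predict it from the others,
# turning unlabeled X into supervised (input, target) pairs.
inputs, targets = X[:, :4], X[:, 4]
model = LinearRegression().fit(inputs, targets)
```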

Models

Theoretically, models can be thought of as probability distributions. They can be either generative or discriminative.

  1. Generative Models capture the entire shape of the data distribution and predict via probabilistic inference.
  2. Discriminative Models directly draw boundaries in the data space (both are contrasted in the sketch below).
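
A minimal sketch of the distinction, assuming scikit-learn: GaussianNB is generative (it fits class-conditional densities and predicts via Bayes' rule), while LogisticRegression is discriminative (it models the decision boundary directly).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

generative = GaussianNB().fit(X, y)              # models p(x|y) p(y), predicts via inference
discriminative = LogisticRegression().fit(X, y)  # models p(y|x) / the boundary directly
```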

With simple data, ๐Ÿญ Linear Factor Models can capture basic distributions. However, as our data becomes more complex, we canโ€™t directly model the joint distributions and must incorporate assumptions via ๐Ÿชฉ Probabilistic Graphical Models.
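
For intuition, here is a toy linear factor model sampled in NumPy: each observation is a linear map of low-dimensional latent factors plus noise. The dimensions and noise scale are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 2))  # factor loadings: 2 latent factors -> 10 observed dimensions
h = rng.normal(size=2)        # latent factors, h ~ N(0, I)
x = W @ h + 0.1 * rng.normal(size=10)  # one observed sample: linear map plus Gaussian noise
```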

Either way, models are usually designed for one problem type:

| Supervised Learning | Unsupervised Learning | Reinforcement Learning |
| --- | --- | --- |
| 🍦 Linear Regression | 🎒 K-Means Clustering | ⌛️ Temporal Difference Learning |
| 💭 Decision Tree | 📄 Latent Dirichlet Allocation | 🚓 Policy Gradient |
| 👓 Perceptron | 🗜️ Principal Component Analysis | 🧨 Dynamic Programming |
| 🔥 Adaboost | 📼 Gaussian Mixture Model | 🪙 Monte Carlo Control |

Priors

Classical machine learning models generally have statistical and mathematical roots and largely rely on the smoothness prior: the assumption that the function we learn should be smooth within a small region.

In more complex problems, classical methods break down. As the dimensionality of the problem space increases, we run into the ☠️ Curse of Dimensionality: the smoothness prior is not sufficient to generalize in higher-dimensional problem spaces, as there aren't enough samples to cover the space. Rather, we have the 🪐 Manifold Hypothesis, which states that samples lie on a low-dimensional subspace of the full problem space.
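
A quick back-of-the-envelope illustration of the curse: covering the unit cube [0, 1]^d at a fixed resolution takes exponentially many cells, so no realistic dataset keeps a neighbor "within a small region" of every point.

```python
# Cells needed to cover [0, 1]^d with bins of width 0.1 in each dimension.
for d in (1, 2, 10, 100):
    print(f"d={d}: {10**d} cells")  # grows as 10**d
```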

To address this challenge, 🧠 Deep Learning uses biologically-inspired neural network architectures that are orders of magnitude more complex than classical methods. This introduces stronger priors and lets us model more flexible functions that can capture higher-dimensional inputs.
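
As a sketch of what these more flexible functions look like, here is a two-layer neural network forward pass in plain NumPy; the layer sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 784)), np.zeros(64)  # first layer: 784 inputs -> 64 hidden units
W2, b2 = rng.normal(size=(10, 64)), np.zeros(10)   # second layer: 64 hidden -> 10 outputs

x = rng.normal(size=784)             # e.g. a flattened 28x28 image
hidden = np.maximum(0, W1 @ x + b1)  # ReLU nonlinearity between linear maps
logits = W2 @ hidden + b2            # raw scores for 10 classes
```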

Optimization

Training machine learning models involves optimizing their parameters, also called weights.

  1. โ›ฐ๏ธ Gradient Descent gradually moves down a convex loss function. ๐ŸŒฑ Natural Gradient is a variant that moves in probability space rather than parameter space.
  2. ๐Ÿ”Ž Greedy Search performs feature selection for non-convex loss.
  3. ๐ŸŽ‰ Expectation Maximization optimizes hidden variables in unsupervised models.
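
Here is the gradient descent sketch referenced above, on a toy convex quadratic; the step size and iteration count are arbitrary choices.

```python
import numpy as np

# Minimize f(w) = ||w - 3||^2, whose gradient is 2 * (w - 3).
w = np.zeros(2)
for _ in range(100):
    grad = 2 * (w - 3.0)
    w -= 0.1 * grad  # step down the loss surface
# w is now close to the minimizer [3, 3]
```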

More generally, mathematical techniques like 👟 Unconstrained Optimization and 👠 Constrained Optimization are used to optimize functions while satisfying certain conditions.
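
As a hedged example of constrained optimization, SciPy's general-purpose minimize accepts constraints directly; the toy objective and constraint here are invented for illustration.

```python
from scipy.optimize import minimize

# Minimize x^2 + y^2 subject to x + y >= 1.
result = minimize(
    fun=lambda v: v[0] ** 2 + v[1] ** 2,
    x0=[1.0, 0.0],
    constraints=[{"type": "ineq", "fun": lambda v: v[0] + v[1] - 1}],
)
print(result.x)  # approximately [0.5, 0.5]
```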

In practice, 👔 Overfitting is a common problem where models learn noise in the training data that's not part of the real world. The solution is two-fold.

  1. โšฝ๏ธ Regularization Penalties in loss functions apply weight shrinkage or selection.
  2. โœ… Validation methods find hyperparameters that optimize model complexity.
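
Both remedies appear in this sketch, assuming scikit-learn: Ridge applies an L2 weight-shrinkage penalty, and GridSearchCV uses cross-validation to choose the penalty strength.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] + 0.1 * rng.normal(size=100)  # only the first feature matters

# Regularization penalty (alpha) tuned by 5-fold cross-validation.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```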

👀 AutoML is a modern solution that automates both processes, automatically building an ensemble that maximizes performance for a given problem.

Operations

Finally, there are some more real-world practices to keep in mind.

  1. โ“ Imputation is required to address missing data.
  2. ๐ŸŽน Classification Metrics is necessary to measure a modelโ€™s performance.
  3. ๐ŸŽ™๏ธ Explainability analyses the patterns our model learned and what the model actually tells us about the world.