Supervised learning learns an accurate model $h$ that maps input $x$ to output label $y$. To do this, we search over a hypothesis class $\mathcal{H}$ to minimize the loss on the training set,

$$\hat{h} = \argmin_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i).$$

Note

We can use any function, simple or complex, to fit the data. However, the 🥪 No Free Lunch Theorem states that there's no single algorithm that performs best on all data. All models require built-in biases, and it's our job to choose a class that works for our data.

Moreover, we want the model to generalize well. Mathematically, our goal is to minimize the risk

$$R(h) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\ell(h(x), y)\right]$$

where $\mathcal{D}$ is the distribution of our data.
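As a toy sketch of this search (everything here is illustrative, assuming numpy): we can minimize empirical risk over a tiny hypothesis class of one-dimensional threshold classifiers $h_t(x) = \mathbb{1}[x > t]$ under 0-1 loss.

```python
import numpy as np

# Hypothetical toy data: the true labeling rule is a threshold at 0.6.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = (x > 0.6).astype(int)

# The hypothesis class: 101 candidate thresholds in [0, 1].
thresholds = np.linspace(0, 1, 101)

def empirical_risk(t):
    # Average 0-1 loss of the classifier 1[x > t] on the training set.
    return np.mean((x > t).astype(int) != y)

# Search the class for the hypothesis with the lowest training loss.
best_t = min(thresholds, key=empirical_risk)
print(round(best_t, 2))  # a threshold near the true cutoff 0.6
```

Because the class here is finite and one-dimensional we can search it exhaustively; parametric models below instead optimize continuous weights.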

Non-Parametric Models

Non-parametric models don't optimize a fixed set of weights; instead, they make predictions directly from the training data.

  1. 🏠 K-Nearest Neighbors classifies an input by comparing it with the closest known examples.
  2. 💭 Decision Trees split up the data on features that give the most information about the label.
  3. 🍯 Kernel Regression is a soft form of KNN that weights examples by their similarity to the input.

An 🎻 Ensemble combines non-parametric models to generate more accurate predictions.
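A minimal KNN classifier makes the "compare with known data" idea concrete (a sketch assuming numpy; the data and function names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Distance from the query point to every stored training example.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest examples.
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.0])))  # → 0
```

Note there's no training step at all: the "model" is just the stored data, which is exactly what makes it non-parametric.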

Parametric Models

Parametric models, on the other hand, have a set of weights to optimize. Many optimize their likelihood using 🪙 Maximum Likelihood Estimate or 🎲 Maximum A Posteriori, while others use more model-specific methods.

While there's a huge diversity of solutions, parametric models can be categorized by the type of problem they solve: regression or classification.

Regression

Regression problems deal with predicting a continuous output $y \in \mathbb{R}$.

  1. ๐Ÿฆ Linear Regression fits a line to linear data.
  2. ๐Ÿฅข Generalized Linear Model adapts this idea to work for non-linear data.
  3. ๐Ÿ”จ Principle Component Regression combines dimensionality reduction with linear regression.
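For linear regression, the weights have a closed-form solution via the normal equations, $w = (X^\top X)^{-1} X^\top y$. A sketch on noise-free illustrative data, assuming numpy:

```python
import numpy as np

# Hypothetical data: a bias column plus one feature, labels from a known line.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])
true_w = np.array([2.0, 3.0])
y = X @ true_w  # noise-free targets, so the fit should recover true_w

# Solve the normal equations X^T X w = X^T y directly
# (more numerically stable than forming an explicit inverse).
w = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(w, 2))  # → [2. 3.]
```

With noisy targets the same formula returns the least-squares fit rather than the exact generating weights.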

Classification

Classification problems, on the other hand, work with discrete outputs $y$, or classes.

  1. 🦠 Logistic Regression is an extension of linear regression that outputs class probabilities.
  2. 🛩️ Support Vector Machines divide the data into two classes with a hyperplane.
  3. 👶 Naive Bayes classifies extremely large feature sets by assuming that features are conditionally independent given the label.
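To see how logistic regression turns a linear score into a class probability, here's a sketch (assuming numpy; the data and step count are illustrative): the sigmoid squashes $w \cdot x$ into $(0, 1)$, and gradient ascent on the log-likelihood fits $w$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linearly separable data: label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
for _ in range(500):
    p = sigmoid(X @ w)                   # predicted class probabilities
    w += 0.1 * X.T @ (y - p) / len(y)    # gradient of the mean log-likelihood

acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(acc)
```

The same update is what MLE looks like for this model: the gradient is zero exactly when predicted probabilities match the labels on average.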

Online Learning

Some of these algorithms have online variants that train on streaming data, going through each datapoint one at a time and updating the weights.

  1. 🗼 Least Mean Squares is an online version of linear regression.
  2. 👓 Perceptrons are an online analogue of SVMs, updating the separating hyperplane after each mistake.

If there's not enough labeled data, ✋ Active Learning strategies identify the most useful datapoints to label next.