Mathematics is at the core of many algorithms and the intuition behind them. The following are the main topics used in computer science, with a slight bias toward machine learning fundamentals.

Linear Algebra

Linear algebra is the study of vectors and matrices and the operations that manipulate them.

  1. ๐Ÿน Vectors are objects that are closed under clearly defined addition and scalar multiplication operations.
  2. A ๐Ÿฑ Matrix is a two-dimensional tuple with entries; important properties of a matrix include the ๐Ÿ“– Determinant and ๐Ÿ–Š๏ธ Trace as well as its ๐Ÿ’ Eigenvalues.

Using vectors and matrices, we can solve a ⚙️ System of Linear Equations and model 🗺️ Linear Mappings.
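
For example, a small system can be solved directly (a minimal sketch, again assuming numpy):

```python
import numpy as np

# Solve the system  2x + y = 5,  x + 3y = 10,  written as Ax = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)  # solves Ax = b without explicitly inverting A
print(x)                   # [1. 3.]
```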

Sometimes, a problem requires a matrix to be factorized; the most common decompositions are listed below, with a sketch after the list.

  1. 🥧 Cholesky Decomposition factorizes a symmetric positive definite matrix into the product of a lower triangular matrix and its transpose.
  2. 🪷 Eigendecomposition shows that a non-defective square matrix is similar to a diagonal matrix of its eigenvalues.
  3. 📎 Singular Value Decomposition decomposes any matrix into the product of two orthogonal matrices and one diagonal matrix holding the singular values.
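
A minimal numpy sketch of all three factorizations (the example matrix is chosen to be symmetric positive definite so that each decomposition applies):

```python
import numpy as np

# A symmetric positive definite matrix.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# Cholesky: A = L @ L.T with L lower triangular.
L = np.linalg.cholesky(A)

# Eigendecomposition: A = Q @ diag(w) @ Q.T (Q is orthogonal since A is symmetric).
w, Q = np.linalg.eigh(A)

# SVD: A = U @ diag(s) @ Vt with U and Vt orthogonal, s the singular values.
U, s, Vt = np.linalg.svd(A)

# Each factorization reconstructs A up to floating-point error.
assert np.allclose(L @ L.T, A)
assert np.allclose(Q @ np.diag(w) @ Q.T, A)
assert np.allclose(U @ np.diag(s) @ Vt, A)
```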

Geometry

Geometry is closely tied with vectors and matrices from linear algebra, and we can use it to interpret them from a new point of view.

  1. 🎳 Inner Products map pairs of vectors to a scalar, and 📌 Norms measure the size of vectors and matrices.
  2. With these, we can compute 🚗 Distances and 📐 Angles between vectors, as well as 📽️ Projections of vectors onto subspaces.
  3. 🪩 Rotations can also be defined by matrices as linear mappings; the sketch below demonstrates each of these operations.
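
A minimal numpy sketch of these geometric operations in the plane:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

dot = u @ v                    # inner product: 1.0
dist = np.linalg.norm(u - v)   # Euclidean distance between u and v: 1.0

# Angle via the inner product: cos(theta) = <u, v> / (|u| |v|).
theta = np.arccos(dot / (np.linalg.norm(u) * np.linalg.norm(v)))  # pi / 4

# Orthogonal projection of v onto the line spanned by u.
proj = (dot / (u @ u)) * u     # [1., 0.]

# A rotation by theta is the linear mapping given by this matrix.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(R @ u)  # u rotated by 45 degrees: [0.707..., 0.707...]
```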

Calculus

Calculus describes the shape of functions in detail and is crucial for optimization and approximations.

  1. ๐Ÿง Derivatives find the tangent slope in univariate functions, and โ„๏ธ Gradients generalize them to multivariate functions.
  2. The ๐ŸŽค Taylor Series is an important method for approximating any differentiable function as a polynomial.
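
A minimal numpy sketch of both ideas, using f(x) = exp(x):

```python
import numpy as np

f = np.exp
h = 1e-6

# The derivative of exp at 0 is 1; a central difference approximates it.
slope = (f(0.0 + h) - f(0.0 - h)) / (2 * h)

# Second-order Taylor polynomial of exp around 0: 1 + x + x^2 / 2.
x = 0.1
taylor = 1 + x + x**2 / 2

print(slope)         # ~1.0
print(taylor, f(x))  # the polynomial is accurate near the expansion point
```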

Optimization

Using calculus, we can solve general minimization problems.

  1. 👟 Unconstrained Optimization uses gradient descent to step down an objective, with convergence guarantees for convex functions (see the sketch after this list).
  2. 👠 Constrained Optimization uses Lagrange multipliers and the Lagrangian to represent a problem in its dual form.
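
A minimal sketch of gradient descent on a convex quadratic (the step size is an assumed, hand-picked value):

```python
import numpy as np

# Minimize the convex quadratic f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b  # gradient of f at x

x = np.zeros(2)
lr = 0.1  # step size, assumed small enough for convergence
for _ in range(200):
    x = x - lr * grad(x)

# The iterates approach the exact minimizer, which solves Ax = b.
print(x, np.linalg.solve(A, b))
```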

Probability Theory

Probability theory measures the likelihood of events and outcomes.

๐Ÿช Random Variables measure some quantitative value over ๐ŸŽฒ Probability Distributions.

  1. 🇺🇸 Independence between two variables is an important property that allows their joint distribution to factorize into a product.
  2. 📚 Summary Statistics, such as the mean and variance, describe important properties of a random variable's distribution; both are estimated in the sketch below.
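
A minimal numpy sketch drawing samples of a uniform random variable and estimating its summary statistics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of X ~ Uniform(0, 1).
x = rng.uniform(0.0, 1.0, size=100_000)

# The empirical mean and variance approach the true values 1/2 and 1/12.
print(x.mean(), x.var())

# An independent sample y satisfies E[XY] ~ E[X] E[Y].
y = rng.uniform(0.0, 1.0, size=100_000)
print(np.mean(x * y), x.mean() * y.mean())
```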

Among all classes of probability distributions, the 👨‍👩‍👧‍👦 Exponential Family is commonly used due to its simplicity and convenient computational properties. The most common member is the 👑 Gaussian, whose density is shown below.
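
For reference, the univariate Gaussian density with mean $\mu$ and variance $\sigma^2$ is

$$
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
$$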

Generalizing to arbitrary distributions, there are several crucial theorems and results.

  1. 🪙 Bayes' Theorem relates the posterior, prior, likelihood, and evidence; a worked example follows this list.
  2. 🌈 Jensen's Inequality bounds the effect of applying a convex function to a random variable.
  3. 🧬 Evidence Lower Bound provides a tractable lower bound on an intractable likelihood.
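
A worked instance of Bayes' theorem in plain Python (the test characteristics below are made-up illustrative numbers):

```python
# A diagnostic test for a condition with 1% prevalence:
# 99% sensitivity and a 5% false positive rate (both hypothetical).
prior = 0.01                # p(condition)
likelihood = 0.99           # p(positive | condition)
false_positive_rate = 0.05  # p(positive | no condition)

# Evidence: total probability of observing a positive result.
evidence = likelihood * prior + false_positive_rate * (1 - prior)

# Posterior: p(condition | positive) = likelihood * prior / evidence.
posterior = likelihood * prior / evidence
print(posterior)  # ~0.167: a positive result is still far from certain
```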

Lastly, information theory is a closely related field that analyzes the information content, or entropy, of distributions.

  1. 🔥 Entropy measures the level of uncertainty in a distribution.
  2. 💧 Cross Entropy generalizes entropy to a measure computed across two distributions.
  3. 💰 Information Gain measures the reduction in entropy after gaining new "information."
  4. 🤝 Mutual Information measures the degree of shared "information" between two variables.
  5. ✂️ KL Divergence and, more generally, 🪭 F-Divergence measure the difference between two distributions; the sketch below computes entropy, cross entropy, and KL divergence for two simple distributions.