Overfitting occurs when a model learns noise in the training data along with the actual correlations; in other words, the model's performance becomes tied to the particular training sample, resulting in low training error but high test error.

The degree of overfitting depends on model complexity and regularization. With regularization, we decrease model complexity to reduce overfitting, but this also increases training error. This phenomenon is commonly known as the bias-variance tradeoff.
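As a quick illustration, here is a minimal sketch (assuming NumPy and scikit-learn; the synthetic data, the degree-9 polynomial, and the alpha values are arbitrary choices for demonstration) of how a ridge penalty trades training error for test error: as the penalty grows, training error typically rises while test error first falls and then rises again once the model becomes too constrained.

```python
# Sketch: ridge regression on noisy samples of sin(2*pi*x).
# Increasing alpha (stronger regularization) shrinks model complexity:
# training MSE goes up, test MSE usually improves until alpha is too large.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))                                   # small, noisy training set
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=x_test.shape)

for alpha in [1e-6, 1e-3, 1e-1, 10.0]:
    model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=alpha))
    model.fit(x[:, None], y)
    train_err = mean_squared_error(y, model.predict(x[:, None]))
    test_err = mean_squared_error(y_test, model.predict(x_test[:, None]))
    print(f"alpha={alpha:g}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```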

Bias and Variance

Simple models don't fit the data well but generalize better; complex models can fit the data perfectly but generalize terribly. In this sense, simple models have high bias while complex models have high variance.

For an estimate $\hat{\theta}$ of a parameter $\theta$, bias and variance are formally defined as expectations over a distribution of training datasets $\mathcal{D}$. In other words, if we randomly draw a subset of all possible observations from this problem as $\mathcal{D}$, what errors will the model always make (bias) and what errors will vary from draw to draw (variance)?
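Written out, using the standard formulation with the expectation $\mathbb{E}_{\mathcal{D}}$ taken over training datasets:

$$\mathrm{Bias}[\hat{\theta}] = \mathbb{E}_{\mathcal{D}}[\hat{\theta}] - \theta, \qquad \mathrm{Var}[\hat{\theta}] = \mathbb{E}_{\mathcal{D}}\!\left[\left(\hat{\theta} - \mathbb{E}_{\mathcal{D}}[\hat{\theta}]\right)^{2}\right]$$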

Bias-Variance Decomposition

These two terms are related by the bias-variance decomposition. For a model $\hat{f}$ trained on a dataset $\mathcal{D}$ and the average model $\bar{f} = \mathbb{E}_{\mathcal{D}}[\hat{f}]$ (averaged over all datasets $\mathcal{D}$), the expected squared error can be decomposed into variance, bias squared, and irreducible noise.
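Concretely, in the standard setup where targets are generated as $y = f(x) + \epsilon$ with zero-mean noise $\epsilon$ of variance $\sigma^2$, the decomposition reads:

$$\mathbb{E}_{\mathcal{D},\,\epsilon}\!\left[\big(y - \hat{f}(x)\big)^{2}\right] = \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\big(\hat{f}(x) - \bar{f}(x)\big)^{2}\right]}_{\text{variance}} + \underbrace{\big(\bar{f}(x) - f(x)\big)^{2}}_{\text{bias}^{2}} + \underbrace{\sigma^{2}}_{\text{noise}}$$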