Theory
AdaBoost is a boosting method that focuses new learners on past errors by weighting each datapoint: examples that earlier learners misclassified receive more weight.
Each learner is a stump: a decision tree with only one split. This gives the simplest possible split of the data. Each learner $h_t$ is also weighted by a coefficient $\alpha_t$ in the final ensemble.
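To make the stump concrete, here is a minimal brute-force sketch (my own illustration, not code from these notes); `fit_stump` and `stump_predict` are hypothetical helper names reused in the later sketches.

```python
import numpy as np

def fit_stump(X, y, sample_weight):
    """Brute-force weighted decision stump: pick the (feature, threshold,
    polarity) with the lowest weighted error. Labels y are in {-1, +1}."""
    best = None  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(X[:, j] <= thresh, polarity, -polarity)
                err = sample_weight[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thresh, polarity)
    return best[1:]

def stump_predict(stump, X):
    """Apply a fitted stump (feature, threshold, polarity) to rows of X."""
    j, thresh, polarity = stump
    return np.where(X[:, j] <= thresh, polarity, -polarity)
```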
The entire model is defined as

$$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right),$$

where each $h_t$ is a weak learner and $\alpha_t$ is its weight.
At every iteration, after we train a new learner, AdaBoost redistributes the weights for each datapoint depending on whether the new learner classified that datapoint correctly.

Note that with labels $y_i \in \{-1, +1\}$ and predictions $h_t(x_i) \in \{-1, +1\}$, the product $y_i h_t(x_i)$ is $+1$ when the prediction is correct and $-1$ when it is wrong, so this single quantity records correctness.

Next, to scale the learner, we compute its weighted error $\epsilon_t = \sum_i D_t(i)\,\mathbb{1}[h_t(x_i) \neq y_i]$ and assign it the weight

$$\alpha_t = \frac{1}{2}\ln\!\left(\frac{1-\epsilon_t}{\epsilon_t}\right).$$

While this equation for $\alpha_t$ may look arbitrary, it behaves sensibly: $\alpha_t > 0$ whenever $\epsilon_t < \tfrac{1}{2}$ (the learner beats random guessing), and it grows as the error shrinks.

This definition can then be used to define the weight update. Observing that $\exp(-\alpha_t\, y_i\, h_t(x_i))$ equals $e^{-\alpha_t}$ for correctly classified points and $e^{+\alpha_t}$ for misclassified ones, the update step for the distribution is

$$D_{t+1}(i) = \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},$$

where $Z_t$ is the normalizing constant that makes $D_{t+1}$ sum to one.

Now we can see that misclassified points are scaled up by $e^{\alpha_t}$ while correctly classified points are scaled down by $e^{-\alpha_t}$, so the next learner concentrates on the examples the current learners get wrong.
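As a concrete illustration (numbers chosen for this example, not taken from the notes): if $\epsilon_t = 0.2$, then $\alpha_t = \frac{1}{2}\ln\frac{0.8}{0.2} = \frac{1}{2}\ln 4 \approx 0.69$, so each misclassified point's weight is multiplied by $e^{\alpha_t} = 2$ and each correctly classified point's weight by $e^{-\alpha_t} = 0.5$ before renormalizing by $Z_t$.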
Lastly, note that there was no explicit loss function to optimize. However, the algorithm implicitly minimizes an exponential loss function, and its training error decreases exponentially fast in the number of rounds.
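One standard way to make "exponentially fast" precise (the usual AdaBoost bound, stated here for reference rather than quoted from these notes): with edge $\gamma_t = \tfrac{1}{2} - \epsilon_t$,

$$\frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\bigl[H(x_i) \neq y_i\bigr] \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)} \;\le\; \exp\!\left(-2\sum_{t=1}^{T}\gamma_t^{2}\right),$$

so if every weak learner beats chance by a fixed margin, the training error drops exponentially in $T$.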
Note
Since it doesn't directly optimize training error, test error can keep decreasing even after training error has reached zero; in a sense, extra rounds stretch the classification margin, making the model more confident in each prediction.
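Concretely, the implicit objective can be written in its standard form (with $F_T(x) = \sum_{t=1}^{T}\alpha_t h_t(x)$ the unthresholded ensemble):

$$L_{\exp} = \sum_{i=1}^{n} \exp\bigl(-y_i\, F_T(x_i)\bigr),$$

and $y_i F_T(x_i)$ is the margin of example $i$: it can keep growing, and the loss keep shrinking, even after every training example is already classified correctly.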
Model
The model consists of many weak learners (stumps) $h_1, \ldots, h_T$, each paired with a weight $\alpha_t$.
The entire ensemble is given by $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.
Hyperparameters
$T$ is the ensemble size, equal to the number of training iterations. A higher $T$ generally gives a better ensemble, though it takes longer to train.
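For reference (an off-the-shelf implementation rather than part of these notes), scikit-learn's `AdaBoostClassifier` exposes the same knob as `n_estimators`, with depth-1 trees as its default base learner:

```python
from sklearn.ensemble import AdaBoostClassifier

# n_estimators plays the role of the ensemble size T described above.
model = AdaBoostClassifier(n_estimators=200)
```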
Training
Given binary classification training data $(x_1, y_1), \ldots, (x_n, y_n)$ with $y_i \in \{-1, +1\}$, initialize the weights to the uniform distribution $D_1(i) = 1/n$.

For $t = 1, \ldots, T$:

- Train a weak classifier $h_t$ on the distribution defined by weights $D_t$.
- Evaluate $h_t$ to calculate its weighted error $\epsilon_t = \sum_i D_t(i)\,\mathbb{1}[h_t(x_i) \neq y_i]$.
- Let the scaling factor $\alpha_t = \frac{1}{2}\ln\!\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$.
- Update the distribution $D_{t+1}(i) = \dfrac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t}$, where $Z_t$ normalizes the weights to sum to one.
This gives greater weight to training examples that we got wrong.
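Putting the steps above together, a minimal NumPy sketch of the training loop (using the hypothetical `fit_stump` / `stump_predict` helpers from earlier, and clipping $\epsilon_t$ to avoid division by zero):

```python
import numpy as np

def adaboost_train(X, y, T):
    """Train T stumps with AdaBoost. y must contain labels in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)                    # D_1(i) = 1/n
    stumps, alphas = [], []
    for t in range(T):
        stump = fit_stump(X, y, D)             # weak learner trained on D_t
        pred = stump_predict(stump, X)
        eps = np.clip(D[pred != y].sum(), 1e-12, 1 - 1e-12)  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)  # scaling factor alpha_t
        D = D * np.exp(-alpha * y * pred)      # up-weight mistakes
        D /= D.sum()                           # normalize by Z_t
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```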
Prediction
Given input $x$, the prediction is

$$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right).$$
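A matching prediction sketch for the ensemble returned by the illustrative `adaboost_train` above:

```python
import numpy as np

def adaboost_predict(stumps, alphas, X):
    """H(x) = sign(sum_t alpha_t * h_t(x)), with ties broken toward +1."""
    scores = np.zeros(len(X))
    for stump, alpha in zip(stumps, alphas):
        scores += alpha * stump_predict(stump, X)
    return np.where(scores >= 0, 1, -1)
```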