Theory
Gradient tree boosting builds an ensemble of trees of a specified depth (instead of stumps, as in 🔥 AdaBoost), each with its own scaling coefficient and each trained on a fraction of the dataset.
The general idea is very similar to a 🌲 Random Forest, but boosting introduces two new ideas.
- Fit each new tree on the residual of the current model instead of on the original targets.
- Regress to find each tree's scaling coefficient, then add the tree to the model, scaled by the learning rate.
We can train the algorithm on any differentiable loss function; the residual we fit on is the negative gradient of this loss with respect to the current prediction. For simplicity, we'll use the squared-error loss $L(y, F(x)) = \frac{1}{2}\big(y - F(x)\big)^2$, whose negative gradient is the ordinary residual $y - F(x)$.
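As a quick numerical sanity check (a minimal sketch of my own, not part of the original derivation), the negative gradient of the squared-error loss coincides with the residual:

```python
import numpy as np

y = np.array([3.0, -1.0, 2.0])  # targets
F = np.array([2.5, 0.0, 2.0])   # current model predictions

# For L = 1/2 * (y - F)^2, the gradient w.r.t. F is -(y - F),
# so the negative gradient is exactly the ordinary residual y - F.
residual = y - F
neg_grad = -(F - y)
assert np.allclose(residual, neg_grad)
```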
Model
Our model consists of $M$ trees,

$$F_M(x) = F_0 + \eta \sum_{m=1}^{M} \gamma_m h_m(x),$$

where $h_m$ is a depth-limited decision tree, $\gamma_m$ is its scaling coefficient, and $\eta$ is the learning rate. The initial prediction $F_0$ is a constant, typically the mean of the training targets.
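One way to hold this model in code (an illustrative sketch; the `BoostedModel` class and its field names are my own, not from the original):

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class BoostedModel:
    # Represents F_M(x) = F0 + eta * sum_m gamma_m * h_m(x).
    f0: float                                    # initial constant prediction F0
    eta: float                                   # learning rate
    trees: list = field(default_factory=list)    # fitted trees h_m
    gammas: list = field(default_factory=list)   # scaling coefficients gamma_m

    def predict(self, X):
        out = np.full(len(X), self.f0)
        for gamma, tree in zip(self.gammas, self.trees):
            out += self.eta * gamma * tree.predict(X)
        return out
```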
Training
Given training data $\{(x_i, y_i)\}_{i=1}^{n}$, initialize the model with the constant prediction $F_0 = \frac{1}{n}\sum_{i=1}^{n} y_i$.

Then, for $m = 1, \dots, M$:

- Pick a subset of the data of size $\lfloor \beta n \rfloor$ for some fraction $\beta$.
- Calculate the residual of this subset, $r_i = y_i - F_{m-1}(x_i)$.
- Fit a decision tree $h_m$ on the residual.
- Regress $\gamma_m = \arg\min_{\gamma} \sum_i L\big(y_i,\, F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)$.
- Update our model, $F_m(x) = F_{m-1}(x) + \eta\, \gamma_m h_m(x)$, using the learning rate (see the sketch below).
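Putting the steps together, here is a minimal training-loop sketch. It reuses the hypothetical `BoostedModel` class from the Model section; the function name and parameters are illustrative, and the closed-form line search for $\gamma_m$ is specific to the squared-error loss:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted(X, y, n_trees=100, depth=3, eta=0.1, subsample=0.5, seed=0):
    rng = np.random.default_rng(seed)
    model = BoostedModel(f0=float(np.mean(y)), eta=eta)  # F0 = mean target
    F = np.full(len(y), model.f0)                        # F_{m-1}(x_i) for all i
    for _ in range(n_trees):
        # 1. Pick a random subset of the data.
        idx = rng.choice(len(y), size=max(1, int(subsample * len(y))), replace=False)
        # 2. Residual on the subset = negative gradient of the squared loss.
        r = y[idx] - F[idx]
        # 3. Fit a depth-limited tree h_m on the residual.
        tree = DecisionTreeRegressor(max_depth=depth).fit(X[idx], r)
        # 4. Regress for gamma_m; for squared loss this is 1-D least squares.
        h = tree.predict(X[idx])
        gamma = float(h @ r) / (float(h @ h) + 1e-12)
        # 5. Update the model, scaled by the learning rate.
        model.trees.append(tree)
        model.gammas.append(gamma)
        F += eta * gamma * tree.predict(X)
    return model
```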
Prediction
Similar to AdaBoost, given input $x$, we predict with the full ensemble: $F_M(x) = F_0 + \eta \sum_{m=1}^{M} \gamma_m h_m(x)$.
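For completeness, a toy end-to-end run of the sketches above (synthetic data, purely illustrative):

```python
import numpy as np

# Small 1-D regression problem: noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

model = fit_boosted(X, y, n_trees=200, depth=2, eta=0.1, subsample=0.5)
print(model.predict(X[:5]))  # ensemble predictions F_M(x) for a few inputs
```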