Theory
Random forests train multiple slightly inaccurate models so that, taken together, their inaccuracies cancel out and we're left with a good prediction. For that cancellation to happen the trees must make different mistakes, so each tree in our forest is limited in the data it sees and the features it splits on.
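A quick toy check of that intuition (not part of the model itself; the numbers are made up for the demo): if each "tree" is an unbiased but noisy estimate of the truth, averaging many independent estimates shrinks the error.

```python
import numpy as np

# Toy demo: each "tree" estimates the true value 3.0 with independent noise.
rng = np.random.default_rng(0)
true_value = 3.0
single_model = true_value + rng.normal(scale=1.0)           # one noisy model
forest_of_models = true_value + rng.normal(scale=1.0, size=100)  # 100 noisy models
print(abs(single_model - true_value))             # error of a single model
print(abs(forest_of_models.mean() - true_value))  # much smaller after averaging
```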
Model
Our forest consists of $k$ decision trees, each trained independently.
We maintain the hyperparameters $k$ (the number of trees), $f$ (the fraction of the data each tree trains on), and $m$ (the number of features considered at each split).
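A minimal sketch of that model as code (the field names n_trees, sample_frac, and max_features are hypothetical stand-ins for $k$, $f$, and $m$):

```python
from dataclasses import dataclass, field

@dataclass
class RandomForest:
    # Hypothetical names for the hyperparameters described above.
    n_trees: int = 100        # k: how many decision trees the forest holds
    sample_frac: float = 1.0  # f: fraction of the data each tree trains on
    max_features: int = 3     # m: features considered at every split
    trees: list = field(default_factory=list)  # the fitted trees themselves
```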
Training
Given training data $(X, y)$ with $n$ data points and $d$ features, we build each of the $k$ trees as follows:
- Choose a fraction $f$ of the data, $\lfloor fn \rfloor$ data points total (sampled with replacement).
- Build a decision tree as normal, but at every node we only have $m$ randomly chosen features to choose from (among the $d$ available); see the sketch after this list.
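A training sketch under the assumed names above, using scikit-learn's DecisionTreeClassifier, whose max_features option performs the per-split feature subsampling just described:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit(forest, X, y, seed=0):
    """Train a RandomForest (as defined above) on arrays X (n x d) and y (n,)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    forest.trees = []
    for _ in range(forest.n_trees):
        # Bootstrap: draw floor(f * n) row indices with replacement.
        idx = rng.choice(n, size=int(forest.sample_frac * n), replace=True)
        # max_features limits each split to m randomly chosen features.
        tree = DecisionTreeClassifier(max_features=forest.max_features)
        tree.fit(X[idx], y[idx])
        forest.trees.append(tree)
    return forest
```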
Prediction
Given an input $x$, each tree makes its own prediction; the forest outputs the majority vote over those predictions for classification, or their average for regression.
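A matching prediction sketch (majority vote over the assumed forest.trees list; for regression you would average the per-tree predictions instead):

```python
import numpy as np

def predict(forest, X):
    """Classify each row of X by majority vote over the forest's trees."""
    votes = np.stack([tree.predict(X) for tree in forest.trees])  # (n_trees, n_samples)
    predictions = []
    for column in votes.T:
        classes, counts = np.unique(column, return_counts=True)
        predictions.append(classes[np.argmax(counts)])  # most common vote wins
    return np.array(predictions)
```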