Theory

Random forests combine many slightly inaccurate models so that, together, their inaccuracies cancel out and we're left with a good prediction. For those inaccuracies to actually cancel, the trees' errors must be decorrelated, so each tree in our forest is limited in the data it accesses and the features it splits on.

Model

Our forest consists of decision trees, each with its own distinct splits.

We maintain two hyperparameters: $f$, the fraction of the training data to use for each tree, and $m$, the number of features a tree node can choose to split on, which is commonly set to $m = \sqrt{d}$, where $d$ is the total number of features.

Training

Given training data $\{(x_i, y_i)\}_{i=1}^n$, repeat the following $T$ times to build $T$ trees (a code sketch follows the steps).

  1. Choose a fraction $f$ of the data, $\lfloor fn \rfloor$ data points total, sampled with replacement (a bootstrap sample).
  2. Build a decision tree as normal, but at every node, only $m$ randomly chosen features (among the $d$ available) are considered for the split.
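Below is a minimal sketch of this training loop in Python. It assumes scikit-learn's DecisionTreeClassifier, whose max_features argument implements the per-node feature subsampling from step 2; the names f, m, and T mirror the hyperparameters above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, T=100, f=1.0, m="sqrt", seed=0):
    """Train T decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    k = int(f * n)  # data points per tree
    forest = []
    for _ in range(T):
        # Step 1: draw k indices with replacement (a bootstrap sample).
        idx = rng.integers(0, n, size=k)
        # Step 2: the tree considers only m features at every node;
        # max_features="sqrt" gives m = floor(sqrt(d)) for d features.
        tree = DecisionTreeClassifier(
            max_features=m, random_state=int(rng.integers(1 << 31))
        )
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest
```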

Prediction

Given an input $x$, run $x$ through every tree in the forest. Return the majority vote for classification or the average for regression.
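As a sketch, assuming the train_forest helper above, prediction for classification is a majority vote over the trees (for regression, you would instead average the trees' outputs):

```python
from collections import Counter

def predict_forest(forest, x):
    """Classify a single input x (a 1-D feature array) by majority vote."""
    votes = [tree.predict(x.reshape(1, -1))[0] for tree in forest]
    # The most common label among the T trees wins.
    return Counter(votes).most_common(1)[0][0]

# Hypothetical usage:
# forest = train_forest(X_train, y_train, T=50, f=0.8)
# label = predict_forest(forest, X_test[0])
```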