Theory
Certain features in the data are more predictive of the label than others. Decision trees use this idea to recursively divide up the data into increasingly pure subsets. To maximize the effectiveness of our questions, we want to split the data that reaches a node using a question about the feature that most determines the label.
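"Most determines the label" is usually measured by information gain: the reduction in label entropy achieved by a split. A minimal sketch (the function names `entropy` and `information_gain` are our own, not from the source):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Entropy reduction from splitting `labels` into `left` and `right`."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

labels = ["yes", "yes", "no", "no"]
# A perfect split separates the classes entirely: gain = 1 bit.
print(information_gain(labels, ["yes", "yes"], ["no", "no"]))  # 1.0
```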
Model
Our model is a binary tree: each internal node is a question, each edge is an answer, and each leaf is a prediction for the label. For discrete features, we can directly check the value of the feature; for continuous features, we compare the value against a threshold.
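The two kinds of questions can be represented as one node type. A sketch under our own naming (the `Question` class is illustrative, not from the source):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Question:
    """A question at an internal node: a feature index plus either a
    threshold (continuous feature) or a value to match (discrete feature)."""
    feature: int
    threshold: Any = None  # set for continuous features
    value: Any = None      # set for discrete features

    def answer(self, x):
        if self.threshold is not None:
            return x[self.feature] <= self.threshold  # continuous: compare
        return x[self.feature] == self.value          # discrete: equality check

q = Question(feature=0, threshold=2.5)
print(q.answer([1.0]))  # True, since 1.0 <= 2.5
```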
Note
Note that with this threshold method, decision trees are scale invariant: if the scale of a feature increases, the best threshold increases by the same factor, so the tree learned after rescaling a feature is unchanged.
Training
Given training data, we build the tree recursively:
- If all current records have the same label, return a leaf with that label.
- If all current records have the same inputs, return a leaf with the majority label.
- Otherwise, split on the feature with the highest information gain and recurse on both sides.
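The three cases above can be sketched as a recursive procedure. This is a minimal illustration assuming numeric features, exhaustive threshold search, and a nested-tuple tree representation `(feature, threshold, left, right)`; all names are our own:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Try every threshold on every feature; return the split with the
    highest information gain as (feature index, threshold, gain)."""
    base, n = entropy(y), len(y)
    best = (None, None, 0.0)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
            if gain > best[2]:
                best = (f, t, gain)
    return best

def build(X, y):
    """Recursive training: the three cases from the list above."""
    if len(set(y)) == 1:                      # all labels equal -> leaf
        return y[0]
    f, t, gain = best_split(X, y)
    if gain == 0:                             # no useful split -> majority leaf
        return Counter(y).most_common(1)[0][0]
    L = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    R = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    return (f, t, build(*zip(*L)), build(*zip(*R)))

X = [[1.0], [2.0], [3.0], [4.0]]
y = ["a", "a", "b", "b"]
print(build(X, y))  # (0, 2.0, 'a', 'b')
```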
Prediction
Given an input, start at the root and answer each node's question, following the corresponding edge, until reaching a leaf; return the leaf's prediction.
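The traversal above is a short loop. A sketch assuming a nested-tuple tree `(feature, threshold, left, right)` with non-tuple leaves (our own representation, not from the source):

```python
def predict(tree, x):
    """Walk from the root: at each internal node (feature, threshold, left,
    right), follow the branch matching the answer; any non-tuple is a leaf."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

# Hand-built tree: "is feature 0 <= 2.0?" -> leaf 'a', else leaf 'b'.
tree = (0, 2.0, "a", "b")
print(predict(tree, [1.5]))  # 'a'
print(predict(tree, [3.0]))  # 'b'
```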