Theory

Certain features in $x$ are indicative of what $y$ can be; given training data $\mathcal{D} = \{(x^{(i)}, y^{(i)})\}$, we can divide up all $x$ into groups based on their features such that each group has a different $y$.

Decision trees use this idea to recursively divide up the data in $\mathcal{D}$, building a binary tree where each node is a "yes-no" question; depending on the answer, we go to a different subtree, and each leaf gives a prediction for the value of $y$. This structure is analogous to a 🌳 Binary Search Tree, but with questions at the nodes rather than value comparisons.
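To make the structure concrete, here is a minimal sketch of such a node in Python; the field names (`feature`, `threshold`, `yes`, `no`, `prediction`) are illustrative choices, not fixed terminology:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a decision tree.

    An internal node stores a yes-no question ("is feature `feature`
    >= `threshold`?") and two subtrees; a leaf stores a prediction.
    """
    feature: Optional[int] = None      # index of the feature asked about
    threshold: Optional[float] = None  # threshold for the yes-no question
    yes: Optional["Node"] = None       # subtree taken when the answer is "yes"
    no: Optional["Node"] = None        # subtree taken when the answer is "no"
    prediction: Optional[int] = None   # label returned if this node is a leaf

    def is_leaf(self) -> bool:
        return self.prediction is not None
```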

To maximize the effectiveness of our questions, we want to split the data that reaches a node using a question about the feature that most determines $y$. To do this, we maximize 💰 Information Gain.
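As a sketch, information gain can be computed as the entropy of the labels reaching a node minus the weighted entropy of the two groups produced by the split; the helper names below are my own, and entropy is measured in bits:

```python
from collections import Counter
import math


def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels: list, left: list, right: list) -> float:
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted
```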

Model

Our model is a binary tree: each internal node is a question, each edge is an answer, and each leaf is a prediction for $y$.

For discrete features, we can directly check the value in $x$. For continuous features, we must impose a threshold on the value in $x$ to get a binary answer; in other words, replace feature $x_j$ with the question $x_j \ge t$ for some threshold $t$.
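A minimal sketch of this convention, where candidate thresholds are taken as midpoints between consecutive distinct feature values (one common choice, assumed here rather than taken from the notes):

```python
def ask(x: list, feature: int, threshold: float) -> bool:
    """Binary question for a continuous feature: is x[feature] >= threshold?"""
    return x[feature] >= threshold


def candidate_thresholds(values: list) -> list:
    """Midpoints between consecutive distinct sorted values of a feature."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]
```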

Note

Note that with this threshold method, decision trees are scale invariant: if the scale of a feature increases, the best threshold increases by the same factor, so the decision tree learned after rescaling a feature is the same.

Training

Given training data $\mathcal{D}$,

  1. If all current records have the same label $y$, return a leaf with label $y$.
  2. If all current records have the same input $x$, return a leaf with the majority label.
  3. Otherwise, split on the feature with the most information gain and recurse on both sides (a sketch of this recursion follows the list).
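A minimal sketch of the recursion, assuming the `Node`, `information_gain`, and `candidate_thresholds` helpers from the earlier sketches and records given as numeric feature vectors:

```python
from collections import Counter


def build_tree(X: list, y: list) -> "Node":
    # 1. All labels identical: return a leaf with that label.
    if len(set(y)) == 1:
        return Node(prediction=y[0])

    # 2. All inputs identical (no question can separate them): majority label.
    if all(x == X[0] for x in X):
        return Node(prediction=Counter(y).most_common(1)[0][0])

    # 3. Otherwise, pick the (feature, threshold) with maximum information gain.
    best = None
    for feature in range(len(X[0])):
        for threshold in candidate_thresholds([x[feature] for x in X]):
            yes_idx = [i for i, x in enumerate(X) if x[feature] >= threshold]
            no_idx = [i for i, x in enumerate(X) if x[feature] < threshold]
            gain = information_gain(y, [y[i] for i in yes_idx], [y[i] for i in no_idx])
            if best is None or gain > best[0]:
                best = (gain, feature, threshold, yes_idx, no_idx)

    _, feature, threshold, yes_idx, no_idx = best
    return Node(
        feature=feature,
        threshold=threshold,
        yes=build_tree([X[i] for i in yes_idx], [y[i] for i in yes_idx]),
        no=build_tree([X[i] for i in no_idx], [y[i] for i in no_idx]),
    )
```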

Prediction

Given input $x$, walk down the decision tree according to the questions at each node and return the $y$ stored at the leaf we reach.
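A minimal sketch, assuming the `Node` structure from earlier:

```python
def predict(tree: "Node", x: list):
    """Answer the question at each node until a leaf is reached,
    then return that leaf's label."""
    node = tree
    while not node.is_leaf():
        node = node.yes if x[node.feature] >= node.threshold else node.no
    return node.prediction
```

For example, `predict(build_tree(X, y), x_new)` classifies a new input with the tree built by the training sketch above.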