5.1 Decision Trees
What Is a Decision Tree?
Classification and Regression Trees, or CART for short, is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for classification or regression predictive modeling problems.
Classically, the trees produced by this algorithm were referred to as "decision trees," but on some platforms, like R, the more modern term CART is used.
The CART algorithm is structured as a sequence of questions. The answer to the first question determines the next question, and so forth. The result is a tree-like structure whose endpoints are terminal nodes, at which no further questions are asked.
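To make the idea of a sequence of questions concrete, here is a minimal hand-written sketch in Python; the attributes and values are invented for illustration and are not the mushroom tree discussed later in this chapter.

```python
# A tree as a sequence of questions: each answer determines the next question,
# and terminal nodes return a prediction. Attributes and values are invented.
def classify(record):
    if record["ring_count"] <= 1:                   # first question
        if record["color"] in {"green", "white"}:   # next question depends on the answer
            return "poisonous"                      # terminal node: no more questions
        return "edible"                             # terminal node
    return "edible"                                 # terminal node

print(classify({"ring_count": 1, "color": "green"}))  # -> poisonous
```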
The main elements of CART (and any decision tree algorithm) are:
- A rule for splitting data records at a node into child nodes based on the value of one variable;
- Stopping rules for deciding when a branch is terminal and should split no more; and
- A prediction for the target variable in each terminal node.
For categorical outputs (Classification), the leaves of a tree assign a class label or a probability of a record belonging to a specific class. For numeric outputs (Regression), the leaves of a tree assign an average value or specify a function that can be used to compute a value for examples that reach that node.
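The sketch below shows how these elements surface in scikit-learn's CART-style trees; the library choice is an assumption (the chapter does not prescribe one) and the data is synthetic.

```python
# The three CART elements in scikit-learn (library choice is an assumption;
# the data below is synthetic).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                              # three input variables
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)              # categorical target
y_reg = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)    # numeric target

# Splitting rule: `criterion` scores candidate splits on one variable at a time.
# Stopping rules: `max_depth` and `min_samples_leaf` decide when a branch is terminal.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=5)
clf.fit(X, y_class)
# Classification leaves assign a class label or class probabilities.
print(clf.predict(X[:1]), clf.predict_proba(X[:1]))

# Regression leaves assign the average target value of the training records
# that reached that leaf.
reg = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5).fit(X, y_reg)
print(reg.predict(X[:1]))
```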
Why Use Decision Trees?
Decision trees are useful because they easily produce interpretable rules for classifying or predicting the values of new records. Many machine learning algorithms provide results that are difficult or impossible to express in terms of simple rules, so the simplicity and interpretability of decision trees is an important advantage.
Decision trees can also be built quickly, even when there are many input variables. This makes it easy to rapidly generate rules that split the data into subgroups and help us understand what is happening in the data.
Decision trees identify the input variables that are most important in terms of helping with predictions. The best splitter variables are reflected at the top or near the top of the tree, so it is easy to see which variables are being used to create predictions. Unimportant input variables are low in the tree or are excluded altogether.
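As a rough sketch of how to read off variable importance from a fitted tree (again assuming scikit-learn; the variable names and data below are invented):

```python
# feature_importances_ summarises how much each input variable contributed to
# the splits of a fitted tree. Variable names and data are invented.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)      # only the first variable truly drives the target

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
for name, score in zip(["gills", "rings", "color"], tree.feature_importances_):
    print(f"{name}: {score:.2f}")
# The dominant splitter sits at or near the root; variables with a score near
# zero were rarely or never used and sit low in the tree or are excluded.
```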
The CART algorithm provides a foundation for important algorithms like boosted decision trees and random forests.
Decision Tree Example
For the remainder of this chapter we will be using “The Audubon Society Field Guide to North American Mushrooms” dataset to learn how to interpret decision trees. Look for the file link below. The task is to classify whether a mushroom is edible or poisonous based on specific characteristics of the mushroom. In the image below, you will see that a mushroom can be classified as either edible or poisonous based on the following attributes: (1) Gills, (2) Rings, and (3) Color.
The first split occurs on color. If the color of the mushroom is green, chocolate, or white, then the mushroom is likely poisonous. If the color of the mushroom is brown, orange, purple, or yellow, then the mushroom is likely edible.
As you can see, the tree does not just stop there. Take the green, chocolate, and white mushrooms, for example. We can split this subgroup into another set of subgroups and clearly identify mushrooms that are definitely poisonous. This results in the rule ‘IF mushroom color is green, chocolate, or white AND the mushroom has zero or one rings, THEN it can be classified as poisonous’. Though this classification tree shows only the first few splits, it shows how a tree creates a set of interpretable rules that can be used to classify new records.
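A similar rule list can be reproduced programmatically. The sketch below assumes scikit-learn and a local copy of the mushroom data in a CSV file; the file name and the "class" column name are placeholders rather than the chapter's actual file.

```python
# Fit a shallow tree on the mushroom data and print its rules. The file name
# and the "class" column name are placeholders for a local copy of the data.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("mushrooms.csv")               # hypothetical local copy
X = pd.get_dummies(df.drop(columns=["class"]))  # one-hot encode the categorical attributes
y = df["class"]                                 # edible vs. poisonous

# A shallow tree keeps the rule list short and readable, as in the figure.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# export_text prints each root-to-leaf path as an IF/THEN-style rule.
print(export_text(tree, feature_names=list(X.columns)))
```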