Evaluating Decision Trees

Confusion Matrix

Recall that in the mushroom example used in this chapter, the classification tree’s primary objective is to classify each mushroom as either ‘Edible’ or ‘Poison’. Figure 5.24 below displays a confusion matrix created using Microsoft Azure. A confusion matrix is a table that summarizes the performance of a classification model by comparing its predicted classes with the actual classes.

Figure 5.24: Decision Tree Evaluation Metrics

Let’s define some of the terms in the figure above:

  • True Positives (TP): These are cases in which the model predicted the mushrooms are poisonous, and they are poisonous.

  • True Negatives (TN): The model predicted edible, and they are edible.

  • False Positives (FP): The model predicted poisonous, but they are edible. A false positive is also known as a "Type I error."

  • False Negatives (FN): The model predicted edible, but they are poisonous. A false negative is also known as a "Type II error."
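The four definitions above can be sketched as a short counting routine. This is a minimal illustration in Python, not the procedure Azure uses: the function name `confusion_counts` and the six labeled mushrooms are hypothetical, and 'poisonous' is taken as the positive class, as in the definitions above.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="poisonous"):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted poisonous: correct (TP) or a Type I error (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted edible: a Type II error (FN) or correct (TN).
            counts["FN" if a == positive else "TN"] += 1
    return {name: counts[name] for name in ("TP", "TN", "FP", "FN")}

# Hypothetical labels for six mushrooms (illustration only, not chapter data).
actual    = ["poisonous", "poisonous", "edible", "edible",    "edible", "poisonous"]
predicted = ["poisonous", "edible",    "edible", "poisonous", "edible", "poisonous"]

print(confusion_counts(actual, predicted))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

The four counts always add up to the number of mushrooms classified, which is a quick sanity check when reading any confusion matrix.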

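An important thing to consider when evaluating your classification tree models is the cost of prediction errors. False positives and false negatives are different, and usually one is more expensive than the other. In the mushroom example, a false negative (a poisonous mushroom labeled edible) is far more costly than a false positive (an edible mushroom thrown away). Another matter to consider is the cutoff threshold. Here, take the cutoff to be the predicted probability of being edible that a mushroom must reach before the model labels it edible. If the cutoff is increased, the model labels a mushroom edible only when it is very confident. The model throws away more good mushrooms, and therefore, under the definitions above, the false positives increase. If the cutoff is decreased, the model accepts more mushrooms as safe, so more poisonous mushrooms slip through and the false negatives increase. Hence, moving the cutoff changes the balance of counts in the confusion matrix.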
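The effect of moving the cutoff can be sketched with a small sweep. The code below is a hedged illustration under stated assumptions: the scores in `p_edible` are hypothetical predicted probabilities that each mushroom is edible, a mushroom is labeled edible only when its score reaches the cutoff, and the helper name `errors_at_cutoff` is made up for this example. With 'poisonous' as the positive class, an edible mushroom that is discarded counts as a false positive, and a poisonous mushroom that is accepted counts as a false negative.

```python
def errors_at_cutoff(actual, p_edible, cutoff):
    """Return (edible mushrooms discarded, poisonous mushrooms accepted)."""
    discarded_edible = 0    # predicted poisonous, actually edible
    accepted_poisonous = 0  # predicted edible, actually poisonous
    for a, p in zip(actual, p_edible):
        predicted = "edible" if p >= cutoff else "poisonous"
        if predicted == "poisonous" and a == "edible":
            discarded_edible += 1
        elif predicted == "edible" and a == "poisonous":
            accepted_poisonous += 1
    return discarded_edible, accepted_poisonous

# Hypothetical data (illustration only, not chapter data).
actual   = ["edible", "edible", "edible", "poisonous", "poisonous", "poisonous"]
p_edible = [0.95,     0.70,     0.40,     0.60,        0.30,        0.05]

for cutoff in (0.25, 0.50, 0.75):
    discarded, accepted = errors_at_cutoff(actual, p_edible, cutoff)
    print(f"cutoff={cutoff:.2f}  edible discarded={discarded}  poisonous accepted={accepted}")
# cutoff=0.25  edible discarded=0  poisonous accepted=2
# cutoff=0.50  edible discarded=1  poisonous accepted=1
# cutoff=0.75  edible discarded=2  poisonous accepted=0
```

On this toy data, no cutoff removes both kinds of error at once. Raising it trades accepted poisonous mushrooms for discarded edible ones, which is why the relative cost of each error should guide where the cutoff is set.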

Interpreting Decision Trees in Azure ML Studio

This video explains how to interpret a decision tree in Azure ML Studio.