13.3 Evaluating Model Predictors
Likelihood of Donating Example
Download the file from the link below to follow along with the text example or video and to practice on your own.
In this section we will look at a more complex model. Earlier it was noted that the mx term can be extended to m1x1 + m2x2 + m3x3 . . . For this example, we use a dataset of 200 records, each with one target variable (Donate) and three feature variables (Income, Age, and Membership). Donate is the dependent variable (yes or no), and Income, Age, and Membership are the independent variables.
One thing to note about this example is that Income and Age are continuous variables, as in the previous example. Membership, however, is a discrete or categorical variable with values of "member" or "not member," which will be coded as 1's and 2's. We will see that logistic regression handles both types of variables.
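As a sketch of how that coding step might look in code (the column names and the 1/2 coding follow the text; which label receives which code, and the sample records, are assumptions for illustration):

```python
# Map the categorical Membership values onto numeric codes.
# Assumption: 1 = "member", 2 = "not member" (the text does not say which is which).
CODING = {"member": 1, "not member": 2}

# Two illustrative records standing in for rows of the 200-record dataset.
rows = [
    {"Income": 85, "Age": 42, "Membership": "member"},
    {"Income": 30, "Age": 67, "Membership": "not member"},
]

# Replace each text label with its numeric code so the model can use it.
for row in rows:
    row["Membership"] = CODING[row["Membership"]]
```

After this step every feature column is numeric, which is why logistic regression can treat the categorical variable alongside the continuous ones.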
Figure 13.14 illustrates the starting data, the JADE Panel parameters, and the completed first step of splitting the data. The data range is A1:D201, and we will utilize 80% of the rows to train the model.
The next step after splitting the data is to run the model. Notice that the Outcome, i.e., the dependent variable, is Column A (A1:A201). The Predictor columns, i.e., the independent variables, are Columns B through D (B1:D201).
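JADE's Split Data step is not documented here, but an 80/20 split can be sketched as follows (the shuffling and the fixed seed are assumptions; JADE's exact procedure may differ):

```python
import random

def split_rows(rows, train_frac=0.80, seed=42):
    """Shuffle the records and split them into training and test sets."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

records = list(range(200))         # stand-in for the 200 data rows
train, test = split_rows(records)  # 160 training rows, 40 test rows
```

The fixed seed matters for the comparisons later in this section: reusing the same split keeps the models comparable, while re-splitting would change the test set.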
Figure 13.15 contains the results of running the model. First, notice the JADE Panel. Under the Configure Model section, the three independent variables are listed with checkmarks indicating that they were used in this execution of the model.
Let's look at the results. The Overall Model Fit box indicates that the Chi Square for this set of data is 58.983. There are three degrees of freedom, and the p-value displays as 0.0000; the actual calculated value is 5.682e-13, which is extremely small. Hence we expect this model to predict data points quite well. Before making a final decision, however, we will examine the Confusion Matrix results.
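The reported p-value is the tail probability of the chi-square statistic under a chi-square distribution with 3 degrees of freedom (one per predictor). A minimal sketch of that tail computation, using the closed-form survival function that exists for 3 degrees of freedom (function name is ours; the exact value JADE reports may differ slightly depending on rounding):

```python
import math

def chi2_sf_df3(x):
    """Survival function P(X > x) for a chi-square variable with
    3 degrees of freedom, in closed form:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# The overall model fit statistic from the text; the result is tiny,
# consistent with a displayed p-value of 0.0000.
p_value = chi2_sf_df3(58.983)
```

A quick sanity check: the 5% critical value for 3 degrees of freedom is about 7.815, and this function returns roughly 0.05 at that point.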
The Confusion Matrix uses the remaining 20% of the initial data to test the model. In this instance, the verification data fits the model only somewhat well. The top row of data contains all those cases for no donation. There are 24 (21 + 3) zero values. The bottom row lists those cases for a yes donation. There are 16 (6 + 10) cases with a value of one.
As explained before, from the 40 verification cases, running the full model gives 3 type I errors (false positives) and 6 type II errors (false negatives). For those cases where the donor did NOT donate (i.e., a zero value), 21 cases were predicted successfully, but 3 cases were erroneously predicted as donations (false positives). Likewise, for those cases where the donor did donate, the model predicted 10 successfully and 6 erroneously as non-donations (false negatives). Hence, we would consider that this model fits the data only somewhat well, giving successful results for 31 out of 40 test cases. The overall accuracy, measured as the number of correct predictions divided by the total number of test cases, is 31/40 = 77.5%.
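The arithmetic behind the matrix can be checked directly. This sketch rebuilds the reported counts (the ordering of the rebuilt label lists is arbitrary; only the counts matter):

```python
def confusion_counts(actual, predicted):
    """Return (TN, FP, FN, TP) for 0/1 labels."""
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return tn, fp, fn, tp

# Reconstruct the counts reported in the text:
# 21 correct no-donation, 3 false positives, 6 false negatives, 10 correct donations.
actual    = [0] * 24 + [1] * 16
predicted = [0] * 21 + [1] * 3 + [0] * 6 + [1] * 10

tn, fp, fn, tp = confusion_counts(actual, predicted)
accuracy = (tn + tp) / len(actual)   # (21 + 10) / 40 = 0.775
```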
Finally, we observe the values for the coefficients. The model yields an intercept of -9.247 with a standard error of 2.965 and a p-value of 0.00182. The p-value is well below the standard cutoff of 0.01 and hence is significant. The Income coefficient is 0.077 with a standard error of 0.013 and a p-value so small it displays as 0.00000; it also is significant. The Age coefficient is slightly negative at -0.0177 with a standard error of 0.044, but with a p-value of 0.69188. This is a very high p-value, so Age is not significant. The Member coefficient is 1.003 with a standard error of 0.473 and a p-value of 0.03406, which is below the 0.05 cutoff but above the 0.01 cutoff, so it is significant only at the 5% level.
We can still use this model to predict donation likelihoods for different data, although given the results, we may not consider this model the best that can be done. Figure 13.16 shows the prediction of 10 rows of data. As before, the probability or likelihood of donating as well as a predicted value is given. Column E shows the probabilities, and Column F shows the prediction: yes when the probability is greater than 50%, and no otherwise.
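A sketch of the Column E and Column F computation, using the coefficients reported above (the Age coefficient is taken as -0.0177, the magnitude consistent with its reported standard error and p-value; treat all numbers, and the sample inputs below, as illustrative):

```python
import math

# Illustrative coefficients (intercept, Income, Age, Member) based on the
# values reported in the text; the Age value is an assumption.
COEF = (-9.247, 0.077, -0.0177, 1.003)

def donate_probability(income, age, member, coef=COEF):
    """Column E: logistic probability of donating."""
    z = coef[0] + coef[1] * income + coef[2] * age + coef[3] * member
    return 1.0 / (1.0 + math.exp(-z))

def donate_prediction(income, age, member):
    """Column F: predict 1 ("yes") when the probability exceeds 50%."""
    return 1 if donate_probability(income, age, member) > 0.5 else 0
```

For example, a high-income record pushes the probability well above 50% and yields a yes prediction, while a zero-income record falls well below it.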
Model Refinement
We now have to ask the question: “Is this solution a good model?” or “Can the model be improved?”
Logistic regression is not like linear regression. Linear regression uses statistical measures such as least squares with R-squared to evaluate how well the model fits the data. Although various goodness-of-fit measures exist for logistic regression, none is as effective as those for linear regression. In fact, many experts consider evaluation based on training and verification with the Confusion Matrix to be the best way to find the model that fits the data best.
Let’s identify an approach to evaluate and compare models.
- Run the full model using all known independent variables.
- Eliminate the variable with the highest p-value, especially if it is not significant, and rerun the model.
- Continue eliminating any variables with high p-values.
- Choose the model with the best results (fewest type I and type II errors) that also has the fewest independent variables.
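The elimination steps above can be sketched as a small loop. This stand-in drops predictors from a single set of p-values, whereas the real procedure refits the model and recomputes p-values after each removal (the function name, cutoff, and p-values used below are illustrative):

```python
def backward_eliminate(p_values, alpha=0.05):
    """Drop the least significant predictor while any p-value exceeds alpha.

    p_values maps predictor name -> p-value from the most recent model run.
    In practice the model must be refit after each removal and the p-values
    recomputed; this sketch reuses the initial values for simplicity.
    """
    keep = dict(p_values)
    while keep and max(keep.values()) > alpha:
        worst = max(keep, key=keep.get)   # predictor with the highest p-value
        del keep[worst]
    return sorted(keep)

# Approximate full-model p-values from the text (Income's displays as ~0).
selected = backward_eliminate({"Income": 1e-6, "Age": 0.69188, "Member": 0.03406})
```

On these inputs the loop drops Age and keeps Income and Member, matching the path the text follows.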
Figure 13.15, shown previously, contains the results with all independent variables. As noted, the Age variable has a high p-value, so we next run the model without it. To do this, simply uncheck the Age variable in the Run Model section of the JADE Panel. Do not rerun the Split Data operation: doing so would give the models different sets of training and test data points, making them not comparable.
The results shown in Figure 13.17 are similar to those of the model using all variables. The Confusion Matrix has the same values. The intercept and the coefficients for Income and Member are very similar to the full-model values. Hence, we deduce that the Age variable does not add any value to the model, and we would choose this version over the one with Age included. Note, though, that because this version is no better than the full model, including Age did not detract from the model or cause worse results.
It is interesting, however, that the p-value for the Member coefficient is a little higher than would be expected. It is still significant at the 0.05 level, but not at the 0.01 level. Let’s rerun the model without the Member coefficient. This leaves only the Income variable.
Looking at the Confusion Matrix in Figure 13.18, we observe that this model is slightly worse than the complete model or the model with Income and Member. Two cases of a positive donation that were previously identified correctly are mislabeled as no-donation in this version, so the type II errors increase by 2. However, the p-values for the overall model fit and for the individual coefficients are very small, indicating a fairly good model. The accuracy also drops to 75%.
At this point, we can conclude that the best model for this data uses the Income and Member variables as the predictors for the outcome. Because the model has only a few independent variables, it is also feasible to verify that this is the best combination by checking the remaining combinations of predictor variables. We will execute two more versions of the model: first with Income and Age together, and finally with Age and Member, omitting the Income variable.
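With only three predictors, the full set of candidate models is small enough to enumerate. A quick sketch of that enumeration (variable names are ours):

```python
from itertools import combinations

predictors = ["Income", "Age", "Member"]

# Every non-empty subset of predictors that could be tested as a model.
subsets = [combo
           for r in range(1, len(predictors) + 1)
           for combo in combinations(predictors, r)]
```

Three predictors yield only seven candidate subsets, which is why exhaustively checking combinations is practical here; with many predictors, the stepwise elimination described earlier becomes the workable approach.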
The results for the Income and Age variables, shown in Figure 13.19, are a little different from those of the previously chosen model. The type II errors increase by one, but the type I errors are reduced by one; the sum of type I and type II errors is the same. In many applications, type I errors are considered more costly than type II errors, although this depends on the context. We also note that the p-value for the Age coefficient is 0.63, which indicates that it is not significant.
The results for the Age and Member variables, illustrated in Figure 13.20, are the worst of all versions. None of the actual cases of a positive donation were identified properly. The p-value for the Overall Model Fit is 0.25, the p-values for all the variables are high, and the accuracy drops considerably to 60%. Note that with random choices between yes and no there is a 50% chance of guessing the right value, so 60% is not much better than random guessing. This version of the model provides almost no benefit.
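That 50% baseline for random guessing can be confirmed with a quick simulation (the seed and sample size are arbitrary):

```python
import random

rng = random.Random(0)
n = 10_000
actual = [rng.randint(0, 1) for _ in range(n)]   # balanced yes/no outcomes
guess  = [rng.randint(0, 1) for _ in range(n)]   # coin-flip predictions

# Fraction of random guesses that happen to match; hovers near 0.5,
# so 60% model accuracy is only a modest improvement over guessing.
baseline = sum(a == g for a, g in zip(actual, guess)) / n
```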
What can we conclude from this example with the various versions? First, we note that the data provided does not yield a perfect prediction or forecasting tool. None of the Confusion Matrices yield perfect results. It also appears that the best model is the one using the Income and Member variables. This model not only yields the best results (tied with all variables) but also has the best p-values on the coefficients.
During the process of creating these examples, the authors executed the JADE tool several times with different "splits" on the data and with different percentage splits. These other executions yielded slightly different values for the intercept and coefficients. The overall model goodness is the same, but the results in the Confusion Matrix change. We suggest that you download this dataset from the Student Materials and test out various combinations. One other condition that would be interesting to test is changing the Training Set from 80% to 90% to see if the results yield a better fit of the data. However, be sure that you have the correct values when you try the Practice Problems and Test Your Skills exercises.