Difference Between Logistic Regression and MLR

In preparation for a Logistic Regression model, our binary categories must first be converted to 1s and 0s (e.g., Yes = 1, No = 0). Once the data is converted to numbers, you may ask why you can’t simply use MLR. This is a great question! MLR is a linear method designed for continuous numeric response variables, so nothing keeps its predictions within the range of 0 to 1. To demonstrate the issue, consider the following example.

Let’s say we are trying to predict whether a student will pass or fail an exam. We convert our classes into numeric values such that a 0 means the student fails the exam, and a 1 means the student passes the exam. Assume the failing or passing of the exam depends on study hours. In other words, the number of hours studied is our predictor variable, and the outcome of the test is our response variable.
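As a minimal sketch of the conversion step described above, the snippet below maps pass/fail labels to 1s and 0s. The label strings and the five sample outcomes are hypothetical and are not taken from the chapter's dataset.

```python
# Hypothetical exam outcomes, for illustration only (not the chapter's data).
outcomes = ["Pass", "Fail", "Pass", "Pass", "Fail"]

# Numeric coding described in the text: 0 = fails the exam, 1 = passes the exam.
label_to_code = {"Fail": 0, "Pass": 1}

# Convert every label to its numeric code.
encoded = [label_to_code[label] for label in outcomes]

print(encoded)  # [1, 0, 1, 1, 0]
```

A plain dictionary lookup is used here so that an unexpected label raises an error instead of being silently miscoded.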

Our dataset contains information about 14 students (see below).

Figure 10.2: Study Hours

If we were to use MLR for this classification, the resulting equation would be: Outcome Y = 0.03 + 0.026 * Study Hours. These results suggest that, on average, each additional hour of studying increases the probability of passing the exam by 2.6 percentage points. So what would the model predict for a person who studied 100 hours for the exam? That student would have a predicted 262.2% probability of passing the exam, which is impossible because a probability cannot be greater than 1 (100%).
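The arithmetic behind this problem can be checked with a short sketch. It plugs study hours into the linear equation using the coefficients exactly as printed above; because those printed values are rounded, the result may differ slightly from a figure computed with unrounded coefficients. The function name and the sample hour values are illustrative choices.

```python
# Coefficients as printed in the text (rounded values).
INTERCEPT = 0.03
SLOPE = 0.026  # change in predicted outcome per additional study hour


def linear_prediction(study_hours: float) -> float:
    """Return the linear model's predicted outcome for the given study hours."""
    return INTERCEPT + SLOPE * study_hours


# Illustrative study-hour values, including the 100-hour case from the text.
for hours in (10, 40, 100):
    predicted = linear_prediction(hours)
    status = "valid" if 0.0 <= predicted <= 1.0 else "not a valid probability"
    print(f"{hours:>3} hours -> {predicted:.3f} ({status})")
```

With these rounded coefficients, the prediction exceeds 1 for any student who studies more than about 37 hours, and it reaches roughly 2.63 at 100 hours.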

This example demonstrates that applying a linear model such as MLR to a situation where it is inappropriate produces nonsensical predictions. Logistic regression adapts the MLR approach so that it produces valid probabilities, always between 0 and 1, for categorical response variables.
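One way to picture this adaptation is to pass a linear combination of the predictor through the logistic (sigmoid) function, which keeps the output between 0 and 1. The sketch below assumes hypothetical coefficients B0 and B1 on the log-odds scale; they are chosen only for illustration and are not fitted to the study-hours data.

```python
import math


def logistic(z: float) -> float:
    """Map a real number z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical log-odds coefficients, for illustration only (not a fitted model).
B0 = -3.0
B1 = 0.1


def pass_probability(study_hours: float) -> float:
    """Apply the logistic function to the linear score B0 + B1 * study_hours."""
    return logistic(B0 + B1 * study_hours)


for hours in (0, 30, 100):
    print(f"{hours:>3} hours -> {pass_probability(hours):.4f}")
```

Even at 100 study hours the output stays below 1, in contrast to the straight-line prediction, while more study hours still correspond to a higher predicted probability of passing.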