Performing Time Series Forecasting with MLR

Time series forecasting can be performed with many different methods and models; however, we will mainly focus on how to do predictive forecasting using Multiple Linear Regression (MLR) from Chapter 6. We will briefly explain simple forecasting methods such as the Average, Naive, and Seasonal Naive methods. Then, we will walk through the steps of forecasting with MLR, which is more powerful than the simple methods. In addition, ARIMA models (Autoregressive Integrated Moving Average) are widely used; however, that content will not be covered in this course.

Simple Forecasting Methods

Average Method

The Average method simply forecasts your future data to be equal to the mean of all your observations. This method is likely to be inaccurate and provides hardly any value to a company or firm trying to make future time series predictions.

Naive Method

The Naive method forecasts your future data to be equal to the most recent observation. This method tends to perform better than the Average method; however, it fails to account for seasonality and trend. The Naive method can nevertheless be used as a baseline to determine whether our other predictive models are making better predictions.

Seasonal Naive Method

The Seasonal Naive method forecasts your future data to be equal to the most recent observation from the same season. For example, if we are trying to predict the average sales for each month in the upcoming year, we will predict each month to be equal to the average sales of that month in the most recent year. If the average of this November's sales was $100, we will predict next November's sales to be $100. If the average of this December's sales was $300, then the predicted sales for next December will be $300, and so forth.

The Seasonal Naive method tends to be better than the Naive method because it takes seasonality into account, which improves accuracy. However, the Seasonal Naive method still fails to account for increasing and decreasing trends over time. Its main use is as a baseline, or as a fallback when no other predictive modeling tools are available.
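To make these baselines concrete, here is a minimal Python sketch of all three methods. The series values and season length are made up for illustration; nothing here comes from the DeptStore data.

    import pandas as pd

    # Hypothetical quarterly sales (two years, four quarters per year).
    sales = pd.Series([100, 150, 300, 200, 110, 160, 310, 210])
    season_length = 4

    avg_forecast = sales.mean()      # Average method: mean of all observations
    naive_forecast = sales.iloc[-1]  # Naive method: most recent observation
    seasonal_naive = sales.iloc[-season_length:].to_list()  # Seasonal Naive: recycle last season

    print(avg_forecast)     # every future quarter forecast as the overall mean
    print(naive_forecast)   # every future quarter forecast as the last observed value
    print(seasonal_naive)   # next year's Q1..Q4 forecast as this year's Q1..Q4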

Time Series Forecasting with MLR

The end of this section contains a detailed video that shows how to use JMP to forecast time series data with Multiple Linear Regression. Below are the steps that should be performed to create your forecasted data. We will use the DeptStore data set as an example as we go through this process.

  1. Prepare the data
    1. Create placeholder records for the periods to be forecasted

    2. Create dummy variables

    3. Create a linear time series

  2. Partition the time series data into training and validation sets

  3. Discover which combination of input variables creates the best model

  4. Re-estimate the model coefficients with training data and validation data together

  5. Forecast future values

Below we explain why each step should be performed, continuing with the DeptStore dataset as the example.

Step 1: Prepare the Data

The image below shows the DeptStore data after it has been prepared. The [Year] and [Quarter] columns were used to create the linear time index (t_qtr) and the QtrText dummy variables. Therefore, we will not use [Year] and [Quarter] as input variables in our regression.

Figure 8.8: Prepared DeptStore data
Step 1.1: Create placeholder records for the periods to be forecasted

Create records for the future periods to be forecasted. These will not have values for the outcome variable yet, but they must have the dates or corresponding values that will be used to create the time-index and dummy variable values. Eventually these will act as inputs to the model that produces the forecasted values. In the DeptStore example, records 25 through 28 are the placeholder records. These records have values assigned to all variables except the outcome variable, since those are the values that will eventually be forecasted. They also do not have values for the Partition column, as partition values are not necessary for these records. In addition, these future records have been temporarily marked as Hide and Exclude so JMP will ignore them until we are ready to create the forecasted values.
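The same preparation can be sketched in Python with pandas. The column names and values here are illustrative stand-ins for the DeptStore table, not its exact contents.

    import pandas as pd

    # Stand-in for the known records (two years shown instead of six, for brevity).
    df = pd.DataFrame({
        "Year":    [1, 1, 1, 1, 2, 2, 2, 2],
        "Quarter": [1, 2, 3, 4, 1, 2, 3, 4],
        "Sales":   [100.0, 150.0, 300.0, 200.0, 110.0, 160.0, 310.0, 210.0],
    })

    # Placeholder records for the periods to forecast: inputs filled in, outcome empty.
    future = pd.DataFrame({
        "Year":    [3, 3, 3, 3],
        "Quarter": [1, 2, 3, 4],
        "Sales":   [float("nan")] * 4,  # to be filled in by the forecast in Step 5
    })
    df = pd.concat([df, future], ignore_index=True)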

Step 1.2: Create dummy variables

When performing time series forecasting we must first identify which variables are nominal/categorical. These will be used as dummy variables. A dummy variable is also known as a boolean or binary variable because it takes on the value of 0 or 1 when the software performs mathematical computations on the data, for example when executing Multiple Linear Regression.

Notice in the prepared data that [QtrText] has been created as a dummy variable by appending the characters "Qtr" to the quarter numbers. When working with time series, common examples of dummy variables are quarters in a year, months in a year, days of the month, days of the week, hours in a day, and so forth. Why should these values be treated as dummy variables instead of traditional continuous numbers? The answer stems from the assumptions underlying continuous and categorical variables. Numeric continuous variables are typically used to count or measure. For example, we record counts of products sold and hours worked; the higher the count, the more units sold or hours worked. In addition to counts, continuous numbers are often used as measures along a scale. For example, a higher number for temperature signifies a higher measure of heat.

Common intervals of time, like days of the week or months of a year, however, have more in common with categorical variables than continuous numeric variables. These are names of common intervals of time that repeat themselves. We may say that Sunday is the first day of the week and Monday is the second day, but Monday is not more of a day than Sunday. In reality, the number of the day is just used as a name and to provide a common order. The same can be said about months and quarters. The twelve months in a calendar year can be named 1 through 12 to show their order, but months with higher numbers are not "more" of a month than earlier months. January is just as much a month as February, notwithstanding that one comes before the other. Quarters are commonly named 1, 2, 3, or 4. In the next year, quarters are not named 5, 6, 7, and 8; the numbers 1 through 4 are just recycled. So in reality these numbers are names, not counts. As such, they should be treated as categorical, nominal variables.

We have to do something so that the data mining software will treat these values as categorical variables instead of continuous numeric variables. Why? Most statistical and data mining software packages automatically assume that numbers are continuous variables reflecting counts and measures, and therefore use these numbers to calculate counts, averages, and trends.

Since this is not appropriate when numbers are used as names for categories such as days of the week, months in a year, and quarters of a year, the analyst must do something to let the software know it should treat these numbers as categories. One common approach is to append a textual value to the numbers. For example, quarters 1 through 4 can be converted to Qtr1, Qtr2, Qtr3, and Qtr4. That way they will be treated by the software as nominal categorical variables instead of continuous numeric variables.
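As a rough Python illustration of the text-appending approach (the column names are assumptions, not taken from JMP):

    import pandas as pd

    quarters = pd.Series([1, 2, 3, 4, 1, 2, 3, 4])  # quarter numbers recycle each year
    qtr_text = "Qtr" + quarters.astype(str)          # append text: Qtr1, Qtr2, Qtr3, Qtr4

    # Expand the nominal variable into one 0/1 dummy column per quarter.
    dummies = pd.get_dummies(qtr_text, dtype=int)
    print(dummies)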

Some data mining software, including JMP, will let you designate numbers as nominal variables. This is easily changed in JMP by right-clicking the column and selecting "Column Info…", then selecting "Nominal" from the "Modeling Type" dropdown menu.

Figure 8.9: Convert from continuous to nominal data type

There are pros and cons to appending text to the numbers versus designating numbers as a nominal data type in the data mining software. If you append text, you have to go to the effort of appending the text to the numbers. However, the variable will not be confused with a number by the software, by you, or by someone else in the future. This is the approach that I prefer. On the other hand, if you just designate it as a nominal variable in the software, it is easy and convenient in the moment, but a less knowledgeable person may later change it back to a continuous data type and subsequently perform inappropriate operations on the data.

Step 1.3: Create a linear time series

In MLR, time as a sequence is treated as a numerical variable so that MLR can measure the association between units of time and the outcome variable. Therefore, some aspect of time must be treated as a linear sequence. Making our data linear in time means converting some aspect of our dates, months, years, days, days of the week, etc., into a continuous set of numbers spaced equally between periods.

We do not want more than one linear time variable per model. For example, in the DeptStore dataset, the t_qtr column will be used as the linear time index. We will not include Year and Quarter. Why? Because the time index was derived from Year and Quarter; including them would introduce redundant, collinear terms into the model, which would just confuse the algorithm.

A linear time index can be included along with dummy variables that represent common categories of time. For example, assume we have two years of daily data. The daily dates could be converted into a linear time index running from 1 through 730 (2 x 365). At the same time, if inspection of the data suggests that some days each week are systematically higher or lower than other days, we would create dummy variables for days of the week. If inspection shows that the same month each year has consistently higher or lower values of the outcome variable than other months, it would be appropriate to create dummy variables for the 12 months within a year. In this way, we can measure the effect of the passage of linear time and also measure the separate effects of days of the week and months of the year.

In the DeptStore example, data was collected for each quarter for six years. These quarters were converted into a linear time index running from 1 through 24.
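A linear time index like t_qtr can be derived directly from the year and quarter numbers. The sketch below assumes Year runs 1 through 6 as a stand-in for the actual calendar years in the data.

    import pandas as pd

    df = pd.DataFrame({
        "Year":    [y for y in range(1, 7) for _ in range(4)],
        "Quarter": [1, 2, 3, 4] * 6,
    })
    # Four quarters per year, so the index advances by 4 with each year.
    df["t_qtr"] = (df["Year"] - 1) * 4 + df["Quarter"]  # yields 1, 2, ..., 24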

Step 2: Partition the time series data

In order to complete the forecasting process, four segments of data are necessary. Each plays a different role in the overall forecasting process. These four segments are shown in the image below:

Figure 8.10: Data segments used in forecasting

To train and evaluate models, we must have a training partition and a validation partition, both of which have values for the input variables AND the outcome variable.

Segment A: Initial training partition. The term "initial" is used in this name because this partition is used for training during Step 3. In Step 4 it is combined with Segment B to do the final training of the model coefficients.

Segment B: Validation partition. This partition is used in concert with the initial training partition during Step 3 so that the models can be evaluated for predictive quality.

So you train with the initial training partition and validate with the validation partition to evaluate each combination of input variables that you test. In this way, you determine which combination of input variables produces the best predictions.

Segment C: This segment consists of both Segment A and Segment B; it is the combination of the initial training data and the validation data. Why? In Step 3 we figured out which combination of input variables makes the best model, but we did it using only the initial training data. We did not get the benefit of letting the model learn from all of the existing data, the training partition plus the validation partition. The validation data contains valuable information that we want to include to make our forecasting model as accurate as possible. It is important because it is the latest known data, and it is more likely to resemble the future dates we are forecasting than the older data is. So we combine the validation data with the initial training data and re-estimate the model using the same input variables that we found in Step 3. In this way, the coefficients of the input variables get updated to make the forecast as accurate as possible. This usually does not radically change the coefficients, but it does help tune them.

Segment D: To-forecast segment. As described above, we have a group of records that have the input values filled in. The values of the outcome variable are empty, waiting for us to complete the forecast, which we will do as the last step in the process.
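In code, the four segments are just slices of the time-ordered table. The sketch below assumes 24 known quarters plus 4 placeholder records, with the first 20 quarters used for initial training; the split point is an illustrative choice, and the key point is that the rows are split in time order rather than randomly.

    import numpy as np
    import pandas as pd

    # Toy frame: 24 known quarters plus 4 placeholder rows (Sales = NaN) from Step 1.
    df = pd.DataFrame({
        "t_qtr": range(1, 29),
        "Sales": list(np.random.default_rng(0).normal(200, 30, 24)) + [np.nan] * 4,
    })

    train       = df.iloc[:20]    # Segment A: initial training partition (oldest data)
    valid       = df.iloc[20:24]  # Segment B: validation partition (latest known data)
    combined    = df.iloc[:24]    # Segment C: A + B, used to re-estimate in Step 4
    to_forecast = df.iloc[24:]    # Segment D: inputs only; outcome to be forecasted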

Step 3: Discover which variables create the best model

The objective of this step is to determine which input variables should be included in the model. That is, what combination of input variables produces the best predictive performance?

Which input variables do you try? One good way is to include all dummy variables and the linear time input variable. If there appears to be a curvilinear shape to the trend, include the squared value of the linear trend as well. If it is not obvious whether there is a curvilinear shape in the trend, include the squared term and see if it is statistically significant. If input variables are not significant, remove them and see if predictive power improves, stays the same, or goes down. Usually, predictive power will stay the same or improve; if this happens, leave out the insignificant input variable. Occasionally, the predictive power goes down. If this happens, add the insignificant input variable back to the model.
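Here is a sketch of this step using Python's statsmodels formula API, with assumed column names mirroring the prepared DeptStore table (t_qtr, QtrText, Sales) and made-up training values. The summary output shows the p-value of each coefficient, including the squared trend term:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Made-up training partition: a linear trend plus a quarterly pattern.
    train = pd.DataFrame({
        "t_qtr":   range(1, 21),
        "QtrText": ["Qtr1", "Qtr2", "Qtr3", "Qtr4"] * 5,
        "Sales":   [100 + 2 * t + [0, 50, 200, 100][(t - 1) % 4] for t in range(1, 21)],
    })

    # All quarter dummies, the linear time index, and a squared trend term.
    model = smf.ols("Sales ~ t_qtr + I(t_qtr**2) + C(QtrText)", data=train).fit()
    print(model.summary())  # inspect which coefficients are statistically significant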

When a categorical variable is translated into a set of dummy variables, sometimes some of the dummy variables stemming from the same categorical variable will be significant and others will not be. This is pretty common.

For example, assume dummy variables are used to measure the effect of the seven days of the week, and assume all of these dummy variables have significant coefficients except two. What should you do? If the majority of dummy variables in a group like day of the week are significant, keep all the dummy variables in the group.

How do you know which model produces the best performance? The model that results in the highest validation R-squared and lowest validation RASE is the best performer. RASE (Root Average Squared Error) is the name that JMP uses for RMSE when RMSE has not been adjusted for degrees of freedom. When JMP uses the term RMSE, it is always the RMSE that has been inflated by the adjustment for degrees of freedom. We do not adjust for degrees of freedom when data mining, so when JMP provides both RMSE and RASE, use RASE. RASE gives you a comparable number that you can use to compare error across competing models; RMSE that has been inflated for degrees of freedom does not.
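The distinction between RASE and the degrees-of-freedom-adjusted RMSE is easy to see in a few lines of Python; the actual and predicted values here are hypothetical:

    import numpy as np

    y_actual = np.array([210.0, 180.0, 330.0, 240.0])  # validation outcomes (made up)
    y_pred   = np.array([205.0, 190.0, 320.0, 235.0])  # model predictions (made up)

    # RASE: root of the plain average of squared errors (no DF adjustment).
    rase = np.sqrt(np.mean((y_actual - y_pred) ** 2))

    # DF-adjusted RMSE divides by (n - p), where p is the number of estimated
    # coefficients (p = 2 assumed here), which inflates the value relative to RASE.
    n, p = len(y_actual), 2
    rmse_adjusted = np.sqrt(np.sum((y_actual - y_pred) ** 2) / (n - p))

    print(rase, rmse_adjusted)  # the adjusted figure is always the larger of the two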

After validating the models and selecting the best combination of input variables, we are ready for Step 4.

Step 4: Update the model coefficients using Segment C

Now that we know which combination of input variables produces the best model, we update the coefficients by retraining the model on Segment C, which contains all of the data with known values for both the input and outcome variables. We combine Segments A and B into Segment C because we want the benefit of training on all of the data we have, so that we get the best predictive performance.
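Continuing the Python sketch, re-estimation just means refitting the winning formula from Step 3 on Segment C. Column names and values remain illustrative assumptions, and the squared term is dropped here on the assumption that Step 3 found it insignificant.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Segment C: all 24 known quarters (initial training + validation), made-up values.
    combined = pd.DataFrame({
        "t_qtr":   range(1, 25),
        "QtrText": ["Qtr1", "Qtr2", "Qtr3", "Qtr4"] * 6,
        "Sales":   [100 + 2 * t + [0, 50, 200, 100][(t - 1) % 4] for t in range(1, 25)],
    })

    # Same input variables as the winning Step 3 model; only the coefficients change.
    final_model = smf.ols("Sales ~ t_qtr + C(QtrText)", data=combined).fit()
    print(final_model.params)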

Step 5: Forecast future values

The last step is to forecast our future data points using the best model. This is done in JMP by going to the red dropdown arrow and selecting "Save Columns" -> "Predicted Values". This saves all of the predicted values to your data table in JMP, as shown in the video below.
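The code equivalent of saving predicted values is simply calling the re-estimated model on the placeholder records. This sketch continues directly from the Step 4 sketch above, so final_model is already fitted; the placeholder inputs mirror records 25 through 28.

    import pandas as pd

    # Segment D: the placeholder records, inputs filled in and Sales left empty.
    to_forecast = pd.DataFrame({
        "t_qtr":   range(25, 29),
        "QtrText": ["Qtr1", "Qtr2", "Qtr3", "Qtr4"],
    })

    forecasts = final_model.predict(to_forecast)  # forecasted values for the next year
    print(forecasts)  # rough analogue of JMP's "Save Columns" -> "Predicted Values"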

Also, if you wish to see the MLR model or equation, you can view it by clicking the red dropdown arrow and selecting "Estimates" -> "Show Prediction Expression". This will return something like what is shown in the figure below.

Figure 8.11: Prediction expression