8.1 Introduction to Time Series Forecasting
Time Series is the term used to describe data collected over time, and how the date/time is related to the forecasting future values.
In its most simple form a time series dataset includes only a column for date and a column for the outcome. For example, it might the number of products sold each day over some span of time.
It is also possible to include other explanatory (x) variables that together with time (X) help predict the outcome (y) variable. For example, when predicting the traffic on a given day, in addition to the day, such as weekday, weekend, and holidays, weather may play a role in predicting traffic. For example, snow may reduce traffic and good weather may increase traffic independent of the day of the week.
Time series forecasting is an important area of machine learning because there are many prediction problems that involve a time component. Time does matter for many types of problems. Companies sell more products around Christmas time. Electrical utilities produce more power in the hot months when people use a lot of energy for air conditioning. People consume more natural gas during the cold months because it is used for heating. Most firms hire more college graduates to begin new jobs in summer. There are many other problems where time plays an important role in creating accurate predictions. These problems are sometimes neglected because the time component can make time series problems more difficult to handle. And yet having accurate predictions is important when planning such future activities.
Time plays a lesser role in normal, non-time-series machine learning. A model based on past data may be used to predict the value of future observations. However, all items that occurred in the past are assumed to not be differentiated by sequencial dates. Likewise, the value of input and output values that occured in the future are not differentiated by sequencial dates. In effect, time is ignored as it relates to the income and outcome variables in the past and for the predicted future values.
A time series dataset is different. The observations are recorded in a sequence based on date or time because time is assumed to be related to the outcome value. Time series adds an explicit order dependence between observations: a time dimension. This additional dimension is both a constraint and a structure that provides a source of additional useful information.