Introduction

Three fundamental approaches can be taken when creating predictive models for classification or numeric outcomes. Figure 1 reflects these approaches.

  1. Single Algorithm. The first approach is to try only one algorithm. This is simple, but it gives you no sense of how other algorithms would perform on the data. Because different algorithms suit different types of problems, you may miss a better-performing option.

  2. Multiple Algorithms, Pick the Best. The second approach is to try several algorithms and pick the single one that produces the best results. This approach is typically better than the first.

  3. Ensembles. The third approach is to build multiple models but, instead of choosing the single best-performing model, to combine the predictions of all the individual models into one overall prediction. The combined models are referred to collectively as an ensemble. Random forests and boosting are examples of prediction ensembles.

    • Averaging/Voting methods are one type of ensemble method. First, multiple estimators are created independently; their predictions are then averaged for numeric targets or combined by voting for categorical targets. The combined estimator is usually better than any of the single-algorithm estimators because the ensemble's variance is reduced. For example, you can build an ANN model and a KNN model and combine them into an ensemble, as sketched in the first code example after this list. Random forests use an averaging/voting approach.

    • Boosting is also an ensemble method. With boosting, a weak model is created first. A new simple model is then fit to the residuals of the first model; the residuals that remain after the second model are used as the target for the third model, and so on. This process continues until little progress is made or another stopping rule is invoked. To generate a prediction from the boosted ensemble, each model's prediction is computed and the results are summed. Thus, boosting combines a series of weak models to produce a good overall estimate, as sketched in the second code example after this list.
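
The following is a minimal sketch of the averaging/voting idea, assuming a generic classification problem and synthetic data; scikit-learn's VotingClassifier combines an ANN (MLPClassifier) and a KNN model by majority vote, mirroring the ANN-plus-KNN example above.

```python
# Voting ensemble sketch: two independently trained models, combined by vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Toy data standing in for a real classification problem
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Two base estimators, each created independently
ann = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)

# Hard voting: each model casts one vote for the predicted class
ensemble = VotingClassifier(estimators=[("ann", ann), ("knn", knn)], voting="hard")
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```

For a numeric outcome, VotingRegressor plays the same role, averaging the base models' predictions instead of voting.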
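
Below is a minimal sketch of boosting for a numeric target, again on synthetic data: each weak DecisionTreeRegressor is fit to the residuals left by the models before it, and the final prediction is the sum of all the weak models' contributions. The learning-rate shrinkage is a standard refinement added here, not something stated above.

```python
# Boosting sketch: successive weak models fit to residuals, predictions summed.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

n_rounds = 50          # stopping rule: a fixed number of boosting rounds
learning_rate = 0.1    # shrinks each weak model's contribution
models = []
residuals = y.copy()   # the first weak model targets the raw outcome

for _ in range(n_rounds):
    weak = DecisionTreeRegressor(max_depth=2, random_state=42)
    weak.fit(X, residuals)                        # fit what is still unexplained
    models.append(weak)
    residuals -= learning_rate * weak.predict(X)  # residuals become the next target

# Overall ensemble prediction: sum every weak model's (shrunken) prediction
y_pred = sum(learning_rate * m.predict(X) for m in models)
print("Mean squared error:", np.mean((y - y_pred) ** 2))
```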