15.5 Creating Random Forests in JMP
The video below shows how to create Random Forests using JMP. If you would like to follow along, download the Mushrooms.csv and CostSamsSales.csv files below the video.
Random Forest in JMP Video
Bootstrap Forest Menu in JMP
The parameters described below are the ones you need to understand in order to build a Random Forest model that fits the data well and produces good predictions. Figure 17.4 shows the menu that appears when you begin creating a Random Forest model in JMP.
Number of trees in the Forest: The number of trees to create in the forest.
Number of terms sampled per split: The number of input variables, chosen at random, to be considered at each splitting node. Its upper bound is set by the "Max number of terms" parameter.
Bootstrap sample rate: The proportion of observations to sample (with replacement) for growing each tree. With N observations in the training data set, a sample rate of 1 draws n = N observations with replacement. Because some observations are drawn more than once, each tree usually ends up using only about 63% of the distinct observations in the training data set. A new sample is generated for each tree.
Minimum splits per tree: The minimum number of splits for each tree.
Maximum splits per tree: The maximum number of splits for each tree.
Minimum size split: The minimum number of observations required for a candidate split. A minimum size split of 1 helps create 100% purity at the bottom leaf nodes.
Early Stopping: Check this box to perform early stopping. If checked, the process stops creating additional trees once the algorithm no longer sees improvement on the validation data set. If not checked, the process keeps creating trees until the specified number of trees has been built.
Multiple fits over number of terms: Creates a separate bootstrap forest for each of several values of "Number of terms sampled per split".
Max number of terms: The maximum number of terms to be considered for each split in each tree. The terms (input variables) considered are chosen at random at each split.
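Two of the parameters above can be illustrated outside JMP with a short simulation. The sketch below is written in plain Python, not JMP's scripting language, and is not how JMP implements Bootstrap Forest. It shows why a bootstrap sample rate of 1 leaves each tree with roughly 63% of the distinct observations: the chance that a given observation is never drawn in N draws is (1 - 1/N)^N, which is close to 1/e, or about 0.37. It also shows what sampling a fixed number of terms at a split looks like. The function names and the column names are hypothetical and chosen only for illustration.

```python
import random


def bootstrap_unique_fraction(n_rows, sample_rate=1.0, seed=0):
    """Draw one bootstrap sample and return the share of distinct rows in it."""
    rng = random.Random(seed)
    n_draws = round(sample_rate * n_rows)
    # Sampling with replacement: the same row index can be drawn more than once.
    drawn = [rng.randrange(n_rows) for _ in range(n_draws)]
    return len(set(drawn)) / n_rows


def sample_terms(terms, n_terms_per_split, rng):
    """Pick the random subset of input variables considered at one split."""
    return rng.sample(terms, n_terms_per_split)


N = 10_000
fraction = bootstrap_unique_fraction(N)
# Probability that a given row appears at least once in N draws with replacement.
theoretical = 1 - (1 - 1 / N) ** N
print(f"simulated: {fraction:.3f}, theoretical: {theoretical:.3f}")

# Hypothetical column names; the real columns in Mushrooms.csv may differ.
terms = ["cap_shape", "odor", "gill_size", "habitat", "population"]
rng = random.Random(1)
candidates = sample_terms(terms, 2, rng)  # "Number of terms sampled per split" = 2
print(candidates)
```

Lowering sample_rate below 1 reduces the share of distinct observations each tree sees, and calling sample_terms again gives a fresh random subset, which is what happens at every new split.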