23.3 Random Numbers Normal Distribution
We will next explore normal distributions by using the NORM.INV function. The process for creating this normal distribution is similar to the process for the RANDBETWEEN function (which randomly distributes between those parameters), but we will use the NORM.INV function instead. The normal inverse returns the inverse of the normal cumulative distribution for the specified mean and standard deviation. In other words, it helps us get random numbers according to what a normal distribution would look like with a given mean and standard deviation. Of course, the mean is the middle where the distribution peaks. If it is normally distributed, 68.2% of the population will be within one standard deviation of the mean, 95.4% of the population will be within two standard deviations of the mean, and 99.7% of the population will be within three standard deviations. When we say within the specified number of standard distributions, this includes those both above and below the mean.
It is also important to note that a normal distribution is not the only kind of distribution, but it is very common. Be careful as you do this on your own to ensure that the data you are working with is normally distributed or that you know what to do if it is not. That instruction is more advanced than what is done here.
Now, we will go through the process to recreate this distribution. We will fill cell O2 with “Normal” as a heading. As we did with the RANDBETWEEN function, we will select 1,000 cells and fill them in with the NORM.INV function. We will do this by going to the Name box and typing in the cells we want to populate, such as O3:O1002. Next, we will type our formula =NORM.INV(RAND(),0,10) into the first cell and press Ctrl+Enter to apply that to all of the selected cells. Then, we will select all the cells, including “Normal” as our heading, and insert a table. Then, we will go to Insert again and turn the table into a PivotTable so that we can easily group it into intervals. In this case, we would like to store it in the existing worksheet. After this, we will populate the areas with our fields. Since our data was only one column, the only possible field to manipulate is “Normal,” which was the header. To recreate the specific table and chart that we are copying, “Normal” can be dragged to the rows area and the values area. We want it to say “Count of Normal” rather than “Sum of Normal,” so we can click the little down arrow, select Value Field Settings, and summarize the values by count. Next, we will group these. Since our standard deviation is 10, and 99.7% will be within three standard deviations above and below the mean of 0, we can assume that most of the data except for 0.3% of the extremes will be within a range of -30 to 30. Knowing this, we can start at -40 and end at 40 and group by 5 in our grouping of this chart. As a refresher, you can group a PivotTable by left-clicking on the table, going down to Group, and filling in the start and end dates of 40 and -40 with a “By” value of 5.
Looking at the data, it looks like it will make a normal distribution due to the large concentration of high numbers in the middle with smaller numbers on the edges similar to the one we are recreating. This is an effective way to check that you performed the process properly.
Let’s go ahead and make a PivotChart of this. Click on the PivotTable, go to PivotTable Analyze at the top, and select PivotChart. Our example is a line chart, so we will find a line chart from the selection, which will allow us to see the bell-shaped curve of the data. When the chart pops up in the middle of the screen, we can then clean it up by giving it a title and removing the legend.
We can see here that we are approximating a normal distribution. Of course, if we increase the count of our data from 1,000 to 10,000 or 100,000, we will see a much smoother curve. We can also recalculate and deduce that although it moves a bit with each adjusting recalculation, it still stays approximately normal with each iteration. Finally, we will highlight one other thing. Notice how few data points are outside of three standard deviations. It looks like just three are beyond -30 and two beyond 30, so five are outside, which means that 99.5% are within three standard deviations of the mean.
As you can see, we can easily create random numbers and approximate distributions within Excel. This will serve as the basis for what we do when we run our Monte Carlo simulations.