Populations and Samples

Like any specialized field, statistics has its own jargon, which sometimes makes it seem more mysterious than it is. Statisticians often use words and terms that may be familiar to your art major friends or your English professor but that carry a narrower, more specific meaning in statistics. In this section, we review some basic terminology that is central to the principles of business research. First, let's look at the distinction between samples and populations, followed by different methods of sampling.

Populations and Samples—Parameters and Statistics

A population is the entire group that we want to know about. It is the complete collection of people, objects, or data points (cases) that we want to describe or for which we want to draw conclusions. For example, if we want to know the average sales commission earned within a quarter at a new startup company, then the collective quarterly sales commission of all company salespersons would make up the population. In this example, our population is all of the sales commissions; however, it is limited to only those sales commissions earned by salespersons of the specific company of interest, and only those sales commissions during a single specified quarter. We don't care what the sales were last year or what the sales are at any other company when we are looking at this population.

A population does not need to be large to be considered a population. If the company in question has only four salespeople with one sales commission each, then the population is four cases. If, on the other hand, the company in question has 400 salespeople, still with one sales commission each, then the population is 400 cases. Large or small, populations include every person, every data point, every thing we want to know about. This means that a population does not need to be made of or include people. For example, suppose we want to know the average life span (mean time before failure, or MTBF) of a specific model of a microwave manufactured by company X. In this case, the population does not include microwaves of a different model, nor does it include microwaves manufactured by other companies. But the population does include the time before failure of each and every microwave of the specified model manufactured by company X.

Suppose we were able to collect the time before failure of each and every microwave of the specified model manufactured by company X. We could then calculate the average, or mean, time before failure for that model. We would call this MTBF a parameter, because a parameter is a value generated from, or descriptive of, a population. This information is useful for company X as well as for consumers, because it indicates roughly when an old unit is likely to need replacing with a new model.
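To make the idea of a parameter concrete, here is a minimal sketch in Python. The failure times are invented for illustration, and the variable names are hypothetical. The sketch assumes we somehow observed every unit of the model, which is exactly what makes the resulting mean a parameter.

```python
from statistics import mean

# Hypothetical data: time before failure, in months, for EVERY microwave
# of one model. These values are invented for illustration only.
population_ttf_months = [84, 96, 102, 78, 120, 90]

# Because the list covers the whole population, this mean is a parameter.
mtbf_parameter = mean(population_ttf_months)

print(f"Population MTBF (a parameter): {mtbf_parameter} months")
```

With only six units the calculation is trivial. The difficulty with real populations is gathering the complete list in the first place.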

However, this example illustrates the problem with parameters: collecting all the necessary data is often impractical, if not impossible. There may be no cost-efficient or feasible way to track every single microwave all the way through its life span, even when limited to a single model. While in some scenarios a parameter may be calculated and known (such as the four-salesperson population described above), most often the value of a parameter cannot be known because the entire population cannot be measured or examined. You can’t call everyone who bought the microwave every month to see if their product is still functioning properly. That would be impractical and far too costly, not to mention tedious.

If calculating a parameter is impossible, impractical, or simply too costly, what can we do? We turn to statistics. An alternative way to find the MTBF for the specified model from company X is to randomly select a subset of the microwaves and measure the time before failure of only those units. This subset of the larger population is known as a sample, and the MTBF that we calculate from the sample is a type of statistic. In practice, we can then use this statistic to estimate the unmeasured parameter, which lets us describe or draw conclusions about what we want to know even when calculating the exact parameter is not feasible.
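The relationship between a statistic and the parameter it estimates can be sketched with simulated data. Everything here is an assumption made for illustration: the population of 10,000 units, the bell-shaped spread of failure times around 96 months, and the sample size of 200. Real failure times need not follow this pattern.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: time before failure (months) for 10,000 microwaves.
# The normal distribution is an illustrative assumption, not a claim about
# how real appliances fail.
population = [rng.gauss(96, 18) for _ in range(10_000)]

# The parameter. In practice it is unknown; we can compute it here only
# because the whole population is simulated.
mtbf_parameter = mean(population)

# The statistic: the mean of a simple random sample of 200 units.
sample = rng.sample(population, k=200)
mtbf_statistic = mean(sample)

print(f"parameter: {mtbf_parameter:.1f} months")
print(f"statistic: {mtbf_statistic:.1f} months")
```

The two numbers will usually be close but not identical. The statistic is an estimate, and a different random sample would give a slightly different value.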

How can an average calculated from only a subset be an accurate estimate of the population average? If the microwaves in the sample were in some way systematically different from the remaining microwaves that were not measured (for example, if the sample consisted only of defective microwaves while the remaining microwaves were not defective), then the average of the sample would almost certainly differ from that of the population. If, on the other hand, the microwaves in the sample did not differ systematically from the remaining microwaves (that is, defective microwaves made up about the same share of the sample as of the larger population), then the average of the sample would be a reasonable estimate of the average of the population.
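The effect of a systematically different sample can be illustrated with a small simulation. This is a sketch under invented assumptions: a population in which 10 percent of units are defective and fail early, with made-up ranges of failure times for each group. It compares a sample drawn only from defective units with a simple random sample of the same size.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population (months before failure): 1,000 defective units
# that fail early and 9,000 sound units that last much longer.
defective = [rng.uniform(6, 24) for _ in range(1_000)]
sound = [rng.uniform(72, 120) for _ in range(9_000)]
population = defective + sound

population_mean = mean(population)

# Biased sample: drawn only from the defective units.
biased_sample = rng.sample(defective, k=200)

# Random sample: every unit in the population has the same chance of selection.
random_sample = rng.sample(population, k=200)

# How far each sample mean lands from the population mean.
biased_error = abs(mean(biased_sample) - population_mean)
random_error = abs(mean(random_sample) - population_mean)

print(f"population mean:     {population_mean:.1f}")
print(f"biased sample error: {biased_error:.1f}")
print(f"random sample error: {random_error:.1f}")
```

The sample of defective units misses the population mean by a wide margin, while the random sample, which contains defective units in roughly their true proportion, lands much closer.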

To generalize accurately from sample data and make inferences about a population, the smaller sample must be representative of the larger population. Random sampling, which is addressed in the next section, is the best way to produce a sample that is representative of the population.

In sum, statistics are numbers that are calculated from sample data, whereas parameters are numbers calculated from the whole population. Statistics come from samples, parameters from populations. When parameters are not known, statistics can be used to estimate them with the condition that samples are representative of the population.

Recap:

  • Statistics are numbers that are calculated from sample data, whereas parameters are numbers calculated from the whole population.

  • Statistics come from samples, parameters from populations.

  • When parameters are not known, statistics can be used to estimate them with the condition that samples are representative of the population.

  • Do not try to call every person you sell an item to in order to determine whether the item functions properly.
