1.5 Types of Variables
Independent and Dependent Variables
One of the most important concepts to nail down before using statistics in a business is knowing the difference between independent variables (IVs) and dependent variables (DVs).
In a business, many variables might influence any given outcome. For marketing, we could consider the time of year, the age of our audience, the colors of our advertisements, the attractiveness of the models used, or some quality of our product itself. For employee productivity, it might be our benefits packages, our corporate culture, the qualifications we publish in our job postings, or even our job descriptions themselves. Each of these possible influences can be properly called an independent variable. An independent variable (IV) defines the groups you are using to make comparisons, and each of those groups is called a level of the IV. For the IV of color, you might compare the levels of red, blue, and yellow backgrounds on a website. For the IV of job, you might compare the levels of accountants, marketers, and managers.
Can you identify a situation in which each of the following could be independent or dependent variables?
- heights of the players on a basketball team
- race times
- product MTBF
- your score on the next quiz
A variable is independent when it is thought to influence other variables without being affected by them in return (independent variables don't give in to peer pressure). This idea of independence is rarely strictly true, since very few variables affect other variables without being influenced by them in turn, but it is usually a pretty good rule of thumb, or heuristic, for identifying them. So if you had to think of your business as a machine, the independent variables would be the inputs, or the raw materials. Each IV is required to have at least two levels, because we need at least two to make a comparison. As an example, if my IV were time of purchase, the levels might be morning, afternoon, and evening.
The dependent variables, on the other hand, are the outputs that you might want to compare or even predict, like profits, productivity, customer or employee satisfaction, and the like. Just as a farmer's corn yield per acre depends on the amount, quality, or other aspects of the independent variables (seed, fertilizer, water, soil, weather), the outputs are dependent on the input of the independent variables. Strictly speaking, something that is a DV for one analysis could be the IV in another setting. For example, if you wanted to predict sales (the DV), you might look at the qualities of the salesperson (a series of IVs that might include product knowledge, friendliness, etc.). In another setting, you might use sales as an IV to predict probable growth in a company (the DV).
Once you have found the IVs that most influence the DV (say, for this example, the most influential IV was product knowledge), you might then use product knowledge as the DV in a further study and see what IVs (corporate training, interactions with customers, personal insights, etc.) most influence the DV of product knowledge. Look for a cause and an effect. In the world of statistics, the kinds of analyses you can do will in great measure be determined by the kinds of IVs and DVs you have chosen and the kinds of data you can obtain about each. Choose wisely.
Perhaps the greatest advantage of thinking in these terms is that when you know what you want to predict, measure, or change to help improve your business or find out what needs improvement, you are probably thinking of a DV. What you need at that point are the influencing variables, the IVs, that you think most impact the DV you have chosen. Likewise, when your company produces a lot of data (IVs), most of it will not be useful unless you can determine at least a few DVs you would like to measure or predict, like profit margins or sales numbers. Much of this course is about helping you choose statistical analyses that will help you uncover how influential your chosen IVs are on your DVs.
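To make the IV and DV roles concrete, here is a minimal Python sketch that groups a DV (purchase amount) by the levels of an IV (time of purchase) and compares the group averages. The purchase amounts are invented purely for illustration, and the names `purchases` and `mean_by_level` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: each record pairs a level of the IV "time of purchase"
# with a value of the DV "purchase amount in dollars".
purchases = [
    ("morning", 20.0),
    ("morning", 30.0),
    ("afternoon", 45.0),
    ("afternoon", 55.0),
    ("evening", 70.0),
    ("evening", 90.0),
]

def mean_by_level(records):
    """Return the average DV value for each level of the IV."""
    groups = defaultdict(list)
    for level, value in records:
        groups[level].append(value)
    return {level: mean(values) for level, values in groups.items()}

averages = mean_by_level(purchases)
for level, avg in averages.items():
    print(f"{level}: {avg:.2f}")
```

A difference between the group averages is only a starting point. Whether that difference is large enough to matter is a question for a formal statistical analysis.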
Quantitative and Qualitative Variables
A basic concept that arises in statistical analysis is the difference between qualitative variables and quantitative variables. Qualitative variables do not take numerical values; instead, each case is placed into a category. They are also called categorical variables. Quantitative variables measure how much or count how many. They can be separated into continuous variables and discrete variables. These distinctions are simple, but the implications are important for extracting the most information from the data at hand. Additionally, the kind of data you have will constrain the analyses possible. We will discuss that in the next section.
Quantitative Variables
Continuous Variables
Continuous variables are variables for which you can subdivide the values into smaller units and those smaller units have meaning. An easy way of thinking about this is that if decimal places have real meaning in your data, the variable is probably continuous. Some continuous variables might be the weight of your product, its length in centimeters, the time it takes to produce one unit, and the amount of profit per item. For each of these examples, the variable can take any value of an infinite number of values on a measurement scale.
Discrete Variables
Discrete variables are another type of quantitative variable. They take a countable set of possible values, usually whole numbers. Some examples of discrete variables would be the number of employees you have, how many products you sell, inventory numbers, and how many unit sales you have per month. For discrete variables, it would not make sense to say you have half a sale or that you have 2.5 employees working on a project (although an employee might split their hours between projects, the decimal applies to the hours, not to the number of employees). The units of a discrete variable cannot be meaningfully subdivided: it doesn't really count if you sold only half of a car.
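The decimal-place rule of thumb for telling continuous and discrete variables apart can be sketched in a few lines of Python. This is only a rough heuristic under stated assumptions: the variable names and values below are hypothetical, and a continuous variable that happens to be recorded in whole numbers (for example, weight rounded to the nearest kilogram) would be mislabeled, so the final call still depends on what the variable means.

```python
# Hypothetical measurements, invented for illustration only.
variables = {
    "employees": [12, 15, 9],
    "unit_weight_kg": [1.25, 1.31, 1.28],
    "monthly_unit_sales": [340, 410, 385],
    "minutes_per_unit": [4.5, 5.0, 4.75],
}

def looks_discrete(values):
    """Heuristic: call a variable discrete if every recorded value is a whole number."""
    return all(float(v).is_integer() for v in values)

kinds = {
    name: "discrete" if looks_discrete(values) else "continuous"
    for name, values in variables.items()
}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```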
Categorical (Qualitative) Variables
Categorical (qualitative) variables place the cases into groups or categories. Some examples are demographic variables, like sex or country of origin, and categories that give valuable information to your business, such as differentiating between GenX and millennial shoppers. Categorical variables must be classified into groups (like male and female for sex) rather than represented as simple mathematical values. Information given through categorical variables can be extremely useful for analysts. Imagine being able to examine loyal versus infrequent customers or satisfied versus dissatisfied customers.
Categorical variables can have as many levels as needed to address your needs. While a continuous age variable can be very useful, you might find that simply having three age ranges is sufficient to know what products will sell and to whom. If you sell cars, for example, you might find that you can simply divide your data into starter cars for those aged 16–24, family vehicles for those aged 25–45, and sports cars for those aged 46 and older. While built on the base of a continuous variable (age), the categorical labels might serve to both simplify the data and give more explanatory power than simply listing age itself. This is because the levels themselves have inherent meaning that allows us to understand more than just the simple label. For example, realizing you have a population that wants a family car has more value than observing that a customer is 37 with two children under eight years of age. We can see that the whole category of customers aged 25–45 may want the same type of vehicle.
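Turning a continuous variable into a categorical one, as in the car example, can be sketched as follows. The function name `vehicle_segment` and the customer ages are hypothetical. The exact cutoffs are also an assumption: a customer aged exactly 25 or exactly 45 is placed in the family-vehicle group, and anyone under 16 is treated as outside the target market.

```python
from collections import Counter

def vehicle_segment(age):
    """Map a continuous age to one of the categorical levels in the car example."""
    if age < 16:
        return "not in target market"  # assumption: no segment below age 16
    if age < 25:
        return "starter car"
    if age <= 45:
        return "family vehicle"
    return "sports car"

# Hypothetical customer ages, invented for illustration only.
customer_ages = [17, 23, 25, 37, 45, 46, 62]

segments = [vehicle_segment(age) for age in customer_ages]
segment_counts = Counter(segments)

for segment, count in segment_counts.items():
    print(f"{segment}: {count}")
```

The counts per segment are now a categorical summary of the same customers. The individual ages are lost in the process, which is the trade-off for the simpler, more interpretable labels.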