Activities of the Analyze Phase

As you and your team arrive at the Analyze phase, you will have gathered quite a bit of information and data from the previous stage. You will have developed a much clearer understanding of what is going on in the process and which steps may contribute the most to the significant problem. You now need to make sense of all that information and track down the cause-and-effect relationships that produce the issue you are addressing.

Your goal at this juncture is to develop knowledge that will assist the team in using its time in the Improve phase most effectively. The bulk of activities in the Analyze phase will focus on exploring the relationships between input (X) and output (CTQ) variables. The data and process tools you used in the Measure phase helped you focus on process factors that the team believes are most likely contributing to the problem at hand. A crucial point here at the Analyze phase is that belief (“most likely contributing”) is just a theory until you test it with additional data.

Identifying and Prioritizing Possible Causes

Initially, the team will focus on activities that stimulate creative thinking about the selected problem’s causes. You will think broadly about what is taking place in the process under examination, which is the focus of the project. Subsequently, you will perform data analysis as appropriate to verify whether a cause-and-effect relationship exists and how strong it might be.

The team starts considering a wide range of potential causes to find explanations for patterns. We can borrow some insights from design thinking here. According to IDEO, a global design and innovation company, we can think of the continuum of innovation as a system of overlapping spaces consisting of inspiration, ideation, and implementation. Before the implementation stage, a design team typically moves through the two other spaces: inspiration, in which you gather insights from an extensive range of possible sources, and ideation, in which you translate those insights into more concrete terms. These phases (or “spaces”—a term used to dismiss the notion of a sequence of orderly steps) involve engaging in divergent followed by convergent thinking.

Divergent thinking means solving problems by examining a multiplicity of causes and possible solutions to determine what works best. This type of thinking is advantageous because it creates multiple options. By contrasting and discussing different ideas, you increase the possibility of generating something genuinely innovative and creative. Once you have completed the divergent thinking stage, you organize and structure the outcomes using convergent thinking, a practical way of deciding among existing alternatives.

Applying divergent thinking, you and your team can produce ideas (hypotheses) about factors (Xs) contributing to problems in the project’s process. You can leverage the detailed process map created in the previous phase to brainstorm possible causes (if you have not yet created one, now is the time). Another tool at your disposal is the cause-and-effect (fishbone) diagram, a popular means for brainstorming and analyzing causation in a process. Similarly, you can use the 5 Whys, a suitable method for stimulating people to think about root causes.

Once you have generated a list of possible causes (Xs) during the divergent thinking phase, you need to converge on a few crucial ones; i.e., you need to prioritize the Xs. You can use the Pareto chart and Failure Modes and Effects Analysis (FMEA) for those purposes. Figure 9.1 shows a simple Pareto analysis for a travel reimbursement process. Note that the first four possible causes cover nearly 85% of the total effect, which, in this case, is invoice rejections. Invoice issues were among the causes identified through the cause-and-effect diagram in a project focused on the travel reimbursement process. We discuss FMEA later in this topic.

Figure 9.1: Invoice Rejection for a Travel Reimbursement Process
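
If you prefer to build a Pareto chart programmatically rather than in a spreadsheet, here is a minimal sketch in Python using pandas and matplotlib. The rejection-cause categories and counts below are hypothetical stand-ins, not the actual data behind Figure 9.1.

```python
# Minimal Pareto chart sketch: sorted bars plus a cumulative-percentage
# line. The rejection categories and counts below are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

counts = pd.Series({
    "Missing required signature": 100,
    "Wrong expense code": 70,
    "Receipt not attached": 50,
    "Amount over limit": 35,
    "Other": 20,
}).sort_values(ascending=False)

cum_pct = counts.cumsum() / counts.sum() * 100  # cumulative % of rejections

fig, ax = plt.subplots()
counts.plot.bar(ax=ax, color="steelblue")
ax.set_ylabel("Rejections")

ax2 = ax.twinx()                       # secondary axis for the cumulative line
cum_pct.plot(ax=ax2, color="firebrick", marker="o")
ax2.set_ylabel("Cumulative %")
ax2.set_ylim(0, 105)

ax.set_title("Invoice Rejections by Cause (hypothetical data)")
fig.tight_layout()
plt.show()
```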

You can select any cause from a cause-and-effect diagram or a tall bar on a Pareto chart, such as “missing required signature” in the example in Figure 9.1, to perform a 5 Whys analysis. The team would ask why invoices are missing signatures, then ask why that answer is occurring, and so on. Keep in mind that the number 5 is just a reference and is not written in stone. You will typically get to the root cause after a few “whys,” but sometimes you may need to continue beyond five. You should stop when you reach a possible cause that the team can act on.

Investigating the Possible Causes

After converging on a few crucial potential causes, you will need to confirm whether they actually contribute to the problem. When initially picking out inputs (Xs), the team will have a theory (“most likely or tentative explanation”) on how each X affects the CTQ. The next step is to use the data on the inputs (Xs) to test the theory and determine which Xs actually affect the output (CTQ). A general procedure consists of three stages:

  1. Develop the theory: here, you state how you believe each X impacts the CTQ.

  2. Test the theory: here, you determine whether a given X is critical to the CTQ via statistical methods, process knowledge, or literature review.

  3. Draw a conclusion: now you state whether the X is critical to the CTQ based on the test.

On some occasions, you and the team may not have access to appropriate data. However, you may be very familiar with the process and can leverage your process knowledge to establish whether an X is critical. You can develop process knowledge by studying process theory and through experience with a process. For instance, lean manufacturing theory explains the direct relationship between batch size and cycle time. The theory predicts that if you decrease the batch size, you will decrease cycle time. The idea here is to make an educated guess in more obvious circumstances or when data is not available, try a solution, and study what happens.
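
To make the batch-size prediction concrete, here is a toy sketch in Python. It assumes a deliberately simplified model in which items advance to the next process step only after the whole batch finishes; the step count and per-item times are made up for illustration.

```python
# Toy model of lean theory's batch-size prediction: items advance to the
# next process step only when the whole batch is done. All numbers are
# hypothetical and purely illustrative.

def cycle_time_minutes(batch_size: int, steps: int, minutes_per_item: float) -> float:
    """Approximate elapsed time for one item to clear all steps when it
    must wait for its entire batch at every step."""
    return steps * batch_size * minutes_per_item

# A 3-step process at 1 minute per item per step:
for batch in (10, 5, 2):
    print(f"Batch of {batch:>2}: about {cycle_time_minutes(batch, 3, 1.0):.0f} minutes")
# Batch of 10: about 30 minutes
# Batch of  5: about 15 minutes
# Batch of  2: about  6 minutes -- smaller batches, shorter cycle time.
```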

If the team is working on a relatively straightforward issue, you will likely not need sophisticated methods to verify the impact. In such situations, you can confirm the effect using, for instance, scatter diagrams, run charts, simple correlation, or regression analysis. In other cases, you may need more rigorous and precise procedures, such as inferential statistics and hypothesis testing, to test the effect of your potential causes. The best analyses, conclusions, and solutions, though, are typically based on solid data and statistical techniques.
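
As a sketch of the simpler route, the Python snippet below draws a scatter diagram and computes a Pearson correlation for a hypothetical X (batch size) against a CTQ (cycle time); the data points are invented for illustration.

```python
# Quick check of an X-to-CTQ relationship: scatter diagram plus Pearson
# correlation. The eight data points below are hypothetical.
import matplotlib.pyplot as plt
from scipy import stats

batch_size = [5, 10, 15, 20, 25, 30, 35, 40]    # input (X)
cycle_time = [12, 18, 26, 31, 41, 45, 55, 61]   # output (CTQ), in minutes

r, p_value = stats.pearsonr(batch_size, cycle_time)
print(f"Pearson r = {r:.2f}, p-value = {p_value:.4f}")  # r near 1: strong link

plt.scatter(batch_size, cycle_time)
plt.xlabel("Batch size (X)")
plt.ylabel("Cycle time in minutes (CTQ)")
plt.title("Scatter diagram: X vs. CTQ (hypothetical data)")
plt.show()
```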

Conducting Causal Analysis through Statistical Inference

The team can leverage basic statistical hypothesis testing principles and techniques when situations require a more sophisticated means of establishing the effects of potential causes. The team's primary goal in the Analyze phase is to discover which inputs (Xs) affect the output (CTQ) of interest in a process. For that purpose, you and the team will execute hypothesis testing. For each potential cause (X), you will perform the following:

  1. Develop two hypotheses (null and alternative)

  2. Test the hypotheses by analyzing data from the process

  3. If the alternative hypothesis holds, add the X to the list of significant causes

You collect data for each X and then develop hypotheses that will help you establish whether the selected Xs truly impact the CTQ so that you can develop solutions in the Improve phase. The type of hypothesis test you use will depend on the answers you and your team are seeking. However, the different types of tests follow the same general guidelines (a minimal worked example follows the list):

  • You start by stating the null and alternative hypotheses.

  • You set the confidence level.

  • You select the appropriate test.

  • You compute the test statistic and compare it against a reference distribution or critical value.

  • You draw a conclusion based on how the calculated statistic compares to the reference.
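
As a minimal worked example of these guidelines, the sketch below runs a one-sample t-test in Python with scipy. The scenario, the 30-minute target, and the measurements are all hypothetical.

```python
# The five guidelines applied to a one-sample t-test: does the process
# mean differ from a target of 30 minutes? All data are hypothetical.
from scipy import stats

# 1. State the hypotheses: H0: mean = 30; Ha: mean != 30.
# 2. Set the confidence level: 95 percent, so alpha = 0.05.
alpha = 0.05

# 3. Select the test: one sample against a target mean -> 1-sample t-test.
sample = [31.2, 29.8, 33.5, 30.9, 32.1, 28.7, 34.0, 31.6, 30.2, 32.8]
t_stat, p_value = stats.ttest_1samp(sample, popmean=30.0)

# 4-5. Compare against the reference (here, the p-value vs. alpha) and conclude.
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0; the mean differs from 30.")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0.")
```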

Recall that we draw conclusions about the overall process (the population in statistics lingo) by analyzing the sample data and measurements. When you develop the hypotheses, you are making a statement about the entire process (or the entire population) and not merely about the sample. Now let's refresh some points about the null and alternative hypotheses.

Null versus Alternative Hypothesis

The hypothesis test consists of two parts: the null hypothesis and the alternative hypothesis. The null hypothesis (H0) is typically a statement about the data that reflects no effect or no difference. More specifically, the null hypothesis states the assumption that there is no difference in a given parameter (such as the mean) for two or more populations and that any observed difference in samples is caused by chance or sampling error. For instance, you use the null hypothesis to examine whether a new process mean differs from the old process mean. The goal of the test is to find out whether any change in the mean is caused by random variation or whether the process has changed and the new mean is significantly different from the old. The null declares that all the data belong to the same underlying population. You assume that the null is true until you have sufficient evidence against it.

The alternative hypothesis (Ha or H1) is typically a statement that is likely to be true if the null hypothesis is not true. The alternative states that the observed difference or relationship between two populations actually exists and is not the result of chance or sampling error. We can summarize the two parts of hypothesis testing by saying that the null hypothesis is a statement of “no effect” or “no difference” whose sample observations result from chance, while the alternative hypothesis states that a difference or effect indeed exists. You always write the null hypothesis from the perspective that no change or difference is present and the alternative from the perspective that a change or difference is present (either not equal, greater than, or less than).

Let’s Talk Risk: Hypothesis Testing Error

Anytime you draw inferences about a process or population from sample data, you face some probability of error. There are two types of risk in hypothesis testing:

  1. Type I Error: when you reject a true null hypothesis.

    • You measure the probability of this risk with α (alpha).

  2. Type II Error: when you fail to reject a false null hypothesis.

    • You measure this probability with β (beta).

You handle these risks by selecting a confidence level. You normally establish the confidence level with the Type I error in mind, setting it to 1 − α. Typically, you and your team will use a confidence level of 95 percent (α = 0.05). However, other common settings are 99 percent (α = 0.01) and 99.9 percent (α = 0.001). The β risk is the probability of failing to reject the null hypothesis when it is actually false; in other words, it is the risk of not discovering a difference in the sample parameter when one actually exists. The value of β drives sample size requirements and determines the test’s power (1 − β). Figure 9.2 summarizes the risks.

Figure 9.2: Type I and Type II Errors
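
To see how α and β interact with sample size in practice, here is a brief sketch using statsmodels’ power calculations for a two-sample t-test. The assumed effect size of 0.8 (a “large” standardized difference) is an illustration, not a recommendation.

```python
# Sketch: solving for the sample size that keeps alpha at 0.05 and beta
# at 0.20 (power = 0.80) for a two-sample t-test, given an assumed
# standardized effect size (Cohen's d) of 0.8.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.8,  # assumed size of the difference you want to detect
    alpha=0.05,       # Type I error risk
    power=0.80,       # 1 - beta, so beta = 0.20
)
print(f"About {n_per_group:.0f} observations per group are needed.")
# Under these assumptions, the answer is roughly 26 per group.
```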

Testing Your Hypothesis

A frequent application of hypothesis testing that you will encounter is checking whether two means are equal. For instance, two supply chain consultants working for a major global consulting organization have been submitting very different amounts of billable hours, even though they have been working on otherwise similar projects. After some brainstorming, the senior team hypothesized that consultant A’s staff took additional time to handle non-hourly billable tasks. The consulting organization bills specific administrative tasks at a flat rate. If the staff spends extra time on those tasks, the organization cannot bill that time by the hour and loses revenue.

You could test the senior team’s theory (their tentative explanation of what may be causing the issue) by gathering data about the time the administrative staff spends on tasks. Let’s say you selected a set of tasks for which the organization charges a flat fee rather than billing by the hour and examined the time the staff spends on them. Here is how you would set up your hypothesis test:

  1. H0: there is no difference between the average time consultant A’s staff takes and the average time consultant B’s staff takes for such tasks.

    • H0: μConsultant A = μConsultant B

  2. Ha: the average times differ; the team suspects consultant A’s staff takes longer on these tasks.

    • Ha: μConsultant A ≠ μConsultant B

Software programs such as Minitab or even the widely available Microsoft Excel handle the calculations for you and return various values. For the purpose of hypothesis testing, you are predominantly concerned with the p-value. Each test returns a p-value, and you compare it to the alpha value you set before you ran the test. If the p-value is less than the alpha value you set, you reject the null hypothesis and conclude that the data support the alternative hypothesis. If the p-value is greater than the alpha value, you fail to reject the null hypothesis; the data do not provide sufficient evidence for the alternative. Let’s continue with our example.

Following the guidelines provided earlier, you have already formulated your hypotheses. You then decided that a confidence level of 95 percent (α = 0.05) suffices for this situation. You next draw two random samples to compare the mean times of the two consultants’ staff performing flat-billed tasks. Table 9.1 shows the data. Let’s assume that the data come from normally distributed populations; with samples of this size, the t-test is reasonably robust to moderate departures from normality. As you are comparing the means of two samples, a 2-sample t-test is appropriate.

Table 9.1
Time (in minutes) to Execute Flat-Billed Tasks
Consultant A’s Staff    Consultant B’s Staff
68 69
71 86
69 95
61 84
76 80
81 84
75 70
70 88
82 73
52 93
64 83
89 86
72 88
60 91
68 79

Let’s use Excel to compute the statistics. You must install the Data Analysis ToolPak to perform the test. If you don’t have it installed yet, there are two ways to do so. On Windows, go to the File tab, click Options, click Add-Ins, and click Go; then check Analysis ToolPak and click OK. On a Mac, go to Tools, click Excel Add-ins, select Analysis ToolPak, and click OK. After you enable it, click Data Analysis in the Analysis group of the Data tab, as shown in Figure 9.3. You are now ready to produce the statistics that will enable you to perform the comparison. Let’s do it.

Figure 9.3: Data Analysis in Excel

  1. Click Data Analysis in the Analysis group of the Data tab.

  2. From the Data Analysis popup, choose t-Test: Two-Sample Assuming Equal Variances.

  3. Under Input, select the ranges for both Variable 1 and Variable 2.

  4. In Hypothesized Mean Difference, enter zero. This value is the null hypothesis value, which represents “no effect”; in this case, a mean difference of zero means no difference between the two staff teams.

  5. Check the Labels checkbox to include the variable labels that appear in row 1.

  6. Confirm the default Alpha value of 0.05.

  7. Click OK.

Above, you selected t-Test: Two-Sample Assuming Equal Variances. When you have an equal or nearly equal number of observations in both groups and a moderate sample size, t-tests are robust to differences between variances, and you do not need to agonize over minor differences. If one group had twice the variance of the other, however, you would rethink that choice and use t-Test: Two-Sample Assuming Unequal Variances instead.
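
If you want to cross-check Excel’s result outside a spreadsheet, here is a minimal sketch using Python’s scipy with the Table 9.1 data; equal_var=True mirrors the “Assuming Equal Variances” option selected above.

```python
# Cross-check of the Excel analysis: 2-sample t-test (pooled variances)
# on the Table 9.1 data. equal_var=True matches the "Assuming Equal
# Variances" option chosen in the ToolPak.
from scipy import stats

staff_a = [68, 71, 69, 61, 76, 81, 75, 70, 82, 52, 64, 89, 72, 60, 68]
staff_b = [69, 86, 95, 84, 80, 84, 70, 88, 73, 93, 83, 86, 88, 91, 79]

t_stat, p_value = stats.ttest_ind(staff_a, staff_b, equal_var=True)
print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.6f}")
# The p-value should match Figure 9.4 (about 0.0004), well below alpha = 0.05.
```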

Let’s now take a look at the output. It shows that the mean time for consultant A’s staff working on flat-billed tasks is 70.5 minutes and for consultant B’s staff is 83.3 minutes. The statistic of most interest to you in this situation is the p-value. Recall that if the p-value is less than your significance level (α = 0.05 in this case), the difference between means is statistically significant. Let’s look at the two-tailed result, P(T<=t) two-tail, as it can detect differences in both directions: greater than or less than (Figure 9.4).

Figure 9.4: Excel Output for the Consultant's Case

Looking at the output in Figure 9.4, we see that the p-value in this example (0.000396) is less than the significance level you chose, your alpha of 0.05. Therefore, you can reject the null hypothesis. Your sample data support the alternative hypothesis that the population means are different. Translating this finding to your specific case, you can conclude that consultant A’s staff does indeed seem to be taking additional time to perform the non-hourly billable tasks.

Statistically, an important point to keep in mind is that we can never truly accept or prove a null hypothesis. We can merely fail to reject the null at a given level of confidence. Likewise, we never accept or prove that the alternative hypothesis is right; we simply reject the null in its favor.
