This article discusses the common types of probability distributions in Six Sigma Black Belt projects. Many statistical approaches you’ll learn later are built on the assumption that the data is normally distributed, so it’s important to gain an early understanding of the normal distribution. We’ll also cover some less common statistical distributions used in Six Sigma Black Belt projects and show why they matter.
The Normal Distribution
The most common distribution used in Six Sigma is the normal distribution.
The Normal Distribution has these 3 unique characteristics:
- Only Random Error is Present
- There is no evidence of Assignable Cause
- There are no drifts or shifts in the data as evidenced by the fact that the [Mean = Median = Mode].
It follows from the three characteristics above that if the data is not normally distributed, then the following are likely true:
- Probably more than random error is present
- Probably there is evidence of assignable special cause
[callout title=Note on the Normal Distribution]As you assess the distributions in your data set, understand that it’s difficult to determine what’s affecting a process if the data set is normally distributed. Special cause is harder to determine when your data set is a pretty little normal distribution. When it’s skewed in some way, then it’s much easier.[/callout]
Most Used and Abused Distribution
While pretty and smooth, the normal distribution is the most used probability distribution – and because it’s so misunderstood, it’s also the most abused. And while it serves as the foundation of many statistical tools that we’ll learn later in the Measure Phase and in the Analyze Phase, encountering the normal distribution in real life is not common.
Characteristics of Normal Distribution
The Normal Distribution is a function of two parameters: The Mean and the Standard Deviation.
Each combination of the Mean and Standard Deviation can produce a unique normal curve. Below are pictures of normal distributions.
Notice how they are different?
If your distribution has the normal bell shape, but is uniquely different from the “Standard” Normal Curve, then we can transform our unique normal distribution to the “Standard” Normal Distribution.
[callout title=Why be Normal?]Why do we want to do this? Doing so lets us use the Z Table, which helps us compare different normal distributions and estimate tail area proportions. This act of converting our data to the Standard Normal is called “Normalizing” (you’ll also see it called standardizing).[/callout]
By Normalizing our data, we convert each raw score into a standard Z Score, Z = (raw score - Mean) / Standard Deviation, so that the Z Scores have a Mean = 0 and a Standard Deviation = 1. This allows us to use the Z Table to make estimates.
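The conversion to Z Scores can be sketched in a few lines of Python. This is only an illustration: the cycle times are made-up numbers, the function name `z_scores` is hypothetical, and the standard library's `NormalDist` stands in for a printed Z Table.

```python
from statistics import NormalDist, mean, stdev

def z_scores(values):
    """Convert raw scores to Z Scores: z = (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [(x - mu) / sigma for x in values]

# Hypothetical cycle times in minutes (illustrative data, not from a real process)
cycle_times = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]
z = z_scores(cycle_times)

# Tail area beyond the largest observation, taken from the standard normal
# curve (Mean = 0, Standard Deviation = 1) instead of a printed Z Table.
upper_tail = 1 - NormalDist().cdf(max(z))
print([round(score, 2) for score in z], upper_tail)
```

After the conversion, the Z Scores themselves have a mean of 0 and a standard deviation of 1, which is what makes one table usable for every normal curve.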
Area Under the Curve and Process Capability
I understand that this material is getting a little academic. I want you to keep the end in mind: this information will help us later on determine process capability, a key measure in Six Sigma. More on that later.
Proportion of Distribution
The area under the curve between any 2 points represents the proportion of distribution between those points. If we can estimate the area under the curve between 2 points, then we’ll be able to estimate process capability.
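As a rough sketch of that idea, the area between two points is the difference of the cumulative distribution at those points. The process mean, standard deviation, and specification limits below are assumed values chosen for illustration, and `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

def proportion_between(lower, upper, mean, std_dev):
    """Area under a normal curve between two points."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(upper) - dist.cdf(lower)

# Assumed process: mean 50, standard deviation 2, with specification
# limits at 44 and 56 (that is, 3 standard deviations either side).
within_spec = proportion_between(44, 56, mean=50, std_dev=2)
print(f"Proportion within the limits: {within_spec:.4%}")
```

The same function applies to any pair of limits, so moving the limits or shifting the mean shows how the proportion of in-spec output changes.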
Extending what we now know about the area under the curve, we can estimate more accurately how our processes are performing. This curve will be familiar to you – it’s the foundation of Six Sigma.
The Empirical Rule
Based on this data and what we know about the area under the curve, we can make the following conclusions:
- 68.27 % of the data will fall within +/- 1 standard deviation
- 95.45 % of the data will fall within +/- 2 standard deviations
- 99.73 % of the data will fall within +/- 3 standard deviations
- 99.9937 % of the data will fall within +/- 4 standard deviations
- 99.999943 % of the data will fall within +/- 5 standard deviations
- 99.9999998 % of the data will fall within +/- 6 standard deviations
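This means that, for normally distributed data, once we get beyond 3 standard deviations from the mean, the probability of occurrence is very low: about 0.27 %, or roughly 2,700 per million. These percentages hold for the normal curve; for other shapes the proportions differ, so check the shape of your data before relying on them.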
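The percentages in the list above can be reproduced for a normal distribution with the error function, since the proportion within +/- k standard deviations equals erf(k / sqrt(2)). A minimal sketch, with `within_k_sigma` as a hypothetical helper name:

```python
import math

def within_k_sigma(k):
    """Proportion of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    inside = within_k_sigma(k)
    outside_ppm = (1 - inside) * 1_000_000  # parts per million outside the band
    print(f"+/- {k} sigma: {inside * 100:.7f} % inside, {outside_ppm:.4f} ppm outside")
```

Note that these are two-sided figures for a centered process, which is why the result for 6 standard deviations is far smaller than any defect rate that assumes a shifted mean.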
Below are several distributions you might encounter in your DMAIC project work:
Other Distributions Used in Black Belt DMAIC Projects
I’ve already indicated that the Normal Distribution will be the most commonly used distribution in your time running DMAIC projects. But, in the wild – in real life – encountering a data set that is approximated by the Normal Distribution will not be a common occurrence. So, to help you get accustomed to what you might see, below are a few distributions to keep in mind.
Binomial Distribution
What is it?
The Binomial Distribution is used to model discrete data and applies when the population is large (where N is greater than 50) and the sample size is small compared to the population.
When to Use it?
Use the Binomial Distribution when the proportion of defects is equal to or greater than 0.1 (this means defects are not rare events).
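To make this concrete, here is a small sketch of the binomial probability formula, P(k defects) = C(n, k) p^k (1 - p)^(n - k). The sample size of 20 and the defect proportion of 0.1 are assumed values for illustration, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k defects in a sample of n with defect proportion p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.1  # assumed sample size and defect proportion

prob_zero = binomial_pmf(0, n, p)
prob_at_most_two = sum(binomial_pmf(k, n, p) for k in range(3))
print(f"P(0 defects) = {prob_zero:.4f}, P(2 or fewer) = {prob_at_most_two:.4f}")
```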
Poisson Distribution
What is it?
The Poisson Distribution is used to model discrete data, specifically the count of events that occur in a fixed interval of time or space.
When to Use it?
From experience, the Poisson Distribution is a common distribution in waiting lines, call centers, and in many transactional processes. Don’t be surprised if you see it when trying to approximate data from those areas.
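For a call center style example, the Poisson probability of seeing exactly k events when the average rate is λ per interval is λ^k e^(-λ) / k!. The sketch below assumes an average of 3 calls per minute, which is an invented figure, and `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events in an interval with the given average rate."""
    return rate**k * exp(-rate) / factorial(k)

calls_per_minute = 3.0  # assumed average arrival rate

prob_no_calls = poisson_pmf(0, calls_per_minute)
prob_more_than_five = 1 - sum(poisson_pmf(k, calls_per_minute) for k in range(6))
print(f"P(no calls) = {prob_no_calls:.4f}, P(more than 5) = {prob_more_than_five:.4f}")
```

A calculation like the second one is the kind of estimate that helps with staffing questions, such as how often demand in a minute exceeds what the available agents can answer.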
Chi Square Distribution
What is it?
The Chi Square Distribution is formed by summing the squares of independent standard normal random variables. For example, if z1, z2, ..., zk are independent standard normal random variables, then the sum of their squares will form the Chi Square Distribution with k degrees of freedom.
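This construction is easy to check by simulation. The sketch below draws standard normal values, squares and sums them, and compares the average of the results with the degrees of freedom, which is the theoretical mean of a Chi Square Distribution. The seed, the sample count, and the choice of 3 degrees of freedom are arbitrary assumptions, and `chi_square_sample` is a hypothetical helper name.

```python
import random

def chi_square_sample(degrees_of_freedom, rng):
    """One Chi Square value: the sum of squared independent standard normals."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(degrees_of_freedom))

rng = random.Random(42)  # fixed seed so the run is repeatable
k = 3                    # degrees of freedom (number of squared terms)

samples = [chi_square_sample(k, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(f"Simulated mean: {sample_mean:.3f} (theoretical mean is {k})")
```

Because every term is a square, no simulated value can be negative, which is why the Chi Square Distribution is skewed to the right rather than symmetric.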
When to Use it?
The Chi Square is common (from my experience) in healthcare processes and in areas where discrete data is found such as Go/No Go, Present/Not Present, etc.
Next Up
In our next module, we’ll treat different data distributions and learn various ways we can visualize the data using graphical methods.