Estimating Control Chart Constants with R

In this post, I will show you how some very basic R code can be used to estimate the quality control constants needed to construct X-Individuals, X-Bar, and R-Bar charts. The value of this approach is that it gives you a mechanical sense of where these constants come from and reinforces how they are applied.

If you work in a production or quality control environment, chances are you’ve made or seen a control chart. If you’re new to control charting or need a refresher, check out Understanding Statistical Process Control by Wheeler et al. If you want to dive in and start making control charts with R, check out these R packages:

ggQC: ggplot based QC charting
qcc: base graphics QC charting

If you’re familiar with control charts, you’ve likely encountered cryptic alphanumeric constants like d2, A2, E2, d3, and D3, and asked,

“What are they and where do they come from?”

Short (not so satisfying) answer: They are constants you plug into formulas to determine your control limits. Their value depends on how many samples you take at a time (the subgroup size) and the type of chart you are making. For example, if you measure the size of 5 widgets per lot of 50, then your subgroup size, n, is 5 and you should be using the set of control chart constants for n = 5.

So where do they come from and how are they calculated?

Read on.

X-Bar and X-Individuals Constants

Often, control charts represent variability in terms of the mean range, R̄, observed over several subgroups rather than the mean standard deviation. The table below should make the idea of subgroup range and mean range clearer.
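
To make subgroup range and mean range concrete, here is a minimal sketch using simulated data. The 4 subgroups of 5 measurements, the seed, and the names `widgets`, `subgroup_ranges`, and `R_bar` are illustrative assumptions, not data from a real process.

```r
# Hypothetical data: 4 subgroups (rows) of n = 5 simulated measurements each
set.seed(1)
widgets <- matrix(rnorm(20, mean = 10, sd = 1), nrow = 4, ncol = 5)

# Range of each subgroup: max - min within a row
subgroup_ranges <- apply(widgets, 1, function(x) max(x) - min(x))

# Mean range: the average of the subgroup ranges
R_bar <- mean(subgroup_ranges)

subgroup_ranges
R_bar
```

Each row plays the role of one lot's subgroup; the chart constants discussed next convert `R_bar` into an estimate of spread.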

Why range? My guess is that, historically, employees at all levels would have understood the concept of range. Range requires no special computation, just (max - min). Speculation aside, we begin our quest to understand where control constants come from with the relationship shown in Eq. 1: the mean subgroup range is proportional to the standard deviation of the individual values.

The proportionality constant between R(X_{Sub_Grp_Indv}) and S(X_indv) is d₂, the first constant we’ll be estimating. The relationship is expressed in Eq. 2.
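
Written out from the description above, the two relationships take roughly the following form. This is a reconstruction, with \bar{R} standing for the mean subgroup range R(X_{Sub_Grp_Indv}) and S(X_{indv}) for the standard deviation of the individual values.

```latex
% Eq. 1: the mean subgroup range is proportional to the standard deviation of the individuals
\bar{R} \propto S(X_{indv})

% Eq. 2: d_2 is the proportionality constant
\bar{R} = d_2 \, S(X_{indv})
```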

To estimate d₂ for n = 2 (i.e., the subgroup size is 2), we start by drawing two samples from a normal distribution with mean = 0 and sd = 1. Why, you ask? Because it makes the math really simple. Consider Eq. 2: if S(X_indv) = 1, then R(X_{Sub_Grp_Indv}) = d₂.

So all we need to do to determine d₂ when n = 2 is:

Draw 2 individuals from the normal distribution.
Determine the range of the 2 samples.
Repeat many, many times.
Take the average of all the ranges you’ve calculated, R̄.
R̄ = d₂ (when the mean = 0 and sd = 1).

The R code for the process is shown below.

```r
library(magrittr)  # Bring in the pipe operator
reps <- 1E6
set.seed(5555)
replicate(reps, rnorm(n = 2, mean = 0, sd = 1) %>% # Draw two from the normal distribution
    range() %>% # Determine the range vector = (min, max)
    diff() %>% # Take the difference of the range vector (max - min)
    abs() # Take the absolute value to make sure the result is positive
    ) %>% # Replicate the above procedure 1,000,000 times
    mean() -> R_BAR -> d2 # Take the mean of the 1,000,000 ranges
d2
```
 
```
## [1] 1.12804
```

The pipes make the above code easy to read but slow things down quite a bit. The following code does the same thing about 12.5 times faster.

```r
reps <- 1E6
set.seed(5555)
d2 <- R_BAR <- mean(replicate(reps, abs(diff(range(rnorm(2))))))
d2
```
 
```
## [1] 1.12804
```
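
If speed matters more still, one further option is to draw all the random numbers at once and compute the ranges row by row. This is a sketch of an alternative, not the approach used above; the helper name `sim_ranges` and the smaller replicate count are my own choices for illustration, and because the random numbers are consumed in a different order the estimate will not match the value above digit for digit.

```r
# Hypothetical helper: simulate `reps` subgroup ranges for subgroups of size n
sim_ranges <- function(n, reps = 1e5) {
  # One row per subgroup, one column per individual
  x <- as.data.frame(matrix(rnorm(n * reps), nrow = reps, ncol = n))
  # Row-wise max minus row-wise min gives each subgroup's range
  do.call(pmax, x) - do.call(pmin, x)
}

set.seed(5555)
d2_vec <- mean(sim_ranges(2))
d2_vec  # should land close to the 1.128 estimated above
```

Because `pmax()` and `pmin()` are vectorized across columns, this avoids calling `range()` once per replicate.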

Once you have d₂, calculating E2 (3σ for the individuals) and A2 (3σ for the subgroup means) is straightforward, as shown in Eq. 3 – Eq. 6. A2 and E2 are the coefficients to the left of R̄.
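
As a sketch of that algebra, reconstructed from Eq. 2 and the code that follows rather than copied from the original equations, substitute S(X_{indv}) = \bar{R} / d_2 into the usual 3σ limits. Here n is the subgroup size, and the center-line symbols \bar{X} and \bar{\bar{X}} are assumed notation.

```latex
% Individuals chart: 3-sigma limits in terms of the mean range
\bar{X} \pm 3\,S(X_{indv}) = \bar{X} \pm \frac{3}{d_2}\,\bar{R}
\quad\Rightarrow\quad E_2 = \frac{3}{d_2}

% X-bar chart: the standard deviation of a subgroup mean is S(X_{indv}) / \sqrt{n}
\bar{\bar{X}} \pm \frac{3\,S(X_{indv})}{\sqrt{n}} = \bar{\bar{X}} \pm \frac{3}{d_2\sqrt{n}}\,\bar{R}
\quad\Rightarrow\quad A_2 = \frac{3}{d_2\sqrt{n}}
```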

The code below gives the expected results for all the control constants needed to construct X-Bar and X-Individuals charts.

```r
c(N=2, d2 = d2, E2 = 3/d2, A2 = 3/(d2*sqrt(2)))
```
 
```
##        N       d2       E2       A2 
## 2.000000 1.128040 2.659480 1.880536
```
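
To see how a constant like A2 gets used, here is a minimal sketch that computes X-Bar chart limits for simulated data. The process mean of 100, the standard deviation of 2, the 25 subgroups, and the variable names are all hypothetical; A2 is rounded from the output above.

```r
# Hypothetical process data: 25 subgroups (rows) of n = 2 simulated measurements
set.seed(42)
x <- matrix(rnorm(50, mean = 100, sd = 2), nrow = 25, ncol = 2)

A2 <- 1.880                          # X-bar constant for n = 2, from the output above
x_bar_bar <- mean(rowMeans(x))       # grand mean of the subgroup means
R_bar <- mean(abs(x[, 1] - x[, 2]))  # mean subgroup range (for n = 2, range = |x1 - x2|)

# X-bar chart center line and 3-sigma control limits
limits <- c(LCL = x_bar_bar - A2 * R_bar,
            CL  = x_bar_bar,
            UCL = x_bar_bar + A2 * R_bar)
limits
```

Note that no standard deviation is computed anywhere: the limits come from the grand mean, the mean range, and the tabulated constant.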

R-Bar Constants

The constants for R charts are d₃ (1σ around R̄), D3 (lower 3σ limit of R), and D4 (upper 3σ limit of R). To get these constants, we start with the assumption that the standard deviation of R is proportional to the standard deviation of the individual X’s. The proportionality constant is d₃, shown in Eq. 7. Notice that Eq. 7 has the same form as Eq. 2.

```r
reps <- 1E6
set.seed(5555)  # seed assumed to match the d2 simulation above
d3 <- sd(replicate(reps, abs(diff(range(rnorm(2))))))
d3
```
 
```
## [1] 0.8529419
```

Notice in the R code above that the only difference between the calculation of d₃ and d₂ is that we take the standard deviation rather than the mean of the R_{Sub_Grp_Indv}. Now we have d₃, but we need a little simple algebra to express S(R_{Sub_Grp_Indv}) in terms of R̄. Remember, historically the employee doesn’t need to worry about standard deviation, just ranges. We can define the above expression in terms of R̄ by combining Eq. 2 and Eq. 7, yielding Eq. 8.

OK, almost to D3 and D4. The lower 3σ limit of R can be expressed as Eq.8:

Factoring out the R̄ terms on the right-hand side of the expression yields

The expression inside the parentheses is D3. D4, the upper limit of R is evaluated analogously. The only difference is a “+” sign in the final expression. Final expressions for D3 and D4 are:
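
Written out in full, the chain of steps looks roughly like this. It is a reconstruction from the description above and the code below, with \bar{R} for the mean range and S(R) for the standard deviation of the subgroup ranges; the clamp at zero mirrors the `ifelse()` in the code.

```latex
% Standard deviation of the ranges, proportional to that of the individuals (same form as Eq. 2)
S(R) = d_3 \, S(X_{indv})

% Substitute S(X_{indv}) = \bar{R} / d_2 from Eq. 2
S(R) = \frac{d_3}{d_2}\,\bar{R}

% Lower and upper 3-sigma limits of R, with \bar{R} factored out
\bar{R} - 3\,\frac{d_3}{d_2}\,\bar{R} = \left(1 - \frac{3 d_3}{d_2}\right)\bar{R}
\qquad
\bar{R} + 3\,\frac{d_3}{d_2}\,\bar{R} = \left(1 + \frac{3 d_3}{d_2}\right)\bar{R}

% Final constants (D_3 is set to zero when the expression is negative)
D_3 = \max\!\left(0,\; 1 - \frac{3 d_3}{d_2}\right)
\qquad
D_4 = 1 + \frac{3 d_3}{d_2}
```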

All done! Here is the R code summarizing the constants for R using n=2.

```r
c(N  = 2, 
  d3 = d3,
  D3 = ifelse(1 - 3*d3/d2 < 0, 0, 1 - 3*d3/d2), 
  D4 = 1 + 3*d3/d2
  )
```
 
```
##         N        d3        D3        D4 
## 2.0000000 0.8529419 0.0000000 3.2683817
```

Notice that the value of D3 is 0. This is because the calculated value was negative; the R code above sets such values to zero, since a range cannot fall below zero.
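
For n = 2 specifically, the simulated values can be sanity-checked against closed forms, because the range of two draws is just |X1 - X2| and the difference of two standard normals is normal with variance 2. The sketch below relies on that fact; the names `d2_exact` and `d3_exact` are my own.

```r
# Exact values for n = 2: the range is |X1 - X2|, and X1 - X2 ~ Normal(0, sd = sqrt(2))
d2_exact <- 2 / sqrt(pi)        # E|X1 - X2|
d3_exact <- sqrt(2 - 4 / pi)    # sqrt(E[(X1 - X2)^2] - d2^2), with E[(X1 - X2)^2] = 2

round(c(d2 = d2_exact,
        d3 = d3_exact,
        D3 = max(0, 1 - 3 * d3_exact / d2_exact),
        D4 = 1 + 3 * d3_exact / d2_exact), 4)
```

The simulated estimates should agree with these to two or three decimal places, which gives a feel for the Monte Carlo error at one million replicates.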

Summary

In this post, we used R to estimate the control chart constants needed to produce X-Individuals, X-Bar, and R-Bar charts. All the constants are shown together below, along with the constants for n = 7.

```r
reps <- 1E6
set.seed(5555)
# Mean and standard deviation of the simulated subgroup ranges for subgroup size x
FUN_d2 <- function(x) {mean(replicate(reps, abs(diff(range(rnorm(x))))))}
FUN_d3 <- function(x) {sd(replicate(reps, abs(diff(range(rnorm(x))))))}
 
Ns <- c(2,7)
d2 <- sapply(Ns, FUN_d2)
d3 <- sapply(Ns, FUN_d3)
 
round(data.frame(
  N = Ns,
  d2 = d2,  
  E2 = 3/d2, 
  A2 = 3/(d2*sqrt(Ns)),
  d3 = d3,
  D3 = ifelse(1 - 3*d3/d2 < 0, 0, 1 - 3*d3/d2), 
  D4 = 1 + 3*d3/d2  
  ), digits = 3)
```
 
```
##   N    d2    E2    A2    d3    D3    D4
## 1 2 1.128 2.659 1.881 0.854 0.000 3.272
## 2 7 2.704 1.109 0.419 0.833 0.076 1.924
```
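
The simulated values can also be cross-checked by numerical integration of the range distribution. The sketch below assumes that `ptukey()` with `df = Inf` gives the distribution of the range of n standard normals, and uses the identities E[W] = ∫ P(W > w) dw and E[W²] = ∫ 2w P(W > w) dw for a non-negative variable W; the function name `exact_constants` is hypothetical.

```r
# Sketch: d2 and d3 by numerical integration instead of simulation
exact_constants <- function(n) {
  # Upper tail of the range of n standard normals
  upper_tail <- function(w) ptukey(w, nmeans = n, df = Inf, lower.tail = FALSE)
  # E[W] = integral of P(W > w) over w >= 0
  d2 <- integrate(upper_tail, 0, Inf)$value
  # E[W^2] = integral of 2 * w * P(W > w) over w >= 0
  ew2 <- integrate(function(w) 2 * w * upper_tail(w), 0, Inf)$value
  c(N = n, d2 = d2, d3 = sqrt(ew2 - d2^2))
}

round(t(sapply(c(2, 7), exact_constants)), 3)
```

This runs in a fraction of a second and is not subject to simulation noise, at the cost of hiding the mechanics that the simulation makes visible.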

3 Comments

Benjamin

October 17, 2018 at 1:02 pm

These can be obtained directly from the studentized range distribution (`?ptukey`). Barbosa et al. use this in the IQCC package. Their function may be vectorized as

```r
d2 <- function(n) {
  n[n == 1] <- 2

  fn <- quote(function(w) {ptukey(w, n, Inf, lower.tail = FALSE)})

  vapply(n,
         function(n) {
           stats::integrate(eval(fn), 0, Inf)[[1]]
         },
         numeric(1))
}
```
- Kenith Grey
  
  October 17, 2018 at 3:08 pm
  
  Absolutely! Using ptukey is way faster and more accurate than simulating the result from the normal distribution. The goal with this post was to show a simple conceptual approach to estimating those constants.