## Infamous Inf – Part II

R’s Inf keyword – Have you’ve ever wondered what to do with it? If so, this is the second in series of posts that explore how we can exploit the keyword’s interesting properties to get the answers we need and improve code robustness. If you want to catch up on the first post where we look at Inf and the cut() function, please see Infamous Inf – Part I

For those unfamiliar with R’s Inf keyword, it is defined as a positive or negative number divided by zero yielding positive or negative infinity, respectively.

c(plus_inf = 1/0, minus_inf = -1/0)

# plus_inf minus_inf
#      Inf      -Inf

Sounds very theoretical. So how we can make practical use of infinity in R? In this first post, we’ll be discussing how Inf can make binning data with cut() a more robust process.

## Inf with integrate()

For many of us, Integration ranks low in the fun category, but sometimes its necessary. For example, to get the probability of a normally distributed event between two limits you would need to integrate the normal distribution density function between limits a and b. The functions are detailed below along with their R function equivalent.

$f(x |\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad \mathrm{Density\ Function\ (aka.\ dnorm)} \\ P(x |\mu, \sigma^2) = \int\limits_a^b \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad \mathrm{Proability\ Function \space (aka.\ pnorm)}$

The total area under the Density Function should be 1. So if we integrate dnorm(x) from negative to positive infinity we should get 1. Let’s see

LongDF <- function(x, mu = 0 , sigma = 1){
1/sqrt(2*pi*sigma^2) * exp(-(x-mu)^2/(2*sigma^2))
}

#dnorm and the long function are the same...
#dnorm(0) == LongDF(0)
#TRUE

integrate(dnorm, -Inf,  Inf)

The result:

1 with absolute error < 9.4e-05

As expected the answer is 1. The same result you’d get if you ran.

pnorm(q = Inf, mean = 0, sd = 1)

Which also yields, 1

## Infamous Inf – Part I

R’s Inf keyword – Have you’ve ever wondered what to do with it? If so, this is the first in series of posts that explore how we can exploit the keyword’s interesting properties to get the answers we need and improve code robustness.

For those unfamiliar with R’s Inf keyword, it is defined as a positive or negative number divided by zero yielding positive or negative infinity, respectively.

c(plus_inf = 1/0, minus_inf = -1/0)

# plus_inf minus_inf
#      Inf      -Inf

Sounds very theoretical. So how we can make practical use of infinity in R? In this first post, we’ll be discussing how Inf can make binning data with cut() a more robust process.

## Inf with cut()

Suppose you want to bin the following set of numbers into five discrete levels. Take note of the extreme values on the positive and negative end of the vector.

{-100,-1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000}

You might do something like,

numbers <- c(-100, -1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000)
bins <- cut(numbers, breaks = c(-101,-2,2,5,11, 1001))
xtabs(~bins)


bins Freq
(-101,-2] 1
(-2,2] 7
(2,5] 3
(5,11] 3
(11,1e+03] 2

The above code and data explicitly shows how we might bin the data including the extreme values -100, 100, and 1000. But with real data, extreme values can change when new data is added. For example, what if we received the following new pieces of data: -102 and 1002? Below, we’ve added them to our list and re-binned them as before to see what happens.

numbers <- c(-102, -100, -1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000,1002)
bins <- cut(numbers, breaks = c(-101,-2,2,5,11, 1001))
xtabs(~bins)

bins Freq
(-101,-2] 1
(-2,2] 7
(2,5] 3
(5,11] 3
(11,1e+03] 2

Oops! The new data was not included in the binning. We need to manually expand our limits, or we could perhaps do something clever with the max and min functions.

Alternatively, we can solve this problem with R’s $\pm$ Inf keyword by placing $\pm$ Inf in the breaks argument of the cut function (see line 2 below)

numbers <- c(-102, -100, -1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000,1002)
bins <- cut(numbers, breaks = c(-Inf,-2,2,5,11, Inf))
xtabs(~bins)

bins Freq
(-Inf,-2] 2
(-2,2] 7
(2,5] 3
(5,11] 3
(11, Inf] 3

Now, when new data is collected that is less than -2 or greater than 11 there will be a bin to catch it, no mater how big or how small.

## Control Charts with ggQC: XmR

Preparing an XmR plot is common when dealing with processes where a single product/item is made or measured and there is a significant time gap between the next production or observation. XmR plots can also be useful when dealing with outputs from a batch process rather than a continuous one.

In this post, we will show how to make quick QC XmR plots with the ggQC package available on cran or github.

cran: install.package("ggQC")

To get us started, let’s simulate some data on the diameter of a golden egg produced monthly by a golden goose.

set.seed(5555)
Golden_Egg_df <- data.frame(month=1:12,
egg_diameter = rnorm(n = 12, mean = 1.5, sd = 0.2)
)


For fun, lets make the egg on the third month intentionally larger than “normal”.

Golden_Egg_df$egg_diameter[3] <- 2.5  ## Simple XmR Plot Great! Let’s make the plot: Using ggQC, plots are made in the standard ggplot way, except we use a new stat function stat_QC(). library(ggplot2) library(ggQC) XmR_Plot <- ggplot(Golden_Egg_df, aes(x = month, y = egg_diameter)) + geom_point() + geom_line() + stat_QC(method = "XmR") XmR_Plot  As expected, the 3rd egg is not typical for our golden goose, falling slightly outside the upper control limit. Wouldn’t it be nice if it produced more large eggs like this. Perhaps we should go back and see if the feed was different during the month. You take the control chart to your boss. He too asks about the feed stock during the third month, but he also wants to know what the control limits are on the plot. To display the control limits, you use the stat_QC_labels function as shown below. ## Labeled XmR Plot XmR_Plot + stat_QC_labels(method="XmR")  ## mR Plot You return back to your boss. He is pleased with the plot, but now wishes to see a plot illustrating egg-to-egg consistency. For this, you will need a Moving Range (mR) plot. Using ggQC, we will call the stat_mR function to render the necessary data. **Note** it is normal for stat_mR() to produce a warning, because the first point becomes an NA. mR_Plot <- ggplot(Golden_Egg_df, aes(x = month, y = egg_diameter)) + stat_mR() + stat_QC_labels(method="mR") mR_Plot  ## Faceting The farmer is pleased. The consistency looks good with the exception of the third month. Moving on, he mentions that there is a town fair coming up and there will be eggs there produced by geese from two other towns (Trentown and Maxhall). He gives you last year’s monthly data for each goose and wishes to compare how his own goose’s performed using control charts. Below, we put the data provided in data frames and bind them together into one data frame. (data is simulated) set.seed(6666) OurGoose <- data.frame(month=1:12, egg_diameter = rnorm(n = 12, mean = 1.5, sd = 0.2), group = "OurGoose" ) TrentTown <- data.frame(month=1:12, egg_diameter = rnorm(n = 12, mean = 2.5, sd = 0.6), group = "TrentTown" ) Maxhall <- data.frame(month=1:12, egg_diameter = rnorm(n = 12, mean = .5, sd = 0.1), group = "Maxhall" ) GooseData <- rbind(OurGoose, TrentTown, Maxhall)  Knowing the farmer will want to look at the diameter of the eggs and the consistency of production, you anticipate his request with the following faceted plot. library(gridExtra) XmR_Town <- ggplot(GooseData, aes(x = month, y = egg_diameter)) + geom_point() + geom_line() + stat_QC(method = "XmR") + stat_QC_labels(method="XmR") + facet_grid(.~group) + xlim(0,14) mR_Town <- ggplot(GooseData, aes(x = month, y = egg_diameter)) + stat_mR(method = "mR") + stat_QC_labels(method="mR") + facet_grid(.~group) + ylab("mR") + xlim(0,14) grid.arrange(XmR_Town, mR_Town, nrow = 2)  ## Summarizing Result in a Table Farmer is again pleased, and makes a final request that you summarize the mean and control limits for each goose in a table that he can print and put in his wallet for the fair. This can be done quickly using ddply and the QC_Lines function. library(plyr) ddply(GooseData, .variables = "group", .fun = function(df) {QC_Lines(data = df$egg_diameter, method = "XmR")}
)



group xBar_one_LCL mean xBar_one_UCL mR_LCL mR mR_UCL
OurGoose 1.067510 1.4617341 1.8559579 0 0.1482044 0.4843321
TrentTown 1.291068 2.5171125 3.7431568 0 0.4609189 1.5062830
Maxhall 0.104316 0.4570895 0.8098629 0 0.1326216 0.4334074

## Control Charts with ggQC: XbarR

XbarR charts are useful when monitoring a continuous process over time and your taking multiple samples in a given period. Some examples might include,

• the first, middle, and last parts coming off an assembly line,
• subgroups of molded parts produced several at a time over several cycles,
• batch uniformity of continuously produced chemical / material.

In this post, we will show how to make quick QC XbarR plots with the ggQC package available on cran or github.

 cran: install.package("ggQC")

Generating an Xbar or XbarR plot with ggQC is simple. To get us started, let’s simulate some production line data on candles. The candles are shaped using a mold capable of producing 4 units a cycle. Each cycle takes an hour. Thus in a 24 hour period, the process would yield 96 candles. The parameter being tracked is candle width.

## Simulate Some Data

The code block below, generates the candle width data we are going to use to work with ggQC. Here we are generating data for the 4 mold cells. Three of the mold cells are performing normally, one of the mold cells is not.

set.seed(5555)
candle_df1t3 <- data.frame(
Cycle = as.factor(rep(1:24, each=3)),
candle_width = rnorm(n = 3*24, mean = 10, sd = 1),
mold_cell = as.ordered(rep(1:3))
)

candle_df4 <- data.frame(
Cycle = as.factor(rep(1:24, each=1)),
candle_width = rnorm(n = 1*24, mean = 11, sd = 2),
mold_cell = as.ordered(rep(4, each=24))
)

candle_df <- rbind(candle_df1t3, candle_df4)


## Simple XbarR Plot

Making a plot with ggQC and ggplot is simple

NOTEs:

Remember to set the group aesthetic equal to 1. Otherwise you will end up with far more control lines than you want.

XbarR is the default method for stat_QC and stat_QC_labels functions.

library(ggplot2)
library(ggQC)

XbarR <- ggplot(candle_df, aes(x = Cycle, y = candle_width, group = 1)) +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.y = mean, geom = "line") +
stat_QC()

XbarR


Data looks to be in good control, but it would be nice to have the center line and control limits labeled before presenting it to the line manager.

## Labeled XbarR Plot

XbarR + stat_QC_labels()


Your line manager is happy to see the candles are being produced as intended, but would like to get a sense of the process consistency. For this you will need an R-Bar chart.

## R Bar Chart

R_Bar <- ggplot(candle_df, aes(x = Cycle, y = candle_width, group = 1)) +
stat_summary(fun.y = QCrange, geom = "point") +
stat_summary(fun.y = QCrange, geom = "line") +
stat_QC(method="rBar") +
stat_QC_labels(method="rBar") + ylab("R-Bar")

R_Bar


The second run for the day was more inconsistent than usual, and the line manager lifts an eye brow. He has just received a customer complaint that some of the candles are too wide to fit in their candle holders. He asks if you can show the individuals on the plot and the natural control limits.

## XbarR with Individuals

XbarR <- ggplot(candle_df, aes(x = Cycle, y = candle_width, group = 1)) +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.y = mean, geom = "line") +
stat_QC() + stat_QC_labels() +
# Show Individuals
geom_point(alpha= 1/5) +
stat_QC(n=1, color.qc_limits = "orange") +
stat_QC_labels(n=1, color.qc_limits = "orange")

XbarR


Orange lines show the 3sigma control limits for the individual samples. The line manager is surprised to see so many individuals’ widths are over 13 units, and wants to know how a candle width of nearly 16 could occur. Leaving the conversation you are asked to examine the data as a function of the different cells in the mold.

## Colorizing the Data

XbarR <- ggplot(candle_df, aes(x = Cycle, y = candle_width, group = 1, color=mold_cell)) +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.y = mean, geom = "line") +
stat_QC() + stat_QC_labels() +
# Show Individuals
geom_point(alpha= 1/2) +
stat_QC(n=1, color.qc_limits = "orange") +
stat_QC_labels(n=1, color.qc_limits = "orange")

XbarR



Mold Cell 4 (purple in the plot) looks a little suspicious. So you plot an XmR chart for each cell.

## Faceting

XmR <- ggplot(candle_df,
aes(x = Cycle, y = candle_width, group = 1, color = mold_cell)) +
geom_point() + geom_line() +
stat_QC(method="XmR") + stat_QC_labels(method="XmR") +
facet_grid(.~mold_cell)

XmR


Nice work! Looks like we need to replace the mold.