[StATS]: When is a control chart not a control chart? (created 20070206).
I found a pair of data sets on the web that represent counts and where one goal of the data collection is to see if any of the individual counts differ from the overall average. They look quite similar and you might be tempted to analyze both of them using a control chart. But the second example is different in subtle
 but important ways and it is better analyzed using an approach called Analysis of Means (ANOM).
The first example comes from the NIST/SEMATECH eHandbook of Statistical Methods. This data set comes from a semiconductor manufacturing plant. The plant produces wafers
 each of which contains a hundred chips. These wafers will have defects on them and we want to monitor the rate at which defects occur to see if there is a shift for the better or for the worse in manufacturing quality.
The average rate of defects is 16. Did any wafers produce defects above the average of 16? You could take this literally and point out that the 3rd
 6th
 10th
 etc. chips were above the average of 16. But you only want to identify wafers that are above average after accounting for the uncertainty inherent in this process. A control chart (in this particular case a C chart) works well for this data.
The control limits are 8 and 32. Look for a single point outside the control limits (wafer #24) or eight consecutive points on the same side of the center line (no such examples in this chart). Some people will divide the control charts into three zones and use rules like 2 out of 3 consecutive points in Zone A
 or 4 out of five consecutive points in Zone B.
The Fraser Institute produces an Ontario hospital report card on a variety of measures. One measure if the volume of procedures performed at various hospitals. There is some evidence that hospitals that perform a large volume of procedures have better outcomes (an application
 perhaps
 of the old saying “practice makes perfect”). Here is the data on the number of carotid endarterectomy procedures performed in 20042005. Some material in the middle of the web page was removed to shrink the image size and to delete some data that is extraneous to this particular discussion.
The average hospital performs 52.5 procedures. Which hospitals perform more than the overall average? Your first instinct might be to use a control chart again
 and while this is not a terrible first step
 it does raise some issues. Most importantly
 a control chart is
 ore or less
 a run chart with
 a plot of the data in time sequence. There is no time sequence for this data set. Each hospital started counting procedures in January 2004 and stopped in December 2005. You could place the data in alphabetical order
 but that is a rather arbitrary choice, and then what would you do with the hospitals that are unidentified? If a hospital changed it’s name
 would you redraw the graph?
There’s another difference and this is much more subtle. A control chart represents a continuous monitoring of a work process. Although there are some practical limits on the number of points in a control chart
 there is no obvious upper bound that you can readily identify. You just keep on monitoring the quality of wafers until the factory closes or until it switches to a different product.
Because a control chart has no obvious upper limit on its length
 its working characteristics are defined in terms of average run lengths. The average run length is the number of data points that you would have to place on the chart on average
 before a point is declared to be out of control. You want the average run length to be large (in the hundreds) if the process actually is in control and to be small if the process is actually out of control. The control limits are defined to optimize the average run length.
With the hospital example
 the size of the chart is limited to the number of hospitals in Ontario (actually
 the number in Ontario that actually perform carotid endarterectomy). An average run length statistic is meaningless here because the number of hospitals is fixed. As a results the control limits defined in terms of average run length may be too conservative or too liberal.
Instead
 you should use a criteria that controls the probability that one or more hospitals will be identified as performing above or below the overall average. The general procedure for comparing a fixed number of groups to see if one group deviates significantly from the overall average is called analysis of means (ANOM).
The ANOM model can be used for continuous outcomes
 though it relies on an assumption of normality. It can also be used with proportions and rates as long as the sample sizes are sufficiently large that a normal approximation would hold.
The general formula for balanced data is
where I is the number of groups and N is the total sample size. If there are n observations in each group

then N=nI. The quantity NI represents the degrees of freedom for the pooled estimate of standard deviation. The table for h can be found in

Nelson et al

The Analysis of Means. The Analysis of Means. Peter R. Nelson

Peter S. Wludyka

Karen A.F. Copeland (2005) Philadelphia

PA: SIAM. [BookFinder4U link]](http://www.bookfinder4u.com/detail/089871592X.html)
You can also compute the critical value using a multivariate tdistribution. There are a few complications. First the distribution you are trying to describe represents deviations from an overall mean, so there will be correlations in the data since each group contributes to the overall mean. This correlation
 1/(I1)
 is the same for any pair of deviations. Second
 the sum of the deviations must equal zero, so this produces a degenerate distribution. This degeneracy means that there is no inverse for the correlation matrix.
In R
 there is a library
 mvtnorm
 that will allow you to compute the percentiles needed for ANOM. Here’s an example:
i < 25 co < matrix(1/(i1),nrow=i,ncol=i) diag(co) < rep(1,i) qmvt(p=0.95,tail="both.tails",corr=co,df=5000)
For count data
 you need to modify the formula slightly to
[Formula is misplaced. I will try to restore it soon.]
and there are further modifications if the data represents proportions or rates
 or if the sample size is unequal between groups. In this example
 I=25 (three of the data values are redundant). The upper and lower limits are 31 and 74.
Seven hospitals have volume above the overall average and twelve have volume significantly below the overall average.
This page was written by Steve Simon while working at Children’s Mercy Hospital. Although I do not hold the copyright for this material
 I am reproducing it here as a service
 as it is no longer available on the Children’s Mercy Hospital website. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at Category: Analysis of means
 or Category: Control charts.
charts](../category/ControlCharts.html). means](../category/AnalysisOfMeans.html)
 or [Category: Control for pages similar to this one at [Category: Analysis of with general help resources. You can also browse Children’s Mercy Hospital website. Need more information? I have a page reproducing it here as a service
 as it is no longer available on the Hospital. Although I do not hold the copyright for this material
 I am This page was written by Steve Simon while working at Children’s Mercy