P.Mean: Twelve subjects in a within subject design (created 2011-07-30).

News: Sign up for "The Monthly Mean," the newsletter that dares to call itself average, www.pmean.com/news.

How valid is a study which uses a sample size of 12 with a within-subject design? Or how well can we extrapolate the conclusions to the population studied? I have heard that you need at least 30 per group to get a normal distribution.

A within-subject design has many of the same strengths as a crossover study, and often you can draw strong conclusions, even from a small sample size. That being said, I would still need to know more about the design before I could state anything conclusively.

On the question of normality, I have read that even sample sizes approaching 10 can get you close to a normal distribution. The cut-off of 30 is a total myth. Before computerized tables were available, books would publish tables for the t-distribution, and they would stop, more often than not, at 30 degrees of freedom. This led to a commonly held belief that 30 or less is a small sample size. How rapidly an average approaches a normal distribution depends on a lot of factors. An average of ten uniformly distributed random variables is very close to normal, but you can find other distributions where you need the average of a hundred or even a thousand values to get close to a normal distribution.

But lack of normality is almost totally irrelevant. Lack of normality might distort the Type I error rate a bit, and your confidence intervals might be 92% intervals rather than 95% intervals, but this is the least of your worries. So what should you worry about with small sample sizes? I'd worry about three things:

1. inadequate precision
2. chance imbalances during randomization
3. lack of replicability

Others have tackled replicability, which depends largely on the heterogeneity of the population, so I don't need to cover this.

1. Inadequate precision. I tell this story all the time to highlight problems with inadequate precision. A researcher gets a six-year, ten-million-dollar research grant and writes up a report at the end of the study which says "This is an exciting advance in surgical techniques, and we are 95% confident that the cure rate is between 3% and 98%." That is a terrible waste of money, of course. So look at your confidence intervals. Are they wide enough to drive a truck through? Then the study is worthless. But I bet that with the right within-subject design, you could get nice tight confidence intervals.

2. Chance imbalances during randomization. Randomization relies on the law of large numbers.
But just as a flip of twelve coins will not always lead to exactly six heads and six tails, a randomization of twelve subjects might lead to serious imbalances in covariates. But this is a within-subject design. It is impossible to have covariate imbalance in a within-subject design, because each subject serves as their own control. So there's nothing to worry about here. In a parallel-group design, chance imbalances are fairly common in a total sample size of 10 or less, but fairly rare in a total sample size of 40 or more. Your sample size of 12 would be questionable, except for the fact that you are using a within-subject design.

3. Lack of replicability. Complex statistical models replicate poorly with small sample sizes. This is pretty easy to demonstrate. Get a huge database and fit a complex model to twelve randomly selected records in that database. Then fit the same complex model to a different set of twelve randomly selected records. Do the two analyses agree, more or less? It turns out that you should have 10-15 observations for every parameter that your model is estimating. So if you have a two-group comparison, and you are adjusting for three different risk factors, you have four parameters in your model (some people would quibble and say you have five parameters, but let's not fuss). This model would replicate well if you had more than 60 (4*15) subjects in your data set. I'm not sure how complex your within-subject design is, but anything more than a single parameter is problematic in a data set with 12 total subjects.

> The study I am talking about measures muscle fiber size using a
> biopsy and also assesses strength measurements.

So a within-subject design would imply multiple biopsies? Ouch!

> In the exercise field, we still use P-values so they haven't reported
> the confidence intervals.

Write a letter to one of your "A" journals the next time that a big paper comes out and tell them that the results are uninterpretable without confidence intervals.
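A quick simulation (my own sketch, not part of the original exchange) backs up the claim above that an average of ten uniform random variables is already close to normal:

```python
# Simulate 100,000 averages of ten Uniform(0,1) draws and check that they
# behave like a normal distribution. Standard library only.
import random
import statistics

random.seed(1)

samples = [statistics.mean(random.random() for _ in range(10))
           for _ in range(100_000)]

m = statistics.mean(samples)
s = statistics.stdev(samples)

# Theory: mean 0.5 and standard deviation sqrt((1/12)/10), about 0.091.
print(round(m, 3), round(s, 3))

# A normal distribution puts roughly 68% of its mass within one standard
# deviation of the mean and roughly 95% within two.
within_1 = sum(abs(x - m) <= s for x in samples) / len(samples)
within_2 = sum(abs(x - m) <= 2 * s for x in samples) / len(samples)
print(round(within_1, 2), round(within_2, 2))
```

The simulated coverage fractions come out very close to the normal-distribution values, which is the point: at n = 10 the averaging has already done most of its work.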
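To see how wide a confidence interval can be with 12 subjects, here is a sketch (my illustration with made-up numbers, not a calculation from the article) using the standard Wilson score interval for a binomial proportion:

```python
# Wilson score 95% confidence interval for a proportion. With a
# hypothetical 5 successes out of n = 12, the interval spans almost
# fifty percentage points: wide enough to drive a truck through.
import math

def wilson_interval(successes, n, z=1.96):
    p = successes / n
    center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
            / (1 + z**2 / n))
    return center - half, center + half

lo, hi = wilson_interval(5, 12)
print(round(lo, 3), round(hi, 3))  # roughly (0.19, 0.68)
```

This is for a single proportion in a parallel-group setting; a paired within-subject analysis of the same 12 subjects can give a much narrower interval, which is the point of the answer above.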
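The contrast between 12 and 40 subjects in a parallel-group design is easy to simulate. This sketch (again mine, with an arbitrary 50% covariate prevalence and a 30-percentage-point gap counted as "serious") shows how much more often serious imbalances occur at n = 12:

```python
# Randomize n subjects 1:1 to two arms and measure how often a binary
# covariate ends up badly imbalanced between the arms. This applies to
# parallel-group designs only; a within-subject design avoids the issue
# entirely because each subject serves as their own control.
import random

random.seed(1)

def serious_imbalance_rate(n, trials=10_000, gap=0.30):
    count = 0
    for _ in range(trials):
        covariate = [random.random() < 0.5 for _ in range(n)]
        arms = [0] * (n // 2) + [1] * (n - n // 2)
        random.shuffle(arms)
        p0 = sum(c for c, a in zip(covariate, arms) if a == 0) / (n // 2)
        p1 = sum(c for c, a in zip(covariate, arms) if a == 1) / (n - n // 2)
        if abs(p0 - p1) >= gap:
            count += 1
    return count / trials

rate_12 = serious_imbalance_rate(12)
rate_40 = serious_imbalance_rate(40)
print(rate_12, rate_40)
```

With these (arbitrary) settings, serious imbalances show up in a large fraction of the n = 12 randomizations but only rarely at n = 40, matching the "common at 10 or less, rare at 40 or more" rule of thumb above.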
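The "fit the same model to two random subsets" demonstration is also easy to mimic in code. This is my sketch with a synthetic database and the simplest possible one-parameter model, a single slope; even this is noticeably less stable when fit to 12 records than to 120:

```python
# Fit a slope to many random subsets of a synthetic database and compare
# how much the estimates bounce around at n = 12 versus n = 120. A real
# "complex model" with several parameters would be even less stable.
import random
import statistics

random.seed(1)

# Synthetic database of 5,000 records: y = 2x + noise.
database = []
for _ in range(5_000):
    x = random.gauss(0, 1)
    database.append((x, 2 * x + random.gauss(0, 2)))

def fitted_slope(records):
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

slopes_12 = [fitted_slope(random.sample(database, 12)) for _ in range(500)]
slopes_120 = [fitted_slope(random.sample(database, 120)) for _ in range(500)]

# The spread of the estimates shrinks roughly with the square root of n.
print(round(statistics.stdev(slopes_12), 2),
      round(statistics.stdev(slopes_120), 2))
```

Both sets of estimates center on the true slope of 2, but the n = 12 fits scatter far more widely, so two analysts each handed 12 random records could easily reach different conclusions.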
If you do write such a letter, I'll show you the right references to cite.

Steve Simon, net@pmean.com, Standard Disclaimer.

This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License.