StATS: Statistics for Boards (March 25, 2008).

I was asked to give a talk to the medical residents with the title "Statistics for Boards". Many health care professionals need to take boards or other certifying examinations, both during their training and afterwards, to certify or re-certify their skill in an area. These boards often ask some basic statistics questions. A common theme appears to be: which statistic should I use in which situation? The answer often depends on the type of the predictor variable and the type of the outcome variable. Either variable could be

• binary (two possible values),
• categorical (more than two possible values, but still a small number), or
• continuous (a large number of values, potentially any number inside a particular range).

In theory, the categorical type also includes binary variables, but I am deliberately separating the two.

Some people divide continuous variables into those that are normally distributed (their histogram follows a bell-shaped curve) and those that are non-normal. I dislike such distinctions for a variety of reasons, but they don’t ask me to write these exams. Normality is never an issue for the predictor variable, only for the outcome variable.

There are often variables that are difficult to place in this classification scheme, but don’t worry about these. The goal of the boards is not to trip people up with technical distinctions, but rather to see if you understand some fundamental distinctions among various statistical analysis methods.

Here are some examples of binary variables:

• exposure status (exposed or unexposed),
• sex (male or female), and
• drug (active or placebo).

Here are some examples of categorical variables:

• cancer stage (Stage I, II, III, or IV),
• race/ethnicity (white/black/Hispanic/other), and
• Likert scale (strongly disagree, disagree, neutral, agree, strongly agree).

Here are some examples of continuous variables:

• body mass index (any value between 15 and 50 is possible),
• patient's age in years (any value between 1 and 99), and
• length of stay (any value between 1 day and 1 year).

Here’s a simple description of the statistical methods that are typically applied. I want to provide some of the “buzzwords” that you are likely to encounter without providing an in-depth discussion of any particular method. These questions are usually multiple choice. I'll list the most commonly cited answer first, but include some variants that you might encounter.

Binary predictor and binary outcome. Chisquare test (also known as Chi-square, Chi Squared, etc.). For small sample sizes, some people will recommend a continuity correction or the use of Fisher’s Exact Test. In theory, you can use logistic regression here, but most exams will not be looking for or mentioning this option. It’s possible that the exam writers are looking for an odds ratio or a relative risk here. Don’t suggest a relative risk if the data come from a case-control design: the investigator fixes the numbers of cases and controls, so the risk of the outcome cannot be estimated, although the odds ratio still can.

Possible question: A study is examining demographic factors such as employment status (full/part-time work vs. unemployed) and educational level (high school diploma or better vs. no high school diploma) to see if they are associated with intestinal parasites (present or absent). What statistical test would you use?
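
The quantities named in the binary-by-binary answer above are simple enough to compute by hand. Here is a minimal sketch in Python, using only the standard library. The function names and the a, b, c, d cell layout (rows = exposure, columns = outcome) are my own illustrative choices, not from any package, and the counts are made up. For real work, a tested routine such as scipy.stats.chi2_contingency or scipy.stats.fisher_exact is the safer choice.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table

                 outcome yes   outcome no
      exposed         a             b
      unexposed       c             d

    Returns (statistic, p_value) with 1 degree of freedom.
    yates=True applies the continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: add up the hypergeometric probabilities
    of every table with the same margins that is no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that the top-left cell equals x, margins fixed
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    # Not meaningful for case-control data, where the case/control split is fixed by design.
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 20 of 30 exposed and 10 of 30 unexposed have the outcome.
stat, p = chi_square_2x2(20, 10, 10, 20)
print(round(stat, 3), round(p, 4))  # about 6.667 and 0.0098
print(odds_ratio(20, 10, 10, 20), relative_risk(20, 10, 10, 20))  # 4.0 2.0
```

Note how the odds ratio (4) and the relative risk (2) differ for the same table; they only agree closely when the outcome is rare.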

Categorical predictor and binary outcome. Chisquare test again. Fisher’s Exact Test is unlikely to be among the answer choices; technically, an extension to larger tables is available, but ignore that. Logistic regression is also a possibility.

Possible question: Dental students were asked which influences, such as "regular working hours", were very important in helping them choose a career in dentistry. Influences rated as very important were coded as 1, and influences rated only important or lower were coded as 0. What statistic would you use to examine the association between an influence and the race/ethnicity of the respondent?

Continuous predictor and binary outcome. Logistic regression. There are no other serious competitors here.

Possible question: A group of 110 elderly patients was followed over a two-year span to estimate the prevalence of falls and how it might be predicted by the patients' age. What statistical model would be appropriate here?
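
Logistic regression models the log odds of the binary outcome as a straight-line function of the continuous predictor, and statistical packages fit it by maximum likelihood. The sketch below does the same for a single predictor with Newton-Raphson iterations in plain Python. It assumes the two outcome groups overlap on the predictor (with complete separation the estimates run off to infinity); the function name and the tiny data set are hypothetical, for illustration only.

```python
import math


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood fit of log(p / (1 - p)) = b0 + b1 * x by Newton-Raphson.

    y holds 0/1 outcomes. Assumes the maximum likelihood estimate exists,
    i.e. the outcome groups overlap on x (no complete separation)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # fitted probability
            w = p * (1.0 - p)
            g0 += yi - p          # score (gradient) for the intercept
            g1 += (yi - p) * xi   # score (gradient) for the slope
            h00 += w              # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information matrix) times score vector
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: predictor (e.g. years of age above 70) and fall (1) or no fall (0).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(b0, b1, math.exp(b1))  # exp(b1) is the odds ratio for a one-unit increase in x
```

The exponentiated slope is the odds ratio per unit of the predictor, which is why logistic regression pairs naturally with the odds ratio mentioned for the binary-by-binary case.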

Binary predictor and categorical outcome. Chisquare test again. This type of question is less likely to appear. For categorical outcomes that are ordinal (the categories have a natural ranking), consider the answers under binary predictor and continuous but non-normal outcome.

Binary predictor and continuous outcome. T-test. If the data is unmatched, then specify a two sample t-test or an independent samples t-test. If the data is matched, then specify a paired t-test.
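
The difference between the two t-tests is easiest to see in the arithmetic. This is a minimal standard-library sketch with hypothetical function names and made-up numbers: the independent-samples version pools the two group variances, while the paired version reduces each pair to a single difference and runs a one-sample t-test on those differences. It stops at the t statistic and its degrees of freedom, because the t distribution is not in Python's standard library; scipy.stats.ttest_ind and scipy.stats.ttest_rel report p-values (and ttest_ind with equal_var=False gives Welch's version for unequal variances).

```python
import math
import statistics


def two_sample_t(x, y):
    """Independent-samples t statistic using the pooled variance.
    Returns (t, degrees_of_freedom)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x) +
                  (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the within-pair differences.
    Returns (t, degrees_of_freedom)."""
    d = [a - b for b, a in zip(before, after)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se, len(d) - 1


print(two_sample_t([1, 2, 3], [4, 5, 6]))    # t about -3.674 with 4 df
print(paired_t([10, 12, 14], [11, 14, 15]))  # t about 4 with 2 df
```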

Binary predictor and continuous but non-normal outcome. Mann-Whitney-Wilcoxon test. The name of this test has several variants, which list the names in a different order (Wilcoxon-Mann-Whitney) or ignore Dr. Wilcoxon's contribution altogether (Mann-Whitney U test); it is also called the Wilcoxon rank sum test. If the data is matched, then specify a Wilcoxon signed ranks test.
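
Both of these tests replace the raw values by ranks, which is why normality no longer matters. The sketch below, in plain Python with hypothetical function names and made-up data, computes the Mann-Whitney U statistics for two independent groups and the W+ and W- sums for matched pairs. Tied values get the average of the ranks they occupy, and zero differences are dropped from the signed rank test (Wilcoxon's original convention; software packages differ on this). P-values come from tables or a normal approximation, as in scipy.stats.mannwhitneyu and scipy.stats.wilcoxon.

```python
from bisect import bisect_left, bisect_right


def midranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    return [(bisect_left(s, v) + 1 + bisect_right(s, v)) / 2 for v in values]


def mann_whitney_u(x, y):
    """U statistics for two independent samples. u_x counts the (x, y) pairs
    in which the x value is larger, with ties counting one half."""
    ranks = midranks(list(x) + list(y))
    u_x = sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


def wilcoxon_signed_rank(before, after):
    """W+ and W- for matched pairs: rank the absolute differences, then sum the
    ranks of the positive and of the negative differences separately.
    Zero differences are dropped."""
    d = [a - b for b, a in zip(before, after) if a != b]
    ranks = midranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return w_plus, w_minus


print(mann_whitney_u([1, 2], [2, 3]))                            # (0.5, 3.5)
print(wilcoxon_signed_rank([10, 12, 14, 20], [11, 14, 15, 19]))  # (8.0, 2.0)
```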

Categorical predictor and binary outcome. Chisquare test. Logistic regression is also a solid choice here.

Categorical predictor and categorical outcome. Chi-square test. Some people will use the term contingency table analysis. In some situations, a specialized logistic regression model might work (ordinal logistic regression, multinomial logistic regression) but these choices are too technical to be on a board exam.
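
For tables larger than 2x2, the chi-square test compares each observed count with the count expected if the two variables were unrelated: row total times column total divided by the grand total. The statistic has (rows - 1) x (columns - 1) degrees of freedom. Here is a minimal standard-library sketch; the function name and the counts are hypothetical. A commonly cited rule of thumb is that the approximation is shaky when many expected counts fall below 5. For a p-value, scipy.stats.chi2_contingency does the whole calculation.

```python
def chi_square_table(table):
    """Pearson chi-square statistic for an r x c table of counts (a list of rows).
    Returns (statistic, degrees_of_freedom, expected_counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under no association = row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Hypothetical 3 x 2 table: three race/ethnicity groups by a yes/no response.
stat, df, expected = chi_square_table([[12, 18], [20, 10], [15, 15]])
print(stat, df)
```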

Categorical predictor and continuous outcome. Analysis of variance (ANOVA).
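
Analysis of variance splits the variation in the outcome into a between-group part and a within-group part; the F statistic is the ratio of the two mean squares. A minimal standard-library sketch follows, with a hypothetical function name and toy data. It returns F and its two degrees of freedom; the p-value would come from the F distribution (for example scipy.stats.f_oneway).

```python
import statistics


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with (between-group, within-group) degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

With only two groups, this F equals the square of the pooled two-sample t statistic, so ANOVA is the natural extension of the t-test to three or more groups.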

Categorical predictor and continuous but non-normal outcome. Kruskal-Wallis test. Sometimes this is called rank ANOVA or nonparametric ANOVA.
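
The Kruskal-Wallis test is the rank-based counterpart of ANOVA: pool all the values, rank them, and check whether the average rank differs across groups. A minimal standard-library sketch, with hypothetical names and toy data, is below. It includes the usual correction for tied values and assumes not every value is identical. For reasonably sized groups, H is compared with a chi-square distribution on k - 1 degrees of freedom (scipy.stats.kruskal reports that p-value).

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def midranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    return [(bisect_left(s, v) + 1 + bisect_right(s, v)) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (tie-corrected) and its degrees of freedom, k - 1.
    Assumes the pooled values are not all identical."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks belonging to this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1


print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # H about 7.2 with 2 df
```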

Continuous predictor and binary outcome. Logistic regression.

Continuous predictor and categorical outcome. This scenario is certainly possible, but will almost never appear on a board exam. For the record, you need a specialized logistic regression model, such as ordinal logistic or multinomial logistic regression.

Continuous predictor and continuous outcome. Linear regression is your best choice here. A correlation coefficient (Pearson correlation, product moment correlation) might also be a possibility.
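
Linear regression and the Pearson correlation are built from the same three sums of squares and cross-products, which is why either answer can be acceptable here. A minimal standard-library sketch, with a hypothetical function name and toy data:

```python
import math


def correlation_and_line(x, y):
    """Pearson correlation r and the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return r, slope, intercept


print(correlation_and_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0, 1.0)
```

The slope equals r times (SD of y / SD of x), so the two describe the same linear association: r is unit-free, while the slope is in outcome units per predictor unit. Python 3.10 and later also provide statistics.correlation and statistics.linear_regression for the same calculations.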

Continuous predictor and continuous but non-normal outcome. Spearman correlation coefficient. Another good choice is Kendall's correlation coefficient.
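
Spearman's coefficient is simply the Pearson correlation computed on the ranks, while Kendall's coefficient counts pairs of observations that move in the same direction (concordant) or opposite directions (discordant). The sketch below uses only the standard library; the function names and data are hypothetical. The Kendall version shown is tau-b, which adjusts for ties and is the default variant reported by scipy.stats.kendalltau; scipy.stats.spearmanr covers Spearman.

```python
import math
from bisect import bisect_left, bisect_right


def midranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    return [(bisect_left(s, v) + 1 + bisect_right(s, v)) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant pairs), adjusted for ties."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            concordant += s > 0
            discordant += s < 0
            tied_x += x[i] == x[j]
            tied_y += y[i] == y[j]
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((pairs - tied_x) * (pairs - tied_y))


# A monotone but curved relationship: both rank correlations equal 1.
x, y = [1, 2, 3, 4, 5], [1, 4, 9, 16, 25]
print(spearman_rho(x, y), kendall_tau_b(x, y))  # 1.0 1.0
```

Because only the ordering matters, a skewed outcome or a few extreme values have far less influence on these coefficients than on the Pearson correlation.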

Other areas of statistics that a board exam might cover are:

• definitions of sensitivity and specificity,
• interpretation of confidence intervals and p-values,
• different epidemiological designs (e.g., case-control design, cohort design).
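
The first two items on this list often come up together: sensitivity and specificity are proportions, and a confidence interval shows how precisely a study estimates them. This is a minimal standard-library sketch with hypothetical function names and made-up counts. The interval is the simple Wald approximation (estimate plus or minus z times the standard error); Wilson or exact intervals are usually preferred when n is small or the proportion is near 0 or 1.

```python
import math
from statistics import NormalDist


def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN): share of diseased patients who test positive.
    Specificity = TN / (TN + FP): share of non-diseased patients who test negative."""
    return tp / (tp + fn), tn / (tn + fp)


def wald_ci(p_hat, n, level=0.95):
    """Approximate (Wald) confidence interval for a proportion: p_hat +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical screening study: 100 patients with the disease, 100 without.
sens, spec = sensitivity_specificity(tp=90, fp=20, fn=10, tn=80)
print(sens, spec)            # 0.9 0.8
print(wald_ci(sens, n=100))  # roughly (0.841, 0.959)
```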

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.