What do all these numbers mean? Odds ratios, relative risks, and number needed to treat.

Abstract: This one-hour training class will teach you some of the numbers used in studies where the outcome has only two possible values (e.g., dead/alive). The odds ratio and the relative risk are both measures of risk used for binary outcomes, but sometimes they can differ markedly from one another. The relative risk offers a more natural interpretation, but certain research designs preclude its computation. Another measure of risk, the number needed to treat, provides comparisons on an absolute rather than a relative scale and allows you to assess the trade-offs between benefits and harms.

Objectives: In this class you will learn how to:

• compute an odds ratio and a relative risk from a two by two table;
• list the types of research designs where the relative risk should not be computed; and
• make clinical judgments about the benefits and harms of a therapy using the number needed to treat/harm.

Handout: The final handout is not available yet. A draft handout will be available very soon.

Outline:

• Practice exercises
• Odds ratio versus relative risk
• Number needed to treat

Practice Exercises

1. Read the following abstract. The authors report an adjusted odds ratio of 5.0 for low socioeconomic index. Compute a crude odds ratio using the data that appears in the abstract. Does it differ much from the adjusted odds ratio? Interpret the adjusted odds ratio and its associated confidence interval.

Socioeconomic disparities in intimate partner violence against Native American women: a cross-sectional study. Malcoe LH, Duran BM, Montgomery JM. BMC Med 2004: 2(1); 20.

BACKGROUND:
Intimate partner violence (IPV) against women is a global public health problem, yet data on IPV against Native American women are extremely limited. We conducted a cross-sectional study of Native American women to determine prevalence of lifetime and past-year IPV and partner injury; examine IPV in relation to pregnancy; and assess demographic and socioeconomic correlates of past-year IPV.
METHODS:
Participants were recruited from a tribally-operated clinic serving low-income pregnant and childbearing women in southwest Oklahoma. A self-administered survey was completed by 312 Native American women (96% response rate) attending the clinic from June through August 1997. Lifetime and past-year IPV were measured using modified 18-item Conflict Tactics Scales. A socioeconomic index was created based on partner's education, public assistance receipt, and poverty level.
RESULTS:
More than half (58.7%) of participants reported lifetime physical and/or sexual IPV; 39.1% experienced severe physical IPV; 12.2% reported partner-forced sexual activity; and 40.1% reported lifetime partner-perpetrated injuries. A total of 273 women had a spouse or boyfriend during the previous 12 months (although all participants were Native American, 59.0% of partners were non-Native). Among these women, past-year prevalence was 30.1% for physical and/or sexual IPV; 15.8% for severe physical IPV; 3.3% for forced partner-perpetrated sexual activity; and 16.4% for intimate partner injury. Reported IPV prevalence during pregnancy was 9.3%. Pregnancy was not associated with past-year IPV (odds ratio = 0.9). Past-year IPV prevalence was 42.8% among women scoring low on the socioeconomic index, compared with 10.1% among the reference group. After adjusting for age, relationship status, and household size, low socioeconomic index remained strongly associated with past-year IPV (odds ratio = 5.0; 95% confidence interval: 2.4, 10.7).
CONCLUSIONS:
Native American women in our sample experienced exceptionally high rates of lifetime and past-year IPV. Additionally, within this low-income sample, there was strong evidence of socioeconomic variability in IPV. Further research should determine prevalence of IPV against Native American women from diverse tribes and regions, and examine pathways through which socioeconomic disadvantage may increase their IPV risk.

2. Read the following abstract. The crude odds ratios for fissured tongue and for benign migratory glossitis have been removed from this abstract. Calculate these values using the information provided in the abstract. Interpret these odds ratios and the associated confidence intervals.

Tongue lesions in psoriasis: a controlled study. Daneshpazhooh M, Moslehi H, Akhyani M, Etesami M. BMC Dermatol 2004: 4(1); 16.

BACKGROUND:
Our objective was to study tongue lesions and their significance in psoriatic patients.
METHODS:
The oral mucosa was examined in 200 psoriatic patients presenting to Razi Hospital in Tehran, Iran, and 200 matched controls.
RESULTS:
Fissured tongue (FT) and benign migratory glossitis (BMG) were the two most frequent findings. FT was seen more frequently in psoriatic patients (n = 66, 33%) than the control group (n = 19, 9.5%) [odds ratio (OR): [DELETED]; 95% confidence interval (CI): 2.61-8.52] (p-value < 0.0001). BMG, too, was significantly more frequent in psoriatic patients (28 cases, 14%) than the control group (12 cases, 6%) (OR: [DELETED]; 95% CI: 1.20-5.50) (p-value < 0.012). In 11 patients (5.5%), FT and BMG coexisted. FT was more frequent in pustular psoriasis (7 cases, 53.8%) than erythemato-squamous types (56 cases, 30.4%). On the other hand, the frequency of BMG increased with the severity of psoriasis in plaque-type psoriasis assessed by psoriasis area and severity index (PASI) score.
CONCLUSIONS:
Nonspecific tongue lesions are frequently observed in psoriasis. Further studies are recommended to substantiate the clinical significance of these seemingly nonspecific findings in suspected psoriatic cases.

3. Read the following abstract. The authors report an adjusted odds ratio of 0.19 for presence of contraindication. Compute a crude odds ratio using the data that appears in the abstract. Does it differ much from the adjusted odds ratio? Interpret the adjusted odds ratio and its associated confidence interval.

Breastfeeding practices in a cohort of inner-city women: the role of contraindications. England L, Brenner R, Bhaskar B, Simons-Morton B, Das A, Revenis M, Mehta N, Clemens J. BMC Public Health 2003: 3(1); 28.

BACKGROUND:
Little is known about the role of breastfeeding contraindications in breastfeeding practices. Our objectives were to 1) identify predictors of breastfeeding initiation and duration among a cohort of predominantly low-income, inner-city women, and 2) evaluate the contribution of breastfeeding contraindications to breastfeeding practices.
METHODS:
Mother-infant dyads were systematically selected from 3 District of Columbia hospitals between 1995 and 1996. Breastfeeding contraindications and potential predictors of breastfeeding practices were identified through medical record reviews and interviews conducted after delivery (baseline). Interviews were conducted at 3-7 months postpartum and again at 7-12 months postpartum to determine breastfeeding initiation rates and duration. Multivariable logistic regression analysis was used to identify baseline factors associated with initiation of breastfeeding. Cox proportional hazards models were generated to identify baseline factors associated with duration of breastfeeding.
RESULTS:
Of 393 study participants, 201 (51%) initiated breastfeeding. A total of 61 women (16%) had at least one documented contraindication to breastfeeding; 94% of these had a history of HIV infection and/or cocaine use. Of the 332 women with no documented contraindications, 58% initiated breastfeeding, vs. 13% of women with a contraindication. In adjusted analysis, factors most strongly associated with breastfeeding initiation were presence of a contraindication (adjusted odds ratio [AOR], 0.19; 95% confidence interval [CI], 0.08-0.47), and mother foreign-born (AOR, 4.90; 95% CI, 2.38-10.10). Twenty-five percent of study participants who did not initiate breastfeeding cited concern about passing dangerous things to their infants through breast milk. Factors associated with discontinuation of breastfeeding (all protective) included mother foreign-born (hazard ratio [HR], 0.55; 95% CI 0.39-0.77), increasing maternal age (HR for 5-year increments, 0.80; 95% CI, 0.69-0.92), and infant birth weight ≥ 2500 grams (HR, 0.45; 95% CI, 0.26-0.80).
CONCLUSIONS:
Breastfeeding initiation rates and duration were suboptimal in this inner-city population. Many women who did not breastfeed had contraindications and/or were concerned about passing dangerous things to their infants through breast milk. It is important to consider the prevalence of contraindications to breastfeeding when evaluating breastfeeding practices in high-risk communities.

4. Read the following abstract. The relative risk for cryotherapy has been removed. Calculate this value using the information provided in the abstract. Interpret this relative risk and the associated confidence interval.

Treatment of Retinopathy of Prematurity with topical ketorolac tromethamine: a preliminary study. Avila-Vazquez M, Maffrand R, Sosa M, Franco M, De Alvarez BV, Cafferata ML, Bergel E. BMC Pediatr 2004: 4(1); 15.

BACKGROUND:
Retinopathy of Prematurity (ROP) is a common retinal neovascular disorder of premature infants. It is of variable severity, usually heals with mild or no sequelae, but may progress to blindness from retinal detachments or severe retinal scar formation. This is a preliminary report of the effectiveness and safety of a new and original use of topical ketorolac in preterm newborn to prevent the progression of ROP to the more severe forms of this disease.
METHODS:
From January 2001 to December 2002, all fifty nine preterm newborns with birthweight less than 1250 grams or gestational age less than 30 weeks of gestational age admitted to neonatal intensive care were eligible for treatment with topical ketorolac (0.25 milligrams every 8 hours in each eye). The historical comparison group included all 53 preterm newborns, with the same inclusion criteria, admitted between January 1999 and December 2000.
RESULTS:
Groups were comparable in terms of weight distribution, Apgar score at 5 minutes, incidence of sepsis, intraventricular hemorrhage and necrotizing enterocolitis. The duration of oxygen therapy was significantly longer in the control group. In the ketorolac group, among 43 children that were alive at discharge, one (2.3%) developed threshold ROP and cryotherapy was necessary. In the comparison group 35 children survived, and six children (17%) needed cryotherapy (Relative Risk [DELETED], 95%CI 0.00 to 0.80, p = 0.041). Adjusting by duration of oxygen therapy did not significantly change these results. Adverse effects attributable to ketorolac were not detected.
CONCLUSIONS:
This preliminary report suggests that ketorolac in the form of an ophthalmic solution can reduce the risk of developing severe ROP in very preterm newborns, without producing significant adverse side effects. These results, although promising, should be interpreted with caution because of the weakness of the study design. This is an inexpensive and simple intervention that might ameliorate the progression of a disease with devastating consequences for children and their families. We believe that next logical step would be to assess the effectiveness of this intervention in a randomized controlled trial of adequate sample size.

5. Read the following abstract. The relative risks for reduced blood loss, shivering, and pyrexia have been removed. Calculate these values using the information provided in the abstract. Interpret these relative risks and their associated confidence intervals.

Misoprostol for treating postpartum haemorrhage: a randomized controlled trial [ISRCTN72263357]. Hofmeyr GJ, Ferreira S, Nikodem VC, Mangesi L, Singata M, Jafta Z, Maholwana B, Mlokoti Z, Walraven G, Gulmezoglu AM. BMC Pregnancy Childbirth 2004: 4(1); 16.

BACKGROUND:
Postpartum haemorrhage remains an important cause of maternal death despite treatment with conventional therapy. Uncontrolled studies and one randomised comparison with conventional oxytocics have reported dramatic effects with high-dose misoprostol, usually given rectally, for treatment of postpartum haemorrhage, but this has not been evaluated in a placebo-controlled trial.
METHODS:
The study was conducted at East London Hospital Complex, Tembisa and Chris Hani Baragwanath Hospitals, South Africa. Routine active management of the third stage of labour was practised. Women with more than usual postpartum bleeding thought to be related to inadequate uterine contraction were invited to participate, and to sign informed consent. All routine treatment was given from a special 'Postpartum Haemorrhage Trolley'. In addition, participants who consented were enrolled by drawing the next in a series of randomised treatment packs containing either misoprostol 5 × 200 µg or similar placebo, which were given 1 orally, 2 sublingually and 2 rectally.
RESULTS:
With misoprostol there was a trend to reduced blood loss ≥500 ml in 1 hour after enrolment measured in a flat plastic 'fracture bedpan', the primary outcome (6/117 vs 11/120, relative risk [DELETED]; 95% confidence interval 0.21 to 1.46). There was no difference in mean blood loss or haemoglobin level on day 1 after birth < 6 g/dl or blood transfusion. Side-effects were increased, namely shivering (63/116 vs 30/118; [DELETED], 1.50 to 3.04) and pyrexia > 38.5 °C (11/114 vs 2/118; [DELETED], 1.29 to 25). In the misoprostol group 3 women underwent hysterectomy of whom 1 died, and there were 2 further maternal deaths.
CONCLUSIONS:
Because of a lower than expected incidence of the primary outcome in the placebo group, the study was underpowered. We could not confirm the dramatic effect of misoprostol reported in several unblinded studies, but the results do not exclude a clinically important effect. Larger studies are needed to assess substantive outcomes and risks before misoprostol enters routine use.

6. Read the following abstract. The number needed to treat (NNT) for having at least 60% of attempts at sexual intercourse be successful, and the number needed to harm (NNH) for treatment-related adverse events, have been removed. Calculate these values using the information provided in the abstract. Interpret these values and their associated confidence intervals.

Sildenafil (Viagra) for male erectile dysfunction: a meta-analysis of clinical trial reports. Moore RA, Edwards JE, McQuay HJ. BMC Urol 2002: 2(1); 6.

BACKGROUND:
Evaluation of company clinical trial reports could provide information for meta-analysis at the commercial introduction of a new technology.
METHODS:
Clinical trial reports of sildenafil for erectile dysfunction from September 1997 were used for meta-analysis of randomised trials (at least four weeks duration) and using fixed or dose optimisation regimens. The main outcome sought was an erection, sufficiently rigid for penetration, followed by successful intercourse, and conducted at home.
RESULTS:
Ten randomised controlled trials fulfilled the inclusion criteria (2123 men given sildenafil and 1131 placebo). NNT or NNH were calculated for important efficacy, adverse event and discontinuation outcomes. Dose optimisation led to at least 60% of attempts at sexual intercourse being successful in 49% of men, compared with 11% with placebo; the NNT was [DELETED] (95% confidence interval 2.3 to 3.3). For global improvement in erections the NNT was 1.7 (1.6 to 1.9). Treatment-related adverse events occurred in 30% of men on dose optimised sildenafil compared with 11% on placebo; the NNH was [DELETED] (4.3 to 7.3). All cause discontinuations were less frequent with sildenafil (10%) than with placebo (20%). Sildenafil dose optimisation gave efficacy equivalent to the highest fixed doses, and adverse events equivalent to the lowest fixed doses.
CONCLUSION:
This review of clinical trial reports available at the time of licensing agreed with later reviews that had many more trials and patients. Making reports submitted for marketing approval available publicly would provide better information when it was most needed, and would improve evidence-based introduction of new technologies.

Odds ratio versus relative risk.

Dear Professor Mean:  There is some confusion about the use of the odds ratio versus the relative risk. Can you explain the difference between these two numbers?

Both the odds ratio and the relative risk compare the likelihood of an event between two groups. Consider the following data on survival of passengers on the Titanic. There were 462 female passengers: 308 survived and 154 died. There were 851 male passengers: 142 survived and 709 died (see table below).

          Alive   Dead   Total
Female      308    154     462
Male        142    709     851
Total       450    863   1,313

Clearly, a male passenger on the Titanic was more likely to die than a female passenger. But how much more likely? You can compute the odds ratio or the relative risk to answer this question.

The odds ratio compares the odds of death in each group. For females, the odds were exactly 2 to 1 against dying (154/308 = 0.5). For males, the odds were almost 5 to 1 in favor of death (709/142 = 4.993). The odds ratio is 9.986 (4.993/0.5): males had roughly ten times the odds of death that females had.

The relative risk (sometimes called the risk ratio) compares the probability of death in each group rather than the odds. For females, the probability of death is 33% (154/462 = 0.3333). For males, the probability is 83% (709/851 = 0.8331). The relative risk of death is 2.5 (0.8331/0.3333): males were 2.5 times as likely to die as females.
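Both calculations can be written in a few lines. This is a minimal Python sketch, assuming the 2x2 table is passed as four counts in the layout shown in the docstring (the function name and argument order are illustrative choices, not part of any library):

```python
def odds_ratio_and_relative_risk(a, b, c, d):
    """Compare group 1 with group 2 in a 2x2 table laid out as:

                 event   no event
        group 1    a        b
        group 2    c        d
    """
    odds_ratio = (a / b) / (c / d)                  # ratio of the two odds
    relative_risk = (a / (a + b)) / (c / (c + d))   # ratio of the two probabilities
    return odds_ratio, relative_risk


# Titanic: the event is death; group 1 = male passengers, group 2 = female passengers
or_death, rr_death = odds_ratio_and_relative_risk(709, 142, 154, 308)
print(f"odds ratio    = {or_death:.3f}")
print(f"relative risk = {rr_death:.3f}")
```

The relative risk prints as 2.499, which the text rounds to 2.5. Swapping the two groups (passing 154, 308, 709, 142) gives the reciprocals, about 0.100 and 0.400.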

That is quite a difference. Both measures show that men were more likely to die, but the odds ratio makes men look much worse off than the relative risk does. Which number is a fairer comparison?

There are three issues here. The relative risk measures events in a way that is interpretable and consistent with the way people really think. The relative risk, however, cannot be computed in some research designs. Also, the relative risk can sometimes lead to ambiguous and confusing situations. Before looking at these issues, we need to remember that fractions are funny.

Fractions are funny.

Suppose you invested money in a stock. On the first day, the value of the stock decreased by 20%. On the second day it increased by 20%. You would think that you have broken even, but that's not true.

Take the value of the stock and multiply by 0.8 to get the price after the first day. Then multiply by 1.2 to get the price after the second day. The successive multiplications do not cancel out because 0.8 * 1.2 = 0.96. A 20% decrease followed by a 20% increase leaves you slightly worse off.

It turns out that to counteract a 20% decrease, you need a 25% increase. That is because 0.8 and 1.25 are reciprocals. This is easier to see if you express them as simple fractions: 4/5 and 5/4 are reciprocal fractions. Listed below is a table of common reciprocal fractions.

Fraction      Reciprocal
0.80 (4/5)    1.25 (5/4)
0.75 (3/4)    1.33 (4/3)
0.67 (2/3)    1.50 (3/2)
0.50 (1/2)    2.00 (2/1)
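The stock example and the table of reciprocals can be checked with exact arithmetic. A minimal sketch using Python's standard `fractions` module, so that no floating-point rounding creeps in:

```python
from fractions import Fraction

# A 20% drop followed by a 20% gain: multiply the price by 4/5, then by 6/5.
net = Fraction(4, 5) * Fraction(6, 5)
print(net, float(net))  # 24/25 0.96, a 4% net loss

# The gain that exactly undoes a 20% drop is the reciprocal of 4/5.
undo = 1 / Fraction(4, 5)
print(undo, float(undo))  # 5/4 1.25, a 25% gain

# The reciprocal pairs from the table above
for f in [Fraction(4, 5), Fraction(3, 4), Fraction(2, 3), Fraction(1, 2)]:
    r = 1 / f
    print(f"{float(f):.2f} ({f.numerator}/{f.denominator})  <->  "
          f"{float(r):.2f} ({r.numerator}/{r.denominator})")
```

Each pair multiplies to exactly 1, which is why a ratio and its reciprocal describe the same comparison from opposite directions.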

Sometimes when we are comparing two groups, we'll put the first group in the numerator and the other in the denominator. Sometimes we will reverse ourselves and put the second group in the numerator. The numbers may look quite different (e.g., 0.67 and 1.5), but as long as you remember what the reciprocal fraction is, you shouldn't get too confused.

For example, we computed 2.5 as the relative risk in the example above. In this calculation we divided the male probability by the female probability. If we had divided the female probability by the male probability, we would have gotten a relative risk of 0.4. This is fine because 0.4 (2/5) and 2.5 (5/2) are reciprocal fractions.

Interpretability

The most commonly cited advantage of the relative risk over the odds ratio is that the relative risk has the more natural interpretation.

The relative risk comes closer to what most people think of when they compare the relative likelihood of events. Suppose there are two groups, one with a 25% chance of mortality and the other with a 50% chance of mortality. Most people would say that the latter group has it twice as bad. But the odds ratio is 3, which seems too big: the latter odds are even (1 to 1), while the former odds are 3 to 1 against.

Even more extreme examples are possible. A change from 25% to 75% mortality represents a relative risk of 3, but an odds ratio of 9.

A change from 10% to 90% mortality represents a relative risk of 9 but an odds ratio of 81.
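The three comparisons above can be reproduced with a short Python sketch. It uses exact fractions so the ratios come out as whole numbers; the helper name `odds` is an illustrative choice:

```python
from fractions import Fraction

def odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

# (lower mortality, higher mortality) pairs from the text, as exact fractions
scenarios = [
    (Fraction(1, 4), Fraction(1, 2)),
    (Fraction(1, 4), Fraction(3, 4)),
    (Fraction(1, 10), Fraction(9, 10)),
]

results = []
for low, high in scenarios:
    rr = high / low                      # relative risk
    odds_ratio = odds(high) / odds(low)  # odds ratio
    results.append((rr, odds_ratio))
    print(f"{float(low):.0%} -> {float(high):.0%}: RR = {rr}, OR = {odds_ratio}")
```

The printed relative risks are 2, 3 and 9, against odds ratios of 3, 9 and 81. The gap widens as the probabilities move away from zero, because odds grow without limit as a probability approaches 1.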

There are some additional issues about interpretability that are beyond the scope of this paper. In particular, both the odds ratio and the relative risk are computed by division and are relative measures. In contrast, absolute measures, computed as a difference rather than a ratio, produce estimates with quite different interpretations (Fahey et al 1995, Naylor et al 1992).

Designs that rule out the use of the relative risk

Some research designs, particularly the case-control design, prevent you from computing a relative risk. A case-control design involves the selection of research subjects on the basis of the outcome measurement rather than on the basis of the exposure.

Consider a case-control study of prostate cancer risk and male pattern balding. The goal of this research was to examine whether men with certain hair patterns were at greater risk of prostate cancer. In that study, roughly equal numbers of prostate cancer patients and controls were selected. Among the cancer patients, 72 out of 129 had either vertex or frontal baldness compared to 82 out of 139 among the controls (see table below).

           Cancer cases   Controls   Total
Balding              72         82     154
Hairy                55         57     112
Total               129        139     268

In this type of study, you can estimate the probability of balding for cancer patients, but you can't calculate the probability of cancer for bald patients. The prevalence of prostate cancer was artificially inflated to almost 50% by the nature of the case-control design.

So you would need additional information or a different type of research design to estimate the relative risk of prostate cancer for patients with different types of male pattern balding. Contrast this with data from a cohort study of male physicians (Lotufo et al 2000). In this study of the association between male pattern baldness and coronary heart disease, the researchers could estimate relative risks because they followed a cohort forward in time; 1,446 physicians had coronary heart disease events during the 11-year follow-up period.

For example, among the 8,159 doctors with hair, 548 (6.7%) developed coronary heart disease during the 11 years of the study. Among the 1,351 doctors with severe vertex balding, 127 (9.4%) developed coronary heart disease (see table below). The relative risk is 1.4 = 9.4% / 6.7%.

           Heart disease   Healthy          Total
Balding    127 (9.4%)      1,224 (90.6%)    1,351
Hairy      548 (6.7%)      7,611 (93.3%)    8,159
Total      675             8,835            9,510
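Because the physicians were followed as a cohort, both measures can be computed directly from the table. A short Python sketch (variable names are illustrative). Coronary heart disease was uncommon in both groups (under 10%), so the odds ratio, about 1.44, lands close to the relative risk, about 1.40:

```python
# Physicians' cohort (Lotufo et al 2000): severe vertex balding vs. no balding
bald_chd, bald_total = 127, 1351
hair_chd, hair_total = 548, 8159

# Probabilities (risks) of coronary heart disease during follow-up
risk_bald = bald_chd / bald_total
risk_hair = hair_chd / hair_total
relative_risk = risk_bald / risk_hair

# Odds put the healthy counts (total minus events) in the denominator
odds_bald = bald_chd / (bald_total - bald_chd)
odds_hair = hair_chd / (hair_total - hair_chd)
odds_ratio = odds_bald / odds_hair

print(f"risk: {risk_bald:.3f} (balding) vs {risk_hair:.3f} (hair)")
print(f"relative risk = {relative_risk:.2f}")
print(f"odds ratio    = {odds_ratio:.2f}")
```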

You can always calculate and interpret the odds ratio in a case-control study. It approximates the relative risk reasonably well as long as the outcome event is rare (Breslow and Day 1980, page 70). The interpretation of the odds ratio in a case-control design does, however, also depend on how the controls were recruited (Pearce 1993).
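Why the odds ratio survives a case-control design while the relative risk does not can be seen by mimicking one. The sketch below is hypothetical: it takes the physicians' cohort table, keeps every case, and keeps only a fraction of the healthy controls (using expected counts, so there is no random sampling noise). The odds ratio stays at about 1.44 whatever fraction is kept, while a "relative risk" computed naively from the sampled table drifts from 1.40 toward 1.22:

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed cases and non-cases; c, d = unexposed cases and non-cases."""
    return (a * d) / (b * c)

def naive_relative_risk(a, b, c, d):
    """Ratio of 'risks' computed as if the table came from a cohort."""
    return (a / (a + b)) / (c / (c + d))

# Cohort counts: (balding CHD, balding healthy, hair CHD, hair healthy)
full = (127, 1224, 548, 7611)

# Hypothetical case-control samples: all cases, a fraction of the controls
for fraction in (1.0, 0.5, 0.1):
    a, b, c, d = full
    sample = (a, b * fraction, c, d * fraction)
    print(f"controls kept {fraction:>4}: OR = {odds_ratio(*sample):.2f}, "
          f"naive RR = {naive_relative_risk(*sample):.2f}")
```

The sampling fraction multiplies both b and d, so it cancels out of the odds ratio a*d/(b*c) but not out of the naive risk ratio.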

Another situation that calls for the odds ratio is covariate adjustment. It is easy to adjust an odds ratio for confounding variables with logistic regression; the adjustments for a relative risk are much trickier.

In a study on the likelihood of pregnancy among people with epilepsy (Schupf and Ottman 1994), 232 out of 586 males with idiopathic/cryptogenic epilepsy had fathered one or more children. In the control group, the respective counts were 79 out of 109 (see table below).

           Children     No children   Total
Epilepsy   232 (40%)    354 (60%)     586
Control    79 (72%)     30 (28%)      109
Total      311          384           695

The simple relative risk is 0.55 and the simple odds ratio is 0.25. Clearly, the probability of fathering a child depends strongly on a variety of demographic variables, especially age (marital status was dealt with in a separate analysis). The control group was on average 8.4 years older (43.5 years versus 35.1), showing the need to adjust for this variable. With a multivariable logistic regression model that included age, education, ethnicity and sibship size, the adjusted odds ratio for epilepsy status was 0.36. Although this ratio was closer to 1.0 than the crude odds ratio, it was still highly significant. A comparable adjusted relative risk would be more difficult to compute (although it can be done, as in Lotufo et al 2000).
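The link between logistic regression and the odds ratio can be seen in a minimal sketch. The code below is a hand-written Newton-Raphson fit (for illustration only, not a substitute for a statistics package) of a logistic model with epilepsy status as the only predictor, using the grouped counts from the table. The exponentiated slope reproduces the crude odds ratio of 0.25. Adding age, education, ethnicity and sibship size as further predictors is what moves the estimate to the adjusted 0.36, but the individual-level data needed for that are not available here:

```python
import math

# Grouped data: (x, men who fathered a child, group size),
# with x = 1 for men with epilepsy and x = 0 for controls
groups = [(1, 232, 586), (0, 79, 109)]

def fit_logistic(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped binomial data."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0   # information matrix
        for x, y, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) * score, 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(groups)
crude_or = (232 * 30) / (354 * 79)
print(f"exp(slope)       = {math.exp(b1):.3f}")
print(f"crude odds ratio = {crude_or:.3f}")
```

With a single binary predictor the model is saturated, so the fitted slope is exactly the log of the crude odds ratio; each added covariate turns it into an odds ratio adjusted for that covariate.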

Ambiguous and confusing situations

The relative risk can sometimes produce ambiguous and confusing situations. Part of this is due to the fact that relative measurements are often counter-intuitive. Consider an interesting relative comparison that comes from a puzzle on the Car Talk radio show. You have a hundred-pound sack of potatoes. Let's assume that these potatoes are 99% water. That means 99 parts water and 1 part potato. These are soggier potatoes than I am used to seeing, but it makes the problem more interesting.

If you dried out the potatoes completely, they would only weigh one pound. But let's suppose you only wanted to dry out the potatoes partially, until they were 98% water. How much would they weigh then?

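The counter-intuitive answer is 50 pounds. The 1 pound of dry potato never changes, and it now has to make up 2% of the total weight, so the total is 1/0.02 = 50 pounds: 49 pounds of water and 1 pound of potato. Another way of thinking about the problem is that in order to double the concentration of potato (from 1% to 2%), you have to remove about half of the water (50 of the original 99 pounds).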
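The puzzle hinges on one fixed quantity, the dry matter. A small Python sketch with exact fractions (the function name is an illustrative choice):

```python
from fractions import Fraction

def weight_after_drying(total_weight, water_before, water_after):
    """Total weight after drying; only water is removed, so dry matter is fixed."""
    dry = total_weight * (1 - water_before)
    return dry / (1 - water_after)

before = Fraction(99, 100)
after = Fraction(98, 100)

new_weight = weight_after_drying(100, before, after)
water_removed = 100 * before - new_weight * after   # 99 - 49 pounds

print(new_weight)                       # 50
print(water_removed)                    # 50
print(f"{float(after / before):.3f}")   # water share shrinks by only about 1%
print((1 - after) / (1 - before))       # potato share doubles: 2
```

The last two lines show the asymmetry in miniature: the water fraction changes by a relative factor of about 0.99, while the potato fraction changes by a factor of 2.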

Relative risks have the same sort of counter-intuitive behavior. A small relative change in the probability of a common event's occurrence can be associated with a large relative change in the opposite probability (the probability of the event not occurring).

Consider a recent study on physician recommendations for patients with chest pain (Schulman et al 1999). This study found that when doctors viewed videotape of hypothetical patients, race and sex influenced their recommendations. One of the findings was that doctors were more likely to recommend cardiac catheterization for men than for women. 326 out of 360 (90.6%) doctors viewing the videotape of male hypothetical patients recommended cardiac catheterization, while only 305 out of 360 (84.7%) of the doctors who viewed tapes of female hypothetical patients made this recommendation.

                 No cath       Cath          Total
Male patient     34 (9.4%)     326 (90.6%)   360
Female patient   55 (15.3%)    305 (84.7%)   360
Total            89            631           720

The odds ratio is either 0.57  or 1.74, depending on which group you place in the numerator. The authors reported the odds ratio in the original paper and concluded that physicians make different recommendations for male patients than for female patients.

A critique of this study (Schwartz et al 1999) noted, among other things, that the odds ratio overstated the effect and that the relative risk was only 0.93 (reciprocal 1.07). Since 0.93 is so much closer to 1 than 0.57 is, the critics claimed that the odds ratio overstated the tendency for physicians to make different recommendations for male and female patients. In this study, however, it is not entirely clear that 0.93 is the appropriate risk ratio.

Although the relative change from 90.6% to 84.7% is modest, consider the opposite perspective. The rates of recommending a less aggressive intervention than catheterization were 15.3% for doctors viewing the female patients and 9.4% for doctors viewing the male patients, a relative risk of 1.63 (reciprocal 0.61).

This is the same thing that we just saw in the Car Talk puzzler: a small relative change in the water content implies a large relative change in the potato content. In the physician recommendation study, a small relative change in the probability of a recommendation in favor of catheterization corresponds to a large relative change in the probability of recommending against catheterization.
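The asymmetry can be checked directly from the counts. A short Python sketch (variable names are illustrative). Working from the exact counts rather than rounded percentages gives values that differ slightly in the second decimal place from those in the text: about 0.936 and 1.618 for the two relative risks, and about 0.578 and 1.729 for the two odds ratios:

```python
# Physician recommendation study: 360 doctors viewed each tape
n = 360
female_cath, male_cath = 305, 326
female_no_cath, male_no_cath = 55, 34

# Relative risks for the event (catheterization) and for its complement
rr_cath = (female_cath / n) / (male_cath / n)
rr_no_cath = (female_no_cath / n) / (male_no_cath / n)

# Odds ratios for the event and for its complement
or_cath = (female_cath / female_no_cath) / (male_cath / male_no_cath)
or_no_cath = (female_no_cath / female_cath) / (male_no_cath / male_cath)

print(f"RR, catheterization:    {rr_cath:.3f}")
print(f"RR, no catheterization: {rr_no_cath:.3f}")
print(f"OR, catheterization:    {or_cath:.3f}")
print(f"OR, no catheterization: {or_no_cath:.3f}")
```

The two relative risks are not reciprocals of each other (their product is about 1.51), while the two odds ratios are.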

Thus, for every problem, there are two possible ways to compute the relative risk. Sometimes it is obvious which relative risk is appropriate. For the Titanic passengers, the appropriate risk is for death rather than survival. But what about a breast feeding study? Are we trying to measure how much an intervention increases the probability of breast feeding success, or how much it decreases the probability of breast feeding failure? For example, Deeks 1998 expresses concern about an odds ratio calculation in a study aimed at increasing the duration of breast feeding. At three months, 32/51 (63%) of the mothers in the treatment group had stopped breast feeding compared to 52/57 (91%) in the control group.

            Continued bf   Stopped bf    Total
Treatment   19 (37.3%)     32 (62.7%)    51
Control     5 (8.8%)       52 (91.2%)    57
Total       24             84            108

While the relative risk of 0.69 (reciprocal 1.45) for these data is much less extreme than the odds ratio of 0.16 (reciprocal 6.2), one has to wonder why Deeks chose to compare probabilities of breast feeding failures rather than successes. The rate of successful breast feeding at three months was 4.2 times higher in the treatment group than in the control group. This is still not as extreme as the odds ratio; the odds ratio for successful breast feeding is about 6.2, which is simply the reciprocal of the odds ratio for breast feeding failure.

One advantage of the odds ratio is that it is not dependent on whether we focus on the event's occurrence or its failure to occur. If the odds ratio for an event deviates substantially from 1.0, the odds ratio for the event's failure to occur will also deviate substantially from 1.0, though in the opposite direction.
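This property can be checked on the breast feeding data from Deeks 1998. In the Python sketch below (the function name is illustrative), switching the event from "stopped" to "continued" turns the odds ratio into its exact reciprocal (about 0.162 and 6.175), while the two relative risks (about 0.69 and 4.2) are not reciprocals of each other:

```python
def risk_and_odds_ratios(events1, n1, events2, n2):
    """Relative risk and odds ratio of group 1 versus group 2."""
    p1, p2 = events1 / n1, events2 / n2
    rr = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return rr, odds_ratio

# Three months after birth: treatment group (n = 51) vs control group (n = 57)
stopped = risk_and_odds_ratios(32, 51, 52, 57)    # event = stopped breast feeding
continued = risk_and_odds_ratios(19, 51, 5, 57)   # event = still breast feeding

print(f"stopped:   RR = {stopped[0]:.2f}, OR = {stopped[1]:.3f}")
print(f"continued: RR = {continued[0]:.2f}, OR = {continued[1]:.3f}")
print(f"1 / RR(stopped) = {1 / stopped[0]:.2f}, 1 / OR(stopped) = {1 / stopped[1]:.3f}")
```

The reciprocal of the "stopped" relative risk is 1.45, not 4.2, so the choice of event changes the relative risk but not the odds ratio.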

Summary

Both the odds ratio and the relative risk compare the likelihood of an event occurring between two distinct groups. The relative risk is easier to interpret and more consistent with most people's intuition. Some designs, however, prevent the calculation of the relative risk. Also, it can be ambiguous whether the relative risk should describe the event or its failure to occur. When you are reading research that summarizes the data using odds ratios or relative risks, you need to be aware of the limitations of both of these measures.

Bibliography

When can odds ratios mislead? Odds ratios should be used only in case-control studies and logistic regression analyses [letter]. Deeks J. British Medical Journal 1998:317(7166);1155-6; discussion 1156-7.

Evidence-based purchasing: understanding results of clinical trials and systematic reviews. Fahey T, Griffiths S and Peters TJ. British Medical Journal 1995:311(7012);1056-9; discussion 1059-60.

Interpretation and Choice of Effect Measures in Epidemiologic Analyses. Greenland S. American Journal of Epidemiology 1987:125(5);761-767.

Male Pattern Baldness and Coronary Heart Disease: The Physicians' Health Study. Lotufo PA, et al. Archives of Internal Medicine 2000:160;165-171.

Measured Enthusiasm: Does the Method of Reporting Trial Results Alter Perceptions of Therapeutic Effectiveness? Naylor C, Chen E and Strauss B. Annals of Internal Medicine 1992:117(11);916-21.

What Does the Odds Ratio Estimate in a Case-Control Study? Pearce N. International Journal of Epidemiology 1993:22(6);1189-92.

Likelihood of Pregnancy in Individuals with Idiopathic/Cryptogenic Epilepsy: Social and Biologic Influences. Schupf N and Ottman R. Epilepsia 1994:35(4);750-756.

A Haircut in Horse Town: And Other Great Car Talk Puzzlers. Magliozzi T, Magliozzi R and Berman D. New York: Berkley Publishing Group, 1999.

This webpage was written by Steve Simon on 2001-01-09, edited by Steve Simon, and was last modified on 2008-07-08. Send feedback to ssimon at cmh dot edu.

Number Needed to Treat

Dear Professor Mean, How are patients and their doctors supposed to decide whether a research finding has practical significance? Why don't the medical journals make things clearer?

You're hoping for clarity from the medical profession? These are the folks who take a simple earache and call it "otitis media." To them, a runny nose is "rhinorrhea" and a tummy ache is "gastrointestinal distress." It's enough to make me produce lacrimal secretions.

In fairness to these folks, though, they do realize that the practical interpretation of medical research is difficult, and they are trying to improve it. We are starting to see two important changes in medical research papers. First, researchers have learned that you can't ignore the size of the effect and focus only on statistical significance. Since confidence intervals provide information about both the size and the significance of an effect, many journals now report them instead of p-values.

A second change is the realization that absolute changes in risk are more important than relative changes in risk. A nurse recently informed me that my snoring (oops! sleep apnea) can triple the risk of a stroke (excuse me, a cerebrovascular event) if left untreated. But how serious is that for someone who is only 42 years old and otherwise in good health? Three times nothing is nothing, and three times something very small is still very small. I decided to get treatment, but more to help my wife and me sleep better than out of concern about stroke.
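A short Python sketch shows why "three times something very small is still very small." The baseline risks below are hypothetical, chosen only to span a range; they are not clinical estimates for stroke or sleep apnea. Each one is tripled (a relative risk of 3), and the sketch reports the absolute increase.

```python
def absolute_increase(baseline_risk, relative_risk):
    """Absolute change in risk when a baseline risk is multiplied by relative_risk."""
    return baseline_risk * relative_risk - baseline_risk


# Hypothetical baseline risks (not clinical estimates), each tripled.
for baseline in (0.001, 0.01, 0.10):
    increase = absolute_increase(baseline, relative_risk=3.0)
    print(f"baseline {baseline:.1%} -> {3 * baseline:.1%}: "
          f"absolute increase {increase:.1%}, "
          f"about 1 extra event per {1 / increase:.0f} people")
```

The relative risk is 3 in every row, yet the absolute increase ranges from 0.2 to 20 percentage points: one extra event per 500 people at the smallest baseline versus one per 5 people at the largest.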

A good measure of absolute risk is the number needed to treat (NNT). It is the average number of patients that a doctor would need to treat in order to see one additional event (such as one additional cure, or one fewer death) compared with the control therapy. A small value (e.g., NNT=2.7) means that a doctor will see the effect of treatment often, even after treating only a few patients. A large value (e.g., NNT=800) means that the doctor will have to treat a large number of patients in order to see the effect even a few times.

When you are measuring an increase in bad events, such as side effects associated with a treatment, the number needed to treat is sometimes described as the number needed to harm (NNH). You can often quantify the trade-offs between the benefits and side effects of a treatment by comparing the NNT and NNH values.

Some examples

Here are some examples of numbers needed to treat, found at the Bandolier web site (http://www.jr2.ox.ac.uk/bandolier/index.html).

Prevention of post-operative vomiting using Droperidol, NNT=4.4. For every four or five surgery patients treated with Droperidol, you will see one fewer vomiting incident on average.

Prevention of infection from dog bites using antibiotics, NNT=16. For every 16 dog bites treated with antibiotics, you would see one fewer infection on average.

Primary prevention of stroke using a daily low dose of aspirin for one year, NNT=102. For every hundred patient years of treatment with aspirin, you will see one fewer stroke on average.

Notice that this last NNT is based on a rate (strokes per patient year). Assuming that the rate is reasonably homogeneous over time, one hundred patient years is equivalent to following one hundred patients for a year or ten patients for a decade. Be careful, however, with rates that are not homogeneous over time. If the event rates decline the longer you follow your patients, then the number of events you see in one hundred patients during their first year of therapy could be quite different from the number you see in ten patients followed for their first decade of therapy.

Here's another example from the British Medical Journal (Freemantle 1999: 318(7200); 1730-1737). Prevention of cardiac death using beta blockers among patients with a previous myocardial infarction, NNT=42. For every 42 patients treated for two years with beta blockers, you would see one fewer death. This is better than treatment with antiplatelet agents (NNT=153), statins (NNT=94), or warfarin (NNT=63), but not as effective as thrombolysis and aspirin for 4 weeks (NNT=24).
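The following Python sketch illustrates the point about homogeneous rates. It uses the NNT=102 per patient year from the aspirin example and compares 100 patients treated for one year with 10 patients treated for ten years. The "declining" scenario is hypothetical: it assumes, purely for illustration, that the absolute benefit halves with each additional year of therapy.

```python
def events_prevented(yearly_reduction, n_patients, n_years):
    """Expected events prevented when each of n_patients is treated for n_years.

    yearly_reduction(year) is the absolute reduction in event risk for one
    patient during a given year of therapy (year = 1, 2, ...).
    """
    return n_patients * sum(yearly_reduction(year) for year in range(1, n_years + 1))


def constant_benefit(year):
    # Homogeneous rate: one stroke prevented per 102 patient years (NNT=102).
    return 1 / 102


def declining_benefit(year):
    # Hypothetical: the absolute benefit halves with each additional year of therapy.
    return (1 / 102) * 0.5 ** (year - 1)


for label, benefit in [("constant", constant_benefit), ("declining", declining_benefit)]:
    short_term = events_prevented(benefit, n_patients=100, n_years=1)
    long_term = events_prevented(benefit, n_patients=10, n_years=10)
    print(f"{label}: 100 patients x 1 year = {short_term:.2f}, "
          f"10 patients x 10 years = {long_term:.2f}")
```

With a constant rate, both designs prevent about 0.98 strokes. Under the hypothetical declining benefit, the ten-year follow-up prevents only about 0.20, even though both designs contribute 100 patient years.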
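One way to compare several NNTs at a glance is to rank them and convert each one to events prevented per 1000 patients treated (1000 divided by the NNT). The Python sketch below does this with the values quoted above. Keep in mind that the treatment periods differ (two years of beta blockers versus four weeks of thrombolysis and aspirin), so the ranking carries the same time caveat discussed in the previous paragraph.

```python
# NNT values quoted in the text (Freemantle 1999). The treatment periods are
# not all the same, so this ranking is only a rough comparison.
nnt_by_treatment = {
    "beta blockers": 42,
    "antiplatelet agents": 153,
    "statins": 94,
    "warfarin": 63,
    "thrombolysis and aspirin": 24,
}

# A smaller NNT means more deaths prevented per patient treated.
ranked = sorted(nnt_by_treatment.items(), key=lambda item: item[1])
for name, nnt in ranked:
    print(f"{name:<26} NNT={nnt:>3}  deaths prevented per 1000 treated = {1000 / nnt:.1f}")
```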

Computational Example

To compute the NNT, you need to subtract the rate in the treatment group from the rate in the control group and then invert the difference (divide 1 by the difference).
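Written as formulas, with p_C and p_T standing for the event rates in the control and treatment groups, and q_T and q_C standing for the side-effect rates in the treatment and control groups (symbols introduced here for convenience), the NNT and NNH are:

```latex
\[
\text{NNT} = \frac{1}{p_C - p_T}
\qquad\text{and}\qquad
\text{NNH} = \frac{1}{q_T - q_C}
\]
```

For the NNH the subtraction runs the other way, treatment minus control, so that an increase in harm gives a positive number.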

A recently published article on the flu vaccine showed that among the children who received a placebo, 17.9% later had culture-confirmed influenza. In the vaccine group, the rate was only 1.3%. This is an absolute difference of 16.6 percentage points. When you invert this difference, you get NNT=6. This means that for every six kids who get the vaccine, you will see one fewer case of flu on average.

The study also looked at the rate of side effects. In the vaccine group, 1.9% developed a fever. Only 0.8% of the controls developed a fever. This is an absolute difference of 1.1 percentage points. When you invert this difference, you get an NNH of about 90. This means that for every 90 kids who get the vaccine, you will see one additional fever on average.

Sometimes the ratio between NNT and NNH can prove informative. For this study, NNH/NNT=90/6=15. This tells you that you should expect to see one additional fever for every fifteen cases of flu prevented.
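Here is a minimal Python sketch of the flu vaccine calculations, using only the percentages quoted above. It keeps the values unrounded, which shows where the rounded figures in the text come from.

```python
def number_needed(rate_higher, rate_lower):
    """Invert an absolute difference in event rates (gives an NNT or an NNH)."""
    return 1 / (rate_higher - rate_lower)


# Influenza: placebo rate minus vaccine rate (benefit, so this is the NNT).
nnt = number_needed(0.179, 0.013)
# Fever: vaccine rate minus placebo rate (harm, so this is the NNH).
nnh = number_needed(0.019, 0.008)

print(f"NNT = {nnt:.2f}")
print(f"NNH = {nnh:.2f}")
# Flu cases prevented for each additional fever.
print(f"NNH / NNT = {nnh / nnt:.1f}")
```

Unrounded, the NNT is about 6.02 and the NNH about 90.9, so the ratio is about 15.1; the 90/6=15 in the text uses the rounded figures, and the conclusion is the same.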

Although I am not a medical expert, the vaccine looks very promising because you can prevent a lot of flu cases and only have to put up with a few additional fevers. In general, it takes medical judgment to assess the trade-offs between the benefits of a treatment and its side effects. The NNT and NNH calculations help you assess these trade-offs.

What if the outcome measure is continuous?

To calculate the NNT or NNH, you need to have a distinct event. With a continuous variable, you could define such an event by setting a cut-off. For example, an intervention to improve breastfeeding rates might improve the average duration of breastfeeding by seven weeks. How would you calculate the NNT for this data? Well, you might declare that you are interested in the proportion of mothers who breastfeed for at least 12 weeks. If you had access to the original data, you would find that 54% of women in the control group and 87% in the treatment group breastfed for at least 12 weeks. This would allow you to compute an NNT of 3. For every three mothers given the new intervention, one additional mother would breastfeed beyond 12 weeks.

The choice of 12 weeks is somewhat arbitrary and you would get different results if you chose a different cut-off, such as 24 weeks. You should choose a value that has clinical relevance to your colleagues.
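The Python sketch below shows how you might dichotomize a continuous outcome at a cut-off and compute the NNT, and how the answer depends on the cut-off. The breastfeeding durations are hypothetical, invented only so that the mean difference matches the 7 weeks in the text (14 versus 21 weeks); they are not the study's data. The function names are illustrative, not from any library.

```python
def proportion_at_least(values, cutoff):
    """Fraction of observations at or above the cutoff."""
    return sum(v >= cutoff for v in values) / len(values)


def nnt_from_cutoff(treated, control, cutoff):
    """NNT for the event 'outcome >= cutoff', where larger outcomes are better."""
    difference = proportion_at_least(treated, cutoff) - proportion_at_least(control, cutoff)
    if difference == 0:
        raise ValueError("no difference at this cutoff, so the NNT is undefined")
    return 1 / difference


# Hypothetical breastfeeding durations in weeks (mean 14 vs 21, a 7-week difference).
control = [2, 4, 6, 8, 10, 12, 14, 18, 26, 40]
treated = [4, 8, 12, 13, 15, 20, 24, 30, 36, 48]

for cutoff in (12, 24):
    print(f"cutoff {cutoff} weeks: NNT = {nnt_from_cutoff(treated, control, cutoff):.1f}")

# Using the proportions quoted in the text (87% vs 54% at 12 weeks).
print(f"text example: NNT = {1 / (0.87 - 0.54):.1f}")
```

With these made-up data, the NNT is about 3.3 at a 12-week cut-off but 5.0 at 24 weeks, which is why the choice of cut-off should rest on clinical relevance.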

Calculating the NNT or NNH from a continuous measure using a cut-off is usually impossible to do after the fact. So if you are reading someone else's work and they present the data only as a mean difference, you cannot calculate the NNT or NNH from that alone. You would need additional information, such as the proportions that exceed some threshold, or you would have to make some questionable assumptions, such as normality of the outcome measure.
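To see how much an NNT built on a normality assumption can depend on details the paper may not report, here is a hedged Python sketch. It assumes both groups are normal with a common standard deviation, and the means (13 and 20 weeks, keeping the text's 7-week difference) and the standard deviations are hypothetical. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def nnt_assuming_normal(mean_treated, mean_control, sd, cutoff):
    """NNT for 'outcome >= cutoff' under an ASSUMED normal model.

    Both groups are assumed normal with a common standard deviation, which
    is exactly the kind of questionable assumption the text warns about.
    """
    p_treated = 1 - NormalDist(mean_treated, sd).cdf(cutoff)
    p_control = 1 - NormalDist(mean_control, sd).cdf(cutoff)
    return 1 / (p_treated - p_control)


# Hypothetical means with a 7-week difference and a 12-week cut-off;
# only the assumed standard deviation changes from row to row.
for sd in (5, 10, 20):
    print(f"SD = {sd:>2} weeks: NNT = {nnt_assuming_normal(20, 13, sd, cutoff=12):.1f}")
```

The same 7-week mean difference yields an NNT of roughly 2.7, 4.0, or 7.4 depending on the assumed standard deviation, so an NNT derived this way is only as trustworthy as the assumptions behind it.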

Summary

Professor Mean explains that the journals are getting better at presenting the practical implications of research. In particular, they are presenting the number needed to treat, a measure that helps you better understand the practical significance of research findings. The number needed to treat is the average number of patients that you will have to treat with a new therapy to see one additional success compared to the standard therapy.

1. 2-way Contingency Table Analysis. John C. Pezzullo. Accessed on 2003-08-11. members.aol.com/johnp71/ctab2x2.html
2. Adjusting the number needed to treat: incorporating adjustments for the utility and timing of benefits and harms. R Riegelman, WS Schroth. Medical Decision Making 1993: 13(3); 247-52.
3. Applying evidence to the individual patient. S. E. Straus, D. L. Sackett. Ann Oncol 1999: 10(1); 29-32.
4. Basic statistics for clinicians: 3. Assessing the effects of treatment: measures of association [published erratum appears in Can Med Assoc J 1995 Mar 15; 152(6):813]. R. Jaeschke, G. Guyatt, H. Shannon, S. Walter, D. Cook, N. Heddle. CMAJ 1995: 152(3); 351-7.
5. Benefit-risk ratios in the assessment of the clinical evidence of a new therapy. AR Willan, BJ O'Brien, DJ Cook. Controlled Clinical Trials 1997: 18(2); 121-30.
6. Beta blockade after myocardial infarction: systematic review and meta regression analysis. Nick Freemantle, J Cleland, P Young, J Mason, J Harrison. British Medical Journal 1999: 318(7200); 1730-1737.
7. Calculating and Using NNTs. Bandolier. Accessed on 2003-06-12. www.jr2.ox.ac.uk/bandolier/Extraforbando/NNTextra.pdf
8. Calculating confidence intervals for the number needed to treat. R. Bender. Controlled Clinical Trials 2001: 22(2); 102-10.
9. Calculating the "number needed to be exposed" with adjustment for confounding variables in epidemiological studies. R Bender, M Blettner. Journal of Clinical Epidemiology 2002: 55(5); 525-530.
10. Calculating the number needed to treat for trials where the outcome is time to an event. D. G. Altman, P. K. Andersen. British Medical Journal 1999: 319(7223); 1492-5.
11. Choice of Effect Measure for Epidemiological Data. SD Walter. Journal of Clinical Epidemiology 2000: 53(9); 931-939.
12. Confidence limits made easy: interval estimation using a substitution method. L. E. Daly. Am J Epidemiol 1998: 147(8); 783-90.
13. Events per person per year -- a dubious concept. J Windeler, S Lange. BMJ 1995: 310(6977); 454-56.
14. Expressing the magnitude of adverse effects in case-control studies: "the number of patients needed to be treated for one additional patient to be harmed". L. M. Bjerre, J. LeLorier. British Medical Journal 2000: 320(7233); 503-6.
15. Getting NNTs. Bandolier. Accessed on 2003-07-01. www.jr2.ox.ac.uk/bandolier/band36/b36-2.html
16. Influence of method of reporting study results on decision of physicians to prescribe drugs to lower cholesterol concentration. H. C. Bucher, M. Weinbacher, K. Gyr. British Medical Journal 1994: 309(6957); 761-4.
17. Interpreting the Number Needed to Treat. Lambert A. Wu, Thomas E. Kottke. Journal of the American Medical Association 2002: 288(7); 830-1.
18. Missing the point (estimate)? Confidence intervals for the number needed to treat. N. J. Barrowman. CMAJ 2002: 166(13); 1676-7.
19. Nicotine nasal spray with nicotine patch for smoking cessation: randomised trial with six year follow up. T. Blondal, L. J. Gudmundsson, I. Olafsdottir, G. Gustavsson, A. Westin. British Medical Journal 1999: 318(7179); 285-8.
20. Number needed to harm should be measured for treatments. Arnold Zermansky. British Medical Journal 1998: 317(7164); 1014.
21. Number needed to screen: development of a statistic for disease screening. C. M. Rembold. British Medical Journal 1998: 317(7154); 307-12.
22. The number needed to treat: a clinically useful measure of treatment effect. R. J. Cook, D. L. Sackett. British Medical Journal 1995: 310(6977); 452-4.
23. Number needed to treat: Caveat emptor. LA Wu, TE Kottke. Journal of Clinical Epidemiology 2001: 54(2); 111-116.
24. Numbers needed to treat derived from meta-analysis. Bruce G. Charlton. British Medical Journal 1999: 319(7218); 1199.
25. Randomised controlled trial shows that glyceryl trinitrate heals anal fissures, higher doses are not more effective, and there is a high recurrence rate. EA Carapeti, MA Kamm, PJ McDonald, SJ Chadwick, D Melville, RK Phillips. Gut 1999: 44(5); 727-30.
26. Recombinant or urinary follicle-stimulating hormone? A cost-effectiveness analysis derived by particularizing the number needed to treat from a published meta-analysis. B. Ola, S. Papaioannou, M. A. Afnan, N. Hammadieh, S. Gimba. Fertil Steril 2001: 75(6); 1106-10.
27. Unqualified success and unmitigated failure: number-needed-to-treat-related concepts for assessing treatment efficacy in the presence of treatment-induced adverse events. M Schulzer, GB Mancini. International Journal of Epidemiology 1996: 25(4); 704-12.
28. Updated New Zealand cardiovascular disease risk-benefit prediction guide. R. Jackson. BMJ 2000: 320(7236); 709-10.
29. Using numerical results from systematic reviews in clinical practice. H. J. McQuay, R. A. Moore. Ann Intern Med 1997: 126(9); 712-20.
30. When should an effective treatment be used? Derivation of the threshold number needed to treat and the minimum event rate for treatment. J. C. Sinclair, R. J. Cook, G. H. Guyatt, S. G. Pauker, D. J. Cook. J Clin Epidemiol 2001: 54(3); 253-62.

1. BMJ 1999;318:1548-1551 (5 June) http://bmj.bmjjournals.com/cgi/content/full/318/7197/1548
2. J Clin Epidemiol. 2002 Jan;55(1):102-3. PMID: 11781128.
3. JAMA. 2002 Jun 5;287(21):2813-4. PMID: 12038920

This webpage was written by Steve Simon on 2000-01-27, edited by Steve Simon, and was last modified on 2008-07-14. This page needs minor revisions. Category: Ask Professor Mean, Category: Measuring benefit and risk

Please fill out an evaluation form. Your input is important. These evaluation forms also ensure that we can offer Continuing Medical Education credits for this class.
