Meta-analysis for a diagnostic test

Steve Simon

1999/11/15

There is no real consensus yet on how best to combine data from several studies of a diagnostic test. I will outline a few approaches that seem to make sense.

Direct analysis of sensitivity/specificity

The simplest overall estimate of sensitivity (sens) or specificity (spec) is to just combine all the studies in a pot and stir. Just count the number of true positives (tp) and false negatives (fn) across all the studies:

$$\widehat{sens} = \frac{\sum_i tp_i}{\sum_i (tp_i + fn_i)}$$

This is equivalent to a weighted average of the individual sensitivities, where the weight for each individual study is simply its true positive plus false negative count:

$$\widehat{sens} = \frac{\sum_i (tp_i + fn_i)\,sens_i}{\sum_i (tp_i + fn_i)}$$

You would calculate an overall estimate of specificity the same way, using the true negative (tn) and false positive (fp) counts.
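
As a quick sanity check, here is a toy R example (the counts tp.toy and fn.toy are made up for illustration) showing that the pooled estimate and the weighted average agree:

```r
# Toy check: the pooled estimate equals the weighted average of study
# sensitivities with weights tp + fn (counts are hypothetical)
tp.toy <- c(8, 40); fn.toy <- c(2, 10)
sum(tp.toy) / sum(tp.toy + fn.toy)                       # pooled: 0.8
weighted.mean(tp.toy/(tp.toy+fn.toy), tp.toy + fn.toy)   # weighted average: 0.8
```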

The tricky part comes when you try to define a confidence interval for the overall estimate. This confidence interval is effectively a combination of the standard errors that you would assign to each individual study.

A first attempt might be to define the standard error of an individual study using the classic binomial formula, $\sqrt{p(1-p)/n}$. Writing the standard error in terms of the true positive and false negative values,

$$se_i = \sqrt{\frac{sens_i(1-sens_i)}{tp_i+fn_i}} = \sqrt{\frac{tp_i\,fn_i}{(tp_i+fn_i)^3}}$$

The problem with this formula for the standard error is that it gives less weight to studies where sensitivity is close to 50% and greater weight to studies where sensitivity is much smaller than 50% or much larger than 50%. Another problem occurs when one or more of the sensitivities is 100%. The standard error using a binomial distribution equals zero for those studies with 100% sensitivity, which seems at first like a good thing. But when one study has a standard error of zero, its inverse variance weight is infinite, and that single study swamps all the others in the meta-analysis.
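
To see the problem concretely, here is a tiny hypothetical example:

```r
# A hypothetical study with 100% sensitivity: the binomial standard
# error is zero, so the inverse variance weight is infinite
tp.i <- 16; fn.i <- 0; n.i <- tp.i + fn.i
se.i <- sqrt((tp.i/n.i) * (fn.i/n.i) / n.i)   # 0
1/se.i^2                                      # Inf: this study swamps the rest
```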

One way to avoid some of these problems is to estimate the standard error using the overall sensitivity in place of each study's individual sensitivity:

$$se_i = \sqrt{\frac{\widehat{sens}(1-\widehat{sens})}{tp_i+fn_i}}$$

Since the numerator is now the same for every study, the standard errors differ only through the sample sizes $tp_i+fn_i$.

It’s interesting to note that the inverse variance weights are then

$$w_i = \frac{tp_i+fn_i}{\widehat{sens}(1-\widehat{sens})},$$

which implies that the fixed effects estimate is just the weighted average with weights $tp_i+fn_i$, in other words, the simple pooled estimate shown above.
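
You can confirm this numerically once the example data below are loaded:

```r
# The inverse variance weighted average with a common numerator
# reproduces the simple pooled estimate
w1 <- (tp + fn) / (sens.overall * (1 - sens.overall))
sum(w1 * sens) / sum(w1)   # identical to sens.overall
```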

For a random effects model, you add an estimate of the between-study variance, $\tau^2$ (typically the DerSimonian-Laird estimate), to each study's variance before computing the inverse variance weights.
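
The meta package handles this for you, but here is a minimal sketch of the DerSimonian-Laird estimate of $\tau^2$, assuming te is a vector of study estimates and se the corresponding standard errors (both hypothetical names):

```r
# Minimal DerSimonian-Laird sketch (te, se are assumed inputs)
dl.tau2 <- function(te, se) {
  w <- 1/se^2                           # fixed effects weights
  te.fixed <- sum(w*te)/sum(w)          # fixed effects estimate
  Q <- sum(w*(te - te.fixed)^2)         # heterogeneity statistic
  max(0, (Q - (length(te) - 1)) / (sum(w) - sum(w^2)/sum(w)))
}
```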

Example: In an article describing systematic reviews of diagnostic and screening tests (Deeks 2001), data from 20 studies of endovaginal ultrasonography for detecting endometrial cancer are presented. I typed the data in as a comma separated file.

```
study,tp,fn,tn,fp
Abu Hmeidan,81,5,186,273
Auslender,16,0,48,90
Botsis,8,0,14,98
Cacclatore,4,0,30,11
Chan,15,2,15,35
Dorum,12,3,34,51
Goldstein,1,0,16,11
Granberg,18,0,32,125
Hanggi,18,3,13,55
Karlsson (a),112,2,414,601
Karlsson (b),14,1,33,57
Klug,7,1,44,127
Malinova,57,0,26,35
Nasri (a),7,0,14,38
Nasri (b),6,0,24,59
Petrl,18,1,96,35
Taviani,2,0,18,21
Varner,1,1,4,9
Weigel,37,0,91,72
Wolman,4,0,18,32
```

and here is the R code to read in the data and compute the meta-analysis models.

```r
library(meta)
f0 <- "X:/webdata/EndovaginalUltrasonography.csv"
deeks.example.dat <- read.csv(f0)
attach(deeks.example.dat)
sens <- tp / (tp + fn)
sens.overall <- sum(tp) / sum(tp + fn)
spec <- tn / (tn + fp)
spec.overall <- sum(tn) / sum(tn + fp)
par(mar=c(5.1,4.1,0.1,0.1))
plot(1-spec, sens, xlim=0:1, ylim=0:1)
points(1-spec.overall, sens.overall, pch="+", cex=2)
```

The last three lines create a graph of the data in ROC space, with the large plus sign marking the overall estimates. Plotting 1-spec on the x-axis and sens on the y-axis puts each study on the same scale as a conventional ROC curve.

I experimented with trying to show the confidence limits for each study in the graph itself; one way to sketch this appears below.
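
Here is a minimal sketch of one way to draw them, using simple binomial standard errors and truncating the limits to the unit square (illustrative only):

```r
# Vertical 95% confidence bars for each study's sensitivity,
# added to the ROC-space plot created above
se.sens <- sqrt(sens * (1 - sens) / (tp + fn))
segments(1 - spec, pmax(0, sens - 1.96*se.sens),
         1 - spec, pmin(1, sens + 1.96*se.sens))
```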

The computations for the actual meta-analysis are shown below. The code is perhaps a bit cryptic: metagen is a general inverse variance routine in the meta package that accepts an estimate (TE) and a standard error (seTE) for each study.

```r
te1 <- sens
se1 <- sqrt(sens.overall * (1 - sens.overall) / (tp + fn))
deeks1.ma <- metagen(TE=te1, seTE=se1, studlab=study, sm="Sensitivity")
te2 <- spec
se2 <- sqrt(spec.overall * (1 - spec.overall) / (tn + fp))
deeks2.ma <- metagen(TE=te2, seTE=se2, studlab=study, sm="Specificity")
```

and here is the output

```
> deeks1.ma
             Sensitivity           95%-CI %W(fixed) %W(random)
Abu Hmeidan       0.9419 [0.8997; 0.9840]     18.82      10.27
Auslender         1.0000 [0.9022; 1.0978]      3.50       5.62
Botsis            1.0000 [0.8617; 1.1383]      1.75       3.61
Cacclatore        1.0000 [0.8044; 1.1956]      0.88       2.10
Chan              0.8824 [0.7875; 0.9772]      3.72       5.81
Dorum             0.8000 [0.6990; 0.9010]      3.28       5.42
Goldstein         1.0000 [0.6088; 1.3912]      0.22       0.60
Granberg          1.0000 [0.9078; 1.0922]      3.94       5.99
Hanggi            0.8571 [0.7718; 0.9425]      4.60       6.47
Karlsson (a)      0.9825 [0.9458; 1.0191]     24.95      10.77
Karlsson (b)      0.9333 [0.8323; 1.0344]      3.28       5.42
Klug              0.8750 [0.7367; 1.0133]      1.75       3.61
Malinova          1.0000 [0.9482; 1.0518]     12.47       9.37
Nasri (a)         1.0000 [0.8521; 1.1479]      1.53       3.27
Nasri (b)         1.0000 [0.8403; 1.1597]      1.31       2.91
Petrl             0.9474 [0.8576; 1.0371]      4.16       6.16
Taviani           1.0000 [0.7233; 1.2767]      0.44       1.15
Varner            0.5000 [0.2233; 0.7767]      0.44       1.15
Weigel            1.0000 [0.9357; 1.0643]      8.10       8.21
Wolman            1.0000 [0.8044; 1.1956]      0.88       2.10

Number of trials combined: 20

                     Sensitivity           95%-CI        z  p.value
Fixed effects model       0.9584 [0.9401; 0.9767] 102.6404 < 0.0001
Random effects model      0.9481 [0.9171; 0.9792]  59.8249 < 0.0001

Quantifying heterogeneity:
tau^2 = 0.002; H = 1.43 [1.1; 1.85]; I^2 = 51% [18.1%; 70.7%]

Test of heterogeneity:
     Q d.f. p.value
 38.75   19  0.0048

Method: Inverse variance method

> deeks2.ma
             Specificity           95%-CI %W(fixed) %W(random)
Abu Hmeidan       0.4052 [0.3606; 0.4498]     15.27       5.83
Auslender         0.3478 [0.2665; 0.4292]      4.59       5.46
Botsis            0.1250 [0.0347; 0.2153]      3.73       5.35
Cacclatore        0.7317 [0.5825; 0.8810]      1.36       4.49
Chan              0.3000 [0.1648; 0.4352]      1.66       4.71
Dorum             0.4000 [0.2963; 0.5037]      2.83       5.17
Goldstein         0.5926 [0.4087; 0.7765]      0.90       3.97
Granberg          0.2038 [0.1275; 0.2801]      5.22       5.52
Hanggi            0.1912 [0.0753; 0.3071]      2.26       4.99
Karlsson (a)      0.4079 [0.3779; 0.4379]     33.78       5.93
Karlsson (b)      0.3667 [0.2659; 0.4674]      3.00       5.21
Klug              0.2573 [0.1842; 0.3304]      5.69       5.56
Malinova          0.4262 [0.3039; 0.5486]      2.03       4.90
Nasri (a)         0.2692 [0.1367; 0.4018]      1.73       4.75
Nasri (b)         0.2892 [0.1843; 0.3941]      2.76       5.15
Petrl             0.7328 [0.6493; 0.8163]      4.36       5.43
Taviani           0.4615 [0.3085; 0.6146]      1.30       4.43
Varner            0.3077 [0.0426; 0.5728]      0.43       2.91
Weigel            0.5583 [0.4834; 0.6331]      5.42       5.54
Wolman            0.3600 [0.2248; 0.4952]      1.66       4.71

Number of trials combined: 20

                     Specificity           95%-CI       z  p.value
Fixed effects model       0.3894 [0.3719; 0.4068] 43.7721 < 0.0001
Random effects model      0.3845 [0.3216; 0.4475] 11.9685 < 0.0001

Quantifying heterogeneity:
tau^2 = 0.0172; H = 3.26 [2.77; 3.85]; I^2 = 90.6% [86.9%; 93.2%]

Test of heterogeneity:
      Q d.f.  p.value
 202.17   19 < 0.0001

Method: Inverse variance method
```

Notice that there is substantial evidence of heterogeneity in both the sensitivity and specificity values.

Analysis of sensitivity/specificity on the log odds scale

Another approach is to transform the sensitivity/specificity to the log odds scale before entering the data into a meta-analysis model. The log odds transformation is a common transformation for binomial data and serves as the heart of a logistic regression model. The standard error for the log odds sensitivity has a nice simple approximation. To derive this, you can use the delta method.

This formula relies on two things you may have forgotten from your days of calculus: the one-term Taylor series approximation behind the delta method, $\operatorname{Var}(f(\hat{p})) \approx [f'(p)]^2 \operatorname{Var}(\hat{p})$, and the fact that the derivative of $\log(p)-\log(1-p)$ is $1/(p(1-p))$.

The details are tedious, but the end result is

$$se(\operatorname{logit}(sens_i)) \approx \sqrt{\frac{1}{sens_i(1-sens_i)(tp_i+fn_i)}}$$

Compare this to the standard error for sensitivity shown above. The numerator for the standard error, $sens_i(1-sens_i)$, has now moved in with its downstairs neighbor, $tp_i+fn_i$.

You can simplify this formula further. Note that the denominator of $sens_i = tp_i/(tp_i+fn_i)$ can cancel out the $tp_i+fn_i$ term right next to it. With a bit more algebra,

$$se(\operatorname{logit}(sens_i)) \approx \sqrt{\frac{tp_i+fn_i}{tp_i\,fn_i}} = \sqrt{\frac{1}{tp_i}+\frac{1}{fn_i}}$$
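
A quick numeric check that the two forms agree (the counts here are hypothetical):

```r
# Both expressions give the same standard error for the log odds
tp.i <- 18; fn.i <- 3
n.i <- tp.i + fn.i; p.i <- tp.i/n.i
sqrt(1/(p.i*(1-p.i)*n.i))   # 0.6236
sqrt(1/tp.i + 1/fn.i)       # 0.6236
```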

The log odds transformation also has some problems when the sensitivity is 100%. A simple fix is to add an arbitrary constant (usually 0.5) to both the numerator and denominator. Another approach would be to use the more complex formula listed above, substituting the overall sensitivity for each individual sensitivity.

Example: Let’s use the example in Deeks 2001 again. Here is the R code to compute log odds and analyze the data in a meta-analysis model. Note that the pmax function replaces the zeros in fn with 0.5.

```r
logit <- function(p) {log(p)-log(1-p)}
fn.adj <- pmax(fn, 0.5)
sens <- tp/(tp+fn.adj)
te3 <- logit(sens)
se3 <- sqrt(1/tp+1/fn.adj)
deeks3.ma <- metagen(TE=te3, seTE=se3, studlab=study, sm="Log Odds Sens")
spec <- tn/(tn+fp)
te4 <- logit(spec)
se4 <- sqrt(1/tn+1/fp)
deeks4.ma <- metagen(TE=te4, seTE=se4, studlab=study, sm="Log Odds Spec")
```

Here is the output. Using the summary function results in a briefer output because the results of individual studies are not shown.

```
summary(deeks3.ma)
Number of trials combined: 20

                     Log Odds Sens           95%-CI       z  p.value
Fixed effects model         2.4775 [2.0562; 2.8987] 11.5269 < 0.0001
Random effects model        2.4761 [2.0318; 2.9204] 10.9228 < 0.0001

Quantifying heterogeneity:
tau^2 = 0.0551; H = 1.03 [1; 1.27]; I^2 = 5.4% [0%; 38.1%]

Test of heterogeneity:
     Q d.f. p.value
 20.07   19  0.3901

Method: Inverse variance method

summary(deeks4.ma)
Number of trials combined: 20

                     Log Odds Spec             95%-CI        z  p.value
Fixed effects model        -0.4277 [-0.5036; -0.3518] -11.0403 < 0.0001
Random effects model       -0.5033 [-0.7668; -0.2399]  -3.7446   0.0002

Quantifying heterogeneity:
tau^2 = 0.292; H = 3.07 [2.58; 3.64]; I^2 = 89.4% [85%; 92.5%]

Test of heterogeneity:
      Q d.f.  p.value
 178.76   19 < 0.0001

Method: Inverse variance method
```

You need to do a few additional calculations to get sensitivity transformed back to the original measurement scale. You can define a function in R to do this calculation for you. I call it the expit function:

```r
expit <- function(log.odds) {exp(log.odds)/(1+exp(log.odds))}
```

With this function, you can convert the overall log odds estimates and their confidence limits back to the probability scale.

```
attach(deeks3.ma)
est.and.cl.fixed <- TE.fixed+c(0,-1.96,1.96)*seTE.fixed
round(100*expit(est.and.cl.fixed),1)
92.3 88.7 94.8
est.and.cl.random <- TE.random+c(0,-1.96,1.96)*seTE.random
round(100*expit(est.and.cl.random),1)
92.2 88.4 94.9
attach(deeks4.ma)
est.and.cl.fixed <- TE.fixed+c(0,-1.96,1.96)*seTE.fixed
round(100*expit(est.and.cl.fixed),1)
39.5 37.7 41.3
est.and.cl.random <- TE.random+c(0,-1.96,1.96)*seTE.random
round(100*expit(est.and.cl.random),1)
37.7 31.7 44.0
```

The estimated sensitivity and 95% confidence limits under the fixed effects model are 92.3% (88.7% to 94.8%). The estimates and limits change only slightly under the random effects model. The estimated specificity and 95% confidence limits under the fixed effects model are 39.5% (37.7% to 41.3%). Under the random effects model, the specificity is 37.7% (31.7% to 44.0%).

Analysis of the diagnostic odds ratio

A third approach is to compute the diagnostic odds ratio,

$$DOR = \frac{tp \times tn}{fn \times fp}$$

Notice how the denominator looks like we accidentally switched things. That was not a mistake. The diagnostic odds ratio is effectively the odds of TPR (the true positive rate or sens) divided by the odds of FPR (the false positive rate or 1-spec).

The first advantage of this approach is that you can use well-known approaches for combining multiple odds ratios. The other advantage is that it analyzes sensitivity and specificity as a pair. Some studies may exhibit heterogeneity in the individual sensitivity or specificity values because one researcher may have tried to maximize sensitivity at the expense of specificity while another did the reverse; the diagnostic odds ratio is less affected by this trade-off.

Although there are no guarantees, the diagnostic odds ratio will often show less heterogeneity across studies than the individual sensitivities and specificities do.

The diagnostic odds ratio also equals the ratio of the positive and negative likelihood ratios, $DOR = LR^+ / LR^-$. So you might interpret the diagnostic odds ratio as the spread between the two likelihood ratios. If the two likelihood ratios are close together, the diagnostic odds ratio is close to one and the test provides little diagnostic information.
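
You can check this identity with the adjusted counts from the code above:

```r
# The diagnostic odds ratio equals the ratio of the two likelihood ratios
lr.pos <- sens / (1 - spec)
lr.neg <- (1 - sens) / spec
dor <- (tp * tn) / (fn.adj * fp)
all.equal(dor, lr.pos / lr.neg)   # TRUE
```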

The book on meta-analysis by Sutton et al suggests that you model the heterogeneity in the diagnostic odds ratio using the following regression model:

$$D_i = \alpha + \beta S_i + \varepsilon_i, \quad \text{where } D_i = \operatorname{logit}(TPR_i) - \operatorname{logit}(FPR_i) \text{ and } S_i = \operatorname{logit}(TPR_i) + \operatorname{logit}(FPR_i)$$

You might recognize D as the log of the diagnostic odds ratio. The variable S is a bit harder to visualize; it is large when both the true positive rate and the false positive rate are high, and small when both are low.

This represents the tendency of an individual study to skew the test more towards sensitivity or more towards specificity.

Here’s an example of the problems that can happen when some studies skew more towards sensitivity and others more towards specificity. Imagine a diagnostic test that takes on a range of values. This test follows a bell shaped curve both in the diseased and the healthy populations, and the two bell curves are set two standard deviations apart. You could set a cutpoint to maximize specificity, set one to maximize sensitivity, or choose anything in between.

This series of graphs shows what happens across a range of cutpoints.
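
You can compute the same numbers directly, assuming a healthy population distributed N(0, 1) and a diseased population distributed N(2, 1):

```r
# Sensitivity and specificity across a range of cutpoints for two
# normal curves set two standard deviations apart
cutpoints <- seq(-1, 3, by = 0.5)
sens.c <- 1 - pnorm(cutpoints, mean = 2, sd = 1)   # P(above cut | diseased)
spec.c <- pnorm(cutpoints, mean = 0, sd = 1)       # P(below cut | healthy)
round(cbind(cutpoints, sens.c, spec.c), 3)
```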

When you graph the data on an SROC (summary ROC) plot, the studies trace out a curve rather than clustering around a single point.

By fitting a model to the diagnostic odds ratio, you can summarize accuracy in a way that allows for this shift in cutpoints from study to study.

When you fit the regression model, a slope close to zero suggests that the diagnostic odds ratio is roughly constant across cutpoints, so a single summary value is sensible.

It’s unclear whether to use a weighted regression model or an unweighted regression model for these data, so the code below fits both.

```r
fn.adj <- pmax(fn,0.5)                # avoid division by zero when fn = 0
tpr <- tp/(tp+fn.adj)
fpr <- fp/(tn+fp)
d <- logit(tpr)-logit(fpr)            # log diagnostic odds ratio
s <- logit(tpr)+logit(fpr)            # skew towards sensitivity/specificity
se.d <- sqrt(1/tp+1/fn.adj+1/tn+1/fp)
w <- 1/se.d^2                         # inverse variance weights
unweighted.regression <- lm(d~s)
weighted.regression <- lm(d~s,weights=w)
par(mar=c(5.1,4.1,0.6,0.6))
plot(s,d)
abline(unweighted.regression)         # solid line: unweighted fit
abline(weighted.regression,lty=2)     # dashed line: weighted fit
```

For this data set, you can also combine the log diagnostic odds ratios in a standard meta-analysis model.

```
deeks5.ma <- metagen(TE=d, seTE=se.d, studlab=study, sm="Log Diagnostic Odds Ratio")
summary(deeks5.ma)
Number of trials combined: 20

                     Log Diagnostic Odds Ratio           95%-CI      z  p.value
Fixed effects model                     1.9772 [1.5400; 2.4145] 8.8633 < 0.0001
Random effects model                    1.9732 [1.3618; 2.5847] 6.3249 < 0.0001

Quantifying heterogeneity:
tau^2 = 0.6555; H = 1.27 [1; 1.67]; I^2 = 38.4% [0%; 64%]

Test of heterogeneity:
     Q d.f. p.value
 30.87   19  0.0418

Method: Inverse variance method
```
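
If you want the summary on the odds ratio scale, you can exponentiate the estimate and confidence limits (the values are copied from the output above):

```r
# Back-transform the fixed effects log diagnostic odds ratio
round(exp(c(1.9772, 1.5400, 2.4145)), 1)   # 7.2 (4.7 to 11.2)
```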
