Criticisms of the IHME COVID-19 model

This blog post was created on 2020-04-23 and last modified on 2020-09-06.

I was asked to review the details of a prominent prediction model for the COVID-19 crisis. This model was developed by the Institute for Health Metrics and Evaluation and is commonly referred to as the IHME model. Here is a bit of background on the IHME model and some resources that discuss this model.

Overview of the IHME model

To avoid too much confusion, I will use the term “IHME model” to refer to the COVID-19 prediction model developed by the Institute for Health Metrics and Evaluation and reserve the term “the Institute” to refer to the organization itself. There is a class of alternative models based on epidemiologic principles of infectious diseases. They go by a variety of acronyms, but to simplify the language, I will refer to all of these models as SEIR models.

The Institute has been in business since 2007, so they are no Johnny-Come-Lately operation. The page that lists their staff members has well over 400 pictures. Not all of these people are working on COVID-19, I’m sure, but it appears to be a solid team effort.

There is a wonderful sense of openness and transparency about research these days, and the Institute is a good example of this. The Institute shares the code and (when it can) the data on its GitHub site. This allows other researchers to borrow their work and easily adapt it as needed. When the Institute makes changes to its work, anyone can quickly download these updates.

Here are some features of the IHME model that have stirred up so much debate.

The IHME model fits to deaths and not cases

The modelers made a conscious decision to use the number of daily deaths to estimate the model parameters rather than the number of new daily infections. The rationale for this choice is that there are substantial variations in testing rates in different regions of the country and different countries of the world. This, combined with the large number of people who are asymptomatic, leads to huge uncertainties in what the number of new infections is really measuring.

The number of deaths in a region, however, is not perfect itself. Many deaths could be uncounted if they occur at home or among infected but undiagnosed patients. The flip side of the coin is that many people who were old and frail may have died with or without the infection.

There is strong empirical evidence from comparing death rates before and after the start of the COVID-19 pandemic that we are greatly underestimating the number of deaths that we attribute to COVID-19.

Whether the underestimate of COVID-19 cases is worse than the underestimate of COVID-19 deaths is an open question. A key point to remember is that a consistent underestimate (everybody misses 25% of the cases or deaths) is not a serious problem for modeling, but a variable underestimate (some regions miss 10% and others miss 50%) is a big problem.
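Here is a small sketch of why a consistent underestimate is less harmful than a variable one. The two regions and their death counts are made up for illustration. The comparison of interest is the ratio of deaths between the regions, since comparisons across regions are what any pooled model relies on.

```python
# Hypothetical true cumulative deaths in two regions (made-up numbers).
true_deaths = {"A": 200.0, "B": 400.0}


def observed(true_counts, missed_fraction):
    """Apply a region-specific undercount to the true counts."""
    return {
        region: count * (1 - missed_fraction[region])
        for region, count in true_counts.items()
    }


def ratio_b_to_a(counts):
    """How many times more deaths region B has than region A."""
    return counts["B"] / counts["A"]


# Consistent underestimate: both regions miss 25% of deaths.
consistent = observed(true_deaths, {"A": 0.25, "B": 0.25})

# Variable underestimate: region A misses 10%, region B misses 50%.
variable = observed(true_deaths, {"A": 0.10, "B": 0.50})

true_ratio = ratio_b_to_a(true_deaths)
consistent_ratio = ratio_b_to_a(consistent)
variable_ratio = ratio_b_to_a(variable)

print(f"true ratio:       {true_ratio:.2f}")
print(f"consistent ratio: {consistent_ratio:.2f}")
print(f"variable ratio:   {variable_ratio:.2f}")
```

With a consistent undercount the ratio between regions is unchanged, so the relative picture survives. With a variable undercount the ratio shrinks substantially and region B looks much closer to region A than it really is.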

The alternative SEIR models generally use information from the daily infection rate to estimate parameters and build predictions, though they can incorporate information from the daily death rates as well. I have not seen any SEIR models that do this, but there are many SEIR models out there and I’ve only looked at a few.

The IHME model is atheoretical

The most widely criticized feature of the IHME model is that it is atheoretical. That means that the choices that the modelers made were driven solely by what seemed to fit the data best rather than what might be expected based on how infectious diseases are transmitted.

In contrast, most models of infectious disease epidemics use theoretical models. These models typically involve compartments that include groups of patients. The biggest compartment is the susceptible compartment, which represents the vast population of people in an area that might get the disease. A second compartment represents infected individuals. This is just a small number at the start of the epidemic, but they interact with the susceptible population and pull some of them into the infected compartment. The people in the infected compartment don’t stay infected forever. They move over time into a third compartment that represents people who have recovered and are no longer infectious. You can introduce a fourth compartment, such as exposed but not infected, or a fifth, but a theoretical model will, at a minimum, consider how easily someone gets infected and how long that person stays infectious before recovering.

The rates at which people move from susceptible to infected and from infected to recovered are defined as a system of differential equations. The structure of these differential equations is critical in controlling how quickly the epidemic spreads, and how quickly it subsides.
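For the simplest three-compartment version described above (susceptible, infected, recovered), a common textbook form of these equations is sketched below. The symbols are not from the discussion above and are assumptions of this sketch: S, I, and R are the number of people in each compartment, N is the total population, β is a transmission rate, and γ is a recovery rate.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
```

The three right-hand sides sum to zero, so the total population stays fixed. The product term βSI/N is what makes the system nonlinear: new infections need both susceptible and infected people, so the epidemic slows as the susceptible pool shrinks.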

These theoretical models are often referred to by the names of the compartments, so an SEIR model is one that has compartments for susceptible, exposed, infected, and recovered people.
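To make the compartment idea concrete, here is a minimal sketch of an SEIR simulation that steps the differential equations forward with a simple Euler method. The parameter names (beta, sigma, gamma) and their values are assumptions chosen for illustration. They are not estimates for COVID-19 and are not taken from any of the models discussed here.

```python
def simulate_seir(population, initial_infected, beta, sigma, gamma,
                  days, steps_per_day=10):
    """Step a basic SEIR model forward in time with Euler's method.

    beta:  transmission rate (susceptible -> exposed), per day
    sigma: rate of becoming infectious (exposed -> infected), per day
    gamma: recovery rate (infected -> recovered), per day
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative (made-up) parameter values.
history = simulate_seir(population=1_000_000, initial_infected=10,
                        beta=0.5, sigma=0.2, gamma=0.1, days=300)
infected = [state[2] for state in history]
peak_day = infected.index(max(infected))
print(f"infections peak on day {peak_day}")
print(f"still infected on day 300: {infected[-1]:.1f}")
```

Changing beta, sigma, or gamma changes how fast the curve rises and how slowly it falls, which is the sense in which the structure of the equations controls the shape of the epidemic.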

There are simulation models that are very close in spirit to the SEIR models, but which require a much greater degree of computational power. These simulations create random interactions among thousands or millions of individuals, where each interaction between an infected individual and a susceptible individual has a specified probability of spreading the disease. Other than the far greater amount of computing power needed, these models have the same basic features as the SEIR models.
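A toy version of this kind of simulation is sketched below. It is far simpler than any real simulation system: contacts are drawn completely at random, everyone is infectious for the same fixed number of days, and all of the parameter values are invented for illustration.

```python
import random


def simulate_agents(n_people, n_initial, contacts_per_day, p_transmit,
                    days_infectious, n_days, seed=0):
    """Simulate random daily contacts among individuals.

    Each person is "S" (susceptible), "I" (infected), or "R" (recovered).
    Returns the list of new infections per day and the final states.
    """
    rng = random.Random(seed)
    state = ["S"] * n_people
    days_left = [0] * n_people
    for person in rng.sample(range(n_people), n_initial):
        state[person] = "I"
        days_left[person] = days_infectious

    daily_new = []
    for _ in range(n_days):
        infectious = [p for p in range(n_people) if state[p] == "I"]
        newly_infected = set()
        for person in infectious:
            for _ in range(contacts_per_day):
                other = rng.randrange(n_people)
                # Only a contact with a susceptible person can spread disease.
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)
        # Infectious people move one day closer to recovery.
        for person in infectious:
            days_left[person] -= 1
            if days_left[person] == 0:
                state[person] = "R"
        # People infected today become infectious starting tomorrow.
        for person in newly_infected:
            state[person] = "I"
            days_left[person] = days_infectious
        daily_new.append(len(newly_infected))
    return daily_new, state


daily_new, final_state = simulate_agents(
    n_people=2000, n_initial=5, contacts_per_day=5, p_transmit=0.1,
    days_infectious=5, n_days=60)
print("ever infected:", 2000 - final_state.count("S"))
```

The same ingredients appear as in the compartment model: a probability of transmission per contact and a length of time that a person stays infectious. The difference is that every individual is tracked separately, which is why the computing cost grows so quickly.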

The conflict between atheoretical models and theoretical models has a long, long history. The atheoretical proponents believe in “letting the data speak for itself” while the theoretical proponents believe that models need a sound scientific foundation.

You can see some of this conflict play out in the interview that Dr. Murray has on the FiveThirtyEight website (see below). Dr. Murray argues that the theoretical models have too many assumptions and that modelers should let the actual data determine more of the model. The critics of the IHME model contend that the model changes as more data arrives, leading to unstable estimates. It’s important to note here that the model itself is changing and not just the estimates produced by the model.

You will never resolve the controversy over theoretical versus atheoretical models, but in infectious disease modeling, the theoretical models have several important advantages.

First, the SEIR models are battle-hardened. They have been used successfully many times in the past. Every year a new flu season offers a new test of these models and every year they seem to do reasonably well.

Second, the theoretical models are more transparent. You can explain a theoretical model in a paragraph or two and you can easily visualize how one theoretical model differs from another. The IHME model requires a lengthy appendix and is difficult reading even for a specialist like myself.

Third, the theoretical models allow you to more readily make extrapolations beyond the range of the data. You should always be cautious about extrapolations, but we do this all the time. Car manufacturers will offer six-year warranties on a car model that is less than six months old. The IHME model cannot extrapolate a decline in the daily death rate for a region that has not yet reached its peak as easily as an SEIR model can. We’ll talk more about this below.

Fourth, the theoretical models allow you to simulate scenarios that have not yet occurred, such as what happens when we introduce a vaccine or whether a lack of permanent immunity after recovery will lead to a new wave of infections.

The IHME model is based on the standard bell shaped curve

To model the cumulative number of deaths, the IHME model chose something called the error function, known fondly by its nickname, erf. You may not know erf, but you do know its twin, the normal distribution, given the nickname of bell. Fitting erf to the cumulative number of deaths is mathematically equivalent to fitting the classic bell shaped curve of the normal distribution to the daily number of deaths.
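You can check this equivalence numerically. The sketch below writes the cumulative curve in terms of erf and the daily curve as a scaled normal density, then confirms that the slope of the first matches the second. The parameterization (a total, a peak day, and a spread) and the numbers are assumptions made for this illustration and are not necessarily the parameterization used in the IHME model.

```python
import math


def cumulative_deaths(t, total, peak_day, spread):
    """Cumulative deaths by day t, written with the error function."""
    z = (t - peak_day) / (spread * math.sqrt(2))
    return total / 2 * (1 + math.erf(z))


def daily_deaths(t, total, peak_day, spread):
    """Daily deaths on day t: a bell curve scaled to the total."""
    z = (t - peak_day) / spread
    return total / (spread * math.sqrt(2 * math.pi)) * math.exp(-0.5 * z * z)


# Made-up values for illustration.
total, peak_day, spread = 1000.0, 30.0, 7.0

# Compare the numerical slope of the cumulative curve to the bell curve.
h = 1e-4
max_gap = 0.0
for t in (10, 25, 30, 45, 60):
    slope = (cumulative_deaths(t + h, total, peak_day, spread)
             - cumulative_deaths(t - h, total, peak_day, spread)) / (2 * h)
    max_gap = max(max_gap,
                  abs(slope - daily_deaths(t, total, peak_day, spread)))

print(f"largest gap between slope and bell curve: {max_gap:.2e}")
```

The gap is nothing more than numerical rounding, which is what you would expect if the daily curve is the derivative of the cumulative curve.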

There is no good theoretical reason to expect death rates in an epidemic to follow a bell shaped curve, but the curve does have a fair amount of flexibility and can fit epidemics that start slowly and epidemics that come on quickly. It can fit death rates that peak at a modest number of daily deaths and death rates that reach a very large number of daily deaths.

The one thing that the bell curve cannot handle, however, is an epidemic that rises quickly to a peak, but then fades slowly. The symmetry of the bell shaped curve is a serious limitation.

The modelers fixed this in a revision. The new model identifies a peak and fits a single bell shaped curve to that peak. But then it allows for a mixture of 12 additional bell shaped curves on either side of the peak that can be weighted to produce an asymmetric curve that rises quickly and falls slowly (or vice versa).
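The sketch below shows the general idea of a weighted mixture, not the Institute's actual procedure. It uses only five bell curves rather than the number in the revised model, and the centers, weights, and spread are all invented for illustration.

```python
import math


def bell(t, center, spread):
    """Normal density with the given center and spread."""
    z = (t - center) / spread
    return math.exp(-0.5 * z * z) / (spread * math.sqrt(2 * math.pi))


def mixture(t, centers, weights, spread):
    """Weighted sum of bell curves sharing a common spread."""
    return sum(w * bell(t, c, spread) for c, w in zip(centers, weights))


# Made-up mixture: most of the weight sits on the first curve, with
# progressively smaller weights on curves centered later in time.
centers = [30, 34, 38, 42, 46]
weights = [0.40, 0.25, 0.18, 0.10, 0.07]
spread = 4.0

days = range(0, 81)
curve = [mixture(t, centers, weights, spread) for t in days]
peak_day = curve.index(max(curve))

# Compare the curve ten days before and ten days after its peak.
before_peak = curve[peak_day - 10]
after_peak = curve[peak_day + 10]
print(f"peak day: {peak_day}")
print(f"10 days before peak: {before_peak:.4f}")
print(f"10 days after peak:  {after_peak:.4f}")
```

Because the extra curves all sit to the right of the first one, the mixture is much higher ten days after the peak than ten days before it. Shifting the weights to curves on the left would produce the opposite shape.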

This fixes one problem but it causes another. The large number of extra bell curves greatly increases the chances for overfitting. It looks like the modelers have taken steps to control for overfitting, but the whole concept screams for a simpler solution, such as a cubic spline fit.

The IHME model relies heavily on data from China, South Korea, and Italy

If you are in the early phase of an epidemic, everything looks like an exponential curve. Every day you see more deaths than the previous day. This is what that curve looks like.

But then the curve reaches an inflection point. The daily deaths are still increasing, but you can see a deceleration.

Once you are past the inflection point, you can start to forecast further into the future. You can get an initial estimate of when the number of deaths reaches its maximum and when it will die down to zero again. But these estimates are tricky and the confidence limits are very wide.

What you need to predict when the epidemic might end is some data on the other side of the mountain, data where the number of daily deaths starts its decline.

Fair enough, but the regions that are still early in the pandemic still want predictions on when they will reach their peak and when they will eventually decline to zero. The IHME model does this through a mixed model approach.

The mixed model is a well-established approach in Statistics. It is used a lot in Epidemiology, especially in small area estimation (getting estimated rates in regions that don’t have a lot of data). The mixed model borrows information from other regions that are (hopefully) similar and provides much more precision than you could get by relying on the one region alone.
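The flavor of this borrowing can be shown with a deliberately simplified shrinkage calculation. This is not the mixed model used by the Institute. It is only a sketch in which each region's estimate is a weighted average of its own mean and the overall mean, with a weight that grows with the amount of data in the region. The region names, the data, and the prior_strength setting are all hypothetical.

```python
def shrunken_estimates(region_data, prior_strength):
    """Pull each region's mean toward the overall mean.

    region_data:    dict mapping region name -> list of observations
    prior_strength: pseudo-count controlling how hard small regions
                    are pulled toward the overall mean (an assumption)
    """
    all_values = [v for values in region_data.values() for v in values]
    overall_mean = sum(all_values) / len(all_values)
    estimates = {}
    for region, values in region_data.items():
        n = len(values)
        region_mean = sum(values) / n
        weight = n / (n + prior_strength)  # more data -> trust own mean more
        estimates[region] = weight * region_mean + (1 - weight) * overall_mean
    return estimates, overall_mean


# Made-up rates: one region with plenty of data, one with a single value.
region_data = {
    "large": [0.08, 0.10, 0.12, 0.09, 0.11, 0.10, 0.10, 0.10],
    "small": [0.30],
}
estimates, overall_mean = shrunken_estimates(region_data, prior_strength=4)

for region, values in region_data.items():
    raw = sum(values) / len(values)
    print(f"{region}: raw mean {raw:.3f}, shrunken {estimates[region]:.3f}")
```

The region with a single observation is pulled most of the way toward the overall mean, while the region with eight observations barely moves. That is the sense in which a region with little data borrows precision from the others, and also why the result depends so heavily on whether the other regions really are similar.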

For the IHME model, the borrowing of information occurs mostly on the far-side (the declining side) of the daily death rate. This is especially true for those regions that still only have data on the near-side (the rising side) of the daily death rate. The regions that have data on the far-side are mostly in China, South Korea, and Italy.

Does it make sense to base the decline in the daily death rate in Chattahoochee county, Georgia on the decline in the daily death rate in Bergamo, Italy? Well, no not really, but what other choice do you have?

The theoretical models in Epidemiology provide an alternate way to extrapolate the far-side of these curves. The shape of the curves is determined by the rates at which people move from one compartment to another in the SEIR model and variations on that model. You can get partial information on the various rates from the near-side data and then plug in information from earlier epidemics to help with predictions on the far-side.

Both the IHME model and the SEIR models require you to mix apples and oranges. In the IHME model, the oranges are places like Bergamo, Italy. In the SEIR model, the oranges are epidemics like H1N1 influenza. The SEIR oranges, though, have one advantage over the IHME oranges. There are many more SEIR oranges to choose from. The IHME oranges are, at least for now, limited to China, South Korea, and Italy.

The IHME model has changed over time

All models provide different predictions as you accumulate more data. This leads to an inconsistent message over time and is largely unavoidable in any model. The IHME model, however, has undergone changes in the underlying mechanics that have increased its instability over time. Some of these changes were made to address criticisms of the model and some were made to get better fits to the data. One of the changes was noted above in the replacement of a single bell shaped curve with a weighted average of multiple bell shaped curves.

The desire for consistency can be overrated. Stephen Senn talks about a medical biostatistician who doesn’t believe in America because it wasn’t in Christopher Columbus’s original research plan. Still, the changes are a legitimate concern among some of the critics of the IHME model.

Resources

IHME documentation of their model

Murray CJL. Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries. Preprint, 2020-04-21. Available in pdf format.

At the time of the writing of this webpage, this was the most current description of the IHME model. The appendices provide many of the technical details needed to understand the basis of this model.

Murray CJL. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. Preprint, 2020-03-26. Available in pdf format.

This article describes an earlier version of the IHME model.

Xu B, Gutierrez B, Mekaru S, Sewalk K, Goodwin L, Loskill A, Cohn EL, Hswen Y, Hill SC, Cobo MM, Zarebski AE, Li S, Wu C-H, Hulland E, Morgan JD, Wang L, O’Brien K, Scarpino SV, Brownstein JS, Pybus OG, Pigott DM, Kraemer MUG. Epidemiological data from the COVID-19 outbreak, real-time case information. Scientific Data, 2020-03-24. doi:10.1038/s41597-020-0448-0. Available in html format or pdf format.

This article describes how the initial datasets used by the IHME model were collected. In a rapidly changing environment, some of the details may have changed, and at least one of the links in the article points to a GitHub page that no longer exists. But most of the links are good. The breadth of the data collection is very impressive and the authors do a good job documenting everything.

IHME COVID-19 model FAQs. Available in html format.

This webpage talks in non-technical terms about some of the choices that the researchers made for the IHME model.

Articles in the popular press

Aizenman N. 5 Key Facts Not Explained In White House COVID-19 Projections. National Public Radio 2020-04-01. Available in html format.

The focus of this article is on the White House press briefing and the uncertain source of some of the numbers used in that briefing. It provides, however, a good amount of detail about the IHME model and contrasts it with a model based on infectious disease theory, the Imperial College model.

Bui Q, Katz J, Parlapiano A, and Sanger-Katz, M. What 5 Coronavirus Models Say the Next Month Will Look Like. The New York Times, 2020-04-22. Available in html format.

This article got me interested in the differences among the various models and has links that provide more details about the IHME model and several different SEIR models.

Chow D. What we know about the coronavirus model the White House unveiled. NBC News, 2020-04-01. Available in html format.

This article was also written in response to that White House press briefing. It emphasizes the many sources of uncertainty in these predictions.

Articles in the peer-reviewed literature

Jewell NP, Lewnard JA, Jewell BL. Caution Warranted: Using the Institute for Health Metrics and Evaluation Model for Predicting the Course of the COVID-19 Pandemic. Ann Intern Med. 2020. Available in html format.

This is a criticism of the IHME model by prominent researchers published in a prominent journal. The authors seem to throw everything they can at the model and not everything sticks. They note that the IHME model fails to account for the rebound that might occur if social distancing efforts are relaxed too soon, but the IHME model clearly mentions this in a disclaimer. They criticize the symmetry of the IHME model, which was an issue with earlier versions, but which is not a problem with the current model. They criticize the unreliability of the data, but this unreliability is an issue with any model. They mention that the model fails to capture all sources of uncertainty, but this is also an issue with the competing models. Where the criticism hits the mark is in how much this model relies on extrapolations from places that have data well past the peak of maximum cases/deaths (such as Italy, China, and South Korea) to places that have not yet reached their peak or are just barely past their peak.

Chowell G, Sattenspiel L, Bansal S, Viboud C. Mathematical models to characterize early epidemic growth: A review. Phys Life Rev. 2016;18:66-97. doi:10.1016/j.plrev.2016.07.005. Available in html format or pdf format.

This article was written before the COVID-19 pandemic. It provides a nice review of how SEIR models work and applies them to a variety of pandemics.

Kissler SM, Tedijanto C, Goldstein E, Grad YH, Lipsitch M. Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period. Science, 2020-04-14. doi:10.1126/science.abb5793. Available in html format.

This article compares SARS-CoV-2 (the virus that causes COVID-19) to previous viruses to assess the chances that we will see future seasonal waves of COVID-19. This is an example of the type of modeling that can only be done with a theoretical model.

FRED Modeling System. Available in html format.

FRED stands for Framework for Reconstructing Epidemiological Dynamics and is a software system that uses simulation to model epidemics. It has been used to model outbreaks of measles and influenza, but not yet (as far as I can tell) for COVID-19. It is available for anyone to download and use on their own.

Ayoub HH, Chemaitelly H, Mumtaz GR, Seedat S, Awad SF, Makhoul M, Abu-Raddad LJ. Characterizing key attributes of the epidemiology of COVID-19 in China: Model-based estimations. Preprint published in Medrxiv, 2020-04-11. Available in pdf format.

This article fits an SEIR model to the Chinese COVID outbreak. It stratifies by age and outlines the differential equations in great clarity.

Alternative prediction models

There are many prediction models out there and each one seems to take a different approach. I’m just getting started but I want to provide a brief summary of some of these models.

Imperial College London

US Predictions

Iowa State

Wang L, Wang G, Gao L, Li X, Yu S, Kim M, Wang Y, Gu Z. Spatiotemporal Dynamics, Nowcasting and Forecasting of COVID-19 in the United States. Preprint published in Arxiv, 2020-04-30. Available in pdf format.

This article fits an SEIR model to individual counties in the United States with a large number of county specific covariates. The model produces a nice dashboard.

Coda

I am not an expert in infectious disease epidemiology. In fact, I’m not much of an expert in anything. I am more or less a generalist. I know a little bit about a lot of different things. I can’t really compete with teams of researchers who have been modeling epidemics for many decades. If I can contribute something, though, it is in a critical appraisal of the various prediction models that are out there for COVID-19.

A critical appraisal is not a criticism. It is not a listing of all the faults that a model might have. Rather, it is a description of the features that a model has. It explains the rationale for these features and contrasts them with alternatives. It presents both the weaknesses AND THE STRENGTHS of a model.

Everyone, including myself, gets trapped into a hypercritical approach to critical appraisal. The easiest thing in the world is to build a laundry list of faults. This can lead to a sense of cynicism towards all research, as all research has flaws. A good critical appraisal compares not to a standard of perfection in modeling, but rather to other practical modeling approaches.

I hope I have not been too hypercritical in my review of the IHME model. Unfortunately, if there is anything I’ve learned over the years, it is a sense of humility about my own limitations. The faults I point out about others are the faults I suffer from myself. I can’t get rid of my faults, but I can try to mitigate them.

One of the sites listed above linked to an excellent article that got me to think a bit on the philosophical side of research.

Gog JR. How you can help with COVID-19 modelling. Nat Rev Phys, 2020-04-08. https://doi.org/10.1038/s42254-020-0175-7. Available in html format and pdf format.

Acknowledgments

Thanks to Chris Barker and Ruth Etzioni for helpful suggestions.

Steve Simon