1. Introduction
The ongoing COVID-19 pandemic is the most significant pandemic since the 1918 influenza pandemic. It has already caused over 21 million confirmed cases and 758,000 deaths (numbers as of August 14, 2020; see https://coronavirus.jhu.edu/map.html and https://nssac.bii.virginia.edu/covid19/dashboard/ for the most up to date surveillance information). The economic impact is already in the trillions of dollars. As in other pandemics, researchers and public health policy makers are interested in questions such as (see https://www.nytimes.com/newsevent/coronavirus): (i) How did it start? (ii) How is it likely to progress and how can we control it? (iii) How can we intervene while balancing public health and economic impact? (iv) Why did some countries do better than others thus far into the pandemic? In particular, models and their projections/forecasts have received unprecedented attention. With a multitude of modeling frameworks, underlying assumptions, available datasets and regions/timeframes being modeled, these projections have varied widely, causing confusion among end-users and consumers. We believe a (non-exhaustive) overview of the current modeling landscape will benefit the readers and also serve as a historical record for future efforts.
1.1. Role of models
Models have been used by mathematical epidemiologists to support a broad range of policy questions, and their use during COVID-19 has been widespread. In general, the type and form of models used in epidemiology depend on the phase of the epidemic. Before an epidemic, models are used for planning, identifying critical gaps, and preparing plans to detect and respond in the event of a pandemic. At the start of a pandemic, policy makers are interested in questions such as: (i) where and how did the pandemic start, (ii) the risk of its spread in the region, (iii) the risk of importation into other regions of the world, and (iv) a basic understanding of the pathogen and its epidemiological characteristics. As the pandemic takes hold, researchers begin investigating: (i) various intervention and control strategies (usually pharmaceutical interventions do not work in the event of a pandemic, and thus non-pharmaceutical interventions are most appropriate); (ii) forecasting the epidemic incidence rate, hospitalization rate and mortality rate; (iii) efficiently allocating scarce medical resources to treat patients; and (iv) understanding the change in individual and collective behavior and adherence to public policies. After the pandemic starts to slow down, modelers are interested in developing models related to recovery and the long term impacts caused by the pandemic.
As a result, comparing models needs to be done with care. When comparing models, one needs to specify: (i) the purpose of the model, (ii) the end user to whom the model is targeted, (iii) the spatial and temporal resolution of the model, and (iv) the underlying assumptions and limitations. We illustrate these issues by summarizing a few key methods for projection and forecasting of disease outcomes in the US and Sweden.
Organization. The paper is organized as follows. In Section 2 we give preliminary definitions. Section 3 discusses US and UK centric models developed by researchers at Imperial College. Section 4 discusses metapopulation models focused on the US that were developed by our group at UVA and by researchers at Northeastern University. Section 5 describes models developed by Swedish researchers for studying the outbreak in Sweden. In Section 6 we discuss methods developed for forecasting. Section 8 contains the discussion, model limitations and concluding remarks. In a companion paper that appears in this special issue, we address certain complementary issues related to pandemic planning and response, including the role of data and analytics.
Important note. The primary purpose of this paper is to highlight some of the salient computational models currently being used to support the COVID-19 pandemic response. These models, like all models, have their strengths and weaknesses; they have all faced challenges arising from the lack of timely data. Our goal is not to pick winners and losers among these models; each model has been used by policy makers and continues to be used to advise various agencies. Rather, our goal is to introduce to the reader a range of models that can be used in such situations. A simple model is no better or worse than a complicated model. The suitability of a specific model for a given question needs to be evaluated by the decision maker and the modeler.
2. Background: computational methods for epidemiology
Epidemiological models fall into two broad classes: statistical models that are largely data driven, and mechanistic models that are based on underlying theoretical principles, developed by scientists, about how the disease spreads.
Data-driven models use statistical and machine learning methods to forecast outcomes, such as case counts, mortality and hospital demands. This is a very active area of research, and a broad class of techniques has been developed, including auto-regressive time series methods, Bayesian techniques and deep learning [AXR+19, PER20, DKB+19, RMY+19, FCK+18, MUR20]. Mechanistic models of disease spread within a population [ABV+08, NEW03, MV13, EKM+06] use mechanistic (also referred to as procedural or algorithmic) methods to describe the evolution of an epidemic through a population. The most common of these are SIR type models. Hybrid models that combine mechanistic models with data-driven machine learning approaches are also starting to become popular, e.g., [WCM19].

2.1. Mass action compartmental models
There are a number of models which are referred to as the SIR class of models. These partition a population of agents into three sets, each corresponding to a disease state: susceptible ($S$), infective ($I$) and removed or recovered ($R$). The specific model then specifies how susceptible individuals become infectious, and then recover. In its simplest form (referred to as the basic compartmental model) [ABV+08, NEW03, MV13], the population is assumed to be completely mixed. Let $S(t)$, $I(t)$ and $R(t)$ denote the number of people in the susceptible, infected and recovered states at time $t$, respectively, and let $N = S(t) + I(t) + R(t)$ denote the total population size. Let $s(t) = S(t)/N$, $i(t) = I(t)/N$ and $r(t) = R(t)/N$ denote the corresponding fractions. Then, the SIR model can be described by the following system of ordinary differential equations:
\[
\frac{ds}{dt} = -\beta s i, \qquad \frac{di}{dt} = \beta s i - \gamma i, \qquad \frac{dr}{dt} = \gamma i,
\]
where $\beta$ is referred to as the transmission rate, and $\gamma$ is the recovery rate. A key parameter in such a model is the “reproductive number”, denoted by $R_0 = \beta/\gamma$. At the start of an epidemic, much of the public health effort is focused on estimating $R_0$ from observed infections [LCC+03].

Mass action compartmental models have been the workhorse for epidemiologists and have been widely used for over 100 years. Their strength comes from their simplicity, both analytically and from the standpoint of understanding the outcomes. Software systems have been developed to solve such models, and a number of associated tools have been built to support analysis using such models.
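To make the dynamics concrete, the system above can be integrated numerically in a few lines. The following sketch uses simple Euler stepping; the parameter values ($\beta = 0.3$, $\gamma = 0.1$, so $R_0 = 3$) are hypothetical illustrations, not estimates for COVID-19:

```python
# Numerical sketch of the basic SIR model in normalized form
# (s + i + r = 1), using Euler stepping. beta (transmission rate) and
# gamma (recovery rate) are hypothetical illustration values.

def simulate_sir(beta, gamma, i0, days, dt=0.01):
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt    # ds/dt = -beta * s * i
        new_rec = gamma * i * dt       # dr/dt = gamma * i
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return s, i, r

beta, gamma = 0.3, 0.1                 # reproductive number R0 = beta/gamma = 3
s, i, r = simulate_sir(beta, gamma, i0=1e-4, days=365)
print(f"R0 = {beta / gamma:.1f}, final susceptible fraction = {s:.3f}")
```

For $R_0 = 3$ the simulated final susceptible fraction approaches the classical final-size solution of $s_\infty = e^{-R_0 (1 - s_\infty)}$, roughly 6% of the population.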
2.2. Structured metapopulation models
Although simple and powerful, mass action compartmental models do not capture the inherent heterogeneity of the underlying populations. A significant amount of research has been conducted to extend the model, usually in two broad ways. The first involves structured metapopulation models: these construct an abstraction of the mixing patterns in the population into different subpopulations, e.g., age groups or small geographical regions, and attempt to capture the heterogeneity in mixing patterns across subpopulations. In other words, the model has compartmental states (e.g., $S_i$, $I_i$, $R_i$) for each subpopulation $i$. The evolution of a compartment is determined by mixing within and across compartments. For instance, survey data on mixing across age groups [MHJ+08] has been used to construct age structured metapopulation models [MG09]. More relevant for our paper are spatial metapopulation models, in which the subpopulations are connected through airline and commuter flow networks [BCG+09, VCF+19, GyR+14, ZSC+17, CDA+20].
Main steps in constructing structured metapopulation models: This depends on the disease, population and the type of question being studied. The key steps in the development of such models for the spread of diseases over large populations include:

Constructing subpopulations and compartments: the entire population is partitioned into subpopulations $P_1, \ldots, P_k$, within which mixing is assumed to be complete. Depending on the disease model, there are compartments (e.g., $S_i$, $I_i$, $R_i$, and more, depending on the disease) corresponding to each subpopulation $P_i$; these represent the number of individuals of $P_i$ in the corresponding state.

Mixing patterns among compartments: state transitions between compartments might depend on the states of individuals within the subpopulations associated with those compartments, as well as those who they come in contact with. For instance, the transition rate out of $S_i$ might depend on $I_j$ for all the subpopulations $P_j$ that come in contact with individuals in $P_i$. Mobility and behavioral datasets are needed to model such interactions.
Such models are very useful in the early days of an outbreak, when the disease dynamics are driven to a large extent by mobility, which can be captured more easily within such models, and there is significant uncertainty in the disease model parameters. They can also model coarser interventions, such as reduced mobility between spatial units and reduced mixing rates. However, these models become less useful for modeling the effect of detailed interventions (e.g., voluntary home isolation, school closures) on disease spread in and across communities.
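The construction described above can be sketched as a minimal two-patch model. The subpopulations, the mixing matrix and all parameter values below are hypothetical, chosen only to show how mobility couples the subpopulations:

```python
# Sketch of a two-patch metapopulation SIR model. M[j][k] gives the
# fraction of patch j's contacts that occur with patch k; all values
# here are hypothetical illustrations, not calibrated estimates.

def simulate_patches(pops, M, beta, gamma, seed_patch, i0, days, dt=0.1):
    n = len(pops)
    S = [p - (i0 if j == seed_patch else 0.0) for j, p in enumerate(pops)]
    I = [i0 if j == seed_patch else 0.0 for j in range(n)]
    R = [0.0] * n
    for _ in range(int(days / dt)):
        # force of infection in patch j: contacts weighted by mixing matrix
        lam = [beta * sum(M[j][k] * I[k] / pops[k] for k in range(n))
               for j in range(n)]
        for j in range(n):
            new_inf = lam[j] * S[j] * dt
            new_rec = gamma * I[j] * dt
            S[j] -= new_inf
            I[j] += new_inf - new_rec
            R[j] += new_rec
    return S, I, R

pops = [1_000_000, 500_000]
M = [[0.95, 0.05],          # patch 0 residents mix mostly at home
     [0.10, 0.90]]          # patch 1 residents commute a bit more
S, I, R = simulate_patches(pops, M, beta=0.3, gamma=0.1,
                           seed_patch=0, i0=10, days=365)
print("attack rates:", [R[j] / pops[j] for j in range(2)])
```

Even weak coupling lets the epidemic seeded in patch 0 spill into patch 1; richer mixing matrices (age groups, commuter flows) follow the same pattern.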
2.3. Agent based network models
Agent-based networked models (sometimes just called agent-based models) extend metapopulation models further by explicitly capturing the interaction structure of the underlying populations. Often such models are also resolved at the level of single individual entities (animals, humans, etc.). In this class of models, the epidemic dynamics can be modeled as a diffusion process on a specific undirected contact network $G(V, E)$ on a population $V$; each edge $e = (u, v) \in E$ implies that individuals $u, v \in V$ come into contact. (Note that though an edge is represented as a tuple $(u, v)$, it actually denotes the set $\{u, v\}$, as is common in graph theory.) Let $N(v)$ denote the set of neighbors of $v$; Figure 1 illustrates the neighbor sets in an example graph. The SIR model on the graph is a dynamical process in which each node is in one of the $S$, $I$ or $R$ states. Infection can potentially spread from $u$ to $v$ along edge $e = (u, v)$ with a probability $\beta(e, t)$ at time instant $t$ after $u$ becomes infected, conditional on node $v$ remaining uninfected until time $t$; this is a discrete version of the rate of infection for the ODE model discussed earlier. We let $I_t$ denote the set of nodes that become infected at time $t$. The (random) subset of edges on which the infections spread represents a disease outcome, and is referred to as a dendogram. This dynamical system starts with a configuration in which there are one or more nodes in state I and reaches a fixed point in which all nodes are in states S or R. Figure 1 shows an example of the SIR model on a network.

Main steps in setting up an agent based model. While the specific steps depend on the disease, the population, and the type of question being studied, the general process involves the following steps:

Construct a network representation $G(V, E)$: the set $V$ is the population in a region, and is available from different sources, such as Census and Landscan data. However, the contact patterns are more difficult to model, as no real data is available on contacts between people at a large scale. Instead, researchers have tried to model activities and mobility, from which contacts can be inferred based on co-location. Multiple approaches have been developed for this, including random mobility based on statistical models, and very detailed models based on activities in urban regions, which have been estimated through surveys, transportation data, and other sources, e.g., [EKM+06, BBK+09, EGK+04, LNX+05, FLN+20].

Develop models of within-host disease progression: such models can be represented as finite state probabilistic timed transition models, which are designed in close coordination with biologists and epidemiologists, and parameterized using detailed incidence data (see [MV13] for discussion and additional pointers).

Develop high performance computing (HPC) simulations to study epidemic dynamics in such models, e.g., [BBE+08, BCF+09, DBC+12, GBR+13]. Typical public health analyses involve large experimental designs, and the models are stochastic; this necessitates the use of such HPC simulations on large computing clusters.
Such a network model captures the interplay between the three components of computational epidemiology: (i) individual behaviors of agents, (ii) unstructured, heterogeneous multiscale networks, and (iii) the dynamical processes on these networks. It is based on the hypothesis that a better understanding of the characteristics of the underlying network and of individual behavioral adaptation can give better insights into contagion dynamics and response strategies. Although computationally expensive and data intensive, network-based epidemiology alters the types of questions that can be posed, providing qualitatively different insights into disease dynamics and public health policies. It also allows policy makers to formulate and investigate potentially novel and context specific interventions.
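The discrete-time network diffusion described in this section can be sketched as follows. The small contact graph, the per-edge transmission probability, and the one-step infectious period are all simplifying, hypothetical choices:

```python
import random

# Sketch of a discrete-time SIR process on a contact network. Each
# infected node is infectious for exactly one step here (a
# simplification); p is the per-edge, per-step transmission probability.
# The graph and p are hypothetical.

def network_sir(adj, seeds, p, rng):
    state = {v: "S" for v in adj}
    for v in seeds:
        state[v] = "I"
    infected_order = list(seeds)
    frontier = set(seeds)
    while frontier:
        new_frontier = set()
        for u in frontier:
            for v in adj[u]:               # try to infect each neighbor
                if state[v] == "S" and rng.random() < p:
                    state[v] = "I"
                    new_frontier.add(v)
                    infected_order.append(v)
        for u in frontier:
            state[u] = "R"                 # recover after one infectious step
        frontier = new_frontier
    return infected_order

# A small hypothetical contact network as undirected adjacency lists.
adj = {"a": ["b", "c"], "b": ["a", "c", "d"],
       "c": ["a", "b", "d"], "d": ["b", "c", "e"], "e": ["d"]}
rng = random.Random(42)                    # seeded for reproducibility
outbreak = network_sir(adj, seeds=["a"], p=0.9, rng=rng)
print("infection order:", outbreak)
```

The edges along which infections actually spread (recorded implicitly by the infection order) form one random disease outcome; repeated runs with different seeds sample the distribution of outcomes.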
2.4. Models for epidemic forecasting
Like projection approaches, models for epidemic forecasting can be broadly classified into two groups: (i) statistical and machine learning based data-driven models, and (ii) causal or mechanistic models; see [DKB+19, RMY+19, NBR+14, CGS+14, KS19, TCR+17b, BRO20] and the references therein for the current state of the art in this rapidly evolving field.

Statistical methods employ statistical and time-series based methodologies to learn patterns in historical epidemic data and leverage those patterns for forecasting. The simplest yet useful class is called the method of analogues: one simply compares the current epidemic with earlier outbreaks and then uses the best match to forecast the current epidemic. Popular statistical methods for forecasting influenza like illnesses (which include COVID-19) include, e.g., generalized linear models (GLM), autoregressive integrated moving average (ARIMA), and generalized autoregressive moving average (GARMA) [KS19, 42, CMo20]. Statistical methods are fast, but they crucially depend on the availability of training data. Furthermore, since they are purely data driven, they do not capture the underlying causal mechanisms. As a result, epidemic dynamics affected by behavioral adaptations are usually hard to capture. Artificial neural networks (ANN) have gained increased prominence in epidemic forecasting due to their self-learning ability without prior knowledge (see [WCM19, WCM20, AXR+19] and the references therein). Such models have used a wide variety of data as surrogates for producing forecasts, including: (i) social media data, (ii) weather data, (iii) incidence curves and (iv) demographic data.

Causal models can be used for epidemic forecasting in a natural manner [FCK+18, NBR+14, FHC+18, TLH+17a, CGS+14, YKS17]. These models calibrate the internal model parameters using the disease incidence data seen until a given day and then execute the model forward in time to produce the future time series. Compartmental as well as agent-based models can be used to produce such forecasts. The choice of model depends on the specific question at hand and on computational and data resource constraints. One of the key ideas in forecasting is to develop ensemble models, i.e., models that combine forecasts from multiple models [CKL+14, RMY+19, YKS17, TLH+17a]. The idea, which originated in the domain of weather forecasting, has since seen methodological advances in the machine learning literature. Ensemble models typically show better performance than the individual models.
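The ensemble idea can be illustrated with a toy sketch in which component forecasters are weighted by their recent accuracy. The component models and the incidence series below are hypothetical stand-ins, not real forecasters:

```python
# Sketch of a performance-weighted ensemble forecast. The component
# models and the incidence series are hypothetical stand-ins.

def persistence(series):            # naive model: tomorrow equals today
    return series[-1]

def linear_trend(series):           # naive model: extrapolate last difference
    return series[-1] + (series[-1] - series[-2])

def ensemble_forecast(series, models, window=3):
    # Weight each model by the inverse of its mean absolute error over a
    # recent window of retrospective one-step-ahead forecasts.
    weights = []
    for m in models:
        errs = [abs(m(series[:t]) - series[t])
                for t in range(len(series) - window, len(series))]
        weights.append(1.0 / (sum(errs) / window + 1e-9))
    total = sum(weights)
    return sum((w / total) * m(series) for w, m in zip(weights, models))

incidence = [10, 12, 15, 19, 24, 30]    # hypothetical daily case counts
models = [persistence, linear_trend]
print("ensemble one-step forecast:", ensemble_forecast(incidence, models))
```

Here the trend model has been more accurate recently, so the ensemble forecast lands much closer to its prediction than to the persistence model's.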
3. Models from the Imperial College Modeling Group (UK Model)
Background. The modeling group led by Neil Ferguson developed, to our knowledge, the first model to study the impact of COVID-19 across two large countries, the US and UK; see [FLN+20]. The basic model was first developed in 2005: it was used to inform policy pertaining to the H5N1 pandemic, was one of the three models used to inform the federal pandemic influenza plan, and led to the now well accepted targeted layered containment (TLC) strategy. It was adapted to COVID-19 as discussed below. The model was widely discussed and covered in the scientific as well as the popular press [ADA20]. We will refer to it as the IC model.
Model Structure. The basic model structure consists of developing a set of households based on census information for a given country; the structure is largely borrowed from the group's earlier work, see [HFE+08, FCF+06]. Landscan data was used to spatially distribute the population. Individual members of a household interact with other members of the household. The data to produce these households is obtained from census information for these countries; census data is used to assign ages and household sizes. Details on the resolution of the census data and its dates were not clear. Schools, workplaces and random meeting points are then added. The school data for the US was obtained from the National Center for Education Statistics, while for the UK schools were assigned randomly based on population density. Data on average class sizes and staff-student ratios were used to generate a synthetic population of schools distributed proportionally to local population density. Data on the distribution of workplace sizes was used to generate workplaces, with commuting distance data used to locate workplaces appropriately across the population. Individuals are assigned to each of these locations at the start of the simulation. A gravity style kernel is used to decide how far a person can go in terms of attending a work, school or community interaction place. The number of contacts between individuals at school, work and community meeting points is calibrated to produce a given attack rate.
Each individual has an associated disease transmission model. The disease transmission model parameters are based on data collected when the pandemic was evolving in Wuhan; see page 4 of [FLN+20].
Finally, the model also has a rich set of interventions. These include: (i) case isolation, (ii) voluntary home quarantine, (iii) social distancing of those over 70 years, (iv) social distancing of the entire population, and (v) closure of schools and universities; see page 6 of [FLN+20]. The code was recently released and is being analyzed. This is important, as the interpretation of these interventions can have substantial impact on the outcome.
Model predictions. The IC model was one of the first models to evaluate the COVID-19 pandemic using a detailed agent-based model. The predictions made by the model were quite dire. The results show that to reduce $R$ to close to 1 or below, a combination of case isolation, social distancing of the entire population, and either household quarantine or school and university closure is required. The model had tremendous impact: the UK and US both decided to start considering complete lockdowns, a policy that was practically impossible to even talk about earlier in the Western world. The paper came out around the same time that the Wuhan epidemic was raging and the epidemic in Italy had taken a turn for the worse. This made the model results even more critical.
Strengths and Limitations. The IC model was one of the first models by a reputed group to report the potential impact of COVID-19 with and without interventions. The model was far more detailed than other models published until then. The authors also took great care in parameterizing the model with the best disease transmission data available at the time. The model also considered a very rich set of interventions and was one of the first to analyze pulsing interventions. On the flip side, the representation of the underlying social contact network was relatively simple. Second, the details of how interventions were represented were often not clear. Since the publication of their article, the modelers have made their code open, and the research community has witnessed an intense debate on the pros and cons of various modeling assumptions and the resulting software system, see [CHA20]. We believe that despite certain valid criticisms, the results represented a significant advance, in terms both of when they were put out and of the level of detail incorporated in the models.
4. Spatial metapopulation models: Northeastern and UVA models (US Models)
Background. This approach is an alternative to detailed agent-based models, and has been used in modeling the spread of multiple diseases, including influenza [BCG+09, VCF+19], Ebola [GyR+14] and Zika [ZSC+17]. It has been adapted for studying the importation risk of COVID-19 across the world [CDA+20]. Structured metapopulation models construct a simple abstraction of the mixing patterns in the population, in which the entire region under study is decomposed into fully connected geographical regions, representing subpopulations, which are connected through airline and commuter flow networks. Thus, they lack the rich detail of agent-based models, but have fewer parameters, and are therefore easy to set up and scale to large regions.
Model structure. Here, we summarize GLEaM [BCG+09] (the Northeastern model) and PatchSim [VCF+19] (the UVA model). GLEaM uses two classes of datasets: population estimates and mobility. Population data is taken from the “Gridded Population of the World” [LST17], which gives an estimated population value at a granularity of minutes of arc (referred to as a “cell”) over the entire planet. Two different kinds of mobility processes are considered: airline travel and commuter flow. The former captures long-distance travel, whereas the latter captures localized mobility. Airline data is obtained from the International Air Transport Association (IATA) [35] and the Official Airline Guide (OAG) [53]. There are about 3300 airports worldwide; these are aggregated at the level of urban regions served by multiple airports (e.g., as in London). A Voronoi tessellation is constructed with the resulting airport locations as centers, and the population cells are assigned to these cells, with a 200 mile cutoff from the center. The commuter flows connect cells at a much smaller spatial scale. We represent this mobility pattern as a directed graph on the cells, and refer to it as the mobility network.
In the basic SEIR model, the subpopulation in each cell $j$ is partitioned into compartments $S_j$, $E_j$, $I_j$ and $R_j$, corresponding to the disease states. For each cell $j$, we define the force of infection $\lambda_j$ as the rate at which a susceptible individual in the subpopulation in cell $j$ becomes infected; this is determined by the interactions the person has with infectious individuals in cell $j$ or in any cell connected to it in the mobility network. An individual in the susceptible compartment $S_j$ becomes infected with probability $\lambda_j \Delta t$ and enters the compartment $E_j$, in a time interval $\Delta t$. From this compartment, the individual moves to the $I_j$ and then the $R_j$ compartments, with appropriate probabilities, corresponding to the disease model parameters.
The PatchSim [VCF+19] model has a similar structure, except that it uses administrative boundaries (e.g., counties) instead of a Voronoi tessellation; these are connected using a mobility network. The mobility network is derived by combining commuter and airline networks, to model the time spent per day by individuals of one region (patch) $i$ in another region (patch) $j$. Since it explicitly captures the level of connectivity through commuter-like mixing, it is capable of incorporating week-to-week and month-to-month variations in mobility and connectivity. In addition to its capability to run in deterministic or stochastic mode, the open source implementation [61] allows fine grained control of disease parameters across space and time. Although the model has a more generic force of infection mode of operation (where patches can be more general than spatial regions), we will mainly summarize the results from the mobility model, which was used for COVID-19 response.

What did the models suggest? The GLEaM model is being used in a number of COVID-19 related studies and analyses. In [KYG+20] the Northeastern University team used the model to understand the spread of COVID-19 within China and the relative risk of importation of the disease internationally. Their analysis suggested that the spread of COVID-19 out of Wuhan into other parts of mainland China was not contained well, due to the delays induced by detection and official reporting. The results are hard to interpret. The paper suggested that international importation could be contained substantially by a strong travel ban. While the ban might have delayed the onset of cases, the subsequent spread across the world suggests that we were not able to arrest the spread effectively. The model is also used to provide weekly projections (see https://covid19.gleamproject.org/); this site does not appear to be maintained with the most current forecasts (likely because the team is participating in the CDC forecasting group).
The PatchSim model is being used to support federal agencies as well as the state of Virginia. Based on our past experience, we have refrained from providing longer term forecasts, focusing instead on short term projections. The model is used within a Forecasting via Projection Selection approach, where a set of counterfactual scenarios are generated based on on-the-ground response efforts and surveillance data, and the best fits are selected based on historical performance. While allowing future scenarios to be described, these also help provide a reasonable narrative of past trajectories, and retrospective comparisons are used for metrics such as “cases averted by doing X”. The projections are revised weekly based on stakeholder feedback and surveillance updates. Further discussion of how the model is used by the Virginia Department of Health each week can be found at https://www.vdh.virginia.gov/coronavirus/covid19datainsights/#model.
Strengths and limitations. Structured metapopulation models provide a good tradeoff between the realism and computational cost of detailed agent-based models and the simplicity and speed of mass action compartmental models; they need far fewer inputs and scale well. This is especially true in the early days of an outbreak, when the disease dynamics are driven to a large extent by mobility, which can be captured more easily within such models, and there is significant uncertainty in the disease model parameters. However, once the outbreak has spread, it is harder to model detailed interventions (e.g., social distancing), which are much more localized; further, these are hard to model using a single parameter. Both the GLEaM and PatchSim models also faced their share of challenges in projecting case counts, due to the rapidly evolving pandemic, inadequate testing, a lack of understanding of the number of asymptomatic cases, and the difficulty of assessing compliance levels in the population at large.
5. Models by KTH, Umeå and Uppsala researchers (Swedish Models)
Sweden was an outlier amongst countries in that it decided to implement public health interventions without a lockdown. Schools and universities were not closed, and restaurants and bars remained open. Swedish citizens implemented “work from home” policies where possible. Moderate social distancing, based on individual responsibility and without police enforcement, was employed, with an attempt to place emphasis on shielding the 65+ age group.
5.1. Simple model
Background. Statistician Tom Britton developed a very simple model with a focus on predicting the number of infected over time in Stockholm.
Model structure. Britton [BRI20] used a very simple SIR general epidemic model. It is used to make a coarse grained prediction of the behaviour of the outbreak based on knowing the basic reproduction number $R_0$ and the doubling time in the initial phase of the epidemic. Calibration to calendar time was done using the observed number of case fatalities, together with estimates of the time from infection to death and the infection fatality risk. Predictions were made assuming no change of behaviour, as well as for the situation where preventive measures are put in place at one specific time point.
Model predictions. One of the controversial predictions from this model was that the number of infections in the Stockholm area would quickly rise towards attaining herd immunity within a short period. However, mass testing carried out in Stockholm during June indicated a far smaller percentage of infections.
Strengths and Limitations. Britton’s model was intended as a quick and simple method to estimate and predict an ongoing epidemic outbreak, both with and without preventive measures put in place, and as a complement to more realistic and detailed modelling. The estimation and prediction methodology is much simpler and more straightforward to implement for this simple model. It is more transparent how the few model assumptions affect the results, and it is easy to vary the few parameters to see their effect on predictions, so that one can see which parameter uncertainties have the biggest impact on predictions and which are less influential.
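The herd immunity reasoning in such a simple SIR setting reduces to a one-line calculation: incidence starts to decline once the immune fraction of the population exceeds $1 - 1/R_0$. A minimal sketch (the $R_0$ values are illustrative, not estimates for Stockholm):

```python
# Herd immunity threshold in the basic SIR framework: the epidemic
# begins to decline once a fraction 1 - 1/R0 of the population is
# immune. The R0 values below are hypothetical illustrations.

def herd_immunity_threshold(r0):
    if r0 <= 1.0:
        return 0.0          # an epidemic with R0 <= 1 dies out on its own
    return 1.0 - 1.0 / r0

for r0 in (1.5, 2.5, 3.0):
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.0%}")
```

Note that the threshold marks where incidence peaks, not where transmission stops; infections accumulated after the peak (overshoot) push the final attack rate above the threshold itself.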
5.2. Compartmentalized Models I: FHM Model
Background. The Public Health Authority (FHM) of Sweden produced a model to study the spread of COVID-19 in four regions in Sweden: Dalarna, Skåne, Stockholm, and Västra Götaland [HEA20].
Model structure. It is a standard compartmental SEIR model; each compartment is homogeneous, so the individuals within it are assumed to have the same characteristics and act in the same way. Data used in fitting the model include point prevalences found by PCR testing in Stockholm at two different time points.
Model predictions. The model estimated the number of infected individuals at different time points and the date with the largest number of infectious individuals. It predicted that by July 1, 8.5% (5.9–12.9%) of the population in Dalarna, 4% (2.4–9.9%) of the population in Skåne, 19% (17.7–20.2%) of the population in Stockholm, and 9% (6.3–12.2%) of the population in Västra Götaland would have been infected. It was hard to test these predictions because of the great uncertainty in the immune response to SARS-CoV-2: the prevalence of antibodies was surprisingly low, but recent studies show that mild cases seem to never develop antibodies against SARS-CoV-2, only T-cell-mediated immunity [GRO20].
The model also investigated the effect of increased contacts during the summer that stabilise in autumn. It found that if the contacts in Stockholm and Dalarna increase by less than 60% in comparison to the contact rate at the beginning of June, the second wave would not exceed the observed first wave.
Strengths and limitations. The simplicity of the model is a strength for ease of calibration and understanding, but it is also a major limitation in view of the well known characteristics of COVID-19: since the disease is primarily transmitted through droplet infection, the social contact structure in the population is of primary importance for the dynamics of infection. The compartmental model used in this analysis does not account for variation in contacts, where a few individuals may have many contacts while the majority have fewer. The model is also not age-stratified, even though COVID-19 strikingly affects different age groups differently; e.g., young people seem to get milder infections. In this model, each infected individual has the same infectivity and the same risk of becoming a reported case, regardless of age. Different age groups normally have varied degrees of contacts and have changed their behaviour differently during the COVID-19 pandemic; this is not captured in the model.
5.3. Compartmentalized Models II
Background. A group around statistician Joacim Rocklöv developed a model to estimate the impact of COVID-19 on the Swedish population at the municipality level, considering demography and human mobility under various scenarios of mitigation and suppression. They attempted to estimate the time course of infections, health care needs, and mortality in relation to Swedish ICU capacity, as well as the costs of care, and compared alternative policies and counterfactual scenarios.
Model structure. [ROC20] used an SEIR compartmentalized model with age-structured compartments (0–59, 60–79, 80+) for the susceptible, infected, inpatient-care, ICU and recovered populations, based on Swedish population data at the municipal level. It also incorporated inter-municipality travel using a radiation model. Parameters were calibrated using a combination of values available from the international literature and fits to available outbreak data. The effects of a number of different intervention strategies were considered, ranging from no intervention to modest social distancing and finally to imposed isolation of various groups.
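The radiation model mentioned above has a closed form that depends only on populations and pairwise distances. A minimal sketch in Python follows; the populations, coordinates and outflow fraction are illustrative stand-ins, not the values used in [ROC20]:

```python
import numpy as np

def radiation_flows(pop, coords, outflow=0.1):
    """Expected traveller flows T[i, j] under the radiation model.

    pop     : populations m_i of the municipalities (illustrative values)
    coords  : centroid coordinates used to rank municipalities by distance
    outflow : assumed fraction of each population that travels (a guess)
    """
    # pairwise distances between municipality centroids
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    n = len(pop)
    T = np.zeros((n, n))
    for i in range(n):
        T_i = outflow * pop[i]                  # total travellers leaving i
        for j in range(n):
            if i == j:
                continue
            # s_ij: population living closer to i than j does (excluding i, j)
            closer = d[i] < d[i, j]
            closer[i] = closer[j] = False
            s = pop[closer].sum()
            T[i, j] = T_i * pop[i] * pop[j] / (
                (pop[i] + s) * (pop[i] + pop[j] + s))
    return T
```

A convenient property of this formulation is that it needs no tunable distance kernel: flows between nearby, populous municipalities dominate automatically.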
Model predictions. The model predicted an estimated death toll of around 40,000 for the strategies based only on social distancing and between 5,000 and 8,000 for policies imposing stricter isolation. It predicted up to 10,000 ICU cases without much intervention and up to 6,000 with modest social distancing, far above the available capacity of about 500 ICU beds.
Strength and limitations. The model showed a good fit against reported COVID-19-related deaths in Sweden up to April 20, 2020. However, its predictions of total deaths and ICU demand turned out to be far off the mark.
5.4. Agent-Based Microsimulations
Background. Finally, [GWv+20, KK20] used an individual-based model parameterized on Swedish demographics to assess the anticipated spread of COVID-19.
Model structure. [GWv+20] employed an individual agent-based model based on work by Ferguson et al. [FLN+20]. Individuals are randomly assigned an age based on Swedish demographic data and are also assigned a household. Household size is normally distributed around the average Swedish household size in 2018, 2.2 people per household. Households were placed on a lattice using high-resolution population data from LandScan and census data from Statistics Sweden; each household is additionally allocated to a city, based on the closest city centre by distance, and to a county based on city designation. Each individual is placed in a school or workplace at a rate similar to current participation in Sweden.
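The population-synthesis step described above can be sketched as follows. The age bands, their weights and the household-size standard deviation here are illustrative stand-ins, not the census values used by [GWv+20]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative age bands and weights; the model draws from Swedish census data.
AGE_BANDS = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]
AGE_PROBS = [0.23, 0.25, 0.25, 0.21, 0.06]

def build_population(n_households, mean_size=2.2, sd_size=1.0):
    """Assign each person an age and a household, as in the model structure."""
    people = []
    for hh in range(n_households):
        # household size ~ Normal(mean_size, sd_size), rounded, at least 1
        size = max(1, int(round(rng.normal(mean_size, sd_size))))
        for _ in range(size):
            band = rng.choice(len(AGE_BANDS), p=AGE_PROBS)
            lo, hi = AGE_BANDS[band]
            people.append({"household": hh, "age": int(rng.integers(lo, hi + 1))})
    return people

people = build_population(1000)
```

Lattice placement and school/workplace assignment would then operate on these records.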
Transmission between individuals occurs through contact at each individual's workplace or school, within their household, and in their community. Infectiousness is thus a property dependent on contacts with household members, school/workplace members and community members, with a probability based on household distances. Transmissibility was calibrated against data for the period March 21 – April 6 to reproduce either the doubling time reported using pan-European data or the growth in reported Swedish deaths for that period. Various types of interventions were studied, including the policy implemented by the Swedish public health authorities as well as more aggressive interventions approaching a full lockdown.
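The calibration target above ties a reported doubling time to the model's transmissibility. Under a simple SIR-like approximation of early exponential growth (an assumption on our part; [GWv+20] calibrate against the full simulation), the relationship can be written down directly:

```python
import math

def growth_rate_from_doubling(doubling_days):
    # exponential growth rate r satisfying exp(r * T_d) = 2
    return math.log(2.0) / doubling_days

def beta_from_doubling(doubling_days, infectious_period=5.0):
    # In early SIR growth, r ≈ beta - gamma; an assumed 5-day infectious
    # period (illustrative) then pins down the transmission rate beta.
    gamma = 1.0 / infectious_period
    return growth_rate_from_doubling(doubling_days) + gamma

beta = beta_from_doubling(3.0)   # an illustrative 3-day doubling time
```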
Model predictions. Their prediction was that "under conservative epidemiological parameter estimates, the current Swedish public-health strategy will result in a peak intensive-care load in May that exceeds pre-pandemic capacity by over 40-fold, with a median mortality of 96,000 (95% CI 52,000 to 183,000)".
Strength and limitations. This model adapted the well-known Imperial model discussed in Section 3 to Sweden and considered a wide range of intervention strategies. Unfortunately, the model's predictions were woefully off the mark on both counts: deaths by June 18 were under 5,000, and at the peak the ICU infrastructure had at least 20% unutilized capacity.
6. Forecasting Models
Forecasting is of particular interest to policy makers as forecasts attempt to provide actual counts. Since surveillance systems have relatively stabilized in recent weeks, the development of forecasting models has gained traction, and several models are available in the literature. In the US, the Centers for Disease Control and Prevention (CDC) has provided a platform for modelers to share their forecasts, which are analyzed and combined in a suitable manner to produce ensemble multi-week forecasts for cumulative/incident deaths, hospitalizations and, more recently, cases at the national, state, and county level. Probabilistic forecasts were provided by 36 teams as of July 28, 2020 (there were 21 models as of June 24, 2020), and the CDC, with the help of [56], has developed a uniform ensemble model for multi-step forecasts [12].
6.1. COVID-19 Forecast Hub ensemble model
It has been observed previously for other infectious diseases that an ensemble of forecasts from multiple models performs better than any individual contributing model [YKS17]. In the context of COVID-19 case count modeling and forecasting, a multitude of models have been developed based on different assumptions that capture specific aspects of the disease dynamics (reproduction number evolution, contact network construction, etc.). The models employed in the CDC Forecast Hub can be broadly classified into three categories: data-driven models, hybrid models, and mechanistic models, with some of the models being open source.
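A minimal version of such an ensemble, combining quantile forecasts per quantile across equally weighted models (a simplified stand-in for the actual Forecast Hub procedure), might look like:

```python
import numpy as np

# Each model submits forecasts at the same set of quantile levels.
QUANTILES = [0.025, 0.25, 0.5, 0.75, 0.975]

def ensemble(forecasts):
    """forecasts: dict model_name -> list of values at QUANTILES.

    An equal-weight ensemble takes the per-quantile median across
    models, which preserves the monotonic ordering of the quantiles.
    """
    arr = np.array(list(forecasts.values()))   # shape (n_models, n_quantiles)
    return np.median(arr, axis=0)

# Hypothetical weekly-death forecasts from three models.
combined = ensemble({
    "modelA": [10, 40, 60, 80, 120],
    "modelB": [20, 45, 55, 90, 150],
    "modelC": [15, 35, 65, 85, 130],
})
```

Weighting schemes based on each model's historical accuracy are a natural refinement of this equal-weight baseline.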
Data-driven models: These do not model the disease dynamics but attempt to find patterns in the available data and combine them appropriately to make short-term forecasts. In such data-driven models it is hard to incorporate interventions directly; hence, the learning algorithm is presented with a variety of exogenous data sources, such as mobility data and hospital records, in the hope that intervention effects are captured implicitly. Early iterations of the Institute for Health Metrics and Evaluation (IHME) model [CMo20] for death forecasting at the state level employed a statistical model that fits a time-varying Gaussian error function to cumulative death counts and is parameterized to control for the maximum death rate, the epoch of the maximum death rate, and a growth parameter (with many parameters learnt using data from the outbreak in China). The IHME models are undergoing revisions (moving towards hybrid models), and updated implementable versions are available at [36]. The University of Texas at Austin COVID-19 Modeling Consortium model [WTD+20] uses a statistical model very similar to [CMo20] but employs real-time mobility data as additional predictors and also differs in the fitting process. The Carnegie Mellon Delphi Group employs the well-known autoregressive (AR) approach, using lagged versions of the case and death counts as predictors and selecting the sparse set that best describes the observations via LASSO regression [31]. [18] is a deep learning model, developed along the lines of [AXR+19], that attempts to learn the dependence between the death rate and other available syndromic, demographic, mobility and clinical data.

Hybrid models: These methods typically employ statistical techniques to model disease parameters, which are then used in epidemiological models to forecast cases. Most statistical models [CMo20, WTD+20] are evolving to become hybrid models. A model that gained significant interest is the Youyang Gu (YYG) model, which uses a machine learning layer over an SEIR model to learn the set of parameters (mortality rate, initial R, post-lockdown R) that best fits a region's observed data. The authors share the optimal parameters, the SEIR model and the evaluation scripts with the general public for experimentation [1]. The Los Alamos National Laboratory (LANL) model [42] uses a statistical model to determine how the number of COVID-19 infections changes over time; a second process maps the number of infections to the reported data. Deaths are computed as a fraction of the new cases, using the observed mortality data.
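The hybrid pattern, a fitting layer on top of a mechanistic model, can be illustrated with a deterministic SEIR model whose transmission rate is chosen by grid search against observed incidence. The parameter values and search range below are illustrative, not those of any of the models above:

```python
import numpy as np

def seir_daily_cases(beta, n_days, N=1e7, incubation=4.0, infectious=5.0, i0=100):
    """Deterministic SEIR; returns daily new infectious cases (sigma * E)."""
    sigma, gamma = 1.0 / incubation, 1.0 / infectious
    S, E, I, R = N - i0, 0.0, float(i0), 0.0
    daily = []
    for _ in range(n_days):
        new_E = beta * S * I / N     # new exposures this day
        new_I = sigma * E            # exposed becoming infectious
        new_R = gamma * I            # infectious recovering
        S, E, I, R = S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R
        daily.append(new_I)
    return np.array(daily)

def fit_beta(observed, betas=np.linspace(0.1, 1.0, 91)):
    """Pick the transmission rate whose SEIR run best matches observations."""
    errors = [((seir_daily_cases(b, len(observed)) - observed) ** 2).sum()
              for b in betas]
    return float(betas[int(np.argmin(errors))])
```

The same scaffold extends to several parameters (e.g., separate pre- and post-lockdown transmission rates) by searching over a grid of tuples or using a generic optimizer.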
Mechanistic models: The GLEaM and JHU models use county-level stochastic SEIR dynamics. The JHU model incorporates the effectiveness of statewide social-distancing policies through the R parameter. More recently, model outputs from UVA's PatchSim model were included as part of a multi-model ensemble (including autoregressive and LSTM components) to forecast weekly confirmed cases.
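A stochastic SEIR of the kind these mechanistic models use replaces the deterministic flows with random draws. A minimal single-patch sketch, with illustrative transition rates:

```python
import numpy as np

rng = np.random.default_rng(1)

def stochastic_seir_step(S, E, I, R, beta, sigma=0.25, gamma=0.2):
    """One day of chain-binomial SEIR dynamics on integer compartments."""
    N = S + E + I + R
    p_expose = 1.0 - np.exp(-beta * I / N)       # per-susceptible infection prob.
    new_E = rng.binomial(S, p_expose)            # random, not deterministic, flows
    new_I = rng.binomial(E, 1.0 - np.exp(-sigma))
    new_R = rng.binomial(I, 1.0 - np.exp(-gamma))
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R

state = (9900, 50, 50, 0)
for _ in range(60):
    state = stochastic_seir_step(*state, beta=0.3)
```

A county-level model runs one such patch per county and couples them through a mobility matrix, such as the radiation flows discussed earlier.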
7. Comparative analysis across modeling types
We end the discussion of the models above by qualitatively comparing the model types. As discussed in the preliminaries, at one end of the spectrum are models that are largely data-driven: these range from simple statistical models (various forms of regression) to more complicated deep learning models. The differences among such models lie in the amount of training data needed, the computational resources required, and the complexity of the mathematical function being fit to the observed data. These models are strictly data-driven and hence unable to capture the constant behavioral adaptation at the individual and collective levels. At the other end of the spectrum, SEIR, metapopulation and agent-based networked models are based on an underlying procedural representation of the dynamics; in theory, they are able to represent behavioral adaptation endogenously. But both classes of models face immense challenges due to the availability of data, as discussed below.

Agent-based and SEIR models were used in all three countries in the early part of the outbreak and continue to be used for counterfactual analysis. The primary reason is the lack of surveillance and disease-specific data, which made purely data-driven models hard to use. SEIR models lacked heterogeneity but were simple to program and analyze. Agent-based models were more computationally intensive and required a fair amount of data to instantiate, but captured the heterogeneity of the underlying countries. By now it has become clear that the use of such models for long-term forecasting is challenging and likely to produce misleading results. The fundamental reason is adaptive human behavior and the lack of data about it.

Forecasting, on the other hand, has seen the use of data-driven as well as causal methods. Short-term forecasts have generally been reasonable. Given the intense interest in the pandemic, a lot of data is also becoming available for researchers, which helps in further validating some of the models. Even so, real-time data on behavioral adaptation and compliance remains very hard to get and is one of the central modeling challenges.
8. Models and Policy making
Were some of the models wrong? In a recent opinion piece^{5}^{5}5Indian Express, July 30, 2020, Professor Vikram Patel of the Harvard School of Public Health makes a stinging criticism of modelling:
Crowning these scientific disciplines is the field of modelling, for it was its estimates of mountains of dead bodies which fuelled the panic and led to the unprecedented restrictions on public life around the world. None of these early models, however, explicitly acknowledged the huge assumptions that were made,
A similar article in the NY Times recounted the mistakes in the COVID-19 response in Europe^{6}^{6}6NY Times July 20, 2020: https://www.nytimes.com/2020/07/20/world/europe/coronavirusmistakesfranceukitaly.html; also see [ABC+20].
Our point of view. It is indeed important that the assumptions underlying mathematical models be made transparent and explicit. But we respectfully disagree with Professor Patel's statement: most of the good models tried to be very explicit about their assumptions. The mountains of deaths being referred to are explicitly calculated for the case when no interventions are put in place and are often used as a worst-case scenario. One might argue that the authors should be explicit and state that this worst case will never occur in practice. Forecasting dynamics in social systems is inherently challenging: individual behavior, predictions and epidemic dynamics co-evolve, which immediately implies that a dire prediction can lead to extreme changes in individual and collective behavior and hence to a reduction in incidence numbers. Would one say the forecasts were wrong in such a case, or that they were influential in ensuring the worst case never happened? None of this implies that one should not explicitly state the assumptions underlying a model. Of course, our experience is that policy makers, news reporters and the general public are looking exactly for such forecasts: we have been constantly asked "when will the peak occur" and "how many people are likely to die". A few possible ways to resolve this tension between the insatiable appetite for forecasts and the inherent challenges of producing them accurately include:

We believe that, in general, it might not be prudent to provide long-term forecasts for such systems.

State the assumptions underlying the models as clearly as possible. Modelers need to be much more disciplined about this. They also need to ensure that the models are transparent and can be reviewed broadly (and expeditiously).

Accept that the forecasts are provisional and that they will be revised as new data comes in, society adapts, the virus adapts and we understand the biological impact of the pandemic.

Improve surveillance systems that would produce data that the models can use more effectively. Even with data, it is very hard to estimate the prevalence of COVID19 in society.
Communicating scientific findings and risks is an important topical area in this context, see [FIS19, MMP20, ADA20, VJ20].
Use of models for evidence-based policy making. In a new book, Radical Uncertainty [KK20], economists John Kay and Mervyn King (formerly Governor of the Bank of England) urge caution when using complex models. They argue that models should be valued for the insights they provide but not relied upon for accurate forecasts. So-called "evidence-based policy" comes in for criticism when it relies on models that supply a false sense of certainty where none exists, or that seek out the evidence desired ex ante ("cover") to justify a policy decision: "Evidence-based policy has become policy-based evidence".
Our point of view. The authors make a good point here. But again, policy makers, reporters and the general public all clamour for forecasts. We argue that this can be addressed in two ways: (i) viewing the problem through the lens of control theory, forecasting only to control the deviation from the path we want to follow, and (ii) not insisting on exact numbers but on general trends. As Kay and King opine, the value of models, especially in the face of radical uncertainty, lies more in exploring alternative scenarios resulting from different policies:
a model is useful only if the person using it understands that it does not represent "the world as it really is", but is a tool for exploring ways in which a decision might or might not go wrong.
Supporting science beyond the pandemic. In his new book, The Rules of Contagion, Adam Kucharski [KUC20] draws on lessons from the past. In 2015 and 2016, during the Zika outbreak, researchers planned large-scale clinical studies and vaccine trials, but these were discontinued as soon as the infection ebbed.
This is a common frustration in outbreak research; by the time the infections end, fundamental questions about the contagion can remain unanswered. That’s why building long term research capacity is essential.
Our point of view. The author makes an important point. We hope that today, after witnessing the devastating impacts of the pandemic on the economy and society, the correct lessons will be learnt: sustained investments need to be made in the field to be ready for the impact of the next pandemic.
Concluding remarks. This paper discusses a few important computational models developed by researchers in the US, UK and Sweden for COVID-19 pandemic planning and response. The models have been used by policy makers and public health officials in their respective countries to assess the evolution of the pandemic, design and analyze control measures, and study various what-if scenarios. As noted, all models faced challenges due to the availability of data, a rapidly evolving pandemic and unprecedented control measures. Despite these challenges, we believe that mathematical models can provide useful and timely information to policy makers. On the one hand, modelers need to be transparent in the description of their models, clearly state the limitations, and carry out detailed sensitivity analysis and uncertainty quantification; having these models reviewed independently is certainly very helpful. On the other hand, policy makers should be aware that using mathematical models for pandemic planning, forecasting and response relies on a number of assumptions, and that the data needed to validate these assumptions is often lacking.
Acknowledgments
The authors would like to thank members of the Biocomplexity COVID-19 Response Team and Network Systems Science and Advanced Computing (NSSAC) Division for their thoughtful comments and suggestions related to epidemic modeling and response support. We thank members of the Biocomplexity Institute and Initiative, University of Virginia for useful discussions and suggestions. This work was partially supported by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS1633028, NSF DIBBS Grant ACI1443054, NSF Grant No.: OAC1916805, NSF Expeditions in Computing Grant CCF1918656, CCF1917819, NSF RAPID CNS2028004, NSF RAPID OAC2027541, US Centers for Disease Control and Prevention 75D30119C05935, DTRA subcontract/ARA SD0018915TO01UVA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.
References
 [1] Note: https://github.com/youyanggu/covid19_projections Cited by: §6.1.
 [ADA20] (2020) Modelling the pandemic: the simulations driving the world's response to covid19. Nature 580 (7803), pp. 316–318. Cited by: §3, §8.
 [AXR+19] (2019) EpiDeep: exploiting embeddings for epidemic forecasting. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, New York, NY, USA, pp. 577–586. External Links: ISBN 9781450362016, Link, Document Cited by: §2.4, §2, §6.1.
 [ABV+08] (2008) Mathematical epidemiology. Vol. 1945, Springer. Cited by: §2.1, §2.
 [ABC+20] (2020) Policy implications of models of the spread of coronavirus: perspectives and opportunities for economists. Technical report National Bureau of Economic Research. Cited by: §8.
 [BCG+09] (2009) Multiscale mobility networks and the spatial spreading of infectious diseases. Proceedings of the National Academy of Sciences 106, pp. 21484 – 21489. Cited by: §2.2, §4, §4.
 [BBK+09] (2009) Generation and analysis of large synthetic social contact networks. In Winter Simulation Conference, pp. 1003–1014. Cited by: 1st item.
 [BBE+08] (2008) EpiSimdemics: an efficient algorithm for simulating the spread of infectious disease over large realistic social networks. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pp. 37. Cited by: 3rd item.
 [BCF+09] (2009) EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. In Proceedings of the 23rd international conference on Supercomputing, pp. 430–439. Cited by: 3rd item.
 [BRI20] (2020) Basic prediction methodology for covid19: estimation and sensitivity considerations. medRxiv. Cited by: §5.1.
 [BRO20] (2020) Pancasting: forecasting epidemics from provisional data. Ph.D. Thesis, Centers for Disease Control and Prevention. Cited by: §2.4.
 [12] COVID19 forecasthub. Note: https://viz.covid19forecasthub.org/ Cited by: §6.
 [CKL+14] (2014) Forecasting a moving target: ensemble models for ili case count predictions. In Proceedings of the 2014 SIAM international conference on data mining, pp. 262–270. Cited by: §2.4.
 [CHA20] (2020) Critiqued coronavirus simulation gets thumbs up from codechecking efforts. Nature 582 (7812), pp. 323–324. Cited by: §3.
 [CDA+20] (2020) The effect of travel restrictions on the spread of the 2019 novel coronavirus (covid19) outbreak. Science 368 (6489), pp. 395–400. External Links: Document, ISSN 00368075, Link, https://science.sciencemag.org/content/368/6489/395.full.pdf Cited by: §2.2, §4.
 [CGS+14] (2014) Influenza forecasting in human populations: a scoping review. PloS one 9 (4), pp. e94130. Cited by: §2.4, §2.4.
 [CMo20] (2020) Forecasting covid19 impact on hospital beddays, icudays, ventilatordays and deaths by us state in the next 4 months. MedRxiv. Cited by: §2.4, §6.1, §6.1.
 [18] Note: https://deepcovid.github.io/ Cited by: §6.1.
 [DBC+12] (2012) Enhancing userproductivity and capability through integration of distinct software in epidemiological systems. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, pp. 171–180. Cited by: 3rd item.
 [DKB+19] (201908) Realtime epidemic forecasting: challenges and opportunities. Health Security 17, pp. 268–275. External Links: Document Cited by: §2.4, §2.
 [EGK+04] (2004) Modelling disease outbreaks in realistic urban social networks. Nature 429 (6988), pp. 180–184. Cited by: 1st item, 4th item.
 [EKM+06] (2006) Structure of social contact networks and their impact on epidemics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 70, pp. 181. Cited by: 1st item, §2.

 [FHC+18] (2018) Calibrating a stochastic, agent-based model using quantile-based emulation. SIAM/ASA Journal on Uncertainty Quantification 6 (4), pp. 1685–1706. Cited by: §2.4.
 [FLN+20] (2020) Report 9: impact of nonpharmaceutical interventions (npis) to reduce covid19 mortality and healthcare demand. Cited by: 1st item, 4th item, §3, §3, §3, §5.4.
 [FCF+06] (2006) Strategies for mitigating an influenza pandemic. Nature 442 (7101), pp. 448–452. Cited by: §3.
 [FIS19] (2019) Evaluating science communication. Proceedings of the National Academy of Sciences 116 (16), pp. 7670–7675. Cited by: §8.
 [FCK+18] (2018) Realtime forecasting of infectious disease dynamics with a stochastic semimechanistic model. Epidemics 22, pp. 56 – 61. Note: The RAPIDD Ebola Forecasting Challenge External Links: ISSN 17554365, Document, Link Cited by: §2.4, §2.
 [GWv+20] (2020) Intervention strategies against covid19 and their estimated impact on swedish healthcare capacity. External Links: Document Cited by: §5.4, §5.4.
 [GyR+14] (2014) Assessing the international spreading risk associated with the 2014 west african ebola outbreak. PLoS Currents 6. Cited by: §2.2, §4.
 [GBR+13] (2013) FRED (a framework for reconstructing epidemic dynamics): an opensource software system for modeling infectious diseases and control strategies using censusbased populations. BMC public health 13 (1), pp. 1. Cited by: 3rd item.
 [31] Note: https://delphi.cmu.edu Cited by: §6.1.
 [GRO20] (2020) Robust t cell immunity in convalescent individuals with asymptomatic or mild covid19. Cell. Cited by: §5.2.
 [HFE+08] (2008) Modeling targeted layered containment of an influenza pandemic in the united states. Proceedings of the National Academy of Sciences 105 (12), pp. 4639–4644. Cited by: 4th item, §3.
 [HEA20] (2020) Estimates of the number of infected individuals during the covid19 outbreak in the dalarna region, skåne region, stockholm region, and västra götaland region, sweden. Public health agency of Sweden. Note: https://www.folkhalsomyndigheten.se/publiceratmaterial/publikationsarkiv/e/estimatesofthenumberofinfectedindividualsduringthecovid19outbreak/ Cited by: §5.2.
 [35] Air Traffic Statistics. Note: https://www.iata.org/en/services/statistics/airtransportstats/ Last accessed: April 2020 Cited by: §4.
 [36] . Note: https://github.com/ihmeuw/covidmodelseiirpipeline Cited by: §6.1.
 [KK20] (2020) Managing covid19 spread with voluntary publichealth measures: sweden as a case study for pandemic control. Clinical Infectious Diseases. Cited by: §5.4.
 [KS19] (2019) Nearterm forecasts of influenzalike illness: an evaluation of autoregressive time series approaches. Epidemics 27, pp. 41–51. Cited by: §2.4, §2.4.
 [KK20] (2020) Radical uncertainty: decisionmaking beyond the numbers. W. W. Norton & Company. Cited by: §8.
 [KYG+20] (2020) The effect of human mobility and control measures on the covid19 epidemic in china. Science 368 (6490), pp. 493–497. Cited by: §4.
 [KUC20] (2020) The rules of contagion: why things spread–and why they stop. Basic Books. Cited by: §8.
 [42] LANL covid19 cases and deaths forecasts. Note: https://covid19.bsvgateway.org/ Cited by: §2.4, §6.1.
 [LCC+03] (2003) Transmission dynamics and control of severe acute respiratory syndrome. Science 300 (5627), pp. 1966–1970. Cited by: §2.1.
 [LST17] (2017) High resolution global gridded data for use in population studies. Scientific data 4 (1), pp. 1–17. Cited by: §4.
 [LNX+05] (200508) Containing pandemic influenza at the source. Science 309 (5737), pp. 1083–1087. Cited by: 1st item.
 [MV13] (2013) Computational epidemiology. Communications of the ACM 56 (7), pp. 88–96. Cited by: 2nd item, §2.1, §2.
 [MG09] (2009) Optimizing influenza vaccine distribution. Science 325 (5948), pp. 1705–1708. Cited by: §2.2.
 [MMP20] (2020) Mathematical models to guide pandemic response. Science 369 (6502), pp. 368–369. Cited by: §8.
 [MHJ+08] (2008) Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine 5. Cited by: §2.2.
 [MUR20] (202004) Forecasting the impact of the first wave of the covid19 pandemic on hospital demand and deaths for the usa and european economic area countries. External Links: Document Cited by: §2.
 [NEW03] (2003) The structure and function of complex networks. SIAM review 45 (2), pp. 167–256. Cited by: §2.1, §2.
 [NBR+14] (2014) A systematic review of studies on forecasting the dynamics of influenza outbreaks. Influenza and other respiratory viruses 8 (3), pp. 309–316. Cited by: §2.4, §2.4.
 [53] Official airline guide. Note: https://www.oag.com/ Last accessed: April 2020 Cited by: §4.
 [PER20] (202003) An arima model to forecast the spread and the final size of covid2019 epidemic in italy (first version on ssrn 31 march). SSRN Electronic Journal. External Links: Document Cited by: §2.
 [RMY+19] (2019) Accuracy of realtime multimodel ensemble forecasts for seasonal influenza in the u.s.. PLoS Computational Biology 15. Cited by: §2.4, §2.4, §2.
 [56] Reich lab. Note: https://reichlab.io/ Cited by: §6.
 [ROC20] (2020) COVID19 healthcare demand and mortality in sweden in response to nonpharmaceutical (npis) mitigation and suppression scenarios. External Links: Document Cited by: §5.3.
 [TLH+17a] (2017) Epidemic forecasting framework combining agentbased models and smart beam particle filtering. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 1099–1104. Cited by: §2.4.
 [TCR+17b] (2017) A framework for evaluating epidemic forecasts. BMC infectious diseases 17 (1), pp. 345. Cited by: §2.4.
 [VJ20] (2020) Infodemic and risk communication in the era of cov19. Advanced biomedical research 9. Cited by: §8.
 [61] () NSSAC/patchsim: code for simulating the metapopulation seir model. Note: https://github.com/NSSAC/PatchSim (Accessed on 08/14/2020) Cited by: §4.
 [VCF+19] (2019) Optimizing spatial allocation of seasonal influenza vaccine under temporal constraints. PLoS computational biology 15 (9), pp. e1007111. Cited by: §2.2, §4, §4, §4.
 [WCM19] (2019) DEFSI: deep learning based epidemic forecasting with synthetic information. In AAAI, Cited by: §2.4, §2.
 [WCM20] (2020) TDEFSI: theoryguided deep learningbased epidemic forecasting with synthetic information. ACM Transactions on Spatial Algorithms and Systems (TSAS) 6 (3), pp. 1–39. Cited by: §2.4.
 [WTD+20] (2020) Projections for firstwave covid19 deaths across the us using socialdistancing measures derived from mobile phones. medRxiv. Cited by: §6.1, §6.1.
 [YKS17] (2017) Individual versus superensemble forecasts of seasonal influenza outbreaks in the united states. PLoS computational biology 13 (11), pp. e1005801. Cited by: §2.4, §6.1.
 [ZSC+17] (2017) Spread of zika virus in the americas. 114 (22), pp. E4334–E4343. External Links: Document Cited by: §2.2, §4.