According to the International Monetary Fund (IMF), the COVID-19 pandemic could cost the world economy up to 9 trillion USD, nearly the combined gross domestic product (GDP) of Japan and Germany, or roughly half that of the US (https://blogs.imf.org/2020/04/14/). Since the first official case was confirmed in Wuhan, China, over 2 million individuals have been infected at the time of writing, leading to over 150 thousand deaths (https://coronavirus.jhu.edu). Compared to such diseases as the 2003 SARS-CoV, the H1N1 influenza A, and the Ebola virus, COVID-19 shows strong infectiousness, as measured by its reproductive number, according to some studies (liu2020reproductive). In the absence of a specific treatment or vaccine, social distancing becomes the most effective strategy for protection from the virus (NPIS, Intervention). Prolonged social distancing, however, has both social and economic side-effects (BROOKS2020912, NBERw26867), leading to mounting pressure in the US to relax current policies. It is therefore important to reach a finer-grained understanding of available crisis management options that minimize side-effects but do not overwhelm the health system.
We report a novel algorithm for modeling the impact of different social distancing policies on the future spread of COVID-19. The model is customizable to local populations using a simple latent parameter estimation technique that takes, as input, published contagion data and local statistics. The goal is to answer the question of “return to normal”: When is it safe to take which steps towards restoring some elements of normalcy without risking another wave of the pandemic? How should the answers be customized to the special circumstances of different regional populations?
The work borrows insights from information cascade propagation (abdelzaher2020multiscale). Information (similarly to viral contagion) propagates through broadcast channels. In the information space, a broadcast channel might be a Facebook wall, online subreddit, or virtual “hangout”. These virtual spaces create opportunities for information transmission among individuals who frequent them. In the world of viral contagion, physical spaces, such as stores, offices, public transport, and family residences, serve the role of broadcast channels. We shall henceforth call them social mixing domains. Social distancing policies manipulate the availability of some of these domains. Information cascade models predict what happens to propagation when the underlying broadcast channels are manipulated. Leveraging this analogy, we capture the effects of a wide playbook of social distancing options on contagion dynamics.
The rest of the paper is organized as follows. Section 2 puts our model in the context of related work. Section 3 summarizes the main result and significance of the work. Section 4 elucidates the concept of mixing domains. Section 5 describes mixing domain parameter estimation, and presents the key analysis results and predictions. We conclude our paper in section 6.
2 Related Work
The work is new in developing a mesoscopic epidemiological model in which the fundamental abstraction is neither the individual agent (node) nor an entire community, but rather an element in between: a single broadcast channel or mixing domain. These domains vary in prevalence from small units such as individual households (of which there are many) to very large units such as concerts, malls, and large gatherings. Often, the popularity or size distribution of human and social artifacts follows standard profiles, such as Zipf's law (li2002zipf), resulting in the same striking statistical regularity in fields as diverse as linguistics (powers1998applications, dahui2005true), urban populations (soo2005zipf, moura2006zipf), business income distributions (okuyama1999zipf), and the Internet (adamic2002zipf). Recent executive orders in the US, in effect, manipulate the distribution of these domains by closing or reopening some of them. For example, closing all venues larger than a given threshold will effectively remove the tail of the size distribution. Our mesoscopic model therefore allows us to link social distancing policy decisions to the distribution of remaining available mixing domains, which in turn allows estimating future viral spread. The work aims to inform decision-making on crisis mitigation policies.
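The effect of a size threshold on a Zipf-like venue population can be illustrated with a minimal sketch. The function names, the exponent, and all numbers below are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def zipf_venue_sizes(num_venues, largest_size, exponent=1.0):
    """Venue at rank r has size largest_size / r**exponent (a Zipf-like profile).

    ASSUMPTION: the exponent and sizes are illustrative, not fitted to real data.
    """
    return [largest_size / (rank ** exponent) for rank in range(1, num_venues + 1)]


def close_large_venues(sizes, max_allowed_size):
    """Keep only venues at or below the size threshold (removes the tail of large venues)."""
    return [size for size in sizes if size <= max_allowed_size]


# Hypothetical region: 1000 venues, the largest holding 5000 people.
all_sizes = zipf_venue_sizes(num_venues=1000, largest_size=5000)
remaining_sizes = close_large_venues(all_sizes, max_allowed_size=50)

# Share of total venue capacity that stays available after the closure.
remaining_capacity_share = sum(remaining_sizes) / sum(all_sizes)
print(len(all_sizes) - len(remaining_sizes), "venues closed")
print(f"{remaining_capacity_share:.2%} of capacity remains")
```

Under these assumed numbers, only a small number of venues is closed, yet they account for a large share of the total capacity, which is why a threshold policy reshapes the distribution of available mixing domains so strongly.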
Mesoscopic models are a middle ground between two well-populated extremes in current literature; namely, agent-centric models and population-centric models (representing microscopic and macroscopic models, respectively). A recent survey discusses these existing models and the corresponding cascade mitigation policies they allow (nowzari2016analysis). On one hand, agent-centric models (IC, realNetoworkEigenvalue, PhysRevE.63.066117, urbanNetwork) start with the behavior of individual agents, as well as their connectivity graphs. They enable reasoning about fine-grained mitigation strategies, such as inoculation of specific agents to reduce disease spread. Agent-based models can also be used for detailed simulations to understand the impact of a large variety of detailed interventions. While very powerful and versatile, they require inputs that are difficult to collect, such as the interaction graph of all agents in the system. This limitation often renders them less suitable in practice. On the other hand, population-centric models, such as SIR (SIR), SEIR (SEIR), SIS (SIS), and SQIS (SQIS), focus on the total population. They reason about statistics of entire communities, such as the total number of infected, susceptible, and recovered individuals. They can also model high-level mitigation strategies such as inoculation of a specified fraction of the entire community (i.e., their removal from the susceptible list). These models, however, do not offer a clear way of reasoning about impacts of finer grained decisions, such as closure of some fraction of businesses or meeting venues. Macroscopic methods for COVID-19 trend prediction (NBERw26901, EARLY, TimeSIR, dataSIR, MLSIR, SEIR-covid19, Nesteruk2020.02.12.20021931) thus lack the ability to forecast effects of different policies.
To enable a more detailed analysis of epidemics, researchers extended the population-centric models by dividing the whole population into several groups, to form a finer-grained basis for analysis (complexNetworkEpidemic). Heterogeneous mixing approaches (HeterogeneousEpidemic, YANG2007189, complexHeterogeneousEpidemic, Bogua2003) split individuals by their contact degrees. For a given degree, they use three differential equations to describe the evolution of three states. Age-structured epidemiological models (singh20, ageSIR, Victor2020.03.28.20046300) assume different properties of people in different age groups. The epidemics are then governed by several sets of differential equations and some mixing policies. Some extensions (individualSIR) consider the state of each individual and study the state evolution during the infection process.
Unlike the above solutions that are based on partitioning of individuals or communities (by some demographic, geographic, or interaction-based pattern), we borrow inspiration from social media to focus on social mixing domains instead. Interactions need venues to facilitate them. The interaction patterns are thus a function of the social mixing domains (i.e., the venues) that remain available. To the best of our knowledge, our approach is distinguished by its venue-centric model, as opposed to the more common community-centric or agent-centric models.
In summary, our mesoscopic model recognizes that both agent interaction and contagion rate are functions of available mixing domains. By deriving these functions, the model is equipped to reason about the impact of social distancing.
3 The Main Result
The main contribution of this work is the proof of the mixing theorem, stated below, and its application to investigate the impact of COVID-19 mitigation strategies.
The Mixing Theorem: Consider a geographic region where the total population is broken, for analysis purposes, into a set of non-overlapping groups (e.g., employed and unemployed, or minors, adults, and seniors). Individuals split their time among social mixing domains, such that individuals in group $g$ spend, on average, a fraction, $t_{ig}$, of their time in domain $D_i$ (some fractions could be zero). Let $n_i$ be the average occupancy of domain $D_i$, normalized to population size. Let $f_{ig}$ be the average fraction of occupancy of domain $D_i$ who are from group $g$, and let $\alpha_{ig}$ be the probability that an infected person in domain $D_i$ who encounters a member of group $g$ successfully infects them (i.e., transmissibility). This probability might vary across groups, for example, if some groups were more susceptible. Then, the equivalent overall transmissibility of the virus in the community, $\beta$, is a function of the quantities $t_{ig}$, $n_i$, $f_{ig}$, and $\alpha_{ig}$; its closed form is derived in Section 4.1.
The above result is derived by analyzing the interactions between the susceptible and the infected individuals. It is valid for epidemiological models that include the susceptible and infected states, such as the SIS, SIR, and SEIR models. In the following, we derive the result in the context of SIR. The reader may convince themselves that the same theorem holds in other models, such as SIS and SEIR. Below, for simplicity, let us consider an SIR model given by:
$$
\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad
\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I,
$$
where $S$, $I$, $R$, $\gamma$, and $N$ are the susceptible, infected, and recovered/removed populations, the recovery rate, and the total population, respectively, and $\beta$ is the overall transmissibility. Table 1 defines the used terminology. We also demonstrate the application of this result, as well as several simplified forms thereof (corollaries) that offer easier-to-compute approximate answers.
Significance: The result allows predicting the impact (on COVID-19 propagation in different communities) resulting from specified changes in social distancing policies that manipulate the availability of specific mixing domains, such as closure/resumption of business for some part of the workforce, changes in business opening hours, closure/resumption of schools, or cancellation of events above a specified size. The produced estimates allow making more informed policy choices as pressure mounts to relax some of the current distancing restrictions, while no treatment or vaccine are available.
4 The Prediction Model
The standard susceptible-infected-recovered (SIR) model is an epidemiological model that computes the theoretical number of those infected as a function of time. We begin by extending this model due to its simplicity. In the basic model, three differential equations relate the number of susceptible individuals, $S(t)$, the number of infected individuals, $I(t)$, and the number of recovered individuals, $R(t)$:
$$
\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad
\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I.
$$
In this model, the total population is $N$, and $\beta$ is the effective rate of spread. The recovery rate, $\gamma$, is the rate at which infected individuals recover.
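The basic SIR dynamics can be integrated numerically with a few lines of code. The following is a minimal sketch using forward-Euler steps and the standard form in which new infections occur at rate $\beta S I / N$; the function name, step size, and parameter values are illustrative assumptions, not values estimated in this paper.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, steps_per_day=10):
    """Integrate the basic SIR equations with forward-Euler steps.

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    dt = 1.0 / steps_per_day
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    recovered = 0.0
    history = [(susceptible, infected, recovered)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * susceptible * infected / population * dt
            new_recoveries = gamma * infected * dt
            susceptible -= new_infections
            infected += new_infections - new_recoveries
            recovered += new_recoveries
        history.append((susceptible, infected, recovered))
    return history


# Hypothetical parameters, for illustration only.
trajectory = simulate_sir(beta=0.4, gamma=0.1, population=1_000_000,
                          initial_infected=10, days=30)
for day in (0, 10, 20, 30):
    s, i, r = trajectory[day]
    print(f"day {day:2d}: S={s:,.0f}  I={i:,.0f}  R={r:,.0f}")
```

A smaller step (larger `steps_per_day`) trades run time for accuracy; a library ODE solver could be substituted without changing the model.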
|Total community size.|
|Total number of mixing domains.|
|Total number of susceptible individuals at time $t$.|
|Total number of infected individuals at time $t$.|
|Total number of recovered individuals at time $t$.|
|The $i$th mixing domain.|
|Number of members in domain $D_i$.|
|Per person rate of spread (over all encounters) in domain $D_i$, per unit time.|
|Transmissibility per encounter in domain , per|
|The average time a member spends in domain $D_i$.|
|Expected number of susceptible individuals in domain $D_i$ at time $t$.|
|Expected number of infected individuals in domain $D_i$ at time $t$.|
|The ratio .|
|The ratio .|
|The ratio .|
4.1 A Theoretical Model of Mixing Domains
Now assume that a closed region consists of $M$ mixing domains, as shown in Fig. 1 (e.g., individual residences, stores, transport vehicles, offices, clubs, etc.). An individual divides their time among several domains (e.g., home, office, and other outlets). We call individuals who visit domain $D_i$ members of $D_i$. Let $n_i$ be the average occupancy of $D_i$ at a given time. Let $t_i$ be the average time a member spends in domain $D_i$.
Let us further denote the rate of transmission from one infected individual (to susceptible individuals) per unit time (say one day) in domain by , naturally . Let the expected number of susceptible and infected individuals in the -th domain, at time , be denoted by and , respectively. Furthermore, let us define and . The fractions, and are, respectively, the expected fraction of all those susceptible and the expected fraction of all those infected, who are members of mixing domain . The new domain-specific SIR differential equations can now be written as:
From the basic derivation of an SIR model, the above equations roughly assume that each infected individual in $D_i$ makes, say, $c_i$ encounters in the domain per unit time, of which, therefore, $c_i S_i / n_i$ are with the susceptible population (where $S_i$ is the expected number of susceptible individuals in $D_i$, so that $S_i / n_i$ is the probability that an encountered individual in $D_i$ is susceptible). If the probability of transmission per encounter is $p_i$, then each infected individual passes the virus to $p_i c_i S_i / n_i$ others, leading to the above equations, where $\beta_i = p_i c_i$. In our analysis, we assume that the number of other encountered individuals in a domain grows with the size of the domain (e.g., one passes more people in a conference than in a small party). Thus, $c_i$ grows proportionally to $n_i$, whereas $p_i$ (which can be redefined to absorb the proportionality constant) is generally higher for smaller domains, since people tend to have closer (and/or longer) encounters in smaller groups. Thus, we can rewrite $\beta_i$ as:
$$
\beta_i = \alpha_i n_i,
$$
where $\alpha_i$ is transmissibility (per encounter with another individual) within a domain, which tends to be higher (due to closer and longer encounters) for smaller domains. Substituting in Equations (4), (5), and (6), we thus get:
Adding up over all domains, we get:
Perfect Mixing Approximation.
Let us briefly discuss the implications of the above equations. In a system, where everyone is restricted to small domains that are perfectly quarantined (for example, restricted to their family residences under strict quarantine), the ratio of infected in the quarantine zone, , will be disproportionately higher than what domain size might predict this ratio to be (i.e., ). Similarly, the same ratio outside the quarantine zone will be lower. In our analysis, however, we assume that strict quarantine is no longer socially viable. Instead, individuals from different domains will mix (in other domains). For example, individuals from different households might mix in the same grocery store or office and individuals from different offices might mix at the same bus stop. (Of course, the opportunities for mixing are constrained by available mixing domains.) We assume that mixing fails to localize infections in any subset of domains, and instead spreads the infection as broadly as possible. The above mixing assumption leads to an important worst-case approximation. Namely, if mixing is perfect, the expected number of susceptible (infected) individuals in a domain is roughly proportional to the size of the domain. More specifically:
We consider this a worst-case approximation because the resulting analysis tends to maximize estimates of total spread. If mixing is imperfect, then virus spread will slow down sooner in more heavily impacted domains (due to scarcity of remaining susceptible individuals), while it will also proceed at a slower rate in other domains (due to scarcity of infected individuals). The worst-case assumption is helpful from the perspective of erring on the safe side. Let us define to be the fraction . From the above, we get:
Observe that the above equations have the same form as the basic SIR model. Thus, $\beta$ can be interpreted as the equivalent transmissibility of the virus by considering all mixing domains. The above equations, in fact, are a proper generalization of an SIR model to the case of multiple mixing domains.
Note that Equation (20) has the interesting property that infection transmissibility, $\beta$, depends on the overall distribution of quantities $t_i$, $n_i$, and $\alpha_i$ across the mixing domains. By manipulating some of the domains (e.g., closing them), one can thus reduce the equivalent transmissibility, $\beta$, of the disease. For example, reducing the fraction of time, $t_i$, that is spent in domains with a higher per-person rate of spread, $\beta_i$ (i.e., bigger domains, where $n_i$ is high), can lead to a reduction in $\beta$. Intuitively, this is the motivation for social distancing policies that prevent large meetings, close non-essential businesses, move towards remote instruction, and implement curb-side pickup (e.g., instead of dine-in) alternatives.
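The qualitative effect of closing a domain can be illustrated with a small sketch. Note the strong assumption: the aggregate below is a simple sum, over domains, of time fraction times normalized size times per-encounter transmissibility. It is a stand-in chosen for illustration and is not claimed to reproduce Equation (20). All names and numbers are hypothetical.

```python
def illustrative_transmissibility(domains):
    """Sum over domains of time_fraction * normalized_size * per-encounter transmissibility.

    ASSUMPTION: this product-sum is an illustrative stand-in, not the paper's Equation (20).
    """
    return sum(d["time_fraction"] * d["size"] * d["alpha"] for d in domains)


# Hypothetical domain mix before any policy (sizes normalized to population).
before_policy = [
    {"name": "household",   "time_fraction": 0.60, "size": 0.0001, "alpha": 0.30},
    {"name": "workplace",   "time_fraction": 0.25, "size": 0.0010, "alpha": 0.10},
    {"name": "large venue", "time_fraction": 0.15, "size": 0.0500, "alpha": 0.02},
]

# Hypothetical policy: large venues close and that time is spent at home instead.
after_policy = [
    {"name": "household",   "time_fraction": 0.75, "size": 0.0001, "alpha": 0.30},
    {"name": "workplace",   "time_fraction": 0.25, "size": 0.0010, "alpha": 0.10},
    {"name": "large venue", "time_fraction": 0.00, "size": 0.0500, "alpha": 0.02},
]

policy_ratio = (illustrative_transmissibility(after_policy)
                / illustrative_transmissibility(before_policy))
print(f"transmissibility ratio after/before policy: {policy_ratio:.3f}")
```

Even though the household has the highest per-encounter transmissibility in this made-up example, shifting time away from the largest domain lowers the aggregate, which mirrors the intuition described above.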
Now let us break the local population into a set of non-overlapping groups. Let $t_{ig}$ denote the average amount of time that individuals in group $g$ spend in domain $D_i$ (some fractions could be zero). Also, let $f_{ig}$ be the average fraction of occupancy of domain $D_i$ who are from group $g$, and let $\alpha_{ig}$ be the probability that an infected person in domain $D_i$ who encounters a member of group $g$ successfully infects them (i.e., transmissibility). One can thus approximately rewrite the product of the time fraction and the per-encounter transmissibility in domain $D_i$, $t_i \alpha_i$, as a weighted sum of per-group products:
$$
t_i \alpha_i \approx \sum_{g} f_{ig}\, t_{ig}\, \alpha_{ig}.
$$
This equation is the statement of the mixing theorem ∎
4.2 The Mitigation Policy
Consider some mitigation policy, , that affects the availability of different mixing domains. Let and be the fractional amount of time and fractional membership of the domain after the policy is implemented. Thus:
In general, announced social distancing measures might not come into effect immediately. In the model, we may consider an exponential function by which changes to . Thus, when a new policy is announced, starting at time , we can rewrite:
In our model, the rate of spread, $\beta$, and the recovery rate, $\gamma$, can be estimated from past time-series data of the contagion cascade (say, before a mitigation policy is implemented). After the first change in policy, the rate parameter of the exponential transition can generally be assumed, as it is the inverse of the convergence time-constant. The ratio of the post-policy transmissibility to the pre-policy transmissibility is then computed from Equations (20) and (22), by accounting for the implemented policy. To do so, it is useful to remember that the time fractions in these equations represent the average fraction of time an individual spends in each domain, before and after a mitigation policy is implemented.
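A gradual policy transition of this kind can be sketched as follows. The specific functional form, an exponential relaxation from the pre-policy value to the post-policy value, is an assumption made here for illustration; the function and parameter names are hypothetical, and the numbers are made up.

```python
import math


def transmissibility_at(t, beta_before, beta_after, policy_start, rate):
    """Transmissibility at time t under an assumed exponential transition.

    Before policy_start the value is beta_before; afterwards it relaxes
    toward beta_after with time constant 1 / rate.
    ASSUMPTION: the exponential-relaxation form is illustrative.
    """
    if t < policy_start:
        return beta_before
    decay = math.exp(-rate * (t - policy_start))
    return beta_after + (beta_before - beta_after) * decay


# Hypothetical values: policy announced on day 12, three-day time constant.
beta_before = 0.40
beta_after = 0.40 * 0.7829  # ratio taken from the pessimistic scenario in Section 5.3
for day in (0, 12, 15, 20, 40):
    value = transmissibility_at(day, beta_before, beta_after, policy_start=12, rate=1 / 3)
    print(f"day {day:2d}: beta = {value:.4f}")
```

A larger `rate` models a population that adopts the new policy quickly; a smaller one models slow compliance.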
The experiments are conducted primarily on COVID-19 data for Illinois. By modeling contagion at a finer granularity, our model outperforms the regular SIR model. We also analyze more than 30 highly impacted US states (in terms of the COVID-19 pandemic) and obtain similar results. While the data shown in the experiments is from the end of March 2020 and the beginning of April 2020, readers can refer to our public website (https://covid19predictions.csl.illinois.edu/) for up-to-date predictions for more states.
The goal of the experiments is to demonstrate accuracy of prediction, especially in “what if” scenarios. Since we cannot perform empirical validation of counter-factual scenarios, we instead focus on predicting the contagion trend that follows a policy change using data measured before that change only. In essence, the prediction answers the question: what if policy was changed (to the policy that was subsequently implemented)? The prediction is shown to match the actual impact of the policy after implementation.
The government of Illinois announced “social distancing” measures on March 21st, 2020, which limit certain social interactions and urge self-quarantine. The policy changed people’s mobility and the landscape of COVID-19 spread. We refer to the Google Mobility Report (https://www.google.com/covid19/mobility/), the US Bureau of Labor Statistics (https://www.bls.gov/news.release/metro.t01.htm), and the US Census Bureau (https://www.census.gov/data/datasets/time-series/demo/popest/2010s-state-total.html) for the total population, total labor force, and mobility trends of Retail & Recreation, Grocery & Pharmacy, Parks, Transit stations, Workplaces, and Residential in Illinois. We also assume that only employed people are exposed in the workplace domain, and that every resident of Illinois is exposed to the other five domains. Overall, the computed ratio of transmissibility after the policy to that before the policy is 0.7829 (the pessimistic ratio used in Section 5.3).
5.2 Experimental Setup.
We apply our model to the cumulative confirmed cases in Illinois starting from the date when the number of confirmed cases first exceeded 10, which is March 10th, 2020. In the experiment, the regular SIR model and our broadcast SIR model are compared in two scenarios. The first evaluation emphasizes the “what-if” condition: we examine how well our model predicts the trend after social distancing measures take effect. Both models are trained on data before March 21st (the policy announcement) and predict the following two weeks. Second, we compare prediction accuracy using one week of data as the test set (April 7th to April 13th) and the rest of the history (March 10th to April 6th) as training. For our model, we provide a pessimistic prediction, which uses the ratio 0.7829 computed above, and an optimistic prediction, where we assume closure of all non-necessary businesses and transportation (i.e., the occupancy for retail & recreation, parks, and transit stations is set to zero, and grocery & pharmacy is cut by half). In the latter case, the ratio is 0.5668. An average prediction is taken as the average of the two.
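Training on a pre-policy window amounts to fitting model parameters to the cumulative case counts. The sketch below uses a plain least-squares grid search over the rate of spread, with the recovery rate held fixed. This is one simple stand-in for parameter estimation and is not claimed to be the estimation technique used in this paper; the data are synthetic and every name and value is hypothetical.

```python
def cumulative_cases(beta, gamma, population, initial_infected, days):
    """Daily cumulative case counts (N - S) from a one-day-step SIR recursion."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    series = []
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        susceptible -= new_infections
        infected += new_infections - gamma * infected
        series.append(population - susceptible)
    return series


def fit_beta(observed, gamma, population, initial_infected, candidates):
    """Return the candidate beta with the smallest sum of squared errors."""
    def squared_error(beta):
        predicted = cumulative_cases(beta, gamma, population, initial_infected, len(observed))
        return sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return min(candidates, key=squared_error)


# Synthetic 12-day training window generated with a known beta (illustration only).
true_beta, gamma, population, initial_infected = 0.35, 0.10, 1_000_000, 10
training_series = cumulative_cases(true_beta, gamma, population, initial_infected, days=12)

candidate_betas = [round(0.20 + 0.01 * k, 2) for k in range(31)]  # 0.20 .. 0.50
fitted_beta = fit_beta(training_series, gamma, population, initial_infected, candidate_betas)
print("fitted beta:", fitted_beta)
```

With real data, the recovery rate would typically be fitted as well, and a continuous optimizer would replace the grid.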
5.3 Experiment 1: The initial response to social distancing.
We attempt the important “what-if” analysis and ask: what if the social distancing measures take effect tomorrow? In this experiment, we fit the time series right before the announcement of social distancing and predict the following two weeks, as shown in Figure 2. The training data consists of a 12-day time series, and social distancing begins on March 21st. Each curve fits the ground truth data well during the training period. Since the regular SIR model is agnostic to the policy, its prediction (blue curve) increases exponentially and shows the consequences without social distancing. By considering the policy and utilizing external sources, we quantify the change in infectious broadcast domains and generate the pessimistic and optimistic ratios, 0.7829 and 0.5668, respectively. With these two ratios and the mitigation function (in Section 4.2), our domain-specific SIR model (pink curve) provides a more realistic and relatively accurate prediction.
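The construction of pessimistic, optimistic, and average predictions can be sketched as follows. The two ratios are the ones reported above; everything else (baseline rate of spread, recovery rate, population, initial infections, and an abrupt rather than gradual change at the policy day) is a hypothetical simplification for illustration.

```python
PESSIMISTIC_RATIO = 0.7829  # reported above
OPTIMISTIC_RATIO = 0.5668   # reported above


def predict_cumulative(beta, ratio, gamma, population, initial_infected, policy_day, days):
    """Cumulative cases from a daily SIR recursion whose beta is scaled by `ratio`
    from policy_day onward (ASSUMPTION: abrupt change, for simplicity)."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    series = []
    for day in range(days):
        current_beta = beta if day < policy_day else beta * ratio
        new_infections = current_beta * susceptible * infected / population
        susceptible -= new_infections
        infected += new_infections - gamma * infected
        series.append(population - susceptible)
    return series


# Hypothetical baseline parameters (not fitted to Illinois data).
common = dict(beta=0.40, gamma=0.10, population=12_700_000,
              initial_infected=10, policy_day=12, days=26)
pessimistic = predict_cumulative(ratio=PESSIMISTIC_RATIO, **common)
optimistic = predict_cumulative(ratio=OPTIMISTIC_RATIO, **common)
average = [(p + o) / 2 for p, o in zip(pessimistic, optimistic)]

print(f"day 26: optimistic={optimistic[-1]:,.0f}  "
      f"average={average[-1]:,.0f}  pessimistic={pessimistic[-1]:,.0f}")
```

The two scenarios coincide during the training window and diverge only after the policy day, so the averaged curve always lies between them.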
|Model||Apr. 07||Apr. 08||Apr. 09||Apr. 10||Apr. 11||Apr. 12||Apr. 13||MAPE|
5.4 Experiment 2: Prediction of a later Week.
Figure 3 shows the prediction for the second week of April. All the models are trained on a 28-day time series. The regular SIR model does not fit the data well (even in the training). The reason is that the real world changed (because government policies reshaped the landscape of the mixing domains). These changes reshaped the curvature and trends in the real projection. Our model captures the changes and provides a well-tailored prediction for this situation. To quantify the precision, we also report the absolute percentage error (APE) in Table 2.
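For reference, the absolute percentage error for each day and its mean over the test week (MAPE) can be computed as in the sketch below. The function names and the sample numbers are made up for illustration and are not the values reported in Table 2.

```python
def absolute_percentage_errors(actual, predicted):
    """Per-day absolute percentage error: 100 * |predicted - actual| / actual."""
    return [100.0 * abs(p - a) / a for a, p in zip(actual, predicted)]


def mean_absolute_percentage_error(actual, predicted):
    """Mean of the per-day absolute percentage errors (MAPE)."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)


# Made-up cumulative case counts for a three-day window (illustration only).
actual_cases = [100, 200, 400]
predicted_cases = [110, 180, 400]

daily_ape = absolute_percentage_errors(actual_cases, predicted_cases)
mape = mean_absolute_percentage_error(actual_cases, predicted_cases)
print("daily APE (%):", daily_ape)
print("MAPE (%):", mape)
```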
6 Conclusions
The paper introduced a mesoscopic model of contagion spread based on mixing domains. The model breaks down the venues where people meet, and relates virus transmissibility to the availability of these venues, as manipulated by social distancing policies. Since individuals in this model move from venue to venue, they create mixing across them. To err on the safe side, we assumed that mixing (among domains that remain open) is perfect. The results show that the resulting simplified model is nevertheless capable of accurately predicting changes in the contagion time series.