1 Introduction
The COVID-19 global pandemic poses a threat not only to public health, but also to the stability of healthcare infrastructure and economies around the world. Forecasting the spread of the pandemic is crucial for informing governmental response (containment strategies and social distancing measures). Models that are capable of anticipating the different phases of the pandemic in a timely manner can be used to guide these decisions and inform future policy direction. Most existing models rely on mathematical compartmental approaches (e.g., the SEIR model) for estimating the potential magnitude of COVID-19 patient volume. However, these models are sensitive to starting assumptions, so different models produce considerably different, and therefore highly uncertain, forecasts. Moreover, most existing mathematical models have failed to accurately forecast peaks in deaths or cases and the subsequent declines, because of their rigid assumptions and their inability to account for changes in government-mandated societal interventions over time.
In this paper, we develop a Bayesian model for forecasting COVID-19 cases and deaths over time. Our model uses a Gaussian process with a mean function defined through a compartmental model as a prior belief on how the pandemic curve will unfold, and then updates its posterior belief on the future forecasts based on current evidence from global disease tracking data. Our model is global in the sense that it incorporates the expected effects of different policies on the pandemic curve by jointly modeling these effects across all countries affected by the pandemic. This is achieved through a hierarchical Gaussian process (GP) model in which the parameters that define a nation-specific pandemic curve are shared across all nations based on country-specific indicators. Compared to existing models, our model (1) can handle model misspecification in a data-driven fashion, (2) can quantify uncertainty in its forecasts, and (3) can capture the effect of interventions on these forecasts. Comparisons with existing models are provided in Table 1.
Table 1: Comparison with existing modeling approaches.
Approach  Uncertainty  Interventions  Sample efficiency
SIR  None  Not modeled  
Curve fitting  Frequentist  Not modeled  
Our model  Bayesian  Modeled  
Most of the widely used models for forecasting the COVID-19 pandemic are based on one of the two modeling approaches highlighted in Table 1. For instance, the Institute for Health Metrics and Evaluation (IHME) model in [1] relies on the curve fitting approach to forecast the cumulative number of deaths over time, whereas the model in [7] relies on a variant of the SIR model. However, these models do not allow for analyzing counterfactual scenarios on how COVID-19 fatalities would change under different possible policies for easing the lockdown.
The key objective of our model is to assist policymakers in assessing the potential impact of various policies for imposing or relaxing lockdowns on the future number of COVID-19 fatalities. The model is fed with data on daily reported COVID-19-related deaths from all countries affected by the pandemic, along with the timeline of government policies in each of these countries. Using these data along with economic, social, demographic, environmental and public health indicators for each country, the model predicts the effect of different future policies on the expected number of new fatalities, as illustrated in Figure 1. In addition to the point predictions, the model presents uncertainty intervals to the decision-maker, providing upper and lower bounds on the fatalities associated with the different policies.
2 Problem Setup: Forecasting the COVID-19 Pandemic
Let $Y_i[t]$ be the number of reported COVID-19-related deaths in a given geographical area $i$ on the $t$-th day since the beginning of the outbreak. Throughout this paper, we assume that a geographical area corresponds to a country, and consider a set of $N$ countries. Each country $i$ is characterized by a feature vector $X_i$ comprising economic, social, demographic, environmental and public health indicators (all listed in Table 2). Because the number of confirmed COVID-19 cases depends greatly on the testing rates and testing strategy in each country, we use the reported daily deaths as a more concrete indicator of disease spread.
2.1 Modeling Objectives
Our key objective is to forecast the future number of COVID-19 deaths across all countries under different levels of policy stringency, i.e., the extent to which government containment measures are restrictive. By using these forecasts to conduct scenario analyses and examining the effects of different possible future policies on the expected number of future deaths, policymakers can decide how to ease lockdown and containment measures over time while keeping mortality rates low.
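Our model is trained on a data set $\mathcal{D}$ for $N$ countries covering a period of $T$ days, i.e.,
$$\mathcal{D} = \left\{ X_i, \{ Y_i[t], P_i[t] \}_{t=1}^{T} \right\}_{i=1}^{N}, \qquad (1)$$
where $X_i$ is the feature vector of country $i$, $Y_i[t]$ is the number of deaths reported in country $i$ on day $t$, and $P_i[t]$ is a quantitative measure of the stringency of the policy applied in country $i$ at time $t$. A precise description of the data pertaining to the variables $X_i$, $P_i[t]$ and $Y_i[t]$ is provided in Section 2.2.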
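For each country $i$, our goal is to forecast the expected number of new COVID-19 deaths at a future time horizon $T + \tau$ under a given sequence of future policy measures, i.e.,
$$\hat{Y}_i[T+\tau] = \mathbb{E}\left[ Y_i[T+\tau] \,\middle|\, \mathcal{D}, \{P_i[t]\}_{t=T+1}^{T+\tau} \right], \qquad (2)$$
where $Y_i[t]$ is the number of deaths reported in country $i$ on day $t$, $P_i[t]$ is the policy stringency, $T$ is the last observed day, and $\mathcal{D}$ is the training data in (1). In addition to the point prediction in (2), we also estimate uncertainty intervals that cover the true number of future deaths, $Y_i[T+\tau]$, with high probability. By examining different settings of the future policy variables $\{P_i[t]\}_{t=T+1}^{T+\tau}$, policymakers can use the predicted fatalities and the associated uncertainty measures to inform future policy direction.
The prediction in (2) is made for each country by conditioning on data for all countries. Thus, the model transfers knowledge about COVID-19 trends and policy effects across different countries based on their similarity with respect to the country-level feature vector $X_i$.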
2.2 Data Description
In this section, we describe the data pertaining to the variables $X_i$ (country-specific features), $P_i[t]$ (policy stringency), and $Y_i[t]$ (reported deaths) in (1).
Country-specific features. We characterize each country $i$ with the feature vector $X_i$, which comprises a total of 35 economic, social, demographic, environmental and public health indicators. The list of these indicators is provided in Table 2. Data on these indicators were collated from statistical reports published by the World Bank (https://data.worldbank.org/).
COVID-19 mortality data. Data on daily reported COVID-19 deaths were collected from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [6], in which information from local government, national government and WHO websites, and from third-party aggregators, is used to identify confirmed COVID-19 deaths by day of death at the first administrative level.
Policy stringency index. We consider government policies through containment and closure indicators recorded by the Oxford COVID-19 Government Response Tracker (OxCGRT), which collects systematic information on which governments have taken which measures, and when. These data were collated by the Blavatnik School of Government at Oxford University [14].
Table 3 (partially recovered): School closure; Workplace closure.
A single Stringency Index is constructed via a nine-point aggregation of the nine containment and closure indicators listed in Table 3. The index is a number between 0 and 100 that reflects the overall stringency of the government's response over time. It measures how many of these nine indicators (mostly around social isolation) a government has acted upon, and to what degree. This index is used to model the policy stringency variable $P_i[t]$ in (1).
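To make the aggregation concrete, the following is a minimal sketch of one way to turn nine ordinal indicators into a single 0 to 100 score: each indicator is rescaled by its maximum level and the rescaled scores are averaged. The function name `stringency_index`, the maximum levels and the observed values are hypothetical, and the sketch ignores any further adjustments that the published OxCGRT methodology may apply.

```python
from typing import Sequence


def stringency_index(values: Sequence[float], max_values: Sequence[float]) -> float:
    """Average of per-indicator scores, each rescaled to the range 0-100.

    Simplifying assumption: every indicator contributes equally and is
    scored as (observed level / maximum level) * 100.
    """
    if len(values) != len(max_values) or len(values) == 0:
        raise ValueError("values and max_values must be non-empty and equally long")
    scores = []
    for value, max_value in zip(values, max_values):
        if max_value <= 0:
            raise ValueError("each maximum level must be positive")
        if not 0 <= value <= max_value:
            raise ValueError("each observed level must lie in [0, max level]")
        scores.append(100.0 * value / max_value)
    return sum(scores) / len(scores)


# Hypothetical ordinal scales and one day of observations for nine indicators.
max_levels = [3, 3, 2, 4, 2, 3, 2, 4, 2]
observed = [3, 2, 2, 0, 0, 1, 0, 4, 2]
index = stringency_index(observed, max_levels)
```

Under this simplification, a country that has acted on none of the indicators scores 0 and one that has taken every measure to its maximum level scores 100.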
Figure 2 compares the policy directions and the number of reported daily deaths in four Scandinavian countries (Sweden, Norway, Denmark and Finland). Compared to the lockdowns and shuttered businesses in countries across the world, Sweden is an outlier: officials have advised citizens to work from home and avoid travel, but most schools and businesses have remained open. Thus, the stringency index of Sweden over the months of March, April and May has been slowly increasing towards a maximum of 60, which is significantly lower than that of the other Scandinavian countries, which have adopted a stringency of 80-90 since the months of February and March. Since Scandinavian countries have comparable country-specific features, these data provide us with a natural experiment for the effect of policy stringency on the spread of the disease.
3 Compartmental Gaussian Processes
We propose a Bayesian model that jointly captures COVID-19 fatalities and the mitigating effect of policy stringency over time across different countries. The key idea is to use a two-layer Gaussian process: the first layer models country-specific COVID-19 fatalities, and the second layer shares parameters across all countries.
Hierarchical Gaussian process model. We model the daily deaths in each country using a Gaussian process with a country-specific mean function and a kernel function. The input to the Gaussian process is time and the output is the number of deaths. The parameters of the mean function are modeled through another Gaussian process as follows:
(3) 
The mean function shares parameters across different countries through the country-specific features and the policy stringency. These parameters encode our prior information on how the pandemic will spread in a country, given its features and policy, based on the spread observed in other “similar” countries with “similar” levels of policy stringency.
Incorporating prior information. We model the mean functions using a baseline compartmental model. In particular, we model the mean functions through a Susceptible, Infectious, and Recovered (SIR) model [9] with time-dependent parameters as follows:
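The first layer of this construction can be illustrated with a minimal NumPy sketch of Gaussian process regression around a non-zero mean function. It shows how a prior curve is corrected by observed data and how a 95% interval follows from the posterior variance. The squared-exponential kernel, its hyperparameters, the noise variance, the bell-shaped `prior_curve` and the synthetic data are all assumptions made for illustration; the second layer, which shares mean-function parameters across countries, is not shown.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=7.0, variance=100.0):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(t_obs, y_obs, t_new, mean_fn, noise_var=25.0):
    """Posterior mean and standard deviation of a GP with mean function mean_fn."""
    K = rbf_kernel(t_obs, t_obs) + noise_var * np.eye(len(t_obs))
    K_s = rbf_kernel(t_new, t_obs)
    K_ss = rbf_kernel(t_new, t_new)
    # Condition on the residuals between the data and the prior mean.
    resid = y_obs - mean_fn(t_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    mean = mean_fn(t_new) + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


def prior_curve(t):
    """Hypothetical bell-shaped prior for daily deaths (stand-in for a compartmental model)."""
    return 500.0 * np.exp(-0.5 * ((t - 40.0) / 12.0) ** 2)


# Synthetic observations that deviate from the prior curve by 20% plus noise.
rng = np.random.default_rng(0)
t_obs = np.arange(0.0, 50.0)
y_obs = 1.2 * prior_curve(t_obs) + rng.normal(0.0, 5.0, size=t_obs.shape)

# Forecast the next 14 days with an approximate 95% interval.
t_new = np.arange(50.0, 64.0)
mean, std = gp_posterior(t_obs, y_obs, t_new, prior_curve)
lower, upper = mean - 1.96 * std, mean + 1.96 * std
```

Far from the observed days the kernel correlations vanish, so the forecast falls back to the prior curve and the interval widens to the prior standard deviation. This is the sense in which the mean function acts as a prior belief.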
(4) 
where the contact rate, the incubation rate and the mortality rate are the SIR model parameters. For a population of a given size, the SIR model comprises three compartments: $S(t)$ is the number of people susceptible on day $t$, $I(t)$ is the number of people infected on day $t$, and $R(t)$ is the number of people recovered on day $t$. The SIR model describes the evolution of these quantities through the following differential equations:
(5) 
The model in (5) specifies our prior on the disease spread curve; the parameters of the model are learned jointly for all countries. The Gaussian process posterior further refines our belief on the disease forecast based on observed data at each new time step.
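For reference, the textbook SIR system of [9] with constant parameters can be written as follows. The symbols $\beta$ (contact rate), $\gamma$ (removal rate) and $N_{\mathrm{pop}}$ (population size) follow the usual convention and are assumptions here; the parameterization used in (5) may differ, in particular because it uses time-dependent parameters.

```latex
\frac{dS}{dt} = -\frac{\beta\, S(t)\, I(t)}{N_{\mathrm{pop}}}, \qquad
\frac{dI}{dt} = \frac{\beta\, S(t)\, I(t)}{N_{\mathrm{pop}}} - \gamma\, I(t), \qquad
\frac{dR}{dt} = \gamma\, I(t).
```

Summing the three equations gives $\frac{d}{dt}(S + I + R) = 0$, so the total population is conserved, and under this convention the basic reproduction number is $R_0 = \beta / \gamma$.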
Incorporating policy effects. Unlike the standard SIR model with constant parameters, our model adopts a time-dependent contact rate parameter, which is modulated by policy effects over time. Since the basic reproduction number $R_0$ depends directly on the contact rate, our model can learn how policy changes $R_0$ over time, as illustrated in Figure 2.
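The effect of a policy-modulated contact rate can be sketched with a small forward-Euler simulation of a textbook SIR system. The linear link between stringency and contact rate in `make_beta`, and all numerical values, are hypothetical stand-ins; in the model described here this relationship is learned from data rather than fixed by hand.

```python
from typing import Callable, List, Tuple


def simulate_sir(
    population: float,
    initial_infected: float,
    beta_fn: Callable[[int], float],
    gamma: float,
    days: int,
    steps_per_day: int = 10,
) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of SIR with a day-dependent contact rate."""
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for day in range(days):
        beta = beta_fn(day)  # contact rate in force on this day
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


def make_beta(
    stringency: Callable[[int], float],
    beta_max: float = 0.4,
    max_reduction: float = 0.8,
) -> Callable[[int], float]:
    """Hypothetical link: stringency 0-100 linearly reduces the contact rate."""

    def beta_fn(day: int) -> float:
        p = min(max(stringency(day), 0.0), 100.0)
        return beta_max * (1.0 - max_reduction * p / 100.0)

    return beta_fn


gamma = 0.1  # assumed removal rate per day
strict = simulate_sir(1e6, 100.0, make_beta(lambda day: 80.0), gamma, days=120)
loose = simulate_sir(1e6, 100.0, make_beta(lambda day: 20.0), gamma, days=120)
peak_strict = max(i for _, i, _ in strict)
peak_loose = max(i for _, i, _ in loose)
```

With these assumed values, the stricter policy corresponds to a lower contact rate and therefore a lower reproduction number, which yields a smaller peak in the infected compartment over the simulated window.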
Table 4: RMSE of each method by country.
Country  Our model  SIR model  IHME model
United Kingdom  488  629  682 
United States  1,590  1,803  723 
Italy  335  462  383 
Spain  291  358  304 
Germany  175  198  — 
Russia  56  57  — 
Turkey  83  102  — 
France  233  270  393 
Brazil  291  316  — 
4 Preliminary Results
We fit our model on data covering 70% of the time since the reporting of the first COVID-19 deaths, used data for the following 7 days for validation, and tested performance on the remaining data. The results were evaluated based on the root mean squared error (RMSE) of the different methods in 11 different countries with a significant number of COVID-19 cases. Results are provided in Table 4.
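The evaluation protocol can be sketched as a chronological split followed by an RMSE computation. The helper names, the integer-percent parameter and the placeholder series below are assumptions for illustration; they are not the code used to produce Table 4.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_percent: int = 70, validation_days: int = 7
) -> Tuple[List[float], List[float], List[float]]:
    """Split a daily series into train, validation and test segments, in time order."""
    n_train = len(series) * train_percent // 100
    n_val_end = min(n_train + validation_days, len(series))
    return (
        list(series[:n_train]),
        list(series[n_train:n_val_end]),
        list(series[n_val_end:]),
    )


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and equally long")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder series of 100 daily counts (not real data).
deaths = [float(day) for day in range(100)]
train, validation, test = chronological_split(deaths)

# A naive baseline that repeats the last validation value over the test window.
baseline = [validation[-1]] * len(test)
baseline_rmse = rmse(test, baseline)
```

Splitting by time rather than at random keeps the test window strictly in the future of the training data, which matches how a forecast would be used in practice.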
4.1 Evaluating the lockdown lifting policy in the UK
In Figure 4, we plot the predicted daily number of COVID-19 deaths under three possible policies: (1) the lockdown being abruptly lifted, (2) the lockdown continuing, and (3) the announced UK policy for gradual lockdown lifting. We evaluated the stringency index corresponding to these three policies and plotted the forecasted daily deaths from May 13th until July 1st. As we can see, a sharp lifting of the lockdown would result in a second temporary rise in the number of deaths, with around 200 more deaths each day compared to the announced gradual lockdown lifting policy.
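The three scenarios can be expressed as stringency trajectories over the forecast window, which would then be passed to the forecasting model as future policy inputs. The sketch below only builds the trajectories; the stringency levels of 80 and 20 and the linear shape of the gradual schedule are hypothetical, and the 49-day window is the number of days from May 13th to July 1st.

```python
from typing import Dict, List

HORIZON_DAYS = 49  # May 13th to July 1st
HIGH, LOW = 80.0, 20.0  # hypothetical lockdown and relaxed stringency levels


def abrupt_lift(day: int) -> float:
    """Lockdown removed at the start of the forecast window."""
    return LOW


def continued_lockdown(day: int) -> float:
    """Lockdown kept at its current level throughout."""
    return HIGH


def gradual_lift(day: int) -> float:
    """Stringency decreases linearly from HIGH to LOW over the window."""
    fraction = min(max(day / HORIZON_DAYS, 0.0), 1.0)
    return HIGH + (LOW - HIGH) * fraction


scenarios: Dict[str, List[float]] = {
    "abrupt": [abrupt_lift(d) for d in range(HORIZON_DAYS + 1)],
    "continued": [continued_lockdown(d) for d in range(HORIZON_DAYS + 1)],
    "gradual": [gradual_lift(d) for d in range(HORIZON_DAYS + 1)],
}
```

Each trajectory plays the role of the future policy variables on which the forecast is conditioned, so comparing the resulting forecasts amounts to a scenario analysis.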
References
[1] IHME COVID-19 health service utilization forecasting team and Christopher J. Murray. “Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries.” medRxiv, 2020.
[2] R. Li, S. Pei, B. Chen, et al. “Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).” Science, 2020.
[3] N. M. Ferguson, D. Laydon, G. Nedjati-Gilani, et al. “Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand.” Imp Coll COVID-19 Response Team, 2020.
[4] A. J. Kucharski, T. W. Russell, C. Diamond, et al. “Early dynamics of transmission and control of COVID-19: a mathematical modelling study.” Lancet Infect Dis, 2020.
[5] J. T. Wu, et al. “Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.” pp. 689–697, The Lancet, 2020.
[6] JHU CSSE. 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE. GitHub. 2020 (https://github.com/CSSEGISandData/COVID-19).
[7] J. Lourenço, et al. “Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic.” medRxiv, 2020.
[8] C. C. McCluskey. “Complete global stability for an SIR epidemic model with delay: distributed or discrete.” Nonlinear Analysis: Real World Applications, pp. 55-59, 2010.
[9] W. O. Kermack and A. G. McKendrick. “A contribution to the mathematical theory of epidemics.” Proceedings of the Royal Society of London, pp. 700-721, 1927.
[10] H. W. Hethcote. “The mathematics of infectious diseases.” SIAM Review, pp. 599-653, 2000.
[11] R. Lemonnier, K. Scaman, and N. Vayatis. “Tight bounds for influence in diffusion networks and application to bond percolation and epidemiology.” Advances in Neural Information Processing Systems (NeurIPS), 2014.
[12] D. B. Neill, and A. W. Moore. “A fast multiresolution method for detection of significant spatial disease clusters.” Advances in Neural Information Processing Systems (NeurIPS), 2004.
[13] D. B. Neill, and A. W. Moore. “A fast multiresolution method for detection of significant spatial disease clusters.” Advances in Neural Information Processing Systems (NeurIPS), 2004.
[14] T. Hale, A. Petherick, T. Phillips, and S. Webster. “Variation in government responses to COVID-19.” Blavatnik School of Government Working Paper, 2020.
[15] P. Teles. “A time-dependent SEIR model to analyse the evolution of the SARS-covid-2 epidemic outbreak in Portugal.” Bull World Health Organ, 2020.