## 1 Introduction

The COVID-19 epidemic initially developed in the Chinese region of Wuhan [1] before becoming a pandemic. As of public records on 21 April 2020, 2,478,634 people have been affected internationally, and more than 26% of the world population is involved in some form of shelter-in-place or lockdown order. In the public political debate and in the scientific debate as well, great attention is posed on the effectiveness of intervention measures. Time-wise, Italy has been one of the first Countries, after China and Iran, significantly hit by the pandemic and then forced to resort to the implementation of strictly increasing quarantine measures by an exponential increase in the COVID-19 infection. The situation in Italy up to the first half of March is well described in [2, 3]. Embracing the recommendations of the scientific experts (see also the considerations in [2] concerning the exponential growth of the epidemic at the beginning of March 2020), since March 08 and March 09 2020 the local (Lombardy) and national (Italy) governments have imposed severe containment measures (lockdown, for simplicity, henceforth). The measures have, indeed, impacted the progression of the epidemic and avoided the exponential growth. On 20 April 2020 Italy counts a total of 181,228 confirmed individuals, with 24,114 deaths and a number of currently infected individuals equal to 108,237.

[4] poses the question of how country-based mitigation measures may influence the course of the COVID-19 epidemic world wide. We have seen alternative attitudes internationally, with some countries intervening with strict lockdown measures within a short amount of time since the insurgence of the epidemics, and other adopting a more stage-phased approach.

Questions that emerge are, then, whether the effect of the intervention can be quantitatively observed from the data and whether its effect is immediate or delayed. In fact, it is natural to expect a time-lag between the moment an intervention is decided and the time in which it is actually taking in place. A the same time, the intervention time is not be the sole factor explaining a change in the pandemic time progression. The quarantine, the infection rate, the protection rate, the average latent time are factors determining an epidemic evolution that are expected to play a role. We then wish to quantify the relative importance of the intervention time with respect to these other factors.

To answer these questions, we resort to mathematical modelling coupled with global sensitivity analysis. Forecasting the expansion of the COVID-19 epidemics is subject of intensive research investigation [5, 6]. We investigate the behavior of the pandemic combining predictions from a generalized SEIR model recently used in [7, 8, 9]

to describe the COVID-19 pandemic in China with machine learning approaches to increase interpretability of the findings

[10]. We fit the generalized SEIR model to data of the pandemic in Italy covering the period 24 February to 20 April. We then run an uncertainty analysis with 1,000,000 Monte Carlo simulations and apply global sensitivity measures to quantify the relative importance of the factors mentioned above. We employ a replicated finite-difference decomposition method to study the intensity and sign of factor interactions.## 2 Results

#### Time Delay of Intervention Effects.

We start with results for the effects of the intervention time.

Figure 1 shows the following. The graphs represents the dynamic evolution of the pandemic comparing real data (circles) against the generalized SEIR model forecast (continuous line). The first graph report the results obtained for the case in which the intervention is modeled as fully active in the first day. We register a poor fit (), with the forecast largely underestimating the number of totally infected individuals (the root mean squared error (RMSE) is ). The second graph shows reports the comparison in the case the intervention is assumed to have effect 1 day later than the official issuance date. We register better fit, confirmed by an increase of from to (the RMSE is now , with a decrease of ). Then, graphs (c) to (f) show the effect of simulating the intervention with one additional day of delay per graph. Note that by simulating the intervention as starting 5 days later than the actual issuance date, the fitting performance improves notably. We register and , with a decrease of with respect to graph (a).

Due to the fact that intervention time is not the only factor influencing the forecast, we also tried to tune the forecast by changing other parameters such as and and combinations, while leaving the intervention time at the issuance date. Our search did not lead to satisfactory results, signaling the intervention time as the main source of discrepancy. To confirm the relevance of the intervention time, we performed further analyses via global sensitivity measures.

#### Uncertainty quantification

Figure 2

displays the empirical density function for the SIER model forecast of the total number of infected individuals on April 20th, 56 days after the data collection start. The empirical density is skewed, with a mean of

individuals, standard deviation of

,upper quantile on

. Thus, the model foresees an expected number of infected individuals between and , which is consistent with actual data (red vertical line). The empirical density is computed on a sample of 1,000,000 realizations of the forecasts of total infected individuals. Uncertainty in the model predictions is the result of uncertainty in six factors acting as parameters of the generalized SEIR model: protection rate (), infection rate (), average latent time (), average quarantine rate (), the number of initial infected people () and the intervention time (Interv.). The distributions are discussed in Section 4. From these distributions uncertainty is propagated in the model through Monte Carlo simulation.#### Factor Importance

From the previous Monte Carlo sample, we have calculated global sensitivity measures that determine quantitatively the relative importance of the inputs [See Supplementary Appendix A]. Figure 3 shows the results.

Figure 3

shows that time to intervention (Interv.) is the key-driver in the variability of model predictions according to all the sensitivity measures used, i.e., variance-based total indices (

), first order indices () and the distribution-based sensitivity measure () (see the supplementary quantitative appendix A for further details on the definition of these indices and the motivation for their simultaneous utilization.) It is followed by quarantine rate, number of initially infected individuals () and infection rate (). On a relative scale time to intervention is (depending on the sensitivity measure) to times more important than the quarantine rate. In turn, the quarantine rate is to times more important than the initial number of affected individuals, which, in turn, is to times more important than the infection rate. Protection rate, infection rate and average latent time play an even minor role.#### Direction of influence

Figure 4 shows the following. Each of the variables has a separately monotonic effect on the output. For instance, the expected total number of infected individuals is linearly and monotonically decreasing in the protection rate (). Similarly, we expect a decrease associated with an increase in latent time and quarantine rate, but an increase associated with infection rate (), in the number of initially infected individuals () and with the time of intervention (Interv.). These qualitative results are in accordance with intuition. The only difference is that the conditional regression function shows a (still decreasing) non-linear behaviour for the total number of infected individuals as a function of the quarantine rate. The last graph can be used to obtain a quantitative indication: when the intervention time is delayed by 7 days, the median total number of infected people would be expected to increase by around . [This increase is calculated by comparing the median conditional on the intervention being effective on March 09 2020 to the median conditional the intervention being effective on March 16 2020.]

#### Interaction Quantification

Figure 5 displays the magnitude of the interaction effects among the factors. The highest recorded interactions (the first six bars are individual effects) are among , and the intervention time, namely the three most important factors.

In terms of overall relevance, interaction effects account for less than of the variance of the affected individuals. The mean dimension is 1.1683 (see Appendix A for the mathematical definitions). This means interactions have a low overall effect, although the effect is not entirely negligible. In fact, the highest interaction (between intervention time and ) is of the highest individual effect. Aside this interaction, we register a non-negligible interaction between intervention time and initial number of infected individuals (). Note that these three factors are also the ones identified as most influential by the global sensitivity measures.

## 3 Discussion

We have made inference on publicly available data of the COVID-19 pandemics in Italy through application of a SEIR model and the use of statistical techniques for sensitivity analysis. The investigation has evidenced a time-lag between the moment interventions have been issued and the moment in which their effect has actually taken place changing the pandemic path. Indeed this result seem to reflect a physical mechanism. While shelter-in-place measures are officially established on a given date, it is unlikely that they have an immediate effect. For instance, individuals might not be able to interrupt all contacts immediately, they do not have instantaneous access to masks or other protecting equipment. Moreover, some activities cannot shutdown immediately due to technical reasons, and certain industrial compartments have to remain open to keep delivering essential services to the population.

Our results have both policy-making and modeling implications. From a policy making viewpoint, global sensitivity measures indicate that the first two most important factors are the intervention time and the quarantine rate. These two factors relate to social distancing measures. Not only, but combined with the results of the time to intervention analysis, the findings indicate that the earlier intervention measures are introduced, but also the more rapidly (strongly) they can be implemented, the more effective they are in flattening the pandemic curve.

From a modeling viewpoint, the strong impact of the intervention time suggests indeed that this variable needs to be taken into account to increase the prediction accuracy of SEIR models.

Regarding limitations of the present analysis, further work can be done to determine the distributions of the factors more precisely and, potentially, to include statistical correlations among the factors. Also, the analysis has been focused on the total number of infected individuals at 56 days from the initial observations. However, the analysis can also be made time dependent, varying the final time (e.g., using an analysis at 50 days or at 100 days). Of course, the longer the time horizon, the larger the uncertainty in the forecast we expect. Also, we have chosen one specification of the SEIR models and alternative specifications may be used.

The paper moves in the direction advocated by [10]. The uncertainty quantification is directly applicable to other SIER models and/or even to other type of models used to forecast or study the COVID-19 pandemics. Also, while we have analyzed data from Italy, the analysis can be replicated with data coming from other Countries and regions. The approach yields coherent insights not only regarding the most important inputs but also about the role of interactions and direction of change. This enriches the spectrum of insights gained from the model and the data with respect to current practice, with analyses that are limited only to varying one factor at a time. The analysis helps modelers in understanding key-factors of uncertainty increasing transparency. Collecting information on these factors might reduce uncertainty in the forecasts improving modelling efficacy and helping result communications to the public (see the recent critique in [14]).

## 4 Methods

We consider a representative of the SEIR family of models. These models are a cornerstone in epidemiological studies [11, 12] and have been used in the very recent studies of [7, 8] regarding the COVID-19 pandemics. We consider the version used to simulate the COVID-19 outbreak in China in the work of [9]. This generalization allows a new quarantined state that includes the effect of soft measures before lockdown. In total, the following seven different states are considered: susceptible , insusceptible , exposed (but not yet infectious) , infectious , quarantined , recovered , deceased .

The generalised SEIR model. Adapted from [9].

The model includes the following parameters: i) protection rate: ; ii) infection rate, ; iii) average latent time: ; iv) average quarantine rate: ; v) time-dependent cure rate coefficients: and (the time dependent cure rate is written as by ); vi) time-dependent mortality rate coefficients: and (the mortality rate curve is written as ). For simplicity, the birth and natural death processes are not considered so that a constant population is assumed. The above model has been implemented in Matlab [13] and can be openly accessed at https://github.com/ECheynet/SEIR.

#### Data Collection and Model Fitting

We collected data from Italy’s the Department of Civil Protection publicly available repository https://github.com/pcm-dpc/COVID-19.

Figure 6 displays the number of total, quarantined, recovered, deceased individuals as a function of time, as per the publicly available data from 24 February till 20 April. The vertical lines display the dates of 09 March 2020, at which the lockdown measures were officially deployed. Fitting the generalized SEIR model discussed above on the data from 24 February to 08 March (13 days) for Italy, we obtain the pre-lockdown parameter estimates. Then, fitting the same model on the data from 09 March to 20 April (42 days), we obtain the after-lockdown parameter estimates reported in Table 1.

Pre-lockdown | 0.0000 | 1.1801 | 2.182 | 0.5985 | ||

After-lockdown | 0.1098 | 2.0000 | 14.2091 | 0.3750 |

Note that the after-parameters estimates are significantly different from the pre-lockdown estimates. For instance, the protection rate increases by over 2,000 times; The average latent time increases by over times; the quarantine rate decreases by .

Figure 7 displays the total number of infected individuals calculated by the fitted models (continuous lines) against the real observations (circles). In terms of fitting performance, we register an average coefficient of determination () over the quarantined, recovered and deceased cases equals to in the pre-intervention period, and in the after-intervention period.

#### Uncertainty quantification

To further understand the results, we have let the model undergo an uncertainty quantification in order to be able to compute global sensitivity measures. We used the parameter distribution assignments displayed in Table 2 (see also [9]) and generated a random Monte Carlo sample of size .

Distribution | Uniform | Uniform | Uniform | Uniform |
---|---|---|---|---|

Support | ||||

The intervention day | ||||

Distribution | Discrete uniform | Discrete uniform | ||

Support | 09-Mar with |

. The intervention time is assumed to be a uniformly distributed random variable in a one-week range after the actual issuance date. Independence is assumed.

From this sample, we have quantified the uncertainty about the forecasted number of infected individuals in Italy from 24 February to 20 April (56 days). Also, we have calculated importance indices that go under the name of global sensitivity measures to determine the quantitatively the relative importance of the inputs [See Supplementary Appendix A]. Specifically, we have used the first and total order global sensitivity measures based on variance contribution and a sensitivity measures based on the distance between cumulative distribution functions. The rationale of the choice is as follows. Regarding importance, the total order as well as the distribution-based sensitivity measure possess the properties that they are null if and only if the model output is probabilistically independent of the input of interest. Thus, their null value reassures us that the input is, indeed, not influential. Regarding interaction quantification, the difference between the first and the total order indices provides information about the relevance of interactions, giving insights on whether a factor’s influence is due to its individual action or due to interactions with other factors. This insight, however, is an overall insight and does not reveal which interactions (if any) are important. Towards this understanding we use a sample of size 20,000, replicating a full factorial design and decomposing the output changes via a finite difference operation (see Appendix

A fpor further details). The scheme allows us to compute a total of 20,000 first order effects for each input, 20,000 second order interaction indices for all pairs, and 20,000 interaction indices of order 3, 4, and 5 and 6, enabling interactions to be thoroughly studied.From the original sample, we also computed conditional regression curves. These curves express the expected number of infected individuals as a function of each of the uncertain parameters. The graphs lead information about whether the increase in a parameter leads to an increase (decrease) of the number of infected individuals.

## Appendix A Supplementary Appendix 1: Quantitative Details on Sensitivity Analysis

This section is divided into two parts. In the first part, we address the definition of the global sensitivity measures used in this work. In the second part we ketch the principles of their estimation and list the corresponding subroutines.

### a.1 Sensitivity Measures: a Concise Overview

For broad overviews on sensitivity analysis, we refer the reader to [15, 17]. We concisely overview the methods used here. Let , where is a multivariate input-output mapping,

be the vector of the model inputs, with

and the output of interest. Let also be the measure space of the model inputs and let , ,…, ,,…, be a collection of points insampled from the input probability measure. Let

, ,…, be the corresponding values of the model the output. Then, let denote the variation in the model output registered with the inputs shift from to . We then have a sequence of finite changes , . We can decompose each change in effects, writing:(1) |

where

(2) |

In eq. (2), the point is obtained shifting from to , while keeping the remaining inputs at their values in . Similarly, denotes the point obtained by simultaneously shifting and from to while keeping the remaining inputs at their values in .

The terms in equation (1) are individual contributions, second order contributions, quantifying the residual interaction between and ; a similar interpretation is shared by higher order terms. Dividing these second order effects by , we obtain the second order Newton ratios that provide indications about whether interactions are locally positive (synergistic) or negative (antagonistic).

Sampling the model -times, we obtain replicates of the finite change indices, . These sensitivity measures can be used, on the one hand, to obtain detailed regional information on interactions. Assume the inputs are independent. Let the variance of induced by uncertainty in the model inputs. As proven in [20], we can write

(3) |

where

(4) |

and the functions obtained by taking appropriate conditional expectations [20, 17]. In equation (3), is the portion of the variance attributed to alone, the portion of the variance attributed to the residual interaction between and . In the literature, one considers the normalized version

(5) |

called first order Sobol’ indices. One also defines the total sensitivity index of , here denoted by [21]. The index is written as:

(6) |

and represents the total portion of the variance of associated with . We note that by a result due to [22], the sum of the total order indices equals the mean dimension of in the superimposition sense.

There is a bridge between finite change sensitivity indices and the total index In fact, let denote the random variable whose realizations are the . Then, from the theory of the Sobol’ estimation design [23] [24] [25], we get

(7) |

That is, by calculating the variance of the available first order effects and halving we obtain an estimate of the total order indices for each input. These estimates, compared with estimates of the first order indices provides information on how much each input is involved in interaction with the remaning ones. In particular, we note the notion of mean effective dimension provides a quantitative indication about the relevance of interactions. This notion has been introduced in [26, 22]. The intuition is as follows. The terms are positive and sum to unity. Thus, they can be regarded as forming a probability mass function. Actually, they place a mass on the random variable whose realizations all the possible combinations of indices. Then, one defines the dimension distribution of in the superimposition sense as the distribution of the cardinality of . The mean dimension is then [22]:

(8) |

respectively. A mean effective dimension in the superimposition sense equal to unity indicates the absence of interactions. [22] also proves that the mean effective dimension is equal to the sum of total effects. That is, we have: .

We recall that first order variance-based sensitivity measures fall in the so-called common rationale of global sensitivity measures [27]. That is, they are sensitivity measures of the type

(9) |

where is a distance or divergence between distributions, is the marginal probability measure of and is the conditional probability measure of given . Depending on the choice of , one obtains alternative ways of measuring importance. To illustrate, let and denote the cumulative distribution function of and the conditional distribution function of given . Selecting the Kuiper distance between cumulative distributions functions, we get [28]

(10) |

We note that a null value of and indicates that is independent of . First order variance-based sensitivity measure () do not possess this property and their null value is not necessarily a signal of the fact that does not depend on . However, first order variance-based sensitivity measures remain well defined when inputs are dependent, similarly to , while problems associated with the interpretation of arise.

### a.2 Estimation

The selected sensitivity measures for our calculations are the first order main effects , the higher order effects computed at randomized locations. The implementation of the decomposition in eqs. (1) and (2) is implemented in the surboutine finitechanges.m, which is available at https://github.com/LuXuefei/FiniteChanges. Then, such decomposition is performed at randomized locations appropriately generated through random Monte Carlo generation implemented in the code rand.m and randsample.m, available at Matlab.

From the randomized first order effects , it is then possible to obtain estimates of the first order indices of all inputs from

(11) |

The estimation of the sensitivity measures and is based on the so-called given data approach. We refer to [29] and [27] for a detailed treatment. We just recall the key intuition here. Let a subset of the reals that represents the support of model input . Then, consider a partition of into bins, such that and for , The intuition of a given-data estimation is then that of substituting the condition (that is is exactly at ) with the bin condition . Intuitively, this condition asks that is in a suitably defined interval around . With this intuition, one obtaines the estimate

(12) |

where is equal to the probability that belongs to bin , and where and are the marginal empirical distribution function of and the conditional empirical distribution function given that . Then, one links the partition size (M) to the sample size (), so that as increases the parition size tends to zero and the limiting partition is itself. [27] prove that, under mild assumptions, (the continuity of the operator in both its arguments and the Riemann-Stieltjes integrability of ), converges to as the sample size increases; that is, is an asymptotically consistent estimator of .

These calculations are implemented in the subroutine betaKS3.m. Finally, the conditional regression functions have been estimated using the subroutine cosi.m. Both subroutines are developed by Elmar Plischke and available at https://zenodo.org/record/885332#.XpnB44gzZGM (see [30] and [31] for details).

## References

- [1] F Wu, S Zhao, B Yu, Y.-M. Chen, W Wang, Z.-G. Song, Y Hu, Z.-W. Tao, J.-H. Tian, Y.-Y. Pei, M.-L. Yuan, Y.-L. Zhang, F.-H. Dai, Y Liu, Q.-M. Wang, J.-J. Zheng, L Xu, E C Holmes, and Y.-Z. Zhang. A new Coronavirus Associated with Human Respiratory Disease in China. Nature, 579(7798):265–269, 2020.
- [2] A. Remuzzi and G. Remuzzi. COVID-19 and Italy: what next? The Lancet, 395:1225–1228, 2020.
- [3] G. Grasselli, A. Pesenti, and M. Cecconi. Critical Care Utilization for the COVID-19 Outbreak in Lombardy, Italy: Early experience and Forecast during An Emergency Response. JAMA. Journal of the American Medical Association, 2020.
- [4] Roy M. Anderson, Hans Heesterbeek, Don Klinkenberg, and T. Déirdre Hollingsworth. How will country-based mitigation measures influence the course of the COVID-19 epidemic? The Lancet, 395(10228):931–934, 2020.
- [5] J T Wu, K Leung, and G M Leung. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet, 395(10225):689–697, 2020.
- [6] M Enserink and K Kupferschmidt. With COVID-19, modeling takes on life and death importance. Science, 367(6485):1414–1415, 2020.
- [7] A J Kucharski, T W Russell, C Diamond, Y Liu, J Edmunds, S Funk, R M Eggo, F Sun, M Jit, J D Munday, N Davies, A Gimma, K van Zandvoort, H Gibbs, J Hellewell, C I Jarvis, S Clifford, B J Quilty, N I Bosse, S Abbott, P Klepac, S Flasche, and Centre for Mathematical Modelling of Infectious Diseases COVID-19 working group. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases, 2020.
- [8] H Wang, Z Wang, Y Dong, R Chang, C Xu, X Yu, S Zhang, L Tsamlag, M Shang, J Huang, Y Wang, G Xu, T Shen, X Zhang, and Y Cai. Phase-adjusted estimation of the number of Coronavirus Disease 2019 cases in Wuhan, China. Cell Discovery, 6(1), 2020.
- [9] Liangrong Peng, Wuyue Yang, Dongyan Zhang, Changjing Zhuge, and Liu Hong. Epidemic analysis of COVID-19 in China by dynamical modeling. feb 2020.
- [10] A. Saltelli. A Short Comment on Statistical versus Mathematical Modelling. Nature Communications, pages 1–2, 2019.
- [11] W.O. Kermack and A.G McKendrick. A Contribution to the Mathematical Theory of Epidemics. Proceedings of the Royal Society A, 115(772):700–721, 1927.
- [12] Herbert W. Hethcote. Mathematics of infectious diseases. SIAM Review, 2000.
- [13] E Cheynet. Generalized SEIR Epidemic Model (fitting and computation) (https://www.github.com/ECheynet/SEIR), 2020.
- [14] Quoctrung Bui, Josh Katz, Alicia Parlapiano, and Margot Sanger-Katz. What 5 Coronavirus Models Say the Next Month Will Look Like. The New York Times, apr 2020.
- [15] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, and S. Tarantola. Global Sensitivity Analysis – The Primer. Chichester, 2008.
- [16] T.J. Sullivan. Introduction to Uncertainty Quantification. Springer Verlag, 2015.
- [17] E. Borgonovo. Sensitivity Analysis: An Introduction for the Management Scientist. Springer, NY, 2017.
- [18] G. Li, C. Rosenthal, and H. Rabitz. High dimensional model representations. The Journal of Physical Chemistry A, 105(33):7765–7777, 2001.
- [19] E. Borgonovo. Sensitivity analysis with finite changes: An application to modified EOQ models. European Journal of Operational Research, 200(1):127–138, 2010.
- [20] B. Efron and C. Stein. The Jackknife Estimate of Variance. The Annals of Statistics, 9(3):586–596, 1981.
- [21] T. Homma and A. Saltelli. Importance Measures in Global Sensitivity Analysis of Nonlinear Models. Reliability Engineering & System Safety, 52(1):1–17, 1996.
- [22] A. B. Owen. The Dimension Distribution and Quadrature Test Functions. Statistica Sinica, 13:1–17, 2003.
- [23] I.M. Sobol’. Sensitivity Estimates for Nonlinear Mathematical Models. Mathematical Modelling & Computational Experiments, 1:407–414, 1993.
- [24] F. Gamboa, A. Janon, T. Klein, A. Lagnoux, and C. Prieur. Statistical inference for Sobol pick-freeze Monte Carlo method. Statistics, 50(4):881–902, 2016.
- [25] E. Borgonovo, G. Rabitti Simulator Inputs Screening: from Elementary Effects to Mean Dimensions. Work in Progress, 1–39, 2020.
- [26] R. E. Caflisch, W. Morokoff, and A. B. Owen. Valuation of mortgage backed securities using Brownian bridges to reduce effective dimension. Journal of Computational Finance, 1:27–46, 1997.
- [27] E. Borgonovo, G. Hazen, and E. Plischke. A Common Rationale for Global Sensitivity Measures and their Estimation. Risk Analysis, 36(10):1871–1895, 2016.
- [28] M. Baucells and E. Borgonovo. Invariant Probabilistic Sensitivity Analysis. Management Science, 59(11):2536–2549, 2013.
- [29] E. Plischke, E. Borgonovo, and C.L. Smith. Global Sensitivity Measures from Given Data. European Journal of Operational Research, 226(3):536–550, 2013.
- [30] E Plischke. An effective algorithm for computing global sensitivity indices (EASI). Reliability Engineering & System Safety, 95 (4):354–360, 2010.
- [31] E. Plischke. How to Compute Variance-Based Sensitivity Indicators with Your Spreadsheet Software. Environmental Modelling & Software, 35:188–191, 2012.

Comments

There are no comments yet.