During outbreaks of infectious diseases (e.g. EVD, SARS, MERS and COVID-19), quarantine measures are commonly implemented to limit disease transmission and morbidity. Extensive research has shown that quarantine is important in reducing the number of infections and deaths (Lipsitch et al., 2003; Ferguson et al., 2006), especially when there is no effective treatment for the disease; see Nussbaumer-Streit et al. (2020) for a recent review. To establish a quarantine strategy, some studies use epidemic models such as SEIR-type models to determine the optimal time-varying quarantine rate by optimal control theory; see for instance Behncke (2000), Yan and Zou (2008) and Ahmad et al. (2016). Lipsitch et al. (2003) discussed the relationship between the fraction of each infectious case’s contacts that is quarantined and the number of person-days in quarantine. However, a key problem when imposing a quarantine measure is to determine the quarantine duration. A very long quarantine duration ensures that most infected individuals exhibit symptoms while under quarantine and then receive further quarantine and medical treatment. That is, a long quarantine duration can stop the virus from spreading to others. Nevertheless, it may inconvenience uninfected individuals, incur extra financial and social costs and even affect economic development (Reich et al., 2018). Hence a good quarantine measure should balance effectiveness against cost and have a proper duration.
Previous studies (e.g. Farewell et al., 2005; Nishiura, 2009) analyzed the appropriate quarantine period using the quantiles of the incubation period distribution. The existing methods do not consider the characteristics of quarantined individuals and suggest the same quarantine duration for every individual. Nevertheless, different people may have different probabilities of being infected and different incubation periods. Indeed, the probability of being infected is unknown for every individual. However, some individual characteristics that may affect the incubation period distribution or the infection probability can be observed, such as age, sex, the infection rate in the region from which the individual comes and whether the individual is a close contact. Thus, to guarantee the effectiveness and minimize the cost of the quarantine measure, one may set a proper quarantine duration for each potentially exposed individual based on his or her characteristics. To the best of our knowledge, no literature addresses this issue.
In this paper, we consider this problem and develop an optimal quarantine rule. The proposed rule assigns different quarantine durations to different individuals depending on their characteristics. We make the rule optimal by minimizing the average quarantine duration of uninfected people under the constraint that the probability of symptom presentation for infected people attains a given value, which may be close to 1. We obtain the optimal solution of the problem and estimate it by statistical methods. The coronavirus disease (COVID-19) pandemic has become a global health crisis since its emergence in Asia late last year. Considerable attention has been paid to studying optimal prevention and control strategies for COVID-19 and various public health measures such as testing, social distancing, lockdown and quarantine from a macro perspective (Piguillem and Shi, 2020; Charpentier et al., 2020; Acemoglu et al., 2020; Alvarez et al., 2020). Quarantine is one of the key aspects of infection control during the COVID-19 pandemic. This paper focuses on the optimal quarantine duration for infectious diseases with an application to COVID-19 data, which is not discussed in the aforementioned literature. Compared with the standard quantile methods due to Farewell et al. (2005), Nishiura (2009) and Liu et al. (2020), the data analysis results demonstrate that our method suggests a shorter average quarantine duration while keeping the risk of virus spreading below a given level. That is, the proposed method keeps the risk of virus spreading at the same low level as the standard methods while reducing the cost of days lost. After quarantine, uninfected individuals may work and study while keeping social distance or following other simple measures. Hence, this paper contributes to decreasing financial and social costs and the impact on economic development while ensuring control of the epidemic.
2 Optimal quarantine rule
Let $X$ be a feature vector describing the characteristics of a potentially exposed individual. Let $\mathcal{X}$ be the support of $X$ and let $I$ be a variable indicating whether or not the individual has been infected ($I=1$ if infected and $I=0$ otherwise). Clearly, $I$ is unobservable before quarantine. A quarantine rule is a map from $\mathcal{X}$ to $[0,\infty)$, that is, $q:\mathcal{X}\to[0,\infty)$. Let $T$ be the incubation period of the infectious disease for $I=1$; the incubation period is not defined for $I=0$. An infected individual has a low risk of infecting others if the individual presents symptoms, and hence is diagnosed, during the quarantine. A good quarantine duration should ensure a sufficiently large probability that an infected individual presents symptoms during the quarantine while minimizing the average quarantine duration of uninfected individuals. The problem of finding the optimal quarantine rule can then be expressed as finding a map $q$ that solves
$$\min_{q}\; E_0[q(X)] \quad \text{subject to} \quad P_1\big(T\le q(X)\big)\ge 1-\alpha, \tag{1}$$
where $\alpha$ is a predefined small positive number (e.g. 0.05) and the subscript $0$ or $1$ denotes that the expectation or probability is taken conditional on $I=0$ or $I=1$. In this paper, we call $E_0[q(X)]$ the redundant length of the quarantine rule $q$ and call $P_1(T\le q(X))$ the finding probability of the quarantine rule $q$.
If there is no available feature $X$, problem (1) reduces to
$$\min_{q\ge 0}\; q \quad \text{subject to} \quad P_1(T\le q)\ge 1-\alpha. \tag{2}$$
This just defines the $1-\alpha$ quantile of the incubation period distribution. In particular, this suggests the 0.95 quantile method due to Farewell et al. (2005) when $\alpha=0.05$.
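As a small illustration of the feature-free case, the sketch below computes the quantile-based quarantine duration for a Weibull incubation period. The shape and scale values are hypothetical placeholders, not estimates from any dataset, and `alpha` stands for the predefined small positive number (e.g. 0.05).

```python
import math


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Distribution function of a Weibull(shape, scale) incubation period."""
    if t <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_quantile(prob: float, shape: float, scale: float) -> float:
    """Inverse of weibull_cdf for 0 < prob < 1."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    return scale * (-math.log(1.0 - prob)) ** (1.0 / shape)


alpha = 0.05
shape, scale = 2.0, 7.0  # hypothetical parameters, for illustration only

# Without features, everyone receives the same duration: the (1 - alpha) quantile.
common_duration = weibull_quantile(1.0 - alpha, shape, scale)

# By construction the finding probability of this rule equals 1 - alpha.
finding_probability = weibull_cdf(common_duration, shape, scale)
```

With these placeholder values the common duration is roughly twelve days; any other incubation model with an invertible distribution function could be substituted.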
Remark 1. Suppose $p$ is the proportion of quarantined infected people among all infected people and $R_0$ is the basic reproductive number of the disease. Then every infected individual who is not quarantined, or who is released early, causes approximately $R_0$ infections. Since a quarantined infected individual is released early with probability at most $\alpha$, the predefined small number in the finding probability constraint, an infected individual causes at most $(1-p+p\alpha)R_0$ infections on average. If $(1-p+p\alpha)R_0<1$, the spread of the virus can be controlled and the disease will die out exponentially. For example, suppose and , then the epidemic will be controlled if we choose an $\alpha$ smaller than . However, the main purpose of quarantine is to stop the spread of the virus as soon as possible, and hence we usually take $\alpha$ to be a smaller constant such as .
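The control condition in this remark can be checked numerically. The sketch below writes `p` for the proportion of infected people who are quarantined, `r0` for the basic reproductive number and `alpha` for the probability that a quarantined infected individual is released before presenting symptoms. These names, the numeric values, and the simplifying assumption that an individual found during quarantine causes no further infections are all illustrative choices, not values taken from the paper.

```python
from typing import Optional


def effective_reproduction(r0: float, p: float, alpha: float) -> float:
    """Average infections per case: unquarantined or early-released cases cause r0 each."""
    return (1.0 - p + p * alpha) * r0


def alpha_bound_for_control(r0: float, p: float) -> Optional[float]:
    """Supremum of alpha with effective_reproduction < 1, or None if no alpha works.

    (1 - p + p * alpha) * r0 < 1  is equivalent to  alpha < (1 / r0 - (1 - p)) / p.
    """
    bound = (1.0 / r0 - (1.0 - p)) / p
    return bound if bound > 0.0 else None


# Hypothetical scenario: r0 = 2.0 and 80% of infected people are quarantined.
bound = alpha_bound_for_control(r0=2.0, p=0.8)
controlled = effective_reproduction(r0=2.0, p=0.8, alpha=0.05) < 1.0
```

When too few infected people are quarantined (for instance `r0=3.0, p=0.5`), the function returns `None`, reflecting that no choice of quarantine duration alone can bring the effective reproduction number below one in this simple accounting.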
2.1 Derivation of the optimal solution
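Suppose the feature vector is $X=(V,W)$, where $V$ is a categorical variable that takes values in $\{1,\dots,K\}$ and $W$ is a $d$-dimensional vector of continuous variables. Let $\mu$ be the product of the counting measure on $\{1,\dots,K\}$ and the Lebesgue measure on $\mathbb{R}^d$. Let $f_0(x)$ be the density function of $X$ conditional on $I=0$ (uninfected) w.r.t. $\mu$ and $f_1(x)$ be the density function of $X$ conditional on $I=1$ (infected) w.r.t. $\mu$. We use $G(t\mid x)$ to denote the distribution function of the incubation period $T$ conditional on $X=x$ and $I=1$ and use $g(t\mid x)$ to denote the corresponding density function w.r.t. the Lebesgue measure. Then problem (1) can be reformulated as
$$\min_{q}\int q(x)f_0(x)\,d\mu(x) \quad \text{subject to} \quad \int G\big(q(x)\mid x\big)f_1(x)\,d\mu(x)\ge 1-\alpha. \tag{3}$$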
This is a variational problem and is not easy to solve in general. However, we find that the solution of this problem is easy to handle under the following conditions.
, for any and is continuous with respect to . Moreover, is either strictly monotone with respect to or unimodal and strictly monotone with respect to on both of the monotone intervals.
Condition 1 is a mild condition and is satisfied by many commonly used parameterizations of the incubation period (e.g. Weibull, lognormal, gamma and Erlang distributions). Condition 2 is a mild regularity condition. It is not of practical significance to consider the case where . If we assume for any , the conditional distribution is a Weibull distribution with shape parameter and scale parameter , then a sufficient condition for is . By Bayes' formula,
By Condition 2, we have
then we can establish the following theorem.
For any and , define . Under Conditions 1 and 2, if is small enough such that , then there is a unique constant such that and is the unique minimum point of problem .
The proof of Theorem 1 is given in SI Appendix. In what follows, we give an intuitive explanation of Theorem 1. Our optimal quarantine rule is determined based on the density ratio
which is analogous to the likelihood ratio in hypothesis testing (Lehmann, 2005). Suppose we need to determine the quarantine duration for an individual with feature value , then is a curve of . For a given , we call the set
the high density ratio period; see Fig. 1 for an illustration.
In the high density ratio period, the individual has a relatively high probability density of symptom presentation if the individual is infected. A possible quarantine policy is “release the individual if the individual does not develop any symptoms by the end of the high density ratio period”, and we denote the resulting quarantine duration by . A question is how to determine the threshold value . Clearly, for every , cannot be larger than , the peak of the curve. This implies that cannot be larger than . The larger is, the smaller the finding probability is. If , then the finding probability at is less than or equal to . Condition 1 implies the monotonicity and continuity of on . Theorem 1 states that there exists a unique constant such that and is the optimal quarantine rule.
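To make the threshold policy concrete, here is a minimal sketch for a single feature value. It assumes that the density ratio at time `t` equals a Weibull incubation density multiplied by a feature-dependent weight, read here as the ratio of the feature density among infected people to that among uninfected people; this form, the function names and the numbers are assumptions made for illustration. The Weibull shape is taken larger than 1 so that the curve is unimodal.

```python
import math


def weibull_pdf(t: float, shape: float, scale: float) -> float:
    """Weibull density of the incubation period."""
    if t <= 0.0:
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def weibull_mode(shape: float, scale: float) -> float:
    """Location of the density peak (valid for shape > 1)."""
    return scale * ((shape - 1.0) / shape) ** (1.0 / shape)


def period_end(c: float, shape: float, scale: float, weight: float,
               tol: float = 1e-10) -> float:
    """End of the period {t : weight * pdf(t) >= c}; 0 if the period is empty."""
    if c <= 0.0:
        raise ValueError("the threshold c must be positive")
    mode = weibull_mode(shape, scale)
    if weight * weibull_pdf(mode, shape, scale) < c:
        return 0.0  # the curve never reaches the threshold
    lo, hi = mode, 2.0 * mode
    while weight * weibull_pdf(hi, shape, scale) >= c:
        hi *= 2.0  # expand until the curve has dropped below c
    while hi - lo > tol:  # bisection on the decreasing branch
        mid = 0.5 * (lo + hi)
        if weight * weibull_pdf(mid, shape, scale) >= c:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical curve: shape 2, scale 7 days, weight 1.
duration_low_c = period_end(0.01, 2.0, 7.0, 1.0)
duration_high_c = period_end(0.05, 2.0, 7.0, 1.0)
duration_above_peak = period_end(0.5, 2.0, 7.0, 1.0)
```

Raising the threshold shortens the duration, and a threshold above the peak of the curve gives a duration of zero, which matches the qualitative behaviour described in the paragraph above.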
In practice, the loss from being quarantined may also differ across individuals. We can easily adapt our framework to this scenario by extending problem  to a more general form
where is a weighting function which indicates different costs of quarantine for different individuals. In this case a modified version of Theorem 1 with in the definition of replaced by follows directly under Conditions 1 and 2 if .
Now we propose an estimation procedure for the optimal quarantine duration for any . To estimate the optimal quarantine duration given in Theorem 1, we need to estimate , and . Suppose we have historical quarantine data denoted by . Note that in the historical data we know whether an individual is infected. Here we define for samples with for . Then , and can be estimated consistently by either standard parametric or nonparametric methods, e.g. the maximum likelihood method or the kernel smoothing method (Hansen, 2008; van der Vaart, 1998). Suppose , and are the resulting estimators. Then can be estimated by . Let be the estimated conditional distribution and , then can be estimated by the solution of as an equation of on the interval , where is a user-specified positive number smaller than . The resulting estimator of is denoted by . Finally, the estimator of the optimal quarantine duration is .
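The plug-in step can be sketched as a one-dimensional root search for the threshold. The example below is an illustration under simplifying assumptions rather than the estimation code used in the paper: the feature takes two hypothetical values, the estimated feature densities `f0` (uninfected) and `f1` (infected) and the Weibull parameters are made-up numbers, the density ratio is assumed to be the incubation density times `f1 / f0`, and plain bisection replaces whatever solver is preferred.

```python
import math


def pdf(t, shape, scale):
    """Weibull density of the incubation period."""
    if t <= 0.0:
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def cdf(t, shape, scale):
    """Weibull distribution function of the incubation period."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0.0 else 0.0


def mode(group):
    """Peak location of the group's incubation density (needs shape > 1)."""
    k, lam = group["shape"], group["scale"]
    return lam * ((k - 1.0) / k) ** (1.0 / k)


def ratio(t, group):
    """Assumed density ratio: incubation density times f1 / f0."""
    return group["f1"] / group["f0"] * pdf(t, group["shape"], group["scale"])


def duration(c, group, tol=1e-10):
    """Last time at which the density ratio is at least c (0 if never)."""
    lo = mode(group)
    if ratio(lo, group) < c:
        return 0.0
    hi = 2.0 * lo
    while ratio(hi, group) >= c:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ratio(mid, group) >= c else (lo, mid)
    return lo


def finding_probability(c, groups):
    return sum(g["f1"] * cdf(duration(c, g), g["shape"], g["scale"]) for g in groups)


def redundant_length(c, groups):
    return sum(g["f0"] * duration(c, g) for g in groups)


def solve_threshold(alpha, groups, c_min=1e-12, iterations=200):
    """Bisection for the threshold whose finding probability equals 1 - alpha."""
    target = 1.0 - alpha
    lo, hi = c_min, min(ratio(mode(g), g) for g in groups)  # smallest peak
    if finding_probability(lo, groups) < target or finding_probability(hi, groups) >= target:
        raise ValueError("the target is not bracketed by the search interval")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if finding_probability(mid, groups) >= target:
            lo = mid  # still finds enough infected people: raise the threshold
        else:
            hi = mid
    return lo


# Hypothetical two-group population; f0 and f1 each sum to one across groups.
groups = [
    {"name": "low risk", "f0": 0.7, "f1": 0.2, "shape": 2.0, "scale": 7.0},
    {"name": "high risk", "f0": 0.3, "f1": 0.8, "shape": 2.0, "scale": 7.0},
]
c_hat = solve_threshold(0.05, groups)
durations = {g["name"]: duration(c_hat, g) for g in groups}
```

In this toy setting the low risk group receives a shorter duration than the high risk group, and the resulting redundant length is below the common 0.95 quantile of the same Weibull distribution (about 12.1 days).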
3 Application to COVID-19 Data
3.1 Optimal quarantine rule using age as a feature
In this subsection, we apply our method to COVID-19 data. Demographic features such as age, sex and comorbidities are important in analyzing epidemiological data (Dowd et al., 2020). The incubation period data along with age information are available from the websites of the centres of disease control or the daily public reports on COVID-19 in 29 provinces in China, and are reported by Liu et al. (2020). In this subsection we use this dataset to construct the optimal quarantine rule using age as the feature . Here we only use the information of patients who were infected before January 23rd to avoid the biased sampling problem discussed in Liu et al. (2020). The total number of samples is 1770. We use these data to estimate and . In the dataset, the proportions of patients younger than 11 and patients older than 80 are very small (1.9% and 0.6% respectively). To ensure the accuracy of the estimation, we focus on people aged between 11 and 80 and take these people as the whole population in our analysis (i.e. ). We apply the kernel method with a Gaussian associated kernel introduced in Kokonendji and Kiesse (2011) to estimate .
The reported integer-valued incubation period is regarded as the least integer greater than or equal to the true incubation period. Let where is the ceiling function, then the data are regarded as an i.i.d. sample from and denoted by . We assume conditional on the incubation period follows a Weibull distribution, which is commonly used in analyzing incubation periods (Lauer et al., 2020). We further assume the conditional density has the form
where and and are unknown parameters satisfying and . Let , and for , then the log likelihood function Odell et al. (1992) is
and can be estimated by where is the maximum likelihood estimator. Here we use a quadratic function to fit the conditional distribution based on the exploratory data analysis. The estimated values of the parameters are listed as follows:
|(9.09, -0.11, 0.0015)|
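The likelihood for incubation periods that are only reported as rounded-up integers can be sketched as follows. This is a simplified illustration, not the model fitted above: it ignores the dependence on age, uses simulated data with hypothetical true parameters, and replaces a proper numerical optimizer with a coarse grid search.

```python
import math
import random


def survival(t, shape, scale):
    """P(T > t) for a Weibull(shape, scale) incubation period."""
    return math.exp(-((t / scale) ** shape)) if t > 0.0 else 1.0


def log_likelihood(reported_days, shape, scale):
    """Log-likelihood when only k = ceil(T) is observed: P(k) = S(k - 1) - S(k)."""
    total = 0.0
    for k in reported_days:
        prob = survival(k - 1, shape, scale) - survival(k, shape, scale)
        if prob <= 0.0:
            return float("-inf")
        total += math.log(prob)
    return total


def grid_mle(reported_days, shapes, scales):
    """Coarse grid search, a stand-in for a numerical optimizer."""
    candidates = ((s, l) for s in shapes for l in scales)
    return max(candidates, key=lambda p: log_likelihood(reported_days, p[0], p[1]))


rng = random.Random(2020)
true_shape, true_scale = 2.0, 7.0  # hypothetical values used to simulate data
# random.weibullvariate takes the scale first and the shape second.
reported = [math.ceil(rng.weibullvariate(true_scale, true_shape)) for _ in range(500)]

shapes = [1.0 + 0.1 * i for i in range(21)]   # 1.0, 1.1, ..., 3.0
scales = [4.0 + 0.25 * i for i in range(25)]  # 4.0, 4.25, ..., 10.0
shape_hat, scale_hat = grid_mle(reported, shapes, scales)
```

Working with the difference of survival functions rather than the density is what accounts for the rounding: each reported day `k` only tells us that the true incubation period lies in the interval from `k - 1` to `k`.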
Since the number of infected people in China is relatively small compared to the entire population, we use the age distribution of the entire population of China to estimate the age distribution conditional on and apply the kernel method with a Gaussian associated kernel to estimate .
In this section, we choose which is sufficient to control the epidemic under the scenario discussed in Remark 1. There are two other ways to make sure . One is to omit the feature and use the sample quantile of the incubation period as the quarantine duration for everyone (Farewell et al., 2005), and the other is to use the estimated quantile of the conditional incubation period distribution as the quarantine duration for people at the corresponding age (Liu et al., 2020). Quarantine durations for people at different ages obtained by the proposed method and the two quantile methods are plotted in Fig. 2.
Figure 2 shows that the 0.95 sample quantile of the incubation period is 15 days, which is one day longer than the current quarantine duration in China. The estimated 0.95 conditional quantile of the incubation period is shorter for middle-aged people than for young and old people. The estimated optimal quarantine duration is close to 15 days for people older than 30 and is shorter than 15 days for people younger than 30. This is because the optimal quarantine rule depends on the probability that an individual is infected, and young people are less likely to be infected in the dataset we consider. For , let and be the quarantine durations obtained by sample quantile and estimated conditional quantile, respectively. To compare the performance of these two methods and the optimal quarantine rule, we calculate the redundant length and finding probability by
where denote or respectively. Because non-integer quarantine duration is not practical, the quarantine duration is rounded to the nearest integer in calculation. The results are listed in Table 2.
|Method|Redundant Length|Finding Probability|
|---|---|---|
|0.95 conditional quantile|15.04|96.7%|
|optimal quarantine rule|14.32|96.2%|
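The two performance measures in Table 2 have simple empirical analogues, sketched below. The sketch averages over observed individuals instead of using a fitted model, rounds each duration to the nearest integer (halves rounded up, an assumption), and uses a made-up age-based rule and toy data; none of these choices are taken from the analysis above.

```python
import math


def round_half_up(value: float) -> int:
    """Round to the nearest integer number of days, halves rounded up."""
    return int(math.floor(value + 0.5))


def evaluate_rule(rule, uninfected_features, infected_cases):
    """Return (redundant length, finding probability) of a quarantine rule.

    rule maps a feature value to a duration in days; infected_cases is a list
    of (feature, incubation_period) pairs.
    """
    durations = [round_half_up(rule(x)) for x in uninfected_features]
    redundant_length = sum(durations) / len(durations)
    found = [t <= round_half_up(rule(x)) for x, t in infected_cases]
    finding_probability = sum(found) / len(found)
    return redundant_length, finding_probability


def toy_rule(age):
    """Hypothetical rule: shorter quarantine for people younger than 30."""
    return 10.0 if age < 30 else 14.4


uninfected_ages = [20, 40, 60]
infected = [(25, 9.0), (25, 12.0), (50, 13.0), (50, 16.0)]
redundant, finding = evaluate_rule(toy_rule, uninfected_ages, infected)
```

Here the redundant length is the mean rounded duration among uninfected individuals, and the finding probability is the share of infected individuals whose incubation period ends within their assigned duration.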
Table 2 shows that the optimal quarantine rule has the shortest redundant length with guaranteed finding probability. The 0.95 conditional quantile and the optimal quarantine rule are derived based on the conditional distribution model of incubation period. The reasonable finding probabilities in Table 2 also justify our model assumption. The improvement is not great in terms of redundant length. The reason may be that age does not provide sufficient information for obtaining a quarantine rule with good performance. Next, let us consider an example with infection rate in the individual’s origin country observed in addition to age.
3.2 Optimal quarantine rule based on age and infection rate of origin country
Travel quarantine for out-of-country travellers and residents arriving from another country is a common policy around the world during the COVID-19 pandemic. When determining the quarantine duration, the traveller’s age and the infection rate of the disease in the origin country can be observed. In this case, the infection rate in a traveller’s origin country is an important feature that reflects the probability that the traveller is infected. For every country we can calculate a current infection index (CII): where is the number of new cases in the country during the last two weeks and is the total population of the country. Here we multiply the rate by a constant to avoid this index being too small. We only consider the number of infections in the last two weeks because the number of infections more than two weeks ago provides little information about the infection probability of a current traveller. We divide the countries into three groups according to their CII because many countries have similar infection rates. Countries with CII larger than 300 are assigned to the high risk group, countries with are assigned to the medium risk group and countries with are assigned to the low risk group. Besides age, we take the risk level of the traveller’s origin country as a feature.
In this subsection, we obtain the optimal quarantine rule using information from multiple datasets. We consider 79 countries in our model since their data are relatively complete in all the data sources. The number of confirmed cases of each country is reported by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We use the number of cases confirmed between May 1st and May 14th in each country to calculate the current infection index. SI Appendix, Table S1, shows countries with different risk levels.
As in the previous subsection, we focus on people aged between 11 and 80. We approximate the feature distribution of uninfected people by the distribution of the entire population (people in the 79 countries) and estimate by the kernel method with a Gaussian associated kernel (Kokonendji and Kiesse, 2011) using data from the website of the United Nations. Data of 5008 COVID-19 patients from Xu et al. (2020) are used to estimate . However, we take the proportion of confirmed cases from different countries reported by CSSE at JHU instead of that in the dataset of Xu et al. (2020), since the proportions reported by CSSE at JHU are regarded as more accurate.
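The index and the grouping can be sketched in a few lines. Only the threshold of 300 for the high risk group comes from the text; the scaling constant (`scale`, set to one million), the boundary between the medium and low risk groups (`low`, set to 50) and the handling of ties at the boundaries are hypothetical placeholders.

```python
def current_infection_index(new_cases_two_weeks: int, population: int,
                            scale: float = 1_000_000.0) -> float:
    """New cases in the last two weeks per `scale` inhabitants (scale is assumed)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return scale * new_cases_two_weeks / population


def risk_group(cii: float, high: float = 300.0, low: float = 50.0) -> str:
    """Three-level grouping; `low` is a hypothetical boundary."""
    if cii > high:
        return "high"
    if cii > low:
        return "medium"
    return "low"


# Toy countries with ten million inhabitants each.
examples = {
    "A": current_infection_index(5_000, 10_000_000),
    "B": current_infection_index(1_000, 10_000_000),
    "C": current_infection_index(100, 10_000_000),
}
groups = {name: risk_group(cii) for name, cii in examples.items()}
```

Grouping countries this way turns a continuous rate into a categorical feature with three levels, which can then be combined with age when constructing the quarantine rule.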
The dataset of Xu et al. (2020) does not contain the incubation periods of the patients. To overcome this difficulty, we assume that the distribution of the incubation period for patients of the same age is the same across countries at different risk levels. Thus we can use the conditional distribution model of the incubation period fitted in the previous subsection to impute the missing incubation periods. Then we can estimate the three individualized quarantine durations using the imputed dataset. Here we employ the multiple imputation method, which is standard in the missing data literature (Little and Rubin, 2019). We impute the dataset ten times and average the resulting estimators over the imputed datasets. Quarantine durations obtained by the sample 0.95 quantile, the estimated 0.95 conditional quantile and the optimal quarantine rule are plotted in Fig. 3.
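The imputation step can be sketched as below: missing incubation periods are drawn from a fitted conditional Weibull model, an estimator is computed on each completed dataset, and the estimates are averaged. The Weibull parameters, the constant `scale_for_age` function and the choice of the sample 0.95 quantile as the estimator are illustrative assumptions, not the fitted model or estimators of this paper.

```python
import random


def impute_once(ages, shape, scale_for_age, rng):
    """Draw one incubation period per patient from the conditional Weibull model."""
    # random.weibullvariate takes the scale first and the shape second.
    return [rng.weibullvariate(scale_for_age(age), shape) for age in ages]


def multiple_imputation(ages, estimator, shape, scale_for_age,
                        n_imputations=10, seed=0):
    """Average an estimator over several independently imputed datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_imputations):
        incubation = impute_once(ages, shape, scale_for_age, rng)
        estimates.append(estimator(ages, incubation))
    return sum(estimates) / len(estimates)


def sample_quantile_95(ages, incubation):
    """Example estimator: empirical 0.95 quantile of the incubation periods."""
    ordered = sorted(incubation)
    return ordered[int(0.95 * len(ordered))]


# Toy data: 1000 patients aged 20 to 69, with a hypothetical age-free scale.
ages = [20 + (i % 50) for i in range(1000)]
pooled = multiple_imputation(ages, sample_quantile_95, shape=2.0,
                             scale_for_age=lambda age: 7.0)
```

Averaging over imputations reduces the extra randomness introduced by drawing the missing values, which is the reason for repeating the imputation rather than using a single completed dataset.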
It can be seen that the optimal quarantine rule gives a much longer duration to travellers from the high risk countries, a duration slightly longer than the 0.95 quantile to travellers from the medium risk countries and a very short duration to travellers from the low risk countries. Optimal quarantine durations for travellers from high, medium and low risk countries show different trends with age. The trend for high and medium risk countries is consistent with the trend of the conditional quantile curve. This may be because, if the infection rate is relatively high, the optimal quarantine duration mainly depends on the incubation period. For travellers from low risk countries, the optimal quarantine rule gives a shorter quarantine duration to young people than to old people. The reason may be that in the low risk countries, the infection rate of young people is relatively low. The sample 0.95 quantile and the estimated 0.95 conditional quantile methods give quarantine durations that do not depend on the risk level of the origin country, since these two methods are independent of the national infection rate by definition.
We calculate the redundant length and the finding probability for the three methods by a procedure similar to that in the previous subsection. The results are reported in Table 3.
|Method|Redundant Length|Finding Probability|
|---|---|---|
|0.95 conditional quantile|14.99|95.1%|
Table 3 shows that our optimal quarantine rule greatly shortens the average quarantine duration of uninfected people while guaranteeing the probability of finding infected individuals. Comparing the results in Tables 2 and 3, we can see that it is important to add the risk level of the traveller’s origin country as a feature for the optimal rule. If one can collect other features which are associated with the incubation period or the probability that an individual is infected, the optimal quarantine rule may perform even better.
Although we apply our method to COVID-19 data, the method is general and can be applied to establish an optimal quarantine rule for any infectious disease as long as some historical quarantine data are available. Clearly, the notion of “optimal” depends on the available features. As mentioned before, if there is no available feature, then our optimal quarantine duration reduces to the quantile of the incubation period. There may be other features that are useful for determining the quarantine duration. For example, a test result for the pathogen can serve as an important feature even if the sensitivity and specificity of the test are not very high. It is of great importance to select features which are useful for determining the quarantine duration. This may be an interesting topic for future work.
Age-specific population data of each country are available from the website of the United Nations: https://population.un.org/wpp/Download/Standard/CSV/total. Age information of 5008 COVID-19 patients from different countries is available from the website https://github.com/beoutbreakprepared/nCoV2019. The number of confirmed cases of each country reported by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) can be found on the website https://github.com/CSSEGISandData/COVID-19. All the analyses are performed with the use of R software, version 3.6.3. All the code and data involved in this paper are deposited in Open Science Framework, DOI: 10.17605/OSF.IO/5437G.
R.W. designed research, performed research, analyzed data and wrote the manuscript; Q.W. oversaw the project, designed research, assisted with conceptualization, edited the manuscript.
The authors declare no conflict of interest.
This research was supported by the National Natural Science Foundation of China (General program 11871460 and program for Creative Research Group in China 61621003), and a grant from the Key Lab of Random Complex Structure and Data Science, CAS.
- Acemoglu et al.  Daron Acemoglu, Victor Chernozhukov, Iván Werning, and Michael D Whinston. Optimal targeted lockdowns in a multi-group sir model. Working Paper 27102, National Bureau of Economic Research, 2020.
- Ahmad et al.  M. D. Ahmad, M. Usman, A. Khan, and M. Imran. Optimal control analysis of Ebola disease with control strategies of quarantine and vaccination. Infect Dis Poverty, 5(72), 2016.
- Alvarez et al.  Fernando E Alvarez, David Argente, and Francesco Lippi. A simple planning problem for covid-19 lockdown. Working Paper 26981, National Bureau of Economic Research, 2020.
- Behncke  Horst Behncke. Optimal control of deterministic epidemics. Optimal Control Applications and Methods, 21:269–285, 2000.
- Charpentier et al.  Arthur Charpentier, Romuald Elie, Mathieu Laurière, and Viet Chi Tran. Covid-19 pandemic control: balancing detection policy and lockdown intervention under icu sustainability. medRxiv, 2020.
- Dowd et al.  J. Dowd, L. Andriano, D. M. Brazel, V. Rotondi, P. Block, X. Ding, Y. Liu, and M. C. Mills. Demographic science aids in understanding the spread and fatality rates of COVID-19. Proc Natl Acad Sci USA, 117(18):9696–9698, 2020.
- Farewell et al.  V. T. Farewell, A. M. Herzberg, K. W. James, L. M. Ho, and G. M. Leung. Sars incubation and quarantine times: when is an exposed individual known to be disease free? Statist. Med., 24:3431–3445, 2005.
- Ferguson et al.  Neil M. Ferguson, Derek A. T. Cummings, Christophe Fraser, James C. Cajka, Philip C. Cooley, and Donald S. Burke. Strategies for mitigating an influenza pandemic. Nature, 442:448–452, 2006.
- Hansen  B. E Hansen. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory, 24(3):726–748, 2008.
- Kokonendji and Kiesse  C. C. Kokonendji and T. Senga Kiesse. Discrete associated kernels method and extensions. Statistical Methodology, 8(6):497–516, 2011.
- Lauer et al.  S. A. Lauer, K. H. Grantz, Q. Bi, F. K. Jones, Q. Zheng, H. R. Meredith, A. S. Azman, N. G. Reich, and J. Lessler. The incubation period of Coronavirus Disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of Internal Medicine, 2020.
- Lehmann  E. L. Lehmann. Testing Statistical Hypotheses. Springer, 2005.
- Lipsitch et al.  Marc Lipsitch, Ted Cohen, Ben Cooper, James M. Robins, Stefan Ma, Lyn James, Gowri Gopalakrishna, Suok Kai Chew, Chorh Chuan Tan, Matthew H. Samore, David Fisman, and Megan Murray. Transmission dynamics and control of severe acute respiratory syndrome. Science, 300:1966–1970, 2003.
- Little and Rubin  R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. 3rd edition, 2019.
- Liu et al.  Xiaohui Liu, Lei Wang, Xiansi Ma, and Jiewen Wang. Conditional quantiles estimation of the incubation period of covid-19. Preprint, 2020.
- Nishiura  H. Nishiura. Determination of the appropriate quarantine period following smallpox exposure: An objective approach using the incubation period distribution. Int. J. Hyg. Environ. Health, 212:97–104, 2009.
- Nussbaumer-Streit et al.  B. Nussbaumer-Streit, V. Mayr, A. Iulia Dobrescu, A. Chapman, E. Persad, I. Klerings, G. Wagner, U. Siebert, C. Christof, C. Zachariah, and G. Gartlehner. Quarantine alone or in combination with other public health measures to control COVID-19: a rapid review. Cochrane Database of Systematic Reviews 2020, 4, 2020.
- Odell et al.  P. M. Odell, K. M. Anderson, and R. B. D’Agostino. Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model. Biometrics, 48:951–959, 1992.
- Piguillem and Shi  Facundo Piguillem and Liyan Shi. Optimal covid-19 quarantine and testing policies. CEPR Discussion Paper, 2020.
- Reich et al.  N. G. Reich, J. Lessler, J. K. Varma, and N. M. Vora. Quantifying the risk and cost of active monitoring for infectious diseases. Scientific Reports, 8:1093, 2018.
- van der Vaart  A. W. van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 1998.
- Xu et al.  B. Xu, B. Gutierrez, S. Mekaru, K. Sewalk, L. Goodwin, A. Loskill, E. L. Cohn, Y. Hswen, S. C. Hill, M. M. Cobo, A. E. Zarebski, S. Li, C. Wu, E. Hulland, J. D. Morgan, L. Wang, K. O’Brien, S. V. Scarpino, J. S. Brownstein, O. G. Pybus, D. M. Pigott, and M. U. G. Kraemer. Epidemiological data from the COVID-19 outbreak. Scientific Data, 7(106), 2020.
- Yan and Zou  X. Yan and Y. Zou. Optimal and sub-optimal quarantine and isolation control in sars epidemics. Mathematical and Computer Modelling, 47:235–245, 2008.