Since December 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2) has had a rapid spread over the world. By September 30, 2021, there have been 0.23 billion confirmed cases of COVID-19, including 4.8 million deaths. Many studies show the necessity of interventions in the early stage of the pandemic (hao2020reconstruction; wang2020epidemiological). The screening testing is an effective intervention approach. COVID-19 has the characteristics of high infectiousness, a considerable proportion of asymptomatic infections, a long incubation period, and early symptoms hard to distinguish from other diseases such as influenza. Therefore, the active screening of hidden cases, followed by subsequent quarantines and treatments, plays a key role in blocking the virus transmission path more timely and thoroughly, especially before vaccines are available.
A common testing method for COVID-19 is the SARS-Cov-2 nucleic acid test. The daily testing capacity is usually limited, because of the limit on producing test kits and the manpower on sample collection and lab work. We are interested in developing optimal strategies to allocate the limited testing resources. Existing strategies include utilization of the population age structure, priority to healthcare workers, contact tracing (e.g., testing of the close contacts of confirmed cases), and group testing to increase efficiency. In this paper, we consider a scenario where a state has a limited daily testing budget and aims to optimally allocate it to different counties. We propose to design the allocation strategy based on a ‘commute network’ of counties and the history of confirmed cases. The commute network is a weighted network, where the weighted edge between two counties is a decreasing function of the traffic distance between them. A high-degree node (county) in the network has small traffic distances with many other counties, so that a high infection rate of this county is more likely to cause a rapid spread of disease in the state. Our allocation strategy will give priority to those counties that have a high degree in the commute network or a large number of confirmed cases in the past.
We introduce a 4-compartment SIR disease model on the network. The four compartments are S (susceptible), H (hidden), C (confirmed) and R (removed), where individuals in compartments H and C are both infected and infectious.
We choose these four compartments by taking into account special characteristics of COVID-19: (i) The infected individuals soon become infectious, hence, we do not model the exposure stage. (ii) There are a considerable fraction of asymptomatic cases (Byambasuren2020), making the timely diagnosis not easy; hence, we divide the infected individuals into two compartments, H and C.
We note that U.S. and many other countries have had strict isolation policies for confirmed cases. For this reason, we make an assumption that the main resource of infections comes from those undiagnosed infected cases, i.e., the individuals in compartment H. The goal of screening testing is to minimize the number of individuals in compartment H.
Suppose a state has counties. Let denote the respective number of individuals in four compartments at county , for . We propose an ordinary differential equation model for to estimate model parameters.
In the second stage, we solve an optimization to minimize
. We propose an ordinary differential equation model for, where the number of newly infected cases of each county depends on the number of past hidden cases of neighbor counties on the network (including the county itself), as well as the allocated testing rate of this county. The state administrator has to decide the allocation rates of all counties to minimize the total number of hidden cases in the future. We solve this problem by a 2-stage procedure. In the first stage, we use data of newly confirmed cases of all counties at
to estimate model parameters. In the second stage, we solve an optimization to minimize.
We implement our method on the commute network for Massachusetts, United States and the commute network for Hubei, China (the traffic distances are from online map services, and the disease data are simulated from our model). We observe an advantage of our method over the allocation strategies that ignore network structures, e.g., allocation by county populations and allocation by county-wise infection rates.
There have been studies on the cross-region allocation of testing resources for an infectious disease. Yin2021 proposed to optimize the distribution of treatment centers and resources among regions to minimize the total number of new infections and funerals. Baunez2020 studied the sub-national allocation of testing resources in Italy, using the slopes of confirmed cases versus test numbers in different regions. buhat2020optimal proposed to use natural language processing tools to allocate test kits more fairly among test centers in Philippines. However, none of these methods are based on the commute network structure.
proposed to use natural language processing tools to allocate test kits more fairly among test centers in Philippines. However, none of these methods are based on the commute network structure.
ou2020and proposed a framework for active screening of an SIS disease based on a contact tracing network. In their model, the disease has two compartments, S (susceptible) and I (infected). Each node in the network is an individual, and the state variable of a node is a binary variable indicating whether this individual is infected. They introduced a model to characterize how the network structure and screening strategy affect disease spread. They also proposed an optimization algorithm to compute the optimal testing allocation strategy.
Our disease model and optimization algorithm borrow ideas from , but there are major differences. We consider an SIR disease with four compartments, S, H, C and R. Our network is a commute network, where each node is a county and the state variable of a node is the 4-dimensional vector containing the number of individuals in each compartment. Our model, parameter estimation, and computation of allocation strategies are different from those in
proposed a framework for active screening of an SIS disease based on a contact tracing network. In their model, the disease has two compartments, S (susceptible) and I (infected). Each node in the network is an individual, and the state variable of a node is a binary variable indicating whether this individual is infected. They introduced a model to characterize how the network structure and screening strategy affect disease spread. They also proposed an optimization algorithm to compute the optimal testing allocation strategy. Our disease model and optimization algorithm borrow ideas fromou2020and
, but there are major differences. We consider an SIR disease with four compartments, S, H, C and R. Our network is a commute network, where each node is a county and the state variable of a node is the 4-dimensional vector containing the number of individuals in each compartment. Our model, parameter estimation, and computation of allocation strategies are different from those inou2020and.
The remainder of this paper is organized as follows: Section 2 contains our main results, including the disease model, parameter estimation, and computation of the optimal allocation strategy. Section 3 contains semi-synthetic experiments on the real commute networks for Massachusetts, USA and Hubei, China. Section 4 generalizes our model and algorithm to the problem of vaccine allocation. Section 5 concludes the paper.
2 The network-based active screening strategy
In an SIR disease model, the population is divided into three compartments: S (susceptible), I (infected, including the asymptomatic, presymptomatic and symptomatic cases), and R (removed, including the dead, recovered and vaccinated cases). Since an infected individual may not be diagnosed, we further divide compartment I into two sub-compartments: H (hidden) and C (confirmed). This results in four compartments: S, H, C and R, as shown in Figure 1. Suppose there are different counties. For simplicity, we assume the population of every county is time-invariant during the period of interest. Let denote the population of county , for , and write . Let , , and be the respective number of individuals in four compartments for county at time . Furthermore, let be the number of newly confirmed cases, where a newly confirmed case is an individual transferred from compartment H to compartment C. Write and define similarly. We only observe and at time .
We consider the scenario where the testing capacity is limited: At each time , we can only actively test individuals for all counties. An active screening strategy decides how many individuals to test for each county. It is represented by an allocation vector , where individuals of county will be tested at time , . We call the testing rate of county at time . This testing rate will affect the number of confirmed cases at . This is captured by an ordinary differential equation model to be introduced in Section 2.1.
Suppose testing is not available during (a period where the disease develops but no intervention is provided). After that period, a capacity of tests is available at every . At time , the administrator has to decide the allocation vectors in order to minimize the total number of hidden cases in all counties during to :
The data available to the administrator at are and . The administrator will first use these historical data to learn the patterns of disease progression (i.e., to estimate model parameters) and then use the estimated model to find the optimal allocation vectors.
Below, in Section 2.1, we introduce the disease model, where a key ingredient is using the commute network of counties to help model disease progression. In Section 2.2, we describe how to find the optimal allocation vectors when model parameters are known. In Section 2.3, we describe how to estimate model parameters.
2.1 A network SIR model with the screening intervention
For any , let be the traffic distance between county and county . We define a (weighted) commute network with nodes, where each node represents a county, and the weighted edges between two counties are
We use this network to describe the cross-county interactions. The larger , the more interactions between individuals of counties, and the easier transmission of disease. The parameter controls the level of travel restrictions; e.g., a ‘lockdown’ policy of all counties means .
We recall that is the population of county . Let , , and , for and . It suffices to model these ratios . We impose an ordinary differential equation model. The model has three parameters, the infection rate , the diagnosis rate and the recovery rate . Fixing a county , from time to time , we assume that a fraction of the infected cases are recovered, which gives . We also assume that the fraction of hidden cases getting confirmed is equal to , where is the testing rate allocated to county . It follows that . Furthermore, let denote the probability of a susceptible individual getting infected. We assume
where and is the adjacency matrix of the commute network. only depends on the fractions of hidden cases, but not the fractions of confirmed cases. This is because we make an (idealized) assumption that each confirmed case is properly quarantined and will not infect other susceptible individuals. The commute network plays a role in the disease spread across counties. If county is isolated (i.e., for all ), then . When county is not isolated, is affected by the of all counties connected to node in the commute network. Using the notion of , we further have and . This model is summarized in the following set of differential equations:
In the case that there is no intervention (i.e., is a zero vector) and that all counties are isolated (i.e., is a diagonal matrix), this model reduces to a standard 4-compartment SIR model for every county.
When there is no intervention, the way we incorporate the effect of the commute network is similar to that in a network-based compartmental model (wang2003epidemic). However, a standard network-SIR model is defined on the contact tracing network, where each node is an individual, and the state of a node at is a categorical variable describing which compartment this individual belongs to. In comparison, our model is defined on the commute network, where each node is a county, and the state of a node is characterized by a vector
is a categorical variable describing which compartment this individual belongs to. In comparison, our model is defined on the commute network, where each node is a county, and the state of a node is characterized by a vector. The way we incorporate the effect of intervention is similar to the model used in active screening (ou2020and). The model in ou2020and is for the SIS disease and on the contact tracing network of individuals, while our model is for the SIR disease and on the commute network of counties.
2.2 Optimization of allocation vectors
Since ’s depend on the allocation vectors in a complicated way as specified by models (3)-(4), it is not easy to solve this optimization directly. We borrow the idea in ou2020and to minimize a different objective function, , which is an (approximate) upper bound of .
By model (4), for every and ,
To simplify (6), first, note that we are interested in the early period of disease progression when the number of individuals in compartments , and are negligible compared with the number of individuals in . It yields . This motivates us to set for simplicity. Next, we consider the expression of in (3). We apply the universal inequality , for any and . It implies . When is sufficiently small, we apply the Taylor expansion to at the origin. It gives . We combine the above to get
By stacking (8) for , we obtain its matrix form as , where ‘’ is in the entry-wise fashion. In the parameter range of interest, and are all much smaller than . Therefore, the matrix is always a nonnegative matrix. We can iterate this inequality to get , where and ‘’ in the entry-wise fashion. It yields a relaxation of (5) as
We propose to compute the allocation vectors by solving (9). The gradient of has a nice form. Let be the gradient of with respect to . By direct calculations,
We apply the Frank-Wolfe algorithm. Let be the initial solution. At iteration , given , we compute as follows:
Compute at . Solve by minimizing , subject to the constraints of for every . This is a standard linear problem, which we solve using Dantzig’s simplex method.
Update , for every .
The input of the algorithm include the adjacency matrix of the commute network, model parameters , and the vector that contains the fractions of hidden cases at for all counties. In the next subsection, we describe how to obtain these input from historical data.
2.3 Estimation of model parameters
At time , the administrator observes the historical data of newly confirmed cases . Additionally, the county-wise populations and the between-county traffic distances are known. To run the algorithm in Section 2.2, we need to know and , where is the parameter in (2) for defining the commute network. We first estimate from knowledge of the disease; next, we estimate by fitting a network SIR model without intervention; last, we estimate directly from other estimated parameters.
For an SIR disease, the diagnosis rate and the recovery rate are related to the nature of the virus, thus invariant with locations or time. Since we only observe the number of newly confirmed cases, it is infeasible to estimate from data. We follow the convention to estimate them from clinical records. Let be the average time for an infected person to get recovered. It is common to estimate by . To get an estimate of the diagnosis rate , we note that there are two kinds of infected cases: One is the asymptomatic case, who never shows symptoms. We assume that these individuals will never be diagnosed without screening interventions (but they are infectious during the infected period). The other is the symptomatic case, who have symptoms after an ‘incubation’ period (in the incubation period, they have no symptoms but are still infectious). Let be the percent of asymptomatic cases, and let be the average time of the incubation period for symptomatic cases. We estimate by
For COVID-19, there have been many medical literature that provide information of . We set following Byambasuren2020, days following Li2020, and days following Beigel2020. It gives and for COVID-19.
We then estimate given . During the time period of , the progression of the disease follows a natural process without the screening intervention. We use a special case of models (3)-(4) with . The differential equation model becomes
The observed data are the numbers of newly confirmed case . By model (12), . It follows that
We introduce two approaches for the estimation of . Write to indicate its dependence on the unknown parameters. In the first approach, we note that when , by model (12), . Therefore, at each , we make a 1-step ahead forecast of by , where and is as in (2). We estimate by minimizing the sum of squared forecast errors:
In the special case of , it holds that , and the above reduces to a least-squares of versus , which agrees with the method in Becker1976 for a standard SIR model. In the second approach, given any , we can use to forecast based on model (12): We apply the recursive formula of , where is the same as above. This produces recursively. We minimize the total forecast errors:
The two approaches both have reasonably good performances in simulations.
Last, we estimate . Since (13) gives , we estimate by the 1-step ahead forecast:
3 Semi-synthetic experiments on real networks
We conduct semi-synthetic experiments on real commute networks to test the performance of our method. We consider the state of Massachusetts in the United States and the province of Hubei in People’s Republic of China. The traffic distance data came from Google Map and Bing Map (for Massachusetts) and Amap (for Hubei). We searched for the average driving distances between counties on these map services and used them to build the commute network as in (2). We also used the data of daily confirmed COVID-19 cases from the COVID-19 dashboard of Johns Hopkins University (for Massachusetts) and the National Health Commission of People’s Republic of China (for Hubei).
Hubei is the most severely infected province in China during February 2020, with a total of 49,497 confirmed cases on Feb. 29th. Wuhan, the capital city of Hubei, reported the first COVID-19 case in December 2019. Hubei has a population of around 58.5 million. It contains one big capital city, Wuhan, several middle-size cities like Huanggang, Xiangyang and Jingzhou, and some small provincial administered county-level divisions like Tianmen, and Shenlongjia (see Figure 2). Massachusetts is a state on the east coast of U.S. with a population of around 6.9 million, and had a total of 757,849 confirmed cases by Sep. 30, 2021. Compared with commute network of Hubei, the commute network of Massachusetts has less severe degree heterogeneity, and the population is also more evenly distributed among counties (see Figure 2).
3.1 Comparison of allocation strategies
We use the traffic distances and population sizes from real data, and simulate using the model in Section 2.1, with various choices of parameters . We fix and . The initial numbers of individuals in four compartments are set as follows: we pick a day at the early stage of the pandemic as and obtain , , and from real data; we then set , and . In this subsection, we assume that the true parameters are given and focus on evaluating the performance of the allocation strategy. Parameter estimation will be investigated in next subsection.
We compare our screening strategy with two other strategies: (a) Allocation by population: it sets a flat testing rate for all counties, so that the testing budget allocated to a county is proportional to its population. This strategy guarantees fairness, but it is less satisfactory for controlling the spread of disease. (b) Allocation by infection rate: it sets the testing rate to be proportional to . This strategy takes into account the difference of infection rates among counties, but it does not utilize the commute network. Additionally, we include the ‘no screening’ strategy as a reference. For each strategy, we measure its performance by tracing the number of cumulative confirmed cases, . Since only newly confirmed cases are reported in reality, this performance metric is natural, and it makes the comparison with real data easy. We note that this metric is not equivalent to counting the total number of hidden cases, and so it does not automatically favor our method.
The Massachusetts network. Massachusetts has 14 counties. Since Dukes and Nantucket reported very few confirmed cases in the early stage of the pandemic, for simplicity, we remove these two counties and consider the commute network of 12 counties. In the default parameter setting, we let . The default diagnosis rate and the default recovery rate are from the estimates based on clinical data of COVID-19 (see Section 2.3). We vary the other parameters . The resulting of different strategies are shown in Figure 3 (top). In all settings considered here, our allocation strategy outperforms the other strategies. The parameter controls the weights in the commute network. We consider (other parameters take the default values; same below). The larger , the more impact of the network structure on the pandemic progression, hence, the more advantage of our network-aware strategy. We also consider (the results for are in the top plot, and the results for are in the middle plots). The curve has different behaviors as varies: it has a super-linear growth for and a sub-linear growth for . In both cases, our strategy performs the best, and the no-screening strategy performs the worst; the strategy of allocation by infection rate is better than the strategy of allocation by population. We also consider (the results for M=100K are in the top plot, and the results for are in the bottom plots). Massachusetts has a population of 6.9 million. A daily testing budget of 10K has a small effect on the pandemic progression, so the performances of different strategies are pretty close. When the daily budget is increased to 200K, the effect of the screening strategy becomes significant: the number of cumulative confirmed cases on day 30 by our strategy is only half of the number associated with no screening.
In Figure 3 (bottom), we plot the cumulative number of confirmed cases on day 30 for each county, when parameters take the aforementioned default values. Interestingly, although our method is not a ‘fair’ screening strategy (non-flat testing rate), the benefit is ‘universal’ — compared with other strategies, there is a decrease of confirmed cases for all counties. We note that for some counties, they get lower allocation rates than they would have got from allocation by population or infection rate, but their confirmed cases are still reduced. It suggests that, to reduce the confirmed cases at one county, it is sometimes more effective to increase the testing rate of nearby highly infected counties than increasing the testing rate of this county. This explains why it is useful to take into account the network structure. In Figure 4, we take a close look at the allocation vectors. For each , we plot the allocated testing rate versus the infection rate of the same day. Only counties with a nonzero are shown in the plots. Our screening strategy is ‘sparse’: Each day, it puts all testing budget to a small number of counties. This is different from allocation by population or infection rate, where almost every county gets some testing resource each day. For all 30 days, the most frequently selected counties are Essex (17 days), Suffolk (17 days) and Norfolk (15 days). Norfolk and Suffolk have the highest degrees in the commute network, and Essex is close to Suffolk and Middlesex (the largest-population county). Our screening strategy tends to select counties with a high infection rate (e.g., Barnstable, Berkshire) in early periods and counties with a large population (e.g., Middlesex, Worcester) in the late periods.
The Hubei network. Hubei has 17 cities, including 12 prefecture-level cities, 1 autonomous prefecture and 4 provincial administrated county-level cities. Compared with the Massachusetts network, the Hubei network has several special characteristics. First, the capital city, Huhan, plays a dominating role. Wuhan has far more confirmed cases than other cities at the beginning of the pandemic. Wuhan also has the largest population. Second, the population of Hubei is nearly 8 times of the population of Massachusetts. Given the same testing budget (e.g., 100K), the effect of active screening is less significant for Hubei, and the allocation vector is also sparser. Third, the Hubei network has more severe degree heterogeneity. We use to measure the degree of a node. In the Hubei network, when , the three largest ’s are 0.848 (Ezhou), 0.797 (Huanggang) and 0.217 (Huangshi). Ezhou and Huanggang are hub nodes, having much higher larger degrees than the other nodes. In comparison, the three largest ’s for Massachusetts is 0.652 (Norfolk), 0.636 (Suffolk) and 0.556 (Hampshire), there is no clear hub node. Due to these special characteristics, the simulation results for Hubei are different from those for Massachusetts.
We set the default parameters as . We also consider , and (every time we vary one parameter and fix the other parameters as the default values). The results are in Figure 5 (top). We also observe that our screening strategy outperforms the competitors. Since Hubei has a much larger population and more confirmed cases than Massachusetts, the relative improvement of our strategy is smaller; however, the absolute number of reduced confirmed cases is indeed large. Figure 5 (bottom) displays the numbers of cumulative confirmed cases on day 30 for all 17 cities, when the parameters take default values. Our method does not uniformly reduce the confirmed cases for all cities. Instead, our method puts major efforts on reducing the confirmed cases of Huhan and Huanggang, which results in a significant reduction of overall confirmed cases in all cities. For cities such as Xiaogan and Jingzhou, our method yields a larger number of confirmed cases, compared with allocation by infection rate. This is different from the situation for Massachusetts, where the benefit our method is universal for all counties. Figure 6 shows the nonzero entries of for . Interestingly, our method puts all testing resources on Wuhan and Ezhou. As we have mentioned, Wuhan plays a dominating role in the pandemic progression. It is not surprising that Wuhan is always selected by our method. Ezhou is a comparably small city, but it adjoins three big cities, Wuhan, Huanggang and Huangshi, all of which have a large number of confirmed cases. Ezhou also has the largest degree in the commute network. This explains why Ezhou is always selected by our method. We note that this solution is for a daily budget of M = 200K. This is a very limited budget, given that Hubei has a population of 58.9 million. As a result, our method tends to put all testing resources on Wuhan and its close neighbor Ezhou. In the supplemental material, we plot the allocation vectors for M = 500K. The results are more similar to those for Massachusetts.
3.2 Parameter Estimation
Our screening strategy requires the input of , where is known and are related to the nature of the disease and estimated from clinical data. We only use the historical data of confirmed cases to estimate . We now study the performance of the estimators in Section 2.3. Due to space limit, we only present the results for the second estimator (12). The results for the first estimator are in the supplemental material. We fix the time period of March 15, 2021 to April 1, 2021 and use the daily new confirmed cases of counties in Massachusetts. It gives and . Since there is no ground truth, we evaluate the prediction performance. We use these estimates and model (12) to forecast the numbers of confirmed cases during March 23-29, 2021 and compare them with the actual reported numbers. For Franklin, Hampden and Suffolk, the predictions fit the real data well. For the other counties, the predicted numbers are higher than the actual reported numbers.
The results are not surprising. Our model in Section 2.3 assumes there is no intervention. However, in reality, Massachusetts took actions such as sending students home and requesting quarantines of travelers to help slow down the spread of disease. This explains why the predicted numbers are higher than the actual numbers. It also suggests that the administrator should frequently update the estimates using a moving window, because can vary with time, as a response to the change of disease control measures.
4 Extension to the allocation of vaccines
Vaccination is another way to control disease spread. When the daily capacity of vaccination is limited, we are interested in optimizing the allocation of vaccines. Existing literatures proposed vaccine allocation strategies based on the contact tracing network of individuals (ZhangYao2015) or socioeconomic features such as age and whether being a healthcare worker (Talbot2005). We instead consider the vaccine allocation among counties. Similarly as before, we assume that the administrator has historical data of and needs to decide a vaccination strategy for every , where is the vaccination rate allocated to county at time . We first extend the model in (3)-(4). The main difference between screening and vaccination is that the screening intervention moves individuals from compartment H to compartment C, while the vaccination intervention moves individuals from compartment S to compartment R. The modified model is
Suppose the total vaccination budget is . We compute the allocation vector from solving the optimization:
We can similarly develop an Frank-Wolfe algorithm to solve it. Comparing with in (8), we see the difference between the vaccination strategy and the screening strategy: The past screening allocations will not affect , but the past vaccination allocations will affect . The reason is that a susceptible individual can get multiple rounds of screening but only one round of vaccination; consequently, we cannot neglect the past vaccination information, unlike in the situation of screening.
We consider the allocation of testing resources on a commute network of counties. We model the progression of pandemic using a 4-compartment network-SIR model, that simultaneously incorporate the effects of network structure and screening strategy. We propose algorithms for estimating model parameters and computing the optimal allocation strategy. We evaluate their performances on real data for Massachusetts and Hubei.
There are future directions for extending our work. First, we assume that the infection rate and network effect parameter are time-invariant. In real life, they may change with the disease control measures such as travel restrictions and quarantine policies (yan2021better). It is interesting to extend our model to time-varying . In fact, in our current method, by using a moving window for parameter estimation and letting for strategy computation, it does support a timely, frequent update of testing resource allocation.
Second, our screening strategy is purely based on ‘efficiency’, without considering ‘fairness’. As a result, the strategy computed by our method tends to put testing resources on a small number of counties each day.
We can modify our method to encourage ‘fairness’. One possible approach is adding a penalty to the objective (9), where and controls the trade-off between ‘efficiency’ and ‘fairness’. Our optimization algorithm can be easily extended to accommodate this penalty. Third, our disease model uses no demographic variables of counties. We may borrow the deep neural network approach in
controls the trade-off between ‘efficiency’ and ‘fairness’. Our optimization algorithm can be easily extended to accommodate this penalty. Third, our disease model uses no demographic variables of counties. We may borrow the deep neural network approach intang2021interplay to incorporate county-wise feature vectors into our network-SIR model.