I Introduction
At the time of writing, the COVID-19 pandemic had already caused more than half a million deaths worldwide; at the time of re-writing, some weeks later, this number was approaching one million. The effects of the virus have been widespread and substantial, from the collapse of healthcare systems Nations (2020); yem (2020); sal (2020) to the enforcement of isolation and quarantine. In the case of Nepal, the national lockdown lasted for days uninterrupted Pradhan (2020).
In these circumstances, identifying reliable and effective disease control policies is of utmost importance. Here by “policy” we mean a deliberate intervention intended to mitigate the effects of a disease as it runs its course. In much of the mathematical literature on epidemiology, the process of generating a policy is as follows Keeling and Rohani (2008): (1) based on their intuition, expert epidemiologists propose a number of suitable policies to control the disease; (2) the impact on the population of each of the considered policies is assessed through dynamical models of disease spread; (3) the outcomes of all policies are compared and a decision is taken as to which one is deemed to be the best.
This three-step process has two disadvantages. First of all, the class of policies devised by an expert could well be suboptimal, since the optimal policy (under some figure of merit) could be extremely complicated and counter-intuitive. Second, the method requires one to numerically simulate each policy. Given the exponential growth of the number of policies in the number of control parameters, the number of policies considered may be on the order of billions; hence, this strategy is not guaranteed to identify the optimal disease control policy in time to enforce it.
In this paper we formulate the problem of identifying an optimal policy for disease control as an optimization problem. Using tools from optimization theory and machine learning Goodfellow et al. (2016)
, we propose efficient heuristics to find optimal disease policies for a given model of a disease.
To illustrate the power of our approach, we use these optimization techniques to generate long-term plans to fight COVID-19, under the assumption that the disease’s dynamics are accurately captured by a variant of the Susceptible-Exposed-Infected-Recovered (SEIR) compartmental model Keeling and Rohani (2008) proposed in Kissler et al. (2020a). Our results confirm that optimal policies tend to be too complicated to be devised by a human.
In reality, most epidemiological models only provide short-term approximations to the spread of the disease, with long term projections becoming less and less reliable Reich et al. (2019). In addition, notwithstanding the enormous knowledge gathered since the initial COVID-19 outbreak, many questions remain to be answered regarding the correctness and accuracy of compartmental models such as SEIR: their basic assumptions (e.g., are recovered patients temporarily or permanently immune to the disease?); the actual value of the model’s parameters (e.g., the basic reproduction number ); and the role of variables not modeled (e.g., age, geographic distribution, contact tracing policies, role of superspreaders).
In this regard, the goal of this work is not to propose a concrete government policy, but rather to present an efficient method to obtain an optimal one, given all the available information. To estimate the effect of our methods in a realistic scenario, we conduct a numerical simulation where we re-calculate the optimal policy plans every month, based on new, incoming data.
Ii The framework
Our starting point is an epidemic that affects a closed population. This assumption is not limiting, since a large ensemble of population centres where individuals are free to commute can also be modeled as a closed system Keeling and Rohani (2002). To gain an understanding of how the disease spreads, it is standard to divide the population into different sectors or compartments Keeling and Rohani (2008) (see Fig. 1
). One can define, e.g., the compartment of all those individuals who are currently infected. This compartment can, in turn, be sub-divided into different compartments, such as symptomatic/asymptomatic. Once the number of relevant compartments is fixed, one can estimate the occupation of each of them and arrange the resulting numbers in a vector
. The disease is subsequently analysed by looking at how changes with time.In order to control or even extinguish an epidemic, governments can enforce a number of different measures: mass vaccination, physical (or social WHO ) distancing measures, or even a full lockdown, are common examples of interventions aimed at fighting the disease. When and to which degree such measures are applied is determined by the disease control policy. Consider a disease control policy based on random vaccination campaigns, where the intervention consists of vaccinating a number of individuals at random per day. Call the number of individuals vaccinated on day . This function, between the initial and final times and (i.e. on the interval ) determines the government’s vaccination policy. Similarly, let take the value if the country is in lockdown on day and if it is not. Then the government’s lockdown policy between times and corresponds to the function .
(1) |
If the government is intervening both through vaccination and lockdown, then its disease control policy will be identified by both the functions and .
The above are instances of non-adaptive policies for disease control, because the functions just depend on the time , and not, e.g., on the current value of the death toll. A general (adaptive) policy for disease control would take into account the whole past history of data gathered by the government before deciding what to do at each step. Although in the following all our proposed policies are non-adaptive, the formalism we introduce allows one to optimize over adaptive policies as well.
In conclusion, a disease control policy can always be identified with a vector function of the time and perhaps some other observed variables , where each vector entry represents a type of government intervention at time . In turn, we can use a variable vector to parametrize the class of considered policies, that is, . Since completely determines the policy , we can also regard the parameters as the disease policy. We will do so from now on.
The applied policy is assumed to influence the compartment occupation within the time interval . That is, is both a function of and . In this work we are interested in devising policies for disease control which guarantee that the spread of the disease evolves under certain conditions. For example, any country has a fixed number of critical care capacity beds, which we denote by . At each time , it is desirable that the number of individuals admitted to critical care in hospitals, , does not exceed that capacity. That is, we require that
(2) |
We will call any such condition on the evolution of the disease a constraint.
Finally, among all policies satisfying the desired constraints, we typically wish to identify the one that minimizes a certain quantity. For example, for many diseases, a simple physical distancing policy that satisfies the constraint (2) consists of declaring a lockdown throughout the whole time interval , i.e., for . This policy is arguably impractical, difficult to enforce and harmful to its citizens’ psychological health as well as to the national economy. More rationally, one is interested in finding alternative disease policies which, while respecting the critical care occupation constraint, minimize the number of days of lockdown. Alternatively, one may wish to minimize the total number of deaths during the interval , or the number of infected people. In general, the figure of merit, or quantity that we wish to minimize will be a complicated functional of the considered policy and . We will call this functional the objective function.
Iii Models and optimization
The optimization problem sketched above is mathematically ill-defined, unless we specify how varies with the parameters determining the policy. In order to predict the natural course of a disease or how a given policy might affect its spread, epidemiologists make use of mathematical models. In this paper, we will mainly be concerned with deterministic compartmental models
. In these models, the whole population is divided into a number of basic compartments and the interactions between those compartments are modeled through a system of ordinary differential equations. Given the occupation
of the compartments at time , these models allow us to compute the value of at any instant as a function of the policy . That is, each model provides an implicit functional relation of the form .Past literature on disease control has made extensive use of compartmental models to recommend specific strategies for policy-makers in an effort to control the spread of disease. In all cases that we know of, suitable policies are devised through a mixture of intuition and grid search, see McLean and Blower (1993); Agur et al. (1993); Keeling et al. (2001); Kissler et al. (2020a). The starting point is a family of disease control policies with one or two unknowns. For instance, in pulse vaccination Agur et al. (1993), those unknowns are the time intervals between two random mass vaccinations and the vaccination rate. By running many computer simulations for different values of these unknowns or parameters, one eventually finds a policy for disease control satisfying the desired constraints that minimizes the considered objective function.
With this scheme, the number of computer simulations required to identify the best policy within the family scales exponentially with the number of parameters that specify the policy. As soon as exceeds or , it is unrealistic to expect this method to provide an optimal disease policy in any reasonable amount of time, let alone in a time where it can be implemented.
In this paper, we propose a scheme that allows for the optimization over policies specified by thousands of parameters in the space of a few hours. Our scheme, detailed in Appendices B and C, is based on a standard tool in optimization theory and machine learning known as gradient descent Boyd et al. (2004); Duchi (2018). Starting with a rough guess for the optimal policy , gradient descent methods generate a sequence of policies
which typically exhibit increasingly better performance. Although the gradient method is not guaranteed to converge to the optimal policy, after many iterations it generates solutions that are good enough for many practical problems. In fact, gradient descent is the method most commonly used to train deep neural networks
Goodfellow et al. (2016)Cristianini and Shawe-Taylor (2000).Importantly, the computational resources required to carry out the gradient descent method are comparable to the cost of running a full simulation between times and with the considered disease model. Furthermore, the necessary computations can be parallelized for policies depending on too many parameters. Although the focus of this paper is on compartmental disease models, our main ideas can also be used to understand ecological systems undergoing a more complex dynamics Keller et al. (2013), see Appendix E. Even in such complicated scenarios, the former scaling laws hold: provided that we can run the considered disease model, we can apply the gradient method to optimize over policies of disease control.
To illustrate our approach for policy design, we will use a variant of the extended “susceptible-exposed-infected-recovered” (SEIR) disease model proposed in Kissler et al. (2020a) to predict the impact of COVID-19 in the USA, a schematic of which is shown in Fig. (1) and the details of which appear in Appendix A. The clinical parameters of the model, such as recovery and hospitalization rates, were estimated in et al. (2020); Kissler et al. (2020b), based on early reports from COVID-19 cases in the UK, China and Italy. Following the model in Kissler et al. (2020a), the disease transmission is assumed to be seasonal by analogy with the known behavior of betacoronaviruses such as HCoV-OC43 Kissler et al. (2020b), with a baseline reproduction number between and , following fits of the early growth-rate of the epidemic in Wuhan Li et al. (2020); Riou and Althaus (2020).
A relevant compartment in this model is , the population occupying a critical care bed^{1}^{1}1More precisely, in our model, the relevant compartment is , or the fraction of the population in critical care at instant . This has to be compared with , the number of critical care beds per inhabitant. relate to through the expressions , where is the population size. at time . The patients sent to critical care cannot breathe unassisted, and thus it is fundamental to ensure that such capacity is not surpassed, namely, that the constraint (2) holds. For our simulations, we chose a population size of million and . That is, we assumed that the healthcare system provides critical care beds per million inhabitants. This is a good approximation to the healthcare capacity of many European countries, as well as the USA.
According to the chosen model, without intervention the number of citizens requiring a bed in a critical care unit evolves according to Figure 2 (see Appendix A for the exact initial conditions of our numerical simulations). As the reader can appreciate, between the third and seventh month, the number of people in need of critical care exceeds the capacity of the considered healthcare system by times.
At the time of this writing, none of the potential vaccines for COVID-19 have passed the necessary clinical trials to be considered safe to administer vac . In lieu of this, most governments have opted to control the disease via distancing measures and/or lockdowns. The effect of implementing a policy is to multiply the disease’s basic reproduction number by a factor of Kissler et al. (2020a, b), i.e. , where
(3) |
At this point we shall make a distinction: a policy which outputs a binary value shall be known as discrete and correspond to situations in which a lockdown is either on or off. For discrete as in Eq. (1), the effect of a lockdown () results in the reduction of the transmission rate in Eq. (3). On the other hand, when the population is free to interact () then and there is no change in the disease’s basic reproduction number.
On the other hand, a policy , which outputs a value on the interval will be known as continuous and correspond to physical-distancing measures. Continuous values of correspond to intermediate policies (for example mandatory face masks, suspension of sport events, remote working, school closures), the effect of which can be tentatively estimated from available data et al. (2020).
If distancing measures are the only type of intervention that a government uses, then a non-adaptive continuous policy is fully determined by the function . To begin with, we will assume that the government can only declare new measures at the beginning of each week, i.e., that does not vary within the weekly intervals , for . With these conditions, can be expressed as
(4) |
where
is the characteristic function of week number
, i.e., equals if is in the -th week; and , otherwise. The characteristic function, thus, ensures that there is only one policy per week.denotes the sigmoid function, which guarantees that
by continuously mapping the variables , such that it is everywhere differentiable, making it amenable to the gradient method. The parameters to optimize over are , since they fully define the government’s disease policy.Having chosen either discrete or continuous measures, one must choose a figure of merit or objective function to optimize over. Our formalism, explained in detail in Appendix C, allows us to minimize expressions of the form
(5) |
In this paper, we consider optimizations that never allow the number of people in the critical care compartment to exceed the maximal occupancy, i.e., we impose the constraint in Eq. (2). Thus we consider optimizations of the form
(6) |
In what follows, we split our analysis by choosing different objective functions to minimize over, considering policies which are either continuous, as in Eq. (4), or discrete as in Eq. (1).
iii.1 Objective function 1: minimizing physical distancing and lockdown measures
Suppose, for instance, that one is confident that a vaccine for COVID-19 will be developed within the next two years. In this case, one is interested in minimizing the aggregate economic cost associated with the physical-distancing measures implemented by the government over these two years. Thus for the optimization in Eq. (6), we consider Lagrangians (objective functions) of the form
(7) |
Figure 3 shows the result of applying the gradient method to minimize this functional for the cost function , under continuous policies of the form (4) satisfying the critical care capacity constraint (2). The plot shows both the critical care occupancy and the physical-distancing measure between times and . The aggregate cost of the optimal policy is equal to the economic cost of sustaining a full lockdown for days. As anticipated, the optimal policy found by the computer is very complicated. Notice that the critical care occupancy grows quickly towards the end of the plot. The reason for this is that we asked the computer to minimize the total time in lockdown over a fixed period of two years, which is exactly what it did: it minimized the physical-distancing measures in the first two years with complete disregard for what could happen next. To overcome such a pathological case, one can introduce additional terms into Eq. (7) that would constrain the final slope of the curve , to make it less steep, or even decreasing.
In some circumstances, the only distancing measures considered by governments are discrete: lockdown on or off as in Eq. (1). These kinds of optimizations over discrete variables cannot be carried out directly with the gradient method. In Appendix D, we propose two different ways to tackle this problem.
One way consists of using stochastic gradient descent to obtain an optimal non-deterministic weekly policy that is later “binned” to arrive at a deterministic policy for disease control, see Appendix
D. The results of this first method are shown in Figure 4. This time the critical care occupancy curve touches the critical care capacity just after the end of year . The reason for this is that we demanded lockdowns to last exactly one week: had we allowed the government to declare a lockdown on any day of the week, the computer would have found a tighter solution, with every peak of the red curve touching the dashed line. Even under this discrete weekly simplification, the solution found by the computer is non-trivial: it requires the government to declare a lockdown times. The total length of the lockdown in the course of two years is days.Our second discretization method consists in parametrizing the disease control policy through the specific times where lockdown is declared. In other words, , and lockdown is assumed to take place within the time intervals . In this parametrization, lockdowns can be declared or lifted at arbitrary times within , and not only on Mondays, like in the first method. This second discretization method has the advantage of allowing one to set the maximum number of lockdowns throughout the period . For , the corresponding critical care occupation and lockdown graphs are shown in Figure 5. The total length of the lockdown is days.
iii.2 Objective function 2: achieving herd immunity
One problem with the policies above is that they require one to implement lockdown measures over long periods of time. Another strategy to fight COVID-19 consists of steering the population towards herd immunity, without violating the constraint on the critical care capacity. This is captured by using a functional of the form
(8) |
in Eq. (6), where (as depicted in Fig. 1) is the component of that denotes the proportion of individuals in the population who are susceptible to the disease. is the proportion of susceptible individuals required for herd immunity to be guaranteed; thus, once although people will continue to become infected, the natural evolution of the disease will be such that the rate and number of infected quickly dies out.
Fig. 6 illustrates the results of optimizing the objective function in Eq. (8) for continuous physical-distancing measures. The number of susceptible individuals reaches on day , after which the disease can be considered extinct.
Notwithstanding the unprecedented velocity in the development of a COVID-19 vaccine, with many pharmaceutical companies and research institutions that already reached advanced stages in clinical trials on humans, as of now, no vaccine has completed all testing phases before being approved vac . One may wonder what will happen if the situation were different with a possible vaccine not available in a time span of months.
A possible policy to reach faster herd immunity, while not saturating the critical care capacity, could be to deliberately infect a fraction of the population at the same time as imposing physical distance measures. It is important to remark that, even allowing this procedure on a voluntary basis among the population which are at the lowest health risk (e.g., young adults in good health), this practice poses high ethical concerns; as such, we remark that we do not endorse applying such “infection policies” on a human population. We just consider them in this paper for the academic purpose of exploring the power of the gradient method to devise complex policies for disease control. In fact, the results of our simulations indicate that the advantage with respect to simple physical distancing policies are minimal (cf. Fig. 7).
That being said, consider the following model of such an unethical policy. Let be the rate at which the government intentionally infects the general population on day . Then, a lockdown-infection policy is identified with the functions and . As with , one can parametrize the function to change once per week. Denoting by the maximum rate of this deliberate infection, a suitable parametrization for is
(9) |
The disease model used to predict the impact of this policy will depend on how the government decides to deal with those infected: (a) the infected are quarantined until they overcome the disease; (b) the infected are allowed to mingle with the general population. In either compartmental model, one can compute the functions , with . We are interested in minimizing the functional given by (8) for the problem in Eq. (6).
The results for both quarantined/non-quarantined policies for are shown in Figure 7, where we included the curves of the no-infection policy for comparison. As can be seen, the unethical policy of intentional infection brings the population to herd immunity, and thus the disease eventually to extinction, just slightly earlier than the optimal policy based on physical-distancing measures only.
Note how tight the constraint on critical care bed occupation becomes for policy (b). Figure 8 shows the strategy achieving this result. As in the previous cases, the optimal policy is extremely complicated in terms of physical distancing measures. The infection policy is, however, very simple: except at the beginning of the outbreak and after the disease’s extinction, infection occurs at the maximum rate. Notice also that, once the disease is extinct, the computer recommends a permanent lockdown, the reason being that the objective function in Eq. (8), contrary to Eq. (7), does not penalize the overzealous use of physical distancing measures.
Iv Models vs. reality
In practice, the predictions of any mathematical model for a physical system will not be perfect for a number of reasons. First, basic parameters of the model, such as the transmission rate or the initial occupation , are only known up to approximations. Even if reality were exactly described by a particular mathematical model, small errors in such parameters would accumulate in the long run, making long-term predictions unreliable. Second, reality is never exactly described by mathematical models: on the contrary, any tractable disease model is, at best, a rough approximation to reality. Consequently, even the most successful disease models in the market cease to deliver solid predictions beyond weeks Reich et al. (2019).
These considerations make us question how practical a two-year disease control policy really is. Consider the policy depicted in Figure 3, which was obtained by applying the gradient method to the SEIR model in Kissler et al. (2020a). Here, the model parameters correspond to the average values of the parameter ranges in Table 1 of Appendix A. In reality of course, the values of the parameters are never all equal to their averages, so we proceed to generate sets of parameters with some fluctuations. Let
be the vector with entries given by the difference between the upper and lower bounds of all the entries of the table, and suppose that the actual parameters of “reality” are unknown and uniformly distributed in the region of values
, where can be interpreted as the amount of noise or uncertainty. How robust is a policy to uncertainty in the initial parameters?Fig. 9 shows the result of generating independent parameter samples from the region , corresponding to a uncertainty, with respect to the given interval of values, and running the corresponding models for the optimal physical distancing policy in Fig. 3. As one can see, for some sampled values of parameters, the critical care capacity of the healthcare system is exceeded. This is not surprising, since the policy depicted in Fig. 3 was devised to perform well under the assumption that and not , i.e., for the model parameters corresponding to their average value.
In order to tame the behavior in the plot in Fig. 9, one can employ a variant of the gradient method called stochastic gradient descent (see Appendix B). This allows one to optimize over long-term policies without violating the constraints of the problem for a range of values for the parameters of the model as well as the initial conditions ^{2}^{2}2Rigorously speaking, stochastic gradient descent just minimizes the average value of the objective function . By adding to
sufficiently strong penalties to the violation of each optimization constraint, we make sure that such constraints will hold with high probability when
and are sampled from the training distribution.. Fig. 10 shows a lockdown policy minimizing the physical distancing measures under the condition that constraint (2) holds for different values of , i.e., the critical care capacity is not exceeded. This time, the violation of condition (2) is neither so extreme nor so frequent. This comes, however, at the cost of enforcing physical distancing measures with a cost equivalent to days of lockdown. Repeating the optimization for , Fig. 11, we see that the critical care capacity is rarely surpassed. However, this time the total cost is equivalent to days of lockdown.This result is what one would have expected. As time goes by, the predictions of the disease model for different values of diverge: any policy which aims to satisfy constraint (2) for large ranges of these parameters will necessarily require extensive physical distancing measures.
In practical policy-making, graphs such as Figs. 10 and 11 should not be understood to represent the actual physical distancing policy, but rather to provide a provisional policy plan. A policy plan gives a recommendation for action for the immediate future, given the current knowledge of the disease. In Fig. 10, the policy plan is advising not to declare physical distancing measures in the first weeks. That is the measure that the government should adopt then. After a first time period, say four weeks, more data will have been gathered: this will allow us to obtain a better estimate of the parameters , and then re-run the models for another two years ahead. The measure to enforce should then be whatever the new policy plan recommends for the following four-week time period. The process is then repeated.
To test how this idea would perform in practice, we consider a scenario where the parameters defining the disease model are unknown, but the region in parameter space in which they live shrinks every month (to be precise, we used a -day period, corresponding to four weeks). That is, at month , the government is informed that the parameters satisfy . Every four weeks, the policy is recalculated to minimize the physical distancing measures for the rest of the two years ahead, using the range . The final curves for the critical care occupation and the physical distancing measures are shown in Fig. 12 for , in a sequence of shrinking regions , for each month (red region) and in the case of a fixed uncertainty region corresponding to the last month, i.e., (inner dark blue region). The total cost of the physical distancing measures is equivalent to lockdown days. This has to be compared with the cost of days predicted by the initial policy plan under the assumption .
In principle, one could further decrease the total planned cost by devising adaptive policy plans
, where the measure to be taken at each moment depends not only on the current time
, but also on the past history of physical distancing measures and their observed effects. In fact, we tried optimizing over adaptive policies described by a continuous version of a neural network architecture known as Long Short-Term Memory (LSTM)
Hochreiter and Schmidhuber (1997). In all our numerical experiments, such simple LSTM architectures could not improve the performance of non-adaptive strategies, but this could be due to ineffective training on our side.V Conclusion
In this paper, we have applied standard tools from optimization theory and machine learning to identify optimal disease control policies, given an epidemiological model. This is in stark contrast to standard practice in mathematical epidemiology, where human intuition is used to narrow down the considered set of policies to a uni-parametric family. We saw that the optimal solutions found by our algorithms are highly counter-intuitive, and thus unlikely to be identified by a human. This supports the idea that policies for disease control should be based on a combination of both human expertise and machine learning.
To illustrate our ideas, we studied a scenario in which a computer is tasked with outputting the minimal amount of physical distancing measures for an epidemic in a hypothetical country, in such a way as to never exceed the critical care bed capacity. We looked at situations in which these measures were continuous (recommendations on the interval ) as well as discrete (either or ) – a lockdown that is off or on, respectively for periods of 2 years. We experimented with measures which are just allowed to change weekly, as well as those in which there is a maximum number of lockdowns that is allowed to be declared. When considering unethical interventions, in which parts of the population are deliberately infected, we found that there is little difference in the time for the disease to become extinct, compared with the natural evolution of the disease.
We examined the problems that one may encounter when applying these techniques to scenarios where the model parameters are not known with high accuracy, which led us to propose practical policy plans which must be continually revisited, to account for our ever-changing and ever-growing knowledge in an epidemic. We tested the viability of this approach by simulating a scenario where the uncertainty on the disease model parameters decreases with time. As expected, the final policy implemented was safe for the final range of parameters and required considerably less physical distancing measures than the initial policy plan hinted.
In this last regard, an interesting problem for future research is how to devise adaptive policy plans for disease control, where the actual measure at each time depends on the whole history of disease indicators accessible to the government. In theory, such plans should predict lower values of the average objective function in scenarios where the model parameters are unknown. In our experience, though, gradient descent alone seems to be unable to beat the non-adaptive score.
Finally, we would like to remark once more that, since the optimization problems we dealt with in this paper are non-convex, the gradient method is not guaranteed to converge to the minimum of the (average) objective function. While conducting this research, in order to convince ourselves that the solutions found by our numerical methods were close to optimal, we had to repeat our optimizations several times, with different initial policies and learning rates . Such a redundant use of computational resources would have been entirely avoidable if we had had some rough approximation to the exact solution of the problem. Hence we conclude this paper with a challenge for the operations research community: develop mathematical tools which allow one to lower bound the solution of minimization problems involving ordinary differential equations.
Acknowledgements.
We thank Luca Gerardo-Giorda and Mario Budroni for useful discussions. C.B. and Y.G. acknowledge funding from the Austrian Science Fund (FWF) through the Zukunftskolleg ZK03.References
- Nations (2020) U. Nations, “A un framework for the immediate socio-economic response to covid-19,” (2020).
- yem (2020) Medecins sans Frontieres (2020).
- sal (2020) Medecins sans Frontieres (2020).
- Pradhan (2020) T. R. Pradhan, The Kathmandu Post (2020).
- Keeling and Rohani (2008) M. J. Keeling and P. Rohani, Modeling Infectious Diseases in Humans and Animals (Princeton University Press, 2008).
- Goodfellow et al. (2016) I. J. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, MA, USA, 2016).
- Kissler et al. (2020a) S. Kissler, C. Tedijanto, M. Lipsitch, and Y. Grad, “Social distancing strategies for curbing the COVID-19 epidemic,” (2020a).
- Reich et al. (2019) N. G. Reich, L. C. Brooks, S. J. Fox, S. Kandula, C. J. McGowan, E. Moore, D. Osthus, E. L. Ray, A. Tushar, T. K. Yamana, M. Biggerstaff, M. A. Johansson, R. Rosenfeld, and J. Shaman, Proceedings of the National Academy of Sciences 116, 3146 (2019).
- Keeling and Rohani (2002) M. J. Keeling and P. Rohani, Ecology Letters 5, 20 (2002).
- (10) “World Health Organization, Press briefing, March 20th 2020,” .
- McLean and Blower (1993) A. R. McLean and S. M. Blower, Proceedings of the Royal Society of London. Series B: Biological Sciences 253, 9 (1993).
- Agur et al. (1993) Z. Agur, L. Cojocaru, G. Mazor, R. M. Anderson, and Y. L. Danon, Proceedings of the National Academy of Sciences 90, 11698 (1993).
- Keeling et al. (2001) M. J. Keeling, M. E. J. Woolhouse, D. J. Shaw, L. Matthews, M. Chase-Topping, D. T. Haydon, S. J. Cornell, J. Kappey, J. Wilesmith, and B. T. Grenfell, Science 294, 813 (2001).
- Boyd et al. (2004) S. Boyd, L. Xiao, and A. Mutapcic, lecture notes of EE392o, Stanford University, Autumn Quarter (2004).
- Duchi (2018) J. Duchi, https://web.stanford.edu/class/ee364b/lectures.html (2018).
- Cristianini and Shawe-Taylor (2000) N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods (Cambridge University Press, 2000).
- Keller et al. (2013) J. P. Keller, L. Gerardo-Giorda, and A. Veneziani, Journal of biological dynamics 7 Suppl 1, 31—46 (2013).
- et al. (2020) F. et al., Imperial College (2020).
- Kissler et al. (2020b) S. M. Kissler, C. Tedijanto, E. Goldstein, Y. H. Grad, and M. Lipsitch, Science 368, 860 (2020b).
- Li et al. (2020) Q. Li, X. Guan, P. Wu, X. Wang, L. Zhou, Y. Tong, R. Ren, K. S. Leung, E. H. Lau, J. Y. Wong, X. Xing, N. Xiang, Y. Wu, C. Li, Q. Chen, D. Li, T. Liu, J. Zhao, M. Liu, W. Tu, C. Chen, L. Jin, R. Yang, Q. Wang, S. Zhou, R. Wang, H. Liu, Y. Luo, Y. Liu, G. Shao, H. Li, Z. Tao, Y. Yang, Z. Deng, B. Liu, Z. Ma, Y. Zhang, G. Shi, T. T. Lam, J. T. Wu, G. F. Gao, B. J. Cowling, B. Yang, G. M. Leung, and Z. Feng, New England Journal of Medicine 382, 1199 (2020), pMID: 31995857.
- Riou and Althaus (2020) J. Riou and C. L. Althaus, Eurosurveillance 25, 2000058 (2020).
- (22) “Coronavirus Vaccine Tracker,” .
- Hochreiter and Schmidhuber (1997) S. Hochreiter and J. Schmidhuber, Neural computation 9, 1735 (1997).
- Kingma and Ba (2015) D. P. Kingma and J. Ba, in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, edited by Y. Bengio and Y. LeCun (2015).
- num (2016) Front Matter (John Wiley & Sons, Ltd, 2016).
- Cook (1971) S. A. Cook, in Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC ’71 (Association for Computing Machinery, New York, NY, USA, 1971) p. 151–158.
- Logan (2000) D. L. Logan, A First Course in the Finite Element Method Using Algor, 2nd ed. (Brooks/Cole Publishing Co., USA, 2000).
Appendix A Models for the spread of COVID-19
In all our numerical simulations, we will assume that the dynamics of the COVID-19 are well approximated by a compartmental model of the SEIR type. When the government policy reduces to enforcing physical distance measures, we will adopt a simplified version of the model used in Kissler et al. (2020a). This model divides those infected with COVID-19 into three different compartments: or those who recover by themselves from the disease; , those who require hospitalization but do not enter a critical care unit; and , those who are both hospitalized and visit a critical care unit before recovery. The dynamics of the model are governed by the system of ordinary differential equations below, see the diagram in Figure 13:
(10) |
Here
(11) |
denotes the virus’ transmitivity, that is subject to seasonal variability. models the effect of a government-mandated lockdown on the virus’ transmission rate. The values of the remaining parameters are taken from Kissler et al. (2020a), and appear in Table 1.
parameter | value | units |
---|---|---|
none | ||
none | ||
days | ||
days | ||
none | ||
none | ||
none | ||
days | ||
days | ||
days | ||
days | ||
[2, 2.5] | None |
If the government has the option of deliberately infecting random samples of the population with COVID-19, then the model gets slightly more complicated. If those infected are quarantined until they overcome the disease, the new model, depicted in Figure 14, is described by the system of equations:
(12) |
If, on the contrary, the government allows those deliberately infected to mingle with the general population, the equations above simplify considerably: it suffices to replace all instances of the term in eqs. (10) by .
In all our numerical simulations, we take the total population to be million; the critical care bed capacity per inhabitant is also taken to be . We assume that the government starts its intervention on day , days after the outbreak of the disease. We model the disease outbreak by assuming that, at time , there are individuals in compartment . By default, the values of the disease parameters are taken to be the arithmetic means of the intervals shown in Table 1. We assume that the government can deliberately infect individuals per day. This sets a value for of .
Appendix B The gradient method
Given functions , for , consider the following optimization problem:
such that | (13) |
Each of the conditions is called a constraint.
Define . If is sub-differentiable, a simple heuristic to solve this problem is the projected gradient method Boyd et al. (2004). Call the solution of the problem. Starting from an initial guess , the gradient method generates a sequence of values with the property that , provided that are, respectively, a convex set and a convex function Boyd et al. (2004). The sequence is generated recursively via the relation
(14) |
where, for any set , denotes the point that minimizes the Euclidean distance, i.e., .
Unfortunately, in the problems we encounter in the main text, is not convex and, sometimes, neither is . This implies that the sequence output by the projected gradient method is not guaranteed to converge to the solution of the problem, but to a local minimum thereof.
In machine learning, optimization problems with non-convex objective function and
are legion. To solve them, deep learning practitioners typically use variants of the gradient method sketched above. One of these variants, Adam
Kingma and Ba (2015), is extensively used to train neural networks.Adam works as follows. Starting with the null vectors , vector sequences are generated according to the following iteration rule:
(15) |
where denotes the vector of the element-wise product; similarly, the fraction and square root in the definition of are defined element-wise. Recommended values for the free parameters are , , and Kingma and Ba (2015).
Like the projected gradient method, Adam is not guaranteed to converge to the optimal solution of the problem. However, provided that the initial conditions and the learning rate are chosen with care, Adam has been observed to typically output a local minimum that is “good enough”.
Note that, by taking , it is not clear how to enforce that the solution satisfies constraints of the form . The answer is to include those constraints as penalties in the objective function. That is, rather than minimizing , we apply Adam to minimize the function
(16) |
where and denotes the positive part of , i.e., for ; otherwise, . For high enough values of , the solution of the problem will just violate the constraints slightly, i.e., , for . If no violation whatsoever is desired then one can instead optimize over a function of the form
(17) |
with .
In some situations, the objective function will be complicated to the point that computing its exact gradient is an intractable problem. It might be possible, though, to generate a random vector with the property
(18) |
In such a predicament, we can solve the original optimization problem (13) through stochastic gradient descent methods Duchi (2018). Stochastic gradient descent consists in applying the considered gradient method, with the difference that, every time that the method requires the gradient of
, we input the random variable
instead. Namely, it suffices to replace by in the iterative equations (14), (15). As before, if both and are convex, stochastic gradient descent methods are guaranteed to converge to the optimal solution of problem (13) Duchi (2018).Appendix C Optimization over continuous policies for disease control
Our starting point is an ordinary differential equation of the form
(19) |
The entries of vector represent the occupations of the different compartments of a disease model^{3}^{3}3In the case of adaptive policies, some of such entries might also represent the components of the cell state Hochreiter and Schmidhuber (1997), i.e., the internal variables used by the government to keep track of the evolution of the disease and guide future government interventions.. represents a parametrization of the effects of a given policy. In the examples of the main text, or . Call the solution of Eq. (19) with initial conditions and .
Given the set , we consider the problem of finding the parameters such that minimizes a given functional . This functional defines how we wish to control the disease and what for: it might represent the number and duration of lockdown, etc. For the time being, let us assume this functional to be of the form
(20) |
Note that we are assuming to know the initial conditions with precision. We will relax this requirement by the end of the section.
From the discussion in section B, functional (20) might also contain constraints which we wish the solution to satisfy. For instance, if we want Eq. (2) to hold, then one of the terms in could be
(21) |
with . In the optimizations presented in the paper, we did just this, with (figures 12, 11), (figure 10) or (rest of the figures).
As we saw in section B, optimizations over a large number of parameters are usually conducted via gradient descent methods, such as Adam Kingma and Ba (2015). Such methods require us to compute . For functionals of the form (20), we have that
(22) |
The next question is thus how to compute the derivatives . To this aim, define the variables . Differentiating equation (19) by , we have that
(23) |
In order to obtain for each time , it hence suffices to solve the system of coupled differential equations given by (19), (23) with initial conditions , . This can be achieved numerically through several different methods, depending on the desired accuracy. In all our numerical simulations, we used the Euler explicit method num (2016). Namely, for , we regarded time as a discrete variable of the form , for . We obtain the quantities by recursively applying the relations
(24) |
We chose to generate all our plots, with the exception of Figure 5, where .
In some situations in the main text, our functional is more complicated than (20). Some parameters (not policy parameters) regulating the evolution (19), such as the disease’s basic reproduction number, might be unknown, or perhaps the initial conditions are just known within some bounds. In such cases, the problem’s objective function might adopt the form
(25) |
Again, we wish to minimize over . As explained in section B, this can be achieved via stochastic gradient descent methods Boyd et al. (2004)
: all we need is an unbiased estimator
of . We obtain this estimator by taking independent samples from the measure and using them to compute the quantity(26) |
The optimal value of
depends on the variance of the specific estimator, and we used different values for different levels of noise, defining the set
in the main text.Appendix D Optimization over discrete policies of disease control
The above section explains how to conduct optimizations over disease control policies, as long as the parameters defining the policy are allowed to vary all over . Some policies, though, are by their very nature, discrete. For instance, on day we can either declare a lockdown () or not declare a lockdown (). The effect of the policy measure over the disease’s transmission rate will be to multiply the latter by the amounts and , respectively.
In principle, we could model this situation via a continuous variable and write the effect on the transmitivity by means of a piece-wise continuous function of , e.g.: , where denotes the Heaviside function ^{4}^{4}4 equals for , or , otherwise..
In that case, however, the gradient method would not work. Note that the Heaviside function has zero derivative everywhere except at . In every iteration of Adam, would be null, and so for all .
Optimizing over discrete variables is, in general, a very difficult endeavor: apparently simple problems can be argued not to have an efficient solution Cook (1971). In this paper, we propose a simple heuristic to attack this problem in this particular situation.
Suppose, for the time being, that our lockdown policy were probabilistic, i.e., at each week , we declare a lockdown with probability ; otherwise, with probability , we let the population roam freely. We wish to minimize our average objective function, that is, the expression
(27) |
where is the whole vector of weekly lockdowns, and corresponds to the continuous parameters of the policy, e.g.: vaccination rates.
In principle, we could apply gradient descent to minimize (27). Estimating the exact gradient of the above expression is, however, unrealistic, as it involves summing a number of terms exponential in the number of weeks . Instead, we will produce a random unbiased estimate of the gradient and invoke stochastic gradient descent methods, see Appendix B.
Let us first differentiate Eq. (27) with respect to the continuous variables . The result is
(28) |
where
(29) |
and the components of the random variable are generated by sequentially sampling from the Bernouilli distributions . Note that the expression in the integrand of (29) can be computed using the techniques discussed in Appendix C.
Differentiating Eq. (27) with respect to we find that
(30) |
with
(31) |
and the average
is obtained via sampling over the product of Bernoulli distributions for
and fixing .Putting this all together, we have that the random vectors satisfy
(32) |
Since both vectors can be sampled efficiently, we can use them (and their averages) to optimize over via stochastic gradient descent.
At this point, the reader might object that our original goal was to minimize (20) over policies with deterministic lockdown. Very conveniently, independently of the initial values of the stochastic gradient method will converge to a policy such that the deterministic policy with the same continuous parameters and lockdown given by
(33) |
has the same objective value.
Indeed, for , fix . Then,
(34) |
with
(35) |
Since is a local minimum of , it follows that, either or
(36) |
In either case, fixing through the procedure (33) cannot increase the average value of the objective function. Iterating over , we prove the claim.
Note that this method easily extends to optimizations over discrete adaptive policies. In that case, one can model the probability of lockdown on week as a function of both the current input and the cell state , i.e., . Let us assume that the family of functions available is rich enough to regard as independent. Then one can argue as before and conclude that a deterministic policy can be inferred from the limiting probabilistic policy generated by stochastic gradient descent.
d.1 Optimization over discrete policies with continuous lockdown times
In this section, we explain how to optimize over lockdown policies of the following form:
Comments
There are no comments yet.