Disease control as an optimization problem

by   Miguel Navascues, et al.

Traditionally, expert epidemiologists devise policies for disease control through a mixture of intuition and brute force. Namely, they use their know-how to narrow down the set of logically conceivable policies to a small family described by a few parameters, following which they conduct a grid search to identify the optimal policy within the set. This scheme is not scalable, in the sense that, when used to optimize over policies which depend on many parameters, it would likely fail to output an optimal disease policy in time for its implementation. In this article, we use techniques from convex optimization theory and machine learning to conduct optimizations over disease policies described by hundreds of parameters. We illustrate our approach by minimizing the total time required to eradicate COVID-19 within the Susceptible-Exposed-Infected-Recovered (SEIR) model proposed by Kissler et al. (March, 2020). We observe that optimal policies are typically very complex, and thus unlikely to be discovered by a human agent.



There are no comments yet.


page 4

page 6

page 9

page 10

page 11

page 12


Learning Convex Optimization Control Policies

Many control policies used in various applications determine the input o...

Optimal multiple testing and design in clinical trials

A central goal in designing clinical trials is to find the test that max...

Optimal Policies for a Pandemic: A Stochastic Game Approach and a Deep Learning Algorithm

Game theory has been an effective tool in the control of disease spread ...

When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes

The coronavirus disease 2019 (COVID-19) global pandemic has led many cou...

A spatiotemporal recommendation engine for malaria control

Malaria is an infectious disease affecting a large population across the...

Policy Space Identification in Configurable Environments

We study the problem of identifying the policy space of a learning agent...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

At the time of writing, the COVID-19 pandemic had already caused more than half a million deaths worldwide; at the time of re-writing, some weeks later, this number was approaching one million. The effects of the virus have been widespread and substantial, from the collapse of healthcare systems Nations (2020); yem (2020); sal (2020) to the enforcement of isolation and quarantine. In the case of Nepal, the national lockdown lasted for days uninterrupted Pradhan (2020).

In these circumstances, identifying reliable and effective disease control policies is of utmost importance. Here by “policy” we mean a deliberate intervention intended to mitigate the effects of a disease as it runs its course. In much of the mathematical literature on epidemiology, the process of generating a policy is as follows Keeling and Rohani (2008): (1) based on their intuition, expert epidemiologists propose a number of suitable policies to control the disease; (2) the impact on the population of each of the considered policies is assessed through dynamical models of disease spread; (3) the outcomes of all policies are compared and a decision is taken as to which one is deemed to be the best.

This three-step process has two disadvantages. First of all, the class of policies devised by an expert could well be suboptimal, since the optimal policy (under some figure of merit) could be extremely complicated and counter-intuitive. Second, the method requires one to numerically simulate each policy. Given the exponential growth of the number of policies in the number of control parameters, the number of policies considered may be on the order of billions; hence, this strategy is not guaranteed to identify the optimal disease control policy in time to enforce it.

In this paper we formulate the problem of identifying an optimal policy for disease control as an optimization problem. Using tools from optimization theory and machine learning Goodfellow et al. (2016)

, we propose efficient heuristics to find optimal disease policies for a given model of a disease.

To illustrate the power of our approach, we use these optimization techniques to generate long-term plans to fight COVID-19, under the assumption that the disease’s dynamics are accurately captured by a variant of the Susceptible-Exposed-Infected-Recovered (SEIR) compartmental model Keeling and Rohani (2008) proposed in Kissler et al. (2020a). Our results confirm that optimal policies tend to be too complicated to be devised by a human.

In reality, most epidemiological models only provide short-term approximations to the spread of the disease, with long term projections becoming less and less reliable Reich et al. (2019). In addition, notwithstanding the enormous knowledge gathered since the initial COVID-19 outbreak, many questions remain to be answered regarding the correctness and accuracy of compartmental models such as SEIR: their basic assumptions (e.g., are recovered patients temporarily or permanently immune to the disease?); the actual value of the model’s parameters (e.g., the basic reproduction number ); and the role of variables not modeled (e.g., age, geographic distribution, contact tracing policies, role of superspreaders).

In this regard, the goal of this work is not to propose a concrete government policy, but rather to present an efficient method to obtain an optimal one, given all the available information. To estimate the effect of our methods in a realistic scenario, we conduct a numerical simulation where we re-calculate the optimal policy plans every month, based on new, incoming data.

Ii The framework

Our starting point is an epidemic that affects a closed population. This assumption is not limiting, since a large ensemble of population centres where individuals are free to commute can also be modeled as a closed system Keeling and Rohani (2002). To gain an understanding of how the disease spreads, it is standard to divide the population into different sectors or compartments Keeling and Rohani (2008) (see Fig. 1

). One can define, e.g., the compartment of all those individuals who are currently infected. This compartment can, in turn, be sub-divided into different compartments, such as symptomatic/asymptomatic. Once the number of relevant compartments is fixed, one can estimate the occupation of each of them and arrange the resulting numbers in a vector

. The disease is subsequently analysed by looking at how changes with time.

In order to control or even extinguish an epidemic, governments can enforce a number of different measures: mass vaccination, physical (or social WHO ) distancing measures, or even a full lockdown, are common examples of interventions aimed at fighting the disease. When and to which degree such measures are applied is determined by the disease control policy. Consider a disease control policy based on random vaccination campaigns, where the intervention consists of vaccinating a number of individuals at random per day. Call the number of individuals vaccinated on day . This function, between the initial and final times and (i.e. on the interval ) determines the government’s vaccination policy. Similarly, let take the value if the country is in lockdown on day and if it is not. Then the government’s lockdown policy between times and corresponds to the function .


If the government is intervening both through vaccination and lockdown, then its disease control policy will be identified by both the functions and .

The above are instances of non-adaptive policies for disease control, because the functions just depend on the time , and not, e.g., on the current value of the death toll. A general (adaptive) policy for disease control would take into account the whole past history of data gathered by the government before deciding what to do at each step. Although in the following all our proposed policies are non-adaptive, the formalism we introduce allows one to optimize over adaptive policies as well.

In conclusion, a disease control policy can always be identified with a vector function of the time and perhaps some other observed variables , where each vector entry represents a type of government intervention at time . In turn, we can use a variable vector to parametrize the class of considered policies, that is, . Since completely determines the policy , we can also regard the parameters as the disease policy. We will do so from now on.

The applied policy is assumed to influence the compartment occupation within the time interval . That is, is both a function of and . In this work we are interested in devising policies for disease control which guarantee that the spread of the disease evolves under certain conditions. For example, any country has a fixed number of critical care capacity beds, which we denote by . At each time , it is desirable that the number of individuals admitted to critical care in hospitals, , does not exceed that capacity. That is, we require that


We will call any such condition on the evolution of the disease a constraint.

Finally, among all policies satisfying the desired constraints, we typically wish to identify the one that minimizes a certain quantity. For example, for many diseases, a simple physical distancing policy that satisfies the constraint (2) consists of declaring a lockdown throughout the whole time interval , i.e., for . This policy is arguably impractical, difficult to enforce and harmful to its citizens’ psychological health as well as to the national economy. More rationally, one is interested in finding alternative disease policies which, while respecting the critical care occupation constraint, minimize the number of days of lockdown. Alternatively, one may wish to minimize the total number of deaths during the interval , or the number of infected people. In general, the figure of merit, or quantity that we wish to minimize will be a complicated functional of the considered policy and . We will call this functional the objective function.

Iii Models and optimization

The optimization problem sketched above is mathematically ill-defined, unless we specify how varies with the parameters determining the policy. In order to predict the natural course of a disease or how a given policy might affect its spread, epidemiologists make use of mathematical models. In this paper, we will mainly be concerned with deterministic compartmental models

. In these models, the whole population is divided into a number of basic compartments and the interactions between those compartments are modeled through a system of ordinary differential equations. Given the occupation

of the compartments at time , these models allow us to compute the value of at any instant as a function of the policy . That is, each model provides an implicit functional relation of the form .

Past literature on disease control has made extensive use of compartmental models to recommend specific strategies for policy-makers in an effort to control the spread of disease. In all cases that we know of, suitable policies are devised through a mixture of intuition and grid search, see McLean and Blower (1993); Agur et al. (1993); Keeling et al. (2001); Kissler et al. (2020a). The starting point is a family of disease control policies with one or two unknowns. For instance, in pulse vaccination Agur et al. (1993), those unknowns are the time intervals between two random mass vaccinations and the vaccination rate. By running many computer simulations for different values of these unknowns or parameters, one eventually finds a policy for disease control satisfying the desired constraints that minimizes the considered objective function.

With this scheme, the number of computer simulations required to identify the best policy within the family scales exponentially with the number of parameters that specify the policy. As soon as exceeds or , it is unrealistic to expect this method to provide an optimal disease policy in any reasonable amount of time, let alone in a time where it can be implemented.

In this paper, we propose a scheme that allows for the optimization over policies specified by thousands of parameters in the space of a few hours. Our scheme, detailed in Appendices B and C, is based on a standard tool in optimization theory and machine learning known as gradient descent Boyd et al. (2004); Duchi (2018). Starting with a rough guess for the optimal policy , gradient descent methods generate a sequence of policies

which typically exhibit increasingly better performance. Although the gradient method is not guaranteed to converge to the optimal policy, after many iterations it generates solutions that are good enough for many practical problems. In fact, gradient descent is the method most commonly used to train deep neural networks

Goodfellow et al. (2016)

and support vector machines

Cristianini and Shawe-Taylor (2000).

Importantly, the computational resources required to carry out the gradient descent method are comparable to the cost of running a full simulation between times and with the considered disease model. Furthermore, the necessary computations can be parallelized for policies depending on too many parameters. Although the focus of this paper is on compartmental disease models, our main ideas can also be used to understand ecological systems undergoing a more complex dynamics Keller et al. (2013), see Appendix E. Even in such complicated scenarios, the former scaling laws hold: provided that we can run the considered disease model, we can apply the gradient method to optimize over policies of disease control.

To illustrate our approach for policy design, we will use a variant of the extended “susceptible-exposed-infected-recovered” (SEIR) disease model proposed in Kissler et al. (2020a) to predict the impact of COVID-19 in the USA, a schematic of which is shown in Fig. (1) and the details of which appear in Appendix A. The clinical parameters of the model, such as recovery and hospitalization rates, were estimated in et al. (2020); Kissler et al. (2020b), based on early reports from COVID-19 cases in the UK, China and Italy. Following the model in Kissler et al. (2020a), the disease transmission is assumed to be seasonal by analogy with the known behavior of betacoronaviruses such as HCoV-OC43 Kissler et al. (2020b), with a baseline reproduction number between and , following fits of the early growth-rate of the epidemic in Wuhan Li et al. (2020); Riou and Althaus (2020).

Figure 1: A possible compartment model for COVID-19 (adapted from Kissler et al. (2020a)). The main compartments are: susceptible; exposed; infected; hospitalized; critical and recovered. This compartmental splitting captures different possible evolutions as well as time delays between transitions. The “infected” compartment, for instance, contains those who will recover without hospitalization (); those who will be hospitalized but won’t need critical care (); and those who will end up receiving critical care (). The “exposed” compartment is introduced here to model the time delay between the exposure to the disease and the development of symptoms (incubation period), in particular, the possibility of infecting others, which is what is relevant for the model. In this model, the compartment “recovered” includes both dead and alive individuals; in principle, it could be sub-divided further.


A relevant compartment in this model is , the population occupying a critical care bed111More precisely, in our model, the relevant compartment is , or the fraction of the population in critical care at instant . This has to be compared with , the number of critical care beds per inhabitant. relate to through the expressions , where is the population size. at time . The patients sent to critical care cannot breathe unassisted, and thus it is fundamental to ensure that such capacity is not surpassed, namely, that the constraint (2) holds. For our simulations, we chose a population size of million and . That is, we assumed that the healthcare system provides critical care beds per million inhabitants. This is a good approximation to the healthcare capacity of many European countries, as well as the USA.

According to the chosen model, without intervention the number of citizens requiring a bed in a critical care unit evolves according to Figure 2 (see Appendix A for the exact initial conditions of our numerical simulations). As the reader can appreciate, between the third and seventh month, the number of people in need of critical care exceeds the capacity of the considered healthcare system by times.

Figure 2: Occupation of critical care beds over two years with no policy intervention. The red dashed line indicates the critical care capacity of the healthcare system.

At the time of this writing, none of the potential vaccines for COVID-19 have passed the necessary clinical trials to be considered safe to administer vac . In lieu of this, most governments have opted to control the disease via distancing measures and/or lockdowns. The effect of implementing a policy is to multiply the disease’s basic reproduction number by a factor of Kissler et al. (2020a, b), i.e. , where


At this point we shall make a distinction: a policy which outputs a binary value shall be known as discrete and correspond to situations in which a lockdown is either on or off. For discrete as in Eq. (1), the effect of a lockdown () results in the reduction of the transmission rate in Eq. (3). On the other hand, when the population is free to interact () then and there is no change in the disease’s basic reproduction number.

On the other hand, a policy , which outputs a value on the interval will be known as continuous and correspond to physical-distancing measures. Continuous values of correspond to intermediate policies (for example mandatory face masks, suspension of sport events, remote working, school closures), the effect of which can be tentatively estimated from available data et al. (2020).

If distancing measures are the only type of intervention that a government uses, then a non-adaptive continuous policy is fully determined by the function . To begin with, we will assume that the government can only declare new measures at the beginning of each week, i.e., that does not vary within the weekly intervals , for . With these conditions, can be expressed as



is the characteristic function of week number

, i.e., equals if is in the -th week; and , otherwise. The characteristic function, thus, ensures that there is only one policy per week.

denotes the sigmoid function, which guarantees that

by continuously mapping the variables , such that it is everywhere differentiable, making it amenable to the gradient method. The parameters to optimize over are , since they fully define the government’s disease policy.

Having chosen either discrete or continuous measures, one must choose a figure of merit or objective function to optimize over. Our formalism, explained in detail in Appendix C, allows us to minimize expressions of the form


In this paper, we consider optimizations that never allow the number of people in the critical care compartment to exceed the maximal occupancy, i.e., we impose the constraint in Eq. (2). Thus we consider optimizations of the form


In what follows, we split our analysis by choosing different objective functions to minimize over, considering policies which are either continuous, as in Eq. (4), or discrete as in Eq. (1).

iii.1 Objective function 1: minimizing physical distancing and lockdown measures

Suppose, for instance, that one is confident that a vaccine for COVID-19 will be developed within the next two years. In this case, one is interested in minimizing the aggregate economic cost associated with the physical-distancing measures implemented by the government over these two years. Thus for the optimization in Eq. (6), we consider Lagrangians (objective functions) of the form


Figure 3 shows the result of applying the gradient method to minimize this functional for the cost function , under continuous policies of the form (4) satisfying the critical care capacity constraint (2). The plot shows both the critical care occupancy and the physical-distancing measure between times and . The aggregate cost of the optimal policy is equal to the economic cost of sustaining a full lockdown for days. As anticipated, the optimal policy found by the computer is very complicated. Notice that the critical care occupancy grows quickly towards the end of the plot. The reason for this is that we asked the computer to minimize the total time in lockdown over a fixed period of two years, which is exactly what it did: it minimized the physical-distancing measures in the first two years with complete disregard for what could happen next. To overcome such a pathological case, one can introduce additional terms into Eq. (7) that would constrain the final slope of the curve , to make it less steep, or even decreasing.

Figure 3: Occupation of critical care beds (red) and physical-distancing measures (blue) for a period of two years. The optimization has been performed over continuous weekly policies, continuous parameters, i.e., any value of physical-distancing measure between and is accepted. The algorithm, however, tends to prefer configurations, i.e., full lockdown or no lockdown in most instances.

In some circumstances, the only distancing measures considered by governments are discrete: lockdown on or off as in Eq. (1). These kinds of optimizations over discrete variables cannot be carried out directly with the gradient method. In Appendix D, we propose two different ways to tackle this problem.

One way consists of using stochastic gradient descent to obtain an optimal non-deterministic weekly policy that is later “binned” to arrive at a deterministic policy for disease control, see Appendix

D. The results of this first method are shown in Figure 4. This time the critical care occupancy curve touches the critical care capacity just after the end of year . The reason for this is that we demanded lockdowns to last exactly one week: had we allowed the government to declare a lockdown on any day of the week, the computer would have found a tighter solution, with every peak of the red curve touching the dashed line. Even under this discrete weekly simplification, the solution found by the computer is non-trivial: it requires the government to declare a lockdown times. The total length of the lockdown in the course of two years is days.

Figure 4: Occupation of critical care beds (red) and lockdown (blue) for a period of two years. The plot shows the result of the optimization over probabilistic policies via gradient descent over a period of two years.

Our second discretization method consists in parametrizing the disease control policy through the specific times where lockdown is declared. In other words, , and lockdown is assumed to take place within the time intervals . In this parametrization, lockdowns can be declared or lifted at arbitrary times within , and not only on Mondays, like in the first method. This second discretization method has the advantage of allowing one to set the maximum number of lockdowns throughout the period . For , the corresponding critical care occupation and lockdown graphs are shown in Figure 5. The total length of the lockdown is days.

Figure 5: Occupation of critical care beds (red) and lockdown measures (blue) for a period of two years. Optimization over deterministic policies with arbitrary initial and final times for each lockdown period. A total of 9 lockdown periods has been fixed prior to the optimization. Notice that the first lockdown period, around day , has been basically removed by the optimization procedure.

iii.2 Objective function 2: achieving herd immunity

One problem with the policies above is that they require one to implement lockdown measures over long periods of time. Another strategy to fight COVID-19 consists of steering the population towards herd immunity, without violating the constraint on the critical care capacity. This is captured by using a functional of the form


in Eq. (6), where (as depicted in Fig. 1) is the component of that denotes the proportion of individuals in the population who are susceptible to the disease. is the proportion of susceptible individuals required for herd immunity to be guaranteed; thus, once although people will continue to become infected, the natural evolution of the disease will be such that the rate and number of infected quickly dies out.

Fig. 6 illustrates the results of optimizing the objective function in Eq. (8) for continuous physical-distancing measures. The number of susceptible individuals reaches on day , after which the disease can be considered extinct.

Figure 6: Occupation of critical care beds (red) and population of susceptible individuals (blue) for a period of three years. The blue line corresponds to the level of susceptibles guaranteeing herd immunity over the whole year.

Notwithstanding the unprecedented velocity in the development of a COVID-19 vaccine, with many pharmaceutical companies and research institutions that already reached advanced stages in clinical trials on humans, as of now, no vaccine has completed all testing phases before being approved vac . One may wonder what will happen if the situation were different with a possible vaccine not available in a time span of months.

A possible policy to reach faster herd immunity, while not saturating the critical care capacity, could be to deliberately infect a fraction of the population at the same time as imposing physical distance measures. It is important to remark that, even allowing this procedure on a voluntary basis among the population which are at the lowest health risk (e.g., young adults in good health), this practice poses high ethical concerns; as such, we remark that we do not endorse applying such “infection policies” on a human population. We just consider them in this paper for the academic purpose of exploring the power of the gradient method to devise complex policies for disease control. In fact, the results of our simulations indicate that the advantage with respect to simple physical distancing policies are minimal (cf. Fig. 7).

That being said, consider the following model of such an unethical policy. Let be the rate at which the government intentionally infects the general population on day . Then, a lockdown-infection policy is identified with the functions and . As with , one can parametrize the function to change once per week. Denoting by the maximum rate of this deliberate infection, a suitable parametrization for is


The disease model used to predict the impact of this policy will depend on how the government decides to deal with those infected: (a) the infected are quarantined until they overcome the disease; (b) the infected are allowed to mingle with the general population. In either compartmental model, one can compute the functions , with . We are interested in minimizing the functional given by (8) for the problem in Eq. (6).

The results for both quarantined/non-quarantined policies for are shown in Figure 7, where we included the curves of the no-infection policy for comparison. As can be seen, the unethical policy of intentional infection brings the population to herd immunity, and thus the disease eventually to extinction, just slightly earlier than the optimal policy based on physical-distancing measures only.

Figure 7: Occupation of critical care beds (red) and population of susceptible individuals (blue) for a period of three years under different policies for disease control. The dotted lines represent the effects a physical distancing measures-only policy (dotted lines), infection policy (a) (dashed-dotted lines) and infection policy (b) (solid line).

Note how tight the constraint on critical care bed occupation becomes for policy (b). Figure 8 shows the strategy achieving this result. As in the previous cases, the optimal policy is extremely complicated in terms of physical distancing measures. The infection policy is, however, very simple: except at the beginning of the outbreak and after the disease’s extinction, infection occurs at the maximum rate. Notice also that, once the disease is extinct, the computer recommends a permanent lockdown, the reason being that the objective function in Eq. (8), contrary to Eq. (7), does not penalize the overzealous use of physical distancing measures.

Figure 8: Government infection (green) and physical distancing measures (blue) under infection policy (b). The algorithm recommend an almost constant infection rate, except for a brief initial period, which goes to zero after reaching herd immunity. Notice that, since we are minimizing the distance from the herd immunity threshold (cf. Eq (8)), the algorithm does not care about minimizing physical distancing measures. This is the reason why after reaching herd immunity, a full lockdown is suggested. This type of behavior of the algorithm can be easily corrected by hand after the solution is obtained.

Iv Models vs. reality

In practice, the predictions of any mathematical model for a physical system will not be perfect for a number of reasons. First, basic parameters of the model, such as the transmission rate or the initial occupation , are only known up to approximations. Even if reality were exactly described by a particular mathematical model, small errors in such parameters would accumulate in the long run, making long-term predictions unreliable. Second, reality is never exactly described by mathematical models: on the contrary, any tractable disease model is, at best, a rough approximation to reality. Consequently, even the most successful disease models in the market cease to deliver solid predictions beyond weeks Reich et al. (2019).

These considerations make us question how practical a two-year disease control policy really is. Consider the policy depicted in Figure 3, which was obtained by applying the gradient method to the SEIR model in Kissler et al. (2020a). Here, the model parameters correspond to the average values of the parameter ranges in Table 1 of Appendix A. In reality of course, the values of the parameters are never all equal to their averages, so we proceed to generate sets of parameters with some fluctuations. Let

be the vector with entries given by the difference between the upper and lower bounds of all the entries of the table, and suppose that the actual parameters of “reality” are unknown and uniformly distributed in the region of values

, where can be interpreted as the amount of noise or uncertainty. How robust is a policy to uncertainty in the initial parameters?

Figure 9: Occupation of critical care beds (red) and (unoptimized) physical distancing measures (blue) for a period of two years under random parameters in . The region in red is obtained by sampling times from the region of model parameters and evolving the corresponding models with the physical distancing policy optimized over the model with average-value parameters (as in Fig. 3). More precisely, the red region is the one delimited by the minimum and the maximum critical care occupation for all the models, at each time. The red line represents the average critical care occupation in all those simulations.

Fig. 9 shows the result of generating independent parameter samples from the region , corresponding to a uncertainty, with respect to the given interval of values, and running the corresponding models for the optimal physical distancing policy in Fig. 3. As one can see, for some sampled values of parameters, the critical care capacity of the healthcare system is exceeded. This is not surprising, since the policy depicted in Fig. 3 was devised to perform well under the assumption that and not , i.e., for the model parameters corresponding to their average value.

In order to tame the behavior in the plot in Fig. 9, one can employ a variant of the gradient method called stochastic gradient descent (see Appendix B). This allows one to optimize over long-term policies without violating the constraints of the problem for a range of values for the parameters of the model as well as the initial conditions 222Rigorously speaking, stochastic gradient descent just minimizes the average value of the objective function . By adding to

sufficiently strong penalties to the violation of each optimization constraint, we make sure that such constraints will hold with high probability when

and are sampled from the training distribution.. Fig. 10 shows a lockdown policy minimizing the physical distancing measures under the condition that constraint (2) holds for different values of , i.e., the critical care capacity is not exceeded. This time, the violation of condition (2) is neither so extreme nor so frequent. This comes, however, at the cost of enforcing physical distancing measures with a cost equivalent to days of lockdown. Repeating the optimization for , Fig. 11, we see that the critical care capacity is rarely surpassed. However, this time the total cost is equivalent to days of lockdown.

Figure 10: Occupation of critical care beds (red) and (optimized) physical distancing measures (blue) for a period of two years under random parameters in . The disease control policy was optimized to respect condition (2) over the whole range of parameters . As in Fig. 9, the region in red depicts again the range of critical care occupations observed in a sample of model parameters in , and the red line the average critical care occupation.
Figure 11: Occupation of critical care beds (red) and (optimized) physical distancing measures (blue) for a period of two years under random parameters in . The disease control policy was optimized to respect condition (2) over the whole range of parameters . As in Fig. 9, the region in red depicts again the range of critical care occupations observed in a sample of model parameters in , and the red line the average critical care occupation.

This result is what one would have expected. As time goes by, the predictions of the disease model for different values of diverge: any policy which aims to satisfy constraint (2) for large ranges of these parameters will necessarily require extensive physical distancing measures.

In practical policy-making, graphs such as Figs. 10 and 11 should not be understood to represent the actual physical distancing policy, but rather to provide a provisional policy plan. A policy plan gives a recommendation for action for the immediate future, given the current knowledge of the disease. In Fig. 10, the policy plan is advising not to declare physical distancing measures in the first weeks. That is the measure that the government should adopt then. After a first time period, say four weeks, more data will have been gathered: this will allow us to obtain a better estimate of the parameters , and then re-run the models for another two years ahead. The measure to enforce should then be whatever the new policy plan recommends for the following four-week time period. The process is then repeated.

To test how this idea would perform in practice, we consider a scenario where the parameters defining the disease model are unknown, but the region in parameter space in which they live shrinks every month (to be precise, we used a -day period, corresponding to four weeks). That is, at month , the government is informed that the parameters satisfy . Every four weeks, the policy is recalculated to minimize the physical distancing measures for the rest of the two years ahead, using the range . The final curves for the critical care occupation and the physical distancing measures are shown in Fig. 12 for , in a sequence of shrinking regions , for each month (red region) and in the case of a fixed uncertainty region corresponding to the last month, i.e., (inner dark blue region). The total cost of the physical distancing measures is equivalent to lockdown days. This has to be compared with the cost of days predicted by the initial policy plan under the assumption .

Figure 12: Occupation of critical care beds (red) and physical distancing measures (blue) for a period of two years under random parameters and monthly noise decrease. The disease control policy was optimized to respect the condition in Eq. (2), i.e., critical care capacity not exceeded, starting with the parameter region and with a monthly noise decrease of , i.e., in the -th month the noise is equal to . The region in red depicts the range of critical care occupations observed in a sample of model parameters in a sequence of shrinking regions ; more precisely, for each month the red region is obtained by evolving, from the initial time to month (included), different models with parameters sampled from the region . The final plot is obtained by joining the plots for each month . The inner region in dark blue depicts the range of critical care occupations corresponding to the uncertainty in the final month, i.e., obtained with models with parameters sampled in . The red line represents the average critical care occupation, obtained by joining the average of the simulations with decreased uncertainty for each month . Despite the initial uncertainty on the parameters of the disease model, the final lockdown time is much lower, due to monthly revisions of the original policy plan.

In principle, one could further decrease the total planned cost by devising adaptive policy plans

, where the measure to be taken at each moment depends not only on the current time

, but also on the past history of physical distancing measures and their observed effects. In fact, we tried optimizing over adaptive policies described by a continuous version of a neural network architecture known as Long Short-Term Memory (LSTM)

Hochreiter and Schmidhuber (1997). In all our numerical experiments, such simple LSTM architectures could not improve the performance of non-adaptive strategies, but this could be due to ineffective training on our side.

V Conclusion

In this paper, we have applied standard tools from optimization theory and machine learning to identify optimal disease control policies, given an epidemiological model. This is in stark contrast to standard practice in mathematical epidemiology, where human intuition is used to narrow down the considered set of policies to a uni-parametric family. We saw that the optimal solutions found by our algorithms are highly counter-intuitive, and thus unlikely to be identified by a human. This supports the idea that policies for disease control should be based on a combination of both human expertise and machine learning.

To illustrate our ideas, we studied a scenario in which a computer is tasked with outputting the minimal amount of physical distancing measures for an epidemic in a hypothetical country, in such a way as to never exceed the critical care bed capacity. We looked at situations in which these measures were continuous (recommendations on the interval ) as well as discrete (either or ) – a lockdown that is off or on, respectively for periods of 2 years. We experimented with measures which are just allowed to change weekly, as well as those in which there is a maximum number of lockdowns that is allowed to be declared. When considering unethical interventions, in which parts of the population are deliberately infected, we found that there is little difference in the time for the disease to become extinct, compared with the natural evolution of the disease.

We examined the problems that one may encounter when applying these techniques to scenarios where the model parameters are not known with high accuracy, which led us to propose practical policy plans which must be continually revisited, to account for our ever-changing and ever-growing knowledge in an epidemic. We tested the viability of this approach by simulating a scenario where the uncertainty on the disease model parameters decreases with time. As expected, the final policy implemented was safe for the final range of parameters and required considerably less physical distancing measures than the initial policy plan hinted.

In this last regard, an interesting problem for future research is how to devise adaptive policy plans for disease control, where the actual measure at each time depends on the whole history of disease indicators accessible to the government. In theory, such plans should predict lower values of the average objective function in scenarios where the model parameters are unknown. In our experience, though, gradient descent alone seems to be unable to beat the non-adaptive score.

Finally, we would like to remark once more that, since the optimization problems we dealt with in this paper are non-convex, the gradient method is not guaranteed to converge to the minimum of the (average) objective function. While conducting this research, in order to convince ourselves that the solutions found by our numerical methods were close to optimal, we had to repeat our optimizations several times, with different initial policies and learning rates . Such a redundant use of computational resources would have been entirely avoidable if we had had some rough approximation to the exact solution of the problem. Hence we conclude this paper with a challenge for the operations research community: develop mathematical tools which allow one to lower bound the solution of minimization problems involving ordinary differential equations.

We thank Luca Gerardo-Giorda and Mario Budroni for useful discussions. C.B. and Y.G. acknowledge funding from the Austrian Science Fund (FWF) through the Zukunftskolleg ZK03.


Appendix A Models for the spread of COVID-19

In all our numerical simulations, we will assume that the dynamics of the COVID-19 are well approximated by a compartmental model of the SEIR type. When the government policy reduces to enforcing physical distance measures, we will adopt a simplified version of the model used in Kissler et al. (2020a). This model divides those infected with COVID-19 into three different compartments: or those who recover by themselves from the disease; , those who require hospitalization but do not enter a critical care unit; and , those who are both hospitalized and visit a critical care unit before recovery. The dynamics of the model are governed by the system of ordinary differential equations below, see the diagram in Figure 13:

Figure 13: COVID-19 model for disease policies based on physical distance measures.



denotes the virus’ transmitivity, that is subject to seasonal variability. models the effect of a government-mandated lockdown on the virus’ transmission rate. The values of the remaining parameters are taken from Kissler et al. (2020a), and appear in Table 1.

parameter value units
[2, 2.5] None
Table 1: Parameter ranges for the compartmental model for COVID-19 proposed in Kissler et al. (2020a).

If the government has the option of deliberately infecting random samples of the population with COVID-19, then the model gets slightly more complicated. If those infected are quarantined until they overcome the disease, the new model, depicted in Figure 14, is described by the system of equations:

Figure 14: COVID-19 model for disease policies based on physical distance measures and deliberate inoculation.

If, on the contrary, the government allows those deliberately infected to mingle with the general population, the equations above simplify considerably: it suffices to replace all instances of the term in eqs. (10) by .

In all our numerical simulations, we take the total population to be million; the critical care bed capacity per inhabitant is also taken to be . We assume that the government starts its intervention on day , days after the outbreak of the disease. We model the disease outbreak by assuming that, at time , there are individuals in compartment . By default, the values of the disease parameters are taken to be the arithmetic means of the intervals shown in Table 1. We assume that the government can deliberately infect individuals per day. This sets a value for of .

Appendix B The gradient method

Given functions , for , consider the following optimization problem:

such that (13)

Each of the conditions is called a constraint.

Define . If is sub-differentiable, a simple heuristic to solve this problem is the projected gradient method Boyd et al. (2004). Call the solution of the problem. Starting from an initial guess , the gradient method generates a sequence of values with the property that , provided that are, respectively, a convex set and a convex function Boyd et al. (2004). The sequence is generated recursively via the relation


where, for any set , denotes the point that minimizes the Euclidean distance, i.e., .

Unfortunately, in the problems we encounter in the main text, is not convex and, sometimes, neither is . This implies that the sequence output by the projected gradient method is not guaranteed to converge to the solution of the problem, but to a local minimum thereof.

In machine learning, optimization problems with non-convex objective function and

are legion. To solve them, deep learning practitioners typically use variants of the gradient method sketched above. One of these variants, Adam

Kingma and Ba (2015), is extensively used to train neural networks.

Adam works as follows. Starting with the null vectors , vector sequences are generated according to the following iteration rule:


where denotes the vector of the element-wise product; similarly, the fraction and square root in the definition of are defined element-wise. Recommended values for the free parameters are , , and Kingma and Ba (2015).

Like the projected gradient method, Adam is not guaranteed to converge to the optimal solution of the problem. However, provided that the initial conditions and the learning rate are chosen with care, Adam has been observed to typically output a local minimum that is “good enough”.

Note that, by taking , it is not clear how to enforce that the solution satisfies constraints of the form . The answer is to include those constraints as penalties in the objective function. That is, rather than minimizing , we apply Adam to minimize the function


where and denotes the positive part of , i.e., for ; otherwise, . For high enough values of , the solution of the problem will just violate the constraints slightly, i.e., , for . If no violation whatsoever is desired then one can instead optimize over a function of the form


with .

In some situations, the objective function will be complicated to the point that computing its exact gradient is an intractable problem. It might be possible, though, to generate a random vector with the property


In such a predicament, we can solve the original optimization problem (13) through stochastic gradient descent methods Duchi (2018). Stochastic gradient descent consists in applying the considered gradient method, with the difference that, every time that the method requires the gradient of

, we input the random variable

instead. Namely, it suffices to replace by in the iterative equations (14), (15). As before, if both and are convex, stochastic gradient descent methods are guaranteed to converge to the optimal solution of problem (13) Duchi (2018).

Appendix C Optimization over continuous policies for disease control

Our starting point is an ordinary differential equation of the form


The entries of vector represent the occupations of the different compartments of a disease model333In the case of adaptive policies, some of such entries might also represent the components of the cell state Hochreiter and Schmidhuber (1997), i.e., the internal variables used by the government to keep track of the evolution of the disease and guide future government interventions.. represents a parametrization of the effects of a given policy. In the examples of the main text, or . Call the solution of Eq. (19) with initial conditions and .

Given the set , we consider the problem of finding the parameters such that minimizes a given functional . This functional defines how we wish to control the disease and what for: it might represent the number and duration of lockdown, etc. For the time being, let us assume this functional to be of the form


Note that we are assuming to know the initial conditions with precision. We will relax this requirement by the end of the section.

From the discussion in section B, functional (20) might also contain constraints which we wish the solution to satisfy. For instance, if we want Eq. (2) to hold, then one of the terms in could be


with . In the optimizations presented in the paper, we did just this, with (figures 12, 11), (figure 10) or (rest of the figures).

As we saw in section B, optimizations over a large number of parameters are usually conducted via gradient descent methods, such as Adam Kingma and Ba (2015). Such methods require us to compute . For functionals of the form (20), we have that


The next question is thus how to compute the derivatives . To this aim, define the variables . Differentiating equation (19) by , we have that


In order to obtain for each time , it hence suffices to solve the system of coupled differential equations given by (19), (23) with initial conditions , . This can be achieved numerically through several different methods, depending on the desired accuracy. In all our numerical simulations, we used the Euler explicit method num (2016). Namely, for , we regarded time as a discrete variable of the form , for . We obtain the quantities by recursively applying the relations


We chose to generate all our plots, with the exception of Figure 5, where .

In some situations in the main text, our functional is more complicated than (20). Some parameters (not policy parameters) regulating the evolution (19), such as the disease’s basic reproduction number, might be unknown, or perhaps the initial conditions are just known within some bounds. In such cases, the problem’s objective function might adopt the form


Again, we wish to minimize over . As explained in section B, this can be achieved via stochastic gradient descent methods Boyd et al. (2004)

: all we need is an unbiased estimator

of . We obtain this estimator by taking independent samples from the measure and using them to compute the quantity


The optimal value of

depends on the variance of the specific estimator, and we used different values for different levels of noise, defining the set

in the main text.

Appendix D Optimization over discrete policies of disease control

The above section explains how to conduct optimizations over disease control policies, as long as the parameters defining the policy are allowed to vary all over . Some policies, though, are by their very nature, discrete. For instance, on day we can either declare a lockdown () or not declare a lockdown (). The effect of the policy measure over the disease’s transmission rate will be to multiply the latter by the amounts and , respectively.

In principle, we could model this situation via a continuous variable and write the effect on the transmitivity by means of a piece-wise continuous function of , e.g.: , where denotes the Heaviside function 444 equals for , or , otherwise..

In that case, however, the gradient method would not work. Note that the Heaviside function has zero derivative everywhere except at . In every iteration of Adam, would be null, and so for all .

Optimizing over discrete variables is, in general, a very difficult endeavor: apparently simple problems can be argued not to have an efficient solution Cook (1971). In this paper, we propose a simple heuristic to attack this problem in this particular situation.

Suppose, for the time being, that our lockdown policy were probabilistic, i.e., at each week , we declare a lockdown with probability ; otherwise, with probability , we let the population roam freely. We wish to minimize our average objective function, that is, the expression


where is the whole vector of weekly lockdowns, and corresponds to the continuous parameters of the policy, e.g.: vaccination rates.

In principle, we could apply gradient descent to minimize (27). Estimating the exact gradient of the above expression is, however, unrealistic, as it involves summing a number of terms exponential in the number of weeks . Instead, we will produce a random unbiased estimate of the gradient and invoke stochastic gradient descent methods, see Appendix B.

Let us first differentiate Eq. (27) with respect to the continuous variables . The result is




and the components of the random variable are generated by sequentially sampling from the Bernouilli distributions . Note that the expression in the integrand of (29) can be computed using the techniques discussed in Appendix C.

Differentiating Eq. (27) with respect to we find that




and the average

is obtained via sampling over the product of Bernoulli distributions for

and fixing .

Putting this all together, we have that the random vectors satisfy


Since both vectors can be sampled efficiently, we can use them (and their averages) to optimize over via stochastic gradient descent.

At this point, the reader might object that our original goal was to minimize (20) over policies with deterministic lockdown. Very conveniently, independently of the initial values of the stochastic gradient method will converge to a policy such that the deterministic policy with the same continuous parameters and lockdown given by


has the same objective value.

Indeed, for , fix . Then,




Since is a local minimum of , it follows that, either or


In either case, fixing through the procedure (33) cannot increase the average value of the objective function. Iterating over , we prove the claim.

Note that this method easily extends to optimizations over discrete adaptive policies. In that case, one can model the probability of lockdown on week as a function of both the current input and the cell state , i.e., . Let us assume that the family of functions available is rich enough to regard as independent. Then one can argue as before and conclude that a deterministic policy can be inferred from the limiting probabilistic policy generated by stochastic gradient descent.

d.1 Optimization over discrete policies with continuous lockdown times

In this section, we explain how to optimize over lockdown policies of the following form: