We model the development of the COVID-19 epidemic in the UK under different scenarios of handling the epidemic. There are many standard epidemiological models for modelling epidemics, see e.g. [1, 2]. In this research, we use a more generic simulation model which has the following features:
it can take into account spatial heterogeneity of the population and heterogeneity of development of epidemic in different areas;
it allows the use of time-dependent strategies for analysing the epidemics;
it allows taking into account special characteristics of particular groups of people, especially people with specific medical pre-histories and elderly.
Standard epidemiological models such as SIR and many of its modifications do not possess these properties and hence are not applicable for any study that requires the use of the features above. In particular, In particular, for influenza the mortality changes less significantly with age in comparison to coronavirus; hence common influenza models do not give much insight in modelling the COVID-19 epidemic. The report , which is widely considered as the main document specifying the current epidemic in the UK and in the world, is almost entirely based on the use of standard epidemiological models and hence the conclusions of  seem to be lacking specifics related to important issues of the study such as heterogeneity of development of epidemic at different locations and even flexible use of different death rates across different ages. Moreover, as noted in , the basic assumptions for the model of  were written more than 13 years ago and based on the specific dynamics of a flu pandemic and hence the model was not calibrated for COVID-19. Unreliability of COVID-19 data, including the numbers of COVID cases and COVID deaths, is a serious problem. It is discussed, in particular, by G. Antes, in .
Despite the SIR models including the model of  have been heavily criticised [4, 6], when choosing the main parameters of our model we calibrate it so that its mean field version approximately reproduces the same output as SIR-based models of [3, 7] and we use the parameter values consistent with [3, 7]. Also, the main notation (which sometimes does not look natural from a statistician’s point of view) is taken from .
The primary objective of our work is construction of a reliable, robust and interpretable model describing the epidemic under different control regimes. In this paper, we make the first step towards this objective. Due to lack of time and unreliability of available data on COVID-19, most of our results serve as an illustration of the role of different control options and we hope that even as illustration they are useful. However, we try to stay close to the COVID-19 epidemic scenario and hence we use appropriate recommendations about the choice of parameters and models of the virus behaviour we found from the studies based on the use of standard models.
See conclusions in Sections 3,4,5 for the main conclusions of this work.
Acknowledgement. We are grateful to several colleagues in Univ. Cambridge and Univ. Warwick for useful comments on drafts of this paper and to M. Hairer (ICL) and J. Ball (Univ. Nottingham) for valuable comments and discussions.
2 The model
Variables and parameters
- time (in days)
- intervention time (e.g., the time when self-isolation starts)
- population size
- number of infected at time
- number of immune (recovered or dead) at time (this makes full sense from a modelling perspective, but we appreciate that this may look kind of strange when read by non-scientists)
- number of susceptible at time
- number of groups selected for the study
- size of -th group ()
- number of infected at time in group
- number of immune at time in group
- number of susceptible at time in group
- average number of transmissions of the virus per unit of time with no intervention
- average infectious period
- shape parameter of the Erlang distribution
- reproductive number (average number of people who will capture the disease from one contagious person)
- reproductive number after intervention
- probability to recover for an infected person from the-th group
Values of parameters and generic model
The reproductive number is the main parameter defining the speed of development of an epidemic. There is no true value for as it varies in different parts of the UK (and the world). In particular, in rural areas one would expect a considerably lower value of than in London. Authors of  suggest and as typical; the authors of  use values for in the range [2.25, 2.75]. We shall use the value as typical which may be a slightly pessimistic choice overall but could be an adequate choice for the mega-cities where the epidemics develop faster and may lead to more causalities. In rural areas, in small towns, and everywhere else where social contacts are less intense, the epidemic is milder.
We assume that the person becomes infected days after catching the virus, where
has Poisson distribution with mean of 1 week. To model the time to recover (or die) we useErlang distribution with shape parameter and rate parameter so that the mean of the distribution is (in simulations, we discretise the numbers to their nearest integers). This implies that we assume that the average longevity of the period of time while the infected person is contagious is 21 days, in line with the current knowledge, see e.g. [8, 9, 10]
. Standard deviation of the chosen Erlang distribution is approximately 12, which is rather large and reflects the uncertainty we currently have about the period of time a person needs to recover (or die) from COVID-19. An increase inwould prolong the epidemic and smaller values of would make it to cause people to be contagious for less time. The use of Erlang distribution is standard for modelling similar events in reliability and queuing theories, which have much in common with epidemiology. We have considered the sensitivity of the model in this study with respect to the choice of parameters and but more has to be done in cooperation with epidemiologists. As there are currently many outbreaks epidemics, new knowledge about the distribution of the period of infection by COVID-19 can emerge soon.
The model varies depending on purposes of the study. The main ingredients are: is the birth-and-death processes and is the associated pure birth processes. The process of transmission is the Poisson process with intensity (time to next transmission has the exponential density ). After the intervention (for ), the Poisson process of transmission for -the group has intensity .
We treat this model as purely stochastic despite parts of it can be written it terms of systems of stochastic differential equations. Despite running pure simulation models taking longer than running combined models, they are simpler and less prone to certain errors.
The number of infected at time as the main quantity of interest
To start with, in Figure 1 we consider an uninterrupted run of an epidemic with in a homogeneous (one-group) population. The starting time of an epidemic is unknown and is even hard to define as the first transmissions of the virus take random and perhaps long times. We start plotting the curves after 0.5% of the population are infected.
In red colour, in Figure 1 and all plots below we plot values of , the proportion of people non-infected by time . In blue, we plot , the proportion of people recovered from the disease (or dead) and in green we plot our main quantity of interest which is , the proportion of infected people at time . The values of do not play any part in modelling and are plotted for information only.
The duration of time while a person with a virus is infected is modelled by Erlang distribution with shape parameter and mean 21. The simulations are flexible and we can easily change these values and plot similar curves with updated data. Despite the full recovery taking slightly longer,
days is a good estimation for the period of time when people with severe infection may require intensive care, ventilator etc. Also, we believe that this is a suitable distribution for the period of time when such people may die.
From the value , we can estimate the distribution of the number of deaths, at time , as follows: first, since on average an infected person is considered as infected for 21 days, we compute . At each particular day, any person can die with the probability which is typical for the chosen population or sub-population. If we consider the whole population, then we apply the mortality rate for the population. However, if refers to a particular group only, then we should apply the corresponding coefficient for the group. In any case, the distribution for the number of deaths at time can be roughly considered as binomial with parameters and the probability of ‘success’ , where is the mortality rate. Different authors disagree on the values of mortality for the COVID-19; see, for example, . UK’s experts believe , WHO sets the world-wide mortality rate at 0.034, the authors of  believe is very small and could be close to , an Israeli expert D. Yamin sets , see .
Example 1. Assume that the epidemic in Birmingham (with population size m) was running uninterrupted according to the scenario depicted in Figure 1. The maximum value of is giving the maximum expected daily death toll of assuming we use UK experts value .
Although simple to implement, the estimation in Example 1 is likely to be wrong, as it would be for any heterogeneous city (e.g. London). Critically, as will be discussed in the next section, the maximum death toll is likely to be significantly lower than the above calculations suggest.
Summarizing, the expected daily mortality curve is simply a suitably scaled version of of Figure 1. The same is applied to the curves representing expected number of hospital beds and ventilators. The variability for the number deaths is naturally considerably higher than the variability for the numbers of required hospital beds and ventilators as the latter ones are correlated in view of the fact that each person occupies a bed (requires a ventilator) for a few days in a row but dies only once.
Unlike , the number of deaths at each particular day is (approximately) known and potentially can be used for estimating the first and second derivatives of and hence the stage of the epidemic. It is, however, difficult to do for the following reasons: randomness in the values , see above, and, more importantly, heterogeneity of large sub-populations, see next section.
One of the main targets of decision-makers for dealing with epidemics like COVID-19 can be referred to as ‘flattening the curve’, where ‘the curve’ is or any of its equivalents and ’flattening’ roughly means ‘suppressing the maximum’. We shall consider this in the next sections.
3 Spatial heterogeneity of the population and heterogeneity of epidemic development in different areas
The purpose of this section is to demonstrate that heterogeneity of epidemic developments for different sub-populations has significant effect on ‘flattening the curve’.
Already when this paper was completed, the authors learned about a preprint  which uses SIR modelling to produce somewhat similar conclusions with more specifics for COVID-19 in the UK. However, SIR modelling using numerous age-classes and many equations for each age-class different across different parts of the country resulting in an astronomical number of parameters; this makes any sensitivity analysis to various parameters uncheckable. We believe that a combination of stochastic models like the present one with standard SIR-based models can help in significantly reducing the number of parameters while also adding more flexibility to a hybrid model.
Consider the following situation. Assume that we have a population consisting of sub-populations (groups) with similar demographic and social characteristics and these sub-populations are subject to the same epidemic which has started at slightly different times. Let the sizes of sub-population be with . In all cases, the curves are shifted in time versions of the curve of Figure 1.
In Figures 3 and 3, and the second epidemic started 50 days after the first one (incubation period is set to be 0). In Figures 5 and 5, and each next epidemic cycle has started 7 days after the previous one. In all cases, we evidence the significant ‘flattening the curve’ phenomenon. In Figures 3–5, the green line is used for the resulting curves . The maximal values of in the examples depicted in Figures 3–5 are 0.124, 0.136, 0.134 and 0.132, respectively. This is significantly lower than the value 0.163 for the original curve. In the assumptions of Example 1, these would respectively lead to the maximum expected number of deaths 57.7, 63.3, 62.9 and 62.7.
Extra heterogeneity of different epidemics caused by the social and demographic heterogeneity of the sub-populations would further ‘flatten the curve’. The following conclusions are in line with what the Israeli expert D. Yamin states  and agree with the main conclusions of the paper  which is rather critical towards standard epidemiological models.
Conclusions. (a) Isolation of sub-populations at initial stages of an epidemic is very important for preserving heterogeneity of the epidemic and ‘flattenning the curve’; this flattening can be very significant. (b) Epidemiological models based on the assumption of a homogeneous population but applied for populations consisting of heterogeneous sub-populations may give completely misleading results.
4 An epidemic with intervention
In this section, we assume that we make an intervention to the epidemic by introducing an isolation at a certain stage. Moreover, we assume that isolation may be different for 2 different groups. We define a special group of more vulnerable people consisting, for example, of all people aged . We define , where is the total population size and is the size of this special group .
be the moment of time when the isolation occurs. It is natural to definefrom the condition , where, for example, . For , the virus has been transmitted to people uniformly so that, conditionally a virus is transmitted, the probability that it reaches a person from group is . For , the virus has been transmitted to people in such a way that, conditionally a virus is transmitted, the probability that it reaches a person from is with . Moreover, at time the value of may change to in view of self-isolation. The parameters in this model are:
: relative size of the group ;
defines the start of the isolation strategy;
defines the strength of isolation of the group ;
: the reproductive number after intervention for ; it defines the strength of the overall isolation.
We have run a large number of scenarios coded using a combination of R and Julia ; the code is provided in Appendix. In Figures 6–11, we illustrate a few of these scenarios. In all these scenarios, we have chosen and . The results are robust towards values of these parameters (subject to faster or slower rate of the epidemic in dependence on ). In Section 5 we use to illustrate some specific results; can be considered either as a generic value or, in the contents of Section 5, as the relative size of the group of vulnerable people which is larger than the group of people aged .
We would like to emphasise that there is still a lot of uncertainty on who is vulnerable. It is very possible that the long term effect of the virus might cause many extra morbidities in the coming years coming from severe cases who recover. It is also possible that the virus will mutate and the new strain might affect younger populations more than the current strain. These, and many more possibilities, have a non-negligible probability of occurring, and they have devastating effects. In future models we will try to address these possibilities.
To estimate the number of COVID cases at a given time is difficult . This implies that at the time of making an intervention only very rough guesses about the value of , which is crucial for the future development of the epidemic, can be made.
We distinguish different strengths of isolation by values of parameters and :
The values and mean that under the condition that a virus is infecting a new person, the probabilities that this new person belongs to are and respectively.
Measuring the level of compliance in the population and converting this to simple epidemiological measures and is hugely complex problem which is beyond the scope of this paper.
Solid blue line: , the proportion of people recovered from the disease (or dead) in case of no intervention, as in Figure 1.
Dashed blue line: in case of intervention.
Solid black line: the proportion of people from group recovered from the disease (or dead) in case of intervention.
Solid red line: , the proportion of people non-infected by time in case of no intervention, as in Figure 1.
Solid orange line: , the proportion of people susceptible to the virus from group non-infected by time in case of intervention.
Solid green line: , the proportion of infected people at time with no intervention, as in Figure 1.
Dashed green line: , the proportion of infected people at time with intervention.
Solid purple line: , the proportion of infected people from group at time with intervention.
Considerably important, in view of the discussions of the next section, is Figure 6. In this figure, for the initial data of Figure 1, we strongly separate group at the time when 10% of the population is infected. We make no call to the general public for social distancing. In this scenario, the intervention does not considerably change the proportion of infected in the total population but it significantly ‘flattens the curve’ for the group : compare the purple and green lines. The maximum of is 0.086 which is 2.5 times lower than 0.216, the maximum of . Note also the fact that the curve is rather flat for long time, during the main stage of development of the epidemic. Just after the peak of the epidemic in the whole population, peaks; it is caused by the presence of very large number of infected people from the general population.
The scenario which led to Figure 6 is important and hence it has been re-run for a few variations of the model. Figure 7 shows the results of the simulations where we have removed the incubation period. The maximum of is now 0.075 which is more than twice lower than 1.63, the maximum of . Note that in the scenario with no incubation period the whole epidemic is milder as the virus lives longer.
In Figure 8 we use similar scenario as for Figure 6 but we separate group only mildly at the time when 10% of the population is infected. We make no call to general public for social distancing. The curve is ‘flattened’ for the group but in a considerably smaller degree than in the case of strong separation of .
Figures 9 and 10 illustrate the situation with mild and strong separation of people from complemented with introduction of the social distancing for general public. Interestingly enough, the effect of social distancing for general public gives less benefit than even mild isolation of people from group .
Figure 11 illustrates the scenario with no call to the general public for social distancing but with strong separation of the people from at at a later stage of epidemic, when 20% of the population is infected. The effect is similar to the one observed in Figures 6 and 7 except for the fact that the call for isolating the group came slightly late.
Conclusion. By considering a number of scenarios we have observed an extreme sensitivity of the negative consequences of the epidemic to the degree of separation of vulnerable people. Sensitivity to the parameter measuring the degree of self-isolation for the whole population is less apparent although it is much costlier.
This conclusion is very much in line with recommendations of leading German epidemiologists .
5 Consequences for the mortality and impact on NHS
Results of the type presented in Figures 6–11 can be translated into the language of the expected number of death and expected number of beds required. To do this we extend the observations we made at the end of Section 2 concerning translation of the curve into the curves for the expected number of death and expected number of hospital beds in the UK. We use the common split of the UK population into following age groups:
and corresponding numbers (; in millions) taken from 
The death probabilities are given from Table 1 in  and replicated many times by the BBC and other news agencies are:
Unfortunately, these numbers do not match the other key number given in : the UK average mortality rate which is estimated to be about As we feel the value of the UK average mortality rate is more important, we have multiplied all probabilities above by 0.732 to get the average mortality rate to be
Defining the group as a union of groups and , we have m and . We then compute the mortality rate in the group and for the rest of population by
For estimating the average number of death in the group and for the rest of population we can the use the formulas
respectively. The average numbers of hospital beds required for two different groups are proportional to these numbers.
We have run a series of scenarios for an UK epidemic without taking into account spatial and social heterogeneity of the society assuming that we would have separated (mildly and strongly) the group of people at the time when 10% of the population were infected with no call to general public for social distancing. We marked as March 23, 2020.
The two scenarios (with and ) we have used for illustrating this technique are different from the scenarios used for plotting Figures 6 and 7 only by the value of . For Figures 6 and 7, we have chosen but for the group of old people in the UK we have .
In Figures 13 and 13 we plot (using the same colours as in Figures 6 and 7) the following curves: (solid green, no intervention on March 23), (dashed green, intervention on March 23) and (purple, intervention on March 23).
In Figures 15 and 15 we plot the curves for the estimated average number of deaths: (solid green, no intervention on March 23), (purple, intervention on March 23), (dashed green, intervention on March 23) and combined (black, intervention on March 23).
We can deduce from Figure 15 (taking into account the extra factor of hospital beds availability) that strong separation of old people alone would have reduced the expected number of death by at least the factor of 2. Another feature of the scenario with is a roughly 50/50 split between the number of deaths in the and groups.
The y-axis in Figures 15 and 15, after multiplication by 140, can be roughly interpreted as hundreds in the London epidemic assuming , on March 23 and homogeneity of the epidemic (as there is about 9m people in London out of 64.1m). As mentioned in Section 3, deaths numbers in other regions of the UK should be expected to be lower.
We then have run through the scenarios with partial lockdown on March 23 reducing to and with a return, 30 days later, to with and . Results are plotted in Figures 17-19. The style of Figure is exactly the same as for Figures 13-15. If the value (proportion of non-infected people on March 23) happens to be larger than then the second wave of epidemic should be expected to be (perhaps, considerably) larger. If then the second wave will be less pronounced.
All these figures are given for illustration only without any claim on accurate predictions as there is an uncertainty of the outputs towards the choice of several parameters describing the virus characteristics but the sensitivity of the model towards the choice of these parameters is not yet adequately assessed. Figures 21-21 illustrate sensitivity of the scenario of Figures 17-17 with respect to the choice of , the shape parameter of the Erlang distribution defined in Section 2. In these figures, we plot 100 trajectories similar to the trajectories of Figures 17-17 with the values of taken at random in the interval .
Conclusion. The outputs of the model developed above can be translated into the language of the expected number of death due to the epidemic. This language amplifies the findings of the previous section stating that isolation of a relatively small percentage of population may hugely reduce the death toll of the epidemic.
Some of the next steps and open problems
Run many different scenarios to get better understanding of the current situation with the epidemic and what can be done to effectively control it.
Continue calibrating the model against existing epidemiological models and new data as it emerges.
Understanding the reproductive number in dependence on the population density in local areas and hence the mixing distribution for needed to combine sub-populations.
and other virus parameters as random variables due to mutations.
Understanding the risk as a function depending on different factors such as age and social groups.
Use of OR models for studying NHS capacity issues under different scenarios and correlated to the percentage of health providers getting infected by the virus.
Properly quantify uncertainty for model predictions.
Full-scale sensitivity analysis to different parameters.
With better understanding of the role of parameters, formulate inverse problems like finding the stage of an epidemic by its early development.
Understand and quantify the differences in epidemic developments across different countries such as UK, USA, Italy and Spain.
Studying long-term effects of coronovirus and future mortality from Covid-19.
Studying mortality from other diseases due to the capacity/fragility of the health system during the epidemic.
Modelling epidemic control strategies based on testing, tracing and isolation.
Appendix: Julia/R code used for computing the scenarios of Section 4
-  Daley D.J., Gani J. Epidemic modelling: an introduction. Cambridge University Press, 2001.
-  Hethcote H. W. The mathematics of infectious diseases. SIAM review: vol. 42, No. 4, pp. 599–653., 2000.
Ferguson M.N. et al.
Impact of non-pharmaceutical interventions (NPIs) to reduce
COVID19 mortality and healthcare demand.
-  ”https://www.spiegel.de/wissenschaft/medizin/coronavirus-die-zahlen-sind-vollkommen-unzuverlaessig-a-7535b78f-ad68-4fa9-9533-06a224cc9250”.
-  Shen C., Taleb N. N., Bar-Yam Y. Review of ferguson et al “impact of non-pharmaceutical interventions…”. ”https://necsi.edu/review-of-ferguson-et-al-impact-of-non-pharmaceutical-interventions”.
-  Ghafari M., Moritz Kraemer M., Thompson C., Simmonds P., Klenerman P., Gupta S. Lourenco J., Paton R. Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic. Doi: ”https://doi.org/10.1101/2020.03.24.20042291”, 2020.
-  Tang B., Bragazzi N.L., Li Q., Tang S., Xiao Y, Wu J. An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov). Infectious Disease Modelling, vol. 5, pp. 248–255.”, 2020.
-  Kucharski A., Russell T., Diamond C., Liu Y. Analysis and projections of transmission dynamics of nCoV in Wuhan. CMMID repository, vol. 2.”, 2020.
-  ”www.womenshealthmag.com/health/a31284395/how-long-does-coronavirus-last/”.
-  Coronavirus: elderly in lockdown and children in school help Sweden pursue herd immunity, The Sunday Times. ”www.thetimes.co.uk/edition/world/coronavirus-elderly-in-lockdown-and-children-in-school-help-sweden-pursue-herd-immunity-r705m76dd”, 2020.
-  ”https://www.haaretz.com/israel-news/.premium.MAGAZINE-israeli-expert-trump-is-right-about-covid-19-who-is-wrong-1.8691031”.
-  ”https://www.medrxiv.org/content/10.1101/2020.02.12.20022566v1”.
-  Cirillo P., Taleb N. N. Tail Risk of Contagious Diseases. ”https://www.academia.edu/42307438/Tail_Risk_of_Contagious_Diseases”, 2020.
-  J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to numerical computing. SIAM review, 59(1):65–98, 2017.
-  ”https://www.spiegel.de/wissenschaft/medizin/coronavirus-ist-eine-kontaktsperre-nur-fuer-risikogruppen-eine-alternative-a-0f25ccea-780b-47ce-a06f-f76e98e6924d”.
-  ”www.statista.com/statistics/281174/uk-population-by-age/”.
-  Zhigljavsky A. Theory of global random search. Kluwer Academic Publishers., 1991.
-  Zhigljavsky A., Zilinskas A. Stochastic global optimization. Springer., 2007.
-  Moskvina V., Zhigljavsky A. An algorithm based on singular spectrum analysis for change-point detection. Communications in Statistics – Simulation and Computation, 32(2), 319-352., 2003.
-  Tartakovsky A., Nikiforov I., Basseville M. Sequential analysis: Hypothesis testing and change-point detection. CRC Press., 2014.
-  Myerson R. B. Game theory. Harvard university press., 2013.
-  Watanabe S., Nair M. G., Rajeev B. Lectures on stochastic differential equations and Malliavin calculus. Springer., 1984.
-  Anh V. V., Leonenko N. N. Renormalization and homogenization of fractional diffusion equations with random data. Probability Theory and Related Fields, 124(3), 381-408., 2002.