Prediction of Inflection Point and Outbreak Size of COVID-19 in New Epicentres

07/15/2020 ∙ by Qibin Duan, et al. ∙ QUT

The coronavirus disease 2019 (COVID-19) had caused more than 8 million infections as of mid-June 2020. Recently, Brazil has become a new epicentre of COVID-19, while India and the African region are potential epicentres. This study aims to predict the inflection point and outbreak size of these new/potential epicentres at the early phase of the epidemics by borrowing information from more 'mature' curves from other countries. We fitted the cumulative cases to the well-known sigmoid growth curves to describe the epidemic trends, using mixed-effect models and the four-parameter logistic model after power transformations. The African region is predicted to have the largest total outbreak size of 3.9 million cases (2.2 to 6 million), and the inflection will come around September 13, 2020. Brazil and India are predicted to have a similar final outbreak size of around 2.5 million cases (1.1 to 4.3 million), with the inflection points arriving June 23 and July 26, respectively. We conclude that in Brazil, India, and the African region the epidemics of COVID-19 have not yet passed the inflection points; these regions can potentially overtake the USA in terms of outbreak size.




1 Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first reported in Wuhan, China at the end of 2019, spread across China and the globe and was declared a pandemic on March 11, 2020. As of mid-June 2020, it had caused more than 8 million infections. As reported by WHO (2020), in recent months the USA and Europe have been epicentres of COVID-19 and have experienced a rapid increase in outbreak size. Although the growth of daily confirmed infections in these regions is slowing down, there is still no sign that the epidemic has been brought under control or “flattened” there; even worse, Brazil, India and the African region (all affected countries) have become new epicentres of COVID-19, with an increasing number of confirmed infections every day (Lancet, 2020; Pearson et al., 2020).

Mathematical modelling, including statistical modelling, is an important tool for understanding and predicting the dynamics of new diseases. Since the first identification of COVID-19, different approaches have been developed to simulate and characterize its dynamics and spread (Grasselli et al., 2020; Yang et al., 2020; Roosa et al., 2020; Petropoulos and Makridakis, 2020; Cui and Hu, 2020; Perc et al., 2020; Sajadi et al., 2020; Zhang et al., 2020). The classical approach is dynamic infectious disease modelling, using deterministic ODE models or stochastic individual-based models; it allows the underlying mechanisms of spread and various risk factors to be incorporated in the simulation of transmission (Peng et al., 2020; Wynants et al., 2020; Kucharski et al., 2020; Fanelli and Piazza, 2020). This approach is commonly used to identify the crucial transmission parameters and assess the potential impact of public health interventions. Although it can also predict transmission trends, setting up such models requires extensive information on local demography and behaviour that is difficult to obtain accurately. For the purpose of prediction, data-driven (or phenomenological) methods, i.e., machine learning and statistical modelling, are preferred (Alimadadi et al., 2020; Ribeiro et al., 2020; Benvenuto et al., 2020; Ceylan, 2020; Zheng et al., 2020; Kavadi et al., 2020).

Almost all of the modelling approaches investigated so far have been developed to characterize the transmission and impact of COVID-19 in the context of a specific country or region for which the required information can be estimated, e.g., forecasting the confirmed cases and deaths in China (Gao et al., 2020; Al-Qaness et al., 2020). However, for many regions, such modelling studies are still unavailable. Knowledge of the inflection point (“flattening the curve”) and maximal outbreak size is crucial for understanding the evolving trend of an epidemic; in the case of COVID-19, this information remains unclear but influences the timing of changes to policy restrictions and the recovery of the global economy. Furthermore, accurate prediction of such information at the early stage is difficult due to the lack of detailed data on testing availability and reporting/infection processes as well as governmental restrictions.

To address these problems, a non-linear mixed-effects model is used to model the (transformed) daily reported cumulative number of confirmed cases. The data are grouped by country or region. In countries at a later stage of the epidemic, the reported cumulative confirmed cases all show a sigmoidal shape with respect to time, so we use the four-parameter logistic model (FPLM), a generalization of the logistic growth model, to model the growth patterns. By fitting such a non-linear mixed-effects model we can predict the inflection point and final outbreak size in the modelled countries and regions.

2 Data

The data set was downloaded from the European Centre for Disease Prevention and Control, which provides the geographic distribution of COVID-19 cases worldwide, e.g., daily incidence (newly confirmed cases), cumulative number of confirmed cases, population of each country, etc. We select two groups of countries/regions: the first consists of countries in the late stage of the outbreak, including Australia, China, France, Italy, Germany, Spain and UK; the second consists of countries/regions still in the early stage of the outbreak, including USA, Africa (as a whole), Brazil, India and Russia. The data set contains officially confirmed and reported cases, which are inevitably inaccurate and under-reported due to the limited coverage of testing, especially in the early period of the outbreak. Also, this study is more interested in the future growth trend and final outbreak size. Hence, early observations were discarded to enable the model to fit the later period more flexibly.

3 Growth curves with random coefficients

Let $y(t)$ be the cumulative number of cases at time $t$ and $y'(t)$ be its derivative, representing the growth rate. If $x$ represents the explanatory variables (such as temperature, or behaviour changes due to government restrictions) believed to be related to the growth rate, we need to incorporate their effect in the growth model via a link function $h$:

$$y'(t) = g(y)\,h(x) + \epsilon, \tag{1}$$

where $g$ specifies the growth rate as a function of the current size under constant environmental conditions, while $h$ models how the growth rate might change when the environmental conditions ($x$) change. Here $\epsilon$ is a zero-mean stochastic error representing environmental or measurement perturbation, possibly with heteroscedasticity. More details can be found in Wang (1999).

A simple linear function $g$ corresponds to the asymptotic regression model (also known as the von Bertalanffy growth curve). However, we are particularly interested in sigmoidal curves, for which the inflection point is of great interest. Well-known curves of this type include the logistic (autocatalytic), Richards and Gompertz curves (Seber and Wild, 1989).
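To make the role of the inflection point concrete, the following sketch uses the standard textbook growth-rate forms of two of these curves, $r\,y\,(1 - y/K)$ for the logistic and $r\,y\,\log(K/y)$ for the Gompertz curve, and locates the size at which the growth rate peaks, which is the size at the inflection point of the cumulative curve. The names `r` and `K` and all numerical values are illustrative assumptions, not quantities estimated in this study.

```python
import math

def logistic_rate(y, r, K):
    """Growth rate r*y*(1 - y/K) of the logistic curve with asymptote K."""
    return r * y * (1.0 - y / K)

def gompertz_rate(y, r, K):
    """Growth rate r*y*log(K/y) of the Gompertz curve with asymptote K."""
    return r * y * math.log(K / y)

def size_at_peak_rate(rate, r, K, steps=100_000):
    """Grid-search the size y in (0, K) at which the growth rate is largest."""
    grid = (K * k / steps for k in range(1, steps))
    return max(grid, key=lambda y: rate(y, r, K))

K = 1000.0  # hypothetical final outbreak size
y_log = size_at_peak_rate(logistic_rate, 0.2, K)
y_gom = size_at_peak_rate(gompertz_rate, 0.2, K)
print("logistic inflection size:", y_log, "expected K/2 =", K / 2)
print("Gompertz inflection size:", y_gom, "expected K/e =", K / math.e)
```

The logistic curve inflects at half of its asymptote, whereas the Gompertz curve inflects earlier, at $K/e$; the choice of curve family therefore affects how the inflection point relates to the final size.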

In this study, we apply mixed-effects models, assuming that each country follows the same curve but with a different set of parameters. These parameters can potentially be modelled as functions of population size and other attributes. A nonlinear mixed-effects model can be written as

$$T(y_{ij}) = f(\phi_i, t_{ij}) + \epsilon_{ij}, \qquad \phi_i = A_i\beta + B_i b_i, \tag{2}$$

for observation $j$ in group $i$. In model (2), $\phi_i$ includes both the fixed effects $\beta$ and the random effects $b_i$. Specifically, in our case $y_{ij}$ is the cumulative number of confirmed cases, $T$ is the transformation used, $i$ is the index of the country/region, and $j$ is the time index of the observation (day), with $t_{ij}$ the corresponding time. Here $A_i$ and $B_i$ are design matrices for the $i$th group that determine the fixed and random effects. The advantage of the mixed-effects model is that it produces more precise estimates of $\beta$ and $b_i$ by borrowing strength/information from the rest of the sample from the population. See Pinheiro and Bates (2006) for more details about non-linear mixed-effects models.
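As a rough illustration of how group-specific curve parameters are assembled from fixed and random effects in the Pinheiro and Bates formulation, the sketch below computes `A_i @ beta + B_i @ b_i` for two hypothetical countries. Everything here is an assumption made for illustration: the parameter values are made up, and the choice to place random effects only on the second and third parameters is ours, not a statement about the model fitted in this study.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def country_parameters(A_i, beta, B_i, b_i):
    """Group-specific parameters: phi_i = A_i @ beta + B_i @ b_i."""
    fixed = matvec(A_i, beta)
    random_part = matvec(B_i, b_i)
    return [f + r for f, r in zip(fixed, random_part)]

# Identity design matrix: every country shares the same four fixed effects.
I4 = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
beta = [0.0, 1000.0, 50.0, 7.0]  # made-up population-level curve parameters

# Hypothetical choice: random effects act only on the 2nd and 3rd parameters.
B = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b = {"country A": [200.0, -5.0], "country B": [-300.0, 10.0]}

phi = {name: country_parameters(I4, beta, B, b_i) for name, b_i in b.items()}
for name, params in phi.items():
    print(name, params)
```

In an actual fit the random effects are not known in advance; they are estimated jointly with the fixed effects, which is what allows data-poor groups to borrow information from data-rich ones.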

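To model the confirmed cases over time we use a four-parameter logistic model (FPLM),

$$f(\phi, t) = \phi_1 + \frac{\phi_2 - \phi_1}{1 + \exp[(\phi_3 - t)/\phi_4]}, \tag{3}$$

where $\phi = (\phi_1, \phi_2, \phi_3, \phi_4)$, and the parameters are:

  • $\phi_1$, the minimum theoretical value of $f$ as time $t \to -\infty$;

  • $\phi_2$, the maximum value as $t \to \infty$;

  • $\phi_3$, the inflection point, at which the response is midway between $\phi_1$ and $\phi_2$;

  • $\phi_4$, a scale parameter for time.

We are particularly interested in the following two quantities: the maximum number of cases (the asymptote $\phi_2$) and the number of cases at the inflection point ($t = \phi_3$).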
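The behaviour of the four-parameter logistic curve can be checked numerically with a minimal sketch. It assumes the usual parameterization with a lower asymptote `phi1`, an upper asymptote `phi2`, an inflection time `phi3` and a positive time scale `phi4`; these argument names and the parameter values are illustrative and are not estimates from this study.

```python
import math

def fplm(t, phi1, phi2, phi3, phi4):
    """Four-parameter logistic curve:
    phi1 + (phi2 - phi1) / (1 + exp((phi3 - t) / phi4))."""
    return phi1 + (phi2 - phi1) / (1.0 + math.exp((phi3 - t) / phi4))

# Made-up parameters: asymptotes 0 and 1000, inflection on day 50, scale 7 days.
phi = dict(phi1=0.0, phi2=1000.0, phi3=50.0, phi4=7.0)

at_inflection = fplm(50.0, **phi)   # midway between the two asymptotes
far_past = fplm(-500.0, **phi)      # approaches phi1
far_future = fplm(500.0, **phi)     # approaches phi2
print(at_inflection, far_past, far_future)
```

At `t = phi3` the exponential term equals one, so the response is exactly the average of the two asymptotes, which is why the size at the inflection point can be read directly from the fitted parameters on the untransformed scale.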

We first fitted this model to the raw data (Model 1). We then used power and logarithmic transformations of the cumulative number of confirmed cases as the response, as follows.

  1. Power (square root) transformation (Model 2)

    $$\sqrt{y_{ij}} = f(\phi_i, t_{ij}) + \epsilon_{ij}; \tag{4}$$

  2. Logarithmic transformation (Model 3)

    $$\log(y_{ij}) = f(\phi_i, t_{ij}) + \epsilon_{ij}. \tag{5}$$

Note that the parameters of the FPLM have different interpretations under the transformations. The models are validated by comparing the first-order difference of the modelled cumulative cases to the corresponding reported daily new confirmed cases.
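The two remarks above, that parameters change meaning under a transformation and that validation uses first-order differences, can be illustrated with a small sketch. It assumes a four-parameter logistic curve fitted on the square-root scale, back-transforms it by squaring, and differences the result to obtain modelled daily cases. The function names and parameter values are hypothetical and do not come from the fitted models.

```python
import math

def fplm(t, phi1, phi2, phi3, phi4):
    """Four-parameter logistic curve on the transformed scale."""
    return phi1 + (phi2 - phi1) / (1.0 + math.exp((phi3 - t) / phi4))

def fitted_cumulative(days, phi, inverse):
    """Back-transform the fitted curve to the scale of cumulative cases."""
    return [inverse(fplm(t, *phi)) for t in days]

def daily_from_cumulative(cumulative):
    """First-order difference: modelled daily new cases."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

# Made-up parameters on the square-root scale (as in Model 2).
phi_sqrt = (0.0, 100.0, 40.0, 8.0)
days = list(range(0, 121))

cum = fitted_cumulative(days, phi_sqrt, inverse=lambda z: z ** 2)
daily = daily_from_cumulative(cum)

# daily[k] is the increase from day k to day k + 1, so report day k + 1.
peak_day = days[1:][daily.index(max(daily))]
print("final size on case scale:", round(cum[-1]))
print("day of peak daily cases:", peak_day, "vs phi3 =", phi_sqrt[2])
```

With a lower asymptote of zero, the squared curve grows fastest at `phi3 + phi4 * log(2)`, which is later than `phi3`. The modelled daily series produced this way is what would be compared with the reported daily incidence.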

4 Results

Here we fit the cumulative cases after power transformation to the well-known growth curves to describe the epidemic trends in the countries and regions of interest, using the mixed-effects model and the four-parameter logistic model. The advantage of the mixed-effects model is that it “borrows information” from the members with rich information (Pinheiro and Bates, 2006). The four-parameter logistic model has been shown to perform well in describing epidemic growth (Wu et al., 2020; Chen et al., 2020).

We included 12 countries and regions in our study. Nine countries (Australia, China, France, Germany, Italy, Russia, Spain, UK and USA) have experienced almost a full growth curve, while the other regions (Brazil, India and the African region), at the early phase of the epidemic, will have certain similarities with some of these nine countries in transmission and response strategies. This may result in similarity in the growth of confirmed cases.

The fitted results for all selected countries/regions under the three different models are shown in Table 1, which also reports the number of cases at the inflection point. Models 1, 2 and 3 correspond to no transformation, power transformation and log transformation of the cumulative number of confirmed cases in (3), (4) and (5), respectively. Additionally, Figures 1, 2 and 3 show the fitted curves and the data used for fitting for each country or region. (Each colour represents the growth curve of one country or region. The dots are the reported cumulative confirmed cases and the curves are the estimated growth curves. The bottom horizontal line segments are the estimates of maximal outbreak size, while the top horizontal line segments are the upper bounds of the 95% confidence intervals.) Moreover, the reported number of daily confirmed cases (daily incidence in Figures 4 and 5) is used to validate the fitted model.

Additionally, it is statistically sensible to consider a power transformation, $y^{\lambda}$. The well-known Box-Cox transformation, $(y^{\lambda} - 1)/\lambda$, would be a sensible choice (the limiting case of $\lambda \to 0$ corresponds to the log-transformation). An 'optimal' estimate of $\lambda$ can also be obtained via likelihood or other robust statistical approaches. We have chosen three different values of $\lambda$ (1, 0.5 and 0) in this analysis. It is interesting to see how robust the results are in terms of ballpark estimates of the inflection point and maximum number of cases.
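For reference, the Box-Cox family and its inverse can be sketched as follows for the three exponents considered here. This is a generic illustration of the standard transformation rather than the code used in this study; the function names are ours. For a non-zero exponent, the Box-Cox form differs from the plain power $y^{\lambda}$ only by a shift and a rescaling.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y; lam = 0 gives log(y)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

cases = 2500.0  # an arbitrary cumulative count used only for illustration
for lam in (1.0, 0.5, 0.0):
    z = box_cox(cases, lam)
    print(lam, z, inverse_box_cox(z, lam))

# A very small exponent approaches the log transform, illustrating the limit.
print(box_cox(cases, 1e-7), math.log(cases))
```

The inverse is needed whenever fitted values or interval bounds obtained on the transformed scale are reported as case counts.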

Our models confirm that the nine countries mentioned above have passed the inflection point (the square marks in Figure 1). Specifically, China had its inflection point on February 9, 2020, with a final outbreak size of 87k (Gu et al., 2020). Australia, Italy, Spain, Germany and France passed their inflection points in late March and early April; in these countries the current outbreak size has almost reached the estimated maximal level (with upper confidence bounds not visibly distinct in Figure 1). Unfortunately, cases in the other countries (i.e., USA, Russia, and UK) will continue to increase, possibly to 2,357k (up to 2,425k), 538k (up to 546k), and 309k (up to 313k), respectively.

The other three regions (Brazil, India, and the African region) are still in the early outbreak phase (before inflection). Predicting the inflection point from current data alone is very difficult for these regions; however, based on the “shape” from other countries, the non-linear mixed-effects model provides sensible predictions, albeit with large error intervals (in the left panel of Figure 1). The African region is predicted to have the largest total outbreak size of 3.9 million cases (2.2 to 6 million), and the inflection will come around September 13, 2020. Note that the African region here includes all the African countries affected by COVID-19. Brazil and India are predicted to have a similar final outbreak size of around 2.5 million cases (1.1 to 4.3 million), with the inflection points arriving June 23 and July 26, respectively. The epidemic in Brazil has entered the rapid growth stage, with the number of cumulative confirmed cases increasing quickly. The epidemic in India is in a similar situation to that in Brazil, but its growth is predicted to lag by about one month.

5 Conclusions

In these developing areas, community transmission and spread of COVID-19 have been ongoing for a while, but large-scale testing has only become available in recent weeks. The rapid increase in confirmed cases can be attributed to the increasing level of testing coverage. Currently the African region has a relatively small number of confirmed infections (150k), but this might continue to increase to the level estimated in this study if not controlled effectively (Pearson et al., 2020).

Although the USA passed the inflection point around April 18, 2020, its growth rate (the number of daily new confirmed infections) has failed to decrease rapidly. Russia has a similar epidemic curve to the USA. This indicates that there are no effective intervention strategies in these countries to further curb the ongoing transmission of COVID-19. From the validation figures in the supplementary material, our model tends to underpredict the daily new confirmed infections in the USA and Russia during the later phase of the outbreak after the inflection points; thus, the USA and Russia may have a larger outbreak size than our estimates. Likewise, if the new epicentres (i.e., Brazil, India and the African region) cannot take effective measures to mitigate the spread and transmission, there will be a large number of confirmed infections even after the inflection points. In summary, our model predicted the inflection point and maximal outbreak size in Brazil, India, and the African region; these regions might eventually overtake the USA in terms of outbreak size.

This work only fits the growth curves to the reported number of confirmed infections; incorporating localized intervention policies and behaviour parameters would improve the fitting and prediction performance. Furthermore, in this work we test three different power transformations of the cumulative number of cases; however, they are likely not the optimal choice for all countries and regions. Other power transformations are worth testing, and the possibility of applying different transformations to different countries/regions should be explored under the same mixed-effects framework.

CRediT authorship contribution statement

Qibin Duan: Conceptualization, Methodology, Formal analysis, Validation, Writing - original draft, Writing - review & editing. Wu: Methodology, Writing - review & editing. Jinran Wu: Writing - review & editing. You-Gan Wang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


This work was supported by the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), under grant number CE140100049.


  • M. A. Al-Qaness, A. A. Ewees, H. Fan, and M. Abd El Aziz (2020) Optimization method for forecasting confirmed cases of COVID-19 in China. Journal of Clinical Medicine 9 (3), pp. 674.
  • A. Alimadadi, S. Aryal, I. Manandhar, P. B. Munroe, B. Joe, and X. Cheng (2020) Artificial intelligence and machine learning to fight COVID-19. American Physiological Society Bethesda, MD.
  • D. Benvenuto, M. Giovanetti, L. Vassallo, S. Angeletti, and M. Ciccozzi (2020) Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief, pp. 105340.
  • Z. Ceylan (2020) Estimation of COVID-19 prevalence in Italy, Spain, and France. Science of The Total Environment, pp. 138817.
  • D. Chen, X. Chen, and J. K. Chen (2020) Reconstructing and forecasting the COVID-19 epidemic in the United States using a 5-parameter logistic growth model. Global Health Research and Policy 5, pp. 1–7.
  • H. Cui and T. Hu (2020) Nonlinear regression in COVID-19 forecasting. Scientia Sinica Mathematica.
  • D. Fanelli and F. Piazza (2020) Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos, Solitons & Fractals 134, pp. 109761.
  • Y. Gao, Z. Zhang, W. Yao, Q. Ying, C. Long, and X. Fu (2020) Forecasting the cumulative number of COVID-19 deaths in China: a Boltzmann function-based modeling study. Infection Control & Hospital Epidemiology, pp. 1–3.
  • G. Grasselli, A. Pesenti, and M. Cecconi (2020) Critical care utilization for the COVID-19 outbreak in Lombardy, Italy: early experience and forecast during an emergency response. JAMA 323 (16), pp. 1545–1546.
  • C. Gu, J. Zhu, Y. Sun, K. Zhou, and J. Gu (2020) The inflection point about COVID-19 may have passed. Science Bulletin.
  • M. D. P. Kavadi, R. Patan, M. Ramachandran, and A. H. Gandomi (2020) Partial derivative nonlinear global pandemic machine learning prediction of COVID 19. Chaos, Solitons & Fractals, pp. 110056.
  • A. J. Kucharski, T. W. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk, R. M. Eggo, F. Sun, M. Jit, J. D. Munday, et al. (2020) Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases.
  • The Lancet (2020) COVID-19 in Brazil: “So what?”. Lancet (London, England) 395 (10235), pp. 1461.
  • C. A. Pearson, C. Van Schalkwyk, A. M. Foss, K. M. O’Reilly, J. R. Pulliam, C. C. working group, et al. (2020) Projected early spread of COVID-19 in Africa through 1 June 2020. Eurosurveillance 25 (18), pp. 2000543.
  • L. Peng, W. Yang, D. Zhang, C. Zhuge, and L. Hong (2020) Epidemic analysis of COVID-19 in China by dynamical modeling. arXiv preprint arXiv:2002.06563.
  • M. Perc, N. Gorišek Miksić, M. Slavinec, and A. Stožer (2020) Forecasting COVID-19. Frontiers in Physics 8, pp. 127.
  • F. Petropoulos and S. Makridakis (2020) Forecasting the novel coronavirus COVID-19. PLoS ONE 15 (3), pp. e0231236.
  • J. Pinheiro and D. Bates (2006) Mixed-effects models in S and S-PLUS. Springer Science & Business Media.
  • M. H. D. M. Ribeiro, R. G. da Silva, V. C. Mariani, and L. dos Santos Coelho (2020) Short-term forecasting COVID-19 cumulative confirmed cases: perspectives for Brazil. Chaos, Solitons & Fractals, pp. 109853.
  • K. Roosa, Y. Lee, R. Luo, A. Kirpich, R. Rothenberg, J. Hyman, P. Yan, and G. Chowell (2020) Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020. Infectious Disease Modelling 5, pp. 256–263.
  • M. M. Sajadi, P. Habibzadeh, A. Vintzileos, S. Shokouhi, F. Miralles-Wilhelm, and A. Amoroso (2020) Temperature and latitude analysis to predict potential spread and seasonality for COVID-19. Available at SSRN 3550308.
  • G. A. F. Seber and C. J. Wild (1989) Nonlinear regression. New York: John Wiley and Sons.
  • Y. Wang (1999) Estimating equations for parameters in stochastic growth models from tag–recapture data. Biometrics 55 (3), pp. 900–903.
  • K. Wu, D. Darcet, Q. Wang, and D. Sornette (2020) Generalized logistic growth modeling of the COVID-19 outbreak in 29 provinces in China and in the rest of the world. arXiv preprint arXiv:2003.05681.
  • L. Wynants, B. Van Calster, M. M. Bonten, G. S. Collins, T. P. Debray, M. De Vos, M. C. Haller, G. Heinze, K. G. Moons, R. D. Riley, et al. (2020) Prediction models for diagnosis and prognosis of COVID-19 infection: systematic review and critical appraisal. BMJ 369.
  • Z. Yang, Z. Zeng, K. Wang, S. Wong, W. Liang, M. Zanin, P. Liu, X. Cao, Z. Gao, Z. Mai, et al. (2020) Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. Journal of Thoracic Disease 12 (3), pp. 165.
  • X. Zhang, R. Ma, and L. Wang (2020) Predicting turning point, duration and attack rate of COVID-19 outbreaks in major western countries. Chaos, Solitons & Fractals, pp. 109829.
  • N. Zheng, S. Du, J. Wang, H. Zhang, W. Cui, Z. Kang, T. Yang, B. Lou, Y. Chi, H. Long, et al. (2020) Predicting COVID-19 in China using hybrid AI model. IEEE Transactions on Cybernetics.