1 Introduction
Since Thursday, March 26, 2020, the US has led the world in the cumulative number of infected cases of the novel coronavirus disease, COVID-19. On that day, a dashboard provided by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (https://systems.jhu.edu/) reported that the numbers of confirmed cases, deaths, and recoveries in the US were 83,836, 1,209, and 681, respectively. Figure 1 displays daily infection trajectories describing the cumulative numbers of infected cases for eight countries (US, Spain, Italy, China, UK, Brazil, South Korea, and India), spanning from January 22nd to April 9th, a period of 79 days. The dotted vertical lines on the panel mark certain historical dates that will be explained. As seen from the panel, the US was a late-runner in terms of infected cases until March 11th, but the growth rate of the cases rose sharply after that day, and the US overtook the forerunner, China, in just two weeks, on March 26th. Figure 2 shows the cumulative infected cases for 50 countries on April 9th: on that day, the number of cumulative infected cases for the US was 461,437, about three times that of Spain (153,222).
Since the COVID-19 outbreak, numerous research works have sought to better understand the pandemic from different aspects (EstimateDiamondship; 2002.06563; JTD36385; 2002.12298; REMUZZI2020; 2003.05447; trendforecastchina; gao2020breakthrough). Some of the recent works from the statistics community are as follows. EstimateDiamondship focused on the serial interval (the time between successive cases in a chain of transmission) and used the gamma distribution to study transmission on the Diamond Princess cruise ship. 2002.06563 proposed a generalized susceptible-exposed-infectious-removed model to predict the inflection point of the growth curve, while JTD36385 modified the proposed model and considered public health interventions in predicting the trend of COVID-19 in China. 2002.12298 proposed a differential equation prediction model to identify the influence of public policies on the number of patients. trendforecastchina used a symmetrical function and a long-tailed asymmetric function to analyze the daily infections and deaths in Hubei and other places in China. REMUZZI2020 used an exponential model to study the number of infected patients and of patients who need intensive care in Italy. One of the major limitations of these works is that they are confined to analyzing data from a single country, thereby neglecting the global nature of the pandemic.
One of the major challenges in estimating or predicting an infection trajectory is the heterogeneity of the country populations. It is known that there are four stages of a pandemic (visit economictimes.indiatimes.com/). The first stage of the pandemic contains data from people with a travel history to an already affected country. In stage two, we start to see data from local transmission: people who have brought the virus into the country transmit it to other people. In the third stage, the source of the infection is untraceable. In stage four, the spread is practically uncontrollable. In most of the current literature, estimation or prediction of the infection trajectory is based on the data of a single country, whose status falls into one of these four stages. Hence, such estimation or prediction may fail to capture some crucial changes in the shape of the infection trajectory due to a lack of knowledge about the other stages.
This motivates the use of data integration (lenzerini2002data; huttenhower2006bayesian), which combines data from different countries and elicits a solution with a unified view of them. This will be particularly useful in the current context of the COVID-19 outbreak.
Recently, there have been serious discussions all over the world seeking to answer a crucial question: “even though the current pandemic takes place globally due to the same virus, why are the infection trajectories of different countries so diverse?” For example, as seen from Figure 1, the US, Italy, and Spain accumulated infected cases within a short period of time, while China took a much longer time since the onset of the COVID-19 pandemic, leading to different shapes of infection trajectories. It is of interest to find a common structure in these infection trajectories for multiple countries, and to see how the trajectories vary around this common structure. Finally, it is important to identify the major country-wide covariates that make the infection trajectories of the countries behave differently in terms of the spread of the disease.
2 Significance
The rapid spread of the coronavirus has created a pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. Scientists are working on estimating the infection trajectory for the prediction of future cases, which will be useful for planning and policymaking. We propose a hierarchical model that integrates worldwide data to estimate COVID-19 infection trajectories. Owing to information borrowing across multiple countries, the proposed growth curve model will be a powerful predictive tool endowed with uncertainty quantification. Additionally, we use country-wide covariates to adjust the curve fitting for the infection trajectory. A joint variable selection technique is integrated into the modeling scheme to identify possible reasons for the diversity among the country-specific infection curves.
3 Our Contribution
There are three major classes of infectious disease prediction models: (i) differential equation models, (ii) time series models, and (iii) statistical models. The differential equation models describe the dynamic behavior of the disease through differential equations that follow the laws of transmission within the population. Popular models include the SI, SIS, SIR, and SEIR models (2000SIAMR..42..599H; ExactanalSIR; korobeinikov2004lyapunov). These models are based on assumptions related to the S (susceptible), E (exposed), I (infected), and R (removed) categories of the population. Time series based prediction models such as ARIMA, the Grey Model, and Markov chain models have been used to describe the dependence structure of the disease spread over time (EpidemiologyARIMA; GMpredict; hu2006rainfall; rushton2006disease; GeneralizedMarkov). On the other hand, statistical models that follow the laws of epidemiology (clayton2013statistical; thompson2006epidemiology) are also popular, and can be easily extended in the framework of hierarchical models (multilevel models) to analyze data within a nested hierarchy, eventually harnessing data integration (hill1965inference; tiao1965bayesian; stone1965paradox; browne2006comparison). In this paper, we use Bayesian hierarchical models so that data integration and uncertainty analysis (malinverno2004expanded) are possible in a unified way.
Specifically, we use the Gompertz growth curve model (gompertz1825xxiv). The novelties of our method are as follows: we (i) apply a flexible hierarchical growth curve model to global COVID-19 data, (ii) integrate information from multiple countries for estimation and prediction purposes, (iii) adjust for country-specific covariates, and (iv) perform covariate selection to identify the important factors that explain the differences among the country-wise infection trajectories. We demonstrate that our proposed models perform better than the individual country-based models.
3.1 Gompertz growth curve models
The Gompertz growth curve model (gompertz1825xxiv) is widely used to describe a growth curve for population studies in situations where growth is not symmetrical about the point of inflection (seber2003nonlinear; anton1988calculus). Examples include the trend of mobile phone uptake, bacterial growth in a confined space, and the growth of cancer stem cell tumors (islam2002modelling; zwietering1990modeling; sottoriva2010cancer; caravelli2015optimal). Several variants of the curve exist in the literature (tjorve2017use), and we use the following form in this research
(1) 
where the three curve parameters are real numbers. It is easy to derive that the Gompertz curve (1) has a unique inflection point (goshu2013derivation).
Figure 3 shows different shapes of the Gompertz growth curve obtained by varying each of the three parameters while fixing the others. The roles of the parameters can be summarized as follows: one parameter represents the asymptote of the curve (1); another is related to the growth rate (slope) at the inflection point; and the third sets the displacement along the x-axis.
We use the Gompertz growth curve (1) to model the infection trajectory. In this context, the curve parameters can be interpreted, respectively, as the maximum cumulative number of infected cases over time, the growth rate of the trajectory at the inflection time point, and the inflection time point of the trajectory. More detailed interpretations are given in Subsection 4.5.
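The displayed form of equation (1) is not legible in this copy, so the following is only a minimal sketch under an assumed parametrization, f(x) = a exp(-exp(-b (x - c))), chosen because it matches the interpretations stated above (asymptote, growth rate at the inflection point, inflection time point). The symbol names a, b, c, the function names, and the numerical values are illustrative assumptions, not the authors' notation or estimates.

```python
import math


def gompertz(x, a, b, c):
    """Assumed Gompertz form: f(x) = a * exp(-exp(-b * (x - c))).

    a: asymptote (maximum cumulative number of cases)
    b: growth-rate parameter
    c: inflection time point
    The parametrization and symbol names are assumptions for illustration.
    """
    return a * math.exp(-math.exp(-b * (x - c)))


def slope(x, a, b, c, h=1e-4):
    """Central-difference approximation of the slope of the curve at x."""
    return (gompertz(x + h, a, b, c) - gompertz(x - h, a, b, c)) / (2 * h)


# Hypothetical parameter values, chosen only for illustration.
a, b, c = 1000.0, 0.1, 50.0

# Under the assumed form, the curve passes through a / e at x = c ...
value_at_inflection = gompertz(c, a, b, c)
# ... and its slope there is a * b / e, the steepest slope of the curve.
slope_at_inflection = slope(c, a, b, c)
```

Under this assumed form the curve is asymmetric about its inflection point: it reaches a / e (about 37% of the asymptote) at x = c, rather than half of the asymptote as a logistic curve would.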
4 Results
4.1 Benefits from the information borrowing
We investigate the predictive performance of three Bayesian models based on the Gompertz growth curve. We start with the individual country-based model, which uses only the data of a single country and has been widely used in the literature. Next, we extend this model to a hierarchical model by utilizing the infection trajectories of all 50 countries. A limitation of this hierarchical model is that it lacks country-wide adjustments in estimating the trajectories: information is borrowed uniformly across all the countries, although the countries are heterogeneous in aspects such as socioeconomic status and health environment. We therefore further extend the model by adding country-specific covariates in a hierarchical fashion. (For a technical description of the three models, see Subsection 6.3.) Borrowing of information across the 50 countries thus takes place in the two hierarchical models, but not in the individual country-based model.
As the evaluation criterion, we calculate the mean squared error (MSE) (Scoringmeasures) associated with the extrapolated infection trajectory for each of the 50 countries. The training and test data are selected as follows: given the infection trajectory of a country observed since January 22nd and a chosen number of test days, (i) the trajectory excluding the most recent test days is selected as the training data, and (ii) the most recent observations, one for each test day, are selected as the test data.
For the two hierarchical models, the MSE is averaged over the 50 countries and compares the actual cumulative confirmed cases of each country at each test time point with the corresponding forecast value, which is the posterior predictive mean given the information from the 50 countries. For the individual country-based model, the MSE is obtained by using the predicted values based on a single country.
We evaluate the MSE from 20 replicates for each of the short-term and long-term test-day settings, and then report the median of the MSEs. The results are shown in Figure 4. From the panel, we see that (1) the predictive performances of the two hierarchical models are uniformly better than that of the individual country-based model across the numbers of test days; and (2) the gap in MSE between the individual country-based model and the other two models increases as the number of test days increases. Based on these outcomes, we conclude that information borrowing improves the accuracy of the forecasting in terms of MSE. Hence, all the results in the subsequent subsections are based on the hierarchical model with covariates. A similar result is found in the Clemente problem from (efron2010future), where the James-Stein estimator (james1992estimation) predicts better than an individual hitter-based estimator in terms of the total squared prediction error.
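The evaluation scheme described above can be sketched as follows. This is an illustration only: the function names, the toy trajectories, and the made-up forecasts are assumptions, and the normalization (mean over test days, then mean over countries) is one plausible reading of the averaging, not necessarily the exact formula used in the paper.

```python
def split_trajectory(y, r):
    """Split one cumulative-case trajectory into training and test parts.

    y: list of cumulative confirmed cases, one value per day
    r: number of most recent days held out as test data
    """
    if not 0 < r < len(y):
        raise ValueError("r must be between 1 and len(y) - 1")
    return y[:-r], y[-r:]


def mse(actual, forecast):
    """Mean squared error between held-out observations and forecasts."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((obs - pred) ** 2 for obs, pred in zip(actual, forecast)) / len(actual)


def average_mse(actuals, forecasts):
    """Average the per-country MSE over all countries."""
    per_country = [mse(obs, pred) for obs, pred in zip(actuals, forecasts)]
    return sum(per_country) / len(per_country)


# Toy trajectories for two hypothetical countries (not real data).
y1 = [1, 2, 4, 8, 16, 32]
y2 = [10, 20, 30, 40, 50, 60]
train1, test1 = split_trajectory(y1, r=2)
train2, test2 = split_trajectory(y2, r=2)

# Made-up forecasts standing in for posterior predictive means.
score = average_mse([test1, test2], [[15, 30], [52, 58]])
```

In the comparison described in the text, the forecasts would come from a model fitted to the training part only, so that the test days are genuinely extrapolated.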
4.2 COVID-19 travel recommendations by country
The Centers for Disease Control and Prevention (CDC) categorizes countries into three levels by assessing the risk of COVID-19 transmission, for use in travel recommendations by country (visit www.cdc.gov/): Level 1, Level 2, and Level 3 indicate the Watch Level (Practice Usual Precautions), Alert Level (Practice Enhanced Precautions), and Warning Level (Avoid Nonessential Travel), respectively.
We categorize the 50 countries into the three levels by estimating the total number of infected cases (that is, the asymptote of the Gompertz growth curve (1)) for each of the 50 countries. The grouping criteria are as follows: (1) Level 1 (estimated total number is no more than 10,000 cases); (2) Level 2 (estimated total number is between 10,000 and 100,000 cases); and (3) Level 3 (estimated total number is more than 100,000 cases).
Figure 5 displays the results of posterior inference for the estimated total number by country, based on the hierarchical model with covariates. Countries on the axis are ordered from the most severe (US) to the least severe (Slovenia) by the magnitude of the posterior means. Countries categorized as Level 3 are the US, France, the UK, Spain, Iran, Italy, Germany, and Brazil: this list is similar to the list of countries labeled with the Warning Level by the CDC, except that China has been excluded and Brazil has been included. There are 31 and 11 countries categorized as Level 2 and Level 1, respectively.
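The grouping rule above amounts to two thresholds on the estimated total number of infected cases. A minimal sketch follows; the function name is hypothetical and the treatment of a value exactly equal to 100,000 (assigned to Level 2 here) is an assumption, since the text does not specify it. The four inputs are the posterior means quoted in Subsection 4.3.

```python
def risk_level(estimated_total):
    """Map an estimated total number of infected cases to a level (1, 2, or 3)."""
    if estimated_total <= 10_000:      # "no more than 10,000 cases"
        return 1
    if estimated_total <= 100_000:     # "between 10,000 and 100,000 cases"
        return 2
    return 3                           # "more than 100,000 cases"


# Posterior means of the maximum cumulative cases reported in Subsection 4.3.
posterior_means = {
    "US": 1_106_426,
    "Spain": 222_500,
    "UK": 235_211,
    "Brazil": 109_157,
}
levels = {country: risk_level(total) for country, total in posterior_means.items()}
```

All four countries fall into Level 3 under this rule, which agrees with the Level 3 list given above.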
4.3 Extrapolated infection trajectories and flat time points
Figure 7 displays the extrapolated infection trajectory (the posterior mean of the Gompertz growth curve) for the US. The posterior mean of the maximum number of cumulative infected cases is 1,106,426 cases. The scenario that ‘millions’ of Americans could be infected was also raised by a leading expert in infectious diseases (visit a related news article at www.bbc.com/).
A crucial question is then when this trajectory gets flattened. To that end, we approximate a time point where an infection trajectory levels off its value, showing a flattening pattern after that time point. The following is the definition of the flat time point which we use in this paper:
Definition 4.1.
Given the Gompertz growth curve (1), the flat time point is defined as the solution of the equation for some small , given by
In other words, the flat time point is the time point after which at most the chosen small number of additional infected cases can occur before the maximum number of confirmed cases is reached. Figure 6 depicts an exemplary infection trajectory obtained by the Gompertz curve (1) with . In this case, a flat time point is approximately when . The choice of this small number depends on the situation of the country considered: for China, which already shows a flattening phase in its infection trajectory (refer to Figure 1), (case) can be safely used, but for the US one may use (cases) or larger numbers.
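Because the displayed equations are not legible in this copy, the following derivation is a sketch under an assumed parametrization of the curve, f(t) = a exp{-exp[-b (t - c)]} with a > 0 and b > 0, and an assumed symbol ε (with 0 < ε < a) for the small number of remaining cases. It only illustrates how the defining condition, that ε cases remain between the curve and its asymptote, can be solved in closed form.

```latex
% Assumed parametrization and symbols (illustrative, not the authors' notation):
%   f(t) = a \exp\{-\exp[-b\,(t - c)]\}, \quad a > 0,\ b > 0,\ 0 < \epsilon < a.
\begin{align*}
a - f(t_{\mathrm{flat}}) &= \epsilon \\
\exp\{-\exp[-b\,(t_{\mathrm{flat}} - c)]\} &= 1 - \epsilon / a \\
\exp[-b\,(t_{\mathrm{flat}} - c)] &= -\log(1 - \epsilon / a) \\
t_{\mathrm{flat}} &= c - \frac{1}{b}\,\log\{-\log(1 - \epsilon / a)\}
\end{align*}
```

When ε is small relative to a, -log(1 - ε/a) is approximately ε/a, so the flat time point is approximately c + (1/b) log(a/ε); under this approximation, each tenfold reduction of ε delays the flat time point by about log(10)/b time units.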
For the US, the posterior means of the flat time points are May 8th, June 7th, July 7th, and August 6th when the corresponding thresholds are set to 100,000, 10,000, 1,000, and 100 cases, respectively. It is important to emphasize that these estimates are based on observations tracked until April 9th. Incorporating new information, such as compliance with social distancing or advances in medical and biological sciences for this disease, may change the inference.
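To make the mapping from curve parameters to calendar dates concrete, here is a minimal sketch. It assumes the parametrization f(t) = a exp(-exp(-b (t - c))), for which the condition "a - f(t) equals the threshold" has the closed-form solution t = c - log(-log(1 - threshold / a)) / b, and it assumes that day 1 corresponds to January 22nd. The asymptote a is the posterior mean reported for the US; the values of b and c are hypothetical, so the resulting dates are not the paper's estimates.

```python
import math
from datetime import date, timedelta


def flat_time(a, b, c, eps):
    """Day index t solving a - f(t) = eps for the assumed Gompertz form
    f(t) = a * exp(-exp(-b * (t - c)))."""
    if not 0 < eps < a:
        raise ValueError("eps must lie strictly between 0 and a")
    return c - math.log(-math.log(1.0 - eps / a)) / b


def day_to_date(t, start=date(2020, 1, 22)):
    """Map a day index (day 1 = start date, an assumption) to a calendar date,
    rounding fractional days up."""
    return start + timedelta(days=math.ceil(t) - 1)


# a: posterior mean of the maximum cumulative cases reported for the US.
# b, c: hypothetical values, used only to make the example runnable.
a, b, c = 1_106_426, 0.08, 80.0
thresholds = (100_000, 10_000, 1_000, 100)
times = [flat_time(a, b, c, eps) for eps in thresholds]
dates = [day_to_date(t) for t in times]
```

Under this assumed form, tenfold steps in a small threshold shift the flat time point by a roughly constant number of days, which is in line with the four reported US dates being 30 days apart.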
Figure 8 shows the extrapolated infection trajectories for Spain, the UK, and Brazil. Posterior means of the maximum number of cumulative infected cases are as follows: (1) for Spain, 222,500 cases; (2) for the UK, 235,211 cases; and (3) for Brazil, 109,157 cases. Posterior means of the flat time points are as follows: (1) for Spain, =May 2nd, =May 27th, and =June 20th; (2) for the UK, =June 4th, =July 12th, and =August 19th; and (3) for Brazil, =June 6th, =July 22nd, and =September 6th. Results for other countries are included in the SI Appendix.
4.4 Global trend for the COVID-19 outbreak
Figure 9 displays the extrapolated infection trajectory for the grand average over the 50 countries, obtained from the hierarchical model with covariates. Technically, this curve is acquired by extrapolating the Gompertz growth curve using the intercept terms in the linear regressions (3). The grey dots on the panel are the historical infection trajectories of the 50 countries. Posterior means of the flat time points are =May 14th, =June 22nd, and =July 29th. The posterior mean of the maximum accumulated cases is 79,392 cases.
4.5 Identifying risk factors for severe disease due to COVID-19
COVID-19 is a new disease, and there is very limited information regarding risk factors for severe disease. There is no vaccine to prevent transmission of the disease, and no specific antiviral agent is available (for more detail, visit www.cdc.gov/). It is therefore very important to find risk factors relevant to the disease. The CDC described High-Risk Conditions based on currently available information and clinical expertise: those at high risk for severe illness from COVID-19 include

People 65 years and older;

People who live in a nursing home or longterm care facility;

People with chronic lung disease or moderate to severe asthma;

People who are immunocompromised, possibly caused by cancer treatment, smoking, bone marrow or organ transplantation, immune deficiencies, poorly controlled HIV or AIDS, and prolonged use of corticosteroids and other immune weakening medications;

People with severe obesity (body mass index of 40 or higher);

People with diabetes;

People with chronic kidney disease undergoing dialysis;

People with liver disease.
The hierarchical model with covariates involves three separate linear regressions, one for each of the three curve parameters, each with its own vector of regression coefficients (see equation (3)). The sparse horseshoe prior (carvalho2009handling; carvalho2010) is imposed on each of the coefficient vectors, which equips the model with covariates analysis. That way, we can identify key predictors explaining the heterogeneity of shapes among the country-wise infection trajectories, which can be further used in finding risk factors for severe disease due to COVID-19. The results are in Table 1.
Rank  Total-cases parameter  Growth-rate parameter  Time-delay parameter 

1  Doc_num()  Alcohol_cons_rec()  Doc_num() 
2  Overweight()  Life_expect_total_60()  Testing_num_COVID19() 
3  Alcohol_cons_unrec()  Hib3_immun ()  Life_expect_total_birth() 
4  MCV2_immun()  Heavy_drinking_total()  Dis_to_China() 
5  Hosp_bed()  Dtt_dtp_immun()  Envi_death() 
6  MCV1_immun()  Risk_Communication()  Surveillance() 
7  Points_of_Entry()  Human_Resources()  Heavy_drinking_total() 
8  Cholesterol()  Cigarette_smoke()  Hea_life_expect_total_60 () 
9  Life_expect_total_60()  Tobacco_smoke()  Risk_Communication() 
10  Food_Safety()  Health_Emergency()  Alcohol_cons_rec() 

NOTE: Covariates are ranked by the absolute values of the posterior means of the coefficients, ordered from the largest to the smallest; the table shows only the top 10 covariates. See the SI Appendix for detailed explanations of the listed covariates.
The following is a general guideline on how the covariates in Table 1 can be interpreted in analyzing infection trajectories in the context of a pandemic.

The second column of Table 1 concerns the parameter that represents the total number of infected cases over time. A larger value of this parameter implies that a country has (or can have) more COVID-19 infected patients. A covariate with a plus sign (+) (or minus sign (-)) is a factor associated with an increase (or decrease) in the total infected cases.

The third column of Table 1 concerns the parameter that represents the growth rate of the infection trajectory at the inflection time point. A larger value of this parameter implies a faster spread of the virus around the country. A covariate with a plus sign (+) (or minus sign (-)) is a factor associated with a rapid (or slow) spread of the virus.

The fourth column of Table 1 concerns the parameter related to a time-delaying factor of the infection trajectory. The larger its value, the later the trajectory begins to accumulate infected cases, leading to a later onset of the accumulation. A covariate with a plus sign (+) (or minus sign (-)) is a factor associated with accelerating (or decelerating) the onset of the accumulation.
Now, based on the aforementioned guideline, we interpret Table 1 in detail. (The reasoning reflects our subjectivity, and disease experts should provide a precise interpretation.)
For the parameter representing the total number of infected cases, it is natural to expect that a country with more doctors and hospital beds (hospitalutilization) can treat more patients, possibly including COVID-19 infected patients, more efficiently, which results in a decrease in the total number of cases. The general health status of a population (Demographicscience) also affects this parameter: long life expectancy and large numbers of people who are older, are overweight (visit a related news article at www.cidrap.umn.edu/), have higher cholesterol, or have higher alcohol consumption can increase the total number of infected cases. On the other hand, proper vaccinations for measles and higher scores in health regulations associated with food safety and importation (Healthsecuritycapacities) can keep the total number of infected cases low.
Turning to the growth-rate parameter, longer life expectancy and larger numbers of elderly people, smokers, and heavy alcohol drinkers may accelerate disease transmission among people, increasing the growth rate of the infection trajectory. Better immunization coverage, such as Haemophilus influenzae type b third dose (Hib3) immunization and diphtheria tetanus toxoid and pertussis (DTP3) immunization, helps to decrease the growth rate. Effective response and risk communication during a public health emergency and sufficient human resources in healthcare are also helpful.
Finally, moving to the time-delay parameter, larger numbers of doctors and of COVID-19 tests conducted help in the earlier detection of infected patients, which leads to an earlier onset of the accumulation of infected patients. In addition, longer life expectancy and larger numbers of elderly people and heavy alcohol drinkers can bring the onset earlier. Also, countries far from China have a certain time-delay effect, and the onset tends to begin later. Moreover, functioning surveillance and risk communication in health emergency events can help to delay the onset.
5 Discussions
It is important to emphasize that, while the medical and biological sciences are on the front lines of beating back COVID-19, true victory relies on the advancement and collaboration of almost every academic field. However, information about COVID-19 is limited: there are currently no vaccines or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19 (as of April 13, 2020). Although numerous research works are in progress in different academic fields, the information about COVID-19 is scattered across disciplines, which calls for interdisciplinary research to hold off the spread of the disease.
Proper integration of data from multiple sources is a key to understanding COVID-19, and this can be accomplished by borrowing information. The motivation for borrowing information is to make use of indirect evidence (efron2010future) to enhance predictive performance: for example, to extrapolate the infection trajectory for the US, information comes not only from the US (direct evidence) but also from other countries (indirect evidence), and the latter is utilized to improve the predictive accuracy of the trajectory for the US. To harness information borrowing endowed with uncertainty quantification, the Bayesian argument is useful, as it induces sensible inferences and decisions for users (lindley1972bayesian).
The results demonstrate the superiority of our approach over the existing individual country-based models. Our research outcomes can be regarded as even more insightful given that we have not employed information about disease-specific covariates. That being said, using more detailed information such as social mixing data, precise hospital records, or patient-specific information will further improve the performance of our model. Moreover, integration of epidemiological models with these statistical models will be a topic of our future research.
6 Materials and Methods
6.1 Research data
In this research, we analyze global COVID-19 data obtained from 50 countries. (The meanings of the vector notations will be explained shortly.) These countries are the most severely affected by COVID-19 in terms of confirmed cases on April 9th, and are listed in Table 2: each country appears in the table in the format “country name (identifier)”, and the identifier also indicates a severity rank, where a lower value indicates a more severe status. The order of the ranks thus coincides with the order of the countries named on the axis of Figure 2.
Country (index ) 

US (1), Spain (2), Italy (3), France (4), Germany (5), 
China (6), Iran (7), United Kingdom (8), Belgium (9), Switzerland (10), 
Netherlands (11), Canada (12), Brazil (13), Portugal (14), Austria (15), 
South Korea (16), Russia (17), Israel (18), Sweden (19), India (20), 
Ireland (21), Norway (22), Australia (23), Chile (24), Denmark (25), 
Poland (26), Czechia (27), Peru (28), Romania (29), Japan (30), 
Pakistan (31), Malaysia (32), Philippines (33), Indonesia (34), Saudi Arabia (35), 
Luxembourg (36), United Arab Emirates (37), Finland (38), Thailand (39), Qatar (40), 
Greece (41), Singapore (42), Egypt (43), Iceland (44), Iraq (45), 
Estonia (46), Slovenia (47), Kuwait (48), Bahrain (49), Lebanon (50) 

NOTE: Countries are listed with the form “country name (identifier)”. This identifier also represents a severity rank. The rank is measured based on the accumulated number of the confirmed cases on April 9th.
For each country, we record the number of accumulated confirmed cases of COVID-19 at each time point. The initial and end time points correspond to January 22nd and April 9th, respectively, spanning 79 days. The resulting time series is referred to as the infection trajectory of the country. Infection trajectories for eight countries (US, Spain, Italy, China, UK, Brazil, South Korea, and India), indexed by 1, 2, 3, 6, 8, 13, 16, and 20, respectively, are displayed in Figure 1. We collected the data from the Center for Systems Science and Engineering at Johns Hopkins University.
For each country, we collected 74 covariates. The predictors can be further grouped into six categories: (1) general country and population distribution and statistics; (2) general health care resources; (3) tobacco and alcohol use; (4) disease and unhealthy prevalence; (5) testing and immunization statistics; and (6) international health regulations monitoring. The data sources are the World Bank Data (https://data.worldbank.org/), World Health Organization Data (https://apps.who.int/), and the National Oceanic and Atmospheric Administration (https://www.noaa.gov/). Detailed explanations of the covariates are given in the SI Appendix.
6.2 Bayesian hierarchical Gompertz model
We propose a Bayesian hierarchical model based on the Gompertz curve (1), referred to as the Bayesian hierarchical Gompertz model (BHGM), to accommodate the COVID-19 data. (Although the model is based on the Gompertz curve, the idea can be generalized to any choice of growth curve.) The principal goal of the BHGM is to establish two functionalities:

[Extrapolation] uncover a hidden pattern from the infection trajectory of each country through the Gompertz growth curve (1), and then extrapolate the curve.

[Covariates analysis] identify the important predictors, among all the predictors, that largely affect the shape of the curve in terms of the three curve parameters.
A hierarchical formulation of the BHGM is given as follows. First, we introduce an additive, independent and identically distributed Gaussian error to each observation, leading to the likelihood part:
(2) 
where the mean function is the Gompertz growth curve (1), which describes the growth pattern of the infection trajectory of each country. Because each of the curve parameters has its own interpretation in characterizing the infection trajectory, we construct three separate linear regressions:
(3) 
where each of the three linear regressions has its own coefficient vector, with one coefficient per covariate. To impose a continuous shrinkage effect (bhadra2019lasso) on each of the coefficient vectors, we adopt the horseshoe prior (carvalho2009handling; carvalho2010):
(4) 
Finally, improper priors (gelman2004bayesian) are used for the intercept terms and error variance terms in the model:
(5) 
See SI Appendix for a posterior computation for the BHGM (2) – (5).
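The displayed equations (2) to (5) are not legible in this copy, so the following is only a rough generative sketch of the hierarchy described above, not the authors' specification. It assumes a standard horseshoe prior (each coefficient is Gaussian with standard deviation equal to a local half-Cauchy scale times a global half-Cauchy scale) and a linear regression with a Gaussian error term for a country-level curve parameter. All function names, symbols, and numerical values are illustrative assumptions.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy distribution via the inverse CDF of a Cauchy."""
    return scale * abs(math.tan(math.pi * (rng.random() - 0.5)))


def draw_horseshoe(rng, p):
    """Draw a p-dimensional coefficient vector from an assumed horseshoe prior:
    beta_j ~ N(0, (lambda_j * tau)^2), with lambda_j and tau half-Cauchy(0, 1)."""
    tau = half_cauchy(rng)  # global shrinkage scale shared by all coefficients
    return [rng.gauss(0.0, half_cauchy(rng) * tau) for _ in range(p)]


def curve_parameter(intercept, x, beta, rng, sd):
    """One country-level curve parameter from a linear regression on covariates
    (the Gaussian error term with standard deviation sd is an assumption)."""
    return intercept + sum(xj * bj for xj, bj in zip(x, beta)) + rng.gauss(0.0, sd)


rng = random.Random(0)
beta = draw_horseshoe(rng, p=5)                   # coefficients for 5 toy covariates
x = [0.3, -1.2, 0.0, 0.7, 2.1]                    # hypothetical standardized covariates
param = curve_parameter(1.0, x, beta, rng, sd=0.1)
```

The heavy-tailed local scales let a few coefficients stay large while the global scale pulls the rest toward zero, which is the kind of continuous shrinkage that the covariates analysis relies on.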
6.3 Technical expressions for the three models
Technical expressions for the three models compared in Subsection 4.1 are given as follows:
References
Supporting Information Appendix
Appendix A Tables for covariates
Category  Covariates (index) 

General country and  Total_over_65 (1), Female_per (2), Death_disease (3), 
population distribution  Median_age (10), Birth_rate (11), Death_rate (12), 
and statistics  Life_expect_total_birth (23), Life_expect_total_60 (24), 
Hea_life_expect_total_birth (25), Hea_life_expect_total_60 (26),  
Dis_to_China (66), Popu_density (70), Tempe_avg (71)  
Health care resources  Physician (4), Health_expen (5), Health_expen_real_per_capita (6), 
Health_expen_real_per_capita_ppp (7), Doc_num_per (20),  
Doc_num (21), Hosp_bed (22)  
Tobacco and alcohol use  Alcohol_cons_rec (13), Alcohol_cons_unrec (14), Abstainers_total (15), 
Alcohol_consumers_total (16), Heavy_drinking_total (17),  
Alcohol_death_total (18), Alcohol_disorder_total (19),  
Tobacco_smoke (55), Cigarette_smoke (56)  
Disease and unhealth  Underweight_total (8), Thinness_total (9), Adult_mortality (47), 
prevalence  NCD_Mortality (48), NCD_deaths_un_70 (49), Blood_glucose (50), 
Blood_pressure (51), Cholesterol (52), Insuf_phy_act (53),  
Overweight (54), Air_pollution (57), Air_pollution_death (58),  
Air_pollution_DALYs (59), Uninten_poison (60), Envi_death (61),  
Envi_DALs (62), Tuberculosis_death (63), Tuberculosis_case (64),  
Unsafe_wash (65)  
Testing and immunization  Dtt_dtp_immun (27), HepB3_immun (28), Hib3_immun (29), 
statistics  MCV1_immun (30), MCV2_immun (31), PCV3_immun (32), 
Pol3_immun (33), Testing_num_COVID19 (67),  
Testing_confirm_COVID19 (68), Testing_popu_COVID19 (69)  
International Health  Legislation_and_Financing (34), Coordinate_Focal_Points (35), 
Regulations monitoring  Zoonotic_Events (36), Food_Safety (37), Laboratory (38), 
Surveillance (39), Human_Resources (40), Health_Emergency (41),  
Health_Service_Provision (42), Risk_Communication (43),  
Points_of_Entry (44), Chemical_Events (45),  
Radiation_Emergencies (46) 

NOTE: Covariates are listed with the form “predictor name (index)”. Predictor names are abbreviated.
Covariates (index )  Explanation 

Total_over_65 (1)  Population ages 65 and above (% of total population) in 2018. 
Female_per (2)  The percentage of female in the population in 2018. 
Death_disease (3)  Death by communicable diseases and maternal, prenatal 
and nutrition conditions (% of total) in 2016.  
Median_age (10)  Population median age in 2013. 
Birth_rate (11)  Crude birth rate (per 1000 population) in 2013. 
Death_rate (12)  Crude death rate (per 1000 population) in 2013. 
Life_expect_total_birth (23)  Life expectancy at birth (years) in 2016. 
Life_expect_total_60 (24)  Life expectancy at age 60 (years) in 2016. 
Hea_life_expect_total_birth (25)  Healthy life expectancy at birth (years) in 2016. 
Hea_life_expect_total_60 (26)  Healthy life expectancy at age 60 (years) in 2016. 
Dis_to_China (66)  Calculated by the R function distm based on the average 
longitude and latitude.  
Popu_density (70)  Population density (people per sq.km of land area) in 2018. 
Tempe_avg (71)  The average temperature in February and March in the capital 
of each country (we choose New York for US and Wuhan for  
China, due to the severe outbreak in the two cities). 
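As a rough illustration of how a covariate such as Dis_to_China could be computed from a pair of average coordinates, the sketch below uses the haversine great-circle formula. This is an assumption made for illustration only: the source states that the R function distm was used but not which distance function was passed to it, and the Earth radius constant and the function name `haversine_m` are hypothetical choices, not taken from the paper.

```python
import math

# Mean Earth radius in metres (an assumed constant; other conventions exist).
EARTH_RADIUS_M = 6371000.0


def haversine_m(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2
    # Clamp to 1 to guard against rounding slightly above the valid range of asin.
    return 2.0 * radius * math.asin(math.sqrt(min(1.0, h)))


# Two generic points on the equator, 90 degrees of longitude apart.
print(haversine_m(0.0, 0.0, 90.0, 0.0))
```

The arguments follow the (longitude, latitude) order. A spherical formula like this one can differ slightly from an ellipsoidal distance, so the values it gives are approximate.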
Covariates (index)  Explanation

Physician (4)  The number of physicians (per 1000 people) between 2015 and 2018.
Health_expen (5)  General government expenditure on health as a percentage of total government expenditure in 2014.
Health_expen_real_per_capita (6)  Current health expenditure per capita (current US$) in 2016.
Health_expen_real_per_capita_ppp (7)  Current health expenditure per capita, PPP (current international $) in 2016.
Doc_num_per (20)  The number of medical doctors (per 10000 population) in 2016.
Doc_num (21)  The number of medical doctors (number) in 2016.
Hosp_bed (22)  Average hospital beds (per 10000 population) from 2013 to 2015.
Covariates (index)  Explanation

Alcohol_cons_rec (13)  Recorded alcohol consumption per capita (15+) (in litres of pure alcohol), three-year average between 2015 and 2017.
Alcohol_cons_unrec (14)  Unrecorded alcohol consumption per capita (15+) (in litres of pure alcohol) in 2016.
Abstainers_total (15)  Alcohol lifetime abstainers (those adults who have never consumed alcohol) (% of total) in 2016.
Alcohol_consumers_total (16)  Alcohol consumers past 12 months (those adults who consumed alcohol in the past 12 months) (% of total) in 2016.
Heavy_drinking_total (17)  Age-standardized estimates of the proportion of adults (15+ years) (who have had at least 60 grams or more of pure alcohol on at least one occasion in the past 30 days) in 2016.
Alcohol_death_total (18)  Alcohol-attributable death (% of all-cause deaths in total) in 2016.
Alcohol_disorder_total (19)  Number of adults (15+ years) with a diagnosis of F10.1, F10.2 (alcohol disorder) during a calendar year (% of total 15+) in 2016.
Tobacco_smoke (55)  Age-standardized rates of prevalence estimates for daily smoking of any tobacco in adults (15+ years) in 2013.
Cigarette_smoke (56)  Age-standardized rates of prevalence estimates for daily smoking of any cigarette in adults (15+ years) in 2013.
Covariates (index)  Explanation

Underweight_total (8)  Crude estimate of percent of adults with underweight (BMI < 18.5) in 2016.
Thinness_total (9)  Crude estimate of percent of children and adolescents with thinness (BMI more than 2 standard deviations below the median) in 2016.
Adult_mortality (47)  Adult mortality rate (probability of dying between 15 and 60 years per 1000 population) in 2016.
NCD_Mortality (48)  Age-standardized noncommunicable diseases mortality rate (per 100000 population) in 2016.
NCD_deaths_un_70 (49)  Noncommunicable disease deaths under age 70 (% of all noncommunicable diseases deaths) in 2016.
Blood_glucose (50)  Age-standardized percent of 18+ population with raised fasting blood glucose (≥ 7.0 mmol/L or on medication) in 2014.
Blood_pressure (51)  Percent of 18+ population with raised blood pressure (systolic blood pressure ≥ 140 or diastolic blood pressure ≥ 90) in 2015.
Cholesterol (52)  Percentage of 25+ population with total cholesterol ≥ 240 mg/dl (≥ 6.2 mmol/l) in 2008.
Insuf_phy_act (53)  Age-standardized prevalence of insufficient physical activity (% of adults aged 18+) in 2016.
Overweight (54)  Age-standardized prevalence of overweight among adults (BMI ≥ 25) (% of adults aged 18+) in 2016.
Air_pollution (57)  Concentrations of fine particulate matter (PM2.5) in 2016.
Air_pollution_death (58)  Age-standardized ambient air pollution attributable death rate (per 100000 population) in 2016.
Air_pollution_DALYs (59)  Age-standardized ambient air pollution attributable disability-adjusted life years (DALYs) (per 100000 population) in 2016.
Uninten_poison (60)  Mortality rate attributed to unintentional poisoning (per 100000 population) in 2016.
Envi_death (61)  Age-standardized deaths attributable to the environment (per 100000 population) in 2012.
Envi_DALs (62)  Age-standardized disability-adjusted life years (DALYs) attributable to the environment (per 100000 population) in 2012.
Tuberculosis_death (63)  The number of deaths due to tuberculosis among HIV-negative people (per 100000 population) in 2018.
Tuberculosis_case (64)  Incidence of tuberculosis (per 100000 population per year) in 2018.
Unsafe_wash (65)  Mortality rate attributed to exposure to unsafe WASH services (per 100000 population) (SDG 3.9.2) in 2016.
Covariates (index)  Explanation

Diphtheria tetanus toxoid and pertussis third-dose immunization (27)  Diphtheria tetanus toxoid and pertussis third-dose (DTP3) immunization coverage (% of total 1-year-olds) in 2018.
Hepatitis B third-dose immunization (28)  Hepatitis B third-dose (HepB3) immunization coverage (% of total 1-year-olds) in 2018.
Haemophilus influenzae type B third-dose immunization (29)  Haemophilus influenzae type B third-dose (Hib3) immunization coverage (% of total 1-year-olds) in 2018.
Measles-containing-vaccine first-dose immunization (30)  Measles-containing-vaccine first-dose (MCV1) immunization coverage (% of total 1-year-olds) in 2018.
Measles-containing-vaccine second-dose immunization (31)  Measles-containing-vaccine second-dose (MCV2) immunization coverage (% of total nationally recommended age) in 2018.
Pneumococcal conjugate vaccines third-dose immunization (32)  Pneumococcal conjugate vaccines third-dose (PCV3) immunization coverage (% of total 1-year-olds) in 2018.
Polio third-dose immunization (33)  Polio (Pol3) third-dose immunization coverage (% of total 1-year-olds) in 2018.
Testing_num_COVID19 (67)  The number of COVID19 tests (data collected by ourworldindata.org/ from several media sources; the data dates fall between February and March).
Testing_confirm_COVID19 (68)  The total number of confirmed cases divided by the covariate Testing_num_COVID19 (67), both taken on the same day.
Testing_popu_COVID19 (69)  The covariate Testing_num_COVID19 (67) divided by covariate Total_popu (2).
Covariates (index)  Explanation

Legislation_and_Financing (34)  Scores that show whether legislation, laws, regulations, administrative requirements, policies or other government instruments in place are sufficient for implementation of IHR in 2018.
Coordinate_Focal_Points (35)  Scores that show whether a functional mechanism is established for the coordination of relevant sectors in the implementation of IHR, etc., in 2018.
Zoonotic_Events (36)  Scores that show whether mechanisms for detecting and responding to zoonoses and potential zoonoses are established and functional in 2018.
Food_Safety (37)  Scores that show whether mechanisms are established and functioning for detecting and responding to foodborne disease and food contamination in 2018.
Laboratory (38)  Scores that show the availability of laboratory diagnostic and confirmation services to test for priority health threats in 2018.
Surveillance (39)  Scores that show surveillance including an early warning function for the early detection of a public health event and established and functioning event-based surveillance in 2018.
Human_Resources (40)  Scores that show the availability of human resources to implement IHR Core Capacity.
Health_Emergency (41)  Scores that show the ability to respond effectively to health emergencies in 2018.
Health_Service_Provision (42)  Scores that show an immediate output of the inputs into the health system, such as the health workforce, procurement and supplies, and financing in 2018.
Risk_Communication (43)  Scores that show whether mechanisms for effective risk communication during a public health emergency are established and functioning in 2018.
Points_of_Entry (44)  Scores that show whether general obligations at points of entry are fulfilled (including for coordination and communication) to prevent the spread of diseases through international traffic in 2018.
Chemical_Events (45)  Scores that show whether mechanisms are established and functioning for detection, alert and response to chemical emergencies that may constitute a public health event of international concern in 2018.
Radiation_Emergencies (46)  Scores that show whether mechanisms are established and functioning for detecting and responding to radiological and nuclear emergencies that may constitute a public health event of international concern in 2018.

NOTE 1: The International Health Regulations, or IHR (2005), represent an agreement between 196 countries, including all WHO Member States, to work together for global health security. Through the IHR, countries have agreed to build their capacities to detect, assess and report public health events. WHO plays the coordinating role in the IHR and, together with its partners, helps countries to build capacities. (https://www.who.int/ihr/about/)

NOTE 2: The IHR monitoring framework represents a consensus among technical experts from WHO Member States, technical institutions, partners and WHO. (https://www.who.int/ihr/procedures/)
Appendix B Posterior computation
We give a full description of the posterior computation for the BHGM (2)–(5) using Markov chain Monte Carlo (MCMC) simulation (robert2013monte). To start, we re-express the linear regression (3) in a vector-form representation
where () is dimensional vector for the latent responses, () is dimensional vector for the coefficients, and X is by design matrix whose th row vector is given by the predictors , . The notation
stands for an identity matrix. Each column vector of the design matrix X should be standardized: that is, each column is centered and then scaled to have unit Euclidean norm.

Under the formulation of the BHGM (2)–(5), our goal is to sample from the full joint posterior distribution where (), and a proportional part of this joint density is
where the matrix is by diagonal matrix (). To sample from the full joint density, we use a Gibbs sampler (casella1992explaining), which exploits the conditional independences among the latent variables induced by the hierarchy. The following algorithm describes a straightforward Gibbs sampler.

Step 1. Sample from its full conditional distribution
where . Here, the vector r is a dimensional vector which is given by such that the dimensional vector () is obtained by

Step 2. Sample and , , independently from their full conditional distributions. Proportional parts of the distributions are given by
respectively, where dimensional vector () is obtained by
Here, indicates the norm. Note that the two conditional densities are not available in closed form because the two parameters, and , enter the function in a nonlinear way. We use the Metropolis algorithm (andrieu2003introduction) with Gaussian proposal densities within this Gibbs sampler.

Step 3. Sample from its full conditional distribution

Step 4. Sample , , independently from their full conditional distributions

Step 5. Sample , , independently from conditionally independent posteriors
where , , and .

Step 6. Sample , , independently from conditionally independent posteriors
These densities are not available in closed form, so we use the slice sampler (neal2003slice).

Step 7. Sample , , independently from conditionally independent posteriors
These densities are not available in closed form, so we use the slice sampler (neal2003slice).

Step 8. Sample , , independently from their full conditional distributions
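The Metropolis-within-Gibbs update used in Step 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gompertz parameterization a * exp(-b * exp(-c * t)), the Gaussian likelihood and Gaussian priors inside `log_target`, the positivity restriction on b and c, the step sizes, and the synthetic data are all hypothetical choices made so that the example is self-contained.

```python
import math
import random


def gompertz(t, a, b, c):
    """Gompertz curve a * exp(-b * exp(-c * t)) (assumed parameterization)."""
    return a * math.exp(-b * math.exp(-c * t))


def log_target(b, c, a, y, sigma2, prior_b, prior_c):
    """Unnormalized log conditional density of (b, c) for one trajectory.

    Assumes a Gaussian likelihood with variance sigma2 and independent
    Gaussian priors given as (mean, variance) pairs; b and c are
    restricted to be positive in this sketch.
    """
    if b <= 0.0 or c <= 0.0:
        return -math.inf
    sse = sum((y_t - gompertz(t, a, b, c)) ** 2 for t, y_t in enumerate(y, start=1))
    log_prior = (-0.5 * (b - prior_b[0]) ** 2 / prior_b[1]
                 - 0.5 * (c - prior_c[0]) ** 2 / prior_c[1])
    return -0.5 * sse / sigma2 + log_prior


def metropolis_sweep(b, c, step_b, step_c, rng, **model):
    """Update b and then c, each with a Gaussian random-walk proposal."""
    b_prop = rng.gauss(b, step_b)
    log_u = math.log(1.0 - rng.random())  # log of a Uniform(0, 1] draw
    if log_u < log_target(b_prop, c, **model) - log_target(b, c, **model):
        b = b_prop
    c_prop = rng.gauss(c, step_c)
    log_u = math.log(1.0 - rng.random())
    if log_u < log_target(b, c_prop, **model) - log_target(b, c, **model):
        c = c_prop
    return b, c


# Illustrative run on synthetic, noise-free data (not the paper's data).
rng = random.Random(0)
y = [gompertz(t, 100.0, 5.0, 0.1) for t in range(1, 61)]
model = dict(a=100.0, y=y, sigma2=1.0, prior_b=(5.0, 100.0), prior_c=(0.1, 100.0))
b, c = 4.0, 0.12
for _ in range(2000):
    b, c = metropolis_sweep(b, c, 0.05, 0.002, rng, **model)
print(b, c)
```

Because the Gaussian proposal is symmetric, the acceptance ratio reduces to a ratio of target densities, which is why only differences of `log_target` appear. In a full sampler this sweep would be called once per country inside each Gibbs iteration, with the other parameters held at their current values.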
Appendix C Infection trajectories for top 20 countries
This file includes extrapolated infection trajectories for the 20 countries most severely affected by COVID19. The panels display the extrapolated posterior mean (red curve) of the Gompertz curve along with pointwise 95% credible intervals (pink region).
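A posterior mean curve and pointwise credible band of this kind can be computed from MCMC output along the following lines. The sketch assumes the draws are available as (a, b, c) triples and that the Gompertz curve is parameterized as a * exp(-b * exp(-c * t)). The function names and the linear-interpolation quantile rule are illustrative choices rather than details taken from the paper.

```python
import math


def gompertz(t, a, b, c):
    """Gompertz curve a * exp(-b * exp(-c * t)) (assumed parameterization)."""
    return a * math.exp(-b * math.exp(-c * t))


def quantile(sorted_vals, q):
    """Quantile of pre-sorted values using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def pointwise_band(draws, times, level=0.95):
    """Return (mean, lower, upper) of the curve at each time across posterior draws."""
    tail = (1.0 - level) / 2.0
    band = []
    for t in times:
        vals = sorted(gompertz(t, a, b, c) for a, b, c in draws)
        mean = sum(vals) / len(vals)
        band.append((mean, quantile(vals, tail), quantile(vals, 1.0 - tail)))
    return band


# Placeholder draws for illustration only; real draws would come from the sampler.
draws = [(100.0 + i, 5.0, 0.1) for i in range(101)]
for t, (mean, lower, upper) in zip((30, 60, 90), pointwise_band(draws, (30, 60, 90))):
    print(t, mean, lower, upper)
```

Evaluating the band at times beyond the last observed day gives the extrapolated part of the trajectory. The interval is pointwise, so it describes the uncertainty at each day separately rather than for the whole curve jointly.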