Data-driven Analytical Models of COVID-2019 for Epidemic Prediction, Clinical Diagnosis, Policy Effectiveness and Contact Tracing: A Survey

by   Ying Mao, et al.
Fordham University
Columbia University

The widely spread CoronaVirus Disease (COVID)-19 is one of the worst infectious disease outbreaks in history and has become an emergency of primary international concern. As the pandemic evolves, academic communities have been actively involved in various capacities, including accurate epidemic estimation, fast clinical diagnosis, policy effectiveness evaluation and development of contact tracing technologies. There are more than 23,000 academic papers on the COVID-19 outbreak, and this number is doubling every 20 days while the pandemic is still ongoing [1]. The literature, however, at its early stage, lacks a comprehensive survey from a data analytics perspective. In this paper, we review the latest models for analyzing COVID-19 related data, conduct post-publication model evaluations and cross-model comparisons, and collect data sources from different projects.




I Introduction

With over 7,800,000 cases and 430,000 deaths globally [2], CoronaVirus Disease (COVID)-19, the disease caused by Severe Acute Respiratory Syndrome CoronaVirus (SARS-CoV)-2, is one of the worst infectious disease outbreaks in history and has become an emergency of primary international concern. In mid-December 2019, the first COVID-19 case was detected in Wuhan, China, from where the disease rapidly spread across the country and caused a pneumonia epidemic in early January 2020. After infecting and causing the death of thousands of patients in China, the virus has now spread to 140 other countries, including Japan, Italy, Brazil, and the USA, with the number of confirmed new cases and deaths increasing every day [3, 4, 5].

Hospitals and healthcare systems worldwide are under high stress and have already stepped up in unprecedented ways to face the challenges of COVID-19. For example, the first confirmed case of COVID-19 in the United States was reported in Snohomish County, Washington State [6]. Genomic and epidemiological analyses of sequenced virus RNA revealed that community transmission of COVID-19 was under way in the western Washington region in February 2020. Confirmed cases in the U.S. increased to 1,000 by March 11, to 100,000 by March 27, to over 1 million on April 28, and reached 2 million at the end of May [7]. In order to save lives and minimize the spread of the virus, hospitals have accelerated testing efforts and are treating hundreds of thousands of people worldwide.

The virus has influenced people's daily lives. To mitigate the spread of the disease, Wuhan, in China's Hubei province, was placed under a strict lockdown on January 23 and reopened gradually after more than ten weeks. In the United States, California Gov. Gavin Newsom issued a stay-at-home order on March 19, and every state in the USA had restrictions in place by early April. The virus has also effectively ground global economies to a halt. The U.S. unemployment rate shot up from 3.8% in February to 13.3% in May [8], and the COVID-19 recession is predicted to be comparable to the Great Depression of the 1930s, when the unemployment rate was estimated to reach 25% [9].

To combat this ongoing crisis, many efforts have been made to develop accurate epidemic predictions, fast diagnosis solutions, effective policy implementations and efficient tracing systems. These projects range from using different kinds of clinical data (chest CT images, X-rays, laboratory findings, etc.) for fast screening, to risk profiling, patient surveillance and tracking, and genetic network analysis, and together they provide a snapshot of pandemic origins and social analytics. Many institutions have already developed COVID-19 tracking projects and generated data dashboards to help policymakers and the public understand the trajectory of the pandemic, compare each country's or state's interventions and testing levels with case counts and deaths over time, and make decisions for the path forward.

While various datasets and analyses have been published by hospitals, institutions, governments and organizations globally, the literature lacks a comprehensive review and data collection from the analytical perspective to address the fragmented data at this early stage of COVID-19 related research. For example, a prediction model published in early March that estimated infection numbers accurately at the time might no longer work well because of the fast evolution of the virus and of government responses. This manuscript presents the latest datasets, prediction models, evaluations of policies, and tracing technologies for combating the challenges caused by COVID-19. The main contributions of this paper are summarized below.

  • We provide an overview of data-driven COVID-19 studies from the perspectives of epidemic prediction, clinical diagnosis, policy effectiveness, and contact tracing.

  • We conduct model studies with the latest data to evaluate how well the models have performed since their publication date.

  • We collect the data sources and timeline of the key policies and combine them with multiple data sources to estimate the effectiveness.

The rest of this paper is organized as follows. In Section II, we review the state-of-the-art models for epidemic prediction. In Section III, we report the analytical studies of clinical characteristics and diagnosis. We present the research on policy effectiveness in Section IV. The latest technologies for COVID-19 related contact tracing are reviewed in Section V. Finally, Section VI concludes this paper.

II Epidemic Prediction

The tremendous increase in the number of patients infected with COVID-19 has drained healthcare systems globally. Based on the New York Times data set [10], Fig. 1 illustrates how rapidly the virus spread in the counties of New York State (NYS), which is the epicenter of the coronavirus in the United States. The heatmap plots the number of infections in each county in NYS. On March 1, there was only one confirmed COVID-19 case in NYS; however, the number increased to 67462, 286901 and 382879 on March 30, April 25 and June 7, respectively. An urgent need exists to accurately predict the epidemic. Many efforts have been made to estimate the scale and time course of epidemics, evaluate the effectiveness of public health interventions, and inform public health policies.

(a) March 1st, 2020
(b) March 30th, 2020
(c) April 25th, 2020
(d) June 7th, 2020
Fig. 1: Per-County Infection Map of New York State

II-A Prediction Models

II-A1 Exponential Model

Without effective responses (e.g., in the early stages of a pandemic), the number of infected patients grows exponentially over time. Given the initial time series data of diagnosed infections, we obtain

$$I(t) = I_0 e^{rt},$$

where $I(t)$ is the number of diagnosed infections at time $t$, $I_0$ is the number of diagnosed infections at $t = 0$, and $r$ is the growth rate, which can be estimated from the data observed up to the moment when the model is executed.

The authors in [11, 12] studied the exponential model. In practice, however, its predictions fail to deliver reliable numbers because of active responses from governments.
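The growth rate of the exponential model can be estimated by a least-squares fit on the logarithm of the case counts. The following is a minimal sketch, assuming the standard form I(t) = I_0 e^{rt} with one observation per day; the function names and the synthetic data are illustrative and are not taken from [11, 12].

```python
import math

def fit_exponential(counts):
    """Fit log(I(t)) = log(I0) + r * t by ordinary least squares, t = 0, 1, 2, ..."""
    n = len(counts)
    ts = list(range(n))
    ys = [math.log(c) for c in counts]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    r = sxy / sxx                       # estimated growth rate per day
    i0 = math.exp(y_mean - r * t_mean)  # estimated initial number of infections
    return i0, r

# Synthetic (hypothetical) data: 100 initial cases growing at 20% per day.
observed = [100.0 * math.exp(0.2 * t) for t in range(10)]
i0, r = fit_exponential(observed)
doubling_time = math.log(2) / r  # days needed for the case count to double
```

The doubling time log(2)/r is a convenient summary of the fitted rate. Once interventions slow transmission, r is no longer constant and the extrapolation overshoots, which is the failure mode described above.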

II-A2 Logistic Model

Unlike the exponential model, which only works for uncontrolled spread, the logistic growth model is approximately exponential at first, but its growth rate decreases as the curve approaches the model's upper bound, called the carrying capacity. In the logistic model, the growth is given by [13],

$$Q_t = \frac{a}{1 + e^{\,b - c\,(t - t_0)}},$$

where $Q_t$ is the cumulative number of confirmed cases, $a$ is the predicted maximum number of confirmed cases (the carrying capacity of the population), $b$ and $c$ are fitting coefficients that can be obtained from the existing data set, $t_0$ is the time when the first infection is observed, and $t$ is the number of days since the first case.

Similar logistic growth and regression based models were developed to predict trends of the pandemic [14, 15, 16, 17, 18, 19]. For example, the authors of [19] proposed a segmented Poisson model that couples a power law with an exponential law to estimate outbreaks. However, judged against the latest evolution of COVID-19 worldwide, the model consistently underpredicts the final epidemic size.
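The saturation behavior of the logistic model can be checked numerically. The sketch below assumes the parameterization Q(t) = a / (1 + exp(b - c (t - t0))), which is one common way of writing the logistic curve; the parameter values are hypothetical and are not fitted to any data set.

```python
import math

def logistic(t, a, b, c, t0=0.0):
    """Cumulative cases under Q(t) = a / (1 + exp(b - c * (t - t0)))."""
    return a / (1.0 + math.exp(b - c * (t - t0)))

# Hypothetical parameters: carrying capacity a, fitting coefficients b and c.
a, b, c = 80000.0, 6.0, 0.25

# The curve reaches half of the carrying capacity at t0 + b / c,
# which is also the point where daily growth is fastest.
inflection_day = b / c
half_capacity = logistic(inflection_day, a, b, c)
late_value = logistic(200.0, a, b, c)  # long after the inflection point
```

Because the curve is bounded by a, a fit made before the inflection point is visible in the data tends to produce an unreliable carrying capacity, which is one plausible reason for the underprediction noted above.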

II-A3 SIR Model

The Susceptible-Infectious-Recovered (SIR) model is a compartmental model that describes the transmission of an infectious disease through individuals who pass through the following three states: susceptible, infectious, and recovered. Their dynamics can be given as follows [12],

$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$

where $\beta$ is the transmission rate, $\gamma$ is the recovery rate, and $N = S + I + R$ is a constant. The basic reproduction number in the SIR model is

$$R_0 = \frac{\beta}{\gamma}.$$

As a popular base model for predicting COVID-19, SIR has many variations in the literature. For example, a modified Susceptible-Exposed-Infectious-Removed (SEIR) epidemiological model was proposed in [20], which introduced move-in, In(t), and move-out, Out(t), parameters to account for the mass population movement in Wuhan during the Chinese New Year. Additionally, a Stochastic SIR model (SSIR) was proposed in [21] that incorporates randomness into the prediction.
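To make the SIR dynamics concrete, the sketch below integrates the standard SIR equations (dS/dt = -beta S I / N, dI/dt = beta S I / N - gamma I, dR/dt = gamma I) with a forward Euler step. The population size and rate values are hypothetical, and the Euler scheme is chosen only for brevity; it is not the fitting procedure used in [12].

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward Euler integration of the SIR equations; returns (t, S, I, R) rows."""
    n = s0 + i0 + r0  # total population, constant over time
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    for k in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history

# Hypothetical parameters: transmission rate 0.3/day, recovery rate 0.1/day.
beta, gamma = 0.3, 0.1
history = simulate_sir(beta, gamma, s0=9990, i0=10, r0=0, days=160)
basic_reproduction_number = beta / gamma
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
```

With a basic reproduction number above one, the infectious compartment first grows, peaks once the susceptible fraction falls below gamma / beta, and then declines.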

II-A4 MetaWards

The author in [22] adapted an existing stochastic metapopulation model of disease transmission to predict the likely timing of the peak of the COVID-19 epidemic in England and Wales. The population was divided into electoral wards in this model, and the author assumed that individuals would contribute to the force of infection in their "home" ward during the night and in their "work" ward during the day. To estimate a potential decrease in the transmission rate during the summer months, the author replaced the constant transmission rate with a time-varying transmission rate. The equation of the transmission rate is

$$\beta(t) = \beta \left[ 1 - \frac{m}{2} \left( 1 - \cos\frac{2 \pi t}{365} \right) \right],$$

where $t$ is the time in days and $m$ represents the magnitude of the seasonal difference in transmission, ranging from 0 (no seasonality) to 1 (maximum seasonality with no transmission at the peak of the summer).

However, besides the seasonal factors, the MetaWards model fails to consider COVID-19 interventions, which results in significantly overestimated trends of infections.

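II-A5 SIDARTHE

A more comprehensive model was proposed in [23]. The SIDARTHE model considers multiple stages of the infection, namely: S, susceptible (uninfected); I, infected (asymptomatic or paucisymptomatic infected, undetected); D, diagnosed (asymptomatic infected, detected); A, ailing (symptomatic infected, undetected); R, recognized (symptomatic infected, detected); T, threatened (infected with life-threatening symptoms, detected); H, healed (recovered); E, extinct (dead). Fig. 2 illustrates the stage transitions of SIDARTHE.

Fig. 2: SIDARTHE Model (Fig. 1 in [23])

Specifically, it consists of eight ordinary differential equations, modeling the evolution of the population in each stage over time.

$$\begin{aligned}
\dot{S}(t) &= -S(t)\,\big(\alpha I(t) + \beta D(t) + \gamma A(t) + \delta R(t)\big) \\
\dot{I}(t) &= S(t)\,\big(\alpha I(t) + \beta D(t) + \gamma A(t) + \delta R(t)\big) - (\epsilon + \zeta + \lambda)\, I(t) \\
\dot{D}(t) &= \epsilon I(t) - (\eta + \rho)\, D(t) \\
\dot{A}(t) &= \zeta I(t) - (\theta + \mu + \kappa)\, A(t) \\
\dot{R}(t) &= \eta D(t) + \theta A(t) - (\nu + \xi)\, R(t) \\
\dot{T}(t) &= \mu A(t) + \nu R(t) - (\sigma + \tau)\, T(t) \\
\dot{H}(t) &= \lambda I(t) + \rho D(t) + \kappa A(t) + \xi R(t) + \sigma T(t) \\
\dot{E}(t) &= \tau T(t)
\end{aligned}$$

where the state variables (uppercase Latin letters) are the population fractions in each stage and the parameters (Greek letters) are positive numbers. $\alpha$, $\beta$, $\gamma$ and $\delta$ are the transmission rates due to contacts between S and I, D, A and R, respectively.

$\epsilon$ and $\theta$ are the detection probabilities of asymptomatic and symptomatic cases, respectively.

$\zeta$ and $\eta$ represent the probability rates at which an infected subject, respectively not aware and aware of being infected, develops clinically relevant symptoms. $\mu$ and $\nu$ denote the probabilities that undetected and detected infected subjects, respectively, develop serious symptoms. $\tau$ is the mortality rate. $\lambda$, $\rho$, $\kappa$, $\xi$ and $\sigma$ are the recovery rates for the patients in the five infected classes (I, D, A, R and T, respectively).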
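The eight-compartment structure can be sketched as a forward Euler integrator. The code below assumes the SIDARTHE equations in the form given in [23] (infection of S by I, D, A and R; detection, symptom onset, worsening, recovery and death as linear flows). The parameter values and the initial state are illustrative placeholders, not the fitted values of the original study, and the authors' released code should be preferred for any real analysis.

```python
def sidarthe_step(state, p, dt):
    """One forward Euler step of the eight SIDARTHE equations (population fractions)."""
    S, I, D, A, R, T, H, E = state
    force = S * (p["alpha"] * I + p["beta"] * D + p["gamma"] * A + p["delta"] * R)
    derivs = (
        -force,                                                      # S
        force - (p["epsilon"] + p["zeta"] + p["lambda"]) * I,        # I
        p["epsilon"] * I - (p["eta"] + p["rho"]) * D,                # D
        p["zeta"] * I - (p["theta"] + p["mu"] + p["kappa"]) * A,     # A
        p["eta"] * D + p["theta"] * A - (p["nu"] + p["xi"]) * R,     # R
        p["mu"] * A + p["nu"] * R - (p["sigma"] + p["tau"]) * T,     # T
        p["lambda"] * I + p["rho"] * D + p["kappa"] * A
        + p["xi"] * R + p["sigma"] * T,                              # H
        p["tau"] * T,                                                # E
    )
    return tuple(x + dt * dx for x, dx in zip(state, derivs))

def simulate(state, params, days, dt=0.05):
    """Integrate for the given number of days and return the full trajectory."""
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        state = sidarthe_step(state, params, dt)
        trajectory.append(state)
    return trajectory

# Illustrative placeholder parameters (assumptions, not fitted values).
params = {
    "alpha": 0.570, "beta": 0.011, "gamma": 0.456, "delta": 0.011,
    "epsilon": 0.171, "theta": 0.371, "zeta": 0.125, "eta": 0.125,
    "mu": 0.017, "nu": 0.027, "tau": 0.01,
    "lambda": 0.034, "rho": 0.034, "kappa": 0.017, "xi": 0.017, "sigma": 0.017,
}
i0 = 3.3e-6  # hypothetical initial fraction of undetected infected
initial = (1.0 - i0, i0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
trajectory = simulate(initial, params, days=60)
final = trajectory[-1]
```

Since every outflow of one compartment is an inflow of another, the eight fractions always sum to one, which is a useful sanity check on any implementation.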

(a) Predicted epidemic evolution
(b) The effect of lockdown
(c) The effect of testing
Fig. 3: Followup of the SIDARTHE model with 70 days post-publication data

II-B Post Publication Model Evaluation

The SIDARTHE model utilizes many more parameters than the previously mentioned models. Based on the code provided by the authors, Fig. 3 plots the midterm evolution of the pandemic in Italy, with three additional curves from the latest data: Real Diagnosed Cumulative Cases (black), Real Diagnosed Total Infected (pink) and Real Diagnosed Total Recovered (light blue). Note that the model uses data from day 0 to day 45 to obtain the best parameter set. The solid lines (except the additional three) are estimates of the actual pandemic, and the dotted lines are the estimates of the diagnosed pandemic.

As we can see from Fig. 3(a), the predicted values of diagnosed cumulative cases are always lower than the real values from day 45 to day 116. However, the difference between the two curves shrinks over time. It is likely that the restrictions were taking effect gradually. Fig. 3(b) plots the prediction under a stricter lockdown enforcement. The curve flattens more quickly than the real case (black curve). Fig. 3(c) assumes a milder lockdown with widespread testing and contact tracing. It fits the real curve well with a stable gap, which reflects the shortage of tests and contact tracing. Additionally, the real diagnosed total number of recovered cases is higher than expected. This is because more resources, such as ventilators, became available to the patients.

Almost all reviewed models fit the curve very well before their publication date; however, the performance of their estimations after publication has yet to be examined. In Table I, we present the post-publication statistics of 13 models and 17 countries published from Feb. 28 to Jun. 6. For the papers, Table I collects the Predicted Peak Date (P. Peak Date), Predicted Peak Value (P. Peak Value, the largest number of daily new cases) and Predicted Size (P. Size, i.e., the predicted final size of the pandemic). From the database of the World Health Organization [24], we queried the data (on Jun. 18, 2020) for the Real Peak Date (R. Peak Date, the date of the largest number of daily new cases), Real Peak Value (R. Peak Value, the largest number of daily new cases) and Current Size (the current total number of infections). We can see from the table that the models in [13, 20], which make predictions for China, produce reasonably good results for the predicted peak date and size. This is because, when they were published in late February and mid-March, the curve of infections in China had already flattened and daily cases were declining, which means that China was on track towards the end of the pandemic and, thus, the models had enough data for training and fitting the curve. Outside China, COVID-19 was still spreading quickly in many countries in late March and early April. For example, on April 8, the model in [14] predicted that the final pandemic size would be 71950, 36240, 10420, 85750, 85750, 41850, 61420 and 1560 for Italy, Iran, South Korea, Germany, France, USA, Spain and Japan, respectively. Compared with the latest data on Jun. 18, however, most of these numbers are significant underestimates, by 230%, 438%, 17.6%, 118%, 313%, 4900%, 297% and 1000% for these countries, respectively. The most accurate prediction was for South Korea since, at the time of publication, the trend in South Korea was clear enough for the model.
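The underestimation percentages quoted above appear consistent with a relative error of the form (current size - predicted size) / predicted size. This definition is an assumption inferred from the numbers rather than a formula stated in the text. The sketch below checks it for Italy and the USA, using the predicted sizes quoted from [14] and the Current Size values listed in Table I.

```python
def underestimation_pct(predicted, actual):
    """Relative shortfall of a prediction, in percent of the predicted value."""
    return (actual - predicted) / predicted * 100.0

# Predicted final sizes from [14] (quoted above) and Current Size from Table I.
predicted_size = {"Italy": 71950, "USA": 41850}
current_size = {"Italy": 237500, "USA": 2098106}

shortfall = {
    region: underestimation_pct(predicted_size[region], current_size[region])
    for region in predicted_size
}
```

Under this definition, a value of 100% means that the real size is already twice the predicted final size.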

| Model | Publish Date | Region | P. Peak Date | P. Peak Value | P. Size | R. Peak Date | R. Peak Value | Current Size |
|---|---|---|---|---|---|---|---|---|
| SEIR [20], LSTM [20] | 02/28/20 | China | | | | 02/13 | 15152 | 84867 |
| Logistic Model [13] | 03/16/20 | China | 02/06/20 | 8000 | 80261 | 02/13 | 15152 | 84867 |
| SIR [25] | 03/16/20 | China | 02/27/20 | N/A | 120000 | 02/13 | 15152 | 84867 |
| SIDARTHE [23] | 03/22/20 | Italy | 03/15 | N/A | 181080 | 03/21 | 6557 | 237500 |
| SIR [12] | 04/06/20 | India | 04/12/20 | 1500 | 13000 | 06/14 | 11929 | 354065 |
| Adjusted SEIR [14] | 04/08/20 | South Korea | | | | | | |
| Segmented Poisson [19] | | | | | | | | |
| Gaussian [26] | 04/27/20 | | | | | | | |
| Modified SEIR [27] | 04/29/20 | | | | | | | |
| SEIRQRP [28] | 04/29/20 | USA | 05/18 | N/A | 820000 | 04/26 | 38509 | 2098106 |
| Nonlinear LR model [29] | 05/13/20 | | | | | | | |
| SIR [30] | 06/06/20 | Algeria | 4/13 | 106 | 244400 | 04/02 | 263 | 11147 |

TABLE I: Epidemic Prediction Post Publication Evaluation (as of June 18, 2020)

III Clinical Characteristics and Diagnosis

The retrospective cohort studies reviewed in this section seek faster and more reliable diagnosis methods for COVID-19 and more accurate conclusions concerning the clinical characteristics and mortality risk factors of patients with confirmed COVID-19 infection. We review the literature from traditional meta-analysis and from artificial intelligence aided analysis.

III-A Meta Analysis

The authors in [31, 32] screened medical databases including PubMed [33], Cochrane Library [34], Embase [35], Scopus [36], and Google Scholar [37]. They collected the relevant literature dated up to February 24 [31] and May 1, 2020 [32], and then conducted a meta-analysis, a quantitative, formal procedure that aggregates, integrates, and reanalyzes the results of several independent studies.

As a subset of systematic reviews, meta-analyses attempt to collate empirical evidence fitting previously specified criteria to provide a more precise estimate of the effect of a treatment or of the risk factors concerning a disease [38]. PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses), which contains a 27-item checklist and a four-phase flow diagram, is a guide to improve the reporting of systematic reviews and meta-analyses [39]. The forest plot is commonly used to present the results of meta-analyses, where each study is shown with its effect size and corresponding 95% confidence interval. The risk ratio (or relative risk) and the odds ratio are the two most common measures of effect used for dichotomous data in meta-analysis, while the standardized mean difference (SMD) is the dominant measure used for continuous data. The random-effects model in meta-analysis assumes that the true treatment effect differs across studies and generates an estimate of the average treatment effect [40]. A major benefit of meta-analysis is the ability to examine the degree of heterogeneity among studies. A statistical test such as Cochran's Q test is used to indicate the extent of heterogeneity. The author in [41] developed measures for the impact of heterogeneity and proposed three suitable statistics:


$$H = \sqrt{\frac{Q}{df}},$$

where $Q$ is the heterogeneity statistic and $df$ is its degrees of freedom;

$$R = \frac{\mathrm{SE}_{\text{random}}}{\mathrm{SE}_{\text{fixed}}},$$

which represents the ratio of the standard error of the underlying mean from a random-effects meta-analysis to the standard error of a fixed-effect meta-analytic estimate; and

$$I^2 = \frac{Q - df}{Q} \times 100\%,$$

the inconsistency index that describes the proportion of total variation in study estimates that is due to heterogeneity.
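The heterogeneity measures can be computed directly from per-study effect sizes and variances. The sketch below uses the usual inverse-variance definition of Cochran's Q, with H = sqrt(Q / df) and I² = (Q - df) / Q; the effect sizes and variances are hypothetical and do not come from any study cited here.

```python
import math

def heterogeneity(effects, variances):
    """Return the fixed-effect pooled estimate, Cochran's Q, H and I^2 (in percent)."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    h = math.sqrt(q / df)
    # I^2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, h, i2

# Hypothetical log odds ratios and their variances from four studies.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.01, 0.02, 0.01, 0.02]
pooled, q, h, i2 = heterogeneity(effects, variances)
```

An I² near zero suggests that the variation between studies is compatible with sampling error alone, while large values indicate that a random-effects model is the safer choice.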

The Newcastle-Ottawa Scale (NOS) [42] was used to evaluate all literature, with the highest quality of literature scoring nine stars. Articles with the NOS score of higher than five stars were considered high-quality publications in the study. The random-effects model for meta-analysis was used to reduce the influence of heterogeneity between the included studies in the final conclusion [31].

A total of 284 articles were retrieved; 39 papers were eliminated due to repeated retrieval, 212 after reading the abstracts, and 23 after reading the full text. The remaining 10 articles [43, 44, 45, 46, 47, 48, 49, 50, 51, 52], which include data from 50,466 patients, were analyzed in [31]. The original data were transformed by the double arcsine method to make them conform to the normal distribution, and the initial conclusion was then restored via the formula

$$P = \left( \sin\frac{tp}{2} \right)^2,$$

where $tp$ is the transformed value, to reach the final conclusion. The Egger test was performed to assess publication bias, where P values above the significance level were considered as demonstrating no publication bias. The statistical software Stata version 12.0 was used to carry out the single-arm meta-analysis, and the results are presented in Table II together with the Egger test results, which indicate that a publication bias existed in the meta-analysis of the ARDS (Acute Respiratory Distress Syndrome) group.
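The back-transformation can be checked against the confidence intervals reported in Table II. The sketch below assumes the simplified inverse of the double arcsine transformation, p = sin²(t / 2), which is an inference from the reported numbers; under that assumption, the first transformed interval in the table (2.26 to 2.67) maps onto the corresponding adjusted interval (0.818 to 0.945).

```python
import math

def back_transform(t):
    """Map a double-arcsine-transformed value back to a proportion (simplified inverse)."""
    return math.sin(t / 2.0) ** 2

# First pair of intervals listed in Table II: transformed scale and adjusted scale.
ci_transformed = (2.26, 2.67)
ci_proportion = tuple(round(back_transform(t), 3) for t in ci_transformed)
```

The same check reproduces the other adjusted intervals in the table, for example 1.89 to 2.17 maps to roughly 0.657 to 0.782.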

| Symptom | Meta-analysis (95% CI) | Adjusted results (95% CI) | P |
|---|---|---|---|
| Fever | 2.26 - 2.67 | 0.818 - 0.945 | |
| Cough | 1.89 - 2.17 | 0.657 - 0.782 | |
| Muscle soreness or fatigue | 0.96 - 1.88 | 0.213 - 0.652 | |
| ARDS | 0.43 - 1.15 | 0.046 - 0.296 | |
| Abnormal chest CT | 2.57 - 2.97 | 0.921 - 0.993 | |
| Patient in critical condition | 0.73 - 1.03 | 0.127 - 0.243 | |
| Death of patient | 0.33 - 0.50 | 0.027 - 0.061 | |

TABLE II: Clinical Characteristics for Patients with Confirmed COVID-19 via Meta-analysis [31]

The three most common symptoms among people who were hospitalized with confirmed COVID-19 infection are fever (89.1%), cough (72.2%), and muscle soreness or general fatigue (42.5%). Diarrhea, hemoptysis, headache, sore throat, shock, and other symptoms are rare. 14.8% of patients had ARDS; 18.1% of all infected cases were defined as severe cases, and the mortality rate was 4.3%. Chest CT scans were generally performed at the time of admission, and almost all patients (96.6%) showed abnormal results. Representative radiology findings in COVID-19 patients are shown in Fig. 4 and Fig. 5. The common patterns on chest CT scans of patients with COVID-19 infection were ground-glass opacity and bilateral patchy shadowing (Fig. 4), and bilateral multiple lobular and subsegmental areas of consolidation were found on the typical chest CT images of severe cases [43].

Fig. 4: Representative chest radiographic manifestations in a non-severe and a severe case with COVID-19 [46]
Fig. 5: Representative Chest Radiographs and CT Images of a Critically Ill COVID-19 patient in Seattle Region USA [53]

The authors in [32] systematically reviewed the available evidence on the association between age, gender, hypertension, diabetes, chronic obstructive pulmonary disease (COPD), cardiovascular disease (CVD), and the risk of death due to COVID-19 infection, and summarized the findings by meta-analysis. The classic Cochran's Q test [54] was performed to examine the heterogeneity across studies, where a P value below the significance threshold was considered to demonstrate such heterogeneity. Egger's test was used to assess publication bias, and all statistical analyses were conducted with Stata, version 14.0.

A total of 14 studies (twelve conducted in China [46, 55, 56, 57, 58, 59, 44, 60, 61, 62, 63, 64], one in Italy [65], and one in Iran [66]) with 29,909 COVID-19 infected patients and 1,445 cases of death were included in the research [32], and the meta-analysis results are presented in Table III. The results of Egger's test demonstrated that the hypothesis on the association of demographic characteristics and comorbidities with COVID-19 mortality did not depend on a single study. The findings of [32] supported the hypothesis that patients who were older than 65 years, male, or had coexisting disorders including hypertension, CVDs, diabetes, COPD, and cancer were at a higher risk of mortality from COVID-19 infection.

| Risk factor | Odds ratio, 95% CI | I² (%) | P (heterogeneity) | P (Egger's test) |
|---|---|---|---|---|
| Age, older vs. younger than 65 years | 2.61 - 8.04 | 67.10 | 0.01 | 0.185 |
| Gender, male vs. female | 1.06 - 2.12 | 76.30 | 0.002 | 0.388 |
| Hypertension, yes vs. no | 1.40 - 5.24 | 92.6 | 0.001 | 0.065 |
| Cardiovascular diseases, yes vs. no | 1.77 - 7.83 | 89.10 | 0.001 | 0.068 |
| Diabetes, yes vs. no | 1.05 - 5.51 | 93.60 | 0.001 | 0.117 |
| Chronic obstructive pulmonary disease (COPD), yes vs. no | 1.79 - 6.96 | 72.20 | 0.001 | 0.178 |
| Cancer, yes vs. no | 1.80 - 5.14 | 41.60 | 0.114 | 0.054 |

TABLE III: Mortality Risk Factors for Patients with Confirmed COVID-19 via Meta-analysis [32]

III-B Artificial Intelligence Aided Analysis

A confirmed case of COVID-19 infection is routinely defined as a positive result on high-throughput sequencing or RT-PCR assay of nasal and pharyngeal swab specimens [46]. However, the RT-PCR test has three limitations:

  • The process is very slow and can take up to two days to complete.

  • Serial testing may be required to eliminate the possibility of false negative results.

  • In some areas, there is a shortage of RT-PCR test kits.

Those challenges underscore the urgent need for alternative methods of rapid and accurate diagnosis of patients with COVID-19.

Based on initial chest CT scans and associated clinical information (including epidemiological history, leukocyte counts, symptomatology, patient age and sex), the authors of [67] designed a deep learning based model that could rapidly identify COVID-19 positive patients in the early stages. A deep convolutional neural network (CNN) was first developed to learn the imaging characteristics of COVID-19 patients on the initial CT scan. Support vector machine (SVM), random forest, and multilayer perceptron (MLP) classifiers were then used to classify COVID-19 patients based on clinical information; the MLP showed the best performance on the tuning set, so only the MLP performance is reported hereafter. Finally, a neural network model combining radiology data and clinical information was generated to predict COVID-19 infection status. The generated models were evaluated on the test set, and their performance was compared to that of one fellowship-trained thoracic radiologist and one thoracic radiology fellow (Table IV). Two-sided P values were calculated by comparing the sensitivity, specificity, and area under the curve (AUC) between each pair of models. The CIs of the AUC were calculated with the DeLong method [68]. For sensitivity and specificity, the exact Clopper-Pearson method [69] was used to compute the 95% CIs shown in parentheses, and the exact McNemar's test [70] was used for the P values.
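The exact Clopper-Pearson interval for a proportion such as sensitivity can be obtained by inverting the binomial distribution. The sketch below does this with a plain bisection search using only the standard library; it illustrates the method and is not the implementation used in [67]. The count 17 out of 25 is taken from the joint-model row of Table IV purely as example input.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect(func, target, increasing, iterations=100):
    """Solve func(p) = target on [0, 1] for a monotone function func."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper

low, high = clopper_pearson(17, 25)
```

The interval is conservative: its coverage is at least 95% for every true proportion, which is why it is wider than normal-approximation intervals at small sample sizes.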

normal CT
AUC, P Sensitivity, P Specificity, P
0 / 25
(80.0, 88.4),
(66.4, 81.7),
P 0.0501
(88.5, 97.1),
P 0.005
0 / 25
(68.3, 78.0),
(47.1, 64.5),
P 0.0004
(84.3, 94.6),
P 0.090
CNN Model 13 / 25
(0.821, 90.7),
P 0.00146
(76.2, 89.4),
P 1.00
(68.1, 82.6),
MLP Model 16 / 25
(0.746, 84.9),
P 0.0004
(72.9, 86.9),
P 0.442
(60.0, 75.8),
P 0.0004
Joint Model 17 / 25
(88.7 94.8),
(0.771, 90.0),
(0.756, 88.5),
TABLE IV: Models in [67]

The proposed joint AI algorithm [67] combined with both clinical data and CT imaging performed well in sensitivity (84.3%) and specificity (82.8%), and achieved an AUC of 0.92. It can be hypothesized that AI systems will help to rapidly diagnose COVID-19 infected patients when chest CT scans and associated clinical history are available, and therefore help in training the health system and combating the COVID-19 pandemic.

Based on the images produced by X-rays and CT scans, researchers have attempted to design COVID-19 specific deep neural networks to increase the accuracy of the diagnosis [71, 72, 73, 74]. Because the available data sets are very limited, the authors of [74] used transfer learning to train the deep CNNs. First, they applied transfer learning to different CNN models, such as VGG-19 [75], MobileNet v2 [76], Inception V4 [77] and Xception [78]. Then, the two most accurate models, MobileNet v2 and VGG-19, were selected for COVID-19 classification on a data set of 224 images of confirmed COVID-19 cases, 700 images of confirmed common bacterial pneumonia, and 504 images without disease.

MobileNet v2
TABLE V: Models in [74]

Furthermore, COVID-Net [71] is designed so that its predictions can be explained, exposing the critical factors associated with positive cases. This helps clinicians improve screening and, at the same time, allows COVID-Net to be audited in a responsible and transparent manner to ensure that only relevant information from the chest X-ray (CXR) images is leveraged in decision making.

While many efforts have been made to utilize artificial intelligence assisted analysis in combating COVID-19, the biggest challenge in the field is the shortage of data sets. We summarize the existing data sets that are publicly available below.

  • COVIDx [79]: A combined data set from five different sources that contains chest radiography images of 7966 normal, 5451 pneumonia, and 258 COVID-19 patients.

  • Italian radiological cases [80]: It contains 115 COVID-19 patients with detailed symptomatology and images at different stages.

  • BIMCV-COVID19+ [81]: A large data set with chest X-ray images and CT imaging of COVID-19 patients along with their radiographic findings, pathologies, polymerase chain reaction, immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests, and radiographic reports. Currently, it includes 1380 CX, 885 DX, and 163 CT studies.

IV Policy Responses and Effectiveness

As the COVID-19 pandemic continues to spread, governments and international organizations are implementing various policies, which aim to deliver systematic, effective, and coordinated responses to flatten the curve, save lives and restart the economy. The following items summarize basic policy responses that were widely implemented by governments globally.

  • Social distancing: Keep space (e.g., 6 feet) between yourself and other people outside of your home. This implies reduced capacity for indoor businesses and activities, such as restaurants and schools.

  • School closures: In most schools, it is impossible to maintain social distancing in the classroom. Most schools were closed in response to the pandemic and transitioned to online lecturing.

  • Travel restrictions: Stop non-essential travel, impose travel bans on specific countries or regions, and close borders, e.g., the US-Canada and US-Mexico borders were closed on March 18.

  • Face covering and mask requirement: Cloth face coverings are required when not working alone, and masks should be worn when interacting with the public.

  • Stay at home: Except for essential workers, everyone should remain at home and away from other people unless it is absolutely necessary to go out (e.g., grocery shopping and doctor visits). Note that California has a similar policy named shelter in place.

  • Phased reopening: Based on government evaluations, the economy reopens following a phased structure in which each phase lasts around two weeks to allow further evaluation.

Fig. 6 plots the timeline of key policy interventions from the governments of New York (NY) State, California (CA) State, Italy, Sweden and the United States (federal), along with commonly used analytical data sources: Google Coronavirus Search Trends [82], Daily Infection Curves [2] and the Google Community Mobility Report [83]. The Google search interest reflects the degree of public attention in each region: the interest values stay at a high level in Italy starting from early March, whereas in NY and CA the interest started fluctuating in mid-March. When various policies, such as different levels of stay-at-home orders, were implemented in these regions, community mobility decreased quickly in NY, CA and Italy for workplaces and retail/recreation, which were not recommended or were prohibited under the orders. The degree of the decrease can reflect the level of restrictions. For example, in Sweden there was roughly a 35% reduction in workplace mobility, whereas the value in Italy was 75%. This is because Sweden implemented a partial stay-at-home policy, which only recommended that vulnerable people (e.g., seniors) stay at home.

Fig. 6: Comparison of Google Trends, Community Traffic and Daily Infections of New York, California, Italy and Sweden along with the timeline of the policies

IV-A Social Distancing Policies

Mobility data is used to gauge the effectiveness of social distancing and stay-at-home orders, as well as how well they were followed. The authors in [84] ranked different intervention policies by effectiveness using a difference-in-differences methodology, location-based mobility data, and daily state-level data on COVID-19 tests and confirmed cases. The mobility data collected by the Google Community Mobility Reports [83] was split into county-level and state-level data. Since most of the intervention policies were implemented statewide, the authors [84] based their analysis on the state-level data, which covered movements in 50 states and the District of Columbia over 29 days, for 1479 observations. Daily temperature data for the 5 biggest cities in each state was web-scraped from Weather Underground [85] (a commercial weather service with real-time weather information). The daily state-level numbers of tests and positive cases were collected from the COVID Tracking Project website for March 9 to April 20, 2020; the authors [84] had data on all 50 states and the District of Columbia for 43 days, providing 2193 observations.
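The observation counts quoted above are what a balanced jurisdiction-by-day panel would give. The short check below is only a sketch, assuming 51 jurisdictions (50 states plus the District of Columbia) and date ranges that include both end points; the function name is illustrative.

```python
from datetime import date

# 50 states plus the District of Columbia.
N_JURISDICTIONS = 51

def panel_size(n_units: int, n_days: int) -> int:
    """Number of rows in a balanced unit-by-day panel."""
    return n_units * n_days

# Testing and case data: March 9 to April 20, 2020, both ends included.
testing_days = (date(2020, 4, 20) - date(2020, 3, 9)).days + 1

mobility_obs = panel_size(N_JURISDICTIONS, 29)           # 29 days of mobility data
testing_obs = panel_size(N_JURISDICTIONS, testing_days)  # testing and case panel

print(testing_days, mobility_obs, testing_obs)
```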

A linear regression model with a difference-in-differences methodology was used by the authors in [84] to evaluate the effect of the COVID-19 policies. A binary variable was defined for each policy, set to one if a given state adopts that policy after a certain day during the sample period, and to zero otherwise. The regression equation takes the form:

$$Y_{st} = X_{st}\beta + \gamma T_{st} + \delta_s + \theta_t + \varepsilon_{st}$$

where $Y_{st}$ is the change in visits to various places in state $s$ on day $t$; $X_{st}$ represents the matrix of COVID-19 policy indicators; $T_{st}$ indicates the state-level mean daily temperature; $\delta_s$ and $\theta_t$ are sets of state and day-of-the-month fixed effects; $\beta$ and $\gamma$ are the fitted coefficients; and $\varepsilon_{st}$ is the error term.
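The binary policy indicator and the difference-in-differences comparison can be illustrated on a toy panel. The sketch below is not the authors' regression: it uses hypothetical data for two made-up states and computes the simplest two-group, two-period difference-in-differences estimate, without the temperature control or fixed effects.

```python
from statistics import mean

# Hypothetical toy panel: (state, day, change in time spent at home, in percent).
# State "A" adopts a stay-at-home order on day 3; state "B" never does.
ADOPTION_DAY = {"A": 3}
rows = [
    ("A", 1, 2.0), ("A", 2, 3.0), ("A", 3, 12.0), ("A", 4, 13.0),
    ("B", 1, 1.0), ("B", 2, 2.0), ("B", 3, 4.0), ("B", 4, 5.0),
]

def policy_indicator(state: str, day: int) -> int:
    """1 once the state has adopted the policy, 0 otherwise."""
    start = ADOPTION_DAY.get(state)
    return int(start is not None and day >= start)

def group_mean(treated: bool, post: bool, cutoff: int = 3) -> float:
    """Mean outcome for one of the four treated/control x pre/post cells."""
    return mean(
        y for state, day, y in rows
        if (state in ADOPTION_DAY) == treated and (day >= cutoff) == post
    )

# Change in the treated state minus change in the control state.
did = (group_mean(True, True) - group_mean(True, False)) - (
    group_mean(False, True) - group_mean(False, False)
)
print(did)
```

Subtracting the control-state change removes shifts that affected both states at the same time, which is the role the day fixed effects play in the full regression.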

To estimate the effect of COVID-19 policies on the number of confirmed cases, the study used a Poisson regression model of the form:

$$E[C_{st}] = \exp\left(X_{st}\beta + \lambda \log(N_{st}) + \delta_s + \theta_t\right)$$

where $C_{st}$ is the state-level daily number of confirmed cases, $N_{st}$ is the number of tests conducted, $X_{st}$ is the matrix of policy indicators, and $\delta_s$ and $\theta_t$ are state and day fixed effects. Since the confirmed cases in each state depended heavily on the number of COVID-19 tests conducted, the log-transformed test count was used so that its estimated coefficient $\lambda$ can be interpreted as an elasticity. The results demonstrated that statewide stay-at-home orders significantly increased the measure of presence at home, by about sixfold relative to states without the policy. Although policies such as non-essential business closures and restaurant and bar limits had a positive and statistically significant impact on presence at home, their effect sizes were about half of those observed for stay-at-home orders. Meanwhile, there was a steady decline in the number of daily confirmed COVID-19 cases starting 10-15 days after such policies were implemented.
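The elasticity reading of a log-transformed covariate in a Poisson (log-link) model can be checked numerically. The coefficients below are hypothetical and chosen only for illustration; the point of the sketch is that a 1% increase in tests changes the expected case count by roughly the elasticity, in percent.

```python
import math

def expected_cases(tests: float, intercept: float, elasticity: float) -> float:
    """Poisson mean with a log link and log(tests) as the only covariate."""
    return math.exp(intercept + elasticity * math.log(tests))

# Hypothetical coefficients, not estimates from the surveyed study.
INTERCEPT, ELASTICITY = -2.0, 0.8

base = expected_cases(10_000, INTERCEPT, ELASTICITY)
more = expected_cases(10_100, INTERCEPT, ELASTICITY)  # 1% more tests
pct_change = 100 * (more / base - 1)

print(round(pct_change, 3))
```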

Similarly, a research group at the University of Wisconsin-Madison [86] used two social distancing metrics:

  • The median of individual maximum travel distance.

  • The home dwell time.

The data are derived from large-scale mobile phone location data provided by Descartes Labs [87] and SafeGraph [88]. The metrics are used to evaluate the effectiveness of a series of stay-at-home policies in decelerating the spread of the COVID-19 epidemic, using mathematical curve-fitting models and mechanistic epidemic prediction models. Their results [86] confirmed that state-implemented stay-at-home orders increased the amount of time spent at home and that increased home dwell time helped to decrease the number of daily COVID-19 cases. In conclusion, both studies [84, 86] confirmed that the number of positive daily COVID-19 cases decreased as more stay-at-home policies were implemented, with stay-at-home orders being the most effective and bar and restaurant closings being the least effective.

Using metro traffic data to compare the epidemics in the two major cities with the largest numbers of reported COVID-19 cases (Daegu and Seoul), the authors in [89] described the potential roles of social distancing in mitigating the spread of COVID-19 in South Korea. The authors collected daily numbers of reported cases in the two regions from the Korea Centers for Disease Control and Prevention (KCDC) between January 20 and March 16, 2020, together with daily metro traffic in the two cities between 2017 and 2020.

The time-dependent reproduction number $R_t$, which represents the average number of secondary cases caused by an average individual given conditions at time $t$, was estimated using the following equation with a 14-day sliding window:

$$R_t = \frac{I_t}{\sum_{k=1}^{14} I_{t-k}\, w_k}$$

where $I_t$ is the reconstructed incidence time series (i.e. the number of infected cases on day $t$) and $w_k$ represents the generation-interval distribution, randomly drawn from a prior distribution.
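A common way to compute a time-dependent reproduction number is to divide incidence on day t by past incidence weighted by the generation-interval distribution. The sketch below follows that renewal-equation form with fixed weights; the 14 weights and both incidence series are hypothetical, and the sampling of the generation interval from a prior is omitted.

```python
def estimate_rt(incidence, gen_interval):
    """Ratio of new cases to past cases weighted by the generation interval.

    incidence:    daily case counts, oldest first
    gen_interval: weights w_1..w_K (share of transmission occurring k days
                  after the infector was infected), summing to 1
    Returns a list aligned with `incidence`; entries are None where there is
    not enough history or the weighted past incidence is zero.
    """
    K = len(gen_interval)
    rt = []
    for t in range(len(incidence)):
        if t < K:
            rt.append(None)
            continue
        pressure = sum(incidence[t - k] * gen_interval[k - 1] for k in range(1, K + 1))
        rt.append(incidence[t] / pressure if pressure > 0 else None)
    return rt

# Hypothetical 14-day generation-interval weights (triangular shape, normalised).
raw = [min(k, 15 - k) for k in range(1, 15)]
weights = [x / sum(raw) for x in raw]

flat = [20] * 30                                      # constant incidence
growing = [round(10 * 1.1 ** d) for d in range(30)]   # roughly 10% daily growth

print(estimate_rt(flat, weights)[-1], estimate_rt(growing, weights)[-1])
```

With constant incidence the ratio is 1, and with growing incidence it exceeds 1, which matches the usual reading of the threshold.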

After comparing the reconstructed incidence and estimates of $R_t$ in Daegu and Seoul, the results showed that the estimates of $R_t$ gradually decreased and eventually dropped below 1 about one week after the first case was reported, while the metro traffic volume decreased at the same time. Clear positive correlations between the normalized traffic and the median estimates of $R_t$ were found in both Daegu ($r$ = 0.90; 95% CI: 0.79-0.95) and Seoul ($r$ = 0.76; 95% CI: 0.59-0.87), which indicated that lower metro ridership and less travel were associated with reduced spread of the virus.
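The correlation between normalized traffic and the median reproduction number estimates is an ordinary Pearson coefficient. The sketch below computes it from scratch on hypothetical weekly values; it does not reproduce the reported coefficients or their confidence intervals.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly values: normalised metro traffic and median R_t.
traffic = [1.00, 0.92, 0.75, 0.60, 0.52, 0.50]
median_rt = [2.1, 1.9, 1.3, 0.9, 0.8, 0.7]

r = pearson_r(traffic, median_rt)
print(round(r, 2))
```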

IV-B Travel Restriction, School Closure and Large-scale Lockdown

To reduce the spread of COVID-19 in China, restrictions on mobility (hereafter called the cordon sanitaire) were imposed on Wuhan City, Hubei province, on January 23, 2020 [90]. To elucidate the role of case importation in transmission in cities across China, the authors collected real-time mobility data from Baidu Inc., together with epidemiological data from each province and detailed case data with reported travel history. These data help to ascertain the impact of control measures. Three generalized linear models (GLMs) were built to evaluate hypotheses regarding the effect of mobility and testing on COVID-19 dynamics: models 1 and 2 were a Poisson GLM and a negative binomial GLM, respectively, estimating daily case counts, whereas model 3 used a log-linear regression to estimate daily cumulative cases.

The findings in [90] confirmed that travel restrictions were particularly helpful in the early stage of an outbreak, when it was more confined, but became less effective as the outbreak became more widespread. The real-time human mobility data from Baidu Inc. showed the expected decline in importation after the establishment of the cordon sanitaire. Since the travel bans prevented travel into and out of Wuhan around the time of the Lunar New Year celebration, they may have helped to reduce further dissemination of COVID-19 from Wuhan. The study also estimated COVID-19 growth rates in all provinces other than Hubei and found that they experienced faster growth rates before travel restrictions and substantial control measures were implemented. After the control measures were implemented, growth rates became negative.

The authors in [91] used the example of Japan, the country in Asia that received the largest number of visitors from China, to quantify the impact of the drastic reduction in travel volume on COVID-19 transmission dynamics outside China, and to estimate the reduction in COVID-19 infections and in the chance of an outbreak outside China as a result of such travel policies. The epidemiological datasets of confirmed COVID-19 cases outside China were collected from government and news websites as of February 6, 2020. The authors [91] quantified the impact of the reduction in travel volume in terms of the reduced number of exported cases, the reduced probability of a major epidemic overseas, and the time delay gained before a major epidemic. They set the epidemic start date to December 1, 2019 (Day 0); Wuhan was then put in lockdown from Day 53 (January 23, 2020). Since the mean incubation period of COVID-19 is nearly 5 days, the impact of reduced travel volumes would start to be interpretable from Day 58. A counterfactual model was used to estimate the reduced volume of exported cases, with Poisson regression used to fit the following model through Day 57:

$$E(c_t) = c_0 \exp(rt)$$

where $c_t$ is the incidence of exported cases on Day $t$, $c_0$ is the initial value at $t = 0$, and $r$ is the exponential growth rate of exported cases outside China. The reduction in the number of exported cases by Day 67 was calculated as

$$\sum_{t=58}^{67} \left( E(c_t) - h_t \right)$$

where $h_t$ is the observed number of cases on Day $t$.
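The counterfactual calculation has two steps: fit exponential growth to the pre-intervention counts, then sum the gap between the extrapolated and the observed counts over the evaluation window. The sketch below swaps the Poisson regression for a simpler least-squares fit on log counts, and every number in it is synthetic, so it only illustrates the mechanics.

```python
import math

def fit_exponential(days, counts):
    """Least-squares fit of log(count) = log(c0) + r * day (not a Poisson fit)."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mx, my = sum(days) / n, sum(logs) / n
    r = sum((d - mx) * (l - my) for d, l in zip(days, logs)) / sum(
        (d - mx) ** 2 for d in days
    )
    c0 = math.exp(my - r * mx)
    return c0, r

def prevented_cases(c0, r, observed):
    """Sum of expected-minus-observed counts; `observed` maps day -> count."""
    return sum(c0 * math.exp(r * day) - h for day, h in observed.items())

# Synthetic pre-intervention series generated from c0 = 0.05 and r = 0.1.
pre_days = list(range(40, 58))
pre_counts = [0.05 * math.exp(0.1 * d) for d in pre_days]
c0, r = fit_exponential(pre_days, pre_counts)

# Hypothetical observed counts for Day 58 through Day 67.
observed_after = {d: 10 for d in range(58, 68)}

print(round(c0, 3), round(r, 3), round(prevented_cases(c0, r, observed_after), 1))
```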

According to the calculations and the predicted curve, the expected number of confirmed COVID-19 cases between Day 58 (January 28, 2020) and Day 67 would have been 321 (95% confidence interval: 181, 544), whereas a total of 95 cases were actually diagnosed. Based on these results, the authors estimated that 226 cases (95% CI: 86, 449) were prevented from being exported across the world as a result of the Wuhan lockdown. The researchers also considered the probability of a major epidemic and its possible delay, focusing specifically on Japan. Without travel restrictions, the probability of a major epidemic would be more than 90%, while with restrictions it “broadly ranged from 56% to 98%”. When mobility is limited, the delay (in days) in the time to a major epidemic is decreased. Furthermore, the paper considers the reduction in COVID-19 spread through contact tracing, where the risk reduction reached 37% when 50% of those infected were traced. For Japan, the researchers estimated that the probability of a major epidemic was reduced by 7%-20% and that a 2-day delay was gained in the estimated time to a major epidemic.
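The prevented-case estimate and its interval follow from subtracting the observed count from the expected count and from each bound of its confidence interval. A quick check, using only the figures quoted above:

```python
expected_cases = 321          # projected exported cases, Day 58 to Day 67
expected_ci = (181, 544)      # 95% confidence interval of the projection
observed_cases = 95           # cases actually diagnosed in the same window

prevented = expected_cases - observed_cases
prevented_ci = tuple(bound - observed_cases for bound in expected_ci)

print(prevented, prevented_ci)
```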

IV-C Long-term Impact of COVID-19 Policies

With nearly every state in the United States issuing stay-at-home orders and shutting down schools for the rest of the 2019-2020 academic year due to the COVID-19 pandemic, the long-term effect of these policy responses has attracted many researchers. Fig. 7 plots the unemployment rate in the United States, which reflects the immediate economic impact of the policies.

Fig. 7: Unemployment Rates in United States 2020

During the pandemic, many employees have been unable to travel to work. The authors in [92] investigated how many jobs can be done at home, based on data collected by the Occupational Information Network (O*NET) surveys covering “work context” and “generalized work activities”. They merged this data with data from the Bureau of Labor Statistics on the prevalence of each occupation in the United States. The results showed that about 37% of jobs in the United States can be done from home, and that these jobs account for 46% of all wages.

More than 55 million students are out of school without an explicit expectation of when schools will reopen. Yet education leaders have little information on how the education system has been impacted by school closures. How to model the potential impact of COVID-19 school closures based on existing data has become a critical topic in the field of educational policy.

To project the learning loss caused by school closures, the authors in [93] assumed that the learning loss due to COVID-19 can be treated as an extended version of the learning loss over summer break. They used data from past MAP Growth assessment takers (Grade 3-8 students taking exams in the 2017-18 and 2018-19 school years) to estimate the average summer loss. Using the “typical” school-year growth rates and summer loss as a reference, they built regression models to project the learning loss due to COVID-19. Under these models, students were estimated to obtain approximately 63-68% of the learning gains in reading relative to a typical school year, and 37-50% of the learning gains in math, if they were able to return to school in Fall 2020. Concern about learning loss also exists worldwide. The article [94] analyzed 27 datasets from low- and middle-income countries to estimate year-on-year growth in student reading achievement under normal conditions. The authors assumed that learning loss is proportional to the percentage of schooling lost (e.g. grade 3 students were consistently reading about 20 words per minute faster than their grade 2 counterparts at given percentiles in consecutive grades). Under this assumption, a 30% learning loss (i.e. the equivalent of an approximately 3-month school closure) would yield a loss of 5.9 correct words per minute for mid-percentile grade 3 students (30% of the expected gain of 19.8 correct words per minute).
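The proportional learning-loss estimate is a single multiplication. The sketch below reproduces the 5.9 words-per-minute figure; the 10-month school year used to turn a 3-month closure into a 30% share is an assumption made here for illustration, not a figure from the cited article.

```python
def reading_loss_wpm(expected_gain_wpm: float, share_of_schooling_lost: float) -> float:
    """Learning loss under the assumption that loss scales with schooling lost."""
    return expected_gain_wpm * share_of_schooling_lost

# Assumption: a 10-month school year, so a 3-month closure is about 30% of it.
SCHOOL_YEAR_MONTHS = 10
share_lost = 3 / SCHOOL_YEAR_MONTHS

# Expected yearly gain for mid-percentile grade 3 students, from the text above.
loss = reading_loss_wpm(19.8, share_lost)

print(round(loss, 1))
```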

Predicting the long-term effect of COVID-19 is still at a very early stage, and the related communities are encouraged to collect data from various sources.

V Contact Tracing

In public health, contact tracing is the process of identifying persons who may have come into contact with an infected person and subsequently collecting further information about these contacts. In practice, however, it is challenging to record close contacts (e.g. within 6 feet) as daily routines intersect.

Researchers are rapidly coalescing around applications for proximity tracing, and different technologies are being utilized in this field. For example, Bluetooth signal strength can be used to determine whether two smartphones were close enough together for their users to transmit the virus.
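One simple way to turn Bluetooth signal strength into a distance estimate is the log-distance path-loss model. The sketch below is an illustration under stated assumptions, not the method of any particular app: the reference power at one meter (-59 dBm) and the path-loss exponent (2.0) are assumed calibration constants, and real readings are noisy enough that production systems do not rely on a single measurement.

```python
def estimate_distance_m(rssi_dbm: float,
                        tx_power_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Log-distance path-loss model; tx_power_dbm is the expected RSSI at 1 m."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

SIX_FEET_M = 1.83  # close-contact distance mentioned in the text, in meters

def is_close_contact(rssi_dbm: float) -> bool:
    """True when the estimated distance is within six feet."""
    return estimate_distance_m(rssi_dbm) <= SIX_FEET_M

for rssi in (-59, -60, -80):
    print(rssi, round(estimate_distance_m(rssi), 2), is_close_contact(rssi))
```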

The owners of the two dominant mobile operating systems, Apple and Google, published Exposure Notification (a.k.a. the Privacy-Preserving Contact Tracing Project) in late April. It is a system built on publicly available specifications developed by Apple and Google. Exposure Notification utilizes Bluetooth Low Energy technology and privacy-preserving cryptography to decide whether a specific user may have recently been in the proximity of someone infected with COVID-19. Due to security, privacy and political concerns, however, some governments (e.g. Norway, France and the United Kingdom) tend to develop their own versions of the application. We summarize the mainstream contact tracing applications below.

  • Singapore, TraceTogether [95]: it uses Bluetooth to approximate the distance to other phones running the same app and stores data for up to 25 days. It does not collect GPS locations or data about users’ WiFi or mobile network.

  • China, Chinese health code system [96]: it is built into two hugely popular applications in China, WeChat and Alipay, and provides a health survey and a location-based colored health code. Mobile network association data is collected at the backend to track users’ locations.

  • Austria, StoppCorona [97]: it is an open-source project for Bluetooth-based contact tracing. It claims to use a decentralized approach for the tracing.

  • Hong Kong, StayHomeSafe [98]: the application, together with a wristband given to all arrivals at the airport, is used to strictly enforce a 14-day quarantine. Users need to scan a unique QR code to pair the wristband with the app. Once home, they walk around the apartment to calibrate the wristband.

  • South Korea, Corona 100 [99]: it utilizes government data and alerts users when they come within 100 meters of a location visited by an infected person. GPS data is used to track the users’ location.

  • France, STOPCovid [100]: it relies on Bluetooth Low Energy to build a record of nearby users. If a user tests positive for COVID-19, he/she receives a QR code from the doctor and can choose to open the app and enter that code to notify the people he/she interacted with over the past two weeks.

  • Japan, COCOA [101]: it was developed by a group of engineers at Microsoft and utilizes the Exposure Notification platform. It records encrypted data flagging phones that have been within one meter for more than 15 minutes; when a user reports having tested positive for COVID-19, those other users are notified.

  • India, Aarogya Setu [102]: based on both Bluetooth and GPS technologies, it lets users know if they have been near a person with COVID-19 by scanning a database of known cases of infection. The gathered data is stored on servers and shared with the government.

  • Italy, Immuni [103]: it follows the standards of Exposure Notification, which uses Bluetooth to swap codes between mobile devices.

  • Norway, Smittestopp [104]: it utilizes both Bluetooth and GPS signals to estimate user proximity as a means of calculating exposure risk to COVID-19. In addition, it has a centralized application architecture, which means the data is uploaded to a central server controlled by the health authority instead of being stored locally on devices.

  • United Kingdom, NHS COVID-19 App [105]: it leverages a centralized design that uses Bluetooth to trace users and stores data on NHS’s servers. (The UK is in transition to move the application to the Exposure Notification specifications.)
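Several of the apps above share one decision rule: a phone keeps a local log of nearby identifiers and alerts its user when enough close-range time was spent near an identifier later reported positive. The sketch below uses the one-meter and 15-minute thresholds from the COCOA description; the `Encounter` record, the summing of time across encounters, and the identifiers are hypothetical simplifications, and no cryptography or key rotation is modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Encounter:
    contact_id: str    # anonymous identifier broadcast by the other phone
    distance_m: float  # estimated distance during the encounter
    minutes: float     # duration of the encounter

def exposure_minutes(log, positive_ids, max_distance_m=1.0):
    """Total minutes spent within range of any identifier reported positive."""
    return sum(
        e.minutes
        for e in log
        if e.contact_id in positive_ids and e.distance_m <= max_distance_m
    )

def should_alert(log, positive_ids, max_distance_m=1.0, min_minutes=15.0):
    """True when close-range exposure exceeds the time threshold."""
    return exposure_minutes(log, positive_ids, max_distance_m) > min_minutes

# Hypothetical local log kept on one phone.
log = [
    Encounter("id-x", 0.8, 10),
    Encounter("id-x", 0.9, 8),
    Encounter("id-y", 3.0, 60),
    Encounter("id-z", 0.5, 5),
]

print(should_alert(log, {"id-x"}), should_alert(log, {"id-y"}), should_alert(log, {"id-z"}))
```

In a decentralized design this check runs on the device against a published list of positive identifiers, whereas a centralized design would run the equivalent matching on the health authority's server.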

App-based contact tracing is necessary and useful for controlling the COVID-19 pandemic, since it is not enough to quarantine people only after symptom onset. To reduce infections, when a person is confirmed to be infected with COVID-19, one should act quickly to find all the people with whom this person was in close proximity. Only a digital, largely automatic solution can support such fast contact tracing. However, how to effectively evaluate these applications is still under investigation.

The authors in [106] used published parameters for the incubation time distribution (5.2 days) and the epidemic doubling time (5.0 days) from early epidemic data in China to develop a mathematical model of COVID-19 infectiousness and analyze the contribution of different transmission routes. The model estimated that the basic reproductive number $R_0$ equaled 2.0 in the early stages of the epidemic in China, with contributions to $R_0$ from four routes: 1) 46% from presymptomatic individuals (who had not yet shown symptoms), 2) 38% from symptomatic individuals, 3) 10% from asymptomatic individuals (who never show symptoms), and 4) 6% from environmentally mediated transmission via contamination. The general model describes how infectiousness varies as a function of time since infection, $\tau$, for a representative cohort of infected individuals [106]. The equation is presented as:

$$\beta(\tau) = P_a x_a \beta_s(\tau) + (1 - P_a)\left[1 - s(\tau)\right]\beta_s(\tau) + (1 - P_a)\, s(\tau)\, \beta_s(\tau) + \int_{l=0}^{\tau} \beta_s(\tau - l)\, E(l)\, dl$$

where $P_a$ is the proportion asymptomatic, $x_a$ represents the relative infectiousness of asymptomatics, $s(\tau)$ is the fraction of individuals who have developed symptoms by age of infection $\tau$, $\beta_s(\tau)$ describes the infectiousness of an individual currently either symptomatic or presymptomatic at age of infection $\tau$, and $E(l)$ represents environmental infectiousness, i.e. the rate at which a contaminated environment infects new people after a time lag $l$.
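The route shares reported above can be converted into absolute contributions to the basic reproductive number. The sketch below uses the reported shares and a value of 2.0; the final step, which removes symptomatic transmission entirely, is a deliberately idealized assumption (perfect and instantaneous isolation) added here to show why acting only after symptom onset leaves the reproduction number above one.

```python
R0 = 2.0
route_share = {
    "presymptomatic": 0.46,
    "symptomatic": 0.38,
    "asymptomatic": 0.10,
    "environmental": 0.06,
}

# Absolute contribution of each route to R0.
route_r0 = {route: R0 * share for route, share in route_share.items()}

# Idealized scenario: all symptomatic transmission is blocked, nothing else.
r_without_symptomatic = R0 - route_r0["symptomatic"]

for route, value in route_r0.items():
    print(f"{route}: {value:.2f}")
print(f"R with symptomatic transmission removed: {r_without_symptomatic:.2f}")
```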

To estimate the requirements for successful contact tracing, the authors [107] determined the combination of two key interventions needed to reduce $R_0$ to less than 1: 1) isolating symptomatic individuals and 2) tracing and quarantining the contacts of symptomatic cases. Based on a published analytical mathematical framework, the authors quantified whether the COVID-19 epidemic was expected to be controlled by these two interventions. The results indicated that, if used by a sufficiently high proportion of the population, immediate notification through a contact-tracing mobile phone app could be sufficient to stop the epidemic. Practical and logistical factors, including uptake, coverage and $R_0$ in a given population, determine whether an app is sufficient to control epidemic spread or whether additional measures are required to reduce $R_0$. The performance of the app can be explored at [108].
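A back-of-the-envelope version of the control condition is that interventions must block a fraction of transmission larger than 1 - 1/R0. This is a textbook simplification rather than the analytical framework used by the authors, since it ignores the timing of isolation and tracing; the function name is illustrative.

```python
def required_reduction(r0: float) -> float:
    """Minimum fraction of transmission to block so that R drops below 1."""
    if r0 <= 1.0:
        return 0.0  # the epidemic already declines without intervention
    return 1.0 - 1.0 / r0

def effective_r(r0: float, fraction_blocked: float) -> float:
    """Reproduction number after blocking a given fraction of transmission."""
    return r0 * (1.0 - fraction_blocked)

for r0 in (1.5, 2.0, 3.0):
    print(r0, round(required_reduction(r0), 3))
```

For a value of 2.0, as estimated for the early epidemic in China, more than half of all transmission would need to be prevented.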

The authors in [109] estimated the conditions under which isolation and contact tracing, in settings with various levels of social distancing, would be able to contain or slow down the COVID-19 epidemic. A stochastic transmission model from [110] was used to calculate the numbers of latently infected persons, infectious persons, and persons who have been diagnosed and isolated, in time steps of one day. The authors used the model to distinguish between household contacts (close contacts) and non-household contacts, and found that isolation and contact tracing would be an effective method for slowing down epidemics only if the majority of cases were ascertained. Meanwhile, social distancing would reduce the effective reproduction number to below one when non-household contacts were reduced by around 90%. Finally, the combination of social distancing with isolation and contact tracing has synergistic effects that would increase the prospect of containment.

While many countries utilize Bluetooth-based technologies [111, 112] to help slow the spread, digital contact tracing comes with serious privacy concerns, because many proposed apps rely on geo-location tracking and some of them store user data on central servers so that people can be identified and tracked. Moreover, due to the lack of government privacy regulations, users have to depend on the goodwill of technology companies not to violate their privacy [113, 114].

VI Conclusion

As the COVID-19 pandemic continues, many academic papers have been published to help combat it. In this paper, we conduct a literature review from the perspective of data-driven analytics. We investigate the latest solutions for epidemic prediction, clinical diagnosis, policy effectiveness and contact tracing. Additionally, we evaluate models against the latest data to assess how well they have performed since their publication, and we collect data sources for analytical research.


  • [1] “Scientists are drowning in covid-19 papers. can new tools keep them afloat?”
  • [2] “Coronavirus resource center,”
  • [3] F. Peng, L. Tu, Y. Yang, P. Hu, R. Wang, Q. Hu, F. Cao, T. Jiang, J. Sun, G. Xu et al., “Management and treatment of covid-19: the chinese experience,” Canadian Journal of Cardiology, 2020.
  • [4] P. Zhai, Y. Ding, X. Wu, J. Long, Y. Zhong, and Y. Li, “The epidemiology, diagnosis and treatment of covid-19,” International journal of antimicrobial agents, p. 105955, 2020.
  • [5] G. Pascarella, A. Strumia, C. Piliego, F. Bruno, R. Del Buono, F. Costa, S. Scarlata, and F. E. Agrò, “Covid-19 diagnosis and management: a comprehensive review,” Journal of Internal Medicine, 2020.
  • [6] M. L. Holshue, C. DeBolt, S. Lindquist, K. H. Lofy, J. Wiesman, H. Bruce, C. Spitters, K. Ericson, S. Wilkerson, A. Tural et al., “First case of 2019 novel coronavirus in the united states,” New England Journal of Medicine, 2020.
  • [7] “Centers for disease control and prevention,”
  • [8] “Unemployment rate in united states,”
  • [9] “Covid-19 and great recession: Unemployment rate,”
  • [10] “New york times coronavirus (covid-19) data,”
  • [11] F. A. Cássaro and L. F. Pires, “Can we predict the occurrence of covid-19 cases? considerations using a simple model of growth,” Science of the Total Environment, p. 138834, 2020.
  • [12] R. Ranjan, “Predictions for covid-19 outbreak in india using epidemiological models,” medRxiv, 2020.
  • [13] L. Jia, K. Li, Y. Jiang, X. Guo et al., “Prediction and analysis of coronavirus disease 2019,” arXiv preprint arXiv:2003.05447, 2020.
  • [14] X. Zhou, X. Ma, N. Hong, L. Su, Y. Ma, J. He, H. Jiang, C. Liu, G. Shan, W. Zhu et al., “Forecasting the worldwide spread of covid-19 based on logistic model and seir model,” medRxiv, 2020.
  • [15] K. Wu, D. Darcet, Q. Wang, and D. Sornette, “Generalized logistic growth modeling of the covid-19 outbreak in 29 provinces in china and in the rest of the world,” arXiv preprint arXiv:2003.05681, 2020.
  • [16] D. Tátrai and Z. Várallyay, “Covid-19 epidemic outcome predictions based on logistic fitting and estimation of its reliability,” arXiv preprint arXiv:2003.14160, 2020.
  • [17] L. Kriston and L. Kriston, “Projection of cumulative coronavirus disease 2019 (covid-19) case growth with a hierarchical logistic model,” Bull World Health Organ COVID-19 Open Preprints. http://dx.doi.org/10.2471/BLT, vol. 20, 2020.
  • [18] R. Huang, M. Liu, and Y. Ding, “Spatial-temporal distribution of covid-19 in china and its prediction: A data-driven modeling analysis,” The Journal of Infection in Developing Countries, vol. 14, no. 03, pp. 246–253, 2020.
  • [19] X. Zhang, R. Ma, and L. Wang, “Predicting turning point, duration and attack rate of covid-19 outbreaks in major western countries,” Chaos, Solitons & Fractals, p. 109829, 2020.
  • [20] Z. Yang, Z. Zeng, K. Wang, S.-S. Wong, W. Liang, M. Zanin, P. Liu, X. Cao, Z. Gao, Z. Mai et al., “Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions,” Journal of Thoracic Disease, vol. 12, no. 3, p. 165, 2020.
  • [21] A. Simha, R. V. Prasad, and S. Narayana, “A simple stochastic sir model for covid-19 infection dynamics for karnataka after interventions–learning from european trends,” arXiv preprint arXiv:2003.11920, 2020.
  • [22] L. Danon, T. House, and M. J. Keeling, “The role of routine versus random movements on the spread of disease in great britain,” Epidemics, vol. 1, no. 4, pp. 250–258, 2009.
  • [23] G. Giordano, F. Blanchini, R. Bruno, P. Colaneri, A. Di Filippo, A. Di Matteo, and M. Colaneri, “Modelling the covid-19 epidemic and implementation of population-wide interventions in italy,” Nature Medicine, pp. 1–6, 2020.
  • [24] “Who coronavirus disease (covid-19) dashboard,”
  • [25] W. C. Roda, M. B. Varughese, D. Han, and M. Y. Li, “Why is it difficult to accurately predict the covid-19 epidemic?” Infectious Disease Modelling, 2020.
  • [26] G. D. Barmparis and G. Tsironis, “Estimating the infection horizon of covid-19 in eight countries with a data-driven approach,” Chaos, Solitons & Fractals, p. 109842, 2020.
  • [27] L. López and X. Rodo, “A modified seir model to predict the covid-19 outbreak in spain and italy: simulating control scenarios and multi-scale epidemics,” Available at SSRN 3576802, 2020.
  • [28] C. Xu, Y. Yu, Q. Yang, and Z. Lu, “Forecast analysis of the epidemics trend of covid-19 in the united states by a generalized fractional-order seir model,” arXiv preprint arXiv:2004.12541, 2020.
  • [29] M. Bashir, H. A. Sattar, and A. Zaheer, “Trend analysis modelling and prediction of epidemic covid-19 for us, italy, spain and pakistan,” 2020.
  • [30] M. S. Boudrioua and A. Boudrioua, “Predicting the covid-19 epidemic in algeria using the sir model,” medRxiv, 2020.
  • [31] P. Sun, S. Qie, Z. Liu, J. Ren, K. Li, and J. Xi, “Clinical characteristics of hospitalized patients with sars-cov-2 infection: a single arm meta-analysis,” Journal of medical virology, vol. 92, no. 6, pp. 612–617, 2020.
  • [32] M. Parohan, S. Yaghoubi, A. Seraji, M. H. Javanbakht, P. Sarraf, and M. Djalali, “Risk factors for mortality in patients with coronavirus disease 2019 (covid-19) infection: a systematic review and meta-analysis of observational studies,” The Aging Male, pp. 1–9, 2020.
  • [33] “Pubmed,”
  • [34] “Cochrane library,”
  • [35] “Embase database,”
  • [36] “Scopus,”
  • [37] “Google scholar,”
  • [38] A.-B. Haidich, “Meta-analysis in medical research,” Hippokratia, vol. 14, no. Suppl 1, p. 29, 2010.
  • [39] A. Liberati, D. G. Altman, J. Tetzlaff, C. Mulrow, P. C. Gøtzsche, J. P. Ioannidis, M. Clarke, P. J. Devereaux, J. Kleijnen, and D. Moher, “The prisma statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration,” Annals of internal medicine, vol. 151, no. 4, pp. W–65, 2009.
  • [40] R. D. Riley, J. P. Higgins, and J. J. Deeks, “Interpretation of random effects meta-analyses,” Bmj, vol. 342, p. d549, 2011.
  • [41] J. P. Higgins and S. G. Thompson, “Quantifying heterogeneity in a meta-analysis,” Statistics in medicine, vol. 21, no. 11, pp. 1539–1558, 2002.
  • [42] A. Stang, “Critical evaluation of the newcastle-ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses,” European journal of epidemiology, vol. 25, no. 9, pp. 603–605, 2010.
  • [43] C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, L. Zhang, G. Fan, J. Xu, X. Gu et al., “Clinical features of patients infected with 2019 novel coronavirus in wuhan, china,” The lancet, vol. 395, no. 10223, pp. 497–506, 2020.
  • [44] K. Wang, P. Zuo, Y. Liu, M. Zhang, X. Zhao, S. Xie, H. Zhang, X. Chen, and C. Liu, “Clinical and laboratory predictors of in-hospital mortality in 305 patients with covid-19: a cohort study in wuhan, china,” China (2/24/2020), 2020.
  • [45] N. Chen, M. Zhou, X. Dong, J. Qu, F. Gong, Y. Han, Y. Qiu, J. Wang, Y. Liu, Y. Wei et al., “Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in wuhan, china: a descriptive study,” The Lancet, vol. 395, no. 10223, pp. 507–513, 2020.
  • [46] W.-j. Guan, Z.-y. Ni, Y. Hu, W.-h. Liang, C.-q. Ou, J.-x. He, L. Liu, H. Shan, C.-l. Lei, D. S. Hui et al., “Clinical characteristics of coronavirus disease 2019 in china,” New England journal of medicine, vol. 382, no. 18, pp. 1708–1720, 2020.
  • [47] L. Chen, H. Liu, W. Liu, J. Liu, K. Liu, J. Shang, Y. Deng, and S. Wei, “Analysis of clinical features of 29 patients with 2019 novel coronavirus pneumonia,” Zhonghua jie he he hu xi za zhi= Zhonghua jiehe he huxi zazhi= Chinese journal of tuberculosis and respiratory diseases, vol. 43, pp. E005–E005, 2020.
  • [48] K. Sun, J. Chen, and C. Viboud, “Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: a population-level observational study,” The Lancet Digital Health, 2020.
  • [49] Y. Yang, Q. Lu, M. Liu, Y. Wang, A. Zhang, N. Jalali, N. Dean, I. Longini, M. E. Halloran, B. Xu et al., “Epidemiological and clinical features of the 2019 novel coronavirus outbreak in china,” MedRxiv, 2020.
  • [50] J. Li, S. Li, Y. Cai, Q. Liu, X. Li, Z. Zeng, Y. Chu, F. Zhu, and F. Zeng, “Epidemiological and clinical characteristics of 17 hospitalized patients with 2019 novel coronavirus infections outside wuhan, china,” medRxiv, 2020.
  • [51] C. China, “Novel coronavirus pneumonia emergency response epidemiology team,” Vital surveillance: The epidemiological characteristics of an outbreak of, 2019.
  • [52] X.-W. Xu, X.-X. Wu, X.-G. Jiang, K.-J. Xu, L.-J. Ying, C.-L. Ma, S.-B. Li, H.-Y. Wang, S. Zhang, H.-N. Gao et al., “Clinical findings in a group of patients infected with the 2019 novel coronavirus (sars-cov-2) outside of wuhan, china: retrospective case series,” bmj, vol. 368, 2020.
  • [53] P. K. Bhatraju, B. J. Ghassemieh, M. Nichols, R. Kim, K. R. Jerome, A. K. Nalla, A. L. Greninger, S. Pipavath, M. M. Wurfel, L. Evans et al., “Covid-19 in critically ill patients in the seattle region—case series,” New England Journal of Medicine, vol. 382, no. 21, pp. 2012–2022, 2020.
  • [54] K. D. Patil, “Cochran’s q test: Exact distribution,” Journal of the American Statistical Association, vol. 70, no. 349, pp. 186–189, 1975.
  • [55] F. Zhou, T. Yu, R. Du, G. Fan, Y. Liu, Z. Liu, J. Xiang, Y. Wang, B. Song, X. Gu et al., “Clinical course and risk factors for mortality of adult inpatients with covid-19 in wuhan, china: a retrospective cohort study,” The lancet, 2020.
  • [56] C. Wu, X. Chen, Y. Cai, X. Zhou, S. Xu, H. Huang, L. Zhang, X. Zhou, C. Du, Y. Zhang et al., “Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in wuhan, china,” JAMA internal medicine, 2020.
  • [57] F. Caramelo, N. Ferreira, and B. Oliveiros, “Estimation of risk factors for covid-19 mortality-preliminary results,” MedRxiv, 2020.
  • [58] Y. Cheng, R. Luo, K. Wang, M. Zhang, Z. Wang, L. Dong, J. Li, Y. Yao, S. Ge, and G. Xu, “Kidney impairment is associated with in-hospital death of covid-19 patients,” MedRxiv, 2020.
  • [59] V. Y. F. Su, Y.-H. Yang, K.-Y. Yang, K.-T. Chou, W.-J. Su, Y.-M. Chen, D.-W. Perng, T.-J. Chen, and P.-C. Chen, “The risk of death in 2019 novel coronavirus disease (covid-19) in hubei province,” Available at SSRN 3539655, 2020.
  • [60] R. Chen, W. Liang, M. Jiang, W. Guan, C. Zhan, T. Wang, C. Tang, L. Sang, J. Liu, Z. Ni et al., “Risk factors of fatal outcome in hospitalized subjects with coronavirus disease 2019 from a nationwide analysis in china,” Chest, 2020.
  • [61] R.-H. Du, L.-R. Liang, C.-Q. Yang, W. Wang, T.-Z. Cao, M. Li, G.-Y. Guo, J. Du, C.-L. Zheng, Q. Zhu et al., “Predictors of mortality for patients with covid-19 pneumonia caused by sars-cov-2: a prospective cohort study,” European Respiratory Journal, vol. 55, no. 5, 2020.
  • [62] Y. Liu, X. Du, J. Chen, Y. Jin, L. Peng, H. H. Wang, M. Luo, L. Chen, and Y. Zhao, “Neutrophil-to-lymphocyte ratio as an independent risk factor for mortality in hospitalized patients with covid-19,” Journal of Infection, 2020.
  • [63] S. Shi, M. Qin, B. Shen, Y. Cai, T. Liu, F. Yang, W. Gong, X. Liu, J. Liang, Q. Zhao et al., “Association of cardiac injury with mortality in hospitalized patients with covid-19 in wuhan, china,” JAMA cardiology, 2020.
  • [64] L. Wang, W. He, X. Yu, D. Hu, M. Bao, H. Liu, J. Zhou, and H. Jiang, “Coronavirus disease 2019 in elderly patients: Characteristics and prognostic factors based on 4-week follow-up,” Journal of Infection, 2020.
  • [65] D. Colombi, F. C. Bodini, M. Petrini, G. Maffi, N. Morelli, G. Milanese, M. Silva, N. Sverzellati, and E. Michieletti, “Well-aerated lung on admitting chest ct to predict adverse outcome in covid-19 pneumonia,” Radiology, p. 201433, 2020.
  • [66] M. Nikpouraghdam, A. J. Farahani, G. Alishiri, S. Heydari, M. Ebrahimnia, H. Samadinia, M. Sepandi, N. J. Jafari, M. Izadi, A. Qazvini et al., “Epidemiological characteristics of coronavirus disease 2019 (covid-19) patients in iran: A single center study,” Journal of Clinical Virology, 2020.
  • [67] X. Mei, H.-C. Lee, K.-y. Diao, M. Huang, B. Lin, C. Liu, Z. Xie, Y. Ma, P. M. Robson, M. Chung et al., “Artificial intelligence–enabled rapid diagnosis of patients with covid-19,” Nature Medicine, pp. 1–5, 2020.
  • [68] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics, pp. 837–845, 1988.
  • [69] A. Agresti and B. A. Coull, “Approximate is better than “exact” for interval estimation of binomial proportions,” The American Statistician, vol. 52, no. 2, pp. 119–126, 1998.
  • [70] Q. McNemar, Psychological statistics.   Wiley New York, 1962, vol. 3.
  • [71] L. Wang and A. Wong, “Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images,” arXiv preprint arXiv:2003.09871, 2020.
  • [72] L. Huang, R. Han, T. Ai, P. Yu, H. Kang, Q. Tao, and L. Xia, “Serial quantitative chest ct assessment of covid-19: Deep-learning approach,” Radiology: Cardiothoracic Imaging, vol. 2, no. 2, p. e200075, 2020.
  • [73] S. Wang, B. Kang, J. Ma, X. Zeng, M. Xiao, J. Guo, M. Cai, J. Yang, Y. Li, X. Meng et al., “A deep learning algorithm using ct images to screen for corona virus disease (covid-19),” MedRxiv, 2020.
  • [74] I. D. Apostolopoulos and T. A. Mpesiana, “Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks,” Physical and Engineering Sciences in Medicine, p. 1, 2020.
  • [75] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [76] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
  • [77] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Thirty-first AAAI conference on artificial intelligence, 2017.
  • [78] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
  • [79] “Covidx,”
  • [80] “Italian radiological cases,”
  • [81] “Bimcv-covid-19,”
  • [82] “Google covid-19 search trends,”
  • [83] “Covid-19 community mobility reports,”
  • [84] R. Abouk and B. Heydari, “The immediate effect of covid-19 policies on social distancing behavior in the united states,” Available at SSRN, 2020.
  • [85] “weather underground,”
  • [86] S. Gao, J. Rao, Y. Kang, Y. Liang, J. Kruse, D. Doepfer, A. K. Sethi, J. F. M. Reyes, J. Patz, and B. S. Yandell, “Mobile phone location data reveal the effect and geographic variation of social distancing on the spread of the covid-19 epidemic,” arXiv preprint arXiv:2004.11430, 2020.
  • [87] “descartes labs,”
  • [88] “safegraph,”
  • [89] S. W. Park, K. Sun, C. Viboud, B. T. Grenfell, and J. Dushoff, “Potential roles of social distancing in mitigating the spread of coronavirus disease 2019 (covid-19) in south korea,” medRxiv, 2020.
  • [90] M. U. Kraemer, C.-H. Yang, B. Gutierrez, C.-H. Wu, B. Klein, D. M. Pigott, L. Du Plessis, N. R. Faria, R. Li, W. P. Hanage et al., “The effect of human mobility and control measures on the covid-19 epidemic in china,” Science, vol. 368, no. 6490, pp. 493–497, 2020.
  • [91] A. Anzai, T. Kobayashi, N. M. Linton, R. Kinoshita, K. Hayashi, A. Suzuki, Y. Yang, S.-m. Jung, T. Miyama, A. R. Akhmetzhanov et al., “Assessing the impact of reduced travel on exportation dynamics of novel coronavirus infection (covid-19),” Journal of clinical medicine, vol. 9, no. 2, p. 601, 2020.
  • [92] J. I. Dingel and B. Neiman, “How many jobs can be done at home?” National Bureau of Economic Research, Tech. Rep., 2020.
  • [93] G. Basilaia and D. Kvavadze, “Transition to online education in schools during a sars-cov-2 coronavirus (covid-19) pandemic in georgia,” Pedagogical Research, vol. 5, no. 4, pp. 1–9, 2020.
  • [94] “Calculating the educational impact of covid-19: Using data from successive grades to estimate learning loss,”
  • [95] “trace together,”
  • [96] “China colored health code,”
  • [97] “stopp corona,”
  • [98] “Stay home safe,”
  • [99] “covid 100,”
  • [100] “stopcovid,”
  • [101] “cocoa,”
  • [102] “aarogyasetu,”
  • [103] “immuni,”
  • [104] “smittestopp,”
  • [105] “nhs covid19 app,”
  • [106] L. Ferretti, C. Wymant, M. Kendall, L. Zhao, A. Nurtay, L. Abeler-Dörner, M. Parker, D. Bonsall, and C. Fraser, “Quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing,” Science, vol. 368, no. 6491, 2020.
  • [107] C. Fraser, S. Riley, R. M. Anderson, and N. M. Ferguson, “Factors that make an infectious disease outbreak controllable,” Proceedings of the National Academy of Sciences, vol. 101, no. 16, pp. 6146–6151, 2004.
  • [108] “Covid-19 transmission routes,”
  • [109] M. Kretzschmar, G. Rozhnova, and M. van Boven, “Isolation and contact tracing can tip the scale to containment of covid-19 in populations with social distancing,” Available at SSRN 3562458, 2020.
  • [110] M. Kretzschmar, S. Van den Hof, J. Wallinga, and J. Van Wijngaarden, “Ring vaccination and smallpox control,” Emerging infectious diseases, vol. 10, no. 5, p. 832, 2004.
  • [111] Y. Mao, J. Wang, and B. Sheng, “Mobile message board: Location-based message dissemination in wireless ad-hoc networks,” in 2016 international conference on computing, networking and communications (ICNC).   IEEE, 2016, pp. 1–5.
  • [112] Y. Mao, J. Wang, J. P. Cohen, and B. Sheng, “Pasa: Passive broadcast for smartphone ad-hoc networks,” in 2014 23rd International Conference on Computer Communication and Networks (ICCCN).   IEEE, 2014, pp. 1–8.
  • [113] L. R. Bradford, M. Aboy, and K. Liddell, “Covid-19 contact tracing apps: A stress test for privacy, the gdpr and data protection regimes,” Journal of Law and the Biosciences, 2020.
  • [114] M. Zastrow, “Coronavirus contact-tracing apps: can they slow the spread of covid-19,” Nature Technology Features, 2020.