I Goal
The COVID-19 pandemic has led to a massive global crisis, caused by its rapid spread and severe fatality rate, especially among those with weak immune systems. In this work, we use the available COVID-19 time-series of infected cases to build models for predicting the number of cases in the near future. In particular, given the time-series up to a particular day, we predict the number of cases $\tau$ days ahead, where $\tau \in \{1, 3, 7\}$; that is, we predict for the next day, after 3 days, and after 7 days. Our analysis is based on the time-series data made publicly available on the COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) (https://systems.jhu.edu/research/public-health/ncov/) [JHU_article].
Let $n(t)$ denote the number of confirmed cases on the $t$-th day of the time-series after the start of the outbreak. Then, we have the following:
-
The input consists of the last $W$ samples of the time-series, given by $\{n(t-W+1), \dots, n(t)\}$.
-
The predicted output is $\hat{n}(t+\tau)$, $\tau \in \{1, 3, 7\}$.
-
Due to the non-stationary nature of the time-series data, a sliding window of size $W$ is used over the input to make the prediction, and $W$ is found via cross-validation.
-
The predictive function $f$ is modeled either by a polynomial or a neural network, and is used to make the prediction:
$$\hat{n}(t+\tau) = f\big(n(t-W+1), \dots, n(t)\big).$$
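The sliding-window setup above can be sketched as follows; `window_pairs` and the toy series are illustrative names, not from the paper:

```python
import numpy as np

def window_pairs(series, W, tau):
    """Build (input, target) pairs from a cumulative case series.

    Each input is the last W samples n(t-W+1), ..., n(t); the
    target is the value tau days ahead, n(t+tau).
    """
    X, y = [], []
    for t in range(W - 1, len(series) - tau):
        X.append(series[t - W + 1 : t + 1])
        y.append(series[t + tau])
    return np.array(X), np.array(y)

# toy cumulative series (quadratic growth, for illustration only)
n = np.arange(1, 21) ** 2
X, y = window_pairs(n, W=5, tau=3)
```

Each row of `X` is one window of `W` past observations, and the matching entry of `y` is the case count `tau` days later.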
TABLE I: Countries considered in the analysis
| Sweden | Denmark | Finland | Norway | France | Italy | Spain | UK | China | India | Iran | USA |
II Dataset
The dataset from JHU contains the cumulative number of cases reported daily for different countries. We base our analysis on the 12 countries listed in Table I. For each country, we consider the time-series starting from the day when the first case was reported. Given the current day index $t$, we predict the number of cases for day $t+\tau$ by considering as input the number of cases reported for the past $W$ days, that is, for the days $t-W+1$ to $t$.
III Approaches
We use data-driven prediction approaches without considering any other aspect, for example, models of infectious disease spread [folkhal]. We apply two approaches to analyze the data and make predictions, or in other words, to learn the function $f$:
-
Polynomial model approach: The simplest curve-fitting or approximation model, where the number of cases is approximated locally with polynomials; here $f$ is a polynomial.
-
Neural network approach: A supervised learning approach that uses training data in the form of input-output pairs to learn a predictive model; here $f$ is a neural network.
We describe each approach in detail in the following subsections.
III-A Polynomial model
III-A1 Model
We model the expected value of $n(t)$ as a third-degree polynomial function of the day number $t$:
$$p(t) = a_3 t^3 + a_2 t^2 + a_1 t + a_0.$$
The set of coefficients $\{a_0, a_1, a_2, a_3\}$ is learned using the available training data. Given the highly non-stationary nature of the time-series, we consider local polynomial approximations of the signal over a window of $W$ days, instead of using all the data to estimate a single polynomial for the entire time-series. Thus, at the $t$-th day, we learn the corresponding polynomial using $\{n(t-W+1), \dots, n(t)\}$.
III-A2 How the model is used
Once the polynomial is determined, we use it to predict the number of cases for the $(t+\tau)$-th day as
$$\hat{n}(t+\tau) = p(t+\tau).$$
For every prediction, we fit the polynomial by using the most recent input data of size $W$. The appropriate window size $W$ is found through cross-validation.
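A minimal sketch of this local polynomial extrapolation, using NumPy's least-squares `polyfit` (function and variable names are illustrative):

```python
import numpy as np

def poly_predict(series, W, tau, degree=3):
    """Fit a degree-3 polynomial to the last W days and extrapolate tau days ahead."""
    t = np.arange(len(series) - W, len(series))  # day indices of the window
    window = series[-W:]
    coeffs = np.polyfit(t, window, degree)       # least-squares coefficient fit
    return np.polyval(coeffs, len(series) - 1 + tau)

# toy example: a cubic trend is recovered and extrapolated
n = 0.5 * np.arange(30) ** 3 + 10
pred = poly_predict(n, W=10, tau=7)
```

Because only the last `W` samples enter the fit, the model adapts to the local trend rather than the whole history, matching the windowed setup described above.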
III-B Neural networks
III-B1 Model
We use the Extreme Learning Machine (ELM) as the neural network model to avoid overfitting to the training data. As the length of the time-series data for each country is limited, the number of training samples for the neural network is quite small, which can lead to severe overfitting in large-scale neural networks such as deep neural networks (DNNs), convolutional neural networks (CNNs), etc. [DNN_2013, CNN_2012]. ELM, on the other hand, is a single-layer neural network that uses random weights in its first hidden layer [elm_HUANG2015]. The use of random weights has gained popularity due to its simplicity and effectiveness in training [giryes_randomweights, SSFN_2020, HNF_2020]. We now briefly describe ELM.
Consider a dataset containing samples of pair-wise $P$-dimensional input data $\mathbf{x}$ and the corresponding $Q$-dimensional target vectors $\mathbf{t}$. We construct the feature vector as $\mathbf{z} = \mathbf{g}(\mathbf{W}\mathbf{x})$, where
-
the weight matrix $\mathbf{W} \in \mathbb{R}^{N \times P}$ is an instance of the Normal distribution,
-
$N$ is the number of hidden neurons, and
-
$\mathbf{g}(\cdot)$ is the rectified linear unit (ReLU).
To predict the target, we use a linear projection of the feature vector onto the target. Let the predicted target for the $i$-th sample be $\hat{\mathbf{t}}_i = \mathbf{O}\mathbf{z}_i$, where $\mathbf{O} \in \mathbb{R}^{Q \times N}$ is the output matrix. By using $\ell_2$-norm regularization, we find the optimal solution of the following convex optimization problem
$$\mathbf{O}^{\star} = \arg\min_{\mathbf{O}} \sum_i \|\mathbf{t}_i - \mathbf{O}\mathbf{z}_i\|_2^2 + \lambda \|\mathbf{O}\|_F^2, \qquad (1)$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $\lambda$ is the regularization hyperparameter. Once the matrix $\mathbf{O}^{\star}$ is learned, the prediction for any new input $\mathbf{x}$ is given by $\hat{\mathbf{t}} = \mathbf{O}^{\star}\mathbf{g}(\mathbf{W}\mathbf{x})$.
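The ELM above can be sketched as follows. The closed-form ridge solution to (1) is a standard identity, not spelled out in the paper, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden, lam):
    """Train an ELM: random fixed first layer, ReLU, ridge-regressed output layer.

    X: (num_samples, P) inputs; T: (num_samples, Q) targets.
    Returns (W, O), where O solves problem (1) in closed form.
    """
    P = X.shape[1]
    W = rng.standard_normal((n_hidden, P))   # random hidden weights, never trained
    Z = np.maximum(W @ X.T, 0.0)             # ReLU features, shape (n_hidden, num_samples)
    # ridge solution: O = T^T Z^T (Z Z^T + lam I)^{-1}
    O = T.T @ Z.T @ np.linalg.inv(Z @ Z.T + lam * np.eye(n_hidden))
    return W, O

def elm_predict(W, O, x):
    """Predict the target for a single input vector x."""
    return O @ np.maximum(W @ x, 0.0)
```

Only the output matrix is learned, which reduces training to a single regularized least-squares solve and keeps the model small enough for very short time-series.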
III-B2 How the model is used
When using ELM to predict the number of cases, we define the input as the vector of the past $W$ samples, $\mathbf{x} = [n(t-W+1), \dots, n(t)]^{\top}$, and the target as $n(t+\tau)$. Note that $P = W$ and $Q = 1$. For a fixed $\tau$, we use cross-validation to find the proper window size $W$, the number of hidden neurons $N$, and the regularization hyperparameter $\lambda$.

IV Experiments
IV-A With the available data till May 4, 2020
In this subsection, we make predictions based on the time-series data available until May 4, 2020. We estimate the number of cases for the last 31 days for the countries in Table I. For each value of $\tau$, we compare the estimated number of cases $\hat{n}(t)$ with the true value $n(t)$ and report the estimation error in percentage, i.e.,
$$e(t) = \frac{|n(t) - \hat{n}(t)|}{n(t)} \times 100. \qquad (2)$$
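Equation (2) amounts to a one-line computation (an illustrative sketch; the function name is not from the paper):

```python
def error_percentage(n_true, n_hat):
    """Estimation error of equation (2), in percent."""
    return abs(n_true - n_hat) / n_true * 100.0

# e.g. a hypothetical prediction of 23039 against a reported 23500 cases
e = error_percentage(23500, 23039)
```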
We carry out two sets of experiments for each of the two approaches (polynomial and ELM) to examine their sensitivity to newly arriving training samples. In the first set of experiments, we perform cross-validation once to find the hyperparameters, without updating them as new samples of the time-series are observed over the 31-day span. In the second set of experiments, we perform cross-validation daily as new samples of the time-series are observed; in this setup, the optimal hyperparameters, such as the window size $W$, vary with time. We refer to this setup as 'ELM time-varying' and 'Poly time-varying' in the rest of the manuscript.
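The daily cross-validation of the time-varying setup can be sketched as a rolling hyperparameter search. The paper does not specify the exact validation split, so this sketch assumes a simple hold-out on the most recent observed day, and all names and candidate values are illustrative:

```python
import numpy as np

def pick_window(series, tau, candidates=(5, 7, 10, 14)):
    """Each day, re-select the window size W whose local cubic fit
    best predicts the most recent observed day from earlier data."""
    best_W, best_err = None, np.inf
    for W in candidates:
        past = series[:-tau]                          # hide the last tau days
        t = np.arange(len(past) - W, len(past))       # absolute day indices
        coeffs = np.polyfit(t, past[-W:], 3)          # degree-3 local fit
        pred = np.polyval(coeffs, len(series) - 1)    # predict the held-out day
        err = abs(series[-1] - pred)
        if err < best_err:
            best_W, best_err = W, err
    return best_W
```

Re-running this selection every day lets the hyperparameters track the non-stationary statistics of the series, which is the point of the time-varying setup.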
We first show the reported and estimated number of infection cases for Sweden by using ELM time-varying for different values of $\tau$ in Figure 1. For each $\tau$, we estimate the number of cases up to $\tau$ days after the last day on which the JHU data was collected. In our later experiments, we show that ELM time-varying is typically more accurate than the other three methods (polynomial, Poly time-varying, and ELM). This better accuracy conforms to the non-stationary behavior of the time-series data, or in other words, that the best model parameters change over time. Hence, the result of ELM time-varying is shown explicitly for Sweden. According to our experimental results, we predict that a total of 23039, 23873, and 26184 people will be infected in Sweden by May 5, May 7, and May 11, 2020, respectively.
Histograms of the error percentage of the four methods are shown in Figure 2 for different values of $\tau$. The histograms are calculated by using a nonparametric kernel-smoothing distribution over the past 31 days for all 12 countries. The daily error percentage for each country in Table I is shown in Figures 3-11. Note that the reported error percentage of ELM is averaged over 100 Monte Carlo trials. The average and the standard deviation of the error over 31 days are reported (in percentage) in the legend of each figure for all four methods. It can be seen that daily cross-validation is crucial to preserve a consistent performance throughout the pandemic, resulting in a more accurate estimate. In other words, the variations of the time-series as $t$ increases are significant enough to change the statistics of the training and validation sets, which, in turn, leads to different optimal hyperparameters as the length of the time-series grows. It can also be seen that ELM time-varying provides a more accurate estimate, especially for large values of $\tau$. Therefore, for the rest of the experiments, we focus only on ELM time-varying as our favored approach.

Another interesting observation is that the performance of ELM time-varying improves as $t$ increases. This observation verifies the general principle that neural networks typically perform better as more data becomes available. We report the average error percentage of ELM time-varying over the last 10 days of the time-series in Table II. We see that as $\tau$ increases, the estimation error increases. When $\tau = 7$, ELM time-varying works well for most of the countries, but not for France and India. This poor estimation for a few countries could be due to a significant amount of noise in the time-series data, possibly caused by inaccurately reported daily cases.
V Conclusion
We studied the estimation capabilities of two well-known approaches for dealing with the spread of the COVID-19 pandemic. We showed that a small-sized neural network such as ELM provides a more consistent estimate than its polynomial regression counterpart. We found that a daily update of the model hyperparameters is of paramount importance for achieving stable prediction performance. The proposed models currently use only the samples of the time-series data to predict the future number of cases. A potential future direction to improve the estimation accuracy is to incorporate side information such as infectious disease spread models, non-pharmaceutical interventions, and authority policies [folkhal].
TABLE II: Average error percentage of ELM time-varying over the last 10 days
Country | Sweden | Denmark | Finland | Norway | France | Italy | Spain | UK | China | India | Iran | USA
1-day prediction | 0.9 | 0.5 | 0.9 | 0.2 | 0.8 | 0.1 | 0.5 | 0.3 | 0 | 0.7 | 0.1 | 0.4
3-day prediction | 2.6 | 0.7 | 0.7 | 0.6 | 2 | 0.3 | 2.5 | 1.3 | 0 | 2.1 | 0.2 | 1.7
7-day prediction | 2 | 4.8 | 2.2 | 1.2 | 18.2 | 1.1 | 3.1 | 3 | 0.2 | 8.8 | 0.6 | 4.9