The recent worldwide health crisis caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has brought considerable fear and uncertainty to humanity. SARS-CoV-2 is the coronavirus strain that causes coronavirus disease 2019 (COVID-19). From the early stages of the pandemic, governments faced the challenge of controlling the COVID-19 outbreak while maintaining economic balance and other aspects of governance. Analyzing pandemic growth and forecasting COVID-19 cases has been essential in helping decision-makers develop better solutions.
Accurate prediction of daily COVID-19 cases can help governments make macro-level decisions and control the pandemic more effectively. Meanwhile, artificial intelligence techniques have proven capable of accurately finding patterns in indistinct and complicated data in many domains, including epidemiological studies of pandemics. Since the emergence of SARS-CoV-2, researchers have applied various techniques to study different aspects of the current pandemic, such as predicting the growth rate of COVID-19 cases.
Although multiple studies, such as [Pathan2020, Arora2020], have used a variant of Recurrent Neural Networks (RNN) to predict daily cases, their proposed models have two shortcomings. First, many studies [Lee2020, Hawas2020] choose the framing window by assuming a fixed number of time steps. However, RNN and LSTM models need a well-chosen time step for framing the sequence data: it must be long enough to provide sufficiently distinctive features, yet short enough not to add data that misleads the model. In other words, by controlling the amount of sequence information, we aim to give the model the most informative sequence for training without feeding it extra data that acts as noise. Second, these studies use customized architectures obtained by trial and error, which may still not be the best topology. Together, these two factors can lead to substantial prediction errors.
In this paper, we take a deep neuroevolutionary approach, using the Binary Bat Algorithm (BBA) to optimize the hyperparameters of a recurrent neural network with Long Short-Term Memory (LSTM) layers to predict daily cases. Hyperparameter optimization is an NP-hard problem, since the optimal solution can only be guaranteed by performing an exhaustive search of the feasible region. We therefore chose BBA, a well-known metaheuristic technique, to explore the search space for the best set of hyperparameters. This approach lets us obtain the optimal time step for framing the sequence as well as an optimized architecture for our deep learning framework. We also introduce a feature-augmented version of the latest publicly available COVID-19 dataset provided by the European Centre for Disease Prevention and Control. We show that the new features increase the model's accuracy and allow it to simulate regional pandemic behavior more precisely. To validate the framework and the final model, we conducted various experiments, all of which show the effectiveness of our approach.
In the following sections, we first review related work in Section 2 and briefly present the background in Section 3. Section 4 explains our proposed model in detail and discusses why this approach was taken for forecasting COVID-19 cases. Section 5 presents and analyzes the experimental results, and the final section concludes the paper and outlines possible future work.
2 Related works
There are many studies on applications of artificial intelligence to the COVID-19 pandemic [Lalmuanawma2020, Ke2020, Tuli2020]. One of their main topics is predicting new cases to help health managers plan and develop appropriate strategies for dealing with COVID-19. We review some of these studies below.
ArunKumar et al. [ArunKumar2021] predicted the future trends of cumulative fatalities in the top 10 countries over a 60-day horizon using RNNs with Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) units.
Another study used transfer learning with LSTM networks to forecast COVID-19 cases: a model trained on countries infected early, such as Italy, was used to predict cases in other countries. Results on multiple countries showed the effectiveness of this approach.
Shastri et al. [Shastri2021] proposed a nested ensemble model based on LSTM to improve the accuracy of predicting daily cases in India.
Abbasimehr et al. [Abbasimehr2021] studied three hybrid deep models, based on multi-head attention, LSTM, and CNN, whose hyperparameters were optimized with a Bayesian algorithm to forecast COVID-19 cases. The results showed that their proposed model outperformed the benchmark models studied.
The authors of [Chandra2021] used LSTM, bidirectional LSTM, and encoder-decoder LSTM models for multi-step forecasting of COVID-19 cases in India two months ahead. They reported that deep models are promising for long-term prediction of cases.
Salgotra et al. [Salgotra2020] used gene expression programming (GEP) to build models for predicting confirmed cases (CC) and death cases (DC) in the fifteen most affected countries of the world, introducing two GEP models, one for CC and one for DC, covering all 15 countries. The results showed that GEP outperforms neural network models when the total amount of experimental data is limited.
To estimate the possible spread of SARS-CoV-2 in three Indian cities, a new GEP-based model was presented in [Salgotra2020a]. The proposed model predicts the total number of cases based on CC, DC, and three other parameters.
In [Chimmula2020], LSTM was used to predict the trends and possible stopping time of the COVID-19 outbreak in Canada. Since COVID-19 case counts form a time series, sequential networks are well suited to extracting patterns from them. The internal connections of the LSTM were established to improve its performance, and the model predicted that the COVID-19 outbreak in Canada would end around June 2020.
To enhance the public health management in dealing with the COVID-19 in two high daily incidences of new cases and deaths, [daSilva2020]
3.1 Binary Bat Algorithm
The Bat Algorithm (BA), proposed in 2010 [Yang2010], is inspired by a simplified simulation of the echolocation capability of bats. Like other population-based algorithms, BA starts with randomly generated individuals. In BA, each bat represents an individual, that is, a candidate solution in the search space, and is described by three vectors: frequency, velocity, and position. For the $i$th bat, these vectors are updated according to Eqs. (1), (2), and (3), respectively.
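$$f_i = f_{min} + (f_{max} - f_{min})\,\beta \quad (1)$$

$$v_i^{t+1} = v_i^{t} + \left(x_i^{t} - x_*\right) f_i \quad (2)$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \quad (3)$$

where $f_i$ is the frequency of the $i$th bat, and $f_{max}$ and $f_{min}$ are the maximum and minimum frequency values, respectively. $\beta$ is a random number in the interval $[0,1]$. $v_i^t$ and $x_i^t$ indicate the velocity and position of the $i$th bat at iteration $t$, and $x_*$ is the best position found by the entire population so far. Algorithm 1 shows the pseudo-code of the basic BA.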
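As a rough illustration of Eqs. (1)-(3), the following NumPy sketch performs one global-move step of the continuous BA for a whole population at once. The function name `ba_global_step`, the toy sphere objective, and the frequency bounds are assumptions chosen for illustration; the loudness, pulse-rate, and local-search logic of Algorithm 1 is deliberately omitted.

```python
import numpy as np


def ba_global_step(x, v, x_best, f_min, f_max, rng):
    """One global-move step of the continuous Bat Algorithm, Eqs. (1)-(3).

    x, v   : (n_bats, dim) positions and velocities at iteration t
    x_best : (dim,) best position found so far (x_* in the text)
    Returns the frequencies, velocities and positions for iteration t + 1.
    """
    n_bats = x.shape[0]
    beta = rng.uniform(0.0, 1.0, size=(n_bats, 1))  # beta ~ U[0, 1], one per bat
    f = f_min + (f_max - f_min) * beta               # Eq. (1)
    v_next = v + (x - x_best) * f                    # Eq. (2)
    x_next = x + v_next                              # Eq. (3)
    return f, v_next, x_next


# Toy usage: 4 bats in 2-D, scored on the sphere function sum(x**2).
rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, size=(4, 2))
v = np.zeros_like(x)
x_best = x[np.argmin((x ** 2).sum(axis=1))].copy()
f, v, x = ba_global_step(x, v, x_best, f_min=0.0, f_max=2.0, rng=rng)
```

Because each bat draws a single $\beta$, all dimensions of that bat share the same frequency, which is why `beta` has shape `(n_bats, 1)` and broadcasts across the position vector.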
In Algorithm 1, $n$ is the number of bats (the population size) and $rand$ is a uniformly distributed random real number in the range $[0,1]$. The pulse emission rate $r$ increases over the course of the iterations according to the following equation:

$$r_i^{t+1} = r_i^{0}\left[1 - \exp(-\gamma t)\right] \quad (4)$$

where $\gamma$ is a constant and $r_i^0$ is the initial pulse emission rate of the $i$th bat. BA uses a local search (lines 5-8 of Algorithm 1) to create a new solution near the best solutions obtained so far.
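The local search generates a new solution as

$$x_{new} = x_{old} + \epsilon A^{t} \quad (5)$$

where $x_{old}$ is one of the current best solutions, chosen by some selection mechanism, $\epsilon$ is a random number in the interval $[-1,1]$, and $A^t$ is the average loudness of all bats at iteration $t$. The loudness of each bat is updated as follows:

$$A_i^{t+1} = \alpha A_i^{t} \quad (6)$$

Based on Eq. (6), the loudness decreases as the iterations proceed; the constant $\alpha$, with $0 < \alpha < 1$, plays a role similar to the cooling factor in simulated annealing [Yang2010]. The basic BA was developed for solving continuous problems [Yang2010], and a binary version of BA (BBA) was developed in [Mirjalili2014]. BBA employs a V-shaped transfer function to map all real-valued velocities to the range $[0,1]$ as follows:

$$V\!\left(v_i^j(t)\right) = \left|\frac{2}{\pi}\arctan\!\left(\frac{\pi}{2}\,v_i^j(t)\right)\right| \quad (7)$$

where $v_i^j(t)$ is the $j$th element of the velocity vector of the $i$th bat at iteration $t$. In BBA, the rule for updating a bat's position is redefined as:

$$x_i^j(t+1) = \begin{cases} \left(x_i^j(t)\right)^{-1} & \text{if } rand < V\!\left(v_i^j(t+1)\right) \\ x_i^j(t) & \text{otherwise} \end{cases} \quad (8)$$

where $\left(x_i^j(t)\right)^{-1}$ denotes the complement of the bit $x_i^j(t)$.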
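A minimal NumPy sketch of the binary-specific parts of BBA follows, assuming positions are stored as 0/1 integer arrays: the V-shaped transfer function of Eq. (7), the bit-flip rule of Eq. (8), and the pulse-rate and loudness schedules of Eqs. (4) and (6). Function names and the toy population are hypothetical; this is not the MATLAB implementation used in the experiments.

```python
import numpy as np


def v_shaped(v):
    """V-shaped transfer function, Eq. (7): maps real velocities into [0, 1)."""
    return np.abs((2.0 / np.pi) * np.arctan((np.pi / 2.0) * v))


def bba_position_update(x_bits, v_next, rng):
    """Eq. (8): complement each bit where rand < V(v), keep it otherwise."""
    flip = rng.uniform(0.0, 1.0, size=x_bits.shape) < v_shaped(v_next)
    return np.where(flip, 1 - x_bits, x_bits)


def pulse_rate(r0, gamma, t):
    """Eq. (4): the pulse emission rate rises from 0 toward r0."""
    return r0 * (1.0 - np.exp(-gamma * t))


def loudness(a0, alpha, t):
    """Eq. (6) applied t times: A^t = alpha**t * A^0, with 0 < alpha < 1."""
    return a0 * alpha ** t


# One binary step for 3 bats with 8-bit positions (toy values).
rng = np.random.default_rng(42)
x_bits = rng.integers(0, 2, size=(3, 8))
best = x_bits[0].copy()                          # pretend bat 0 is the best so far
beta = rng.uniform(0.0, 1.0, size=(3, 1))
f = 0.0 + (2.0 - 0.0) * beta                     # Eq. (1) with f_min = 0, f_max = 2
v = np.zeros(x_bits.shape) + (x_bits - best) * f # Eq. (2) starting from zero velocity
x_bits = bba_position_update(x_bits, v, rng)
```

Unlike a sigmoid (S-shaped) transfer function, the V-shaped function is 0 at zero velocity, so a bat whose velocity is zero (here, the bat equal to `best`) keeps all of its bits; large velocities of either sign make a flip almost certain.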
3.2 Deep Recurrent Networks
Recurrent Neural Networks (RNNs) were proposed to overcome the inability of simple neural networks to learn from sequence data. In sequential data such as signals [Xiong2018], stock prices [Rather2015], and machine translation [Liu2015], the temporal order and chain dependency of samples create meaningful patterns over time. Because simple neural networks have a feed-forward structure, they cannot learn time-variant features. To overcome this shortcoming, various kinds of feedback connections between nodes were proposed as variants of RNNs [Chung2014, Schuster1997, Soltani2016]. These connections form a directed graph along the temporal sequence that can learn and extract the intrinsic temporal patterns of sequential data. In RNNs, unlike simple neural networks, each node's output depends on the outputs of previous steps. In other words, RNNs carry a memory of previous computations into the current state. Fig. 1 shows a simple recurrent network.
As displayed in Fig. 1, $x_t$ is the input at time step $t$; $h_t$ is the hidden state at step $t$, drawn as rectangular units, which plays the role of memory in the network; and $o_t$ is the output at time step $t$. In this manner, recurrent networks can use previous computations. Although this structure seems promising for keeping track of previous states and working like a memory, simple recurrent networks cannot remember more than a few earlier time steps because of the vanishing gradient problem [ShivaPrakash2019]. The vanishing gradient problem occurs when a neural network, in this case an RNN, is trained by gradient-based learning and backpropagation. Backpropagation computes the gradient of the output loss with respect to the network's weights using the chain rule and partial derivatives. Because the chain rule multiplies many consecutive factors, the gradient value usually shrinks to a tiny number in deep networks (and, for an RNN, over many time steps), so the network stops learning. As a result, the network soon becomes unable to learn the intricate intrinsic features of the sequence data and to discover long-term dependencies. In other words, it cannot retain more complicated time-dependent information, which is what long-term memory requires.
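The shrinking gradient can be seen on a scalar toy RNN, $h_t = \tanh(w h_{t-1} + u x_t)$, chosen here as a simplifying assumption. By the chain rule, $\partial h_T / \partial h_0$ is a product of $T$ factors $w(1 - h_t^2)$, each no larger in magnitude than $|w|$, so with $|w| < 1$ it decays at least geometrically in $T$. The sketch below computes this product directly.

```python
import math


def rnn_gradient_wrt_h0(w, u, xs, h0=0.0):
    """Scalar vanilla RNN h_t = tanh(w * h_(t-1) + u * x_t).

    Returns (h_T, dh_T/dh_0). By the chain rule the gradient is the product of
    the per-step factors w * (1 - h_t**2), each of magnitude at most |w|.
    """
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # dh_t/dh_(t-1)
    return h, grad


for T in (1, 5, 10, 20):
    _, g = rnn_gradient_wrt_h0(w=0.5, u=1.0, xs=[1.0] * T)
    print(f"T={T:2d}  dh_T/dh_0 = {g:.3e}")
```

With $|w| > 1$ the same product can instead grow, which is the related exploding-gradient problem.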
3.3 Long Short-Term Memory
To overcome the shortcomings of simple RNNs, Long Short-Term Memory (LSTM) [Hochreiter1997] was proposed. The vanilla LSTM has the same chain-like architecture as the RNN introduced in the previous section. However, each LSTM memory unit has a different structure with more complex functionality than a vanilla RNN unit. In LSTM, each memory cell makes small modifications to the information flowing through a mechanism called the cell state, using simple operations such as multiplication and addition. In this way, the LSTM unit can selectively keep or forget information. This information generally depends on three sources: first, the previous information passed along by the memory from the last time step through the cell state; second, the previous cell's output, also known as the hidden state; and lastly, the input at the current time step. The cell state is often compared to a conveyor belt that moves information through the LSTM block. As information passes along this conveyor belt, it can be added, removed, or modified using simple linear operations and sigmoid neural network layers. This way of controlling information lets the cell state play a key role in preserving the main information and features for each time step. The generic architecture of LSTM is shown in Fig. 2.
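To make the cell-state mechanism concrete, here is a NumPy sketch of one step of a commonly used LSTM formulation with a forget gate. The stacked gate order, weight shapes, and random initialization are assumptions of this sketch rather than details given in the paper. Note that the cell-state update uses only elementwise multiplication and addition, which is the conveyor-belt behavior described above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell with a forget gate.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias,
    stacked in the order [forget, input, candidate, output] (this sketch's choice).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])    # candidate content
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c_t = f * c_prev + i * g       # cell state: elementwise multiply and add only
    h_t = o * np.tanh(c_t)         # hidden state sent to the next step and layer
    return h_t, c_t


rng = np.random.default_rng(0)
D, H = 3, 4
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):   # a toy sequence of 5 time steps
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
```

Driving the forget gate close to 1 and the input gate close to 0 leaves `c_t` essentially equal to `c_prev`, which is how the cell state can carry information across many steps without the repeated shrinking seen in a simple RNN.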
4 Forecasting COVID-19 with NAS-BBA
4.1 Dataset and Challenges
SARS-CoV-2 is a virus that was newly discovered in late 2019. The first cluster of cases was reported to the World Health Organization on December 31st, 2019, and the WHO declared a pandemic on March 11th, 2020. Therefore, little data was available for studying the virus's epidemiological behavior, especially in the early stages of the outbreak. Moreover, to make a fair epidemiological evaluation and train an accurate model of the pandemic, we should only study regions with similar culture and social behavior, since an epidemic is highly dependent on those factors. For these reasons, the available data is very limited. In this paper, we use the open data on the geographic distribution of COVID-19 cases worldwide from the European Centre for Disease Prevention and Control (https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases) to build a model for forecasting Iran's daily cases on non-lockdown days. The raw version of this dataset consists of 12 features. An overview of five samples with some of their main features is provided in Table 1. We use two features from this dataset and combine them with two new features, which we introduce shortly.
European Centre for Disease Prevention and Control’s Dataset
Since human interaction and in-person communication are among the main drivers of COVID-19 spread, controlling them was one of the first concerns of macro-level decision-makers. As a result, by early April 2020, over one-third of the global population was under some form of movement restriction, quarantine, or lockdown. To prevent further economic damage, however, most countries' health organizations started considering new protocols for routine economic activities and workplaces, such as reducing the number of employees and monitoring their health [Cirrincione2020]. Meanwhile, research [Tay2020] showed that another factor can seriously affect the outbreak despite all the protocols: people's tendency to break quarantine and engage in in-person social contact [Koh2020]. Therefore, one of the main factors we take into account in this paper is the impact of non-workdays, or holidays, on pandemic case numbers.
To do so, we introduce the first augmented feature, called "d_type", which indicates whether a day is a holiday or a regular workday. We extracted Iran's holiday status from the Google Calendar API and assigned a value of 1 if the corresponding day was a holiday and 0 if it was a workday. The second augmented feature is derived from the holiday feature, because each holiday increases people's tendency toward unnecessary gatherings during quarantine. We therefore introduce the "gathering" feature: a sample gets the value 1 if it is a holiday, or if it is a workday that falls between two holidays; otherwise, it gets 0. Lastly, we use an index to keep track of the sequence. Part of the new data is shown in Table 2.
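The two augmented features can be derived from a daily holiday indicator with a few lines of Python. This sketch assumes the holiday flags have already been retrieved (the Google Calendar API call is not reproduced) and reads "between two holidays" as a workday whose immediately preceding and following days are both holidays; the function name `augment_days` is hypothetical.

```python
def augment_days(is_holiday):
    """Build index, d_type and gathering features from a daily 0/1 holiday list.

    d_type[i]    = 1 if day i is a holiday, else 0.
    gathering[i] = 1 if day i is a holiday, or a workday whose previous and next
                   days are both holidays (an assumed reading of "between two
                   holidays"); else 0. The first and last days have only one
                   neighbour and are never treated as lying between two holidays.
    """
    d_type = [1 if h else 0 for h in is_holiday]
    n = len(d_type)
    gathering = []
    for i, h in enumerate(d_type):
        sandwiched = 0 < i < n - 1 and d_type[i - 1] == 1 and d_type[i + 1] == 1
        gathering.append(1 if h == 1 or sandwiched else 0)
    index = list(range(n))  # sequence index feature
    return index, d_type, gathering
```

For example, `augment_days([0, 1, 0, 1, 0, 0])` yields the gathering feature `[0, 1, 1, 1, 0, 0]`: the workday on day 2 is flagged because it sits between two holidays.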
4.2 Neural Architecture Search with BBA
To deal with these challenges, we take an evolutionary Neural Architecture Search (NAS) approach to optimize the deep model's hyperparameters. Hyperparameter optimization is an NP-hard problem: guaranteeing the optimal solution would require an exhaustive search of the solution space, so metaheuristic techniques are used instead to find near-optimal solutions in reasonable time. We chose BBA because it is a widely used metaheuristic algorithm [Gupta2019, Nakamura2012] and, as its original paper [Mirjalili2014] reports, it outperforms competing binary algorithms. From now on, we refer to this proposed framework as NAS-BBA.
Previous research [Zoph2016, Stanley2002] suggests that neural architecture search techniques can design simpler network topologies while also improving the performance of the final output. The NAS approach also yields a deep architecture with a sufficient but not excessive number of parameters. This helps the model cope with limited data and learn abstract information across layers without overfitting. Meanwhile, RNN architectures take longer to train than architectures that can process data in parallel. Moreover, varying the number of LSTM layers and the number of units in each layer produces a large number of candidate architectures, and evaluating them manually is time-consuming and requires extensive trial and error. NAS is therefore a practical and reasonable approach for designing an efficient deep model.
Before using BBA to optimize the model, there are two main factors that we have to consider. We first have to define an encoding for the population’s individuals so it can represent the problem clearly. The second thing that we have to focus on is utilizing a convenient fitness function for the problem. We will discuss these two factors and how we customized them for forecasting COVID-19 cases.
4.2.1 Defining Individuals
The individuals of the BBA population are defined using a hybrid encoding structure, and each individual consists of four parts. The layer vector $L$, the activation vector $A$, and the unit vector $U$ use a plain binary scheme. Vector $L$ determines the existence of each layer: if element $L_j$ has the value 1, the $j$th layer is active, and if it has the value 0, the corresponding layer is absent from the model. Vector $A$ selects the activation function of the layers it controls, and vector $U$ can be split into $k$ subvectors, each of which determines the number of units in the corresponding layer. Lastly, the fourth vector, $T$, represents the number of timesteps used to frame the data for the sequential model and is encoded in gray code. A simple representation of this encoding scheme is provided in Fig. 3. The overall number of elements in each individual is fixed and can be calculated as

$$N = l_{max} + a_{max} + \sum_{j=1}^{k}\left\lceil \log_2 u_{max}^{(j)} \right\rceil + \left\lceil \log_2 t_{max} \right\rceil \quad (9)$$

where $l_{max}$ and $a_{max}$ are the maximum numbers of layers and activations, respectively, $u_{max}^{(j)}$ is the maximum number of units in the $j$th layer, and $t_{max}$ is the maximum number of timesteps. For instance, $l_{max} = 3$ means that there are three layers whose existence BBA can determine. The first logarithm term converts the maximum number of units in each layer into the number of binary elements needed to represent it. Likewise, the second logarithm term converts the defined maximum number of timesteps into the number of gray-code elements. Here, $\lceil x \rceil$ maps $x$ to the least integer greater than or equal to $x$.
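The following Python sketch shows one way to decode a flat individual into its four parts: layer-existence bits, activation bits, per-layer unit counts, and the gray-coded number of timesteps. The part order, the most-significant-bit-first bit order, and clamping zero units or timesteps to 1 are assumptions of this sketch. `n_bits` mirrors the ceiling-of-logarithm terms of Eq. (9), and the ReLU = 1 / sigmoid = 0 mapping follows the experimental setting.

```python
import math


def n_bits(max_value):
    """Elements needed for a value up to max_value: ceil(log2(max_value)), as in Eq. (9)."""
    return math.ceil(math.log2(max_value))


def bits_to_int(bits):
    """Plain binary, most significant bit first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def gray_to_int(bits):
    """Gray code, most significant bit first: b_0 = g_0, b_i = b_(i-1) XOR g_i."""
    value, prev = 0, 0
    for g in bits:
        prev ^= g
        value = (value << 1) | prev
    return value


def decode_individual(ind, n_exist, n_act, unit_bits, time_bits):
    """Split a flat 0/1 individual into existence, activation, unit and timestep parts.

    unit_bits lists the number of elements per layer, e.g. n_bits(31) == 5.
    Zero units or zero timesteps are clamped to 1 (an assumption of this sketch).
    """
    expected = n_exist + n_act + sum(unit_bits) + time_bits
    if len(ind) != expected:
        raise ValueError(f"expected {expected} elements, got {len(ind)}")
    exist = list(ind[:n_exist])
    pos = n_exist
    act = ["relu" if a == 1 else "sigmoid" for a in ind[pos:pos + n_act]]
    pos += n_act
    units = []
    for nb in unit_bits:
        units.append(max(1, bits_to_int(ind[pos:pos + nb])))
        pos += nb
    timesteps = max(1, gray_to_int(ind[pos:pos + time_bits]))
    return exist, act, units, timesteps


# Hypothetical small layout: 1 optional layer, 1 activation bit,
# two 3-bit unit fields, and 3 gray-coded timestep bits.
print(decode_individual([1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0], 1, 1, [3, 3], 3))
# → ([1], ['sigmoid'], [4, 3], 3)
```

Gray code is a natural fit for the timestep field because adjacent integers differ in exactly one bit, so a single BBA bit flip can move the window length by one step.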
4.2.2 Selecting Fitness Function
To evaluate the deep model corresponding to each individual in the population, we need a suitable fitness function. Since the final goal is forecasting daily COVID-19 cases, the problem is a regression task. Following common practice for artificial neural networks and deep learning models in regression problems, we select the Mean Squared Error (MSE) as BBA's fitness function:

$$MSE = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 \quad (10)$$

where $m$ is the number of evaluated samples, $y_i$ is the actual number of cases, and $\hat{y}_i$ is the predicted value.
During training, in each generation the current population is updated as in Eq. (8). Each individual is then split into the parts explained previously, which are mapped to the corresponding components of the deep model as a candidate solution. The deep model is then trained on the training data until the specified stopping criteria are met. Finally, the trained model is evaluated on unseen data, and the MSE value is returned to BBA as the individual's fitness value. In this way, every generated model can be evaluated. The process ends when BBA's termination condition is reached, and the best position found so far, $x_*$, is returned as the best solution.
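The evaluation loop just described can be sketched in a few lines. Training an LSTM is replaced here by a hypothetical stand-in, `moving_average_mse`, which forecasts each value as the mean of the previous `timesteps` values, so the example runs without TensorFlow; the decoder, the synthetic series, and the population are likewise illustrative assumptions. What carries over is the structure: decode each individual, fit and score a candidate model on held-out data with the MSE of Eq. (10), and keep the individual with the lowest score.

```python
import numpy as np


def moving_average_mse(series, timesteps, train_ratio=0.8):
    """Hypothetical stand-in for 'train the deep model, return its test MSE'.

    Forecasts each value as the mean of the previous `timesteps` values and
    scores the forecasts on the last (1 - train_ratio) share of samples.
    """
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + timesteps] for i in range(len(series) - timesteps)])
    y = series[timesteps:]
    split = int(len(y) * train_ratio)
    preds = X[split:].mean(axis=1)
    return float(np.mean((y[split:] - preds) ** 2))  # MSE, Eq. (10)


def evaluate_population(population, decode, fitness):
    """Decode and score every individual; lower fitness (MSE) is better."""
    scores = [fitness(decode(ind)) for ind in population]
    best = int(np.argmin(scores))
    return scores, population[best], scores[best]


def decode_timesteps(bits):
    """Hypothetical decoder: plain binary bits -> timesteps in [1, 2**len(bits)]."""
    return 1 + int("".join(str(int(b)) for b in bits), 2)


rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))  # synthetic random walk, not COVID-19 data
population = [rng.integers(0, 2, size=3).tolist() for _ in range(6)]
scores, best_ind, best_score = evaluate_population(
    population, decode_timesteps, lambda ts: moving_average_mse(series, ts)
)
```

In the actual framework, the `fitness` callable would build, train, and evaluate the LSTM model described by the decoded individual, which is why each evaluation is expensive.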
In this section, we first introduce the deep structure used for forecasting COVID-19 cases, then we address the experimental setting and specified parameters, and finally, we discuss the experimental results.
5.1 Deep Model Structure
In this paper, we use a 5-layer deep recurrent network with vanilla LSTM units to forecast COVID-19 cases. The architecture consists of two LSTM layers, two dense layers, and an output layer. Since at least one LSTM layer is needed to extract time-dependency information and one output layer is needed to produce the prediction, we treat the first and last layers as fixed and do not define a layer-existence element in the individual for them. A simple scheme of this architecture is illustrated in Fig. 4.
5.2 Experimental Setting
Dataset: In the experiments, we set the maximum number of timesteps for framing the sequence data to 31, which requires 5 gray-coded elements in each individual. Before each individual is evaluated, the data is first framed into samples of timesteps and features, as shown in Fig. 5. We then split the samples into training and test sets with an 80:20 ratio. Next, we normalize the data so that it is rescaled to a fixed range, and we use the training data for the training phase and the test data to evaluate the model.
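A minimal NumPy sketch of this preprocessing follows: sliding-window framing, a chronological 80:20 split, and min-max scaling. The choice of column 0 as the forecast target, the unshuffled split, the [0, 1] target range, and fitting the scaling statistics on the training part only are assumptions of this sketch; the function names are hypothetical.

```python
import numpy as np


def frame_sequences(data, timesteps):
    """Slide a window of `timesteps` days over `data` (shape: days x features).

    Returns X with shape (samples, timesteps, features) and y, the value of
    column 0 (assumed to be daily cases) on the day after each window.
    """
    data = np.asarray(data, dtype=float)
    X = np.stack([data[i:i + timesteps] for i in range(len(data) - timesteps)])
    y = data[timesteps:, 0]
    return X, y


def split_and_scale(X, y, train_ratio=0.8):
    """Chronological split, then per-feature min-max scaling fit on the training part."""
    split = int(len(X) * train_ratio)
    X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
    x_min = X_tr.min(axis=(0, 1))
    x_range = X_tr.max(axis=(0, 1)) - x_min
    x_range[x_range == 0] = 1.0                      # constant features map to 0
    X_tr, X_te = (X_tr - x_min) / x_range, (X_te - x_min) / x_range
    y_tr = (y_tr - x_min[0]) / x_range[0]            # target uses column-0 statistics
    y_te = (y_te - x_min[0]) / x_range[0]
    return X_tr, X_te, y_tr, y_te


# Toy usage with synthetic columns [cases, d_type, gathering] (not real data).
rng = np.random.default_rng(0)
days = 60
data = np.column_stack([
    rng.poisson(100, size=days),
    rng.integers(0, 2, size=days),
    rng.integers(0, 2, size=days),
])
X, y = frame_sequences(data, timesteps=7)
X_tr, X_te, y_tr, y_te = split_and_scale(X, y)
```

Because the scaling statistics come from the training part only, test values can fall slightly outside [0, 1]; this avoids leaking information from the held-out period into training.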
Deep Model: For the deep model, the maximum number of units is 31 for each LSTM layer and 63 for each of the two dense layers. The last layer has a single neuron that predicts the output. Following the literature, we added a dropout rate of 0.8 to each LSTM layer and L2 regularization with a lambda of 0.01. In each individual's activation elements, which correspond to the dense and output layers, we encoded the ReLU function as 1 and the sigmoid function as 0. The final structure of each individual consists of 32 elements. In the experiments, we trained each model for 200, 500, and 1000 epochs to evaluate the fitness of BBA's individuals, and after obtaining the best architecture, we trained the model for 2,000 epochs. Throughout this study, we ran every experiment three times and report the mean RMSE loss as the final score.
BBA: The population size, the number of iterations, and the other input parameters of BBA are set as in the original BBA paper [Mirjalili2014] and are listed in Table 3. Because fitness evaluation is computationally expensive, the number of BBA iterations, the population size, and the number of deep-model epochs were kept limited in the experiments.
|Model iterations||200, 500, 1000|
The model is implemented in Python with TensorFlow 2.2.0 and coupled with BBA's MATLAB code retrieved from https://www.mathworks.com/matlabcentral/fileexchange/44707-binary-bat-algorithm. To obtain unbiased results, all experiments were conducted on the same PC, whose configuration is shown in Table 4.
|CPU||Intel Core i7-6700HQ|
|GPU||NVIDIA GeForce GTX 980|
|Operating System||Windows 10 Pro 64-bit|
|Implementation Environment||MATLAB R2018b|
5.3 Experimental Results
To evaluate the effectiveness of the proposed approach, we conducted several experiments on the COVID-19 dataset. We first ran the framework with populations of 10 and 20 individuals and 200 epochs and compared the two resulting models (M1 vs. M4). We then studied how the number of epochs influences the final architecture by setting it to 200, 500, and 1000 (M1-M3) and compared the results with five customized models (Network1-Network5). To study the effectiveness of the introduced data features, we trained and tested the best model on the original data and on the new data with augmented features. The architectures obtained from the NAS-BBA framework and the self-defined architectures, with their detailed information, are provided in Table 6.
As Table 5 shows, there was a clear improvement from M1 to M3 as the number of epochs increased from 200 to 1000. This suggests that more epochs give the NAS-BBA framework enough time to evaluate the fitness of each individual more reliably. In other words, setting the number of epochs to 1000 lets the deep architecture corresponding to each individual train for longer, which yields a more accurate RMSE fitness value, so the best individual is chosen with less error. To study the effect of the population size on the framework's accuracy, we also ran NAS-BBA with 10 and 20 individuals (M1 and M4). Because evaluating the deep model of each individual is computationally expensive, we kept the number of epochs at 200 for BBA's fitness evaluation. As shown in Fig. 6, the mean loss obtained by NAS-BBA with 20 individuals improved considerably over the 10-individual version. Fig. 6 also shows that both the training and validation loss of NAS-BBA with a population of 20 (P20) keep decreasing until the last epoch. In contrast, the P10 version began to overfit around the 1700th epoch, after which its validation loss started to increase.
|Model||Mean RMSE Loss|
Deep Models’ Architecture Used for Forecasting COVID-19 Cases
|Network Name||Time steps||Existence||Activation Functions||LSTM1||LSTM2||Dense1||Dense2|
|E: Existent||N: Non-Existent:||R: ReLU||S: Sigmoid|
To further show the importance of an optimized architecture for forecasting COVID-19 cases and the effectiveness of NAS-BBA, we evaluated five additional networks (Network1-Network5), whose hyperparameters we set manually within the initially defined ranges. The training and validation loss curves of the most relevant models are provided in Fig. 8, and to make the differences easier to see, the training and validation loss values are plotted in Fig. 7.
The left learning curve of Network2 in Fig. 8 shows that although its training loss keeps decreasing until the last epoch, its validation loss starts to rise sharply after around the 1000th epoch. Network4's training and validation losses follow a decreasing trend for almost the whole training phase, but the network cannot reduce the loss any further from around the 1600th epoch. This can be attributed to an insufficient number of time steps or hidden units. Lastly, Network3 also overfits shortly after around epoch 700 and shows no further improvement, even though its hyperparameters, such as hidden units and timesteps, are closer to those selected by NAS-BBA.
5.3.2 Effect of Dataset with Augmented Features
To validate the effectiveness of the proposed COVID-19 dataset with augmented features, we trained the best generated model (M4) on the new dataset and compared it with the same model trained on the original dataset. The learning curves of the two settings are illustrated in Fig. 9. As Fig. 9 shows, the validation loss on the single-feature data does not improve much after the 1000th epoch, whereas with the new dataset both the training and validation losses keep a regular decreasing trend almost until the last epoch. The final validation loss of the model trained on the new data is also lower than that of the model trained on the original data. The training and validation losses for the original and new datasets are provided in Table 7. Fig. 9 also shows that the best model overfits the original data soon after about the 400th epoch. This indicates that the model could not find sufficiently distinctive features in the original training data to improve the accuracy of forecasting cases on the validation data.
|Model||Train Loss||Validation Loss|
In this paper, we proposed a new approach and dataset for forecasting daily COVID-19 cases more accurately than common approaches that rely on trial and error to find an acceptable architecture. We also discussed the limitations caused by the data-hungry nature of deep learning models and investigated how the proposed approach can find an architecture that provides a promising solution despite limited data, as is the case for COVID-19 cases. To validate our proposed approach, NAS-BBA, we presented a set of detailed experiments comparing several custom architectures with those obtained by the NAS-BBA framework. In all cases, the results confirm the effectiveness of the proposed approach in finding the best deep architecture for forecasting COVID-19 cases. Finally, we trained the best generated model on both the original dataset and the one proposed in this paper. The results indicate that our proposed dataset with augmented features provides a significant improvement to the model.
This work opens several topics that deserve further study. First, as mentioned in the experimental section, training the deep model for each BBA individual's fitness evaluation is time-consuming; a faster and more accurate alternative for this training phase could be explored. Second, the impact of other hyperparameters, such as the learning rate, the regularization coefficient lambda, or the optimization method, on the final model can be studied. An alternative optimization method [Rahbar2020] could also be used to tune the hyperparameters. Lastly, this paper employed vanilla LSTM units to forecast COVID-19 cases. Future research could use other LSTM variants, other recurrent units, and advanced structures such as combinations of convolutional and recurrent neural networks, and study their efficacy.