Cellular Traffic Prediction with Recurrent Neural Network

by Shan Jaffry, et al.

Autonomous prediction of traffic demand will be a key function in future cellular networks. In the past, researchers have used statistical methods such as the autoregressive integrated moving average (ARIMA) to provide traffic predictions. However, ARIMA-based predictions fail to give an accurate forecast for dynamic input quantities such as cellular traffic. More recently, researchers have started to explore deep learning techniques, such as recurrent neural networks (RNN) and long-short-term-memory (LSTM), to autonomously predict future cellular traffic. In this research, we have designed an LSTM-based cellular traffic prediction model. We have compared the LSTM-based prediction with the baseline ARIMA model and a vanilla feed-forward neural network (FFNN). The results show that the LSTM and FFNN accurately predicted future cellular traffic. However, it was found that the LSTM trains the prediction model in a much shorter time than the FFNN. Hence, we conclude that LSTM models can be used effectively even with a small amount of training data, allowing timely prediction of future cellular traffic.




I Introduction

Cellular communication is the most popular and ubiquitous telecommunication technology. Recently, owing to novel use cases such as multimedia video download and 4K/8K streaming, the amount of cellular data traffic has soared exponentially. It is expected that in the near future, i.e. by 2023, monthly mobile data demand will exceed 109 Exabytes (Exa = 10^18), up from the current modest consumption of about 20 Exabytes per month [2]. Cellular users, nevertheless, will expect high-speed and ubiquitous connectivity from the network operators. Providing unhindered, ubiquitous, and high quality of service will be a serious challenge for network operators, who must update their traffic planning tools to know in advance the state of future traffic demands. Hence, operators will rely on data-driven self-organizing networks (SON) powered by machine learning (ML) and artificial intelligence (AI). ML- and AI-enabled networks can preemptively take important decisions with limited human intervention. Prediction of cellular voice and data traffic patterns will be a key task that SONs perform.

Cellular traffic prediction will enable network operators to promptly distribute resources according to the requirements of competing users. With informed knowledge of the network state, operators may also allow resource sharing between devices [9, 10]. This will also enable high spectral efficiency and prevent outages caused by cell overload. If a network can accurately predict future traffic loads in specific cells, it may take preventive actions to avoid outages. For example, the network may permit device-to-device communication to relieve the base station [8].

Fig. 1: LTE-A network architecture.

Recent advances in data analytics and the availability of powerful computing machines have enabled operators to harness the power of Big Data to analyze and predict network operations. Hence, advanced data-driven ML and AI techniques are playing an ever-increasing role in all aspects of modern life. In particular, deep learning, a special class of ML and AI algorithms, can solve enormously complex problems by leveraging the power of very deep neural network layers [11]. Deep learning algorithms can extract valuable feature information from raw data to predict outcomes. Deep learning has made great strides recently due to the advent of user-friendly libraries and programming environments such as Tensorflow, Keras, PyTorch, Pandas, and Scikit-learn. Deep learning algorithms such as recurrent neural networks (RNN) and convolutional neural networks (CNN) are being used extensively in applications such as computer vision [17], health informatics [14], speech recognition [4], and natural language processing [18]. It is anticipated that, in the future, the majority of operations in sixth generation (6G) cellular networks will be catered for solely by AI and deep learning algorithms [19].

An AI-enabled SON will perform long- and short-term analysis on the data obtained from the end users and/or the network [3]. This self-optimization will reduce the overall capital expenditure (CAPEX) and operational expenditure (OPEX) required for network planning and maintenance. For example, a key issue driving CAPEX and OPEX for service providers is the identification and remedy of anomalies that may arise within a cell. To learn about and prevent a cell from going into an anomalous state, the network must predict future traffic demands.

Fig. 2: Grid 01 Internet Activity for first 5 days.

In the past, researchers have proposed forecasting cellular traffic using statistical models such as the autoregressive integrated moving average (ARIMA) and its variants [15]. A known limitation of ARIMA is that it reproduces time series patterns based on the average of past values. Hence, ARIMA may fail to accurately predict traffic patterns in highly dynamic environments such as cellular networks. Nevertheless, ARIMA can give a decent estimate of future traffic and may serve as a baseline prediction model.
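The autoregressive idea behind such baselines can be illustrated with a minimal sketch. The snippet below fits only the AR(p) component by least squares; it omits the differencing and moving-average terms of full ARIMA, and the function names are illustrative, not from any library:

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of an AR(p) model:
    x_t ~ c + a_1*x_{t-1} + ... + a_p*x_{t-p}."""
    # Build the lagged design matrix: column k holds lag k+1.
    X = np.column_stack(
        [series[p - k - 1: len(series) - k - 1] for k in range(p)]
    )
    X = np.column_stack([np.ones(len(X)), X])  # intercept term
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(series, coef):
    """One-step-ahead forecast from the fitted coefficients."""
    p = len(coef) - 1
    lags = series[-1:-p - 1:-1]  # most recent p values, newest first
    return coef[0] + coef[1:] @ lags

# A perfectly linear "traffic" series is predicted exactly; real cellular
# traffic is far more dynamic, which is where this model breaks down.
series = np.arange(20.0)
coef = fit_ar(series, p=2)
next_val = predict_next(series, coef)
```

Because the forecast is a fixed linear combination of past values, the model can only reproduce averaged patterns, which is exactly the limitation noted above.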

Recently, deep learning based techniques to forecast time series traffic have become more popular. For cellular applications, deep learning techniques learn from the past history of network traffic to train models such as a vanilla feed-forward neural network (FFNN), a recurrent neural network (RNN), or a long-short-term-memory (LSTM) network. In [13], researchers proposed using an RNN with multi-task learning to design a spatio-temporal prediction model for cellular networks. Researchers in [20] applied neural network models to cellular traffic to analyze trade activities in urban business districts. A comparative study between LSTM and ARIMA models was conducted in [1].

Inspired by the works presented above, in this paper we use a real-world call data record to forecast future cellular traffic using LSTM. In particular, we compare our results with the ARIMA and vanilla feed-forward neural network (FFNN) models. We demonstrate that LSTM models learn the traffic patterns very quickly compared to the FFNN and ARIMA models.

The rest of the paper is organized as follows. The system model is presented in Section II. The cellular traffic prediction model is presented in Section III. We discuss the results in Section IV followed by conclusion in Section V.

II System Model

Figure 1 shows our system model, which comprises a Long Term Evolution-Advanced (LTE-A) network. The architecture of LTE-A is broadly categorized into three layers: the core network (CN), the access network, and the end user equipment (UE) [5].

Wireless communication takes place between a UE and an evolved NodeB (eNB) over the access network, which is called the evolved UMTS terrestrial radio access network (E-UTRAN) in LTE-A nomenclature. The core network, formally known as the evolved packet core (EPC), makes the essential network-level decisions. The EPC contains several logical entities such as the serving gateway (SGW), the packet data network gateway (PGW), and the mobility management entity (MME). A detailed explanation of these logical entities and the LTE-A architecture is beyond the scope of the current paper; readers can refer to relevant materials, for example [5]. The call data record (CDR) that we use in this research was gathered at the EPC layer. The execution of the LSTM predictive model will also take place at this layer.

II-A Data Record Details

The call data record used in this research was published by Telecom Italia for its Big Data Challenge competition [16]. Telecom Italia collected the cellular and internet activities of its subscribers within the city of Milan, Italy. In the CDR, the city is divided into square grids; each grid has a side length of 0.235 km and an area of 0.055 km². The data record was collected over 62 days, from 1st November 2013 to 1st January 2014. The data for each day is stored in a single file, so there are 62 files in the dataset. Readers can refer to [12] for a detailed explanation of the CDR.

The spatio-temporal CDR contains the following fields.

  • Grid ID.

  • Time Stamp: The raw timestamp was recorded in milliseconds, at intervals of 10 minutes.

  • Country code.

  • Inbound SMS Activity: Indicates the incoming SMS activity in a particular grid observed within a 10-minute interval.

  • Outbound SMS Activity: Indicates the outgoing SMS activity in a particular grid observed within a 10-minute interval.

  • Inbound Call Activity: Indicates the incoming call activity in a particular grid observed within a 10-minute interval.

  • Outbound Call Activity: Indicates the outgoing call activity in a particular grid observed within a 10-minute interval.

  • Internet Activity: Indicates the internet usage by cellular users in a particular grid observed within a 10-minute interval.

Fig. 3: Single LSTM Cell.

The CDR does not specify activity in any particular units. However, an intuitive interpretation is that the activities are proportional to the amount of real traffic. For example, the magnitude of the inbound or outbound SMS activity is higher when a greater number of SMS are received or sent, respectively. The data was provided in raw format; hence, we discuss the data cleansing method next.

II-B Data Cleansing

The CDR, in its raw format, could not be used to extract any meaningful information. Hence, we applied data cleansing and filtering to the CDR. The timestamps were converted from milliseconds to minutes. Missing fields were marked as zeros (0). There were multiple entry records for each timestamp; we summed them to make a single activity record per timestamp. Figure 2 shows the Internet Activity for Grid 01 for the first five days.
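The cleansing steps above can be sketched with pandas. The column names and sample rows below are illustrative assumptions, not the actual CDR schema:

```python
import pandas as pd

# Illustrative rows in the raw CDR layout (column names are assumed).
raw = pd.DataFrame({
    "grid_id":  [1, 1, 1],
    "ts_ms":    [1383260400000, 1383260400000, 1383261000000],
    "internet": [0.8, None, 1.5],
})

# Missing fields are marked as zeros.
raw["internet"] = raw["internet"].fillna(0)

# Timestamps are converted from milliseconds to minutes.
raw["ts_min"] = raw["ts_ms"] // 60000

# Multiple entries per timestamp are summed into a single
# activity record per 10-minute interval.
clean = raw.groupby(["grid_id", "ts_min"], as_index=False)["internet"].sum()
```

The resulting frame has one row per grid and timestamp, which is the form used for the activity plots and for training.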

In our prediction model, we have used only the Internet Activity. However, our model can be used to predict the SMS and call activities without any modification. We discuss traffic prediction in the next section.

III Cellular Traffic Prediction

In this section, we first briefly describe the basics of feed-forward and recurrent neural networks (NN), followed by the LSTM-based learning model.

III-A Feed Forward and Recurrent NN

In artificial neural networks, the nodes are connected to form a directed graph, which is ideal for handling temporal sequence predictions. In a vanilla feed-forward neural network (FFNN), information flows only in the forward direction. The input layer feeds the forward-looking hidden layers, which perform the calculations and manipulations. The hidden layers forward the information to the output layer, which produces regression or classification predictions.

Fig. 4: LSTM Network.

A NN maps inputs to outputs by learning from the examples provided during the training phase, and can be used for prediction in classification and regression problems. During the training process, the predictions are compared against the expected output values (often known as the ground truth) to calculate the loss function. At the beginning of training, the loss function is usually quite high, indicating incorrect predictions by the model. Through back-propagation and gradient descent, the model adjusts the weights and biases corresponding to the input values to minimize the loss function. A fully trained NN has minimal loss (also called error) between the predicted and expected output values [6]. After training, the model is validated and a validation error is calculated. A model is fully trained for prediction when the training and validation errors are both minimized.

In a recurrent neural network (RNN), the learning process is the same as in an FFNN, but the architecture is slightly different. An RNN takes the output of one layer and feeds it as input to the next, so each layer has information from past input values. An RNN considers the current input as well as the inputs received in previous time steps during training and prediction. This enables an RNN to use knowledge from all previous time instances to make a well-informed prediction on time series data.

However, vanilla RNNs have an inherent vanishing and exploding gradient problem, which halts the learning process as the gradient either diminishes completely or explodes to a very large value. Hence the Long-Short-Term-Memory (LSTM), a variant of the RNN, was proposed in [7]. LSTMs were designed to avoid the long-term dependency issue, which is the cause of the vanishing-gradient problem in vanilla RNNs.

III-B Learning Through LSTMs

The structure of LSTM units (often known as cells) enables a neural network to learn long-term dependencies. The standard LSTM unit is shown in Figure 3. The learning process is strictly controlled by multiple gates that allow (or bar) the flow of incoming data from the previous cell and/or the input. There are three main gates in any LSTM unit: the forget gate (f_t), the update or input gate (i_t), and the output gate (o_t). The cell state c_t for the current unit is updated with the information passed through the update gate. The candidate value for the current cell's state (i.e. c̃_t) is computed from the previous hidden state (i.e. h_{t-1}) and the input x_t. The update gate decides whether to allow or bar the flow of this candidate value into the cell state. The output gate allows information to pass out of the current cell. The forget gate lets the current cell keep or discard the state value from the previous time step. The prediction ŷ_t is made from the hidden state h_t after passing through an activation function (often sigmoid or softmax).

Fig. 5: Traffic prediction with large-sized training set.

The LSTM cells are chained to form one layer of the LSTM network, as shown in Figure 4. Each cell computes the operations for one time step and transfers its output to the next cell. The number of cells in an LSTM network indicates the number of observations considered before making a prediction. In our case, the input is the internet activity and the number of observations is the number of selected time steps T.
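Preparing the activity series for such a many-to-one network amounts to slicing it into overlapping windows of T observations, each paired with the next observation as the target. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def make_windows(series, T):
    """Turn a 1-D activity series into (samples, T, 1) inputs and
    next-step targets: each window of T observations predicts the
    observation that immediately follows it."""
    X, y = [], []
    for i in range(len(series) - T):
        X.append(series[i:i + T])
        y.append(series[i + T])
    # Reshape to (samples, time steps, features) with one feature.
    X = np.asarray(X, dtype=float).reshape(-1, T, 1)
    return X, np.asarray(y, dtype=float)

# Stand-in for the Grid 01 internet activity series.
activity = np.arange(10.0)
X, y = make_windows(activity, T=3)
# X[0] is [0, 1, 2] and its target y[0] is 3.
```

Each sample thus feeds one chained LSTM cell per time step, matching the structure in Figure 4.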

The expressions for all the gates, the cell state, the output of the hidden layer, and the final prediction are given below:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

The final output is then calculated as:

ŷ_t = g(W_y · h_t + b_y)

In the equations above, the symbol σ represents the sigmoid function, often known as the squashing function because it limits the output between 0 (gate OFF) and 1 (gate fully ON). Formally, the sigmoid function is defined as σ(x) = 1 / (1 + e^{-x}). The symbol g denotes another squashing function; often tanh or the rectified linear unit (relu) is used for g. Readers can refer to the relevant literature for further information about these functions [6]. The symbol ⊙ represents element-wise multiplication. Finally, W and b are the weight matrices and bias vectors corresponding to the respective gates, hidden layer, input, and output layer. The exact values of these weights and biases are learned during training by the libraries described in the next sub-section.
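A single forward step of the standard LSTM gate equations can be sketched in plain numpy. The weights below are random placeholders, not values learned by any model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One time step of a standard LSTM cell (weights are illustrative)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # update (input) gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

# Toy dimensions: 1 input feature (internet activity), 4 hidden units.
n_x, n_h = 1, 4
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_h, n_h + n_x)) * 0.1 for k in "fioc"}
b = {k: np.zeros(n_h) for k in "fioc"}

h, c = np.zeros(n_h), np.zeros(n_h)
for x in [0.2, 0.5, 0.9]:                    # three time steps of activity
    h, c = lstm_cell(np.array([x]), h, c, W, b)
```

Because every gate output lies in (0, 1) and tanh lies in (-1, 1), the hidden state h_t stays bounded, which is how the LSTM avoids the exploding-gradient behavior of vanilla RNNs.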

III-C Training and Prediction Software

We have used Matlab for data cleansing and filtering. All the algorithms are implemented in Python using the Keras and Scikit-learn libraries with Tensorflow as the backend.

Fig. 6: Error for traffic prediction with medium-sized training set.

IV Results and Discussion

In this section we show the performance comparison of the LSTM model with the baseline ARIMA and vanilla feed-forward neural network models. We have compared the performance of each technique against the ground truth test data from the CDR. We have fixed the training epochs to 20 for each cycle. For the LSTM model, we have used two hidden layers to make an even comparison with the FFNN and ARIMA. The first hidden layer contains 50 LSTM cells, followed by a dense layer with a single unit. The FFNN contains two hidden layers, with the first layer containing 5 activation units activated by the relu operation. The second hidden layer contains one non-linear activation unit. Training and validation losses are calculated using the mean absolute error.
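The two compared networks can be sketched in Keras, the library used in this work. The layer sizes follow the description above; the window length T, the optimizer, and other hyper-parameters are assumptions, since they are not stated here:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T = 10  # number of past observations per input window (assumed)

# LSTM model: a hidden layer of 50 LSTM cells followed by a
# single-unit dense output layer.
lstm_model = keras.Sequential([
    keras.Input(shape=(T, 1)),
    layers.LSTM(50),
    layers.Dense(1),
])
lstm_model.compile(optimizer="adam", loss="mean_absolute_error")

# FFNN baseline: 5 relu-activated units, then one output unit.
ffnn_model = keras.Sequential([
    keras.Input(shape=(T,)),
    layers.Dense(5, activation="relu"),
    layers.Dense(1),
])
ffnn_model.compile(optimizer="adam", loss="mean_absolute_error")
```

Both models would then be fitted with `model.fit(X, y, epochs=20)` to match the 20-epoch budget used in the experiments below.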

Fig. 7: Traffic prediction with medium-sized training set.

Figure 5 shows the traffic prediction by the LSTM, FFNN, and ARIMA models. We used 7142 samples for training the LSTM and FFNN models. For validation and testing, we used 893 samples each. The LSTM and FFNN both learned the pattern in fewer than 5 epochs due to the large number of training examples. It can be observed that the LSTM and FFNN predictions match the ground truth data. The ARIMA model predicts very close to the ground truth but does not exactly match the traffic pattern.

Fig. 8: Traffic prediction with small-sized training set.

We later reduced the training samples to 3571. The training and validation errors for this case are shown in Figure 6 and the prediction results are presented in Figure 7. We can observe that the LSTM and FFNN still predict very accurately. The ARIMA baseline model, however, does not exactly match the ground truth traffic. It should be noted from Figure 6 that, with the reduced number of training samples, the training and validation errors for the LSTM converged to near zero after only 2 epochs. The FFNN, however, took at least 10 epochs to train fully enough for accurate predictions. Nevertheless, both models' errors converged to zero before the 20-epoch limit.

When we further reduced the training samples to 892, we observed that after training for 20 epochs, the FFNN could not predict the actual ground truth data. In fact, its performance degraded to worse than that of the ARIMA model. The LSTM, on the other hand, predicted the traffic activity very accurately. This is because the LSTM trained fully within 20 epochs, with the training and validation errors converging to zero as shown in Figure 9. The error for the FFNN, in contrast, remained high even after 20 epochs. Interestingly, the FFNN could still estimate the pattern of future traffic, though with very low accuracy.

V Conclusion

In this paper, we presented cellular data traffic prediction using a recurrent neural network, in particular the long-short-term-memory model. We demonstrated that the LSTM and the vanilla feed-forward neural network predict more accurately than the statistical ARIMA model. Moreover, the LSTM models were shown to learn more quickly than the FFNN, even with a small number of training samples. As future work, we are designing an LSTM-based resource allocation method for 6G networks.

Fig. 9: Error for traffic prediction with small-sized training set.


  • [1] A. Azari, P. Papapetrou, S. Denic, and G. Peters (2019) Cellular traffic prediction and classification: a comparative evaluation of lstm and arima. In International Conference on Discovery Science, pp. 129–144. Cited by: §I.
  • [2] P. Cerwall, A. Lundvall, P. Jonsson, R. Möller, S. Bävertoft, S. Carson, and I. Godor (2018) Ericsson mobility report 2018. Cited by: §I.
  • [3] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah (2019) Artificial neural networks-based machine learning for wireless networks: a tutorial. IEEE Communications Surveys & Tutorials 21 (4), pp. 3039–3071. Cited by: §I.
  • [4] L. Deng, G. Hinton, and B. Kingsbury (2013) New types of deep neural network learning for speech recognition and related applications: an overview. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8599–8603. Cited by: §I.
  • [5] A. ElNashar, M. A. El-Saidny, and M. Sherif (2014) Design, deployment and performance of 4G-LTE networks: A practical approach. John Wiley & Sons. Cited by: §II, §II.
  • [6] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT press. Cited by: §III-A, §III-A, §III-B.
  • [7] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §III-A.
  • [8] S. Jaffry, S. F. Hasan, X. Gui, and Y. W. Kuo (2017) Distributed device discovery in prose environments. In TENCON 2017-2017 IEEE Region 10 Conference, pp. 614–618. Cited by: §I.
  • [9] S. Jaffry, S. F. Hasan, and X. Gui (2018) Effective resource sharing in mobile-cell environments. arXiv preprint arXiv:1808.01700. Cited by: §I.
  • [10] S. Jaffry, S. F. Hasan, and X. Gui (2018) Shared spectrum for mobile-cell’s backhaul and access link. In 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. Cited by: §I.
  • [11] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. nature 521 (7553), pp. 436–444. Cited by: §I.
  • [12] M. S. Parwez, D. B. Rawat, and M. Garuba (2017) Big data analytics for user-activity analysis and user-anomaly detection in mobile wireless network. IEEE Transactions on Industrial Informatics 13 (4), pp. 2058–2065. Cited by: §II-A.
  • [13] C. Qiu, Y. Zhang, Z. Feng, P. Zhang, and S. Cui (2018) Spatio-temporal wireless traffic prediction with recurrent neural network. IEEE Wireless Communications Letters 7 (4), pp. 554–557. Cited by: §I.
  • [14] D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, and G. Yang (2016) Deep learning for health informatics. IEEE journal of biomedical and health informatics 21 (1), pp. 4–21. Cited by: §I.
  • [15] Y. Shu, M. Yu, O. Yang, J. Liu, and H. Feng (2005) Wireless traffic modeling and prediction using seasonal arima models. IEICE transactions on communications 88 (10), pp. 3992–3999. Cited by: §I.
  • [16] Telecom Italia (2014) Open Big Data, Milano Grid. Online. Cited by: §II-A.
  • [17] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis (2018) Deep learning for computer vision: a brief review. Computational intelligence and neuroscience 2018. Cited by: §I.
  • [18] T. Young, D. Hazarika, S. Poria, and E. Cambria (2018) Recent trends in deep learning based natural language processing. ieee Computational intelligenCe magazine 13 (3), pp. 55–75. Cited by: §I.
  • [19] A. Zappone, M. Di Renzo, and M. Debbah (2019) Wireless networks design in the era of deep learning: model-based, ai-based, or both?. arXiv preprint arXiv:1902.02647. Cited by: §I.
  • [20] Y. Zhao, Z. Zhou, X. Wang, T. Liu, Y. Liu, and Z. Yang (2019) CellTradeMap: delineating trade areas for urban commercial districts with cellular networks. In IEEE INFOCOM 2019-IEEE Conference on Computer Communications, pp. 937–945. Cited by: §I.