I Introduction
The overall traffic generated by mobile networks continues to accelerate. Telecom operators are expanding capacity by acquiring radio spectrum and deploying new base stations, however at the cost of Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). Network management automation is a key enabler for dynamic network optimization and for reducing CAPEX/OPEX costs.
Accurate prediction of key performance indicators (KPIs) has become increasingly important, as it can help telecom operators with network optimization and network planning. For example, with real-time predictive analytics, the network can be intelligently configured at the right time and in the right place. Thus, radio resources are better utilized through optimizations such as dynamic load balancing/resource allocation, adaptive traffic treatment and adaptive scheduler selection [DBLP:journals/corr/abs180304311]. Therefore, the use of machine learning, analytics and artificial intelligence is inevitable for network intelligence. Accordingly, the O-RAN Alliance is leading the industry to embed intelligence in every layer of the RAN architecture for closed-loop automation [oran]. Real-time prediction of traffic/KPIs in wireless networks is, however, challenging from the following perspectives:

Streaming network measurement: a distributed system is needed to collect network measurement data from network elements under strict latency requirements.

Multi-scale temporal and spatial dependency: recent history captures the instant momentum of traffic change, while periodicity (daily/weekly patterns) and seasonality (monthly/yearly trends) capture global trends. A lack of multi-scale and long-range temporal structure in the model will lead to inaccurate predictions. Furthermore, traffic in different geographical locations can be correlated due to user mobility.

Network configuration change: network configurations generally undergo constant changes, which impact the KPIs and user behavior.

External influence: regular traffic patterns can be distorted by external factors such as weather, holidays and local events (e.g., incidents, festivals, sport activities, etc.).
It is a big challenge to capture all these factors in a single model, largely due to the high dimensionality of the model's input/output (a.k.a. the curse of dimensionality). In addition to model development, real-time prediction requires significant computation resources due to the low-latency requirements of prediction. This requires us to develop a universal model that can be built and maintained to predict various KPIs for every cell in the network. Note that creating models at a per-cell, per-KPI granularity is infeasible, as it increases latency and the computational difficulty of KPI prediction in a production implementation.
In this work, we introduce an efficient and effective solution for real-time traffic prediction, as a major step towards building an ecosystem that enables proactive network planning and optimization for the next generations of radio access networks.
Our major contributions are summarized as follows:

We propose a generic hierarchical deep learning framework that predicts various KPIs at the cell level.

Our model can capture instantaneous, periodic and seasonal temporal patterns, spatial patterns, as well as heterogeneous external factors such as network configurations, day of week, etc.

We perform extensive experiments with real-world LTE network streaming measurement data and validate the performance superiority of DeepAuto over traditional supervised learning models and time-series models.

We propose a real-time prediction framework for RAN optimization.
The remainder of the paper is organized as follows: Section II discusses related work. Section III provides background and the system architecture. Section IV proposes our DeepAuto hierarchical deep learning model. Section V compares the performance of the proposed approach with benchmark methods. Section VI summarizes our work with concluding remarks.
II Related Work
Time-series models such as the Moving Average (MA) or Autoregressive Integrated Moving Average (ARIMA) have been used in [arima_shu], [arima_zhou] to predict future traffic load. MA can predict a single feature based on its own historical data, leaving other impacting factors unconsidered. Regression models (such as ARIMA, Vector Autoregression, etc.) allow extra features/variables to be included in the model but usually can only handle small input/output dimensions. Random Forest, a supervised machine learning algorithm, has been used in [Zhang:2017:TPB:3139958.3140053] to predict traffic load. Tree-based ensemble supervised algorithms (such as random forest, gradient boosting machines, etc.) handle high-dimensional input/output but usually ignore the sequential/temporal dependencies among the inputs, or become difficult to model due to increased complexity. Traditional methods usually feed in all factors indiscriminately, which often leads to a model with a huge parameter space that is difficult to optimize.
Deep learning has been used in a variety of contexts in mobile networks; [DBLP:journals/corr/abs180304311] provides a survey of deep learning applications. Cell load prediction has been studied in [Jin_infocom, DBLP:journals/corr/abs180900811], which show that traffic demand exhibits spatial and temporal patterns that help to predict the traffic load. These works study cell load prediction by spatio-temporal analysis on a grid framework using Long Short-Term Memory (LSTM). However, this framework is unsuitable when cells are not placed on a regular grid, which is usually the case. Furthermore, the analysis is not scalable to nationwide coverage due to high training and prediction complexity. Even though models using LSTM are good at modeling sequential dependency, they fail to capture mixed sequence inputs at various temporal scales.
III System Architecture
Fig. 1 shows the focus areas for LTE/5G RAN applications using artificial intelligence. Network operators have, over the years, developed a significant number of data sources from various network elements that may be used to perform classification and develop predictive algorithms. These predictions can provide more insight into operations and enable opportunities for performance optimization. Our aim here is to provide a prediction framework which is extensible across multiple cell-level KPI predictions (such as cell load and channel quality) and UE-level KPI predictions (such as UE throughput, UE latency, UE bandwidth demand and UE location).
We illustrate our work using the most important cell-level KPIs, namely cell load and radio channel quality. Key applications such as adaptive scheduler selection and cell load balancing will be enabled using cell load and channel quality KPI predictions.
Fig. 2 presents the general framework of our reusable prediction engine. For this prediction, we use real-time data collected from various network elements. The real-time collection platform publishes the data into stream-processing software such as Apache Kafka [Narkhede:2017:KDG:3175825]. The overall latency of the collection system is on the order of a few seconds; thus, we are able to perform short-term prediction with a horizon from a few seconds to a few hours. The data is then consumed by various short-term predictors. The predictions are exposed via microservices that provide a unified interface to downstream applications such as load balancing. This real-time data collection architecture is well suited for 5G automation.
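The consume-aggregate-predict-expose flow described above can be sketched as follows. This is a minimal stand-in: the generator simulates the Kafka feed, and the mean over a sliding window stands in for the trained model; all names here are illustrative, not from the paper's implementation.

```python
from collections import defaultdict, deque

def measurement_stream():
    # Stand-in for the Kafka feed: yields (cell_id, timestamp, prb_utilization).
    for t in range(10):
        yield ("cell_A", t, 0.5 + 0.01 * t)

class ShortTermPredictor:
    """Keeps a sliding window of recent KPI samples per cell and emits a
    naive one-step-ahead forecast (a placeholder for the trained model)."""

    def __init__(self, window=5):
        self.history = defaultdict(lambda: deque(maxlen=window))

    def consume(self, cell_id, ts, value):
        self.history[cell_id].append(value)

    def predict(self, cell_id):
        window = self.history[cell_id]
        # In production this would call the deployed model on the window.
        return sum(window) / len(window)

predictor = ShortTermPredictor(window=5)
for cell_id, ts, value in measurement_stream():
    predictor.consume(cell_id, ts, value)
forecast = predictor.predict("cell_A")
```

A real deployment would replace the generator with a Kafka consumer and expose `predict` behind a microservice endpoint.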
IV Hierarchical Deep Learning Model
Problem definition: the KPI prediction is framed as a time-series prediction problem, in general using the Nonlinear Autoregressive with eXogenous inputs (NARX) framework [billings]:

y_{t+1} = F(y_t, y_{t-1}, ..., u_t, u_{t-1}, ...) + ε_{t+1}   (1)

where y_t represents the vector of variables of interest at time t, u_t denotes the externally determined variables that have potential impact on the target, and ε_t is the error term. Equation 1 can be further written as y_{t+1} = F(x_t, x_{t-1}, ...) + ε_{t+1}, where x_t is the vector concatenation of y_t and u_t. Given the historical observations {x_1, ..., x_t}, the goal is to learn a nonlinear function F to predict y_{t+1}.
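The NARX framing above can be sketched as a sliding-window dataset construction: each training input concatenates the target KPIs with the exogenous variables over a fixed lag, and the label is the next-step KPI vector. The function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def make_narx_dataset(y, u, lag):
    """Build (input, target) pairs for the NARX framing: each input row is the
    concatenation x_t = [y_t; u_t] over the last `lag` timestamps, and the
    target is the KPI vector one step ahead."""
    x = np.concatenate([y, u], axis=1)          # (T, dim_y + dim_u)
    X, Y = [], []
    for t in range(lag, len(y)):
        X.append(x[t - lag:t].ravel())          # flattened history window
        Y.append(y[t])                          # next-step KPI target
    return np.array(X), np.array(Y)

T, dim_y, dim_u = 100, 2, 3
y = np.random.rand(T, dim_y)   # KPIs of interest (e.g., cell load, UE count)
u = np.random.rand(T, dim_u)   # exogenous inputs (e.g., configuration, hour)
X, Y = make_narx_dataset(y, u, lag=10)
```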
KPIs in cellular networks can be generated and aggregated at different spatial granularities. Depending on the target of interest, the spatial granularity can range from a single cell or base station to a spatial region covering a group of lower-level entities. The future dynamics of KPIs often rely heavily on recent momentum, periodic patterns and seasonal trends. A KPI, e.g., cell load, can be highly correlated with other KPIs such as the number of active users and throughput, as well as with the KPIs of neighboring cells due to spatial interactions driven by user mobility. On the other hand, network configuration updates can have a potentially systematic impact on network KPIs, while other external factors, such as weather and local events, often lead to abrupt changes. We propose DeepAuto, a hierarchical deep learning model architecture, shown in Fig. 3, that captures heterogeneous temporal, spatial and external factors in a compact and structured way.
The proposed solution ingests streaming network measurement data collected from the cellular network. It typically includes periodic samples of cell performance counters and event-driven UE session data. The model aggregates measurement data for each spatial unit (e.g., cell load for a cell) and calculates a set of KPIs as a time series with a predefined time granularity δ. Both the raw streaming data and the calculated KPI time series are: a) stored in an appropriate database as historical data for model training; and b) fed into the deployed model for real-time online prediction. The model receives as input the recent, near and distant temporal KPIs from the given historical observations, to model the multi-scale temporal structure of locality, periodicity and seasonality. The local input is denoted as X_c = [x_{t-n_c+1}, ..., x_t], with n_c recent timestamps used. The periodic input is denoted as X_p = [x_{t-n_p·p}, ..., x_{t-p}], where p is the period, typically one day. Likewise, the seasonal part is denoted as X_s = [x_{t-n_s·s}, ..., x_{t-s}], where s is a larger period capturing the seasonal trend, typically weekly or monthly.
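The local, periodic and seasonal windows described above can be sliced from the KPI series as sketched below. Symbol names (n_c, n_p, n_s, p, s) follow the locality/periodicity/seasonality parameters of the text; the exact indexing convention is our reading of it.

```python
import numpy as np

def multiscale_inputs(x, t, n_c, n_p, n_s, p, s):
    """Slice the local, periodic and seasonal input windows from a KPI
    series x (shape (T, d)) ending at index t."""
    local = x[t - n_c + 1 : t + 1]                          # last n_c samples
    periodic = x[[t - k * p for k in range(n_p, 0, -1)]]    # same slot, past n_p periods
    seasonal = x[[t - k * s for k in range(n_s, 0, -1)]]    # same slot, past n_s seasons
    return local, periodic, seasonal

# With 15-minute granularity: p = 96 steps per day, s = 672 steps per week.
x = np.arange(2000, dtype=float).reshape(-1, 1)
local, periodic, seasonal = multiscale_inputs(x, t=1500, n_c=20, n_p=2, n_s=1,
                                              p=96, s=672)
```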
Multiple recurrent neural networks are horizontally stacked to model the multi-scale temporal dependency, that is, h_c = f(X_c), h_p = f(X_p) and h_s = f(X_s). The function f represents the recurrent neuron. In particular, we use Long Short-Term Memory (LSTM). Unlike classical RNNs, LSTM addresses the problem of long-term dependencies by introducing a purpose-built memory cell [hochreiter1997long][rumelhart1986learning] to store information of previous time steps. Access to memory cells is guarded by "input", "output" and "forget" gates. Information stored in memory cells is available to the LSTM for a much longer time than in a classical RNN, which allows the model to make more context-aware predictions. One typical implementation is via iterating the following composite functions:

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_t + b_o)
h_t = o_t ⊙ tanh(c_t)   (2)
where σ is the logistic sigmoid function, b_i, b_f, b_c and b_o are the bias terms, and i_t, f_t, o_t and c_t are the input gate, forget gate, output gate and cell vectors respectively, all of which have the same size as the hidden state vector h_t. The weight matrices W indicate the connections between the gates, the cell, the input and the hidden states. External features e_t are extracted from network configurations and streaming external data, such as weather data and weekdays/weekends/holidays. Feed-forward neural networks are applied to learn an embedding h_e of their effect. Finally, a fusion layer is designed to aggregate the effects of all factors. Specifically, h = [h_c; h_p; h_s; h_e], where [;] denotes vector concatenation. A final fully-connected layer is applied to h to predict the target KPI y_{t+1}.
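One iteration of the composite LSTM functions in Eq. (2) can be sketched in NumPy as below (a Graves-style cell with peephole connections). The weight and bias names are ours, chosen for readability; a deployed model would of course use a deep learning framework instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of Eq. (2): gate activations, cell update, hidden output."""
    i = sigmoid(W["xi"] @ x + W["hi"] @ h_prev + W["ci"] * c_prev + b["i"])
    f = sigmoid(W["xf"] @ x + W["hf"] @ h_prev + W["cf"] * c_prev + b["f"])
    c = f * c_prev + i * np.tanh(W["xc"] @ x + W["hc"] @ h_prev + b["c"])
    o = sigmoid(W["xo"] @ x + W["ho"] @ h_prev + W["co"] * c + b["o"])
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = {k: rng.normal(size=(d_h, d_in if k[0] == "x" else d_h))
     for k in ("xi", "xf", "xc", "xo", "hi", "hf", "hc", "ho")}
W.update({k: rng.normal(size=d_h) for k in ("ci", "cf", "co")})  # diagonal peepholes
b = {k: np.zeros(d_h) for k in ("i", "f", "c", "o")}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, b)
```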
Spatial dependency: the DeepAuto model can also include spatial dependencies between network entities. These can be determined via traffic interaction and statistical correlation analysis on historical data. A spatial graph is first built, where nodes are spatial units and edge weights capture interaction intensity. For the KPI prediction of a single spatial unit (e.g., a cell), the top neighbors are selected via the ranking of edge weights, e.g., using KPI correlation coefficients. The KPIs of the neighbors can then be concatenated into a vector as model input.
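The neighbor-selection step described above can be sketched as ranking cells by the absolute Pearson correlation of their historical KPI series; the function name and k-parameter are illustrative.

```python
import numpy as np

def top_k_neighbors(kpi_matrix, target, k):
    """Return the indices of the k cells whose historical KPI series
    (columns of kpi_matrix, shape (T, cells)) correlate most strongly
    with the target cell's series."""
    corr = np.corrcoef(kpi_matrix.T)        # (cells, cells) correlation matrix
    weights = np.abs(corr[target])
    weights[target] = -np.inf               # exclude the cell itself
    return np.argsort(weights)[::-1][:k]

rng = np.random.default_rng(1)
base = rng.normal(size=500)
loads = np.stack([base + 0.1 * rng.normal(size=500),   # cell 0
                  base + 0.1 * rng.normal(size=500),   # cell 1: correlated with 0
                  rng.normal(size=500)], axis=1)       # cell 2: independent
neighbors = top_k_neighbors(loads, target=0, k=1)
```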
V Experiments and Evaluation
The objective of DeepAuto and the reusable prediction engine is to provide short-term (seconds to a few minutes) and mid-term (hours) forecasts of various cell-level and UE-level KPIs. In this section, we illustrate our model using two important cell-level KPI predictions: i) cell load prediction and ii) channel quality prediction.

Cell load prediction: the objective is to predict the average cell load, a.k.a. Physical Resource Block (PRB) utilization, in the next 1 min, 15 min and 1 hour for each cell. PRB utilization for an LTE subframe is the percentage of resource blocks used within that subframe. The average PRB utilization of a cell is computed as the mean of the PRB utilization over its subframes.

Channel quality prediction: for channel quality prediction, we use the distribution of Reference Signal Received Quality (RSRQ), an indicator of the interference experienced by the user. RSRQ is reported in the radio resource control (RRC) measurement report [3gpp.36.133] with a typical periodicity of 5 seconds. Here, our objective is to predict the aggregate RSRQ distribution over the next 5 minutes.
In the experiments that follow, we provide a detailed evaluation of our framework for the cell load prediction objective, and then briefly present the channel quality prediction results.
V-A Performance Results for Cell Load Prediction
We perform our experiments using different datasets in two phases. In the first phase, we characterize DeepAuto and show its superior performance against various baseline algorithms. In the second phase, we build a real-time, production-grade prediction model using a large-scale dataset and evaluate its future predictions against various metrics.
i) Batched PM counter data: in the first phase, we collected Performance Measurement (PM) counters from eNBs across the nation, aggregated at 15-minute intervals [3gpp.32.425]. We collected 3 months of data, from April 2018 to June 2018, for nearly 1.5k cells within the same geographical area (corresponding to one spatial cluster), about 1M records collected at 15 min intervals. The amount of raw data collected is around 441 MB (compressed).
ii) Real-time streaming data: we collected real-time Cell Traffic Recordings (CTR) [3gpp.32.423] from various network management systems (NMS) across the nation, from nearly 500k cells, aggregated at an interval of 1 minute. The data volume collected over 14 days during July 2018 is over 400 GB (compressed). Compared to PM counters, the real-time streaming data contains a significant number of missing values. Thus, compared to CTR data, PM data is more reliable for prediction; however, it incurs additional collection latency. For both PM and real-time streaming data, missing values are filled using linear interpolation.
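The linear-interpolation gap filling mentioned above can be done in a few lines; this is a generic sketch, not the production pipeline.

```python
import numpy as np

def fill_missing(series):
    """Linearly interpolate NaN gaps in a 1-D KPI series."""
    series = np.asarray(series, dtype=float)
    nans = np.isnan(series)
    idx = np.arange(len(series))
    # Interpolate missing positions from the surrounding observed samples.
    series[nans] = np.interp(idx[nans], idx[~nans], series[~nans])
    return series

filled = fill_missing([0.2, np.nan, np.nan, 0.5, 0.6])
# The two gaps land on the straight line between 0.2 and 0.5.
```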
V-A1 Evaluation (Phase 1)
TABLE I
Feature                                | RMSE
n_c=5,  n_p=0, n_s=0                   | 0.0675
n_c=10, n_p=0, n_s=0                   | 0.067
n_c=20, n_p=0, n_s=0                   | 0.0668
n_c=20, n_p=1, n_s=0                   | 0.064
n_c=20, n_p=2, n_s=0, external features| 0.0628
Next, we present exploratory experiments conducted using the PM counter dataset. To examine the existence of long-term and/or short-term repetitive patterns in cell load, we plot the autocorrelation of cell load in Fig. 4 for a randomly selected cell. It shows that the correlation is high at lags of 1 day and 1 week, which implies the importance of including periodic and seasonal patterns.
Training the DeepAuto model: we use a modified mean square error (MMSE) as the loss function during the training phase:

L = (1/ND) Σ_i Σ_j w_{ij} (y_{ij} − ŷ_{ij})²   (3)

where y_{ij} is the true value and ŷ_{ij} is the predicted value for example i and dimension j. The bias weight w_{ij} is added to give more importance to critical examples, e.g., overloaded cell utilization, to achieve a better prediction for critical load.

The temporal feature set for a cell at time t includes the cell load and the number of UEs at the cell at time t. Additional external features at time t include the day of the week, the hour of the day, and cell configuration details such as band, power and bandwidth. During training, the data is split into train, validation and test sets in the ratio 4:1:1 while maintaining the temporal order of observations. As machine learning models are sensitive to the scale of the inputs, the data are normalized into the range [0, 1] by feature scaling. DeepAuto accuracy is improved by hyperparameter search. First, we perform a search over the parameters n_c, n_p and n_s for the local, periodic and seasonal trends. Table I shows the performance of DeepAuto for 1-step prediction (15 min horizon) as we optimize the temporal feature selection by varying the temporal parameters. As expected, including the periodic and seasonal patterns improved the accuracy of the results. The performance is further improved by optimizing the learning rate and batch size, and finally the bias parameter of the loss function is optimized. After the hyperparameter search, we use a batch size of 1024.

TABLE II
Algorithm     | Horizon | RMSE   | MAE    | MAPE
DeepAuto      | 15 min  | 0.0628 | 0.0425 | 12.5
Naive         | 15 min  | 0.074  | 0.053  | 18.2
Random Forest | 15 min  | 0.0642 | 0.0432 | 13.1
XGBoost       | 15 min  | 0.0638 | 0.0431 | 12.9
DeepAuto      | 120 min | 0.094  | 0.065  | 19.9
Naive         | 120 min | 0.140  | 0.09   | 27.0
Random Forest | 120 min | 0.098  | 0.067  | 21.1
XGBoost       | 120 min | 0.098  | 0.067  | 20.9
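The biased loss of Eq. (3) can be sketched as below. The exact weighting form is not recoverable from the text, so the linear up-weighting of high-load targets and the `beta` parameter are our assumptions.

```python
import numpy as np

def biased_mse(y_true, y_pred, beta=1.0):
    """Modified MSE: squared errors are up-weighted for critical
    (high-load) examples. `beta` controls the extra weight; the
    weighting form is an assumption, not the paper's exact choice."""
    weights = 1.0 + beta * y_true          # heavier penalty near overload
    return float(np.mean(weights * (y_true - y_pred) ** 2))

y_true = np.array([0.1, 0.9])              # low-load vs near-overloaded cell
y_pred = np.array([0.2, 0.8])
loss = biased_mse(y_true, y_pred, beta=2.0)
```

With `beta=0` this reduces to plain MSE; larger `beta` makes misses on overloaded cells costlier.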
Next, we compare the DeepAuto model against several baseline algorithms. Details of the baseline algorithms used in the comparison are given below:

Naive: in this method, the prediction for time t+1 is simply the most recent observation.

Random Forest: we used the random forest model implemented in H2O.ai [h2o_Java_software]. The prediction result is optimized by varying the number of trees in {50, 100, 200}, the split rate at each node in {0.8, 1.0} and the depth of the tree in {6, 10, 15}.

XGBoost: in this method, we used XGBoost from H2O.ai [h2o_Java_software]. The prediction result is optimized using the same parameter set as for Random Forest.

For the comparison between the baseline algorithms, we use the batched PM counter data source, as we were unable to train the baseline models on the CTR data due to its sheer volume. For a fair comparison, we use the same feature set, including temporal and external features, for Random Forest and XGBoost as for DeepAuto.¹

¹Random forest and XGBoost algorithms in general require the complete training dataset to be loaded into memory. It is usually not feasible to use them for large-scale training.
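The RMSE, MAE and MAPE metrics reported in Table II, with MAPE restricted to high-load cells as described in the text, can be computed as sketched below; the threshold value is a parameter of the sketch.

```python
import numpy as np

def metrics(y_true, y_pred, mape_threshold=0.7):
    """RMSE, MAE and MAPE over predicted cell loads; MAPE is computed only
    over examples whose true load exceeds the threshold, to reduce the
    bias introduced by low-load cells."""
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    high = y_true > mape_threshold
    mape = float(np.mean(np.abs(err[high] / y_true[high])) * 100)
    return rmse, mae, mape

y_true = np.array([0.2, 0.8, 0.9])
y_pred = np.array([0.25, 0.72, 0.90])
rmse, mae, mape = metrics(y_true, y_pred)
```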
Table II compares the performance of DeepAuto under metrics including the Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE), with horizons of 15 min and 120 min. For MAPE we only consider cells with a cell load greater than a threshold of 70% to reduce the bias of low-load cells, where 70% is the sum of the mean and standard deviation of the cell load in the training dataset. DeepAuto outperforms the other baseline methods in all metrics considered. For the 15 min horizon, DeepAuto showed up to 15% improvement in RMSE compared to the naive method, 2.5% improvement compared to random forest and 1.5% improvement compared to XGBoost. For the longer horizon of 120 min, DeepAuto showed up to 32% reduction in RMSE compared to the naive method and 4% improvement over XGBoost. We observe that the performance advantage of DeepAuto over the other methods grows with longer horizons.
V-A2 Evaluation (Phase 2)
After the initial investigation validating the superiority of DeepAuto, we developed a production-grade model for a large-scale network deployment. In phase 2, we utilize the real-time streaming data source, with a latency of 1 min. Similar to phase 1, we optimize DeepAuto accuracy by tuning hyperparameters. We build a model for each network management system (NMS), where the real-time data is first received. The prediction phase uses a real-time Apache Kafka [Narkhede:2017:KDG:3175825] feed from nationwide eNBs. The prediction engine runs at every regional center, close to each NMS, to reduce prediction latency. The model generated in the training phase is used to predict future cell loads for the next 1 min, 15 min and 1 hour at a granularity of 1 min. The results are then fetched by various microservices to cater to various applications. The trained model is regularly updated to capture any trending traffic changes not yet captured by the model.
TABLE III
Metric | 1 min | 15 min | 1 hour
RMSE   | 0.083 | 0.066  | 0.067
MAE    | 0.053 | 0.043  | 0.044
MAPE   | 14.1  | 12.0   | 13.04
Table III describes the performance of DeepAuto under the metrics RMSE, MAE and MAPE while predicting cell load for future horizons: the next 1 min, the next 15 min average cell load and the next 1 hour average cell load. Here, we have used the average cell load for prediction instead of the instantaneous cell load, due to the noisy nature of CTR data and possibly unreliable predictions. For MAPE we only consider high-load cells, where the cell load is greater than 60%, to reduce the bias of low-load cells. Note that the 1 min data is noisy due to the real-time nature of the data and the presence of missing data points compared to the batched data source. Even though DeepAuto allows exploiting spatial dependency, maintaining a model for each cluster is restrictive in production. Furthermore, our analysis showed that the use of spatial relationships did not always improve performance. Hence, we decided to deploy a single global model for each NMS and meet the latency requirement with minimal loss of accuracy.
V-B Performance Results for Channel Quality Prediction
We utilize the DeepAuto framework for predicting the RSRQ distribution. The RSRQ values from the UEs are reported at 5-second intervals within a range from 0 to 34. To make our analysis more tractable, we group the RSRQ values for each cell by timestamp at 5 min intervals. The objective is to predict the RSRQ probability distribution function (PDF) in the next 5 min for each cell. We use the real-time streaming data source, with a volume of about 386 MB (compressed) from 1.5k cells. We use the Kullback-Leibler (KL) divergence as the loss metric for comparing the true and predicted distributions. The loss function used during training and testing is given by:
L = (1/N) Σ_{c=1}^{N} D_c   (4)

where D_c is the KL divergence at cell c and N is the number of LTE cells in the dataset, with D_c calculated as:

D_c = Σ_i P_c(i) log(P_c(i) / Q_c(i))   (5)

where P_c is the actual PDF and Q_c is the predicted PDF. Similar to the cell load prediction, we use external features such as cell configuration, day of week, hour of day and minute of day.
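The per-cell KL divergence of Eq. (5) and its average over cells in Eq. (4) can be computed as below; the small `eps` guard against empty histogram bins is our addition.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """D_KL(P || Q) between the actual and predicted RSRQ PDFs.
    `eps` avoids log/division problems on empty bins (our addition)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def mean_kl(P, Q):
    """Loss of Eq. (4): KL divergence averaged over the N cells."""
    return float(np.mean([kl_divergence(p, q) for p, q in zip(P, Q)]))

identical = kl_divergence([0.1, 0.4, 0.5], [0.1, 0.4, 0.5])  # ~0 for matching PDFs
loss = mean_kl([[0.2, 0.8], [0.3, 0.7]], [[0.5, 0.5], [0.3, 0.7]])
```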
TABLE IV
Feature                        | KL divergence
n_c=5,  n_p=0, n_s=0           | 0.038
n_c=10, n_p=0, n_s=0           | 0.0365
n_c=20, n_p=0, n_s=0           | 0.036
n_c=25, n_p=1, n_s=1           | 0.036
n_c=25, n_p=1, external features| 0.0353
Table IV compares different combinations of features: locality, periodicity, seasonality and externality. As expected, including additional temporal features improves prediction accuracy. Note that traditional statistical/machine learning methods seem unsuitable for this problem; thus, baseline performance is not provided for methods such as ARIMA and random forest. The naive method of using the most recent observed distribution as the prediction resulted in a KL divergence of 0.14, while DeepAuto achieved a KL divergence of 0.0353 (over 75% improvement compared to the naive method).
VI Conclusion
Accurate forecasting of RAN KPIs is an essential part of LTE/5G RAN automation. We provided a unified, efficient and effective traffic prediction architecture that predicts various RAN KPIs in real time. We presented the prediction model DeepAuto, a hierarchical deep learning framework that constructively captures spatial, temporal and external factors, as well as network configuration changes, in a scalable manner. We validated our framework on two KPI prediction tasks: cell load prediction and channel quality prediction. We showed that DeepAuto is able to forecast accurately over both short-term and medium-term time horizons. Specifically, compared to the naive method of using recent measurements, DeepAuto reduced the prediction error by up to 15% in RMSE for short-term cell load prediction, achieved a 32% RMSE gain for long-term cell load prediction and a 75% improvement in KL divergence for channel quality prediction.