Edge Intelligence in Softwarized 6G: Deep Learning-enabled Network Traffic Predictions

07/31/2021 ∙ by Shah Zeb, et al. ∙ Gwangju Institute of Science and Technology ∙ SEECS

The 6G vision is envisaged to enable agile network expansion and rapid deployment of new on-demand microservices (i.e., visibility services for data traffic management, mobile edge computing services) closer to the network's edge IoT devices. However, providing one of the critical features of network visibility services, i.e., prediction of data flows in the network, is challenging at edge devices within a dynamic cloud-native environment, as the traffic flow characteristics are random and sporadic. To provide the AI-native services for the 6G vision, in this paper we propose a novel edge-native framework that offers an intelligent prognosis technique for data traffic management. The prognosis model uses a long short-term memory (LSTM)-based encoder-decoder deep learning architecture, which we train on real time-series multivariate data records collected from the edge μ-boxes of a selected testbed network. Our results accurately predict the statistical characteristics of data traffic and are verified against the ground-truth observations. Moreover, we validate our novel framework model with two performance metrics for each feature of the multivariate data.







I Introduction

The successful commercialization of 5G networks paves the way for discussion on the next evolution towards 6G networks and the definition of its vision and requirements [1]. While the 5G architecture is built on a service-based architecture (SBA), the vision of the hyper-flexible 6G architecture revolves around a state-of-the-art artificial intelligence (AI)-native design that brings intelligent decision-making abilities to futuristic applications of the digital society, such as digital twin-enabled self-evolving smart industries [2]. The Third Generation Partnership Project (3GPP) is reportedly shifting towards new AI-inspired models to monitor and enhance SBA performance [3].

The critical functional component of the enhanced core network will be the seamless convergence of AI models, software-defined networking (SDN) and network functions virtualization (NFV)-enabled communication networks, and edge/hybrid cloud-native computing architecture [4]. Similarly, virtualization and containerization are integral parts of cloud-native computing infrastructure, which lies in the domain of software development and IT operations (DevOps) [5]. By adopting DevOps-based design strategies, telecom companies can break down purpose-built hardware services (i.e., edge computation) and software-based solutions into real microservices (i.e., orchestrated application deployments) [6]. These microservices then move towards the edge of a network to satisfy key performance indicators (KPIs), e.g., data rates, network latency, and energy efficiency [7]. Collectively, these new features contribute intelligence to visibility services (i.e., monitoring of KPIs) for futuristic smart network operations and management systems that require accurate forecasting of network traffic flows. Moreover, during network operations, the data traffic flow, which has a time-series (TS) nature, behaves non-linearly with aperiodic characteristics in an increasingly dynamic and complex network environment [8]. Similarly, the integration of the popular Internet-of-Things (IoT) technology with the de-facto cloud/edge computing model increases the importance of visibility services for inferring future traffic behavior from forecasted traffic to provide enhanced quality-of-service (QoS) and quality-of-experience (QoE) [9].

Despite the significant potential of AI and other technologies, their widespread deployment for predicting network traffic is yet to be seen, as there are many challenges to their comprehensive adoption in networked systems.

Insufficient resources for TS data: The storage and computational resources needed for executing AI algorithms over network data flows near the network edge are limited and often insufficient [2]. Moreover, as a high number of IoT devices connect to access networks, traffic volume grows at an unprecedented rate.

Need for large high-quality labeled data: Machine learning (ML) algorithms need a large amount of labeled data for model training and learning, while most of the data from traffic flows at network points is unlabelled raw TS data that needs to be processed [8]. Moreover, software and hardware network configurations, highly random sporadic geographic events, network infrastructure distribution, and other phenomena can lead to abrupt changes in traffic flows, affecting the quality of labeled data for AI model construction.

Optimized network architecture for AI: The design of existing network infrastructure lags behind the support needed for AI-inspired applications and services. Network resources can be drained by the deployment of AI-based solutions [10]. Therefore, the current networking infrastructure needs to evolve and adopt cloud-native AI deployment strategies that provide balanced support for both AI-based microservice functions and other diverse network functions.

One potential solution to the preceding challenges is adopting an edge intelligence, or edge-native, AI framework design that integrates AI models, state-of-the-art communication networks, and cloud-native edge computing to accurately predict traffic flows. In this work, we propose a novel intelligent method for analyzing and predicting network data traffic at edge devices in the context of the emerging 6G vision of softwarized networks. For this purpose, we utilize the testbed resources of the cloud-native-enabled OpenFlow over Trans-Eurasia Information Network (OF@TEIN++) Playground, a multi-site cloud connecting ten sites in nine Asian countries over the TEIN network [11]. The main contributions of this paper are as follows.

  • We present a novel method for predicting the statistical characteristics of data traffic inflow at the edge devices of a cloud-native-enabled network using a deep learning (DL) method.

  • For this purpose, we collect raw time-series data of traffic flow at the edge of the network, which is sent to the visibility center for storage and processing.

  • To use the AI service, we orchestrate a Kubeflow deployment at the orchestration center, which is used to develop and train the DL model for the prognosis.

  • We train the model on the collected TS data and predict the statistical properties of data traffic. Moreover, we analyze the developed model's performance in terms of the root-mean-square error (RMSE) and coefficient of determination (R2) metrics.

The rest of this article is structured as follows. Section II presents the experimental model. Section III provides the discussion on designing Kubeflow-based deployment of a learning service. Section IV presents the system validation and results. Finally, Section V concludes this paper.

II Experimental Model Description

This section provides the details of the design of the Kubernetes (K8s)-based edge cluster with the GIST playground control center, as well as the packet tracing and flow summarization used for dataset collection.

II-A OF@TEIN Playground Overview

The SDN-enabled multi-site clouds of the OF@TEIN playground (OPG) interconnect multiple National Research and Education Networks (NRENs) of partner countries, enabling miniaturized academic experiments. Launched in 2020, it serves as an open federated playground for AI-inspired SmartX services with support for IoT-SDN/NFV-Cloud functionalities. Fig. 1 shows a layered illustration of the multi-site OPG communication infrastructure. The playground (PG) within OPG supports a logical space at each centralized location/site, called the SmartX PG Tower, designated to automatically develop, administer, and utilize the resources of distributed server-based hyper-converged cloud-native special boxes. It maintains and dynamically distributes numerous physical and virtual resources for developers to execute research experiments and validate operational and development requirements in real time. Note that the SmartX PG Tower leads the monitoring and control functions of the network by installing and utilizing multiple operating centers, namely Provisioning and Orchestration (P+O), Visibility (V), and Intelligence (I).

Fig. 1: Illustration of K8s-based edge cluster over OF@TEIN playground.

II-B K8s Edge Cluster over OF@TEIN Playground

For this work, we selected the P+O center of the PG Tower at the GIST site to orchestrate and deploy the AI-based learning microservice using Kubeflow to forecast traffic flows based on the accumulated multivariate processed dataset. We extract processed data from the strings of measured visibility flows collected from the SmartX MicroBoxes (μ-boxes) and intended for the data lake storage in the visibility center. We placed these μ-boxes at the multi-site edge locations of the OPG; they have computing-storage-networking resources to allow IoT-Cloud-SDN/NFV functionality-based experiments. To provide resilient multi-access networked connectivity in each μ-box, we enabled three network interfaces, i.e., two wired network interfaces and one wireless connectivity interface. The two wired network interfaces are assigned public Internet Protocol (IP) addresses and configured as a control interface and a data interface. Moreover, data from the connected IoT devices or data lakes is offloaded over the air to the μ-boxes. Afterward, we prepared and configured each μ-box as a K8s-orchestrated worker node to support cloud-native containerized functionalities, provisioned and managed from the K8s master in the P+O center of the GIST PG. Each μ-box has SDN-coordinated unique/dedicated connectivity with the other boxes, with support for mesh-style networking, forming K8s edge clusters over the OPG networked environment. A private IP addressing scheme is employed inside each μ-box, which the K8s master manages for orchestrated pod and container communication.

II-C Network Traffic Data Set Collection

We use extended Berkeley Packet Filter (eBPF)-enabled packet tracing tools, such as IO Visor, to measure the statistical summary of network traffic data. IO Visor-based packet tracing employs the eBPF core functionalities [12], which enable in-kernel virtual machines (VMs) with byte-code tracing program execution. The main advantage of IO Visor is its ability to monitor and trace user and kernel events (through kprobes and uprobes), providing statistics in maps fetched at the points of interest [13]. To collect network packet flows periodically, we implemented data collection software that leverages eBPF and IO Visor to directly accumulate raw packets from the network interface of a μ-box and compile information from each packet with a small number of CPU cycles (c.f. Fig. 2). Apache Spark with its Scala application programming interface (API) is utilized to facilitate scalable, high-throughput, fault-tolerant flow processing. The processed data flows generate five features containing a statistical summary of the network traffic packets. The generated multivariate data are then pushed out using Spark streaming and stored in MongoDB, a NoSQL database, leveraging the distributed messaging store of Apache Kafka.
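The five-feature statistical summary per collection window can be sketched as follows. This is a minimal illustration of the flow summarization step, not the paper's Spark/Scala pipeline; the feature names are assumptions based on the statistics reported in Sec. IV (avg, min, max, std, and total data bytes).

```python
from statistics import mean, pstdev

def summarize_window(packet_sizes):
    """Summarize one collection window (e.g., 5 minutes) of traced
    packet sizes, in bytes, into five statistical features. Feature
    names are illustrative assumptions, not the paper's exact schema."""
    if not packet_sizes:
        raise ValueError("empty collection window")
    return {
        "avg_bytes": mean(packet_sizes),     # average packet size
        "min_bytes": min(packet_sizes),      # smallest packet
        "max_bytes": max(packet_sizes),      # largest packet
        "std_bytes": pstdev(packet_sizes),   # spread of packet sizes
        "total_bytes": sum(packet_sizes),    # traffic volume in the window
    }

# Example: one 5-minute window of packet sizes captured at a μ-box interface.
window = [1500, 64, 512, 1500, 256]
features = summarize_window(window)
```

In the deployed system these records would be emitted per window by the Spark stream and appended to the MongoDB time-series collection.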

III Kubeflow-based AI Service Design

Fig. 2: Design of Kubeflow-based AI service orchestration in K8s cluster of GIST site’s P+O center.

Kubeflow consists of toolsets that enable and address numerous critical stages of the ML/DL development cycle, i.e., preparing data, model learning, experimentation and tuning, and feature extraction/transformation. Moreover, Kubeflow leverages the K8s cluster's HPC capabilities for container orchestration and auto-scaling of computing resources for ML/DL jobs/pipelines. Therefore, we deploy and orchestrate Kubeflow using the K8s master at the P+O center of the GIST PG to perform high-performance data analytics (HPDA) by leveraging the K8s cluster capabilities (c.f. Fig. 2). It enables the platform for accurate data traffic prediction by applying a DL algorithm to the collected TS data flow, which is explained in the subsequent sections.

III-A DL-based Data Prediction Model

Fig. 3: Folded representation of RNN and LSTM unit cells.

Traditional neural networks (NNs) are unable to utilize the information learned in previous steps (past observations) to perform spatio-temporal learning on TS data and predict traffic features accurately. Numerous recurrent NN (RNN) algorithms have been developed for prediction problems based on the unit RNN cell architecture (c.f. Fig. 3) due to their natural fit for TS data analysis; they allow data information to persist by connecting the previous informational state to the present task as an input. However, their performance suffers from constraints in learning long-term dependencies/correlations of the TS data because of the vanishing gradient problem. In this paper, we use a sequence-to-sequence (s2s) deep learning model for the prediction, based on long short-term memory (LSTM) neuron cells. In the following subsections, we first explain the LSTM cell and then discuss the encoder-decoder architecture based on the LSTM cell for prediction.

III-A1 LSTM Cell

The LSTM cell overcomes the vanishing gradient problem that RNNs face when trained with back-propagation through time [14]. The derivatives of the error used for learning the updated weights do not vanish quickly, as they are distributed over sums and sent back in time, enabling LSTM units to learn and discover long-term correlated features over lengthy sequences in the input multivariate data. As shown in Fig. 3, an LSTM cell receives an input sequence vector x_t at the current time t which, together with the previous cell state c_{t−1} and hidden state h_{t−1}, is used to trigger the three different gates by utilizing their activation processing units. Note that, onward in this study, bold variable notation denotes a vector. The cell state of the LSTM unit can be considered a memory unit, and its state can be read and modified through three connected gates: 1) the forget gate (f_t), which decides what information to discard based on an assigned condition; 2) the input gate (i_t), which updates the memory cell state based on assigned conditions; and 3) the output gate (o_t), which sets the output depending upon the input sequence and cell state with assigned conditions. The gates and cell updates of the LSTM unit at time t can be formulated as

f_t = σ(W_f · [h_{t−1}, x_t] + b_f),
i_t = σ(W_i · [h_{t−1}, x_t] + b_i),
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c),
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t,
o_t = σ(W_o · [h_{t−1}, x_t] + b_o),
h_t = o_t ⊙ tanh(c_t),

where ⊙ is the element-wise multiplication. Please note that each gate has a distinctly associated weight matrix W and bias vector b that are learned throughout the change of state and the addition of new information during the training phase. Moreover, each LSTM gate uses a specific activation function for processing (c.f. Fig. 3), i.e., the sigmoid (σ) or the hyperbolic tangent (tanh) [15, Sec. 3].
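The gate equations above can be sketched as a single NumPy forward step. This is an illustrative toy implementation of the standard LSTM cell update, not the paper's trained model; the weight matrix W stacks the four gate projections over the concatenated [h_{t−1}, x_t] vector, which is one common convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update. W has shape (4*d_h, d_h + d_x) and maps
    the concatenated [h_{t-1}, x_t] to the four stacked gate
    pre-activations (forget, input, candidate, output); b is the
    stacked bias vector."""
    d_h = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:d_h])               # forget gate f_t
    i = sigmoid(z[d_h:2 * d_h])         # input gate i_t
    c_tilde = np.tanh(z[2 * d_h:3 * d_h])  # candidate cell state
    o = sigmoid(z[3 * d_h:4 * d_h])     # output gate o_t
    c = f * c_prev + i * c_tilde        # element-wise cell update
    h = o * np.tanh(c)                  # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_x, d_h = 5, 8  # e.g., 5 traffic features in, 8 hidden units
W = rng.standard_normal((4 * d_h, d_h + d_x)) * 0.1
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), W, b)
```

Because o_t is in (0, 1) and tanh(c_t) is in (−1, 1), the hidden state h_t is always bounded in (−1, 1), which is part of what keeps gradients stable over long sequences.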

Fig. 4: The schematic illustration shows the proposed DL-based prediction model with encoder-decoder architecture.

III-A2 LSTM Cell-based Encoder-Decoder Model

We use a single-layer encoder-decoder architecture that employs LSTM cells (c.f. Fig. 4) in each layer to predict the vector set of output sequences of data traffic, Ŷ = {ŷ_{t+1}, …, ŷ_{t+H}}, based on the set of TS input sequences, X = {x_{t−L+1}, …, x_t}, which represents all the past observations of the collected traffic. Here, x_t ∈ R^m is the current time-dependent sequence vector containing the observation values for the m multivariate traffic features, L is the lookback length of time, and H is the horizon window of future prediction. The encoder produces an encoded temporal representation of the current information sequences through the LSTM units in a single layer. The encoded output sequence vector is provided to the LSTM decoder through a repeat vector; the encoder states of the LSTM units are simultaneously passed to the decoder units. The decoder then uses the cell state and the repeat vector as the initial temporal representation to reconstruct the target output feature prediction of the network data. The objective of the prediction-model training problem is to minimize the output error over the N training samples at each current time t and find the optimized parameter space θ*, as

θ* = argmin_θ (1/N) Σ_{n=1}^{N} Σ_{h=1}^{H} Σ_{j=1}^{m} L_δ(y_j, ŷ_j),   (1)

where L_δ(·) is the Huber loss function selected for this study, given as

L_δ(y, ŷ) = (1/2)(y − ŷ)²,         if |y − ŷ| ≤ δ,
L_δ(y, ŷ) = δ|y − ŷ| − (1/2)δ²,    otherwise.   (2)

In (1) and (2), δ is the hyperparameter cut-off threshold that switches between the two error functions (squared loss and absolute loss) and is set to 1 in our study, y_j represents the j-th feature observation value at each time-step in the input/predicted sequence of data, and θ comprises the learned weight and bias vectors at each time-step.
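The piecewise Huber loss in (2) can be written directly in NumPy. A minimal sketch with δ = 1, matching the value used in this study:

```python
import numpy as np

def huber(y, y_hat, delta=1.0):
    """Huber loss from Eq. (2): squared loss for residuals within the
    cut-off threshold delta, linear (absolute) loss beyond it."""
    r = np.abs(y - y_hat)
    return np.where(r <= delta,
                    0.5 * r ** 2,                 # squared-loss branch
                    delta * r - 0.5 * delta ** 2)  # absolute-loss branch

# Small residual (0.5) falls in the squared branch: 0.5 * 0.5**2 = 0.125.
small = huber(np.array([1.0]), np.array([0.5]))
# Large residual (3.0) falls in the linear branch: 1*3 - 0.5 = 2.5.
large = huber(np.array([3.0]), np.array([0.0]))
```

The linear branch grows only linearly in the residual, which is why the Huber loss is less sensitive than squared error to the sporadic traffic spikes described in Sec. I.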

IV Experimental Results Analysis

Device Type | Specifications
Visibility center | Ubuntu 16.04.4 LTS OS, Intel Xeon® CPU E5-2690 v2 @3.00 GHz, 12x8 GB DDR3 memory, 5.5 TB HDD, 4 network interfaces of 1 Gbit/s (Gbps)
Provisioning & Orchestration center | Ubuntu 18.04.2 LTS OS, SYS-E200-8D SuperServer with Intel Xeon® D-1528 with 6 cores @1.90 GHz, 32 GB RAM, 480 GB Intel SSD DC S3500
Intelligence center | Ubuntu 18.04.2 LTS OS, Intel Xeon® Scalable 5118 12x2 cores @2.3 GHz, Samsung 256 GB DDR4 RAM, 512x2 GB and 1.6x4 TB SSD, Mellanox 100G SmartX NIC (2 ports), 16x6 GB Tesla T4 Nvidia® GPUs
Edge μ-box | Ubuntu 18.04.2 LTS OS, Supermicro SuperServer E300-8D (Mini-1U server) with 4 Intel cores @2.2 GHz, 32 GB memory, 240 GB HDD, 2x10 Gbps + 6x1 Gbps network interfaces
TABLE I: Device Specifications of the Control Tower and μ-boxes

In this section, we discuss the methodology and experimental results obtained on the real network traffic dataset collected from the edge μ-boxes of the playground testbed resources to analyze and evaluate the proposed prediction model, and we validate its effectiveness in terms of two performance metrics (RMSE and R2).

We collected multivariate network flow data at the visibility center for one month at an interval of 5 minutes using the eBPF-based packet tracing software (Sec. II-C). The collected dataset comprises 43,000 time-series records with five features representing the statistical characteristics of the data flow, divided into training (65%) and test/validation (35%) sets. To train the prediction model, we deployed Kubeflow and trained the LSTM-based encoder-decoder model in a Jupyter Notebook with DL libraries (TensorFlow and Keras) and the Scikit-learn library on the dashboard of the deployed Kubeflow (c.f. Fig. 2). This enables us to run the DL workloads in a fully automated and scalable cloud-native environment. The GPU resources of the Intelligence center are used to run the deep learning jobs. Table I shows the hardware specifications of the resources used for this experiment.

Fig. 5: Training loss and validation loss curves for our proposed DL-based prediction model.
Fig. 6: Average data bytes in data traffic predicted against the collected ground-truth observations.

A single-layer encoder and decoder architecture with a time-distributed layer is implemented with 100 LSTM cells in each layer. We applied the min-max function of the Scikit-learn library to normalize the values of the observed features to the range [0, 1]. We trained the model on the past 20 hours of observations to predict the next 10 hours of output samples, which are compared with the ground-truth observations for model validation. Hyperparameters such as the batch size and number of epochs for learning are kept fixed at 32 and 40, respectively. We selected the Adam optimizer to learn the optimized parameters while minimizing the loss in (2) during the training process. We applied the callback utility function "LearningRateScheduler" to obtain an updated learning rate value throughout training from a defined range; it applies the updated learning rate to the Adam optimizer given the current epoch and current learning rate. Fig. 5 shows the trend of the loss function against the epochs during the training and validation stages of the prediction model. It reveals that beyond epoch 20, the validation and training losses are comparatively low and converge, avoiding overfitting/underfitting in the prediction model.
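The data preparation described above (min-max scaling, then framing the series into lookback/horizon pairs) can be sketched as follows. This is an illustrative NumPy version, not the paper's Scikit-learn/Keras pipeline; at the 5-minute sampling interval, 20 hours of lookback corresponds to 240 steps and the 10-hour horizon to 120 steps, and the tiny array below is used only to check shapes.

```python
import numpy as np

def minmax_scale(X):
    """Normalize each feature column of X to [0, 1], as done with
    Scikit-learn's min-max scaler in the paper."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi - lo == 0, 1, hi - lo)

def make_windows(X, lookback, horizon):
    """Frame a multivariate series of shape (T, m) into supervised
    pairs: inputs (n, lookback, m) and targets (n, horizon, m)."""
    n = len(X) - lookback - horizon + 1
    inp = np.stack([X[i:i + lookback] for i in range(n)])
    out = np.stack([X[i + lookback:i + lookback + horizon] for i in range(n)])
    return inp, out

# Toy series: 8 time-steps of the 5 traffic features.
X = minmax_scale(np.arange(40.0).reshape(8, 5))
inp, out = make_windows(X, lookback=4, horizon=2)
```

In the real experiment the same call would use `lookback=240` and `horizon=120` over the 43,000-record series, and the resulting tensors would feed the encoder-decoder model.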

Fig. 7: Predicted minimum data bytes in data traffic against the collected ground-truth observations.
Fig. 8: Predicted standard deviation of data bytes in data traffic against the collected ground-truth observations.

The trained prediction model predicts the future samples of each statistical feature of the network packet traffic within the selected future horizon window of 10 hours. Each predicted feature sample is verified against the ground-truth observations and plotted. Fig. 6 presents the statistics of the average (mean) data bytes recorded in the network flow at the edge μ-boxes against the average data bytes predicted by our learning model. Similarly, Fig. 7 and Fig. 8 show the predicted statistics of the minimum (min) and standard deviation (std) of the observed data bytes of the network packet flow against the ground-truth observations. These results show that our learning model can accurately learn and predict the future trends in the mean, min, and std statistics of data bytes at the μ-boxes over time, matching the recorded ground-truth observations.

Lastly, Fig. 9 shows the predicted total traffic of the packet flows at the edge devices of the edge layer against the observed ground truth. The model learns the trend of the total data bytes statistic, which depicts the network traffic load over a specific period of time. Note that the statistics of the average (avg), std, min, and maximum (max) data bytes are recorded in kilobytes (KB), while the total data bytes are in megabytes (MB). Collectively, predicting these five features characterizes the network traffic trend and accurately gives insight into the traffic statistics at the μ-boxes of the edge layer.

Fig. 9: Total data bytes in data traffic predicted against the collected ground-truth observations.

To analyze and validate the deviations between the learned model's prediction samples and the ground-truth observations, we evaluated two performance metrics, RMSE and R2, for each feature of the data traffic. RMSE takes values in the range [0, ∞), while R2 takes values in the range (−∞, 1]. Table II shows the performance metrics for each data feature: the RMSE of most features is close to zero, while the R2 values are close to 1. Low RMSE values and R2 values close to 1 imply that the learned model can accurately predict the five multivariate statistical features of the data traffic. It can be observed that the performance of our model in predicting the avg, min, max, and std statistics is better than for the total data bytes, as the RMSE and R2 scores of these four features are reasonably good compared to the total data bytes.
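The two validation metrics have standard closed forms and can be computed per feature as follows (a minimal NumPy sketch equivalent to the Scikit-learn implementations used in the experiment):

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error: 0 indicates a perfect fit."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination: 1 is a perfect fit; values can
    go negative when predictions are worse than the mean baseline."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2(y, y)        # identical prediction gives R2 of exactly 1
err = rmse(y, y + 1.0)    # a constant offset of 1 gives RMSE of 1
```

Applied column-wise to the predicted and ground-truth feature sequences, these two functions yield the per-feature scores reported in Table II.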

V Conclusion

Emerging computing techniques, AI, and state-of-the-art communication network enablers (SDN/NFV) are critical parts of the 6G vision, increasing the importance of intelligent network management. We proposed a novel DL-based intelligent prognosis technique for predicting the statistical properties of the data traffic incurred at the edge devices of the network. For this purpose, we captured, collected, and pre-processed live TS traffic data from the edge μ-boxes using the testbed network resources at the GIST PG and stored it in the visibility center. We orchestrated the Kubeflow deployment using the K8s master at the orchestration center to train the LSTM-based s2s DL model on the collected TS data. We predicted various features of the data traffic based on past observations over a future horizon window of 10 hours. We evaluated the predicted future observations against the ground-truth observations in terms of RMSE and R2. The results showed that our model accurately predicts the future observations of all features. For future work, network resource automation and scaling based on the predicted traffic can be explored.


This publication has been produced with the co-funding of the European Union (EU) for the Asi@Connect Project under Grant contract ACA 2016-376-562. The contents of this document are the sole responsibility of GIST and can under no circumstances be regarded as reflecting the position of the EU.

RMSE 5.33 8.63 6.03 231.64 33.12
R2 0.968 0.909 0.954 0.686 0.946
TABLE II: Prediction Performance for various features of Data Traffic


  • [1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, 2019.
  • [2] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, 2019.
  • [3] S. M. A. Zaidi, M. Manalastas, H. Farooq, and A. Imran, “SyntheticNET: A 3GPP compliant simulator for AI enabled 5G and beyond,” IEEE Access, vol. 8, pp. 82938–82950, 2020.
  • [4] Y. Xiao, G. Shi, Y. Li, W. Saad, and H. V. Poor, “Toward self-learning edge intelligence in 6G,” IEEE Commun. Mag., vol. 58, no. 12, pp. 34–40, 2020.
  • [5] M. Waseem, P. Liang, and M. Shahin, “A systematic mapping study on microservices architecture in DevOps,” Journal of Systems and Software, vol. 170, p. 110798, 2020.
  • [6] O. Arouk and N. Nikaein, “Kube5G: A cloud-native 5G service platform,” in IEEE GCC, 2020, pp. 1–6.
  • [7] W. Xia, Y. Wen, C. H. Foh, D. Niyato, and H. Xie, “A Survey on Software-Defined Networking,” IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp. 27–51, 2015.
  • [8] Q. Zhaowei, L. Haitao, L. Zhihui, and Z. Tao, “Short-term traffic flow forecasting method with M-B-LSTM hybrid network,” IEEE Trans. on Intell. Transp. Syst., pp. 1–11, 2020.
  • [9] M. Alsaeedi, M. M. Mohamad, and A. A. Al-Roubaiey, “Toward adaptive and scalable OpenFlow-SDN flow control: A survey,” IEEE Access, vol. 7, pp. 107346–107379, 2019.
  • [10] X. Tang, C. Cao, Y. Wang, S. Zhang, Y. Liu, M. Li, and T. He, “Computing power network: The architecture of convergence of computing and networking towards 6G requirement,” China Communications, vol. 18, no. 2, pp. 175–185, 2021.
  • [11] M. A. Rathore, M. Usman, and J. Kim, “Maintaining SmartX multi-view visibility for OF@TEIN+ distributed cloud-native edge boxes,” Trans. Emerg. Telecommun. Technol., p. e4101, 2020.
  • [12] M. A. Rathore, A. C. Risdianto, T. Nam, and J. Kim, “Comparing IO visor and Pcap for security inspection of traced packets from smartx box,” in Advances in Computer Science and Ubiquitous Computing.   Springer, 2017, pp. 1263–1268.
  • [13] B. Gregg, “Linux 4.X tracing tools: Using BPF superpowers,” 2016.
  • [14] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “The performance of LSTM and BiLSTM in forecasting time series,” in IEEE Big Data, 2019, pp. 3285–3292.
  • [15] Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. L. Zhao, “Time series classification using multi-channels deep convolutional neural networks,” in International Conference on Web-Age Information Management, 2014, pp. 298–310.