On Network Traffic Forecasting using Autoregressive Models

Various statistical analysis methods have been studied for years to extract accurate trends of network traffic and predict the future load, mainly to allocate required resources. In addition, many stochastic modeling techniques have been proposed to represent the fundamental characteristics of different types of network traffic. In this study, we analyze autoregressive traffic forecasting techniques, considering their popularity and wide use in the domain. In comparison to similar works, we present important traffic characteristics and discussions from the literature to create self-consistent guidance along with the survey. Then, we examine the techniques in the literature, revealing which network characteristics they can capture and offering a characteristic-based framework. Most importantly, we aim to fill the gap between the statistical analysis of those methods and their relevance to networking by discussing significant aspects and requirements for accurate forecasting from a network-telemetric perspective.





I Introduction

In the early years of the Internet, network traffic was modeled with relatively simple statistical approaches. There were only a few commonly used services and protocols, and they were actively used by a very limited number of users. In contrast, it is much harder to predict traffic patterns and characteristics in today’s communication systems. Even though many different techniques are employed for analysis and prediction, several concerns must be addressed for accurate traffic engineering. Time-series models are quite popular for extracting the temporal patterns of network traffic and making predictions based on those patterns [25].

There are different approaches for forecasting, such as exponential smoothing [26, 46], wavelets [49, 60], and hybrid methods combining multiple approaches [38, 50]. Besides, neural networks (NNs) and autoregressive models are two of the most frequently used groups of techniques for network traffic prediction in practice. They are considered fundamental elements of the forecasting toolbox. Today, network traffic forecasting with NNs is quite popular as an alternative to traditional stochastic modeling [56, 19, 63]. NNs detect patterns and structures in input data, learn through many iterations, and use such experience to evaluate new data, similar to the learning process of human beings. NNs are more successful at capturing complex relationships in data thanks to their non-linear nature, but they require much more training data than autoregressive models. Even if NNs seem like the primary alternative, autoregressive methods are predominantly studied, especially for the prediction of network traffic, leaving other domains aside.

To the best of our knowledge, there are only a few surveys in the literature on network traffic forecasting [27, 3], and the existing ones do not touch on the significant network flow aspects. Besides, there is no systematic study that builds a grounding for traffic forecasting research by offering an analysis framework. In this study, we present a self-consistent survey that analyzes requirements, characteristics, and examples of temporal autoregressive models for forecasting, since they are the most widely employed and practically used models for network traffic prediction. Rather than examining the statistical foundation of the models, we review all aspects of forecasting from a higher-level networking perspective. Fig. 1 shows a mind map that summarizes the main headlines of the study. Accordingly, our contributions are as follows:

  • We review the relevant dynamics of autoregressive modeling techniques which are common in various studies (Section II).

  • We discuss different characteristics of time-series data from a networking perspective for a better comprehension of the forecasting studies rather than touching on analytical details (Section III). Moreover, we use these characteristics as a framework to analyze forecasting studies.

  • We present a short analysis of different aspects of traffic flows (Section IV-A) and analyze various autoregressive studies with respect to such aspects and characteristics (Section IV-B). We also group the studies under a more general meta-framework apart from the characteristics.

  • We point out common issues and challenges, as well as possible research directions in general (Section V).

Fig. 1: A summarizing picture for the whole study.

II Brief Introduction to Autoregressive Modeling

Autoregressive (AR) models are stochastic models that feed past values of a time sequence into a regression function to predict future values of the related time series. Autoregressive Moving Average (ARMA) [7], Autoregressive Integrated Moving Average (ARIMA) [4], Fractional ARIMA (FARIMA) [53], Seasonal ARIMA (SARIMA), Autoregressive Conditional Heteroskedasticity (ARCH) [10], Generalised ARCH (GARCH) [15], Exponential GARCH (EGARCH), Autoregressive Conditional Duration (ACD) [16], Stochastic Autoregressive Mean (SAM), and Nonlinear Autoregressive with Exogenous inputs (NARX) fall into this category. There are indeed statistical differences; for instance, while ARIMA models focus on the conditional mean of a time series, ARCH methods take the conditional variance into consideration. In this study, we especially focus on the techniques under the autoregression scope, considering their practicality, relatively shorter modeling duration, lower data requirements, and lower complexity. The dynamics of those models are shown in simplified form in Fig. 2 for a better understanding. Note that it is a very brief illustration and omits the iterative processes required for optimization of the model.

Fig. 2: Abstract block diagram for the dynamics of autoregressive models.
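For orientation, the textbook forms of the main model families named above can be written as below. This is a hedged summary rather than the notation of any cited study: the symbols $c$, $\phi_i$, $\theta_j$, $\omega$, $\alpha_i$, $\beta_j$, the backshift operator $B$, and the i.i.d. innovations $z_t$ are notational assumptions made here.

```latex
\begin{align}
\text{AR}(p):\quad & x_t = c + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t \\
\text{ARMA}(p,q):\quad & x_t = c + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\
\text{ARIMA}(p,d,q):\quad & \text{ARMA}(p,q) \text{ applied to } y_t = (1-B)^d x_t, \qquad B x_t = x_{t-1} \\
\text{GARCH}(p,q):\quad & \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2
\end{align}
```

The first three lines model the conditional mean of $x_t$, whereas the last line models the conditional variance $\sigma_t^2$ of the innovations; ARCH corresponds to the special case with all $\beta_j = 0$.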

AR modeling consists of three phases: (i) statistical modeling with respect to some criteria, (ii) parameter estimation, and (iii) forecasting. (i) The first phase detects correlation in the time series using autocorrelation functions (ACFs) and identifies the model based on widely used criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) [13]. (ii) Secondly, the coefficients of the identified model are estimated using well-known estimation methods such as Maximum Likelihood Estimation (MLE) and Least Mean Square Error (LMSE). (iii) After parameter estimation, future points of the time series are predicted and the accuracy is reported with respect to different metrics.
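The three phases can be illustrated with a minimal sketch for a pure AR(p) model. It is not taken from any surveyed study: the function names `fit_ar`, `select_order`, and `forecast` are hypothetical, ordinary least squares stands in for the estimation step, and the AIC is computed in its Gaussian form from the residual variance.

```python
import numpy as np

def fit_ar(x, p):
    """Phase (ii): least-squares estimate of [c, phi_1..phi_p] and residual variance."""
    x = np.asarray(x, dtype=float)
    # The row for time t holds [1, x[t-1], ..., x[t-p]].
    cols = [np.ones(len(x) - p)] + [x[p - k:len(x) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(np.mean(resid ** 2))

def select_order(x, max_p):
    """Phase (i): pick the order with the lowest AIC on a common sample."""
    x = np.asarray(x, dtype=float)
    n = len(x) - max_p  # every candidate is scored on the same n targets
    best = None
    for p in range(1, max_p + 1):
        coef, s2 = fit_ar(x[max_p - p:], p)
        aic = n * np.log(s2) + 2 * (p + 1)
        if best is None or aic < best[0]:
            best = (aic, p, coef)
    return best[1], best[2]

def forecast(x, coef, steps):
    """Phase (iii): iterate the fitted recursion to predict future points."""
    p = len(coef) - 1
    hist = list(np.asarray(x, dtype=float)[-p:])
    out = []
    for _ in range(steps):
        lags = hist[::-1][:p]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        out.append(nxt)
        hist.append(nxt)
    return np.array(out)
```

In practice the identification phase would also inspect the ACF, and the order search would cover moving-average and differencing terms; the sketch only shows how the phases connect.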

Lastly, there are a number of metrics to evaluate the performance of time-series modeling techniques. All of the studies presented in Section IV-B employ one or more metrics to analyze the accuracy of their proposed techniques and compare them with other techniques. Generally, after the model is constructed, such metrics are used to compare fitted values (i.e., values predicted by the model) with actual values. Table I lists those metrics briefly, and they are associated with the studies presented in the next sections.

Abbreviation | Metric | Unitless | Scale-free
APE | Absolute percentage error
MAE | Mean absolute error
MARE | Mean absolute relative error
MAPE | Mean absolute percentage error
MPE | Mean percentage error
MSE | Mean square error
NMSE | Normalized mean square error
NRMSE | Normalized root mean square error
RMSE | Root mean square error
SER | Signal-to-error ratio
TABLE I: List of performance metrics commonly used in different studies. If a metric is unitless, it is defined as a ratio or percentage independent of its actual measuring unit. Scale-free, on the other hand, denotes normalized metrics.
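As a hedged illustration, several of the metrics in Table I can be computed as follows. The normalizations are assumptions, since conventions differ across studies: NMSE is divided here by the variance of the actual series, NRMSE by its range, and SER is expressed in decibels.

```python
import numpy as np

def _pair(actual, fitted):
    return np.asarray(actual, dtype=float), np.asarray(fitted, dtype=float)

def mae(actual, fitted):
    y, f = _pair(actual, fitted)
    return float(np.mean(np.abs(y - f)))

def mse(actual, fitted):
    y, f = _pair(actual, fitted)
    return float(np.mean((y - f) ** 2))

def rmse(actual, fitted):
    return float(np.sqrt(mse(actual, fitted)))

def mape(actual, fitted):
    # Percentage metrics are undefined when an actual value is zero.
    y, f = _pair(actual, fitted)
    return float(100.0 * np.mean(np.abs((y - f) / y)))

def mpe(actual, fitted):
    y, f = _pair(actual, fitted)
    return float(100.0 * np.mean((y - f) / y))

def nmse(actual, fitted):
    # Assumed normalization: variance of the actual series.
    y, _ = _pair(actual, fitted)
    return mse(actual, fitted) / float(np.var(y))

def nrmse(actual, fitted):
    # Assumed normalization: range of the actual series.
    y, _ = _pair(actual, fitted)
    return rmse(actual, fitted) / float(y.max() - y.min())

def ser_db(actual, fitted):
    # Assumed form: ratio of signal energy to error energy, in dB.
    y, f = _pair(actual, fitted)
    return float(10.0 * np.log10(np.sum(y ** 2) / np.sum((y - f) ** 2)))
```

Note that MPE keeps the sign of the errors, so over- and under-predictions can cancel out, whereas MAPE cannot fall below zero.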

Since dozens of studies analyze the statistical aspects of AR models in depth in various domains, we do not take this approach here and keep the related discussion limited to this section. The terms introduced here may serve as guidance for a better comprehension of the rest of this study, especially Section IV.

III Traffic Characteristics

Network traffic may exhibit various characteristics that are vital to detect for accurate forecasting. Those characteristics are observed not only in network traffic but in any field that analyzes temporal data, such as economics, physics, and psychology. In this section, we introduce six characteristics from a network-telemetric perspective. Those characteristics also form our comparison framework to analyze and compare network traffic forecasting studies in Section IV-B.

Self-similarity. Introduced 40 years ago [33], the self-similarity of network traffic has been discussed in a number of studies [17, 39]. In a very practical sense, self-similar objects appear in the same shape when magnified or shrunk. From the network traffic perspective, it means that a proportional segment of a measurement (e.g., number of packets or amount of data during a certain period of time with a predefined granularity) tends to be observed at a different time scale. In this sense, self-similarity is also related to long-range dependency in practice.

Seasonality. It is quite common to observe nearly the same patterns with a certain frequency in any domain of temporal measurement. Typical weather conditions in each “season” of a year are great examples. Similarly, for network traffic, weekends and weekdays, holidays, or certain hours of a day periodically show very similar patterns. Understanding the seasonality in data flows is crucial to analyze the nature of traffic and to forecast the future traffic load, which possibly reflects congruent patterns.

Non-stationarity. In a stationary process, the mean, variance, and correlation structure stay constant over time, which is one of the most common assumptions in time-series modeling and stochastic processes. However, network traffic may show changing statistical characteristics, which leads to a change in modeling as well [29, 8, 37]. Therefore, before forecasting, a time-series model should be capable of detecting such changes, which in practice are observed frequently depending on various factors such as the number of users, connections, and the bandwidth utilization of related network elements.

Multifractal. In aggregated network traffic (i.e., traffic consisting of multiple flows originated by multiple sources), it is possible to observe the self-similar characteristics of individual network flows. This type of traffic flow is described not only as self-similar or fractal but as multifractal [43]. Detecting multifractal behaviors and the fractal patterns of multiple flows simultaneously is naturally important yet challenging for forecasting.

Long-range dependency (LRD). Various time-dependent systems or physical phenomena show correlated behavior over large time scales. For instance, Hurst confirmed such a situation in the Nile River’s repeating rain and drought conditions observed over a long period of time [24], and the Hurst parameter has become the fundamental detection technique for LRD. For network traffic, especially on the Internet, long-range dependency has been observed in several significant studies [21, 30]. Together with the self-similar properties, LRD shifts the traffic modeling perspective from memoryless stochastic processes (e.g., Poisson) to long-memory time series.

Short-range dependency (SRD). In comparison to the LRD, the correlation can be observed for shorter time scales in short-range dependent processes. That is, the dependence among the observations quickly dissolves and it is related to quickly decaying correlations. Many traditional time-series modeling techniques examine the SRD; however, it is not enough to reflect today’s network traffic pattern. SRD can be considered while forecasting short-term traffic employing relatively low-complexity models.

Note that those characteristics are mutually complementary. For instance, LRD and self-similarity require similar detection methods, but they are not the same: the former is strongly related to time dependency, while the latter covers the variance in scale as well. However, the common case is dependence on scaling patterns over long time intervals. It is also possible to analyze such characteristics through multifractal traffic analysis. Similarly, seasonality actually implies non-stationary characteristics, but while some studies directly consider seasonality, others look for non-stationarity from a wider perspective. Therefore, we present those elements separately, as they are individually addressed in various works in the literature.
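Since the Hurst parameter is the common detection tool for both LRD and self-similarity, a minimal sketch of one classical estimator, the aggregated-variance method, is given here. It is an illustrative choice rather than the estimator of any surveyed study, and the function name `hurst_aggvar` and the block sizes are assumptions. The method relies on the variance of block means of a self-similar series scaling as m^(2H-2) with the block size m.

```python
import numpy as np

def hurst_aggvar(x, block_sizes):
    """Estimate H from the slope of log(variance of block means) vs. log(block size)."""
    x = np.asarray(x, dtype=float)
    log_m, log_v = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        # Average the series over non-overlapping blocks of length m.
        means = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_v.append(np.log(means.var()))
    slope = np.polyfit(log_m, log_v, 1)[0]
    return 1.0 + slope / 2.0  # slope = 2H - 2

rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 16)
sizes = [4, 8, 16, 32, 64, 128, 256]
h_noise = hurst_aggvar(noise, sizes)            # uncorrelated input, H close to 0.5
h_walk = hurst_aggvar(np.cumsum(noise), sizes)  # extreme persistence, H close to 1
```

Values of H near 0.5 indicate no long memory, while values approaching 1 indicate strong persistence; the random walk is used only as an extreme case and is not itself a stationary traffic model.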

IV Forecasting in the Literature

After a high-level introduction on the autoregressive models and the definition of traffic characteristics, we relate all that background to forecasting methods in the literature discussing a number of significant aspects. In this section, first, we present further key aspects that fundamentally reshape network traffic analysis pointing to significant studies. Then, we review various studies that exemplify what is covered so far.

IV-A Analysis on Different Aspects of Traffic

Before presenting the temporal techniques for traffic modeling, we introduce some short discussions about different aspects of network traffic. We selected those aspects since they are considered milestones in the forecasting domain that changed the research perspective and led to more accurate forecasting or to a better comprehension of the underlying reasons for the limited success in flow prediction. Throughout this section, we evaluate important studies on the analysis of such aspects for a better understanding of, and further thinking on, forecasting.

On self-similarity. Self-similarity was a complex issue and required novel stochastic techniques to be considered in forecasting. In [31], the nature of self-similarity in Ethernet traffic is discussed both practically and statistically. The authors analyzed self-similarity relying on rescaled range analysis (through the Hurst parameter), variances in aggregated processes, and periodogram-based analysis in the frequency domain. It is revealed that the stochastic processes used to model Ethernet traffic are not capable of reflecting its self-similar nature, which is demonstrated with statistical analysis on Ethernet traffic captured over four years. Moreover, the authors suggest that the Hurst parameter is much more effective for detecting burstiness in traffic than known parameters such as the index of dispersion, peak-to-mean ratio, and coefficient of variation. Lastly, they list two ways to generate network traffic satisfying the self-similarity property: fractional Gaussian noise (FGN) and chaotic maps. Despite its age of more than 25 years, the study is quite revealing for deeply understanding the underlying statistical properties of real Ethernet traffic and the analysis of self-similarity.

On forecasting range. It is important to estimate the limits of success in forecasting and to define requirements precisely for accurate resource allocation for the potential traffic. In [41], the authors focus on three fundamental issues of traffic forecasting: (i) how far into the future traffic can be predicted, (ii) how many resources are required to minimize uncertainty, and (iii) which characteristics of traffic are the most effective. For (i), they define the maximum prediction interval (MPI), for which the prediction error stays limited (e.g., 20%) with a certain probability (e.g., 99.9%) during an interval of time. Besides, the tradeoff between maximizing the MPI and minimizing the prediction error is well discussed. (ii) In most of the studies referenced here, first- and second-order statistics of the traffic are considered, and a certain amount of resources is needed for accurate statistical modeling. (iii) According to the study, the prediction efficiency may not be directly related to the selected method (ARMA and MMPP here) but to the nature of the traffic. For instance, while Ethernet traffic is not predictable with a certain error for a reasonable MPI, the study shows much better performance on Internet traffic due to the higher multiplexing factor. That is, it is not efficient to take sample traces (e.g., sessions) for prediction if the traffic is fed from many different (i.e., statistically independent) sources, exactly like the Internet. Similarly, the efficiency in the use of certain characteristics needs to be discussed per scenario. Accordingly, the study draws the conclusion that LRD may not matter in traffic management for delay-sensitive services.

On heavy-tail distribution. Heavy-tail distribution in network traffic is shown to be strongly related to the transfer size and interarrival times, together with the self-similar nature of (Internet) traffic [12]. A heavy-tail distribution is defined as having a heavier tail than the exponential distribution [6]. It easily misleads traffic forecasting methods that rely on basic statistics. In [18], the main focus is on the detection and characterization of heavy-tail distributions. The existing estimators are not accurate in reflecting heavy-tail characteristics since they rely on idealized assumptions such as stationarity and independence. Therefore, the authors first present quantile-quantile and complementary cumulative distribution function (CCDF) plots for the detection of heavy tails, indicating the drawbacks of those methods. Then, the performance of four different estimators for the characterization (i.e., detecting the tail exponent) and their sensitivity to noise are analyzed. To compensate for their weaknesses, the authors propose a new wavelet-based method to filter long-range dependent data and increase the efficiency of previously used estimators. In the end, the homogeneity of long-range variance (i.e., time-varying LRD exponent) is discussed. Even if the study does not directly present a forecasting method, it is worth pointing out for understanding the effects of heavy-tail distribution and LRD on the analysis of time-series data.
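To make the notions of a CCDF and a tail exponent concrete, the following sketch computes an empirical CCDF and the Hill estimator of the tail index. The Hill estimator is a standard textbook estimator chosen here for illustration; the passage above does not name the four estimators analyzed in [18], so this should not be read as their method. The function names and the cut-off `k` are assumptions.

```python
import numpy as np

def empirical_ccdf(samples):
    """Return sorted values and the fraction of samples strictly above each one."""
    x = np.sort(np.asarray(samples, dtype=float))
    p = 1.0 - np.arange(1, len(x) + 1) / len(x)
    return x, p

def hill_tail_index(samples, k):
    """Hill estimate of the tail exponent from the k largest observations."""
    x = np.sort(np.asarray(samples, dtype=float))[::-1]  # descending order
    # Mean log-excess of the top k values over the (k+1)-th largest value.
    log_excess = np.log(x[:k]) - np.log(x[k])
    return 1.0 / float(log_excess.mean())

rng = np.random.default_rng(1)
# Classical Pareto samples with minimum 1 and tail exponent 1.5 (synthetic data).
sizes = 1.0 + rng.pareto(1.5, size=20000)
alpha_hat = hill_tail_index(sizes, k=2000)
values, ccdf = empirical_ccdf(sizes)
```

On a log-log plot, the CCDF of a heavy-tailed sample is approximately a straight line whose slope is the negative tail exponent. The estimate depends on the choice of `k`, and, as noted above, idealized assumptions such as independence limit its accuracy on real traffic.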

On fractality. Self-similar traffic patterns are widely detected using the Hurst parameter; however, it is rather complicated to analyze multifractal flows following the same procedure. Complex (and high-speed, as stated in the study) networks generally comprise multiple self-similar and independent flows that can be modeled as different stochastic processes, and the cumulative analysis of such fractal flows (i.e., considering them as a single flow) shows multifractal behavior. In [36], the authors relate multifractal patterns of the flow to the locality phenomenon of the Hurst parameter in self-similar networks. It is also indicated that varying the Hurst parameter would be the key method to generate traffic flows with multiple self-similar characteristics.

On complexity of stochastic modeling. Without inspecting for correlation at any scale, one may question whether various kinds of network traffic can be modeled using simpler approaches, e.g., a Poisson distribution for packet interarrival times. This directly depends on the nature of the traffic. For instance, according to [40], modeling TCP traffic with Poisson (or other such models) cannot capture LRD and burstiness and eventually degrades forecasting performance in terms of average packet delay or maximum queue size. That is, a deeper analysis and more sophisticated modeling are required to represent such traffic, e.g., TCP in wide-area networking. On the other hand, highly aggregated Internet traffic (for instance, a traffic sample captured from the backbone) may nearly follow a Poisson distribution [45], which indicates that burstiness is weak in backbone traffic.

IV-B Autoregressive Modeling Techniques

There are a number of studies that use time-series statistical methods by modifying them according to the nature of the network traffic they are applied to. In this section, we present how they modify or enhance the techniques in Section II to satisfy various requirements. Table II shows a comparative analysis taking different network characteristics into consideration. One can argue that the basic dynamics of some methods address several characteristics by default. For instance, while differencing in ARIMA is a solution to eliminate non-stationarity, ARMA automatically detects short-range patterns. However, a characteristic is marked for a study in Table II only if the study directly addresses a problem related to that particular characteristic and shows its effectiveness using the required measurement techniques.

Fig. 3: The studies are divided into three main focal groups, dealing with repetitiveness, volatility, and dependency of network traffic.

The studies here take similar statistical approaches but focus on different characteristics. It is possible to divide them into three groups in terms of the characteristics they handle: repetitiveness, volatility, and dependency. Repetitiveness represents cyclic and usual patterns and is directly related to self-similarity and seasonality. Volatility, on the other hand, covers the varying properties of network traffic, such as non-stationarity and multifractality. Lastly, dependency represents the time-dependent characteristics, which are long- and short-range dependency. This classification can be considered a meta-framework corresponding to the characteristics presented in Section III. Fig. 3 groups the studies in this section according to this meta-framework.

As seen in Fig. 3, many studies address multiple characteristics. The figure also helps to explain the cross-relationships between the categories. For instance, repetitiveness and dependency may be very closely related since they both lead to temporal correlations. However, while dependency mainly focuses on timing issues, repetitiveness is also related to scaling in magnitude with an observable pattern. Volatility may require a different analysis based on decomposition but still intersects with the others since it targets instability in both time and scale. Addressing the problems in all three categories is possible, though. Such studies are generally hybrid methods and are expected to have higher complexity. In the rest of this section, the studies falling into those categories are discussed. Note that since half of them belong to multiple groups, they are not gathered under individual headlines and are sorted in chronological order.

Study | Technique(s) | Parameter Estimation | Evaluation | Domain | Self-similarity | Seasonal | Non-stationarity | Multifractal | LRD | SRD
Vujicic et al. [48] | SARIMA | - | NMSE | Public Safety
Jun et al. [28] | ARMA | MLSL, LSL | MSE | Internet
El Hag and Sharif [14] | AARIMA | - | MAE | WAN
Anand et al. [2] | GARCH | MLE | NMSE(-like) | Internet
Yu et al. [59] | ARIMA | MLE | MAPE | Cellular Network
Yu et al. [58] | APM | MLE | MAPE | Cellular Network
Zhang and Huang [61] | ACD, Particle Filter | BHHH | RMSE | Data Center
Hu et al. [23] | X-12 ARIMA, STL | - | APE | SNMP
Yu et al. [57] | FARIMA | MLE, MMSE | APE, MAPE | Cellular Network
Yimu et al. [54] | GARCH, LMD | MLE | RMSE | P2P Multimedia
Markovic et al. [35] | SARIMA | Manual | RMSE, MAE | Multimedia
Xu et al. [52] | NARX-RF | L-BFGS | APE | Cellular Network
TABLE II: List of autoregressive modelling and prediction methods.

It is quite likely that different network mechanisms cause traffic variations at different timescales. Therefore, it is generally hard to statistically model network traffic all at once. In [34], the authors decompose the traffic into different timescales using the à trous Haar wavelet transform. Then, they apply a separate ARIMA model to each wavelet component (i.e., traffic at a different timescale) for one-step forecasting at a granularity of seconds. By combining the predictions for each component, they obtain a traffic forecast that covers the varying characteristics of the traffic through time. The results are compared with an NN-based method using the very same training data, and the proposed method shows better performance in terms of NMSE and MARE.
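The decomposition step can be sketched as follows. This is a generic causal à trous Haar transform under our own assumptions (the function name `atrous_haar` and the boundary handling, which repeats the first value, are not taken from [34]); the per-scale ARIMA fitting is omitted.

```python
import numpy as np

def atrous_haar(x, levels):
    """Split x into `levels` detail series plus one smooth series of equal length."""
    c = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shift = 2 ** j  # the filter taps move apart by a factor of two per level
        # Causal shift: position t sees the value at t - shift (start is padded).
        shifted = np.concatenate([np.full(shift, c[0]), c[:-shift]])
        smoother = 0.5 * (c + shifted)
        details.append(c - smoother)  # what this scale removes from the signal
        c = smoother
    return details, c

traffic = np.array([5.0, 7.0, 6.0, 9.0, 12.0, 8.0, 7.0, 10.0, 11.0, 9.0])
details, smooth = atrous_haar(traffic, levels=2)
# The components add back to the original series, so per-scale forecasts
# can be summed to obtain a forecast of the traffic itself.
reconstructed = sum(details) + smooth
```

Because every component has the same length as the input and the filter looks only backwards in time, each scale can be modeled and forecast independently and the forecasts summed, which mirrors the combination step described above.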

ARIMA/GARCH [62] offers a combined technique targeting various network characteristics such as SRD, LRD, self-similarity, and multifractality. Using the abilities of ARIMA for linear traffic and of GARCH for changing variance, the authors design a one-step predictor that shows potential to be extended to multi-k (or multi-step) predictions. Similar to [2], ARIMA/GARCH has a parameter estimation phase to tune the parameters of both ARIMA and GARCH using MLE based on the Box-Cox [5] method. The technique shows better performance than FARIMA in terms of signal-to-error ratio (SER) at various timescales and also in experimental multi-k predictions. However, the authors do not present concrete accuracy results other than SER. Therefore, the correlation between predictions and actual traffic data is not directly observable.

In [48], instead of analyzing aggregated traffic, the authors profile user behaviors based on their hourly call rate using the K-means clustering method. They divide users (call groups in the study context) into three groups with low, medium, and high call rates. Then, each group is modeled separately using SARIMA models for daily and weekly cyclic patterns, and the group forecasts are summed to make a complete prediction reflecting all users’ behavior. The main motivation is to take advantage of well-defined group characteristics for more accurate predictions rather than using the whole data, which is relatively harder to model due to its complex and aggregated nature. However, it is not feasible to work at maximum granularity (i.e., per-user prediction). Therefore, the authors aim to provide a degree of scalability by grouping the users while increasing the prediction accuracy in comparison to modeling aggregated traffic. The results show that 57% of group-based predictions give better results than the predictions on aggregated traffic in terms of NMSE. Besides, group-based forecasting paves the way for profiling individual users as long as they can be identified under a forecasting group.

[28] proposes a modified version of the least-square lattice (LSL) method [1], called MLSL, to calculate the related autoregression (AR) parameters dynamically. MLSL relies on adaptive filter theory. Instead of evaluating the model parameters once using a set of training data, it dynamically updates the AR model parameters per input (i.e., packet). The authors modified LSL to reduce computation cost and increase convergence speed. In comparison to the least-squares (LS) method and ARMA, MLSL shows higher accuracy and faster convergence in experiments on synthetic data with short-term dependence characteristics. For data generation, the authors use the inverse Fourier transform to generate fractional Gaussian noise for a given Hurst parameter. Therefore, the data show self-similarity as well as SRD.


[14] proposes the Adjusted ARIMA Model (AARIMA) for modelling Internet traffic data at millisecond time scales. The authors specifically address self-similarity and LRD in Internet traffic whose samples are captured from the Bellcore Internet Wide Area Network. They show that even if the residuals of ARIMA models follow a white-noise distribution, the models may not offer sufficient goodness-of-fit statistics. Therefore, they offer AARIMA as a quick and simple modeling method obtained by modifying ARIMA so that the first difference of the stationary series is added as a regressor. Especially for modeling Internet traffic, the results show that AARIMA gives a lower MAE while finishing in a higher number of iterations on different datasets. Note that in terms of the modeling phases presented in Section II, evaluating AARIMA is exactly the same as evaluating ARIMA, as specifically claimed by the authors, and it satisfies all requirements for reliable residuals (e.g., white-noise distribution, Box-Jenkins tests).

In [2], an enhanced (or generalized) ARCH model (GARCH) is introduced to develop a one-step predictor for non-linear traffic, e.g., Internet traffic. The authors point out that constant-variance models like ARIMA and its successors (e.g., FARIMA and SARIMA) cannot fit the bursty (and non-linear) nature of Internet traffic, whereas GARCH takes the conditional variance into account to react to changing traffic patterns. To determine the GARCH parameters, MLE is applied to the training data. The results show that the forecast error of GARCH is significantly lower than that of the ARIMA-ARCH model for one-step prediction (i.e., compared to ARIMA(1,1,1)-ARCH(1)). However, its performance remains to be validated on less aggregated traffic than that of the Internet.
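How a conditional-variance model reacts to bursts can be seen from the GARCH(1,1) recursion, sketched below. The parameters `omega`, `alpha`, and `beta` are simply given here, whereas [2] estimates them by MLE; the function name and the choice of the starting variance are assumptions.

```python
import numpy as np

def garch11_variance(residuals, omega, alpha, beta, init=None):
    """Conditional variances of a GARCH(1,1) model for a residual series.

    The returned array has one more entry than the input; its last entry
    is the one-step-ahead variance forecast.
    """
    e = np.asarray(residuals, dtype=float)
    sigma2 = np.empty(len(e) + 1)
    # Starting value: the sample variance unless the caller provides one.
    sigma2[0] = e.var() if init is None else init
    for t in range(len(e)):
        sigma2[t + 1] = omega + alpha * e[t] ** 2 + beta * sigma2[t]
    return sigma2

# A large residual (a burst) raises the variance predicted for the next step.
calm = garch11_variance([0.1, -0.2, 0.1], omega=0.1, alpha=0.2, beta=0.7, init=1.0)
burst = garch11_variance([0.1, -0.2, 3.0], omega=0.1, alpha=0.2, beta=0.7, init=1.0)
```

With `alpha = beta = 0` the recursion collapses to a constant variance `omega`, which corresponds to the constant-variance assumption that the authors criticize in ARIMA-type models.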

Despite its simplicity, [59] offers a leaner ARIMA differencing process by converting multiple stationarization operations (for both trend and seasonal patterns) into a single multiplicative process. The authors specifically aim to eliminate seasonal patterns (i.e., by analyzing 12-month (or lag) autocorrelation results) and also to extract 6-month patterns from large network traffic data based on a set of Chinese regions. They present single- and multi-step prediction MAPEs for each month; however, the results are not compared with any other prediction method applied to the same data. A similar approach, namely the Multiple Seasonal ARIMA model, is presented in [9] to obtain a model covering two different seasonal features in wireless network traffic using 5-minute sampling.
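One way to read the "single multiplicative process" is the standard product of a trend difference and a seasonal difference, (1 - B)(1 - B^s), applied in one pass. The sketch below checks that this single pass equals two successive differencing steps; it is our interpretation rather than the exact procedure of [59], and the function names are hypothetical.

```python
import numpy as np

def successive_diff(x, s):
    """Seasonal difference (lag s) followed by a first difference."""
    x = np.asarray(x, dtype=float)
    seasonal = x[s:] - x[:-s]
    return seasonal[1:] - seasonal[:-1]

def combined_diff(x, s):
    """Single pass: y_t = x_t - x_{t-1} - x_{t-s} + x_{t-s-1}."""
    x = np.asarray(x, dtype=float)
    return x[s + 1:] - x[s:-1] - x[1:-s] + x[:-s - 1]

# Synthetic monthly series: linear growth plus a fixed 12-month pattern.
months = np.arange(60)
pattern = np.array([3, 1, 0, 2, 5, 8, 9, 7, 4, 2, 1, 6], dtype=float)
series = 2.0 * months + pattern[months % 12]
stationary = combined_diff(series, s=12)
```

For this synthetic series, both the linear trend and the 12-month pattern cancel, leaving a series of zeros; real traffic would leave a residual series to be modeled with ARMA terms.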

Considering the relatively complex and time-consuming process of ARIMA, the accumulation predicting model (APM) is proposed in [58]. APM especially addresses traffic patterns with a stable seasonal characteristic. For the detection of stable seasonality, the ratio of partial accumulation to total accumulation (i.e., the cumulative traffic load of a certain number of months to that of the whole year) is evaluated, and constant (e.g., linear in the study) changes are interpreted as a reflection of this characteristic. Then, this interpretation is used to predict the monthly traffic for the next year. Compared to ARIMA, APM results in a lower MAPE in predicting the total monthly traffic; however, ARIMA is more successful in estimating the average daily traffic (with a relatively higher APE deviation).

It is usually quite hard to model and forecast network traffic at minute granularity. In [61], the authors propose an autoregressive model for short-term prediction, in minutes. Autoregressive conditional duration (ACD) [47] is employed to model the distribution of interarrival times of continuous traffic for a certain duration. The parameters of ACD are estimated using the Berndt-Hall-Hall-Hausman (BHHH) algorithm. The traffic, as claimed by the authors, is non-stationary and has a non-Gaussian distribution. Then, the particle filtering method is applied, considering the extracted distribution model, to predict the packet traffic in the upcoming minute. The results show that the proposed method is able to predict one-minute traffic (in MBs) with less error than ARMA in terms of RMSE.

Detecting seasonality is one of the crucial tasks in the decomposition of traffic. In [23], the adjustment of seasonality is investigated in detail to understand the underlying patterns clearly and make more accurate forecasts. For the adjustment, the authors first handle missing values and soften outliers, dividing them into four different types. Then, using seasonal-trend decomposition using Loess (STL) [11] and X12-ARIMA [20], they decompose the data into three components: trend, season and irregularities. An extra step, diagnostics, is taken to examine the stability of the adjusted data series with a persistent model for new data feeds. It is revealed that considering a daily seasonality leads to the most accurate decomposition and shows the trend with minimum irregularities after this procedure. The results show that one-step forecasting after removing daily seasonality from Simple Network Management Protocol (SNMP) data gives the minimum forecasting error in comparison to six other benchmark methods including Holt-Winters [22, 51], ARIMA and linear regression. Another similar study on SNMP data addresses seasonality as well [55]. After seasonal adjustment using STL, the authors also analyze self-similarity. Lastly, they apply ARIMA to remove residual autocorrelation from the adjusted data. The results show the success of the proposed approach in forecasting network utilization under stationary and non-stationary assumptions, and with varying training set sizes with one-day seasonal cycles.

Although short- and long-range dependence in various types of network traffic have been discussed, fractal characteristics may be harder to detect. FARIMA, for instance, is designed to handle fractal patterns given the inadequacy of ARIMA. In [57], the authors discuss the point where FARIMA fails: multifractal characteristics. When the hourly traffic of sequential days is examined in mobile 3G downlink traffic, it is shown that self-similarity and multifractal patterns exist and that FARIMA would not be sufficient to model such patterns for prediction. Therefore, a combined technique of FARIMA and ARIMA is employed to eliminate (a) fractal, (b) long-range dependent and (c) short-range dependent characteristics successively. Finally, a method examining the change in the Hurst parameter for predicted data is used for forecasting. The results show that the combined method yields less than 8% APE and nearly 2% MAPE, which are considered reasonable error rates for daily forecasting.
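As a rough illustration of how the Hurst parameter can be estimated from a traffic trace, the sketch below uses the aggregated variance method. This is a generic textbook estimator and not necessarily the one used in [57]; the block sizes, the random seed and the white-noise input are our own assumptions.

```python
import math
import random


def block_means(series, m):
    """Average the series over non-overlapping blocks of length m."""
    blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]


def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def hurst_aggregated_variance(series, block_sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log Var(block mean) versus log m.

    For a self-similar process the variance of the block means decays
    like m**(2H - 2), so H = 1 + slope / 2.
    """
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(block_means(series, m))) for m in block_sizes]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return 1.0 + slope / 2.0


# Uncorrelated noise has no long-range dependence, so the estimate
# should land near H = 0.5 (up to sampling error).
rng = random.Random(7)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h_noise = hurst_aggregated_variance(white_noise)
```

Estimates clearly above 0.5 would point to long-range dependence, which is the regime in which FARIMA-type models are usually considered.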

One of the significant outcomes of traffic forecasting is increasing the quality of service (QoS) by predicting user demand. In [54], the authors analyze peer-to-peer (P2P) video sharing to satisfy QoS requirements in multimedia traffic. They first determine the fundamental problems of such traffic, which the study identifies as self-similarity and LRD. To deal with those characteristics, the whole temporal data of network traffic is divided into shorter-term time series, or Product Functions (PFs), using local mean decomposition (LMD) [42]. LMD helps to process the series as multiple short-range series independently of its long-term patterns. Then, the PFs are iteratively forecast using GARCH and the predicted values are summed to obtain a final prediction. The results are compared with ARMA and WNN models in terms of RMSE, and it is shown that the proposed method offers more accurate predictions.

After high-definition videos changed multimedia trends, upcoming 4K/8K videos are expected to become popular in the same way. However, increasing video resolution naturally affects the network resources required to watch them online. In [35], the authors analyze the nature of 4K video traffic for modeling and forecasting so that resource allocation can be handled in advance. They experiment with multiple SARIMA models with various modeling parameters to extract both seasonal and non-seasonal patterns. The best model is selected with respect to AIC and optimized by minimizing RMSE and MAE without using a parameter estimation method. Further optimization and model fitting are examined using Ljung-Box tests [4] and Empirical Cumulative Distribution Function (ECDF) graphs. The results show that 4K videos have a frequently changing frame size variance without a certain pattern, and therefore it is not possible to make long-term predictions.
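Selecting a model with respect to AIC can be sketched on a much simpler case than SARIMA. The example below fits plain AR(p) models of increasing order through the Yule-Walker equations (solved with the Levinson-Durbin recursion) and keeps the order with the lowest AIC. The simulated AR(2) series, the coefficient values and the function names are assumptions for illustration only and do not reproduce the setup of [35], which in particular does not rely on a parameter estimation method of this kind.

```python
import math
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariances gamma(0..max_lag)."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(gamma, max_order):
    """Yule-Walker AR coefficients and noise variances for orders 0..max_order."""
    coeffs, variances, phi = [[]], [gamma[0]], []
    for p in range(1, max_order + 1):
        acc = gamma[p] - sum(phi[j] * gamma[p - 1 - j] for j in range(p - 1))
        k = acc / variances[-1]  # reflection coefficient of order p
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        coeffs.append(phi)
        variances.append(variances[-1] * (1.0 - k * k))
    return coeffs, variances


def select_ar_order(series, max_order=6):
    """Return the order minimizing AIC = n * log(sigma^2) + 2 * p."""
    n = len(series)
    gamma = autocovariance(series, max_order)
    coeffs, variances = levinson_durbin(gamma, max_order)
    aic = [n * math.log(variances[p]) + 2 * p for p in range(max_order + 1)]
    best = min(range(max_order + 1), key=aic.__getitem__)
    return best, coeffs, aic


# Simulated AR(2) series (assumed coefficients), with a discarded burn-in.
rng = random.Random(42)
true_phi = (0.6, -0.3)
x = [0.0, 0.0]
for _ in range(2200):
    x.append(true_phi[0] * x[-1] + true_phi[1] * x[-2] + rng.gauss(0.0, 1.0))
series = x[200:]

best_order, coeffs, aic = select_ar_order(series)
```

Because AIC penalizes each additional parameter only mildly, it may occasionally prefer an order slightly above the true one, which is one reason why residual diagnostics such as the Ljung-Box test are applied afterwards.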

The authors in [52] address a very practical problem: the varying network use on holidays, especially in cellular networks. It is highly likely that people communicate much more frequently on special days and holidays than on ordinary days, which directly affects the quality of service in such periods. In cellular networks, estimating the changes in traffic patterns on those days per base station is important for service providers to arrange the required resources. In [52], after eliminating long-term and seasonal trends of the traffic, the holiday traffic patterns are extracted for each base station and clustered using K-means clustering. Then, the data in each cluster are modeled using the random forest (RF) method to obtain the relationship between the input variables and the traffic data of similar patterns (in the same cluster). This model is also used for prediction in combination with a Nonlinear Auto Regressive with Exogenous model (NARX-RF), whose parameters are estimated using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm [32]. Therefore, the study takes advantage of time-series, unsupervised and supervised learning methods all together. It is compared with Facebook’s Prophet [44], and the results show that it has a significantly lower absolute percentage error (APE).
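Since the studies above report their accuracy in terms of APE, MAPE and RMSE, the following minimal sketch spells out how these metrics are commonly computed. The sample values are hypothetical, and individual papers may use slightly different conventions (for example, regarding zero-valued observations).

```python
import math


def ape(actual, predicted):
    """Absolute percentage error of each prediction (actual values must be non-zero)."""
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error over all predictions."""
    errors = ape(actual, predicted)
    return sum(errors) / len(errors)


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the traffic itself."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical traffic volumes and forecasts (arbitrary units).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

per_point = ape(actual, predicted)
overall = mape(actual, predicted)
error = rmse(actual, predicted)
```

APE and MAPE are scale-free and therefore convenient for comparing links or base stations with very different loads, whereas RMSE weights large deviations more heavily.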

V Discussion and Conclusion

In this study, we propose a guideline and present a broad discussion of traffic forecasting with autoregressive methods, focusing on network-related issues rather than on statistical analysis. We pay particular attention to keeping the whole study consistent by touching on various characteristics and grounding different studies from the literature in a framework based on those characteristics. In this last section, we present a wrap-up discussion covering the most general forecasting issues that we extracted from all the studies presented here.

It is clear that offering a general-purpose model is not possible even if the network characteristics are quite similar. More than 50% of the studies we reviewed mainly address seasonality in network traffic. Even if the most obvious seasonal patterns such as holidays and festivals are well captured, some other cyclic patterns still need to be modeled more carefully. In such cases, the size of the training data and the optimization and estimation steps of the autoregressive algorithms presented in Section II become crucial. That is, a modeling problem actually consists of multiple optimization problems whose results reshape the intended model, and this leads to countless different models with varying parameters.

In certain cases, even a single well-identified model becomes insufficient to make accurate predictions, and the solution consequently converges to combined or hybrid models. This increases complexity and the time required for training, and decreases the flexibility of the models as they rely on various dependent parameters. This issue also leads to the development or adoption of different techniques for network traffic analysis, such as machine learning and neural networks. Indeed, emerging techniques bring their own issues to the table, and the choice becomes a strong trade-off between complexity and accuracy. Moreover, when other performance metrics such as prediction range and confidence interval are included, the scope of the optimization problem grows beyond practical limits. Related to that, it is worth involving heuristics and field expertise to narrow analytical problems down to more practical ones. For instance, Facebook’s Prophet [44] offers a forecasting ecosystem called “analysts-in-the-loop” rather than a prediction technique, in which experts can be directly involved in the forecasting process instead of relying on fully automated prediction. Therefore, aside from sufficient statistical knowledge, it is also valuable to understand domain-specific network requirements in order to succeed.

Lastly, the common requirement for all forecasting methods is sufficient training data and proportional training time. Therefore, there is a strong need for both practical and real-time techniques that can be dynamically trained and reshaped using spontaneous data. This may not be fully achievable given the complex nature of network traffic, but research is needed on heuristics that either ease the training process or increase forecasting performance alongside statistical modeling. Autoregressive models, in this sense, are relatively easy to comprehend and to evolve.

In summary, we aim to fill the gap between the statistical analysis of autoregressive forecasting methods and their relevance to networking by discussing significant aspects and requirements for accurate forecasting from a network-telemetric perspective. Even though we focus on autoregressive methods in the survey part, we believe that our discussion of network traffic forecasting is conducted in a much broader sense. For future work, we intend to expand the survey with more modern methods such as machine learning and neural networks.


  • [1] S. T. Alexander (1986) The Least Squares Lattice Algorithm. In Adaptive Signal Processing: Theory and Applications, pp. 142–153. External Links: ISBN 978-1-4612-4978-8, Document, Link Cited by: §IV-B.
  • [2] N. C. Anand, C. Scoglio, and B. Natarajan (2008) GARCH - Non-linear time series model for traffic modeling and prediction. In Proceedings of the IEEE/IFIP Network Operations and Management Symposium: Pervasive Management for Ubiquitous Networks and Services, NOMS 2008, pp. 694–697. External Links: Document, ISBN 9781424420667 Cited by: §IV-B, §IV-B, TABLE II.
  • [3] E. Bolshinsky and R. Freidman (2012) Traffic Flow Forecast Survey. External Links: Link Cited by: §I.
  • [4] G. E. P. Box (2015) Time series analysis: forecasting and control. John Wiley & Sons. Cited by: §II, §IV-B.
  • [5] G. E. Box and D. R. Cox (1964) An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological) 26 (2), pp. 211–243. Cited by: §IV-B.
  • [6] M. C. Bryson (1974) Heavy-Tailed Distributions: Properties and Tests. Technometrics 16 (1), pp. 61–68. External Links: Document, Link, https://amstat.tandfonline.com/doi/pdf/10.1080/00401706.1974.10489150 Cited by: §IV-A.
  • [7] J. A. Cadzow (1983-01) ARMA Time Series Modeling: an Effective Method. IEEE Transactions on Aerospace and Electronic Systems AES-19 (1), pp. 49–58. External Links: Document, ISSN 0018-9251 Cited by: §II.
  • [8] J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun (2001) On the nonstationarity of internet traffic. In Proceedings of the 2001 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’01, New York, NY, USA, pp. 102–112. External Links: ISBN 1-58113-334-0, Link (http://doi.acm.org/10.1145/378420.378440), Document (doi:10.1145/378420.378440) Cited by: §III.
  • [9] C. Chen, Q. Pei, and L. Ning (2009) Forecasting 802.11 traffic using seasonal ARIMA model. In Proceedings of the 2009 International Forum on Computer Science-Technology and Applications IFCSTA, Vol. 2, pp. 347–350. External Links: Document, ISBN 9780769539300 Cited by: §IV-B, TABLE II.
  • [10] H. Chen, J. Wu, and S. Gao (2006-10) A Study of Autoregressive Conditional Heteroscedasticity Model in Load Forecasting. In Proceedings of the 2006 International Conference on Power System Technology, pp. 1–8. External Links: Document Cited by: §II.
  • [11] R. Cleveland and W. Cleveland (1990-01) STL: A seasonal-trend decomposition procedure based on Loess. Journal of Official Statistics 6. Cited by: §IV-B.
  • [12] M. E. Crovella and A. Bestavros (1997-12) Self-similarity in World Wide Web Traffic: Evidence and Possible Causes. IEEE/ACM Transactions on Networking 5 (6), pp. 835–846. External Links: ISSN 1063-6692, Link, Document Cited by: §IV-A.
  • [13] J. Ding, V. Tarokh, and Y. Yang (2018-06) Bridging AIC and BIC: A New Criterion for Autoregression. IEEE Transactions on Information Theory 64 (6), pp. 4024–4043. External Links: Document, ISSN 0018-9448 Cited by: §II.
  • [14] H. M.A. El Hag and S. M. Sharif (2007) An adjusted ARIMA model for internet traffic. External Links: Document, ISBN 142440987X Cited by: §IV-B, TABLE II.
  • [15] R. F. Engle and K. F. Kroner (1995) Multivariate Simultaneous Generalized ARCH. Econometric Theory 11 (1), pp. 122–150. External Links: ISSN 02664666, 14694360, Link Cited by: §II.
  • [16] R. F. Engle and J. R. Russell (1998-09) Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data. Econometrica 66 (5), pp. 1127–1162. External Links: Document, Link Cited by: §II.
  • [17] A. Erramilli, M. Roughan, D. Veitch, and W. Willinger (2002-05) Self-similar traffic and network dynamics. Proceedings of the IEEE, Vol. 90, pp. 800–819. External Links: Document, ISSN 0018-9219 Cited by: §III.
  • [18] A. O. Fapojuwo and I. W. C. Lee (2006) Statistical methods for computer network traffic analysis. IEE Proceedings Communications 153 (5), pp. 633–638. External Links: Document, ISBN 0 86341 458 3, ISSN 13502425 Cited by: §IV-A.
  • [19] G. Feng (2015-12) Network Traffic Prediction Based on Neural Network. In Proceedings of the 2015 International Conference on Intelligent Transportation, Big Data and Smart City, pp. 527–530. External Links: Document Cited by: §I.
  • [20] D. F. Findley, B. C. Monsell, W. R. Bell, M. C. Otto, and B. Chen (1998) New capabilities and methods of the X-12-ARIMA seasonal-adjustment program. Journal of Business & Economic Statistics 16 (2), pp. 127–152. Cited by: §IV-B.
  • [21] M. Grossglauser and J. -. Bolot (1999-10) On the relevance of long-range dependence in network traffic. IEEE/ACM Transactions on Networking 7 (5), pp. 629–640. External Links: Document, ISSN 1063-6692 Cited by: §III.
  • [22] C. C. Holt (2004) Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting 20 (1), pp. 5 – 10. External Links: ISSN 0169-2070, Document, Link Cited by: §IV-B.
  • [23] K. Hu, A. Sim, D. Antoniades, and C. Dovrolis (2013) Estimating and forecasting network traffic performance based on statistical patterns observed in SNMP data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7988 LNAI, pp. 601–615. External Links: Document, ISBN 9783642397110, ISSN 03029743 Cited by: §IV-B, TABLE II.
  • [24] H. Hurst (1956) Methods of Using Long-term Storage in Reservoirs. Proceedings of the Institution of Civil Engineers, Vol. 5, pp. 519–543. External Links: Document, Link, https://doi.org/10.1680/iicep.1956.11503 Cited by: §III.
  • [25] R.J. Hyndman and G. Athanasopoulos (2014) Forecasting: principles and practice. OTexts. External Links: ISBN 9780987507105, Link Cited by: §I.
  • [26] R. Jašek, A. Szmit, and M. Szmit (2013) Usage of Modern Exponential-Smoothing Models in Network Traffic Modelling. In Nostradamus 2013: Prediction, Modeling and Analysis of Complex Systems, I. Zelinka, G. Chen, O. E. Rössler, V. Snasel, and A. Abraham (Eds.), Heidelberg, pp. 435–444. External Links: ISBN 978-3-319-00542-3 Cited by: §I.
  • [27] M. Joshi and T. H. Hadi (2015) A review of network traffic analysis and prediction techniques. arXiv preprint arXiv:1507.05722. Cited by: §I.
  • [28] L. Jun, L. Tong, and L. Xing (2007) Network traffic prediction algorithm and its practical application in real network. In Proceedings of the 2007 IFIP International Conference on Network and Parallel Computing Workshops, NPC 2007, pp. 512–517. External Links: Document, ISBN 0769529437 Cited by: §IV-B, TABLE II.
  • [29] T. Karagiannis, M. Molle, M. Faloutsos, and A. Broido (2004-03) A nonstationary poisson view of internet traffic. In Proceedings of the IEEE INFOCOM 2004, Vol. 3, pp. 1558–1569. External Links: Document, ISSN 0743-166X Cited by: §III.
  • [30] T. Karagiannis, M. Molle, and M. Faloutsos (2004-Sep.) Long-range dependence ten years of Internet traffic modeling. IEEE Internet Computing 8 (5), pp. 57–64. External Links: Document, ISSN 1089-7801 Cited by: §III.
  • [31] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson (1993) On the self-similar nature of Ethernet traffic. ACM SIGCOMM Computer Communication Review 23 (4), pp. 183–193. External Links: Document, ISBN 1063-6692, ISSN 01464833, Link Cited by: §IV-A.
  • [32] D. C. Liu and J. Nocedal (1989-08-01) On the limited memory BFGS method for large scale optimization. Mathematical Programming 45 (1), pp. 503–528. External Links: ISSN 1436-4646, Document, Link Cited by: §IV-B.
  • [33] B. B. Mandelbrot (1982) The fractal geometry of nature. Freeman, San Francisco, CA. External Links: Link Cited by: §III.
  • [34] G. Mao (2005) Real-time Network Traffic Prediction Based on a Multiscale Decomposition. In Proceedings of the 4th International Conference on Networking - Volume Part I, ICN’05, Berlin, Heidelberg, pp. 492–499. External Links: ISBN 3-540-25339-4, 978-3-540-25339-6, Link, Document Cited by: §IV-B, TABLE II.
  • [35] D. R. Markovic, A. M. Gavrovska, and I. S. Reljin (2017) 4K video traffic analysis using seasonal autoregressive model for traffic prediction. In Proceedings of the 24th Telecommunications Forum, TELFOR 2016, External Links: Document, ISBN 9788674666494 Cited by: §IV-B, TABLE II.
  • [36] G. Millan, M. Chait, and G. Lefranc (2016) The locality phenomenon in the analysis of self-similar network traffic flows. In Proceedings of the 2016 IEEE International Conference on Automatica, ICA-ACCA 2016, External Links: Document, ISBN 9781509011476 Cited by: §IV-A.
  • [37] R. Morris and D. Lin (2000-03) Variance of aggregated web traffic. In Proceedings of the IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 1, pp. 360–366. External Links: Document, ISSN 0743-166X Cited by: §III.
  • [38] L. Ouyang, F. Zhu, G. Xiong, H. Zhao, F. Wang, and T. Liu (2017-10) Short-term traffic flow forecasting based on wavelet transform and neural network. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pp. 1–6. External Links: Document, ISSN 2153-0017 Cited by: §I.
  • [39] K. Park and W. Willinger (2000) Self-Similar Network Traffic and Performance Evaluation. 1st edition, John Wiley & Sons, Inc., New York, NY, USA. External Links: ISBN 0471319740 Cited by: §III.
  • [40] V. Paxson and S. Floyd (1995-06) Wide area traffic: the failure of Poisson modeling. IEEE/ACM Transactions on Networking 3 (3), pp. 226–244. External Links: Document, ISSN 1063-6692 Cited by: §IV-A.
  • [41] A. Sang and S. Li (2002-07) A predictability analysis of network traffic. Computer Networks 39, pp. 329–345. External Links: Document Cited by: §IV-A.
  • [42] J. S. Smith (2005) The local mean decomposition and its application to EEG perception data. Journal of The Royal Society Interface 2 (5), pp. 443–454. External Links: Document Cited by: §IV-B.
  • [43] M. S. Taqqu, V. Teverovsky, and W. Willinger (1997) Is Network Traffic Self-Similar or Multifractal?. Fractals 05 (01), pp. 63–73. External Links: Document, Link Cited by: §III.
  • [44] S. J. Taylor and B. Letham (2017) Forecasting at Scale. External Links: Document, ISSN 15372731 Cited by: §IV-B, §V.
  • [45] G. Terdik and T. Gyires (2009-02) Lévy Flights and Fractal Modeling of Internet Traffic. IEEE/ACM Transactions on Networking 17 (1), pp. 120–129. External Links: ISSN 1063-6692, Link, Document Cited by: §IV-A.
  • [46] D. Tikunov and T. Nishimura (2007-Sep.) Traffic prediction for mobile network using Holt-Winter’s exponential smoothing. In Proceedings of the 15th International Conference on Software, Telecommunications and Computer Networks, pp. 1–5. External Links: Document Cited by: §I.
  • [47] R. S. Tsay (2009) Autoregressive Conditional Duration Models. In Palgrave Handbook of Econometrics: Volume 2: Applied Econometrics, T. C. Mills and K. Patterson (Eds.), pp. 1004–1024. External Links: ISBN 978-0-230-24440-5, Document, Link Cited by: §IV-B.
  • [48] B. Vujicic and L. Trajkovic (2006-05) Prediction of traffic in a public safety network. In 2006 IEEE International Symposium on Circuits and Systems, 4 pp. External Links: Document, ISSN 0271-4302 Cited by: §IV-B, TABLE II.
  • [49] X. Wang and X. Shanand (2002-06) A wavelet-based method to predict internet traffic. In Proceedings of the IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions, Vol. 1, pp. 690–694. External Links: Document Cited by: §I.
  • [50] Y. Wei, J. Wang, and C. Wang (2011) Network Traffic Prediction Based on Wavelet Transform and Season ARIMA Model. In Proceedings of the Advances in Neural Networks – ISNN 2011, D. Liu, H. Zhang, M. Polycarpou, C. Alippi, and H. He (Eds.), Berlin, Heidelberg, pp. 152–159. External Links: ISBN 978-3-642-21111-9 Cited by: §I.
  • [51] P. R. Winters (1960) Forecasting Sales by Exponentially Weighted Moving Averages. Management Science 6 (3), pp. 324–342. External Links: Document, Link, https://doi.org/10.1287/mnsc.6.3.324 Cited by: §IV-B.
  • [52] M. Xu, Q. Wang, and Q. Lin (2018) Hybrid holiday traffic predictions in cellular networks. In Proceedings of the IEEE/IFIP Network Operations and Management Symposium: Cognitive Management in a Cyber World, NOMS 2018, External Links: Document, ISBN 9781538634165 Cited by: §IV-B, TABLE II.
  • [53] O. W. W. Yang (1999) Traffic Prediction Using FARIMA Models. pp. 3–7. External Links: ISBN 078035284X Cited by: §II.
  • [54] J. I. Yimu, Y. Yongge, Z. Chuanxin, J. Chenchen, and W. Ruchuan (2015) Research of a novel flash p2p network traffic prediction algorithm. Procedia Computer Science 55 (Itqm), pp. 1293–1301. External Links: Document, ISSN 18770509, Link Cited by: §IV-B, TABLE II.
  • [55] W. Yoo and A. Sim (2016) Time-Series Forecast Modeling on High-Bandwidth Network Measurements. Journal of Grid Computing 14 (3), pp. 463–476. External Links: Document, ISSN 15729184 Cited by: §IV-B, TABLE II.
  • [56] E. S. Yu and C. Y. R. Chen (1993-11) Traffic prediction using neural networks. In Proceedings of the GLOBECOM ’93. IEEE Global Telecommunications Conference, Vol. 2, pp. 991–995. External Links: Document Cited by: §I.
  • [57] Y. Yu, M. Song, Y. Fu, and J. Song (2013) Traffic prediction in 3G mobile networks based on multifractal exploration. Tsinghua Science and Technology 18 (4), pp. 398–405. External Links: Document, ISSN 10070214 Cited by: §IV-B, TABLE II.
  • [58] Y. Yu, M. Song, Z. Ren, and J. Song (2011) Network traffic analysis and prediction based on APM. In Proceedings of the 2011 6th International Conference on Pervasive Computing and Applications, ICPCA 2011, pp. 275–280. External Links: Document, ISBN 9781457702082 Cited by: §IV-B, TABLE II.
  • [59] Y. Yu, J. Wang, M. Song, and J. Song (2011) Network traffic prediction and result analysis based on seasonal ARIMA and correlation coefficient. In Proceedings of the 2010 International Conference on Intelligent System Design and Engineering Application, ISDEA 2010, Vol. 1, pp. 980–983. External Links: Document, ISBN 9780769542126 Cited by: §IV-B, TABLE II.
  • [60] Y. Zang, F. Ni, Z. Feng, S. Cui, and Z. Ding (2015-07) Wavelet transform processing for cellular traffic prediction in machine learning networks. In Proceedings of the 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), pp. 458–462. External Links: Document Cited by: §I.
  • [61] G. Zhang and D. Huang (2013) Short-term network traffic prediction with ACD and particle filter. In Proceedings of the 5th International Conference on Intelligent Networking and Collaborative Systems, INCoS 2013, pp. 189–191. External Links: Document, ISBN 9780769549880 Cited by: §IV-B, TABLE II.
  • [62] B. Zhou, D. He, Z. Sun, and W.H. Ng (2006) Network traffic modeling and prediction with ARIMA/GARCH. In Proceedings of the HET-NETs ’06 Conference, External Links: ISBN 5037259746 Cited by: §IV-B, TABLE II.
  • [63] Q. Zhuo, Q. Li, H. Yan, and Y. Qi (2017-11) Long short-term memory neural network for network traffic prediction. In Proceedings of the 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), pp. 1–6. External Links: Document Cited by: §I.