I Introduction
In the early years of the Internet, network traffic was modeled with relatively simple statistical approaches. There were only a few commonly used services and protocols, and they were actively used by a very limited number of users. In contrast, it is much harder to predict traffic patterns and characteristics in today’s communication systems. Even though many different techniques are employed for analysis and prediction, several concerns must be addressed for accurate traffic engineering. Time-series models are quite popular for extracting the temporal patterns of network traffic and making predictions based on those patterns [25].
There are different approaches to forecasting, such as exponential smoothing [26, 46], wavelets [49, 60], and hybrid methods combining multiple approaches [38, 50]. Besides, neural networks (NNs) and autoregressive models are two of the most frequently used groups of techniques for network traffic prediction in practice. They are considered fundamental elements of the forecasting toolbox. Today, network traffic forecasting with NNs is quite popular as an alternative to traditional stochastic modeling [56, 19, 63]. NNs detect patterns and structures in input data, learn through many iterations, and use that experience to evaluate new data, similar to the learning process of human beings. NNs are more successful at capturing complex relationships in data thanks to their nonlinear nature, but they require much more training data than autoregressive models. Even though NNs seem to be the primary alternative to them, autoregressive methods are the most widely studied, especially for the prediction of network traffic as opposed to other domains.
To the best of our knowledge, there are only a few surveys in the literature on network traffic forecasting [27, 3], and the existing ones do not touch on significant network flow aspects. Besides, there is no systematic study that builds a grounding for traffic forecasting research by offering an analysis framework. In this paper, we present a self-consistent study that analyzes the requirements, characteristics, and examples of temporal autoregressive models for forecasting, since they are the most widely employed and practically used models for network traffic prediction. Rather than examining the statistical foundation of the models, we review all aspects of forecasting from a higher-level networking perspective. Fig. 1 shows a mind map that summarizes all important headlines of the study. Accordingly, our contributions are listed as:

We review the relevant dynamics of autoregressive modeling techniques which are common in various studies (Section II).

We discuss different characteristics of time-series data from a networking perspective for a better comprehension of the forecasting studies, rather than touching on analytical details (Section III). Moreover, we use those characteristics as a framework to analyze forecasting studies.

We point out common issues and challenges, as well as possible research directions in general (Section V).
II Brief Introduction to Autoregressive Modeling
Autoregressive (AR) models are stochastic models that feed past values of a time sequence into a regression function to predict future values of the related time series. Autoregressive Moving Average (ARMA) [7], Autoregressive Integrated Moving Average (ARIMA) [4], Fractional ARIMA (FARIMA) [53], Seasonal ARIMA (SARIMA), Autoregressive Conditional Heteroskedasticity (ARCH) [10], Generalised ARCH (GARCH) [15], Exponential GARCH (EGARCH), Autoregressive Conditional Duration (ACD) [16], Stochastic Autoregressive Mean (SAM), and Nonlinear Auto Regressive with Exogenous (NARX) models fall into this category. There are indeed statistical differences among them; for instance, while ARIMA models focus on the conditional mean of a temporal series, ARCH methods take the conditional variance into consideration for modeling. In this study, we focus on the techniques under the autoregression scope considering their practicality, relatively shorter modeling duration, lower data requirement, and lower complexity. The dynamics of those models are shown in simplified form in Fig. 2 for a better understanding. Note that it is a very brief illustration and omits the iterative processes that are required to optimize the model.
AR models consist of three phases: (i) statistical modeling with respect to some criteria, (ii) parameter estimation, and (iii) forecasting. (i) The first phase detects correlation in the time series using autocorrelation functions (ACFs) and identifies the model based on widely used criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) [13]. (ii) Secondly, the coefficients of the identified model are estimated using well-known estimation methods such as Maximum Likelihood Estimation (MLE) and Least Mean Square Error (LMSE). (iii) After parameter estimation, future points of the time series are predicted and the accuracy is reported with respect to different metrics.
Lastly, there are a number of metrics to evaluate the performance of time-series modeling techniques. All of the studies presented in Section IV-B employ one or more metrics to analyze the accuracy of their proposed techniques and to compare them with other techniques. Generally, after the model is constructed, such metrics are used to compare fitted values (i.e., the values predicted by the model) with actual values. Table I briefly lists those metrics, which are associated with the studies presented in the following sections.
Abbreviation  Metric  Unitless  Scale-free 

APE  Absolute percentage error  ✓  ✓ 
MAE  Mean absolute error  
MARE  Mean absolute relative error  ✓  ✓ 
MAPE  Mean absolute percentage error  ✓  ✓ 
MPE  Mean percentage error  ✓  ✓ 
MSE  Mean square error  
NMSE  Normalized mean square error  ✓  
NRMSE  Normalized root mean square error  ✓  
RMSE  Root mean square error  
SER  Signal-to-error ratio  ✓  ✓ 
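As a rough illustration of how several of the metrics in Table I might be computed, the following sketch implements MAE, MSE, RMSE, MAPE, and NMSE in plain Python. The function names are ours, and normalizing the NMSE by the variance of the actual values is an assumption, since the exact definition varies across the reviewed studies.

```python
import math


def mae(actual, predicted):
    """Mean absolute error (keeps the unit of the data)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error (squared unit of the data)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if an actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def nmse(actual, predicted):
    """MSE divided by the variance of the actual values (assumed normalization)."""
    mean_a = sum(actual) / len(actual)
    var_a = sum((a - mean_a) ** 2 for a in actual) / len(actual)
    return mse(actual, predicted) / var_a


if __name__ == "__main__":
    actual = [100.0, 200.0, 300.0, 400.0]      # e.g., observed traffic volume
    predicted = [110.0, 190.0, 330.0, 380.0]   # e.g., fitted values of a model
    for metric in (mae, mse, rmse, mape, nmse):
        print(metric.__name__, metric(actual, predicted))
```

MAE, MSE, and RMSE depend on the unit and scale of the measured traffic, whereas the percentage-based and normalized variants allow a comparison across traces of different volume.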
Since dozens of studies analyze the statistical aspects of AR models in depth in various domains, we do not take this approach here and limit the related discussion to this section. The terms touched on here may be considered a guide for a better comprehension of the rest of this study, especially Section IV.
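The three phases described in this section can be sketched for a pure AR(p) model as follows. This is a minimal sketch under stated assumptions, not the procedure of any reviewed study: the Yule-Walker equations (solved with the Levinson-Durbin recursion) stand in for MLE or LMSE estimation, the order is chosen by a Gaussian approximation of the AIC, and all function names are illustrative.

```python
import math
import random


def autocovariance(x, max_lag):
    """Phase (i): biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Phase (ii): solve the Yule-Walker equations for orders 0..max_order.

    Returns a list of (coefficients, innovation_variance), one per order.
    """
    models = [([], r[0])]
    phi, var = [], r[0]
    for p in range(1, max_order + 1):
        k = (r[p] - sum(phi[j] * r[p - 1 - j] for j in range(p - 1))) / var
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        var *= 1.0 - k * k
        models.append((phi, var))
    return models


def fit_ar(x, max_order=8):
    """Pick the order minimizing AIC = n * ln(variance) + 2 * p."""
    n = len(x)
    models = levinson_durbin(autocovariance(x, max_order), max_order)
    best = min(range(max_order + 1),
               key=lambda p: n * math.log(models[p][1]) + 2 * p)
    return best, models[best][0], models[best][1]


def forecast(x, phi, steps):
    """Phase (iii): iterate the fitted recursion to predict future points."""
    mean = sum(x) / len(x)
    hist = [v - mean for v in x]
    out = []
    for _ in range(steps):
        nxt = sum(phi[j] * hist[-1 - j] for j in range(len(phi)))
        hist.append(nxt)
        out.append(nxt + mean)
    return out


def simulate_ar(phi, n, seed=0):
    """Synthetic AR series with unit-variance Gaussian noise (for the demo only)."""
    rng = random.Random(seed)
    x = [0.0] * len(phi)
    for _ in range(n):
        x.append(sum(phi[j] * x[-1 - j] for j in range(len(phi))) + rng.gauss(0.0, 1.0))
    return x[len(phi):]


if __name__ == "__main__":
    series = simulate_ar([0.6, -0.3], 5000, seed=7)
    order, coeffs, noise_var = fit_ar(series, max_order=6)
    print(order, coeffs, noise_var, forecast(series, coeffs, 3))
```

The iterative refinement omitted from Fig. 2 is also omitted here; in practice the residuals would be checked and the model re-identified if they are still correlated.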
III Traffic Characteristics
Network traffic may exhibit various characteristics that are vital to detect for accurate forecasting. Those characteristics are observed not only in network traffic but in any field that analyzes temporal data, such as economics, physics, and psychology. In this section, we introduce six characteristics from a network-telemetric perspective. Those characteristics also form our comparison framework to analyze and compare network traffic forecasting studies in Section IV-B.
Self-similarity. Introduced 40 years ago [33], the self-similarity of network traffic is discussed in a number of studies [17, 39]. In a very practical sense, self-similar objects appear in the same shape when magnified or shrunk. From the network traffic perspective, it means that a proportional segment of a measurement (e.g., the number of packets or the amount of data during a certain period of time with a predefined granularity) tends to be observed at a different time scale. In this sense, self-similarity is also related to long-range dependency in practice.
Seasonality. It is quite common to observe nearly the same patterns with a certain frequency in any domain of temporal measurement. Typical weather conditions in each “season” of a year are great examples. Similarly, in network traffic, weekends and weekdays, holidays, or certain hours of a day show very similar patterns periodically. Understanding the seasonality in data flows is crucial to analyze the nature of traffic and to forecast the future traffic load, which possibly reflects congruent patterns.
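As an illustration of how a seasonal cycle might be detected, the sketch below computes the sample autocorrelation function of a series and reports the lag with the strongest correlation inside a search window. The function names and the synthetic hourly trace with a 24-hour cycle are assumptions made for the example only.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def dominant_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest autocorrelation."""
    r = acf(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical hourly load over 30 days: daily cycle plus small noise.
    hourly = [10.0 + 5.0 * math.sin(2.0 * math.pi * t / 24.0) + rng.gauss(0.0, 0.5)
              for t in range(24 * 30)]
    print(dominant_period(hourly, 12, 36))
```

The search window matters: very short lags are always strongly correlated in smooth traffic, so the window should start well above lag 1 and cover the cycle lengths that are plausible for the trace.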
Non-stationarity. In a stationary process, the mean, variance, and correlation structure stay constant over time, which is one of the most common assumptions in time-series modeling and stochastic processes. However, network traffic may show changing statistical characteristics, leading to a change in modeling as well [29, 8, 37]. Therefore, before forecasting, a time-series model should be capable of detecting such changes, which are observed frequently in practice depending on various factors such as the number of users, the number of connections, and the bandwidth utilization of the related network elements.
Multifractal. In aggregated network traffic (i.e., traffic consisting of multiple flows originated by multiple sources), it is possible to observe the self-similar characteristics of individual network flows. Such traffic is described not only as self-similar or fractal but as multifractal [43]. Detecting the multifractal behavior and the fractal patterns of multiple flows simultaneously is naturally important, yet challenging, for forecasting.
Long-range dependency (LRD). Various time-dependent systems and physical phenomena show correlated behavior over large time scales. For instance, Hurst confirmed such behavior in the Nile River’s recurring rain and drought conditions observed over a long period of time [24], and the Hurst parameter has become the fundamental technique for detecting LRD. For network traffic, especially on the Internet, long-range dependency has been observed in several significant studies [21, 30]. Together with the self-similar properties, LRD shifts the traffic modeling perspective from memoryless stochastic processes (e.g., Poisson) to long-memory time series.
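A rescaled-range (R/S) estimate of the Hurst parameter can be sketched as follows. This is a simplified illustration with hypothetical function names: the series is cut into non-overlapping blocks of doubling size, the average R/S value is computed per block size, and the Hurst parameter is taken as the slope of log(R/S) against log(block size). The plain R/S estimator is known to be biased upward on short blocks, so values moderately above 0.5 should be expected even for uncorrelated data.

```python
import math
import random
from itertools import accumulate


def rescaled_range(block):
    """Range of the cumulative deviations divided by the standard deviation."""
    n = len(block)
    mean = sum(block) / n
    cum = lo = hi = ss = 0.0
    for v in block:
        cum += v - mean
        lo, hi = min(lo, cum), max(hi, cum)
        ss += (v - mean) ** 2
    std = math.sqrt(ss / n)
    return (hi - lo) / std if std > 0 else None


def _slope(points):
    """Least-squares slope through (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    num = sum((px - mx) * (py - my) for px, py in points)
    den = sum((px - mx) ** 2 for px, _ in points)
    return num / den


def hurst_rs(x, min_block=8):
    """Hurst estimate: slope of log(mean R/S) versus log(block size)."""
    points = []
    m = min_block
    while m <= len(x) // 2:
        values = [rescaled_range(x[i:i + m]) for i in range(0, len(x) - m + 1, m)]
        values = [v for v in values if v is not None]
        if values:
            points.append((math.log(m), math.log(sum(values) / len(values))))
        m *= 2
    return _slope(points)


if __name__ == "__main__":
    rng = random.Random(3)
    noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]   # memoryless reference
    walk = list(accumulate(noise))                        # strongly persistent
    print(hurst_rs(noise), hurst_rs(walk))
```

Values close to 0.5 suggest a memoryless series, while values approaching 1 suggest strong persistence, which is the regime associated with LRD in the text above.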
Short-range dependency (SRD). In comparison to LRD, the correlation in short-range dependent processes is observed over shorter time scales. That is, the dependence among the observations dissolves quickly, which corresponds to quickly decaying correlations. Many traditional time-series modeling techniques examine SRD; however, it is not enough to reflect today’s network traffic patterns. SRD can be considered when forecasting short-term traffic with relatively low-complexity models.
Note that those characteristics are mutually complementary. For instance, LRD and self-similarity require similar detection methods, but they are not the same: the former is strongly related to time dependency, while the latter covers the variance in scale as well. However, the common case is dependence on scaling patterns over long time intervals. It is also possible to analyze such characteristics through multifractal traffic analysis. Similarly, seasonality actually implies non-stationarity, but while some studies consider seasonality directly, others look for non-stationarity from a wider perspective. Therefore, we present those elements separately, as they are individually addressed in various works in the literature.
IV Forecasting in the Literature
After a high-level introduction to autoregressive models and the definition of traffic characteristics, we relate that background to the forecasting methods in the literature by discussing a number of significant aspects. In this section, we first present further key aspects that fundamentally reshape network traffic analysis, pointing to significant studies. Then, we review various studies that exemplify what has been covered so far.
IV-A Analysis on Different Aspects of Traffic
Before presenting the temporal techniques for traffic modeling, we briefly discuss different aspects of network traffic. We selected those aspects since they are considered milestones in the forecasting domain: they changed the research perspective and led either to more accurate forecasting or to a better comprehension of the underlying reasons for the limited success in flow prediction. Throughout this section, we evaluate important studies on the analysis of such aspects for a better understanding of, and further thinking on, forecasting.
On self-similarity. Self-similarity was a complex issue that required novel stochastic techniques to be considered in forecasting. In [31], the nature of self-similarity in Ethernet traffic is discussed both practically and statistically. The authors analyze self-similarity relying on rescaled range analysis (through the Hurst parameter), variances in aggregated processes, and periodogram-based analysis in the frequency domain. It is revealed that the stochastic processes used to model Ethernet traffic are not capable of reflecting its self-similar nature, which is demonstrated with a statistical analysis of Ethernet traffic captured over four years. Moreover, the authors suggest that the Hurst parameter is much more effective for detecting burstiness in traffic than known parameters such as the index of dispersion, the peak-to-mean ratio, and the coefficient of variation. Lastly, they list two ways to generate network traffic satisfying the self-similarity property: fractional Gaussian noise (FGN) and chaotic maps. Despite its age (more than 25 years), the study is quite revealing for a deep understanding of the underlying statistical properties of real Ethernet traffic and of the analysis of self-similarity.
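The analysis based on variances in aggregated processes can be illustrated with the following sketch of the variance-time method. The series is averaged over non-overlapping blocks of size m, and the variance of the block means is regressed against m on a log-log scale; for a self-similar process the variance decays like m^(-beta), and the Hurst parameter is estimated as H = 1 - beta/2. The function names and the chosen aggregation levels are assumptions for the example.

```python
import math
import random
from itertools import accumulate
from statistics import pvariance


def aggregate(x, m):
    """Means of consecutive non-overlapping blocks of size m."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]


def _slope(points):
    """Least-squares slope through (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    num = sum((px - mx) * (py - my) for px, py in points)
    den = sum((px - mx) ** 2 for px, _ in points)
    return num / den


def hurst_variance_time(x, scales=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the decay of the variance of the aggregated series."""
    points = [(math.log(m), math.log(pvariance(aggregate(x, m)))) for m in scales]
    beta = -_slope(points)   # variance of block means ~ m ** (-beta)
    return 1.0 - beta / 2.0


if __name__ == "__main__":
    rng = random.Random(11)
    noise = [rng.gauss(0.0, 1.0) for _ in range(16384)]
    print(hurst_variance_time(noise))                     # uncorrelated reference
    print(hurst_variance_time(list(accumulate(noise))))   # strongly persistent
```

For uncorrelated data the variance of the block means falls proportionally to 1/m, which gives an estimate near 0.5; a slower decay is the signature of self-similar traffic.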
On forecasting range. It is important to estimate the limits of success in forecasting and to define requirements precisely for accurate resource allocation for the potential traffic. In [41], the authors focus on three fundamental issues of traffic forecasting: (i) how far into the future traffic can be predicted, (ii) how many resources are required to minimize uncertainty, and (iii) which characteristics of traffic are the most effective. For (i), they define the maximum prediction interval (MPI) as the interval over which the prediction error stays within a limit (e.g., 20%) with a certain probability (e.g., 99.9%). Besides, the tradeoff between maximizing the MPI and minimizing the prediction error is well discussed. (ii) In most of the studies referenced here, first- and second-order statistics of the traffic are considered, and a certain amount of resources is needed for accurate statistical modeling. (iii) According to the study, the prediction efficiency may not be directly related to the selected method (ARMA and MMPP here) but to the nature of the traffic. For instance, while Ethernet traffic is not predictable within a certain error for a reasonable MPI, the study shows much better performance on Internet traffic due to its higher multiplexing factor. That is, it is not efficient to take sample traces (e.g., sessions) for prediction if the traffic is fed from many different (i.e., statistically independent) sources, exactly as on the Internet. Similarly, the efficiency of using certain characteristics needs to be discussed per scenario. Accordingly, the study concludes that LRD may not matter in traffic management for delay-sensitive services.
On heavy-tail distribution. The heavy-tail distribution in network traffic is shown to be strongly related to the transfer size and interarrival times, together with the self-similar nature of (Internet) traffic [12]. A heavy-tail distribution is defined as one having a heavier tail than the exponential distribution [6]. It easily misleads traffic forecasting methods that rely on basic statistics. In [18], the main focus is on the detection and characterization of heavy-tail distributions. The existing estimators do not accurately reflect heavy-tail characteristics since they rely on idealized assumptions such as stationarity and independence. Therefore, the authors first present quantile-quantile and complementary cumulative distribution function (CCDF) plots for the detection of heavy tails, indicating the drawbacks of those methods. Then, the performance of four different estimators for the characterization (i.e., detecting the tail exponent) and their sensitivity to noise are analyzed. To compensate for their weaknesses, the authors propose a new wavelet-based method to filter long-range dependent data and increase the efficiency of previously used estimators. In the end, the homogeneity of long-range variance (i.e., a time-varying LRD exponent) is discussed. Even though the study does not directly present a forecasting method, it is worth pointing out for understanding the effects of heavy-tail distributions and LRD on the analysis of time-series data.
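To make the detection and characterization steps more concrete, the sketch below computes an empirical CCDF and a tail-exponent estimate. The Hill estimator is used here only as a well-known example of a tail-exponent estimator; we do not claim that it is one of the four estimators analyzed in [18]. The function names, the sample sizes, and the choice of k (the number of largest observations used) are assumptions.

```python
import math
import random


def empirical_ccdf(sample):
    """Pairs (x, P[X > x]) over the sorted sample, ready for a log-log plot."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, 1.0 - (i + 1) / n) for i, x in enumerate(xs)]


def hill_tail_index(sample, k):
    """Hill estimate of the tail exponent from the k largest observations."""
    xs = sorted(sample, reverse=True)
    threshold = xs[k]   # the (k+1)-th largest value
    mean_log = sum(math.log(xs[i] / threshold) for i in range(k)) / k
    return 1.0 / mean_log


if __name__ == "__main__":
    rng = random.Random(5)
    # Hypothetical transfer sizes: Pareto (heavy tail) versus exponential (light tail).
    heavy = [rng.paretovariate(1.5) for _ in range(20000)]
    light = [rng.expovariate(1.0) for _ in range(20000)]
    print(hill_tail_index(heavy, 2000), hill_tail_index(light, 2000))
```

On a log-log CCDF plot a heavy tail appears as an approximately straight line whose slope corresponds to the tail exponent, whereas the exponential reference bends downward; the estimate is sensitive to the choice of k, which mirrors the sensitivity concerns raised in the paragraph above.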
On fractality. Self-similar traffic patterns are widely detected using the Hurst parameter; however, it is rather complicated to analyze multifractal flows following the same procedure. Complex (and high-speed, as stated in the study) networks generally comprise multiple self-similar and independent flows that can be modeled as different stochastic processes, and a cumulative analysis of such fractal flows (i.e., treating them as a single flow) shows multifractal behavior. In [36], the authors relate the multifractal patterns of the flow to the locality phenomenon of the Hurst parameter in self-similar networks. It is also indicated that varying the Hurst parameter would be the key method to generate traffic flows with multiple self-similar characteristics.
On complexity of stochastic modeling. One may ask whether various types of network traffic can be modeled using simpler approaches (e.g., a Poisson distribution of packet interarrival times) without inspecting for correlation at any scale. The answer depends directly on the nature of the traffic. For instance, according to [40], modeling TCP traffic with Poisson (or other similar models) cannot capture LRD and burstiness, and eventually degrades forecasting performance in terms of average packet delay or maximum queue size. That is, a deeper analysis and more sophisticated modeling are required to represent such traffic (e.g., TCP in wide-area networking). On the other hand, highly aggregated Internet traffic (for instance, a traffic sample captured from the backbone) may nearly follow a Poisson distribution [45], which indicates that burstiness is weak in backbone traffic.
IV-B Autoregressive Modeling Techniques
There are a number of studies that use time-series statistical methods by modifying them according to the nature of the network traffic they are applied to. In this section, we present how they modify or enhance the techniques in Section II to satisfy various requirements. Table II shows a comparative analysis taking different network characteristics into consideration. One can argue that the basic dynamics of some methods address several characteristics by default. For instance, differencing in ARIMA is a solution to eliminate non-stationarity, and ARMA automatically detects short-range patterns. However, a characteristic is marked for a study in Table II only if the study directly addresses a problem related to that particular characteristic and shows its effectiveness using the required measurement techniques.
The studies here take similar statistical approaches but focus on different characteristics. It is possible to divide them into three groups in terms of the characteristics they handle: repetitiveness, volatility, and dependency. Repetitiveness represents cyclic and usual patterns and is directly related to self-similarity and seasonality. Volatility, on the other hand, covers the varying properties of network traffic, such as non-stationarity and multifractality. Lastly, dependency represents the time-dependent characteristics, which are long- and short-range dependency. This classification can be considered a meta-framework corresponding to the characteristics presented in Section III. Fig. 3 groups the studies in this section according to this meta-framework.
As seen in Fig. 3, many studies address multiple characteristics. The figure also helps to explain the cross-relationships between the categories. For instance, repetitiveness and dependency may be very closely related since they both lead to temporal correlations. However, while dependency mainly focuses on timing issues, repetitiveness is also related to scaling in magnitude with an observable pattern. Volatility may require a different analysis based on decomposition, but it still intersects with the others since it targets instability in both time and scale. Addressing the problems in all three categories is possible, though. Such studies are generally hybrid methods and are expected to have higher complexity. In the rest of this section, the studies falling into those categories are discussed. Note that since half of them belong to multiple groups, they are not gathered under individual headings but are sorted in chronological order.
Study  Technique(s)  Parameter Estimation  Evaluation  Domain  Self-similarity  Seasonal  Non-stationarity  Multifractal  LRD  SRD 
Mao [34]  ARIMA, Wavelet  MLE, LS  MARE, NMSE  LAN  ✓  
Zhou et al. [62]  ARIMA, GARCH  ACF, MLE  SER  WAN  ✓  ✓  ✓  
Vujicic et al. [48]  SARIMA    NMSE  Public Safety  ✓  
Jun et al. [28]  ARMA, MLSL  LSL  MSE  Internet  ✓  ✓  ✓  
El Hag and Sharif [14]  AARIMA    MAE  WAN  ✓  ✓  
Anand et al. [2]  GARCH  MLE  NMSE(like)  Internet  ✓  
Chen et al. [9]  SARIMA  ACF, LS  MAPE  WLAN  ✓  ✓  
Yu et al. [59]  ARIMA  MLE  MAPE  Cellular Network  ✓  ✓  
Yu et al. [58]  APM  MLE  MAPE  Cellular Network  ✓  
Zhang and Huang [61]  ACD, Particle Filter  BHHH  RMSE  Data Center  ✓  
Hu et al. [23]  X-12-ARIMA, STL    APE  SNMP  ✓  
Yu et al. [57]  FARIMA  MLE, MMSE  APE, MAPE  Cellular Network  ✓  ✓  ✓  
Yimu et al. [54]  GARCH, LMD  MLE  RMSE  P2P Multimedia  ✓  ✓  
Yoo and Sim [55]  ARIMA, STL  MLE  RMSE, MAE  SNMP  ✓  ✓  
Markovic et al. [35]  SARIMA  Manual  RMSE, MAE  Multimedia  ✓  
Xu et al. [52]  NARX-RF  L-BFGS  APE  Cellular Network  ✓ 
It is quite likely that different network mechanisms cause traffic variations at different timescales. Therefore, it is generally hard to model network traffic statistically all at once. In [34], the authors decompose the traffic into different timescales using the à trous Haar wavelet transform. Then, they apply a different ARIMA model to each wavelet component (i.e., the traffic at each timescale) for one-step forecasting at a granularity of seconds. By combining the predictions of the components, they obtain a traffic forecast that covers the varying characteristics of the traffic over time. The results are compared with an NN-based method using the very same training data, and the proposed method is shown to perform better in terms of NMSE and MARE.
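The decompose, predict, and recombine idea can be sketched as follows. This is a minimal illustration and not the implementation of [34]: the boundary handling (clamping the index at the start of the series) is our assumption, and the per-scale ARIMA models are replaced by a hypothetical `predictor` callable so that the example stays self-contained.

```python
def atrous_haar(x, levels):
    """Redundant (a trous) Haar decomposition.

    Returns (details, smooth) such that, for every t,
    x[t] == smooth[t] + sum(d[t] for d in details).
    """
    smooth = list(x)
    details = []
    for j in range(levels):
        step = 2 ** j
        # Causal Haar smoothing: average of the current value and the value
        # 2**j samples earlier (index clamped at 0, an assumed boundary rule).
        coarser = [0.5 * (smooth[t] + smooth[max(t - step, 0)])
                   for t in range(len(smooth))]
        details.append([s - c for s, c in zip(smooth, coarser)])
        smooth = coarser
    return details, smooth


def forecast_by_scale(x, levels, predictor):
    """One-step forecast: predict each component separately and sum the results."""
    details, smooth = atrous_haar(x, levels)
    return sum(predictor(component) for component in details + [smooth])


if __name__ == "__main__":
    traffic = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
    # Naive stand-in for a per-scale model: repeat the last value.
    print(forecast_by_scale(traffic, 3, lambda series: series[-1]))
```

Because the transform is additive and not decimated, every component has the same length as the input, which is what allows a separate model per timescale and a simple sum of their predictions.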
ARIMA/GARCH [62] offers a combined technique targeting various network characteristics such as SRD, LRD, self-similarity, and multifractality. Using the ability of ARIMA to model linear traffic and of GARCH to model changing variance, the authors design a one-step predictor that shows potential to be extended to multi-k (or multi-step) predictions. Similar to [2], ARIMA/GARCH has a parameter estimation phase that tunes the parameters of both ARIMA and GARCH using MLE based on the Box-Cox [5] method. The technique performs better than FARIMA in terms of signal-to-error ratio (SER) at various timescales and also in experimental multi-k predictions. However, the authors do not present concrete accuracy results other than SER. Therefore, the correlation between the predictions and the actual traffic data is not directly observable.
In [48], instead of analyzing aggregated traffic, the authors profile user behavior based on hourly call-rate using the K-means clustering method. They divide users (call-groups in the context of the study) into three groups with low, medium, and high call-rates. Then, each group is modeled separately using SARIMA models for daily and weekly cyclic patterns, and the results are added up to form a complete prediction reflecting the behavior of all users. The main motivation is to take advantage of well-defined group characteristics for more accurate predictions rather than using the whole data, which is relatively harder to model due to its complex and aggregated nature. However, it is not feasible to work at maximum granularity (i.e., per-user prediction). Therefore, the authors aim to provide (a degree of) scalability by grouping the users while increasing the prediction accuracy in comparison to modeling the aggregated traffic. The results show that 57% of the group-based predictions give better results than the predictions on aggregated traffic in terms of NMSE. Besides, group-based forecasting paves the way for profiling individual users as long as they can be identified under a forecasting group.
[28] proposes a modified least-squares lattice (MLSL) method [1] to calculate the related autoregression (AR) parameters dynamically. MLSL relies on adaptive filter theory. Instead of evaluating the model parameters once using a set of training data, it dynamically updates the AR model parameters per input (i.e., per packet). The authors modified LSL to reduce the computation cost and increase the convergence speed. In comparison to the least-squares (LS) method and ARMA, MLSL shows higher accuracy and faster convergence in experiments on synthetic data with short-term dependence characteristics. For data generation, the authors use the inverse Fourier transform to generate fractional Gaussian noise for a given Hurst parameter. Therefore, the data show self-similarity as well as SRD.
[14] proposes the Adjusted ARIMA model (AARIMA) for modeling Internet traffic data at millisecond time scales. The authors specifically address self-similarity and LRD in Internet traffic whose samples are captured from the Bellcore Internet Wide Area Network. They show that even if the residuals of ARIMA models follow a white-noise distribution, the models may not offer sufficient goodness-of-fit statistics. Therefore, they offer AARIMA as a quick and simple modeling method that modifies ARIMA by adding the first difference of the stationary series as a regressor. Especially for modeling Internet traffic, the results show that AARIMA gives a lower MAE across different datasets, although it finishes in a higher number of iterations. Note that, in terms of the modeling phases presented in Section II, evaluating AARIMA is exactly the same as evaluating ARIMA, as specifically claimed by the authors, and it satisfies all requirements for reliable residuals (e.g., white-noise distribution, Box-Jenkins tests).
In [2], a generalized ARCH model (GARCH) is introduced to develop a one-step predictor for nonlinear traffic (e.g., Internet traffic). The authors point out that constant-variance models such as ARIMA and its successors (e.g., FARIMA and SARIMA) cannot fit the bursty (and nonlinear) nature of Internet traffic, whereas GARCH takes the conditional variance into account to react to changing traffic patterns. To determine the GARCH parameters, MLE is applied to the training data. The results show that the forecast error of GARCH is significantly lower than that of the ARIMA-ARCH model for one-step prediction (i.e., in comparison to ARIMA(1,1,1)-ARCH(1)). However, its performance remains to be validated on traffic that is less aggregated than Internet traffic.
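The role of the conditional variance can be illustrated with the GARCH(1,1) recursion sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1], where e denotes the residuals of the mean model. The sketch below only evaluates this recursion for given parameters; in [2] the parameters are estimated with MLE, which is omitted here. The parameter values in the demo and the choice of the sample variance as the starting value are assumptions.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model for given parameters.

    Returns a list of len(residuals) + 1 values; the last entry is the
    one-step-ahead variance forecast.
    """
    # Assumed initialization: sample variance of the residuals.
    start = sum(e * e for e in residuals) / len(residuals)
    sigma2 = [start]
    for e in residuals:
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return sigma2


if __name__ == "__main__":
    # Hypothetical residuals: a calm period followed by a burst.
    residuals = [0.1, -0.2, 0.1, 3.0, -2.5, 0.2]
    for value in garch11_variance(residuals, omega=0.1, alpha=0.2, beta=0.7):
        print(round(value, 4))
```

After a burst the conditional variance rises and then decays geometrically at a rate governed by beta, which is how the model reacts to changing traffic patterns while a constant-variance model would not.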
Despite its simplicity, [59] offers a leaner ARIMA differencing process by converting multiple stationarization operations (for both trend and seasonal patterns) into a single multiplicative process. The authors specifically aim to eliminate seasonal patterns (i.e., by analyzing 12-month (or lag-12) autocorrelation results) and also to extract 6-month patterns from a large network traffic data set based on a set of Chinese regions. They present single- and multi-step prediction MAPEs for each month; however, the results are not compared with any other prediction method applied to the same data. A similar approach, namely the Multiple Seasonal ARIMA model, is presented in [9] to obtain a model covering two different seasonal features in wireless network traffic using 5-minute sampling.
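A combined differencing step of this kind can be sketched with the multiplicative operator (1 - B)(1 - B^s), where B is the backshift operator and s is the seasonal period. This is a generic illustration rather than the exact process of [59]; the function names and the synthetic monthly series are assumptions. Expanding the product gives y[t] = x[t] - x[t-1] - x[t-s] + x[t-s-1], so trend and seasonal differencing are applied in one pass.

```python
def difference(x, lag=1):
    """Plain differencing at the given lag."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def multiplicative_difference(x, season):
    """Apply (1 - B)(1 - B**season) in a single pass."""
    return [x[t] - x[t - 1] - x[t - season] + x[t - season - 1]
            for t in range(season + 1, len(x))]


if __name__ == "__main__":
    # Hypothetical monthly traffic: linear growth plus a fixed yearly pattern.
    pattern = [5.0, 3.0, 4.0, 6.0, 8.0, 9.0, 12.0, 11.0, 7.0, 6.0, 4.0, 10.0]
    monthly = [2.0 * t + pattern[t % 12] for t in range(48)]
    print(multiplicative_difference(monthly, 12)[:5])
```

The single-pass result is identical to seasonal differencing followed by first differencing, and for a series made only of a linear trend and a fixed seasonal pattern it leaves nothing but zeros.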
Considering the relatively complex and time-consuming process of ARIMA, the accumulation predicting model (APM) is proposed in [58]. APM especially addresses traffic patterns with a stable seasonal characteristic. To detect stable seasonality, the ratio of partial accumulation to total accumulation (i.e., the cumulative traffic load over a certain number of months relative to the whole year) is evaluated, and constant (e.g., linear in the study) changes are interpreted as a reflection of that characteristic. Then, this interpretation is used to predict the monthly traffic for the next year. In comparison to ARIMA, APM results in a lower MAPE when predicting the total monthly traffic; however, ARIMA is more successful at estimating the average daily traffic (with a relatively higher APE deviation).
It is usually quite hard to model and forecast network traffic at minute granularity. In [61], the authors propose an autoregressive model for short-term prediction, in minutes. Autoregressive conditional duration (ACD) [47] is employed to model the distribution of the interarrival times of continuous traffic for a certain duration. The related parameters of ACD are estimated using the Berndt-Hall-Hall-Hausman (BHHH) algorithm. The traffic, as claimed by the authors, is non-stationary and has a non-Gaussian distribution. Then, the particle filtering method is applied, considering the extracted distribution model, to predict the packet traffic in the upcoming minute. The results show that the proposed method is able to predict one-minute traffic (in MBs) with less error than ARMA in terms of RMSE.
Detecting seasonality is one of the crucial tasks in the decomposition of traffic. In [23], the adjustment of seasonality is investigated in detail to understand the underlying patterns clearly and to make more accurate forecasts. For the adjustment, the authors first handle missing values and soften outliers, dividing them into four different types. Then, using seasonal-trend decomposition using Loess (STL) [11] and X-12-ARIMA [20], they decompose the data into three components: trend, season, and irregularities. An extra step, diagnostics, is taken to examine the stability of the adjusted data series with a persistent model for new data feeds. It is revealed that considering a daily seasonality leads to the most accurate decomposition and shows the trend with minimum irregularities after this procedure. The results show that one-step forecasting after removing the daily seasonality from Simple Network Management Protocol (SNMP) data gives the minimum forecasting error in comparison to six other benchmark methods, including Holt-Winters [22, 51], ARIMA, and linear regression. Another similar study working on SNMP data also addresses seasonality [55]. After seasonal adjustment using STL, the authors also analyze self-similarity. Lastly, they apply ARIMA to remove the residual autocorrelation in the adjusted data. The results show the success of the proposed study in forecasting network utilization under stationary and non-stationary assumptions, and with varying training set sizes with one-day seasonal cycles.
Even though short- and long-range dependency in various types of network traffic have been discussed, fractal characteristics may be harder to detect. FARIMA, for instance, is designed to handle fractal patterns considering the inadequacy of ARIMA. In [57], the authors discuss the point where FARIMA fails: multifractal characteristics. When the hourly traffic of sequential days is examined in mobile 3G downlink traffic, it is shown that self-similarity and multifractal patterns exist and that FARIMA would not be enough to model such patterns for prediction. Therefore, a combined technique of FARIMA and ARIMA is employed to eliminate (a) fractal, (b) long-range dependent, and (c) short-range dependent characteristics successively. Finally, an effective method examining the change in the Hurst parameter for the predicted data is used for forecasting. The results show that the combined method results in less than 8% APE and nearly 2% MAPE, which are considered reasonable error rates for daily forecasting.
One of the significant outcomes of traffic forecasting is increasing the quality of service (QoS) by predicting user demand. In [54], the authors analyze peer-to-peer (P2P) video sharing to satisfy QoS requirements in multimedia traffic. They first determine the fundamental problems of such traffic, which are claimed to be self-similarity and LRD in the study. To deal with those characteristics, the whole temporal data of the network traffic is divided into shorter-term time series, or Product Functions (PFs), using local mean decomposition (LMD) [42]. LMD helps to process the series as multiple short-range series independently of its long-term patterns. Then, the PFs are iteratively forecast using GARCH and the predicted values are summed up to obtain a final prediction. The results are compared with ARMA and WNN models in terms of RMSE, and it is shown that the proposed method offers more accurate predictions.
After high-definition videos changed multimedia trends, upcoming 4K/8K videos are expected to become popular in the same way. However, increasing video resolution naturally increases the network resources required to watch videos online. In [35], the authors analyze the nature of 4K video traffic for modeling and forecasting so that resources can be allocated in advance. They experiment with multiple SARIMA models with various modeling parameters to extract both seasonal and non-seasonal patterns. The best model is selected with respect to AIC and optimized by minimizing RMSE and MAE, without using a parameter estimation method. Further optimization and model fitting are examined using Ljung-Box tests [4] and Empirical Cumulative Distribution Function (ECDF) graphs. The results show that the frame-size variance of 4K videos changes frequently without a clear pattern, and therefore long-term predictions are not possible.
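The two diagnostics mentioned for [35] can be illustrated with a minimal sketch. It assumes Gaussian least-squares fits, for which AIC reduces to n·ln(RSS/n) + 2k up to an additive constant, and it computes the Ljung-Box statistic Q = n(n+2) Σ ρ_k²/(n−k) over the first lags of a residual series. The candidate models and their residual sums of squares are hypothetical values, not results from the surveyed study.

```python
import math


def aic_from_rss(n_obs, rss, n_params):
    """AIC of a Gaussian least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        rho_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += rho_k * rho_k / (n - k)
    return n * (n + 2) * q


# Hypothetical candidates: (label, number of parameters, residual sum of squares).
candidates = [("model A", 2, 130.0), ("model B", 4, 118.0), ("model C", 8, 117.0)]
n_obs = 200
best = min(candidates, key=lambda c: aic_from_rss(n_obs, c[2], c[1]))
print(best[0])
```

The candidate with the lowest AIC balances goodness of fit against the number of parameters. A large Q for its residuals, judged against a chi-square distribution, would suggest that autocorrelation remains and the model is not yet adequate.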
The authors in [52] address a very practical problem: the varying network use during holidays, especially in cellular networks. People are likely to communicate much more frequently on special days and holidays than on ordinary days, which directly affects the quality of service in such periods. In cellular networks, estimating the per-base-station changes in traffic patterns on those days is important for service providers to arrange the required resources. In [52], after the long-term and seasonal trends of the traffic are eliminated, the holiday traffic patterns are extracted for each base station and clustered using K-means clustering. Then, the data in each cluster are modeled using the random forest (RF) method to obtain the relationship between the input variables and the traffic data of similar patterns (in the same cluster). This model is also used for prediction, in combination with a nonlinear autoregressive exogenous model (NARX-RF) whose parameters are estimated using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm [32]. Therefore, the study takes advantage of time-series, unsupervised and supervised learning methods all together. It is compared with Facebook’s Prophet [44] and the results show that it has a significantly lower absolute percentage error (APE).
V Discussion and Conclusion
In this study, we propose a guideline and provide a broad discussion of traffic forecasting with autoregressive methods, focusing on network-related issues rather than statistical analysis. We pay particular attention to keeping the study consistent by touching on various traffic characteristics and by grounding the studies in the literature in a framework built on those characteristics. In this last section, we present a wrap-up discussion covering the most general forecasting issues that we extract from the studies presented here.
It is clear that offering a general-purpose model is not possible even when network characteristics are quite similar. More than 50% of the studies we reviewed mainly address seasonality in network traffic. Even if the most obvious seasonal patterns such as holidays and festivals are well captured, other cyclic patterns still need to be modeled more carefully. In such cases, the size of the training data and the optimization and estimation steps of the autoregressive algorithms presented in Section II become crucial. That is, a modeling problem actually consists of multiple optimization problems whose results reshape the intended model, and this leads to countless different models with varying parameters.
In certain cases, even a single well-identified model is insufficient to make accurate predictions, and the solution consequently converges to combined or hybrid models. This increases the complexity and the time required for training, and decreases the flexibility of the models as they rely on various dependent parameters. This issue also leads to the development or employment of different techniques for network traffic analysis, such as machine learning and neural networks. Indeed, emerging techniques bring their own issues to the table, and the choice becomes a tradeoff between complexity and accuracy. Moreover, when other performance metrics such as prediction range and confidence interval are included, the scope of the optimization problem grows beyond practical limits. For that reason, it is worth involving heuristics and field expertise to narrow analytical problems down to more practical ones. For instance, Facebook’s Prophet [44] offers a forecasting ecosystem rather than a prediction technique, called "analysts-in-the-loop", in which experts can be directly involved in the forecasting process instead of relying on fully automated prediction. Therefore, aside from sufficient statistical knowledge, it is also valuable to understand domain-specific network requirements in order to succeed.
Lastly, the common requirement of all forecasting methods is sufficient training data and proportional training time. Therefore, there is a strong need for practical, real-time techniques that can be dynamically trained and reshaped using incoming data. This may not be fully achievable given the complex nature of network traffic, but research is needed on heuristics that either ease the training process or increase forecasting performance alongside statistical modeling. Autoregressive models, in this sense, are relatively easier to comprehend and to evolve.
In summary, we aim to fill the gap between the statistical analysis of autoregressive forecasting methods and their relevance to networking by discussing significant aspects and requirements for accurate forecasting from a network-telemetric perspective. Even if we focus on autoregressive methods in the survey part, we believe that our discussion of network traffic forecasting applies in a much broader sense. For future work, we intend to expand the survey with more modern methods such as machine learning and neural networks.
References
[1] (1986) The Least Squares Lattice Algorithm. In Adaptive Signal Processing: Theory and Applications, pp. 142–153. ISBN 9781461249788.
[2] (2008) GARCH - Nonlinear time series model for traffic modeling and prediction. In Proceedings of the IEEE/IFIP Network Operations and Management Symposium: Pervasive Management for Ubiquitous Networks and Services, NOMS 2008, pp. 694–697. ISBN 9781424420667.
[3] (2012) Traffic Flow Forecast Survey.
[4] (2015) Time series analysis: forecasting and control. John Wiley & Sons.
[5] (1964) An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological) 26 (2), pp. 211–243.
[6] (1974) Heavy-Tailed Distributions: Properties and Tests. Technometrics 16 (1), pp. 61–68. https://amstat.tandfonline.com/doi/pdf/10.1080/00401706.1974.10489150
[7] (1983-01) ARMA Time Series Modeling: an Effective Method. IEEE Transactions on Aerospace and Electronic Systems AES-19 (1), pp. 49–58. ISSN 0018-9251.
[8] (2001) On the nonstationarity of internet traffic. In Proceedings of the 2001 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’01, Cambridge, Massachusetts, USA, pp. 102–112. ACM, New York, NY, USA. ISBN 1581133340. http://doi.acm.org/10.1145/378420.378440
[9] (2009) Forecasting 802.11 traffic using seasonal ARIMA model. In Proceedings of the 2009 International Forum on Computer Science-Technology and Applications IFCSTA, Vol. 2, pp. 347–350. ISBN 9780769539300.
[10] (2006-10) A Study of Autoregressive Conditional Heteroscedasticity Model in Load Forecasting. In Proceedings of the 2006 International Conference on Power System Technology, pp. 1–8.
[11] (1990-01) STL: A seasonal-trend decomposition procedure based on Loess. Journal of Official Statistics 6.
[12] (1997-12) Self-similarity in World Wide Web Traffic: Evidence and Possible Causes. IEEE/ACM Transactions on Networking 5 (6), pp. 835–846. ISSN 1063-6692.
[13] (2018-06) Bridging AIC and BIC: A New Criterion for Autoregression. IEEE Transactions on Information Theory 64 (6), pp. 4024–4043. ISSN 0018-9448.
[14] (2007) An adjusted ARIMA model for internet traffic. ISBN 142440987X.
[15] (1995) Multivariate Simultaneous Generalized ARCH. Econometric Theory 11 (1), pp. 122–150. ISSN 0266-4666, 1469-4360.
[16] (1998-09) Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data. Econometrica 66 (5), pp. 1127–1162.
[17] (2002-05) Self-similar traffic and network dynamics. Proceedings of the IEEE, Vol. 90, pp. 800–819. ISSN 0018-9219.
[18] (2006) Statistical methods for computer network traffic analysis. IEE Proceedings - Communications 153 (5), pp. 633–638. ISBN 0 86341 458 3, ISSN 1350-2425.
[19] (2015-12) Network Traffic Prediction Based on Neural Network. In Proceedings of the 2015 International Conference on Intelligent Transportation, Big Data and Smart City, pp. 527–530.
[20] (1998) New capabilities and methods of the X-12-ARIMA seasonal-adjustment program. Journal of Business & Economic Statistics 16 (2), pp. 127–152.
[21] (1999-10) On the relevance of long-range dependence in network traffic. IEEE/ACM Transactions on Networking 7 (5), pp. 629–640. ISSN 1063-6692.
[22] (2004) Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting 20 (1), pp. 5–10. ISSN 0169-2070.
[23] (2013) Estimating and forecasting network traffic performance based on statistical patterns observed in SNMP data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7988 LNAI, pp. 601–615. ISBN 9783642397110, ISSN 0302-9743.
[24] (1956) Methods of Using Long-term Storage in Reservoirs. Proceedings of the Institution of Civil Engineers, Vol. 5, pp. 519–543. https://doi.org/10.1680/iicep.1956.11503
[25] (2014) Forecasting: principles and practice. OTexts. ISBN 9780987507105.
[26] (2013) Usage of Modern Exponential-Smoothing Models in Network Traffic Modelling. In Nostradamus 2013: Prediction, Modeling and Analysis of Complex Systems, I. Zelinka, G. Chen, O. E. Rössler, V. Snasel, and A. Abraham (Eds.), Heidelberg, pp. 435–444. ISBN 9783319005423.
[27] (2015) A review of network traffic analysis and prediction techniques. arXiv preprint arXiv:1507.05722.
[28] (2007) Network traffic prediction algorithm and its practical application in real network. In Proceedings of the 2007 IFIP International Conference on Network and Parallel Computing Workshops, NPC 2007, pp. 512–517. ISBN 0769529437.
[29] (2004-03) A nonstationary Poisson view of internet traffic. In Proceedings of the IEEE INFOCOM 2004, Vol. 3, pp. 1558–1569. ISSN 0743-166X.
[30] (2004-09) Long-range dependence: ten years of Internet traffic modeling. IEEE Internet Computing 8 (5), pp. 57–64. ISSN 1089-7801.
[31] (1993) On the self-similar nature of Ethernet traffic. ACM SIGCOMM Computer Communication Review 23 (4), pp. 183–193. ISBN 10636692, ISSN 0146-4833.
[32] (1989-08-01) On the limited memory BFGS method for large scale optimization. Mathematical Programming 45 (1), pp. 503–528. ISSN 1436-4646.
[33] (1982) The fractal geometry of nature. Freeman, San Francisco, CA.
[34] (2005) Real-time Network Traffic Prediction Based on a Multiscale Decomposition. In Proceedings of the 4th International Conference on Networking - Volume Part I, ICN’05, Berlin, Heidelberg, pp. 492–499. ISBN 3540253394, 9783540253396.
[35] (2017) 4K video traffic analysis using seasonal autoregressive model for traffic prediction. In Proceedings of the 24th Telecommunications Forum, TELFOR 2016. ISBN 9788674666494.
[36] (2016) The locality phenomenon in the analysis of self-similar network traffic flows. In Proceedings of the 2016 IEEE International Conference on Automatica, ICA-ACCA 2016. ISBN 9781509011476.
[37] (2000-03) Variance of aggregated web traffic. In Proceedings of the IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 1, pp. 360–366. ISSN 0743-166X.
[38] (2017-10) Short-term traffic flow forecasting based on wavelet transform and neural network. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pp. 1–6. ISSN 2153-0017.
[39] (2000) Self-Similar Network Traffic and Performance Evaluation. 1st edition, John Wiley & Sons, Inc., New York, NY, USA. ISBN 0471319740.
[40] (1995-06) Wide area traffic: the failure of Poisson modeling. IEEE/ACM Transactions on Networking 3 (3), pp. 226–244. ISSN 1063-6692.
[41] (2002-07) A predictability analysis of network traffic. Computer Networks 39, pp. 329–345.
[42] (2005) The local mean decomposition and its application to EEG perception data. Journal of The Royal Society Interface 2 (5), pp. 443–454.
[43] (1997) Is Network Traffic Self-Similar or Multifractal? Fractals 05 (01), pp. 63–73.
[44] (2017) Forecasting at Scale. ISSN 1537-2731.
[45] (2009-02) Lévy Flights and Fractal Modeling of Internet Traffic. IEEE/ACM Transactions on Networking 17 (1), pp. 120–129. ISSN 1063-6692.
[46] (2007-09) Traffic prediction for mobile network using Holt-Winter’s exponential smoothing. In Proceedings of the 15th International Conference on Software, Telecommunications and Computer Networks, pp. 1–5.
[47] (2009) Autoregressive Conditional Duration Models. In Palgrave Handbook of Econometrics: Volume 2: Applied Econometrics, T. C. Mills and K. Patterson (Eds.), pp. 1004–1024. ISBN 9780230244405.
[48] (2006-05) Prediction of traffic in a public safety network. In 2006 IEEE International Symposium on Circuits and Systems, 4 pp. ISSN 0271-4302.
[49] (2002-06) A wavelet-based method to predict internet traffic. In Proceedings of the IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions, Vol. 1, pp. 690–694.
[50] (2011) Network Traffic Prediction Based on Wavelet Transform and Season ARIMA Model. In Proceedings of the Advances in Neural Networks – ISNN 2011, D. Liu, H. Zhang, M. Polycarpou, C. Alippi, and H. He (Eds.), Berlin, Heidelberg, pp. 152–159. ISBN 9783642211119.
[51] (1960) Forecasting Sales by Exponentially Weighted Moving Averages. Management Science 6 (3), pp. 324–342. https://doi.org/10.1287/mnsc.6.3.324
[52] (2018) Hybrid holiday traffic predictions in cellular networks. In Proceedings of the IEEE/IFIP Network Operations and Management Symposium: Cognitive Management in a Cyber World, NOMS 2018. ISBN 9781538634165.
[53] (1999) Traffic Prediction Using FARIMA Models. pp. 3–7. ISBN 078035284X.
[54] (2015) Research of a novel flash p2p network traffic prediction algorithm. Procedia Computer Science 55 (Itqm), pp. 1293–1301. ISSN 1877-0509.
[55] (2016) Time-Series Forecast Modeling on High-Bandwidth Network Measurements. Journal of Grid Computing 14 (3), pp. 463–476. ISSN 1572-9184.
[56] (1993-11) Traffic prediction using neural networks. In Proceedings of the GLOBECOM ’93. IEEE Global Telecommunications Conference, Vol. 2, pp. 991–995.
[57] (2013) Traffic prediction in 3G mobile networks based on multifractal exploration. Tsinghua Science and Technology 18 (4), pp. 398–405. ISSN 1007-0214.
[58] (2011) Network traffic analysis and prediction based on APM. In Proceedings of the 2011 6th International Conference on Pervasive Computing and Applications, ICPCA 2011, pp. 275–280. ISBN 9781457702082.
[59] (2011) Network traffic prediction and result analysis based on seasonal ARIMA and correlation coefficient. In Proceedings of the 2010 International Conference on Intelligent System Design and Engineering Application, ISDEA 2010, Vol. 1, pp. 980–983. ISBN 9780769542126.
[60] (2015-07) Wavelet transform processing for cellular traffic prediction in machine learning networks. In Proceedings of the 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), pp. 458–462.
[61] (2013) Short-term network traffic prediction with ACD and particle filter. In Proceedings of the 5th International Conference on Intelligent Networking and Collaborative Systems, INCoS 2013, pp. 189–191. ISBN 9780769549880.
[62] (2006) Network traffic modeling and prediction with ARIMA/GARCH. In Proceedings of the HET-NETs ’06 Conference. ISBN 5037259746.
[63] (2017-11) Long short-term memory neural network for network traffic prediction. In Proceedings of the 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), pp. 1–6.