I Introduction
Internet traffic characterisation is an important problem for network researchers and vendors. The subject has a long history. Early works [1, 2] discovered that the correlation structure of traffic exhibits self-similarity and that the durations of individual flows of packets exhibit heavy tails [3]. These works were later challenged and refined (see Section VI for a summary). By comparison, the distribution of the amount of traffic seen on a link in a given time period has received much less research attention. This is surprising, as this quantity can be extremely useful in network planning.
In this paper we use a rigorous statistical approach to fitting a statistical distribution to the amount of traffic within a given time period. Formally, we choose some timescale $T$ and let $X_T$ be the amount of traffic seen in a time period of length $T$. We investigate the distribution of the random variable $X_T$ over a wide range of values of $T$. We show that the distribution of this variable has considerable implications for network planning: for assessing how often a link is over capacity, in particular for service level agreements (SLAs), and for traffic pricing, particularly using the 95th percentile scheme [4]. Previous authors have claimed that $X_T$ has a normal (or Gaussian) distribution [5, 6, 7]. Others claim it is Gaussian plus a tail associated with bursts [8, 9]. A variable has a lognormal distribution with parameters $\mu$ and $\sigma$ if its logarithm is normally distributed, where $\mu$ is the mean and
$\sigma$ is the standard deviation of that normal distribution. We use a well-established statistical methodology [10] to show that a lognormal distribution is a better fit than Gaussian or Weibull^{1} (a variable has a Weibull distribution with parameters $k$, known as shape, and $\lambda$, known as scale, if its probability density function is $f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^{k}}$ when $x \geq 0$ and is $0$ otherwise) for the vast majority of traces. This holds over a wide range of timescales (from 5 msec to 5 sec). This paper is the most comprehensive investigation of this phenomenon that the authors know of. We study a large number of publicly available traces from a diverse set of locations (including commercial, academic and residential networks) with different link speeds, spanning the last 15 years. The structure of the paper is as follows. In Section II we describe the datasets used. In Section III we describe our best-practice procedure for fitting traffic and demonstrate that the lognormal is the best-fit distribution for our traces under a variety of circumstances. We examine the few traces that do not follow this distribution and find that this occurs when a link spends considerable time either in an outage or completely at maximum capacity. In Section IV we demonstrate that the lognormal distribution is the most useful for estimating how often a link is over capacity. In Section V we show that the lognormal distribution provides good estimates for 95th percentile pricing. In Section VI we discuss related work. Finally, Section VII gives our conclusions.
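As a quick illustrative sketch (not part of the original study, and using synthetic rather than captured traffic), the defining property of the lognormal distribution can be checked numerically with standard Python tooling:

```python
import numpy as np

# Synthetic illustration: a variable X is lognormal with parameters
# (mu, sigma) exactly when ln(X) is normally distributed with mean mu
# and standard deviation sigma.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
x = rng.lognormal(mean=mu, sigma=sigma, size=2000)

# The log-transformed samples should recover mu and sigma.
log_x = np.log(x)
print(log_x.mean(), log_x.std())
```

The same log transform is what makes fitting straightforward in practice: fitting a lognormal to traffic volumes reduces to fitting a Gaussian to their logarithms.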
II Network Traffic Traces
A key contribution of our work stems from the spatial and temporal diversity of the studied traces. The dataset spans a period of 15 years and comprises traces from the five sources described below.
CAIDA traces. We used CAIDA traces captured at an Internet data collection monitor located at an Equinix data centre in Chicago [11]. The data centre is connected to a backbone link of a Tier 1 ISP. The monitor records an hour-long trace four times a year, usually at the same hour of day (UTC). Each trace contains billions of IPv4 packets, the headers of which are anonymised. The average captured data rate is 2.5 Gbps. At the time of capture, the monitored link had a capacity of 10 Gbps.

MAWI traces. The MAWI archive [12] is a collection of Internet traffic traces captured within the WIDE backbone network that connects Japanese universities and research institutions to the Internet. Each trace consists of IP-level traffic observed daily at a vantage point within WIDE. Traces include anonymised IP and MAC headers, along with an ntpd timestamp [12]. We examined a large set of these traces, each one 15 minutes long. On average, each trace consists of 70 million packets; the average captured data rate is 422 Mbps. The monitored link had a capacity of 1 Gbps.

Twente University traces. We used traffic traces captured at five different locations, with an equal number of traces from each. Traces are diverse in terms of link rates, types of users and capture times [13]. Each trace is 15 minutes long. The first location is a residential network with a 300 Mbps link, which connects 2000 students (each with a 100 Mbps access link); traces were captured in July 2002. The second location is a research institute network with a 1 Gbps link which connects 200 researchers (each with a 100 Mbps access link); traces were captured between May and August 2003. The third location is a large college with a 1 Gbps link which connects 1000 employees (each with a 100 Mbps access link); traces were captured between February and July 2004. The fourth location is an ADSL access network with a 1 Gbps link used by hundreds of users (each with a 256 Kbps to 8 Mbps access link); traces were captured between February and July 2004. The fifth location is an educational organisation with a 100 Mbps link connecting 135 students and employees (each with a 100 Mbps access link); traces were captured between May and June 2007.

Waikato University VIII traces. The Waikato dataset consists of traffic traces captured by the WAND group at the University of Waikato, New Zealand [14]. The capture point is the link interconnecting the University with the Internet. All of the traces were captured using software specifically developed for the Waikato capture point and a DAG 3 series hardware capture card. All IP addresses within the traces are anonymised. In our study, we used traces captured between April 2011 and November 2011.

Auckland University IX traces. The Auckland dataset consists of traffic traces captured by the WAND group at the University of Waikato [15]. The traces were collected at the University of Auckland, New Zealand. The capture point is the link interconnecting the University with the Internet. All IP addresses within the traces are anonymised. In our study, we used traces captured in 2009.
III Fitting a statistical distribution to Internet traffic data
In this section we present an extensive statistical analysis applied to the datasets described in the previous section. The aim is to discover which statistical distribution best fits the traces. In contrast to the existing research (see Section VI), we base our analysis on the framework proposed by Clauset et al. [10], a comprehensive statistical framework developed specifically for testing power-law behaviour in empirical data^{2} (we used the source code discussed in [16]). The framework combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov statistic and likelihood ratios. The method reliably tests whether the power-law distribution is the best model for a specific dataset or, if not, whether an alternative statistical distribution (e.g., lognormal, exponential, Weibull) is. The framework proceeds as follows: (1) the parameters of the power-law model are estimated for the given dataset; (2) the goodness-of-fit between the data and the power-law is calculated, under the hypothesis that the power-law is the best fit to the provided traffic samples — if the resulting $p$-value is greater than the chosen significance threshold (0.1 in [10]) the hypothesis is accepted (i.e. the power-law is a plausible fit to the given data), otherwise it is rejected; (3) alternative distributions are tested against the power-law as a fit to the data by employing a likelihood ratio test.
For the vast majority of the traces examined, the hypothesis was rejected; i.e. the power-law distribution was not a good fit. Consequently, we investigated alternative distributions by performing the likelihood ratio (LLR) test (following Clauset's methodology), which yields a pair of values $(\mathcal{R}, p)$ for each alternative, where $\mathcal{R}$ is the normalised LLR^{3} ($\mathcal{R}$ is calculated from the log-likelihood ratio between the power-law and the alternative distribution, normalised by its standard deviation [10]) and $p$ is the significance value for this test. $\mathcal{R}$ is positive if the power-law distribution is a better fit for the data, and negative if the alternative distribution is a better fit. A $p$ value below the significance threshold (0.1) means that the sign of $\mathcal{R}$ can be trusted to conclude that one candidate distribution (power-law or alternative, depending on the sign of $\mathcal{R}$) is the better fit for the data. In contrast, a $p$ value above the threshold means that nothing can be concluded from the likelihood ratio test.
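The likelihood-ratio step can be sketched as follows. This is a simplified, hypothetical illustration on synthetic data using scipy; the actual analysis relies on the Clauset-style framework and the code of [16], which also performs the tail-cutoff (xmin) selection omitted here:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for per-interval traffic volumes (hypothetical data).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=0.5, size=5000)

# Pointwise log-likelihoods under each maximum-likelihood fit.
ll_lognorm = stats.lognorm.logpdf(sample, *stats.lognorm.fit(sample, floc=0))
ll_expon = stats.expon.logpdf(sample, *stats.expon.fit(sample, floc=0))

# Vuong-style normalised log-likelihood ratio: R > 0 favours the lognormal,
# R < 0 favours the exponential; p is the two-sided significance of the sign.
diff = ll_lognorm - ll_expon
R = np.sqrt(len(diff)) * diff.mean() / diff.std()
p = 2 * stats.norm.sf(abs(R))
```

On lognormal input, $\mathcal{R}$ comes out strongly positive with a negligible $p$, mirroring how the test discriminates between candidate models.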
III-A Fitting the lognormal distribution to Internet traffic data
Figure 1 shows the results of the LLR test for all traces, with the lognormal, exponential and Weibull distributions as the alternatives to the power-law. For this test we aggregated traffic at a millisecond-level timescale. The points marked with a circle are those with a significant $p$ value. It is clear that the lognormal distribution (black line in Figure 1) is the best fit for the studied traces; i.e. $\mathcal{R}$ is negative and $p$ is significant for most traces when the alternative distribution (to the power-law, which is almost always rejected) is the lognormal one^{4} (for clarity, in Figures 1(e) and 2(e) we only plot traces 60–107; for traces 1–59, $\mathcal{R}$ is negative and the respective $p$ value is significant, i.e. the alternative distribution is the best fit for the respective trace). The lognormal distribution is not the best fit for a small number of the CAIDA, Waikato, Auckland, Twente and MAWI traces. We examined these traces in more detail and discuss them in Section III-B.
For the vast majority of traces the power-law distribution is favoured over the exponential one (i.e. $\mathcal{R}$ is positive), as shown in Figure 1. Thus, the exponential distribution cannot be considered a good model for our traffic traces. On the other hand, the Weibull appears to be a better fit than the power-law distribution; however, when compared to the lognormal distribution, it still performs poorly for a substantial number of traces. Identifying the lognormal distribution as the best fit for the vast majority of traffic traces at a millisecond-level timescale is very encouraging, as this traffic aggregation timescale has been commonly studied in the literature [17, 18]. Next we investigate what the best model is for a range of aggregation timescales. The results are shown in Figure 2. As reflected by the $\mathcal{R}$ and $p$ values, the lognormal distribution is the best fit for the vast majority of captured traces at all examined timescales (5 msec to 5 sec)^{5} (note that it is possible that network traffic may not follow a lognormal distribution at very fine or very coarse aggregation granularities). This is a strong result suggesting the generality of our observations. The good lognormal fit at timescales as small as a few milliseconds is important for practical applications of the lognormal model.
We also examined Q–Q plots for a large number of traces^{6} (due to lack of space, Q–Q plots are not included, as we would have to present plots for each trace separately). The lognormal distribution appeared to be a better fit than the other tested distributions, and no deviations from the expected pattern were observed in the body or tail of the distribution.
III-B Anomalous traces
As mentioned in Section III-A, there is a small number of traces for which the lognormal distribution is not a good fit (none of the other examined distributions is, either). Figure 3(a) shows the PDF plot for one of the anomalous MAWI traces; Figure 3(b) shows the PDF for another MAWI trace for which the lognormal distribution is a good fit. It is obvious from Figure 3(a) that the link was either severely underutilised (see the large spike on the left of the plot) or fully utilised (see the smaller spike on the right of the plot, at the highest data rates). All traces for which the lognormal distribution was not a good fit exhibited similar behaviour and (aggregated) traffic patterns. By contrast, we did not observe any such behaviour for the traces for which the lognormal distribution was the best fit. A likely explanation for the anomalous traces is that they contain either periods of over-capacity (traffic at 100% of link capacity) or periods where the link is down (no traffic).
III-C Fitting the lognormal and Gaussian distributions using the correlation coefficient test
The linear correlation coefficient test has been widely used to assess the fit of a distribution to empirical data. To reinforce the results of Section III-A, we employ the linear correlation coefficient assuming that the lognormal distribution is the best fit (as shown in Section III-A). We compare the results of this test for both the lognormal and Gaussian distributions. We use the linear correlation coefficient as defined in [19]:
$$\gamma = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}\,\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \qquad (1)$$

where $x_i$ is the $i$-th observed sample and $\bar{x}$ is the samples' mean value. $y_i$ is the $i$-th sample from the reference distribution (lognormal in our case), which can be calculated from the inverse CDF of the reference random variable, and $\bar{y}$ is the respective mean value. The value of the correlation coefficient $\gamma$ can vary between $-1$ and $1$, with $1$, $0$ and $-1$ indicating perfect correlation, no correlation and perfect anti-correlation, respectively. Strong goodness-of-fit (GOF) is assumed to exist when the value of $\gamma$ is close to $1$ [17].
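A sketch of this test, using synthetic data and scipy (the sample, parameter choices and variable names here are illustrative, not from the paper): the empirical samples are sorted and correlated against the matching quantiles of the fitted reference distribution.

```python
import numpy as np
from scipy import stats

def corr_coefficient(samples, dist, params):
    """Linear correlation coefficient (Eq. 1) between sorted empirical
    samples x_i and matching quantiles y_i of a reference distribution."""
    x = np.sort(samples)
    n = len(x)
    # Plotting positions (i - 0.5)/n avoid the infinite 0 and 1 quantiles.
    probs = (np.arange(1, n + 1) - 0.5) / n
    y = dist.ppf(probs, *params)  # reference samples via the inverse CDF
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum())

# Hypothetical lognormal "traffic volumes".
rng = np.random.default_rng(1)
traffic = rng.lognormal(mean=2.0, sigma=0.6, size=4000)

gamma_ln = corr_coefficient(traffic, stats.lognorm,
                            stats.lognorm.fit(traffic, floc=0))
gamma_norm = corr_coefficient(traffic, stats.norm, stats.norm.fit(traffic))
```

For lognormal input, `gamma_ln` lands very close to 1 while `gamma_norm` is visibly lower, which is exactly the contrast Figures 4(a)–(j) report for the real traces.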
We measure the linear correlation coefficient for all datasets at four different aggregation timescales (ranging from 5 msec to 5 sec) and plot the results in Figures 4(a) to 4(e) for the lognormal distribution and Figures 4(f) to 4(j) for the Gaussian distribution. Traces are ordered by the value of $\gamma$ for the given timescale. It can be clearly seen that $\gamma$ is close to $1$ for most traces when employing the test for the lognormal distribution, but this is not the case for the Gaussian distribution. $\gamma$ is larger for smaller aggregation timescales, indicating that the lognormal distribution is an even better fit as the aggregation gets finer. For very small values of $T$, i.e. lower than 1 msec, data samples exhibit binary behaviour, where either a packet is transmitted or not during each examined time frame [18]. We have examined $\gamma$ for very short (and very large) aggregation timescales, and can confirm the absence of a model describing the data (for brevity, we have omitted the relevant figures).
Next, we calculate $\Delta\gamma$ (the variation of $\gamma$) for each dataset. $\Delta\gamma$ gives an indication of the stability of $\gamma$ for each dataset across all timescales tested. This metric is defined as:

$$\Delta\gamma = \max_{T \in \mathcal{T}} \gamma(T) - \min_{T \in \mathcal{T}} \gamma(T) \qquad (2)$$

where $\mathcal{T}$ is the set of studied aggregation timescales, ranging from 5 msec to 5 sec. Figures 4(k) to 4(o) show the results for each dataset with the traces ranked by $\Delta\gamma$. For the lognormal model, $\Delta\gamma$ is very small for all traces; we can therefore conclude that $\gamma$ is almost constant across all studied aggregation timescales, while $\Delta\gamma$ is higher for the Gaussian model. Furthermore, the error bars in Figures 4(p) to 4(t) represent the standard deviation of the correlation coefficient at the different timescales (see x-axis). This again shows that $\gamma$ for the lognormal model remains high (at the different $T$ values) for most CAIDA and MAWI traces, and for all traces in the other datasets. This is not the case with the Gaussian model, where most $\gamma$ values are substantially lower.
Overall, the correlation coefficient test reinforces the results of Section III-A, providing strong evidence that the lognormal distribution is the best fit for the studied traces. The superior performance of our model can also be seen by comparing our correlation coefficient results with those in [20], where the Gaussian model was used.
IV Bandwidth Provisioning
It has been previously suggested that network link provisioning could be based on fitted traffic models instead of relying on straightforward empirical rules [20]. In this way, over- or under-provisioning can be mitigated or eliminated even in the presence of strong traffic fluctuations. Such approaches rely on having a statistical model that accurately describes the network traffic; this is therefore an excellent area for applying our findings on fitting the lognormal distribution to Internet traffic data. In the literature, the following inequality (called the "link transparency formula" by its authors) has been used for bandwidth provisioning [18]:
$$\mathbb{P}\left(X_T \geq C\right) \leq \varepsilon \qquad (3)$$

In words, this inequality states that the probability that the captured traffic $X_T$ over a specific aggregation timescale $T$ is larger than the link capacity $C$ must be smaller than the value of a performance criterion $\varepsilon$. The value of $\varepsilon$ is chosen carefully by the network provider in order to meet a specific SLA [20]. Likewise, the value of the aggregation time $T$ should be sufficiently small so that fluctuations in the traffic are modelled as well, taking into account the buffering capabilities of network switching devices^{7} (large traffic fluctuations at very short aggregation timescales are smoothed by the presence of buffers at network routers and switches).
We compare bandwidth provisioning using Meent's approximation formula [20] (which assumes Gaussian traffic) and using a lognormal traffic model.
IV-A Bandwidth provisioning using Meent's formula
To find the minimum required link capacity, Meent et al. [20] proposed a bandwidth provisioning approach that is based on the assumption that the traffic follows a Gaussian distribution. Meent’s dimensioning formula is defined as follows [20]:
$$C(T,\varepsilon) = \mu + \frac{1}{T}\sqrt{-2\ln(\varepsilon)\cdot v(T)} \qquad (4)$$

where $\mu$ is the average value of the traffic, $v(T)$ is the variance at timescale $T$ and $\varepsilon$ is the performance criterion. The link capacity is obtained by adding a safety margin to the average of the captured traffic (see Equation 4). This safety margin depends on $\varepsilon$ and on $v(T)$; as the value of $\varepsilon$ decreases, the safety margin increases, so a tighter performance target requires more capacity headroom. This differs from conventional link dimensioning methods, where the safety margin is fixed, typically at 30% above the average of the presented traffic [21, 20]. Traffic tails are represented using the Chernoff bound, as follows:
$$\mathbb{P}\left(X_T \geq C\right) \leq \min_{s>0}\, e^{-sC}\, M_{X_T}(s) \qquad (5)$$

Here $M_{X_T}(s) = \mathbb{E}\left[e^{sX_T}\right]$ is the moment generating function (MGF) of the captured traffic $X_T$.

IV-B Bandwidth provisioning based on the lognormal model
Here we investigate whether we can achieve more reliable bandwidth provisioning by adopting the lognormal traffic model. We calculate the mean and variance from the captured trace and generate the respective lognormal model. Then, we use the lognormal CDF $F$ to solve the link transparency formula shown in Equation 3. The constraint $\mathbb{P}(X_T \geq C) \leq \varepsilon$ is equivalent to $F(C) \geq 1-\varepsilon$, which can be solved to find the required capacity $C$, as follows:

$$C = F^{-1}(1-\varepsilon) = \exp\left(\mu + \sqrt{2}\,\sigma\,\operatorname{erf}^{-1}(1-2\varepsilon)\right) \qquad (6)$$

where $\mu$ and $\sigma$ are the parameters of the fitted lognormal distribution.
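The two provisioning rules can be contrasted on synthetic data. The sketch below is illustrative only (a hypothetical lognormal sample, scipy-based fitting); it is not the paper's implementation:

```python
import numpy as np
from scipy import stats

# Hypothetical per-interval traffic volumes (synthetic lognormal sample).
rng = np.random.default_rng(2)
traffic = rng.lognormal(mean=2.0, sigma=0.5, size=20000)
eps = 0.01   # target performance criterion
T = 1.0      # aggregation timescale (arbitrary units in this sketch)

# Meent's Gaussian formula (Eq. 4): C = mu + (1/T) * sqrt(-2 ln(eps) v(T)).
cap_gauss = traffic.mean() + np.sqrt(-2.0 * np.log(eps) * traffic.var()) / T

# Lognormal provisioning (Eq. 6): invert the fitted lognormal CDF at 1 - eps.
shape, loc, scale = stats.lognorm.fit(traffic, floc=0)
cap_ln = stats.lognorm.ppf(1.0 - eps, shape, loc=loc, scale=scale)

# Empirical exceedance ratio of each provisioned capacity (Eq. 7).
eps_gauss = np.mean(traffic > cap_gauss)
eps_ln = np.mean(traffic > cap_ln)
```

On skewed lognormal input, the Gaussian-based capacity falls below the lognormal quantile, so its empirical exceedance overshoots the target $\varepsilon$ while the lognormal rule lands close to it — the same qualitative behaviour Section IV-C reports on the real traces.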
IV-C Comparison of bandwidth provisioning approaches
In this section we compare the bandwidth provisioning approaches described above. The performance indicator is the empirical value of the performance criterion, denoted by $\hat{\varepsilon}$ and defined as follows:

$$\hat{\varepsilon} = \frac{\left|\{\,i : x_i > C\,\}\right|}{n} \qquad (7)$$

where $x_1,\ldots,x_n$ are the traffic samples of the captured trace and $C$ is the estimated link capacity.
In words, this empirical value is the fraction of all data samples of the captured traffic that are measured to be larger than the estimated link capacity. Ideally, $\hat{\varepsilon}$ would be equal to the target value of the performance criterion $\varepsilon$; any difference between $\hat{\varepsilon}$ and $\varepsilon$ arises because the chosen traffic model does not accurately describe the real network traffic. A simple example of the described comparison approach is illustrated in Figure 5, in which we plot the captured data rate for a MAWI trace at a millisecond-level timescale^{8} (note that in all subsequent figures we have also included results for a Weibull model, to gain insight into bandwidth provisioning using a heavy-tailed distribution). The capacity values calculated by Meent's formula and by the lognormal model for a small target $\varepsilon$ are represented by the horizontal lines in Figure 5. Computing the empirical value using Equation 7 shows that with the first approach the network operator would not be able to meet the target $\varepsilon$, while with the second approach the empirical value is close to the target.
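The empirical criterion of Equation 7 amounts to a one-line computation. A minimal sketch on synthetic data (variable names and parameters are illustrative, not from the paper):

```python
import numpy as np

def empirical_epsilon(samples, capacity):
    """Empirical performance criterion (Eq. 7): the fraction of traffic
    samples that exceed the provisioned link capacity."""
    return np.mean(np.asarray(samples) > capacity)

# Hypothetical traffic volumes; capacity provisioned at the 99th percentile,
# i.e. aiming for eps = 0.01.
rng = np.random.default_rng(3)
traffic = rng.lognormal(mean=2.0, sigma=0.5, size=10000)
cap = np.quantile(traffic, 0.99)
eps_hat = empirical_epsilon(traffic, cap)
```

By construction `eps_hat` is close to the 0.01 target here; with a real trace and a model-derived capacity, the gap between `eps_hat` and the target quantifies the model's provisioning error.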
We next compare the results of bandwidth provisioning calculations based on (a) Meent's formula, (b) the Weibull model and (c) the proposed lognormal model. Figures 6(a)–(d) show the average of the empirical value $\hat{\varepsilon}$ for all traces in each dataset at three second-level aggregation timescales. The value of $T$ is chosen to be sufficiently small so that the fluctuations in the traffic can be modelled as well. Each model is tested for four different values of the performance criterion $\varepsilon$. In Figures 6(a)–(d) we clearly see that the lognormal model is able to satisfy the required performance criterion at different aggregation timescales for all datasets. In contrast, Meent's formula fails to allocate sufficient bandwidth, which results in missing the target performance criterion for all datasets and target performance values, as depicted in Figures 6(i)–(l) (see the horizontal red line). The Weibull distribution performs better compared to Meent's formula, but bandwidth provisioning using the lognormal model is far superior, as can be seen from Figures 6(a)–(d) and 6(e)–(h).
V 95th percentile pricing scheme based on the lognormal model
Traffic billing is typically based on the 95th percentile method [22]. Traffic volume is measured at border network devices (typically aggregated over 5-minute intervals) and bills are calculated according to the 95th percentile of the distribution of measured volumes; i.e. network operators calculate bills while disregarding occasional traffic spikes. Forecasting future bills, which is important for both ISPs and clients, can be done using a model of the traffic calculated from previously sampled traffic. In this section, we apply our findings on Internet traffic modelling to predicting the cost of traffic according to the 95th percentile method.
For each network trace we calculate the actual 95th percentile of the traffic volume. The majority of the studied traffic traces are 15 minutes long, but operators typically use traffic volume measurements over much longer periods; we therefore scale down the calculation of the 95th percentile by dividing each trace (900 seconds) into 90 groups (10 seconds each). The authors appreciate that by using 15-minute rather than day-long traces we omit any study of diurnal effects in the distribution. We note that the sum of several lognormal distributions is itself very accurately represented by a lognormal distribution [23]. Hypothetically, therefore, if 96 consecutive 15-minute traces fit a lognormal distribution (with different parameters for each) then the resulting 24-hour trace is also likely to be a good fit to a lognormal. We also note that the distributions tested were on a level playing field, in that they would all be affected equally by the shorter duration of the datasets.
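The 95th percentile comparison can be sketched as follows, again on synthetic volumes rather than the captured traces (all names and parameters here are illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical per-interval traffic volumes for one "trace".
rng = np.random.default_rng(4)
volumes = rng.lognormal(mean=4.0, sigma=0.6, size=900)

# Actual 95th percentile of the observed volumes.
actual_p95 = np.percentile(volumes, 95)

# 95th percentile predicted by each fitted model.
shape, loc, scale = stats.lognorm.fit(volumes, floc=0)
pred_lognorm = stats.lognorm.ppf(0.95, shape, loc=loc, scale=scale)
pred_gauss = stats.norm.ppf(0.95, *stats.norm.fit(volumes))
```

Plotting `actual_p95` against each prediction across many traces produces a Figure 7-style scatter; points from the better model cluster around the perfect-prediction line.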
We calculate the 95th percentile for the observed traffic. We then fit a Gaussian, Weibull and lognormal distribution to each trace (at a millisecond-level timescale) and calculate the 95th percentile of each fitted distribution. We plot the actual 95th percentile against the three predictions in Figure 7, with a red reference line to show where perfect predictions would be located. It is clear that the lognormal model provides much more accurate predictions of the 95th percentile than the Gaussian model. As with the bandwidth dimensioning case discussed in Section IV, the Weibull model is better than the Gaussian but worse than the proposed lognormal model.
We employ the normalised root mean squared error (NRMSE) as a goodness-of-fit measure for the results in Figure 7. NRMSE measures the differences between the values predicted by a hypothetical model and the actual values; in other words, it measures the quality of the fit between the actual data and the predicted model. Table I shows the NRMSE for all datasets and the three considered models. The lowest NRMSE values are obtained for the lognormal model, confirming that it is the best model compared to the Gaussian and Weibull ones.
Model/Dataset  CAIDA  Waikato  Auckland  Twente  MAWI 
Lognormal  0.0399  0.0401  0.1058  0.0979  0.1528 
Weibull  0.2410  0.1148  0.2984  0.2123  0.4145 
Gaussian  0.5544  0.4193  0.6866  0.5741  0.9828 
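The NRMSE computation can be sketched as below. Note that the paper does not state its normalisation convention; normalising the RMSE by the range of the actual values is one common choice and is assumed here:

```python
import numpy as np

def nrmse(actual, predicted):
    """Normalised root mean squared error between actual and predicted
    values, normalised by the range of the actual values (one common
    convention; the paper does not specify which it uses)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / (actual.max() - actual.min())
```

Applied to the actual versus predicted 95th percentiles of each dataset, this yields one score per (dataset, model) pair, as in Table I; lower is better.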
VI Related work
Reliable traffic modelling is important for network planning, deployment and management, e.g. for traffic billing and network dimensioning. Historically, network traffic has been widely assumed to follow a Gaussian distribution. In [5, 7], the authors studied network traces and verified that the Gaussianity assumption was valid (according to the simple goodness-of-fit tests they used) at two different timescales. In [24], the authors studied traffic traces during busy hours over a relatively long period of time and also found that the Gaussian distribution is a good fit for the captured traffic. Schmidt et al. [8] found that the degree of Gaussianity is affected by short and intensive activities of single network hosts that create sudden traffic bursts. All the above-mentioned works agreed on Gaussian or 'fairly Gaussian' traffic at different levels of aggregation in terms of timescale and number of users. The authors in [19, 25] examined the levels of aggregation required to observe Gaussianity in the modelled traffic, and concluded that Gaussianity can be disturbed by traffic bursts. The work in [26, 9] reinforces this argument by showing the existence of large traffic spikes at short timescales, which result in high values in the tail. Compared to the existing literature, our findings are based on a modern, principled statistical methodology and on traffic traces that are spatially and temporally diverse. We have tested several hypothesised distributions, not just Gaussianity.
An early work drawing attention to the presence of heavy tails in Internet file sizes (not traffic volumes) is that of Crovella and Bestavros [2]. Deciding whether Internet flows could be heavy-tailed became important, as this implies significant departures from Gaussianity. The authors in [27] provided robust evidence for the presence of various kinds of scaling, in particular heavy-tailed sources and long-range dependence, in a large dataset of traffic spanning a duration of 14 years.
Understanding traffic characteristics and how they evolve is crucial for ISPs for network planning and link dimensioning. Operators typically overprovision their networks. A common approach is to calculate the average bandwidth utilisation [6] and add a safety margin; as a rule of thumb, this margin is defined as a percentage of the calculated bandwidth utilisation [21]. Meent et al. [20] proposed a new bandwidth provisioning formula, which calculates the minimum bandwidth that guarantees the required performance according to an underlying SLA. This approach relies on the statistical parameters of the captured traffic and a performance parameter; the fundamental underlying assumption is that the traffic the network operator sees follows a Gaussian distribution. The same approach has been used in [18].
The 95th percentile method is widely used for network traffic billing. Dimitropoulos et al. [22] found that the computed 95th percentile is significantly affected by traffic aggregation parameters. However, their approach does not assume any underlying model of the traffic; instead, they base their study on specific captured traces. Stanojevic et al. [4] proposed the use of the Shapley value for computing the contribution of each flow to the 95th percentile price of interconnect links. The works in [28, 29, 30, 31] propose calculating the 95th percentile using experimental approaches. Xu et al. [32] assume that network traffic follows a Gaussian distribution "through reasonable aggregation" and propose a cost-efficient data centre selection approach based on the 95th percentile.
VII Conclusion
The distribution of traffic on Internet links is an important problem that has received relatively little attention. We used a well-known, state-of-the-art statistical framework to investigate the problem using a large corpus of traces. The traces cover several network settings, including home-user access links, Tier 1 backbone links and campus-to-Internet links; they span the years 2002 to 2018 and come from a number of different countries. We investigated the distribution of the amount of traffic observed on a link in a given (small) aggregation period, which we varied from 5 msec to 5 sec. The hypotheses compared were that the traffic volume is heavy-tailed, lognormal or normal (Gaussian). The vast majority of traces fitted the lognormal assumption best, and this remained true at all timescales tried. Where no tested distribution was a good fit, this could be attributed either to the link being saturated (at capacity) for a large part of the observation or to signs of link failure (no or very low traffic for part of the observation).
We investigated the impact of the distribution on two sample traffic engineering problems. First, we looked at predicting the proportion of time a link will exceed a given capacity, which can be useful for provisioning links or for predicting when SLA violations are likely to occur. Second, we looked at predicting the 95th percentile transit bill that an ISP might receive. For both problems the lognormal distribution gave more accurate results than a heavy-tailed or Gaussian distribution. We conclude that the lognormal distribution is a good (best) fit for traffic volume on normally functioning Internet links in a variety of settings and over a variety of timescales, and we further argue that this assumption can make a large difference to statistically predicted outcomes for applied network engineering problems.
In future work, we plan to test the stationarity of the traffic traces.
References
 [1] P. Pruthi and A. Erramilli, "Heavy-tailed on/off source behavior and self-similar traffic," in Proc. of ICC, 1995.
 [2] M. E. Crovella and A. Bestavros, "Self-similarity in world wide web traffic: evidence and possible causes," IEEE/ACM ToN, 1997.
 [3] P. Loiseau, P. Goncalves, G. Dewaele, P. Borgnat, P. Abry, and P. V. B. Primet, "Investigating self-similarity and heavy-tailed distributions on a large-scale experimental facility," IEEE/ACM ToN, 2010.
 [4] R. Stanojevic et al., "On economic heavy hitters: Shapley value analysis of 95th-percentile pricing," in Proc. of ACM IMC, 2010.
 [5] R. V. D. Meent, M. Mandjes, and A. Pras, “Gaussian traffic everywhere?” in Proc. of IEEE ICC, 2006.
 [6] R. d. O. Schmidt, H. van den Berg, and A. Pras, “Measurementbased network link dimensioning,” in Proc. of IFIP/IEEE, 2015.
 [7] R. d. O. Schmidt, R. Sadre, and A. Pras, “Gaussian traffic revisited,” in Proc. of IFIP Networking, 2013.
 [8] R. d. O. Schmidt, R. Sadre, N. Melnikov, J. Schönwälder, and A. Pras, “Linking network usage patterns to traffic gaussianity fit,” in Proc. of IFIP Networking, 2014.
 [9] X. Yang, “Designing traffic profiles for bursty Internet traffic,” in Proc. of IEEE GLOBECOM, 2002.
 [10] A. Clauset, C. R. Shalizi, and M. Newman, "Power-law distributions in empirical data," arXiv:0706.1062v2, 2009.
 [11] “The caida ucsd anonymized internet traces,” 2016. [Online]. Available: http://www.caida.org/data/passive/passive_dataset.xml
 [12] “Mawi archive,” 2018. [Online]. Available: http://mawi.wide.ad.jp/
 [13] R. R. R. Barbosa, R. Sadre, A. Pras, and R. van de Meent, “Simpleweb/university of twente traffic traces data repository,” http://eprints.eemcs.utwente.nl/17829/, Tech. Rep., 2010.
 [14] “Wits: Waikato internet traffic storage,” 2013. [Online]. Available: https://wand.net.nz/wits/waikato/8/
 [15] “Wits: Auckland x,” 2009. [Online]. Available: https://wand.net.nz/wits/auck/10/
 [16] J. Alstott, E. Bullmore, and D. Plenz, "powerlaw: a Python package for analysis of heavy-tailed distributions," arXiv:1305.0215, 2014.
 [17] M. Mandjes and R. van de Meent, “Resource dimensioning through buffer sampling,” IEEE/ACM Transactions on Networking, 2009.
 [18] R. d. O. Schmidt, R. Sadre, A. Sperotto, H. van den Berg, and A. Pras, “Impact of packet sampling on link dimensioning,” IEEE Transactions on Network and Service Management, 2015.
 [19] J. Kilpi and I. Norros, “Testing the gaussian approximation of aggregate traffic,” in Proc. of SIGCOMM, 2002.
 [20] A. Pras, L. Nieuwenhuis, R. van de Meent, and M. Mandjes, “Dimensioning network links: a new look at equivalent bandwidth,” IEEE Network, 2009.
 [21] “Best practices in core network capacity planning,” online, accessed July 2018. [Online]. Available: https://www.cisco.com/c/en/us/products/collateral/routers/wanautomationengine/white_paper_c11728551.pdf
 [22] X. Dimitropoulos, P. Hurley, A. Kind, and M. P. Stoecklin, "On the 95-Percentile Billing Method," in Proc. of PAM, 2009.
 [23] R. Mitchell, “Permanence of the lognormal distribution.” J. Optical Society of America, 1968.
 [24] J. L. GarcíaDorado, J. A. Hernández, J. Aracil, J. E. López de Vergara, and S. LopezBuedo, “Characterization of the busyhour traffic of IP networks based on their intrinsic features,” Computer Networks, 2011.
 [25] A. B. Downey, “Evidence for Longtailed Distributions in the Internet,” in Proc. of ACM SIGCOMM Workshop on Internet Measurement, 2001.
 [26] H. Abrahamsson, B. Ahlgren, P. Lindvall, J. Nieminen, and P. Tholin, “Traffic characteristics on 1gbit/s access aggregation links,” in Proc. of IEEE ICC, 2017.

 [27] R. Fontugne et al., "Scaling in internet traffic: A 14 year and 3 day longitudinal study, with multiscale analyses and random projections," IEEE/ACM Transactions on Networking, 2017.
 [28] L. Golubchik et al., "To send or not to send: Reducing the cost of data transmission," in Proc. of IEEE INFOCOM, 2013.
 [29] N. Laoutaris, M. Sirivianos, X. Yang, and P. Rodriguez, “Interdatacenter bulk transfers with netstitcher,” in Proc. of ACM SIGCOMM, 2011.
 [30] I. Castro, R. Stanojevic, and S. Gorinsky, “Using Tuangou to Reduce IP Transit Costs,” IEEE/ACM Transactions on Networking, 2014.
 [31] H. Xu and B. Li, “Joint request mapping and response routing for geodistributed cloud services,” in Proc. of IEEE INFOCOM, 2013.
 [32] ——, “Cost efficient datacenter selection for cloud services,” in Proc. of IEEE ICCC, 2012.