Structured Nonnegative Matrix Factorization for Traffic Flow Estimation of Large Cloud Networks

Network traffic matrix estimation is an ill-posed linear inverse problem: it requires to estimate the unobservable origin destination traffic flows, X, given the observable link traffic flows, Y, and a binary routing matrix, A, which are such that Y = AX. This is a challenging but vital problem as accurate estimation of OD flows is required for several network management tasks. In this paper, we propose a novel model for the network traffic matrix estimation problem which maps high-dimension OD flows to low-dimension latent flows with the following three constraints: (1) nonnegativity constraint on the estimated OD flows, (2) autoregression constraint that enables the proposed model to effectively capture temporal patterns of the OD flows, and (3) orthogonality constraint that ensures the mapping between low-dimensional latent flows and the corresponding link flows to be distance preserving. The parameters of the proposed model are estimated with a training algorithm based on Nesterov accelerated gradient and generally shows fast convergence. We validate the proposed traffic flow estimation model on two real backbone IP network datasets, namely Internet2 and G'EANT. Empirical results show that the proposed model outperforms the state-of-the-art models not only in terms of tracking the individual OD flows but also in terms of standard performance metrics. The proposed model is also found to be highly scalable compared to the existing state-of-the-art approaches.

READ FULL TEXT VIEW PDF

Authors

page 1

page 2

page 3

page 4

10/14/2018

Dimension Reduction for Origin-Destination Flow Estimation: Blind Estimation Made Possible

This paper studies the problem of estimating origin-destination (OD) flo...
01/22/2020

Estimation of Latent Network Flows in Bike-Sharing Systems

Estimation of latent network flows is a common problem in statistical ne...
11/06/2018

A Novel Compressed Sensing Technique for Traffic Matrix Estimation of Software Defined Cloud Networks

Traffic Matrix estimation has always caught attention from researchers f...
06/24/2021

The Associative Multifractal Process: A Novel Model for Computer Network Traffic Flows

A novel constructive mathematical model based on the multifractal formal...
08/19/2016

Network Volume Anomaly Detection and Identification in Large-scale Networks based on Online Time-structured Traffic Tensor Tracking

This paper addresses network anomography, that is, the problem of inferr...
04/24/2020

CFR-RL: Traffic Engineering with Reinforcement Learning in SDN

Traditional Traffic Engineering (TE) solutions can achieve the optimal o...
02/12/2021

Jacobian Determinant of Normalizing Flows

Normalizing flows learn a diffeomorphic mapping between the target and b...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Since the emergence of Internet, network parameter estimation has emerged to be an important but challenging research topic whose goal is to infer the network parameters that cannot be measured, such as end-to-end traffic volume and link delays. The significance of this research domain is due to its vital role in the network engineering tasks such as capacity planning Xie et al. (2018), network optimization Xiao et al. (2019), expansion and monitoring of the network Hu et al. (2008); Tune and Veitch (2011), congestion avoidance Pan et al. (2019)

, and anomaly detection 

Li et al. (2019). In this paper, we focus on the network traffic matrix estimation problem that estimates the origin-to-destination traffic volumes using readily available routing matrix and easily observable link traffic volume.

Consider a network which consists of links and origin-destination (OD) pairs. Let the network be observed at time , and let us denote and

the OD flow and link flow vectors at time

, respectively. Further, let be the binary routing matrix of the network such that if the routing path for the th OD pair of the network passes through the th link of the network. Then, the following relationship between link flow vector and OD flow vector holds:

(1)

When the network is monitored for a time period , (1) becomes:

(2)

where the link flow matrix and traffic matrix are formed by stacking the column vectors ’s and ’s, for .

The traffic estimation problem is the linear inverse problem of recovering from and using (2). More precisely, the goal is to estimate the OD flows at the future timestamps (for ) provided that the corresponding link flows , the binary routing matrix , and the historical OD flows and link flows measurements are given. It is an ill-posed problem for all practical networks because the number of links in the networks is typically much smaller than the OD pairs in the network, that is, . Hence, finding a solution for this problem is challenging because of the following two reasons: (1) Due to the rapid advancement in the network technology and its abundance at an affordable price, the size of modern networks are growing day by day Tune et al. (2013). Hence, the difference between the number of links and that of OD pairs is becoming more and more prominent. (2) Due to the increase in diversity of applications running on the modern Internet, the statistical features of traffic are being further complicated Wu et al. (2011).

Dimensionality reduction techniques play vital rule in network traffic estimation problem. It is because OD flows when observed over a period of time , that is, the OD flow matrix , have diurnal cycles hence high-dimensional OD flows can be mapped to low-dimensional latent flows using a dimensionality reduction technique such as the truncated SVD. Let us consider the following low-rank approximation

(3)

where and are the left and right low-rank factors, and is the factorization rank. The matrix encodes the spatial features of OD flows whereas encodes the latent flows associated with the OD flow matrix . Plugging (3) in (2), we obtain

(4)

For a backbone Internet network, it can be safely assumed that the routing matrix and spatial feature matrix remain stable over a long period of time. Thus, the highly ill-posed problem of estimating OD flows using (2) is transformed with the help of dimensionality reduction into the well-posed problem of estimating latent flows using (4), given that has been estimated using historical data. In fact, given a new link flow observation , we can estimate by solving whose solution is given by where is the (left) pseudoinverse of , and obtain .

1.1 Related work

Several approaches have been proposed recently for traffic matrix estimation. Among them dimensionality reduction techniques are the most popular and they are the focus of this paper. The reason why dimensionality reduction techniques can be employed in network traffic estimation is because (1) the backbone Internet traffic is highly concentrated, that is, most of the OD flows use only few links of the network, and (2) OD flows have spatial and temporal similarities, that is, nearby OD pairs have similar traffic and OD flows have diurnal cycles. Soule et al. Soule et al. (2005)

proposed to employ the singular value decomposition (SVD) 

Golub and Van Loan (2013), a well-known unconstrained dimensionality reduction technique, to develop a traffic estimation model. A disadvantage of their model is that it uses the Moore-Penrose pseudoinverse hence requiring an additional step to suppress possible negative OD flows in the initial estimate of the model. In Kumar et al. (2018), Kumar et al. proposed a traffic estimation model based on a relatively new unconstrained dimensionality reduction technique known as CUR Mahoney and Drineas (2009). Its key benefits over SVD are mainly computational efficiency and the interpretability of the low-rank factors that are directly derived from the given data. Recently, Qazi et al. Qazi et al. (2018)

proposed to use the demand matrix and the traffic probability matrix pair to transform the ill-posed traffic estimation problem into an equivalent well-posed problem. However, the two proposed methods 

Kumar et al. (2018); Qazi et al. (2018) suffer from the limitations similar to that of Soule et al. (2005) due to the use of the Moore-Penrose pseudoinverse.

Some researchers have shown keen interest in different neural network architectures to design an effective model for traffic flow estimation. For example, Jiang et al. in 

Jiang et al. (2011)

proposed to employ feed forward neural network (FFNN) for traffic flow estimation modeling. The proposed model BPTME takes link flows as input to FFNN whereas OD flows are yielded as output by the network. The network is trained by back error propagation algorithm. Zhou et al. in 

Zhou et al. (2016) proposed an enhancement over BPTME Jiang et al. (2011) by injecting the routing information into the FFNN as input for improved performance. Due to the architectural simplicity, FFNN does not scale well with size, complexity and dynamics of traffic flows in the modern networks, hence Nie et al Nie et al. (2016)

proposed to a traffic flow estimation model based on deep belief network (DBN). In 

Zhao et al. (2018)

, Zhao et al. used long short term memory recurrent neural network (LSTM-RNN). DLTMP 

Zhao et al. (2018) is found to perform better compared to the contemporary approaches in capturing spatiotemporal dependencies of traffic flows because LSTM-RNN has cyclic connections over time.

Genetic or evolutionary algorithms are also utilized by some researchers in the context of traffic flow estimation 

Zhang et al. (2016); Zhang and Zhang (2016)

. These typically employ quantum-behaved particle swarm optimization (QPSO). In 

Lu et al. (2019), Lu et al. used multi-fractal discrete wavelet transform (MDWT) to split traffic matrix into different frequency component then train the neural network to predict low and high frequency component of traffic matrix. Kumar et al. in Kumar et al. (2020) proposed a multi-view subspace learning technique for traffic flow estimation. They proposed a novel robust approach to obtain traffic flows from multiple traffic views yielded from rather inexpensive existing methods.

In this paper, we consider graph embedding and nonnegative matrix factorization (NMF) to perform traffic flow estimation. Graph embedding has been successfully applied in various research fields such as computer vision and recommender systems. Interested readers may refer to the recent survey paper 

Cai et al. (2018) by Cai et al. on graph embedding for further details. Emami et al. Emami et al. (2019)

have recently used graph embedding in the context of traffic flow estimation. Their approach blends graph embedding with convolution neural network (CNN) for better traffic flow estimation. Moreover, graph embedding has also been used by several researchers in combination with matrix factorization. A seminal research work in the context of NMF along with graph embedding is 

Cai et al. (2011) by Cai et al. Roughan et al. Roughan et al. (2012) have used classical NMF in the scenario of network traffic flow estimation. An autoregression based approach has been proposed recently for capturing temporal dependencies in Yu et al. (2016) by Yu et al. This autoregressive approach maps the high-dimensional time series data into a low-dimensional latent time series. To avoid overfitting, each latent timeseries is autoregressed independently. Further, the authors of Yu et al. (2016) proved that their proposed autoregressive approach has an equivalent graph representation, thus conventional solvers used for graph embedding approaches can be used for this autoregression approach as well.

1.2 Contribution and outline of the paper

Motivated from the benefits of NMF and autoregression approaches, and keeping in mind the limitations of the existing traffic flow estimation models, we propose a novel model for traffic flow estimation that is developed using a unique combination of NMF, autoregression and orthogonality.

NMF is a constrained dimensionality reduction technique which ensures nonnegativity hence is the natural choice for nonnegative network traffic flow estimation. It has already been successfully applied in many engineering applications, including but not limited to, computer vision, signal processing Leplat et al. (2020), hyperspectral sensing Ang and Gillis (2019), and background subtraction Chen and Li (2016). However to the best of our knowledge, we are the first to conduct a comprehensive research with conclusive results for the use of NMF in the context of network traffic flow estimation. Along with the nonnegativity constraint, we also impose two additional constraints on the proposed model for effectively capturing the spatial and temporal features in the traffic flows of the underlying network. The key contribution of this paper is threefold:

  1. A novel multi-constrained nonnegative matrix factorization model is proposed for network traffic flow estimation. The proposed model consists of a dimensionality reduction for capturing spatial features of the network traffic flows with a novel combination of three additional constraints: a) nonnegativity, b) autoregression, and c) orthogonality. The autoregression constraint allows a dynamic learning of autoregression weights and an effective representation of negative correlation between different OD flows which are prominent features of the proposed model and are not available in the conventional graph embedding approaches. The orthogonality constraint provides a distance preserving transformation from link flows to latent flows enabling effective clustering of the given link flows according to their temporal similarities.

  2. We propose an efficient training algorithm using the fast gradient method of Nesterov, while we use a noise resilient initialization strategy that provides a deterministic seeding point to the training algorithm Atif et al. (2019).

  3. We illustrate the effectiveness of the proposed model and algorithm on two publicly available Internet backbone traffic data sets, namely Internet2 Yin Zhang (2004) and GÉANT Uhlig et al. (2006). The empirical results show that the new model performs competitively with the current state-of-the-art models.

The rest of the paper is organized as follows. Section 2 introduces the proposed model in detail, highlighting the key components and their advantages along with differences with existing approaches. Section 3 presents our proposed algorithm to tackle our new model. In Section 4, the proposed model is extensively evaluated on two publicly available Internet traffic data sets, and we show that our proposed model competes favorably with state-of-the-art algorithms.

2 New model for network traffic flow estimation

Let us formulate our proposed NMF-based network traffic estimation model for a network that is monitored for time period , and consists of links and OD pairs. For this time period, we are given , and our goal is to construct a model allowing us to predict for , given and ; see Section 1. More precisely, we are given

  • the routing matrix ,

  • the OD flow matrix ,

  • a factorization rank , and

  • a lag set with . The lag set contains the indices indicating a dependency between the th and th time points, such as in autoregressive models; see Section 2.2 for more details.

The model aims at computing by solving the following optimization problem:

(5a)
(5b)
(5c)
(5d)

where denotes the Frobenius norm, and the objective (5a) is the data fitting term of the model. Let us discuss the constraints in the above model:

  • (5b) ensures nonnegative entries in , and .

  • (5c) imposes that each entry of matrix is the nonnegative weighted sum of entries in the th row of matrix preceding . This is motivated by the autoregression model; see Section 2.2 for the details.

  • (5d) defines a rank- compact routing matrix that has orthonormal columns as stands for a identity matrix; see Section 2.1 for the details.

In the context of dimensionality reduction, the low-rank matrices and are generally called the basis and embedding (encoding) matrix, respectively. However, in the context of traffic matrix estimation, contains the spatial features of the OD flow matrix in low dimension, while can be interpreted as a latent flow matrix defining the latent flows of the underlying network.

In order to incorporate the constraints (5b)-(5d), we will resort to regularization so that the constraints will not be strictly satisfied, but will tend to be. Using regularization, the optimization task in (5) can be formulated as a regularized NMF problem; see Section 3.1.

Let us now discuss the two key constraints in (5), namely orthogonality (5d) and autoregression (5c).

2.1 Orthogonality constraint

The constraint can be equivalently written as , which imposes constraints on . Note that is also nonnegative since and are, and hence this orthogonality constraint requires the columns of to have disjoint supports (that is, the set of nonzero entries of the columns of do not intersect). This is related to the so-called orthogonal NMF model; see Pompili et al. (2014) and the references therein. Hence this constraint requires the support of the columns of to be disjoint as well, imposing to learn different features from the data set. Equivalently, it implies that there is at most a single non-zero entry in each row of and .

Looking back at the model , this means that each row of is approximated as an scaling of a single row of  Pompili et al. (2014). Because we will not enforce (5d) strictly, but use a regularization, our minimization problem is related to soft clustering, which can be effectively used to capture diurnal similarity patterns (temporal dependencies) present in the link count matrix .

Estimation of

Another salient aspect of the proposed model is that the estimate of an OD flows, at time , using the observed link count , is purely nonnegative. In fact, as explained in Section 1, is estimated from the model

and by taking . We have since , so that which is nonnegative since . This desirable feature of the proposed method is due to the additional nonnegativity and orthogonality constraints, (5b) and (5d). To the best of the authors’ knowledge, this aspect has been missing in current state-of-the-art methods that generally employ an additional step of setting negative entries to zero.

2.2 Temporal modeling using autoregression

The constraint (5c) defines independent autoregression models for timeseries of latent flows corresponding to the rows of the matrix . For the th timeseries of latent flows, the autoregression model approximates every element in that timeseries as a weighted sum of its previous elements, that is,

When computing the weighted sum for an element , not all elements preceding it are taken into account but only the elements for , where is the lag set. In our model (5c), all autoregressive models share a common lag set , but there are weight vectors gathered in the matrix . The lag have indices to indicate positive correlation among OD flows. In particular, the OD flows and are assumed to be correlated if . Equivalently, this means that the OD flow vectors and are assumed to be correlated, since our model assumes for all .

Modeling temporal dependencies via (5c), that is, using autoregressive models sharing a common lag set , has several advantages over using a multivariate autoregressive model or over conventional graph embedding approach:

  1. It requires only nonnegative weights in contrast to multivariate autoregressive models which require nonnegative weights. Hence, it is less prone to overfitting and noise.

  2. There is no restriction on defining indices of the lag set . In contrast, conventional graph embedding approaches (e.g. Cai et al. (2011)) either use a lag set with few elements with short dependencies, or require a prior knowledge of temporal dependencies which is generally not available.

  3. Unlike conventional graph embedding approaches (e.g. Cai et al. (2011)), weights of autoregression models (that is, ) can be learnt dynamically.

  4. Like conventional graph embedding approaches such as Cai et al. (2011), each of the autoregression models have an equivalent graph representation as explained in Yu et al. (2016); see Figure 1(a).

    (a)
    (b)
    Figure 1: (a) A nonnegative weighted graph defined by a typical graph embedding approach and associated to the th row of . (b) A weighted signed graph defined by the th autoregressive model associated to the th row of . In both cases the lag set is supposed to be . In (b) gray colored arrows indicates negatively weighted edges due to negative correlation between the two nodes. This figure is inspired from (Yu et al., 2016, Figures 2-3).

    Hence, the constraint (5c) can be incorporated into the data fitting term (5a) using exiting techniques like Laplacian regularizer.

  5. Unlike conventional graph embedding approaches such as Cai et al. (2011), the graph associated with the th autoregression model may contain negatively weighted edges to indicate negative correlation between two nodes of the graph; see Figure 1(b) for an illustration.

We refer the interested readers to Yu et al. (2016) and the references for more details.

3 Training our traffic estimation model (5)

In this section, we first provide our regularized NMF model to tackle (5) in Section 3.1. In fact, in practice, because of noise and model misfit, it is not reasonable to strictly enforce the constraints (5c) and (5d) (in fact, the orthogonality constraint could even make the problem infeasible), and hence it makes more sense to only penalize the solutions that violate these constraints. Then we propose a fast gradient method to solve it in Section 3.2.

3.1 Incorporation of constraints into the model

Let us consider the following regularized NMF-based network traffic estimation model, which replaces the constraints (5c) and (5d) in (5) with regularization terms:

(6)

where is the temporal regularizer to incorporate the constraint (5c) with (resp. ) the th row of (resp. ), and is the orthogonality regularizer to incorporate the constraint (5d); see below for more details. The positive numbers and are the penalty parameters for the regularizers and respectively and, and are the th rows of and respectively.

Let us briefly discuss the two regularizers.

Orthogonality

The orthogonality regularizer in (6) corresponds to the orthogonality constraint (5d) and is defined as:

This is standard least-squares penalty for orthogonality constraints; see for example Ahookhosh et al. (2019).

Autoregression

We use the th temporal regularizer from Yu et al. (2016), which is equivalent to the following Laplacian regularizer:

(7)

where is the weight matrix of the graph and is the diagonal matrix such that:

where and , and

The graph Laplacian is defined as:

The first term in (7) is standard for Laplacian regularizers of conventional graph embedding approaches (see, e.g., Cai et al. (2011)), whereas the second term is specific to the autoregression model Yu et al. (2016) due to negative edges in the associated graph indicating negative correlation between nodes; see  Figure 0(b).

3.2 Fast gradient algorithm

In this section, we describe a training algorithm for the network traffic estimation model (6). The algorithm is iterative and is based on block coordinate descent method, as most NMF algorithms Gillis (2020). Such iterative algorithms require an initialization. The initial iterates are denoted , and which we compute as follows:

  1. For the given training OD flow matrix and the rank of factorization , the matrices and are initialized by (, ) = NNSVD-LRC(, ) where NNSVD-LRC is an effective initialization for NMF based on the SVD Atif et al. (2019).

  2. The th row of is initialized as the projection of the unconstrained solution:

    where is the projection onto the nonnegative orthant, that is,

    where denotes the entry of at position , is the (left) pseudoinverse of and is defined as:

1: is an -by- training OD flow matrix, , and are the initial values for , and respectively, is the maximum number of iterations, and is a threshold for early stopping when the differences between two consecutive errors is small enough.
2:Final values for , and and solving (6)
3:Initialize: ; ; ; ; ; ; ;
4:repeat
5:    FastGradientUpdate();
6:    FastGradientUpdate();
7:    FastGradientUpdate();
8:   ; ; ;
9:until 
Algorithm 1 Solving the traffic matrix estimation model (6), MCST-NMF

Given the initial estimate , Algorithm 1 further improves the solution via an iterative process. We use a standard strategy in NMF, that is, alternatively updating the block variables . For each block, we use a first-order accelerated gradient descent method with optimal convergence rate Nesterov (1983); as done in Guan et al. (2012) for the standard NMF model. In a nutshell, such methods take a gradient step from an extrapolated sequence.

Algorithm 1 describes the proposed training algorithm in detail. It takes as as input the initial values of three free parameters , and , the training OD flows , the routing matrix , the rank of factorization , the lag set , and the maximum number of iterations . In each iteration Algorithm 1 alternatively optimizes , and using the accelerated gradient method as described in Algorithm 2 with help of Table 1. Note that we use a restarting mechanism in the fast gradient method which typically performs well in practice; see the discussion in O’donoghue and Candes (2015).

variable quantity formula
, for .
Table 1: Computations needed for Algorithm 2, FastGradientUpdate(), depending on the updated variable, is either , or .
1: is an -by- training OD flow matrix, , , and are the three free parameters, the maximum iterations for loop, and the minimum differences between two consecutive errors w.r.t. initial error for early stopping before iterations respectively.
2:Updated by solving (6) for (where can be , or )
3:Initialize: ; ; ; ;
4:Compute as described in Table 1
5:;
6:repeat
7:   ;
8:   Compute and as described in Table 1.
9:   ;
10:   ;
11:   Compute as described in Table 1
12:   ;
13:   if  then
14:      ; ;
15:   end if
16:   ; ; ;
17:until 
Algorithm 2 FastGradientUpdate()

Because accelerated gradient methods do not ensure the objective function to decrease at every iteration, and because we optimize alternatively , we embed Algorithm 2 with a restarting scheme: if the objective function increases, the algorithm abandon the extrapolated sequence and takes a standard gradient step, as done for example in Xu and Yin (2013). This ensures the objective function will decrease.

Choice of the penalty parameters

It is not an easy task to tune the two penalty parameters and for the regularization terms in (6). Given the initial iterate , we will use

(8a)
(8b)

where . This choice allows to balance the importance of the penalty terms compared to the data fitting term, at initialization. Like other existing regularized models, e.g., Roughan et al. (2012), the tuning of the penalty parameters, and in our model, is a difficult task and typically problem dependent (nature of the given network, noise level, etc.). In practice, a useful way to tune parameters is to use cross validation (see, e.g., Berrar (2018)), that is, use training sets to train a model with different parameter values, and then select the values of the parameters that lead to the best results on the test sets. For our problem, a value of and around 10-20% typically works well in practice.

Handling the missing entries

It is usual that the traffic matrix used for training a given traffic estimation model contains missing entries, even in the presence of a good measurement system Tune et al. (2013). To the best of our knowledge, all the existing state-of-the-art algorithms assume that the training data is full and complete. To address the issue of an incomplete training dataset, we propose two approaches. The first approach preprocesses the given training dataset to fill in the missing entries using a weighted NMF model trained using a fast gradient method Dorffer et al. (2017). The second approach modifies Algorithm 1

using the expectation maximization strategy proposed in 

Zhang et al. (2006).

Denoting the binary mask matrix so that if and only if is observed, the second approach for handling the missing entries modifies Algorithm 1 to use the data matrix instead of at iteration , where is defined as

where is the element-wise matrix multiplication and is the all one matrix.

3.3 Estimation of future traffic flows

1: an -by- routing matrix, the -by- matrix, the new observed link flow , , and the maximum number of iterations for fast gradient descent and expectation maximization iteration steps respectively, are the minimum difference between two consecutive errors for early stopping before and iterations of fast gradient descent expectation maximization iteration steps, respectively.
2:Final estimated OD flow for timestamp .
3:Initialization for fast gradient steps:
4:; ; ; ;
5:; ; ;
6:; ;
7:repeat
8:   ;
9:   ;
10:   ;
11:   ; ;
12:   if  then
13:      ; ; ;
14:   end if
15:   ; ; ;
16:until 
17:Initialization for expectation maximization steps:
18:; ;
19:; ; ; ;
20:repeat
21:   for  do
22:      
23:   end for
24:   ; ; ;
25:until 
26:;
Algorithm 3 Estimating OD flows

It is a common practice to improve the solution obtained by the traffic estimation model using an Expectation Maximization (EM) iterative algorithm or iterative proportional fitting (IPF) algorithm; see for example Soule et al. (2005); Jiang et al. (2011); Zhou et al. (2016). In this paper, we use the EM approach from Vardi (1996b) combined with a fast gradient method, which is empirically found to be more effective than applying the EM algorithm alone. Algorithm 3 provides the exact details of our traffic estimation procedure.

Once the proposed model (6) has been trained using Algorithm 1 over a given network, let us explain how the new unobserved OD flows can be estimated as using the estimated parameters , the routing matrix , and the observed link flow ; see Algorithm 3. The algorithm first estimates the latent flow as ; see Section 2.1. It is then refined using a few steps of projected fast gradient descent applied on the following minimization problem:

Using the final estimated latent flow after th iteration of the projected fast gradient method, initialize and . The th iteration of EM for solving (1) is as follows Vardi (1996b, a):

where and are the th and th OD flows of respectively, is the th link flow of and is the th element of the binary routing matrix . The output of the EM algorithm after th iteration will be the final estimated OD flow and be the output of Algorithm 3.

4 Experimental Results

In this section, we evaluate the performance of our proposed traffic estimation model which we refer to as MCST-NMF (if the training dataset has missing entries, we use W-NeNMF Dorffer et al. (2017) by default to fill in the missing entries), otherwise as MCST-NMC when missing entries are dealt with using EM. Moreover, we consider the following three state-of-the-art methods for the performance comparison a) PCA Soule et al. (2005), b) MNETME Zhou et al. (2016), and c) CS-DME Qazi et al. (2018). We consider these state-of-the-art because they are all based on linear dimensionality reduction, like MCST-NMF; see Section 1.

Performance metrics

Given the true the true OD flow matrix of a test data and the corresponding estimated OD flow matrix by an algorithm, we use the two performance metrics SRE and TRE. They are row and column vectors of dimension and respectively. The th and th elements of SRE and TRE vectors are defined as follows:

(9)
(10)

The scalar SRE is the normalized mean squared error of th OD flow among all test timestamps, whereas TRE is the normalized mean squared error of the th test timestamp among all OD flows.

These two performance metrics, SRE and TRE, are widely used in the literature to measure the performance a given network traffic estimation model. SRE primarily describes how well a given model is able to estimate a given OD flow over a given time period . Hence, it provides us the insight about the performance of the given model w.r.t. a given OD flow. In contrast, TRE describes the efficiency of a given model to estimate all OD flows in the given network at a particular timestamp . Hence, it provides us with the insight about the performance of the given model w.r.t. a given timestamp.

Tables 2 and 4 report the statistical properties of these two vectors, that is, of the SRE and TRE on the test dataset.

Datasets

The performance of different models is evaluated over two publicly available and widely used Internet traffic datasets, namely Internet2 Yin Zhang (2004) and GÉANT Uhlig et al. (2006). Internet2 (a.k.a. Abilene) is the high-speed backbone network of US. It consists of routers, OD flows and, and bidirectional internal and external links respectively. The topology of Internet2 network is depicted in Figure 2-(a). The dataset Yin Zhang (2004) provides the measurements of weeks of OD flows and the routing matrix of the network. The traffic is measured at an interval of min and reported in the unit of bytes. GÉANT is a research and educational network for Europe which is almost twice as large as Internet2. Figure 2-(b) illustrates the topology of the network. It has routers, OD flows and, and bidirectional internal and external links respectively. GÉANT Uhlig et al. (2006) is a collection of 4 months of OD flows measurements and the routing information of the network. The corresponding link flow matrix can be obtained using (2). Traffic measurements are in unit of kpbs and were performed at an interval of minutes. Note that, in both cases, the link flow matrix can be calculated using (2).

(a) Internet2 topology
(b) GÉANT topology
Figure 2: (a) Internet2 topology. (b) GÉANT topology. Routers are numbered green coloured circles. Links are red coloured arrows. Only internal links are shown.
Parameter setting for experiments

For experiments on the two datasets Internet2 and GÉANT, we use the following specifications:

  1. For Internet2, the initial days time slots ( time slots) of traffic data are used during experiments. The initial days ( time slots) of traffic data in this time slot are used for training, and the subsequent days ( time slots) traffic data are used for testing. For GÉANT, the three weeks of traffic data ( time slots) are used during experiments. The initial two weeks of traffic data ( time slots) are used for training of models, and the subsequent one week of traffic data ( time slots) is used for testing.

  2. The factorization rank is set to for all models.

  3. The models other than the proposed model are run with their default parameters.

  4. During training MCST-NMF, the penalty parameters and for regularization terms are obtained using (8) by setting . For Internet2, . For GÉANT, and respectively.

  5. During training MCST-NMF, the maximum number of iterations is set to , while , and are set to . The parameter is set to , and are set to , and is set to .

  6. For testing MCST-NMF, Algorithm 3 parameters are set as follows: a) and are set to and , respectively, and b) and are set to .

Choice of the lag set

For training MCST-NMF using Algorithm 1 requires a lag set. We strategically choose the lag set during experiments by keeping the general temporal behaviour of backbone computer network in our mind i.e. a) network traffic behaviour persist for short period of time, b) network traffic is likely to repeat its behaviour on a hourly basis, c) network traffic may be similar after hours at the beginning and end of working hours and d) network traffic exhibit diurnal pattern. Based on the described strategy, the chosen lag sets are and for Internet2 and GÉANT respectively.

All tests are preformed using MATLAB R2018b (Student License) under Windows 10 environment on a laptop Intel CORE i5-3YY6U768Y0M CPU @2.60GHz 4GB RAM. The code of our proposed methods is available from https://github.com/5y3datif/MCST-NMF.

4.1 Experiments on Internet2

MCST-NMC MCST-NMF MNETME CS-DME PCA
minimum 0.07 0.05 0.07 0.81 0.07
maximum 1.36 1.34 11.86 1.03 2.37
mean 0.40 0.39 0.69 0.92 0.51
median 0.36 0.36 0.43 0.92 0.43
standard
deviation
0.25 0.25 1.23 0.03 0.35
(a) Statistical properties of the SRE.
MCST-NMC MCST-NMF MNETME CS-DME PCA
minimum 0.05 0.06 0.13 0.80 0.08
maximum 0.30 0.31 0.37 0.97 0.54
mean 0.18 0.18 0.26 0.91 0.30
median 0.18 0.19 0.26 0.91 0.32
standard
deviation
0.04 0.05 0.04 0.01 0.08
(b) Statistical properties of the TRE.
Table 2: Statistical properties of the traffic estimation errors, SRE (9) and TRE (10), on the test set for the Internet2 data set. The lowest (best) values are highlighted in bold.

Let us examine the results on the Internet2 dataset to assess the performance of our two proposed methods compared to the state of the art. Table 2 presents the statistical properties of SRE and TRE of the different methods.

SRE of our proposed methods, MCST-NMF and MCST-NMC, is the best in terms of the minimum, mean, median and standard deviation with values

, , and , respectively. The closest competitor in terms of the minimum, mean and median of SRE has values larger by , and , respectively. In terms of the maximum value of SRE, our methods are the second best ( and ). Similarly, TRE of MCST-NMC and MCST-NMF have the lowest values w.r.t. all four criteria, namely the minimum ( and), maximum ( and ), mean ( and ) and median ( and ). Its nearest competitor with respect to corresponding criteria has values larger by , , and , respectively.

To better visualize the differences, Figure 3

displays the cumulative distribution functions (CDF) of the SRE and TRE of the different methods. We observe that

of OD flows estimated by our proposed methods (MCST-NMC and MCST-NMF) have SRE of at most . In contrast, the nearest competitor, PCA, has SRE of at most for of OD flows that is higher comparing to that of MCST-NMF. Results for the TRE for MCST-NMC and MCST-NMF are even more striking. All the models show low TRE but of OD flows estimated by our proposed methods (MCST-NMC and MCST-NMF) has TRE of at most and , respectively, which is and smaller than the second best model, namely CS-DME (with value ) as shown on Figure 2(b).

(a) CDFs of SRE
(b) CDFs of TRE
Figure 3: Cumulative distributed function (CDF) of SRE (left) and TRE (right) resulting from comparing traffic estimation methods over Internet2.

4.1.1 Impact of the regularization parameters

This section analyzes the impact of the regularization parameters, and , on the performance of the proposed methods when tested over the Internet2 dataset. To do so, we consider four cases by setting and to their default value or to , with a total of four cases. All the remaining parameters are kept as described in the paragraph Parameter setting for experiments during all the conducted experiments. The obtained results are reported in Table 3.

SRE TRE
MCST-NMF MCST-NMC MCST-NMF MCST-NMC
,
,
,
,
Table 3: Impact of the regularization parameters with respect to SRE and TRE for the Internet2 dataset. Lowest values are in bold.

We observe that our two methods have the best performance when both the regularization terms are used. Using only one regularization term or none of the two regularization terms deteriorate SRE and TRE upto and respectively. This illustrates the effectiveness of the two regularization terms. Note that the case represent the case where the lag set which has a significant impact on the TRE; see Table 3.

4.2 Experiment on GÉANT

We now perform the same results as for Internet2 on the GÉANT data set. Table 4 presents the values of SRE and TRE for the different methods.

MCST-NMC MCST-NMF MNETME CS-DME PCA
minimum 0.00 0.00 0.00 0.12 0.02
maximum 23 46 59 30 37
mean 5.71 14.21 14.87 2.53 10.47
median 0.82 0.86 0.87 1.27 0.91
standard
deviation
6.84 22.34 14.76 3.75 86.99
(a) Statistical properties of SRE
MCST-NMC MCST-NMF MNETME CS-DME PCA
minimum 0.05 0.05 0.06 0.41 0.06
maximum 0.26 0.26 0.60 1.13 0.38
mean 0.09 0.09 0.13 0.64 0.12
median 0.08 0.08 0.10 0.61 0.11
standard
deviation
0.03 0.03 0.10 0.12 0.05
(b) Statistical properties of TRE
Table 4: Statistical properties of SRE and TRE for PCA, MNETME, CS-DME and MCST-NMF over GÉANT test data. Lowest values are in bold.

The SRE of our proposed methods is the best in terms of the median. Moreover, the matrix completion variant of our proposed model, MCST-NMC, is the best in terms of all criteria except except for the mean and standard deviation. The closest competitors of MCST-NMC in terms of the minimum, maximum and median of SRE has values larger by , and , respectively. The TREs of our proposed methods are the same. The closest competitor in terms of the minimum, maximum, mean, median and standard deviation has value larger by , , , and , respectively. To further illustrate the differences, Figure 4 shows the cumulative distribution functions (CDF) of the different methods. We observe that of OD flows estimated by our proposed methods, MCST-NMC and MCST-NMF, have SRE of value at most and , respectively. The second best performing model (MNETME) has SRE of at most for of OD flows. This value is larger by compared to MCST-NMC; see Figure 3(a). Results of TRE for MCST-NMC and MCST-NMF are very similar. All the models show low TRE but of OD flows estimated by MCST-NMC and MCST-NMF have TRE of at most whereas the second best model, PCA, has TRE of at most for of OD flows. This value is larger by ; see Figure 3(b).

(a) CDFs of SRE
(b) CDFs of TRE
Figure 4: Cumulative distributed function (CDF) of SRE (left) and TRE (right) resulting from comparing traffic estimation models over GÉANT test data. (On the right plot, the TRE of the two proposed variants, MCST-NMF and MCST-NMC, overlap.)

4.2.1 Impact of the regularization parameters

This section analyzes the impact of the regularization parameters, and on the performance of proposed solutions when tested over GÉANT dataset, exactly as for the Internet2 dataset. Table 5 reports the values of the SRE and TRE for the different values of the regularization parameters.

SRE TRE
cases MCST-NMF MCST-NMC MCST-NMF MCST-NMC
,
,
,
,
Table 5: Impact of the regularization parameters with respect to SRE and TRE over GÉANT. Lowest values are in bold.

We observe that both proposed methods exhibit the best performance when the regularization terms are used. This is particularly true for the SRE, with a significant deterioration for both methods; namely from 1.20 to 4.25 for MCST-NMF, and from 1.12 to 1.80 for MCST-NMC. Moreover, we observe that the use of the lag set is particularly important to have low SRE values, as using significantly impacts the SRE.

5 Conclusion

In this paper, we have proposed an NMF-based approach to tackle the network traffic flow estimation problem. To the best of our knowledge, it is the first time NMF is used for this specific task. A notable shortcoming of previously explored dimensionality-reduction approaches (e.g., based on PCA) was to solve these problems by ignoring the nonnegativity constraints. Moreover, our approach uses two regularizers, namely orthogonality to better cluster the data, and autoregression to take the temporal correlations into account, which further improves its performance.

We proposed their two different variants of our model: (1) MCST-NMF that is most suitable when the training dataset does not have missing entries, and (2) MCST-NMC that is designed specifically to handle missing entries. We have shown on two real-world data sets, namely GÉANT and Internet2, that our methods outperform existing techniques based on linear dimensionality reduction.

Further Works

Recently, deep matrix factorization models have emerged; in particular deep NMF has been shown to be able to capture several layers of meaningful features; see, e.g., Huang et al. (2019, 2020b); De Handschutter and Gillis (2021). It would be interesting to adapt our NMF based traffic matrix estimation model into a deep NMF model. Other directions of research include the use of other NMF models to perform the traffic flow estimation, such as Xu et al. (2021); Nie et al. (2018), or to improve prediction using data coming from other sources than the traffic flows and then use multi-view techniques such as Kumar et al. (2020); Huang et al. (2020a).

Acknowledgement

The authors would like to thank the reviewers for the insightful feedback that helped improve the paper significantly.

The financial support of HEC Pakistan is highly acknowledged for granting PhD scholarship to the first author. NG acknowledges the support by the ERC starting grant No 679515, and by the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS project O005318FRG47.

References

  • M. Ahookhosh, L. T. K. Hien, N. Gillis, and P. Patrinos (2019) Multi-block bregman proximal alternating linearized minimization and its application to sparse orthogonal nonnegative matrix factorization. arXiv preprint arXiv:1908.01402. Cited by: §3.1.
  • A. M. S. Ang and N. Gillis (2019) Algorithms and Comparisons of Nonnegative Matrix Factorizations With Volume Regularization for Hyperspectral Unmixing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (12), pp. 4843–4853. External Links: ISSN 1939-1404, 2151-1535, Link, Document Cited by: §1.2.
  • S. M. Atif, S. Qazi, and N. Gillis (2019) Improved SVD-based initialization for nonnegative matrix factorization using low-rank correction. Pattern Recognition Letters 122, pp. 53–59 (en). External Links: ISSN 01678655, Link, Document Cited by: item 2, item 1.
  • D. Berrar (2018) Cross-validation. Encyclopedia of Bioinformatics and Computational Biology 1, pp. 542–545. Cited by: §3.2.
  • D. Cai, X. He, J. Han, and T. S. Huang (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8), pp. 1548–1560. External Links: ISSN 0162-8828, Link, Document Cited by: §1.1, item 2, item 3, item 4, item 5, §3.1.
  • H. Cai, V. W. Zheng, and K. C. Chang (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30 (9), pp. 1616–1637. Cited by: §1.1.
  • R. Chen and H. Li (2016) Online algorithm for foreground detection based on incremental nonnegative matrix factorization. In 2016 2nd International Conference on Control, Automation and Robotics (ICCAR), Hong Kong, Hong Kong, pp. 312–317. External Links: ISBN 978-1-4673-9858-9 978-1-4673-9859-6, Link, Document Cited by: §1.2.
  • P. De Handschutter and N. Gillis (2021)

    Deep orthogonal matrix factorization as a hierarchical clustering technique

    .
    In 2021 28th European Signal Processing Conference (EUSIPCO), Cited by: §5.
  • C. Dorffer, M. Puigt, G. Delmaire, and G. Roussel (2017) Fast nonnegative matrix factorization and completion using nesterov iterations. In International Conference on Latent Variable Analysis and Signal Separation, pp. 26–35. Cited by: §3.2, §4.
  • M. Emami, R. Akbari, R. Javidan, and A. Zamani (2019) A new approach for traffic matrix estimation in high load computer networks based on graph embedding and convolutional neural network. Transactions on Emerging Telecommunications Technologies 30 (6), pp. e3604. Cited by: §1.1.
  • N. Gillis (2020) Nonnegative matrix factorization. SIAM, Philadelphia. External Links: Link Cited by: §3.2.
  • G. H. Golub and C. F. Van Loan (2013) Matrix computations. Fourth edition edition, Johns Hopkins studies in the mathematical sciences, The Johns Hopkins University Press, Baltimore. Note:

    OCLC: ocn824733531Matrix multiplication – Matrix analysis – General linear systems – Special linear systems – Orthogonalization and least squares – Modified least squares problems and methods – Unsymmetric eigenvalue problems – Symmetric eigenvalue problems – Functions of matrices – Large sparse eigenvalue problems – Large sparse linear system problems – Special topics

    External Links: ISBN 978-1-4214-0794-4 Cited by: §1.1.
  • N. Guan, D. Tao, Z. Luo, and B. Yuan (2012) NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization. IEEE Trans. Signal Process. 60 (6), pp. 2882–2898. External Links: ISSN 1053-587X, 1941-0476, Link, Document Cited by: §3.2.
  • C. Hu, S. Wang, J. Tian, B. Liu, Y. Cheng, and Y. Chen (2008) Accurate and Efficient Traffic Monitoring Using Adaptive Non-Linear Sampling Method. In IEEE INFOCOM 2008 - IEEE Conference on Computer Communications, Phoenix, AZ, USA, pp. 26–30. External Links: ISBN 978-1-4244-2026-1 978-1-4244-2025-4, Link, Document Cited by: §1.
  • S. Huang, Z. Kang, and Z. Xu (2020a) Auto-weighted multi-view clustering via deep matrix decomposition. Pattern Recognition 97, pp. 107015. Cited by: §5.
  • S. Huang, Z. Xu, Z. Kang, and Y. Ren (2020b) Regularized nonnegative matrix factorization with adaptive local structure learning. Neurocomputing 382, pp. 196–209. Cited by: §5.
  • S. Huang, P. Zhao, Y. Ren, T. Li, and Z. Xu (2019) Self-paced and soft-weighted nonnegative matrix factorization for data representation. Knowledge-Based Systems 164, pp. 29–37. Cited by: §5.
  • D. Jiang, X. Wang, L. Guo, H. Ni, and Z. Chen (2011) Accurate estimation of large-scale IP traffic matrix. AEU - International Journal of Electronics and Communications 65 (1), pp. 75–86 (en). External Links: ISSN 14348411, Link, Document Cited by: §1.1, §3.3.
  • A. Kumar,