1 Introduction
Many real-world data, ranging from physical systems, social interactions, network flows and knowledge graphs to biological and chemical reactions, are represented as graphs, especially for anomaly or community detection [1, 2, 3, 4, 5, 6, 7, 8, 9] and graph signal processing [10, 11, 12, 13, 14]. Dimensionality reduction methods on graphs allow one to decompose a graph into principal components using a spectral decomposition of the graph adjacency or graph Laplacian matrix. In this paper we propose a general framework for dimensionality reduction based on spectral decomposition of a matrix composed of many different graph centrality statistics. This general framework leads to a single-graph decomposition method that extends graph principal components analysis (PCA) and a graph-ensemble decomposition method that extends dictionary learning. These methods are applicable to both directed and undirected graphs with edge weights, and both rely on the singular value decomposition (SVD) of the multi-centrality matrix. The proposed methods are denoted multi-centrality graph PCA (MCGPCA) and multi-centrality graph dictionary learning (MCGDL), respectively. By integrating multiple descriptions of graph centrality, the proposed methods provide graph community detection and graph structure learning that are significantly more robust to noise and variation affecting graph connectivity structures.
In [15], a form of graph PCA is performed on the distance matrix of average commute times between nodes. In [16], PCA is performed on the graph Laplacian matrix of nodal similarities. In [17], the graph Laplacian matrix is used as a smooth regularization function for robust PCA on graphs. In [18, 19], PCA is performed on the matrix of origin-destination traffic. In [20, 21, 22], dictionary learning methods for graph signals are proposed based on the graph Laplacian matrix. Dictionary learning, also known as sparse coding, linear unmixing and matrix factorization, has been applied to collections of images, audio, and graph signals to learn low-dimensional representations that give a sparse approximation to the entire collection. Dictionary learning finds a low-rank factored-matrix approximation to the observation matrix, whose columns span this collection. Many different methods for this approximation problem have been proposed [23, 24]. Among the simplest methods is the K-SVD approach [25], which uses a spectral decomposition to determine the best low-rank approximation to the observed matrix. For the purposes of illustration, in this paper we adopt this latter spectral approach for learning a dictionary spanning an ensemble of graphs.
More often, graph PCA and graph dictionary learning approaches start with a set of raw multivariate data samples, create a similarity (or dissimilarity) graph of the data samples, and aim to learn a low-dimensional or sparse representation of the original multivariate dataset. When applied to graph data, these methods are often limited to graphs that are weighted, undirected and connected, an assumption that may not hold for applications such as cyber network data analysis. Furthermore, these methods often accomplish graph decomposition based on a single measure of centrality, e.g., betweenness centrality [26], closeness centrality [27], ego centrality [28], or eigenvector centrality [1]. In this paper we introduce graph spectral decomposition methods that combine multiple centrality measures, such as graph walk statistics and graph distances, as structural features and apply them to different graph types including weighted, directed and disconnected graphs. The proposed MCGPCA method decomposes a single graph utilizing multiple centrality features, achieving dimensionality reduction and feature decorrelation of the graph. The proposed MCGDL performs dictionary learning across a population of graphs using multiple centrality features to learn the atoms of the dictionary and the corresponding coefficients to represent each individual graph in terms of its projection onto the dictionary. Applying our approach to cyber intrusion detection, we use MCGPCA to define a structural difference score (SDS) that reflects structural variations within a graph and we use MCGDL to learn discriminative structural atoms for classifying the presence of cyber attacks.
2 Structural Feature Extraction on Graphs
Here we describe three categories of generic structural features that can be extracted from a graph, namely graph walk statistics, centrality measures and inter-node distances. The utility of the introduced features with respect to different graph types, including weighted, directed and disconnected graphs, is summarized in Table 1. While not investigated in this paper, application-specific features such as website hit rates, social interaction frequency and source-destination traffic can also be leveraged as structural features. Without loss of generality a graph $G = (\mathcal{V}, \mathcal{E})$ can be characterized by two matrices $\mathbf{A}$ and $\mathbf{W}$ representing the adjacency and weight matrix, respectively, where $\mathcal{V}$ ($\mathcal{E}$) is the set of nodes (edges), and $n = |\mathcal{V}|$ is the total number of nodes (i.e., graph size). $\mathbf{A}$ is a binary $n \times n$ matrix such that its entry $[\mathbf{A}]_{ij} = 1$ if there is an edge connecting from node $i$ to node $j$, and $[\mathbf{A}]_{ij} = 0$ otherwise. Throughout this paper we consider graphs with nonnegative edge weights such that $\mathbf{W}$ is a nonnegative $n \times n$ matrix, where its entry $[\mathbf{W}]_{ij} > 0$ if $(i,j) \in \mathcal{E}$, and $[\mathbf{W}]_{ij} = 0$ otherwise.
2.1 Graph walk statistics
Graph walk statistics include commute time and cover time [29], graph diffusion [30], hitting times [31], and $h$-hop walks. In this paper we focus on $h$-hop walk statistics. An $h$-hop walk of a node on a graph is a path starting from the node and traversing through $h$ (possibly repeated) edges. An $h$-hop walk weight is defined as the sum of edge weights of the corresponding path.
We consider the number and total weight of $h$-hop walks of each node as features since they capture the structural information of nodal reachability relative to its $h$-hop vicinity.
In principle one should extract graph walk statistics from $h = 1$ to at least $h = D$ hops as structural features, where the graph diameter $D$ is the largest shortest-path hop count between any node pair in all connected components of a graph.
We propose an efficient iterative method to incrementally compute these two structural features with respect to the hop count $h$:
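Iterative computation of number of $h$-hop walks
Let $\mathbf{A}^h$ denote the matrix product of $h$ copies of $\mathbf{A}$. Observe that the $(i,j)$ entry of $\mathbf{A}^2$, $[\mathbf{A}^2]_{ij} = \sum_{k=1}^{n} [\mathbf{A}]_{ik} [\mathbf{A}]_{kj}$, is the number of 2-hop walks from $i$ to $j$. Extending this result to $\mathbf{A}^h$, $[\mathbf{A}^h]_{ij}$ is the number of $h$-hop walks from $i$ to $j$. Let $\mathbf{c}_h$ be an $n \times 1$ column vector whose $i$-th entry is the number of $h$-hop walks starting from $i$, and let $\mathbf{1}$ denote the $n \times 1$ column vector of ones. Then $\mathbf{c}_h$ can be computed by the matrix-vector product iteration
$$\mathbf{c}_h = \mathbf{A}^h \mathbf{1} = \mathbf{A} \mathbf{c}_{h-1}, \qquad \mathbf{c}_0 = \mathbf{1}. \tag{1}$$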
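Iterative computation of total $h$-hop walk weight
Let $\mathbf{W}_h$ be an $n \times n$ matrix such that its $(i,j)$ entry is the sum of all $h$-hop walk weights from node $i$ to node $j$. Splitting each $h$-hop walk into its first edge and the remaining $(h-1)$-hop walk,
we have
$$\mathbf{W}_h = \mathbf{W} \mathbf{A}^{h-1} + \mathbf{A} \mathbf{W}_{h-1}, \tag{2}$$
where we use $\mathbf{W}_1 = \mathbf{W}$. Let $\mathbf{w}_h$ denote an $n \times 1$ column vector such that its $i$-th entry is the total $h$-hop walk weight starting from node $i$. Then $\mathbf{w}_h$ can be computed by
$$\mathbf{w}_h = \mathbf{W}_h \mathbf{1} = \mathbf{W} \mathbf{c}_{h-1} + \mathbf{A} \mathbf{w}_{h-1}, \qquad \mathbf{w}_0 = \mathbf{0}, \tag{3}$$
where $\mathbf{c}_{h-1} = \mathbf{A}^{h-1} \mathbf{1}$ is the vector of $(h-1)$-hop walk counts and $\mathbf{1}$ is the column vector of ones.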
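The two iterations can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `hop_walk_features`, the argument `H` (maximum hop count) and the toy graph are assumptions introduced here, and the weight update relies on viewing an $h$-hop walk as one edge followed by an $(h-1)$-hop walk.

```python
import numpy as np

def hop_walk_features(A, W, H):
    """Return two n x H arrays: walk counts and total walk weights for h = 1..H.

    A is the binary adjacency matrix, W the nonnegative weight matrix.
    Column h-1 of each output holds the h-hop statistic of every start node.
    """
    n = A.shape[0]
    c = np.ones(n)    # 0-hop walk count: the trivial walk at each node
    w = np.zeros(n)   # 0-hop walk weight: no edge traversed yet
    counts, weights = [], []
    for _ in range(H):
        # First edge contributes W @ c_{h-1}; the remainder contributes A @ w_{h-1}.
        w = W @ c + A @ w
        c = A @ c
        counts.append(c)
        weights.append(w)
    return np.column_stack(counts), np.column_stack(weights)

# Toy directed graph (hypothetical): edges 0->1 (2), 0->2 (5), 1->2 (3), 2->0 (1).
A = np.array([[0, 1, 1], [0, 0, 1], [1, 0, 0]], dtype=float)
W = np.array([[0, 2, 5], [0, 0, 3], [1, 0, 0]], dtype=float)
counts, weights = hop_walk_features(A, W, H=2)
```

Each hop costs two matrix-vector products, so no matrix power is ever formed explicitly, which is what makes the incremental scheme attractive for sparse graphs.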
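Table 1: Utility of the structural features for different graph types.
| Feature / Graph Type | Weighted | Directed | Disconnected |
|---|---|---|---|
| # of $h$-hop walks | ✓ | ✓ | ✓ |
| total $h$-hop walk weight | ✓ | ✓ | ✓ |
| degree | ✓ | ✓ | ✓ |
| betweenness | ✓ | ✓ | |
| closeness | ✓ | ✓ | |
| eigenvector centrality | ✓ | ✓ | ✓ |
| ego | ✓ | ✓ | ✓ |
| LFVC | ✓ | ✓ | |
| graph distance | ✓ | ✓ | |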
2.2 Centrality measures
A centrality measure is a quantity that evaluates the level of importance or influence of a node in a graph and it reflects certain topological characteristics.
Here we introduce several centrality measures, which will be used in the sequel to define feature sets associated with a graph or a set of graphs.
Degree. Degree is defined as the number of edges associated with a node. It can be extended to directed graphs by considering the number of edges connecting to (from) a node as in-degree (out-degree).
Betweenness [26]. Betweenness is the fraction of shortest paths passing through a node relative to the total number of shortest paths in the graph. It is not applicable to disconnected graphs since it is based on shortest path distance.
The betweenness of node $i$ is defined as
$$\text{betweenness}(i) = \sum_{k \neq i} \sum_{j \neq i,\, j \neq k} \frac{\sigma_{kj}(i)}{\sigma_{kj}}, \tag{4}$$
where $\sigma_{kj}$ is the total number of shortest paths from $k$ to $j$ and $\sigma_{kj}(i)$ is the number of such shortest paths passing through $i$.
Closeness [27]. Closeness is associated with the shortest path distances of a node to all other nodes.
Let $\rho(i,j)$ denote the shortest path distance between node $i$ and node $j$ in a connected graph. Then
$$\text{closeness}(i) = \frac{1}{\sum_{j \neq i} \rho(i,j)}. \tag{5}$$
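Eigenvector centrality [1]. Eigenvector centrality of node $i$ is the $i$-th entry of the eigenvector associated with the largest eigenvalue of the weight matrix $\mathbf{W}$. It is defined as
$$\text{eigenvector}(i) = \lambda_{\max}^{-1} \sum_{j=1}^{n} [\mathbf{W}]_{ij}\, \xi_j, \tag{6}$$
where $(\lambda_{\max}, \boldsymbol{\xi})$ is the largest right eigenpair of $\mathbf{W}$.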
Ego centrality [28]. Ego centrality can be viewed as a local version of betweenness that considers only the shortest paths between the neighbors of a node. Let $d_i$ denote the degree of node $i$, $\mathbf{W}(i)$ denote the local weight matrix of node $i$, $\mathbf{I}$ be the identity matrix, and let $\circ$ denote entrywise matrix product. Ego centrality is defined as
(7) 
Local Fiedler Vector Centrality (LFVC) [32]. LFVC is a centrality measure that evaluates the structural importance of a node regarding graph connectivity. Let $\mathbf{y}$ denote the eigenvector associated with the smallest nonzero eigenvalue of the graph Laplacian matrix. LFVC is defined as
$$\text{LFVC}(i) = \sum_{j \in \mathcal{N}_i} (y_i - y_j)^2, \tag{8}$$
where $\mathcal{N}_i$ is the set of nodes connecting to or from $i$ (i.e., neighbors) and $y_i$ is the $i$-th entry of $\mathbf{y}$.
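As an illustration of how two of these measures reduce to standard eigen-decompositions, the following NumPy sketch computes eigenvector centrality and LFVC. It is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, LFVC is computed on the unweighted Laplacian with edge directions ignored, and a repeated smallest nonzero eigenvalue (where the Fiedler vector is not unique) is not handled specially.

```python
import numpy as np

def eigenvector_centrality(W):
    """Largest eigenvalue of W and its unit-norm right eigenvector."""
    vals, vecs = np.linalg.eig(W)
    k = int(np.argmax(vals.real))
    xi = np.abs(vecs[:, k].real)   # the leading vector of a nonnegative matrix can be taken nonnegative
    return vals[k].real, xi / np.linalg.norm(xi)

def lfvc(A, tol=1e-9):
    """Sum over neighbors j of (y_i - y_j)^2, y being the Fiedler vector."""
    S = ((A + A.T) > 0).astype(float)   # neighbors: nodes connecting to or from i
    np.fill_diagonal(S, 0.0)            # ignore self-loops
    L = np.diag(S.sum(axis=1)) - S      # unweighted graph Laplacian (assumption)
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    y = vecs[:, int(np.argmax(vals > tol))]   # first eigenvalue above tol
    return (S * (y[:, None] - y[None, :]) ** 2).sum(axis=1)

# Path graph 0 - 1 - 2 (hypothetical example).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
lam, xi = eigenvector_centrality(A)
scores = lfvc(A)
```

On this path graph the middle node receives the largest value under both measures, which matches the intuition that removing it disconnects the graph.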
2.3 Graph distances to a set of reference nodes
We propose to use graph distances of each node to a set of reference nodes as structural features that compensate for the insufficiency of graph walk statistics and centrality measures when one performs MCGPCA on graphs with high structural symmetry. For example, consider a star-like graph where the central node is a singleton and each leaf node is an identical clique (i.e., a complete graph). All edges in the graph are undirected and have identical weight. This graph therefore has high structural symmetry, and nodes with identical structural properties (e.g., connected to the central node or not) have the same graph walk statistics and centrality measures. To resolve this ambiguity we use the shortest path distance of each node to the selected reference nodes as additional structural features. In the example of the star-like graph, selecting any node other than the central node as a reference node can yield distinguishable structural features due to the difference in shortest path distance to the reference node. The reference nodes are selected according to a user-specified criterion, e.g., the nodes of maximal degree.
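The symmetry-breaking effect of reference-node distances can be checked on a small star-like graph. The sketch below is illustrative only: the helper names are hypothetical, distances are hop counts obtained by breadth-first search (edge weights are ignored), reference nodes are the nodes of highest total degree, and unreachable nodes are assigned an infinite distance.

```python
from collections import deque
import numpy as np

def hops_to(A, r):
    """Hop distance from every node to reference node r, following edge directions."""
    n = A.shape[0]
    dist = np.full(n, np.inf)
    dist[r] = 0
    queue = deque([r])
    while queue:
        v = queue.popleft()
        for u in np.flatnonzero(A[:, v]):   # nodes u with an edge u -> v
            if np.isinf(dist[u]):
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def distances_to_references(A, n_ref):
    """Pick the n_ref highest-degree nodes and return (references, n x n_ref distances)."""
    degree = A.sum(axis=0) + A.sum(axis=1)           # in-degree plus out-degree
    refs = np.argsort(-degree, kind="stable")[:n_ref]
    return refs, np.column_stack([hops_to(A, r) for r in refs])

# Central node 0 attached to two triangles {1, 2, 3} and {4, 5, 6} via nodes 1 and 4.
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)]
A = np.zeros((7, 7))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
refs, D = distances_to_references(A, n_ref=1)
```

Nodes 2 and 5 play the same structural role in their respective cliques, yet their distances to the selected reference node differ, so the added feature column separates them.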
3 Methodology
The extracted centrality features introduced in Sec. 2 can be represented as an $n \times p$ matrix $\mathbf{X}$, where $n$ is the graph size, $p$ is the number of extracted features and each column of $\mathbf{X}$ corresponds to a particular centrality feature that is normalized to have unit norm. The multi-centrality feature matrix $\mathbf{X}$ is then centered by subtracting the row-wise empirical average from each row.
3.1 Multicentrality graph PCA (MCGPCA)
In analogy to standard graph PCA, which is applied to the graph Laplacian matrix, MCGPCA is PCA applied to the $n \times p$ feature matrix $\mathbf{X}$. PCA can be formulated as finding an orthonormal transformation on $\mathbf{X}$ such that after transformation the multi-centrality feature matrix is represented by an $n \times q$ matrix $\mathbf{Y} = \mathbf{X}\mathbf{Q}$ that maximally preserves the total data variance, which is proportional to $\operatorname{trace}(\mathbf{Y}^T \mathbf{Y})$, where $\operatorname{trace}(\cdot)$ denotes the sum of diagonal entries of a matrix and $\mathbf{Q}$ is a $p \times q$ matrix such that $\mathbf{Q}^T \mathbf{Q} = \mathbf{I}$. Such a matrix $\mathbf{Q}$ is given by the right singular vectors associated with the $q$ largest singular values of $\mathbf{X}$, which we collect in a $p \times q$ matrix $\mathbf{V}_q$. Moreover, the total variance of $\mathbf{Y}$ is proportional to the sum of the squared $q$ largest singular values of $\mathbf{X}$. Therefore using MCGPCA we obtain $q$-dimensional coordinates representing structural scores with respect to the $q$ principal components (i.e., columns of $\mathbf{V}_q$). The algorithm for MCGPCA is summarized in Algorithm 1.
3.2 Structural difference score (SDS)
We use these structural coordinates (i.e., each row of $\mathbf{Y}$) to define a structural difference score (SDS) for each node in a graph. The SDS of node $i$ is associated with the total squared Euclidean distance to its neighboring nodes and its number of edges (i.e., degree $d_i$), which is defined as
(9) 
where $\mathrm{row}_i(\mathbf{Y})$ denotes the $i$-th row of $\mathbf{Y}$, $\|\cdot\|_2$ denotes Euclidean distance, and the denominator is chosen such that the SDS of a singleton node is well-defined.
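The MCGPCA projection and an SDS-style score can be sketched with a single SVD. Several choices below are assumptions made for illustration and are not taken from the paper: the function names, the interpretation of centering as subtracting the average row of the feature matrix, and in particular the normalization of the score by $d_i + 1$, which is only one possible denominator that keeps the score defined for a singleton node.

```python
import numpy as np

def mcgpca(F, q):
    """Project an n x p feature matrix onto its q leading principal components."""
    norms = np.linalg.norm(F, axis=0)
    X = F / np.where(norms > 0, norms, 1.0)   # unit-norm feature columns
    X = X - X.mean(axis=0, keepdims=True)     # subtract the average row (assumption)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Vq = Vt[:q].T                             # right singular vectors, p x q
    return X @ Vq, Vq, s

def sds(Y, A):
    """Neighbor-summed squared distance between score rows, divided by degree + 1."""
    nbr = (A + A.T) > 0                       # neighbors: nodes connecting to or from i
    np.fill_diagonal(nbr, False)
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return (nbr * sq).sum(axis=1) / (nbr.sum(axis=1) + 1)   # "+ 1" is an assumed denominator

# Star with center 0 and leaves 1..3, plus the isolated node 4 (hypothetical features).
A = np.zeros((5, 5))
for leaf in (1, 2, 3):
    A[0, leaf] = A[leaf, 0] = 1.0
F = np.array([[3, 1.0], [1, 2.0], [1, 2.0], [1, 2.0], [0, 0.0]])
Y, Vq, s = mcgpca(F, q=1)
scores = sds(Y, A)
```

Because `Y` equals the leading left singular vectors scaled by the singular values, the trace of `Y.T @ Y` equals the sum of the squared `q` largest singular values, which is the variance statement made in Sec. 3.1.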
3.3 Multicentrality graph dictionary learning (MCGDL)
Consider the case where a set of $g$ graphs $\{G_\ell\}_{\ell=1}^{g}$ is available, each possibly being of different graph size and connectivity pattern, e.g., data from a cyber network at different time instances. Multi-centrality graph dictionary learning (MCGDL) is proposed to learn a sparse structure representation of $\{G_\ell\}_{\ell=1}^{g}$ by finding a dictionary $\mathbf{D}$ consisting of $K$ atoms (columns of $\mathbf{D}$) and an associated sparse coefficient matrix $\mathbf{C}$ such that the representation error $\|\mathbf{Z} - \mathbf{D}\mathbf{C}\|_F^2$ is minimized while satisfying the column-wise sparsity constraint on $\mathbf{C}$ that the number of nonzero entries of each column cannot exceed a specified value $S$, where the columns of $\mathbf{Z}$ are structural features of $\{G_\ell\}_{\ell=1}^{g}$ and $\|\cdot\|_F$ denotes the Frobenius norm. Many different methods exist for solving the dictionary learning problem of estimating $\mathbf{D}$ and $\mathbf{C}$, often called the sparse coding problem [23, 24]. In this paper, we focus on a spectral method (K-SVD) of dictionary learning introduced in [25]. The proposed MCGDL selects the $z$ highest SDS values from each graph as one column of $\mathbf{Z}$ and applies K-SVD to find the dictionary and the corresponding coefficient matrix. The algorithm is summarized in Algorithm 2.
4 Experiments and Cyber Intrusion Detection
4.1 Illustration of sensitivity to structural changes on graphs
Here we consider four similar graphs with different structural characteristics as displayed in Fig. 1 (a). From top to bottom, these four graphs represent high structural symmetry, reduced structural symmetry due to edge removal, increase of the weight of edge (3,4), and change in edge direction. The extracted multi-centrality features are 1) graph walk statistics from 1 to 4 hops, and 2) the graph distance to node 1 (the reference node). It can be observed from Fig. 1 (b) that MCGPCA can reflect structural perturbations, and that the total data variance is explained by one or two principal components. Moreover, the first principal component is shown to completely describe the network flow pattern for the directed example graph. Fig. 1 (c) shows that the graph distance feature adds discrimination power as the MCGPCA scores are better differentiated.
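Table 2: UNB intrusion detection evaluation dataset [33].
| Dataset | # nodes | # edges | Description |
|---|---|---|---|
| Day 1 | 5357 | 12887 | Normal activity |
| Day 2 | 2631 | 5614 | Normal activity |
| Day 3 | 3052 | 5406 | |
| Day 4 | 8221 | 12594 | |
| Day 5 | 24062 | 32848 | |
| Day 6 | 5638 | 13958 | Normal activity |
| Day 7 | 4738 | 11492 | |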
4.2 Cyber intrusion detection
The UNB intrusion detection evaluation dataset [33] described in Table 2 is a collection of directed cyber network graphs where each node is a host (machine) in a cyber system and an edge indicates the existence of communication between hosts. No information beyond graph topology is used for analysis. The extracted multi-centrality features are 1) graph walk statistics from 1 to 20 hops, 2) all centrality measures introduced in Sec. 2.2 (edge directions are omitted for computing LFVC), and 3) graph distances to the 10 reference nodes of highest degree, which together form the feature columns of the multi-centrality matrix. Fig. 2 (a) shows the proposed SDS statistic (Eqn. (9)) computed with the principal components from MCGPCA. The SDS statistics are similar over days without attacks, whereas they are significantly higher on days under attacks that induce anomalous connectivity patterns (i.e., Days 3, 4 and 5). On the other hand, the degree statistic (Fig. 2 (b)) fails to be a valid indicator of cyber attacks. The SDS statistic fails to detect the SSH attack (Day 7) since it is a password attack that takes place only between a single host and a single server.
We applied MCGDL to the entire UNB database of graphs to learn a dictionary that spans the dataset. For this implementation of MCGDL we specify the number of atoms, the number of SDS features and the sparsity level. The two learned structural atoms in Fig. 2 (c) can be interpreted as a normal activity atom, consisting of identical SDS features except for one spike accounting for the main router, and an attack activity atom of higher variance in SDS features. The corresponding coefficients in Fig. 2 (d) reflect the mixture proportion of these atoms and they can be used for attack classification. For instance, K-means clustering on these coefficients identifies Days 3, 4 and 5 as being anomalous and thus under attack.
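The dictionary learning step of MCGDL can be sketched as follows. This is a simplified K-SVD loop written for illustration, not the reference algorithm of [25] and not the authors' code: sparse coding uses a basic orthogonal matching pursuit, atoms are updated by rank-one SVDs of the residual, the dictionary is randomly initialized, and the names `omp`, `ksvd`, `n_atoms` and `s` are hypothetical. In the setting above, each column of `Z` would hold the highest SDS values of one graph; the usage lines use synthetic data instead.

```python
import numpy as np

def omp(D, z, s):
    """Greedy sparse code of z using at most s atoms (columns) of D."""
    residual, support = z.copy(), []
    coef, sol = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(s):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with residual
        if k in support or np.allclose(residual, 0):
            break
        support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        residual = z - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def ksvd(Z, n_atoms, s, n_iter=10, seed=0):
    """Alternate sparse coding and rank-one atom updates; returns (D, C)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Z.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                   # unit-norm initial atoms
    for _ in range(n_iter):
        C = np.column_stack([omp(D, Z[:, j], s) for j in range(Z.shape[1])])
        for k in range(n_atoms):
            users = np.flatnonzero(C[k])             # columns currently using atom k
            if users.size == 0:
                continue
            C[k, users] = 0.0
            E = Z[:, users] - D @ C[:, users]        # residual without atom k
            U, sv, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                        # best rank-one fit updates the atom
            C[k, users] = sv[0] * Vt[0]              # and its coefficients
    return D, C

# Synthetic stand-in for the SDS feature matrix: 10 features for each of 7 graphs.
Z = np.random.default_rng(1).standard_normal((10, 7))
D, C = ksvd(Z, n_atoms=2, s=1)
```

The columns of `C` play the role of the per-graph coefficients, which could then be passed to a clustering routine to separate normal from anomalous graphs.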
5 Conclusion
This paper proposes PCA and dictionary learning graph decomposition methods that are based on multi-centrality features of the graph. The proposed methods can reflect structural perturbations in graph symmetry, edge weight and edge direction. When applied to cyber intrusion detection, our experiments show that MCGPCA and MCGDL can effectively detect attacks on the network.
References
[1] M. E. J. Newman, Networks: An Introduction. Oxford University Press, Inc., 2010.
[2] C. C. Noble and D. J. Cook, “Graph-based anomaly detection,” in ACM SIGKDD international conference on Knowledge discovery and data mining, 2003, pp. 631–636.
[3] E. Hogan, P. Hui, S. Choudhury, M. Halappanavar, K. Oler, and C. Joslyn, “Towards a multiscale approach to cybersecurity modeling,” in IEEE International Conference on Technologies for Homeland Security (HST), 2013, pp. 80–85.
[4] P.-Y. Chen and A. O. Hero, “Assessing and safeguarding network resilience to nodal attacks,” IEEE Commun. Mag., vol. 52, no. 11, pp. 138–143, Nov. 2014.
[5] C. Joslyn, S. Choudhury, D. Haglin, B. Howe, B. Nickless, and B. Olsen, “Massive scale cyber traffic analysis: A driver for graph database research,” in International Workshop on Graph Data Management Experiences and Systems (GRADES), 2013, pp. 3:1–3:6.
[6] P.-Y. Chen and A. Hero, “Phase transitions in spectral community detection,” IEEE Trans. Signal Process., vol. 63, no. 16, pp. 4339–4347, Aug. 2015.
[7] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Mining and Knowledge Discovery, vol. 29, no. 3, pp. 626–688, 2015.
[8] B. Miller, M. Beard, P. Wolfe, and N. Bliss, “A spectral framework for anomalous subgraph detection,” IEEE Trans. Signal Process., vol. 63, no. 16, pp. 4191–4206, Aug. 2015.
[9] K. Oler and S. Choudhury, “Graph based role mining techniques for cyber security,” in FloCon, 2015.
[10] D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013.
[11] A. Bertrand and M. Moonen, “Seeing the bigger picture: How nodes can learn their place within a complex ad hoc network topology,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 71–82, 2013.
[12] A. Anis, A. Gadde, and A. Ortega, “Towards a sampling theorem for signals on arbitrary graphs,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 3864–3868.
[13] X. Wang, P. Liu, and Y. Gu, “Local-set-based graph signal reconstruction,” IEEE Trans. Signal Process., vol. 63, no. 9, pp. 2432–2444, May 2015.
[14] S. Chen, A. Sandryhaila, J. Moura, and J. Kovacevic, “Signal recovery on graphs: Variation minimization,” IEEE Trans. Signal Process., vol. 63, no. 17, pp. 4609–4624, Sept. 2015.
[15] M. Saerens, F. Fouss, L. Yen, and P. Dupont, “The principal components analysis of a graph, and its relationships to spectral clustering,” in Machine Learning: ECML. Springer, 2004, pp. 371–383.
[16] B. Jiang, C. Ding, B. Luo, and J. Tang, “Graph-Laplacian PCA: Closed-form solution and robustness,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3492–3498.
[17] N. Shahid, V. Kalofolias, X. Bresson, M. M. Bronstein, and P. Vandergheynst, “Robust principal component analysis on graphs,” CoRR, vol. abs/1504.06151, 2015. [Online]. Available: http://arxiv.org/abs/1504.06151
[18] A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” in ACM SIGCOMM Computer Communication Review, vol. 34, no. 4, 2004, pp. 219–230.
[19] J. Terrell, K. Jeffay, F. D. Smith, L. Zhang, H. Shen, Z. Zhu, and A. Nobel, “Multivariate SVD analyses for network anomaly detection,” in ACM SIGCOMM Conference Poster Session, 2005.
[20] D. Thanou, D. Shuman, and P. Frossard, “Learning parametric dictionaries for signals on graphs,” IEEE Trans. Signal Process., vol. 62, no. 15, pp. 3849–3862, Aug. 2014.
[21] X. Zhang, X. Dong, and P. Frossard, “Learning of structured graph dictionaries,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2012, pp. 3373–3376.
[22] D. Thanou and P. Frossard, “Multi-graph learning of spectral graph dictionaries,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2015, pp. 3397–3401.
[23] H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in Advances in neural information processing systems (NIPS), 2006, pp. 801–808.
[24] R. Jenatton, J. Mairal, F. R. Bach, and G. R. Obozinski, “Proximal methods for sparse hierarchical dictionary learning,” in International Conference on Machine Learning (ICML), 2010, pp. 487–494.
[25] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, 2006.
[26] L. Freeman, “A set of measures of centrality based on betweenness,” Sociometry, vol. 40, pp. 35–41, 1977.
[27] G. Sabidussi, “The centrality index of a graph,” Psychometrika, vol. 31, no. 4, pp. 581–603, 1966.
[28] M. Everett and S. P. Borgatti, “Ego network betweenness,” Social Networks, vol. 27, no. 1, pp. 31–38, 2005.
[29] L. Lovász, “Random walks on graphs: A survey,” Combinatorics, Paul Erdős is Eighty, vol. 2, no. 1, pp. 1–46, 1993.
[30] M. Gomez Rodriguez, J. Leskovec, and A. Krause, “Inferring networks of diffusion and influence,” in ACM SIGKDD international conference on Knowledge discovery and data mining, 2010, pp. 1019–1028.
[31] L. Galluccio, O. Michel, P. Comon, M. Kliger, and A. O. Hero, “Clustering with a new distance measure based on a dual-rooted tree,” Information Sciences, vol. 251, pp. 96–113, 2013.
[32] P.-Y. Chen and A. O. Hero, “Local Fiedler vector centrality for detection of deep and overlapping communities in networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 1120–1124.
[33] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Toward developing a systematic approach to generate benchmark datasets for intrusion detection,” Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.