We live in a world where many things form network structures, including social networks, citation networks and word co-occurrence networks. Network embedding (NE) has therefore been proposed to process and exploit network data, which facilitates downstream machine learning tasks such as network reconstruction, link prediction and node classification.
Most information contained in networks is reflected by the proximity between nodes, or the weights of edges, which are stored in adjacency matrices. Hence NE maps every network vertex into a low-dimensional vector that preserves adjacency matrix information as much as possible. If two objects are similar, NE represents them by similar representations, such as vectors that have low Euclidean distance [Luo et al.2011] or certain dot product values [Tang et al.2015, Ou et al.2016, Liu et al.2019]. Meanwhile, to emphasize the similarity, methods such as negative sampling are proposed to push apart the representations of objects that seem to be different. In short, similar things possess similar representations, and vice versa [Liu et al.2019]. In this work, network vertex embeddings are learned in two steps: first defining and calculating the similarity, then preserving it in the node embeddings.
The core problem we focus on is how to search for and define the similarity between nodes. First-order proximity is a straightforward approach that merely considers the two endpoints of an edge to be similar, and was studied by previous graph embedding methods [Belkin and Niyogi2003]. To supplement this, higher-order proximity was proposed. LINE [Tang et al.2015] defined second-order proximity as the relationship in which two unconnected nodes share common neighbors. Note that common neighbors are also the intermediate vertices of length-two shortest paths. Intuitively, $k$-order proximity could be analogously defined as the relation that there are shortest paths of length $k$ between nodes, where a higher order indicates a weaker relationship.
However, existing methods generally use the $k$-th power of adjacency matrices or probability transition matrices as $k$-order proximity matrices. It is worth noting that in unweighted networks, the values of nonzero elements in these matrices state the number of $k$-step walks between vertices, which may contain repeated edges. As shown in Figure 1, through a two- or three-step walk, one can return to the start point or reach an immediate neighbor. Therefore, defining $k$-order proximity by $k$-step walks is not accurate enough. In fact, the $k$-th power of an adjacency matrix contains a mixture of proximity of order no more than $k$.
Consequently, we propose Enhanced Proximity Information Network Embedding (EPINE), a novel approach that redefines high-order proximity and determines the strength of $k$-order proximity by the number of length-$k$ shortest paths between nodes and the weights along those paths. For the calculation, we develop a novel algorithm that has the same complexity as computing powers of adjacency matrices, which alleviates the scalability problem. Because the calculation depends closely on edge weights, for unweighted networks we propose to assess every edge weight by the degrees of its two endpoints.
In addition, weight information carried by edges can be regarded as another metric to evaluate node similarity. However, due to the lack of weight information stored in network datasets, we might have to treat every edge equally, which is out of line with reality. Taking social networks as an example, friends may fall into five categories: bosom friends, good friends, ordinary friends, acquaintances and prequaintances. The strength of friendship decreases progressively across these categories, but edges are of equal importance in most social network datasets.
Eventually, the similarity measurement in EPINE is distance-based (and structure-based for unweighted networks, where node degrees serve as structural information), and we name it EPINE similarity.
In summary, our contributions are as follows:
We redefine the high-order proximity in a more accurate and intuitive manner, and propose a novel approach for calculation that alleviates the scalability problem.
We conduct comprehensive experiments on real-world network datasets. Experimental results demonstrate the effectiveness of the proposed EPINE.
2 Preliminaries and Related Work
2.1 Notations and Definitions
A network is denoted as $G = (V, E)$, where $V$ is the node set and $E$ is the edge set. $e_{ij}$ in $E$ has a binary value that indicates the existence of an edge from node $v_i$ to node $v_j$. The weight $w_{ij}$ is equal to $e_{ij}$ in unweighted networks and a non-negative value in weighted ones.
The adjacency matrix is defined as $A = [w_{ij}]$. $d_i = \sum_j w_{ij}$ is the degree of node $v_i$, and the diagonal degree matrix $D$ has the element $D_{ii} = d_i$. The (one-step) probability transition matrix, also called normalized adjacency matrix, is obtained by $P = D^{-1} A$.
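As a minimal sketch of the row normalization described above, the following plain-Python snippet builds a transition matrix from a dense adjacency matrix stored as nested lists. The function name `transition_matrix` and the three-node example graph are illustrative assumptions, not part of the original method.

```python
def transition_matrix(adj):
    """Row-normalize a weighted adjacency matrix, i.e. P = D^{-1} A."""
    probs = []
    for row in adj:
        degree = sum(row)  # d_i: sum of the edge weights leaving node i
        # Assumption: an isolated node (degree 0) keeps an all-zero row.
        probs.append([w / degree if degree else 0.0 for w in row])
    return probs


# Hypothetical 3-node path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_matrix(A)
print(P)
```

Each row of the result sums to one whenever the corresponding node has at least one edge.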
In graph theory, a walk consists of an alternating sequence of vertices and edges that begins and ends with a vertex. A path is a walk without repeated vertices.
We differentiate the general $k$-order proximity formally defined in [Zhang et al.2018a] and our redefined one in the following.
Definition 1 (Vanilla $k$-order Proximity). Two nodes have a vanilla $k$-order proximity relationship if and only if there exists at least one walk of length $k$ between them.
We depict an unweighted and undirected ego network in Figure 1, and denote the $k$-th power of the adjacency matrix as $A^k$. Note that immediate neighbors of the ego node also have nonzero entries in higher powers of the adjacency matrix, because two- or three-step walks can return to the start point or reach an immediate neighbor. In fact, each nonzero element of $A^k$ denotes a vanilla $k$-order proximity, which is therefore neither intuitive nor accurate enough. Consequently, we propose to redefine it as follows:
Definition 2 (Rectified $k$-order Proximity). Two nodes have a rectified $k$-order proximity relationship if and only if there exists at least one shortest path of length $k$ between them.
Accordingly, we denote the rectified $k$-order proximity matrix as $R^{(k)}$, where positive elements represent rectified $k$-order proximity between nodes.
In a sense, vanilla proximity is an approximate form of rectified proximity, because every shortest path of length $k$ is also a $k$-step walk, and is therefore also counted in $A^k$.
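The difference between the two definitions can be checked on a toy graph. The sketch below is an illustration only: the four-node graph (a triangle with one pendant node) and the helper names `matmul` and `bfs_distances` are assumptions. It compares the nonzero pattern of the squared adjacency matrix with the set of nodes whose shortest-path distance is exactly two.

```python
from collections import deque


def matmul(x, y):
    """Ordinary dense matrix product on nested lists."""
    return [[sum(x[i][l] * y[l][j] for l in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, w in enumerate(adj[u]):
            if w and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical graph: triangle 0-1-2 plus a pendant edge 2-3.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
A2 = matmul(A, A)
dist_from_0 = bfs_distances(A, 0)

# Vanilla second-order proximity of node 0: any nonzero entry of A^2.
vanilla_2 = {j for j, v in enumerate(A2[0]) if v}
# Rectified second-order proximity of node 0: shortest path of length 2.
rectified_2 = {j for j, d in dist_from_0.items() if d == 2}
print(vanilla_2, rectified_2)
```

In this example the vanilla set contains node 0 itself and its immediate neighbors, whereas the rectified set contains only node 3.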
2.2 Related Work
2.2.1 Preserving High-order Proximity
Almost every network embedding method preserves first-order proximity, while higher-order proximity acts as complementary and global information of networks, and is explored by a number of methods.
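DeepWalk [Perozzi et al.2014] implicitly preserved proximity of order no more than $t$, where $t$ is the window size, and lower orders have higher weights. It can also be interpreted as factorizing a matrix $M$ [Yang et al.2015], where
$$M_{ij} = \log \frac{\left[ e_i \left( P + P^2 + \cdots + P^t \right) \right]_j}{t},$$
with $e_i$ the $i$-th one-hot row vector and $P$ the probability transition matrix.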
There are extensions and improvements of DeepWalk. Node2vec [Grover and Leskovec2016] replaced uniform random walks with biased walks that interpolate between breadth-first and depth-first exploration. Walklets [Perozzi et al.2017] replaced the adjacency matrix used in DeepWalk by one or more different powers of the adjacency matrix $A$. GraRep [Cao et al.2015] obtained vertex embeddings by separately factorizing matrices derived from different powers of the transition matrix $P$ and concatenating the results at last. HOPE [Ou et al.2016] constructed a framework that built node embeddings according to high-order proximity measurements, including Katz Index, Rooted PageRank, Common Neighbors and Adamic-Adar, which are all determined by matrix-chain multiplications of $A$ or $P$. AROPE [Zhang et al.2018b] proposed to exploit arbitrary-order proximity by factorizing a matrix $S$, where
$$S = \sum_{i=1}^{q} w_i A^i,$$
and allow $q = +\infty$ if the summation converges.
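To make the form of such a proximity matrix concrete, the sketch below naively evaluates a weighted sum of adjacency-matrix powers, $\sum_{i=1}^{q} w_i A^i$, for a finite $q$. It is only an illustration of the polynomial itself and does not reproduce how AROPE factorizes it; the names `weighted_power_sum` and `matmul`, the example graph and the weights are assumptions.

```python
def matmul(x, y):
    """Ordinary dense matrix product on nested lists."""
    return [[sum(x[i][l] * y[l][j] for l in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def weighted_power_sum(adj, weights):
    """Return sum_i weights[i-1] * adj^i for i = 1..len(weights)."""
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # adj^1
    for idx, w in enumerate(weights):
        if idx > 0:
            power = matmul(power, adj)  # advance to the next power
        for i in range(n):
            for j in range(n):
                total[i][j] += w * power[i][j]
    return total


# Hypothetical 3-node path graph 0 - 1 - 2 with weights w_1 = 1, w_2 = 0.5.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
S = weighted_power_sum(A, [1.0, 0.5])
print(S)
```

Note that the diagonal of the result is nonzero, which reflects the walks that return to their start point.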
The algorithms mentioned above derive high-order proximity more or less from powers of the adjacency matrix $A$ or the transition matrix $P$, which is actually the vanilla high-order proximity.
Besides, SDNE [Wang et al.2016] preserved rectified second-order proximity by reconstructing adjacency matrices, and retained first-order proximity by a regularization term derived from Laplacian eigenmaps [Belkin and Niyogi2003]. LINE [Tang et al.2015] proposed to represent and restore first-order and second-order proximity in content and context representations respectively. Compared with our proposed method, they can only calculate fixed-order proximity rather than arbitrary-order proximity.
2.2.2 Capturing Edge Information
Existing NE methods that consider edge information [Tu et al.2017, Goyal et al.2018, Chen et al.2018] generally rely on the intrinsic edge information stored in network datasets. Instead, we derive edge weights from node degrees, which is independent of intrinsic edge information, and applicable to multifarious networks.
3 Enhanced Proximity Information Network Embedding
In this section, we describe in detail how to calculate the accurate $k$-order proximity matrix.
Suppose
$$A^k_{ij} = \sum_{p \in \mathcal{W}^k_{ij}} c(p),$$
where $\mathcal{W}^k_{ij}$ is the set of length-$k$ walks between $v_i$ and $v_j$ and $p$ is one such walk, then
$$c(p) = \prod_{(v_a, v_b) \in p} w_{ab}$$
is the chain multiplication of edge weights along the walk $p$.
In unweighted networks, every edge weight is set to 1, that is, $c(p) = 1$ and $A^k_{ij}$ states the number of $k$-step walks between $v_i$ and $v_j$.
We observe that $k$-step walks contain all $k$-length shortest paths, hence we can extract the latter through a second-order deterministic process.
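The relation between entries of a matrix power and summed walk costs can be verified by brute force on a small weighted graph. The following sketch enumerates every length-$k$ walk explicitly and compares the total cost with the corresponding entry of the $k$-th power; the helper names and the example weights are illustrative assumptions.

```python
from itertools import product


def matmul(x, y):
    """Ordinary dense matrix product on nested lists."""
    return [[sum(x[i][l] * y[l][j] for l in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def matrix_power(a, k):
    """a^k for an integer k >= 1."""
    result = a
    for _ in range(k - 1):
        result = matmul(result, a)
    return result


def walk_cost_sum(a, i, j, k):
    """Sum over all length-k walks from i to j of the product of edge weights."""
    total = 0
    for middle in product(range(len(a)), repeat=k - 1):
        nodes = (i, *middle, j)
        cost = 1
        for u, v in zip(nodes, nodes[1:]):
            cost *= a[u][v]  # a zero weight means no edge and cancels the walk
        total += cost
    return total


# Hypothetical weighted triangle.
W = [[0, 2, 3],
     [2, 0, 1],
     [3, 1, 0]]
W3 = matrix_power(W, 3)
matches = all(W3[i][j] == walk_cost_sum(W, i, j, 3)
              for i in range(3) for j in range(3))
print(matches)
```

The enumeration is exponential in $k$ and is meant only as a check of the identity, not as a practical algorithm.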
3.1 Calculating the Rectified Proximity
For convenience of statement, we first define the $k$-reachable relationship.
Definition 3 ($k$-reachable). Node $v_j$ is $k$-reachable from $v_i$ if and only if there exists a shortest path of length $k$ from $v_i$ to $v_j$.
In the $i$-th row of the rectified $k$-order proximity matrix $R^{(k)}$, each positive element stands for a $k$-reachable node of $v_i$. Then we can calculate the rectified $(k+1)$-order proximity based on the following theorem:
Theorem 1. The matrix product $R^{(k)} A$ contains and only contains rectified proximity of order $(k-1)$, $k$ and $(k+1)$. Such $(k+1)$-order proximity composes $R^{(k+1)}$.
Proof. Post-multiplying $R^{(k)}$ by $A$ performs one-step walks starting from every $k$-reachable node. As illustrated in Figure 2(a), the grey, blue and yellow circles denote $(k-1)$-, $k$- and $(k+1)$-reachability, respectively. Suppose node $v_j$ is $k$-reachable from node $v_i$. Moving one step from $v_j$ results in only three situations: moving backwards to the grey circle, staying in the blue, or walking farther to the yellow. All of them generate $(k+1)$-length walks, and they correspond to rectified proximity of order $(k-1)$, $k$ and $(k+1)$, respectively.
On the other hand, any $(k+1)$-length shortest path consists of a $k$-length shortest path and one more edge, hence walking farther from the blue circle to the yellow forms all $(k+1)$-length shortest paths that compose $R^{(k+1)}$.
Consequently, we can extract $(k+1)$-order proximity by removing rectified proximity of order $(k-1)$ and $k$ from $R^{(k)} A$.
By Definition 2, any two rectified proximity matrices are mutually disjoint, which means that if $R^{(k-1)}_{ij}$ or $R^{(k)}_{ij}$ is nonzero, then $R^{(k+1)}_{ij} = 0$. That is, if we have $R^{(k-1)}$ and $R^{(k)}$, we can accurately calculate $R^{(k+1)}$ by applying a mask to $R^{(k)} A$, where the mask is calculated by
$$\operatorname{mask}\left( R^{(k-1)}, R^{(k)} \right)_{ij} = \begin{cases} 0, & R^{(k-1)}_{ij} \neq 0 \text{ or } R^{(k)}_{ij} \neq 0, \\ 1, & \text{otherwise}. \end{cases} \tag{5}$$
Such a calculating method can be regarded as a second-order deterministic process, which is described in Algorithm 1. As stated in line 1, we extend Definition 2 and define $R^{(0)}$ as the identity matrix $I$, which treats self-loops as rectified $0$-order proximity. Then we can obtain the rectified proximity matrix $R^{(k+1)}$ through
$$R^{(k+1)} = \left( R^{(k)} \otimes A \right) \odot \operatorname{mask}\left( R^{(k-1)}, R^{(k)} \right),$$
where $\otimes$ represents matrix multiplication (discussed later in Section 3.2), $\odot$ means Hadamard product, and $\operatorname{mask}(\cdot)$ calculates the mask based on Equation (5). It is worth noting that in practice, the above-mentioned $k$ always has a finite value, which depends on the longest shortest paths in the network. Consequently, the algorithm should stop early if and only if $R^{(k+1)}$ in line 8 is a zero matrix.
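The masking recursion can be sketched in a few lines of plain Python. This is a hedged illustration rather than the authors' Algorithm 1: it uses dense nested lists instead of a sparse implementation, ordinary matrix multiplication instead of the additive variant of Section 3.2, and assumes a graph without self-loops. The function name `rectified_proximity` and the example graph are assumptions.

```python
def matmul(x, y):
    """Ordinary dense matrix product on nested lists."""
    return [[sum(x[i][l] * y[l][j] for l in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def rectified_proximity(adj, max_order):
    """Return [R0, R1, ..., Rk] with k <= max_order, stopping at the first zero matrix."""
    n = len(adj)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    mats = [identity, [row[:] for row in adj]]  # order 0 is I, order 1 is A
    for _ in range(2, max_order + 1):
        prev, cur = mats[-2], mats[-1]
        walked = matmul(cur, adj)  # one more step from every reachable node
        # Keep an entry only if the pair is not already related at the two lower orders.
        nxt = [[walked[i][j] if prev[i][j] == 0 and cur[i][j] == 0 else 0
                for j in range(n)] for i in range(n)]
        if not any(any(row) for row in nxt):
            break  # early stop: no longer shortest paths exist
        mats.append(nxt)
    return mats


# Hypothetical 4-cycle 0 - 1 - 2 - 3 - 0.
cycle = [[0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0]]
R = rectified_proximity(cycle, max_order=5)
print(len(R) - 1, R[2][0])
```

With ordinary multiplication on an unweighted graph, each surviving entry counts the shortest paths between the two nodes; in the 4-cycle, opposite nodes are joined by two shortest paths of length two, and the recursion stops after order two.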
3.2 Rethinking the Matrix Multiplication
Note that the cost is obtained by the chain multiplication of edge weights along the path. By definition, edge weights are always larger than 1, which would cause very large path costs. Such large path costs might not be discriminative and effective enough in practice.
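Among the sum, mean and max aggregators, sum has the best discriminative power [Xu et al.2019]. Hence we resort to an additive operation, that is, we calculate the cost of a path $p$ through
$$c(p) = \sum_{(v_a, v_b) \in p} w_{ab}, \tag{9}$$
which also alleviates the explosion of path costs. To calculate Equation (9), we propose to adopt additive matrix multiplication for $R^{(k)} \otimes A$, which can be formalized as (suppose $Z = X \otimes Y$):
$$Z_{ij} = \sum_{l=1}^{m} \mathbb{1}(X_{il}, Y_{lj}) \left( X_{il} + Y_{lj} \right),$$
where $m$ is the column number of $X$ and the row number of $Y$. $\mathbb{1}$ is an indicator function:
$$\mathbb{1}(x, y) = \begin{cases} 1, & x \neq 0 \text{ and } y \neq 0, \\ 0, & \text{otherwise}. \end{cases}$$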
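One possible reading of the additive matrix multiplication is sketched below. It assumes that each output entry sums the two operands over those intermediate indices where both are nonzero, so that costs are accumulated by addition rather than by multiplication. The function name `additive_matmul` and the small example are assumptions made for illustration.

```python
def additive_matmul(x, y):
    """Additive product: sum (x[i][l] + y[l][j]) over l where both entries are nonzero."""
    rows, inner, cols = len(x), len(y), len(y[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = sum(
                x[i][l] + y[l][j]
                for l in range(inner)
                # Indicator: the step contributes only if both parts exist.
                if x[i][l] != 0 and y[l][j] != 0
            )
    return out


# Hypothetical operands: only the middle index has both entries nonzero.
X = [[0, 2, 3]]
Y = [[5],
     [4],
     [0]]
Z = additive_matmul(X, Y)
print(Z)
```

For these operands the ordinary product would give 8, whereas the additive variant gives 6, which illustrates how the additive form keeps path costs from growing multiplicatively.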
3.3 Calculating Edge Weights
Due to the close relation to edge weights in Equation (9), we calculate edge weights as inputs to Algorithm 1 for unweighted networks. Intuitively, edges connected to low-degree nodes would be more decisive, which is in accordance with the degree penalty principle [Feng et al.2018]. Hence we could evaluate edge weights simply by
3.4 The Proposed EPINE Similarity
Eventually, we define the EPINE similarity matrix as
As discussed in Section 1, higher-order proximity indicates a weaker relationship, which is in line with the exponentially decaying weights of Katz similarity [Katz1953]. Hence we propose to decay weights of rectified high-order proximity by
where is the decay coefficient and returns the maximum element of the input matrix.
As studied in previous work [Perozzi et al.2014, Feng et al.2018], the degree distribution of networks probably follows a power law. This suggests that only a few elements of the matrix normalized in Equation (14) might have overlarge values, and dividing by them would degrade the information carried by the remaining elements. Consequently, before calculating Equation (13), we truncate the largest elements of that matrix to the value of the next-largest element.
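The truncation step admits more than one interpretation. The sketch below assumes the simplest one: every entry equal to the maximum is replaced by the second-largest distinct positive value, and the matrix is left unchanged when fewer than two distinct positive values exist. The function name `truncate_largest` and the example matrix are assumptions.

```python
def truncate_largest(matrix):
    """Clip entries equal to the maximum down to the next-largest positive value."""
    positive = sorted({v for row in matrix for v in row if v > 0}, reverse=True)
    if len(positive) < 2:
        # Nothing to clip to: return an unchanged copy.
        return [row[:] for row in matrix]
    largest, runner_up = positive[0], positive[1]
    return [[runner_up if v == largest else v for v in row] for row in matrix]


# Hypothetical proximity matrix with one outlier entry.
M = [[0, 1, 2],
     [9, 2, 0]]
truncated = truncate_largest(M)
print(truncated)
```

After truncation the maximum used for normalization is 2 instead of 9, so the smaller entries retain more of their relative scale.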
3.5 Learning the Network Embedding
As the EPINE similarity matrix can be regarded as a weighted adjacency matrix, we apply LINE [Tang et al.2015], a scalable method suitable for undirected, directed, and/or weighted networks, to preserve this similarity information in vertex embeddings. To be specific, we input the EPINE similarity matrix into LINE(1st) and LINE(2nd) so as to learn node embeddings.
Complexity. In consideration of efficiency problems, we adopt sparse implementation for EPINE. In Algorithm 1, matrix multiplication has a time complexity of , where is the average node degree of networks. Masking step (line 6-7) takes time. The time cost for edge weight calculation and LINE are both . Eventually, the overall time complexity of EPINE is , where is usually set to 2 in practice.
Online learning. For any specific network, we only have to calculate once. When a new node arrives, we can calculate its similarity with existing nodes in time, and obtain its embedding through LINE in time, with embeddings of existing nodes unchanged.
4 Experiments
In this section, we demonstrate the effectiveness of our method in three downstream machine learning tasks: network reconstruction, link prediction and node classification.
4.1 Datasets
We conduct experiments on four networks, whose statistics are listed in Table 1. Wikipedia is weighted; the others are unweighted.
Wikipedia [Mahoney2011]: A language network extracted from Wikipedia. The weight of each edge represents the number of co-occurrences between two words. Labels represent the Part-of-Speech (POS) tags inferred using the Stanford POS-Tagger.
4.2 Baselines and Parameter Settings
In experiments, we compare our method with several baselines that are competitive or preserve high-order proximity.
DeepWalk [Perozzi et al.2014] implicitly preserves high-order proximity and is a competitive method applicable to diverse networks.
LINE [Tang et al.2015] contains two methods that preserve first-order and second-order proximity, called LINE(1st) and LINE(2nd) respectively. Besides, LINE(1st+2nd) concatenates their resulting embeddings, and LINE(rc) reconstructs networks by adding vanilla second-order neighbors to nodes’ neighbors. In experiments, we report the best results among them.
GraRep [Cao et al.2015] accurately calculates the vanilla high-order proximity.
node2vec [Grover and Leskovec2016] extends DeepWalk by breadth-first and depth-first walk strategies.
AROPE [Zhang et al.2018b] preserves vanilla arbitrary-order proximity via efficient matrix factorization.
For all methods except GraRep, the dimension of learned embeddings is set to 128. The remaining unspecified parameters are set to the values recommended by paper authors or manually finetuned to the best.
Similar to LINE, we have EPINE(1st), EPINE(2nd) and EPINE(1st+2nd). For all experiments, we set and for BlogCatalog, for others. It is not easy to select the best automatically, but always yields best performance. (Footnote 1: We use for BlogCatalog and Flickr, for Wikipedia and YouTube respectively.) and number of training samples are positively related to the density and edge number of networks respectively. Most of the remaining parameters are the same as in LINE.
4.3 Network Topological Information Preserving
Tasks of network reconstruction and link prediction evaluate if node embeddings preserve the network topological structure information, which is the most basic goal of NE. As in [Shi et al.2019], we represent each edge by concatenating the embeddings of two endpoints.
For network reconstruction, we randomly sample 80% connected edges and the same number of unconnected edges to train the LIBLINEAR classifiers. The rest of connected edges and the same number of unconnected edges are utilized as test samples.
For link prediction, we randomly remove 40% of the edges without breaking the connectivity of the network. After network representation learning, we use the existing edges and the same number of originally unconnected edges as training samples, the removed links and the same number of originally unconnected links as test samples.
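The sample construction for both tasks can be sketched as follows. The snippet concatenates endpoint embeddings to form edge features and draws unconnected node pairs as negative samples; it assumes an undirected network without self-loops, and the names `edge_features` and `sample_unconnected`, the toy embeddings and the fixed seed are illustrative assumptions. Training the classifier itself is not shown.

```python
import random


def edge_features(embeddings, edges):
    """Represent each edge by concatenating the embeddings of its endpoints."""
    return [list(embeddings[u]) + list(embeddings[v]) for u, v in edges]


def sample_unconnected(num_nodes, edges, count, seed=0):
    """Draw `count` distinct unconnected node pairs (undirected, no self-loops)."""
    connected = {frozenset(e) for e in edges}
    available = num_nodes * (num_nodes - 1) // 2 - len(connected)
    if count > available:
        raise ValueError("not enough unconnected pairs to sample from")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < count:
        pair = frozenset(rng.sample(range(num_nodes), 2))
        if pair not in connected:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


# Hypothetical 4-node network with 2-dimensional embeddings.
edges = [(0, 1), (1, 2)]
embeddings = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6], 3: [0.7, 0.8]}
negatives = sample_unconnected(4, edges, count=len(edges))
positive_x = edge_features(embeddings, edges)
negative_x = edge_features(embeddings, negatives)
print(negatives, positive_x[0])
```

Sampling the same number of unconnected pairs as connected edges keeps the two classes balanced, as in the protocol described above.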
The results on BlogCatalog are reported in Table 2. EPINE outperforms others in network reconstruction and reaches comparable performance with GraRep in link prediction. Compared with GraRep, EPINE is more scalable.
4.4 Network Semantic Information Preserving
The node classification task is to predict node categories based on node representations. It evaluates to what extent node representations preserve the high-level semantic information of networks. In experiments, node embeddings are fed directly into the LIBLINEAR classifiers. We use 90% of the labeled nodes as training samples and 10% as test samples. The results are reported in Table 4. (Footnote 2: We exclude some of the baseline results due to efficiency problems or memory errors. The server we use has two Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and 320G memory.) All the values are averaged over several runs for the sake of result stability. (Footnote 3: 500, 250, 20 and 50 runs for Wikipedia, BlogCatalog, Flickr and YouTube respectively.) EPINE achieves comparable performance on YouTube and outperforms all baselines on the others.
4.5 Different Orders of Rectified Proximity
We set different maximum orders for node classification and record the results in Figure 3. For Wikipedia and BlogCatalog, the rectified proximity matrices of the highest orders are all zeros. For Flickr and YouTube, the higher-order rectified proximity matrices are too dense to efficiently learn node embeddings. In spite of this, we can see that only second-order rectified proximity brings significant improvements, hence we could simply set the maximum order to 2 in practice.
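Table 4 (rows 2 and 4): Micro-F1 results of node classification at each construction step.
| 2 | + rectified second-order | 0.5852 | 0.4358 |
| 4 | + truncating (EPINE) | 0.5950 | 0.4453 |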
4.6 Ablation Studies
Taking LINE as the base, we construct EPINE step by step, and report Micro-F1 results of node classification at each step in Table 4. Rows 2a and 3a do not belong to the construction. Reweighting (row 1) or rectified second-order proximity alone (row 2a) degrades the performance, but once we combine them (row 2), the Micro-F1 slightly exceeds LINE. When we then substitute the normal matrix multiplication with the additive one, an evident improvement occurs on Wikipedia (comparing rows 2 and 3). At this step, if we remove masking, that is, substitute rectified proximity with the vanilla one (row 3a), performance gets worse on Wikipedia but becomes better on BlogCatalog. This is because BlogCatalog is a social network and vanilla second-order proximity strengthens the importance of edges that form triangular structures (see Figure 1(a)), which is a special case.
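For reference, Micro-F1 pools true positives, false positives and false negatives over all nodes and labels before computing a single F1 score. The sketch below shows this standard definition for multi-label predictions represented as sets; the function name `micro_f1` and the toy labels are assumptions and do not reproduce the evaluation scripts used in the experiments.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over per-node label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels for two nodes.
truth = [{1, 2}, {3}]
pred = [{1}, {3, 4}]
score = micro_f1(truth, pred)
print(round(score, 4))
```

Because counts are pooled before averaging, frequent labels influence Micro-F1 more than rare ones.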
5 Conclusion
In this work, we propose EPINE, a novel approach that further exploits the information carried by adjacency matrices. To be specific, EPINE provides a feasible way to preserve edge weight information in node embeddings, and a scalable way to accurately calculate high-order proximity, which allows studying the effect of a specific $k$-order proximity. Comprehensive experiments demonstrate the effectiveness of our method and show that the enhanced proximity information improves performance.
In the future, we will focus on searching for better methods for edge reweighting and for preserving EPINE similarity.
We are grateful to Chongjun Wang for his fruitful comments and advice.
- [Belkin and Niyogi2003] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
- [Cao et al.2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, pages 891–900. ACM, 2015.
- [Chen et al.2018] Haochen Chen, Xiaofei Sun, Yingtao Tian, Bryan Perozzi, Muhao Chen, and Steven Skiena. Enhanced network embeddings via exploiting edge labels. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1579–1582. ACM, 2018.
- [Feng et al.2018] Rui Feng, Yang Yang, Wenjie Hu, Fei Wu, and Yueting Zhuang. Representation learning for scale-free networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- [Goyal et al.2018] Palash Goyal, Homa Hosseinmardi, Emilio Ferrara, and Aram Galstyan. Capturing edge attributes via network embedding. IEEE Transactions on Computational Social Systems, 5(4):907–917, 2018.
- [Grover and Leskovec2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864. ACM, 2016.
- [Katz1953] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
- [Liu et al.2019] Xin Liu, Tsuyoshi Murata, Kyoung-Sook Kim, Chatchawan Kotarasu, and Chenyi Zhuang. A general view for network embedding as matrix factorization. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 375–383. ACM, 2019.
- [Luo et al.2011] Dijun Luo, Feiping Nie, Heng Huang, and Chris H Ding. Cauchy graph embedding. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 553–560, 2011.
- [Mahoney2011] Matt Mahoney. Large text compression benchmark. URL: http://www.mattmahoney.net/text/text.html, 2011.
- [Ou et al.2016] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1105–1114. ACM, 2016.
- [Perozzi et al.2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
- [Perozzi et al.2017] Bryan Perozzi, Vivek Kulkarni, Haochen Chen, and Steven Skiena. Don’t walk, skip!: online learning of multi-scale network embeddings. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pages 258–265. ACM, 2017.
- [Shi et al.2019] Wei Shi, Ling Huang, Chang-Dong Wang, Juan-Hui Li, Yong Tang, and Chengzhou Fu. Network embedding via community based variational autoencoder. IEEE Access, 7:25323–25333, 2019.
- [Tang and Liu2009a] Lei Tang and Huan Liu. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 817–826. ACM, 2009.
- [Tang and Liu2009b] Lei Tang and Huan Liu. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1107–1116. ACM, 2009.
- [Tang et al.2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web, pages 1067–1077. International World Wide Web Conferences Steering Committee, 2015.
- [Tu et al.2017] Cunchao Tu, Zhengyan Zhang, Zhiyuan Liu, and Maosong Sun. Transnet: Translation-based network representation learning for social relation extraction. In IJCAI, pages 2864–2870, 2017.
- [Wang et al.2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1225–1234. ACM, 2016.
- [Xu et al.2019] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
- [Yang et al.2015] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Chang. Network representation learning with rich text information. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
- [Zhang et al.2018a] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. Network representation learning: A survey. IEEE transactions on Big Data, 2018.
- [Zhang et al.2018b] Ziwei Zhang, Peng Cui, Xiao Wang, Jian Pei, Xuanrong Yao, and Wenwu Zhu. Arbitrary-order proximity preserved network embedding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2778–2786. ACM, 2018.