1 Introduction
We live in a world where many things form network structures, including social networks, citation networks and word co-occurrence networks. Network embedding (NE) has been proposed to process and exploit such network data, facilitating downstream machine learning tasks such as network reconstruction, link prediction and node classification.
Most information contained in ubiquitous networks is reflected by the proximity between nodes, i.e., the weights of edges, which are stored in adjacency matrices. Hence NE maps every network vertex into a low-dimensional vector that preserves the adjacency matrix information as much as possible. If there is some similarity between two objects, NE represents them by similar representations, such as vectors with low Euclidean distance [Luo et al.2011] or certain dot product values [Tang et al.2015, Ou et al.2016, Liu et al.2019]. Meanwhile, to emphasize the similarity, techniques such as negative sampling are used to push apart the representations of objects that appear different. In a word, similar things possess similar representations, and vice versa [Liu et al.2019]. In this work, network vertex embeddings are learned in two steps: first defining and calculating the similarity, then preserving it in the node embeddings.
The core problem we focus on is how to search for and define the similarity between nodes. First-order proximity is a straightforward approach that merely considers the two endpoints of an edge to be similar, and was studied by earlier graph embedding methods [Belkin and Niyogi2003]. To supplement it, higher-order proximity was proposed. LINE [Tang et al.2015] defines second-order proximity as the relationship between two unconnected nodes that share common neighbors. Note that common neighbors are also the intermediate vertices of length-two shortest paths. Intuitively, $k$-order proximity could be analogously defined as the relation between nodes connected by shortest paths of length $k$, with higher orders indicating weaker relationships.
However, many previous network embedding methods [Cao et al.2015, Perozzi et al.2017, Zhang et al.2018b] treat the $k$-th power of adjacency matrices or probability transition matrices as $k$-order proximity matrices. It is worth noting that in unweighted networks, the values of nonzero elements in $A^k$ state the number of $k$-step walks between vertices, and such walks may contain repetitive edges. As shown in Figure 1, through a two- or three-step walk, one can return to the start point or reach an immediate neighbor. Therefore, defining $k$-order proximity via $k$-step walks is not accurate or well-designed enough. Actually, the $k$-th power of the adjacency matrix contains a mixture of proximity of order no more than $k$. Consequently, we propose Enhanced Proximity Information Network Embedding (EPINE), a novel approach that redefines high-order proximity and determines the strength of $k$-order proximity by the number of length-$k$ shortest paths between nodes and the weights along those paths. With regard to calculation, we develop a novel algorithm with the same complexity as computing powers of adjacency matrices, which alleviates the scalability problem. Due to the close relation to edge weights in the calculation, for unweighted networks we propose to assess every edge weight by the degrees of its two endpoints.
In addition, the weight information carried by edges can be regarded as another metric for evaluating node similarity. However, due to the lack of weight information stored in network datasets, we often have to treat every edge equally, which is out of line with reality. Take social networks as an example: friends may fall into five categories, from bosom friends through good friends, ordinary friends and acquaintances down to passing acquaintances. The strength of friendship decreases progressively across these categories, yet edges are of equal importance in most social network datasets.
Eventually, the similarity measurement in EPINE is distance-based (and structure-based for unweighted networks, where node degrees serve as structural information), and we name it EPINE similarity.
In summary, our contributions are as follows:

We redefine high-order proximity in a more accurate and intuitive manner, and propose a novel approach for its calculation that alleviates the scalability problem.

We conduct comprehensive experiments on real-world network datasets. Experimental results demonstrate the effectiveness of the proposed EPINE.
2 Preliminaries and Related Work
2.1 Notations and Definitions
A network is denoted as $G = (V, E)$, where $V$ is the node set and $E$ is the edge set. An edge $e_{ij} \in E$ has a binary value that indicates the existence of an edge from node $v_i$ to node $v_j$. The weight $w_{ij}$ is equal to $1$ in unweighted networks and takes a nonnegative value in weighted ones.
The adjacency matrix is defined as $A = [w_{ij}] \in \mathbb{R}^{|V| \times |V|}$. $d_i$ is the degree of node $v_i$, and the diagonal degree matrix $D$ has the elements $D_{ii} = d_i$. The (one-step) probability transition matrix, also called the normalized adjacency matrix, is obtained by $\hat{A} = D^{-1}A$.
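These quantities can be sanity-checked on a toy graph (a minimal numpy sketch; the four-node example graph and node ordering are our own assumption):

```python
import numpy as np

# Toy undirected, unweighted graph: edges 0-1, 1-2, 2-0, 2-3 (triangle with a tail).
A = np.array([
    [0., 1., 1., 0.],
    [1., 0., 1., 0.],
    [1., 1., 0., 1.],
    [0., 0., 1., 0.],
])

d = A.sum(axis=1)          # node degrees d_i
D_inv = np.diag(1.0 / d)   # inverse of the diagonal degree matrix D
A_hat = D_inv @ A          # one-step probability transition matrix

# Each row of A_hat is a probability distribution over one-step moves.
assert np.allclose(A_hat.sum(axis=1), 1.0)
```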
In graph theory, a walk consists of an alternating sequence of vertices and edges that begins and ends with a vertex. A path is a walk without repeated vertices.
We differentiate the general $k$-order proximity formally defined in [Zhang et al.2018a] from our redefined version in the following.
Definition 1.
(Vanilla $k$-order Proximity). Two nodes have a vanilla $k$-order proximity relationship if and only if there exists at least one walk of length $k$ between them.
We depict an unweighted and undirected ego network in Figure 1, and denote the $k$-th power of the adjacency matrix as $A^k$. Note that the neighbors of the ego node in Figure 1 are all its immediate neighbors, yet the corresponding entries of $A^2$ and $A^3$ are also nonzero. Actually, each nonzero element of $A^k$ denotes a vanilla $k$-order proximity, which is shown to be neither intuitive nor accurate enough. Consequently, we propose to redefine it as follows:
Definition 2.
(Rectified $k$-order Proximity). Two nodes have a rectified $k$-order proximity relationship if and only if there exists at least one shortest path of length $k$ between them.
Accordingly, we denote the rectified $k$-order proximity matrix as $P^{(k)}$, where positive elements represent rectified $k$-order proximity between nodes.
In a sense, vanilla proximity is an approximate form of rectified proximity, because every shortest path of length $k$ is also a $k$-step walk and is therefore also counted in $A^k$.
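The gap between the two definitions can be checked numerically (a small sketch on an assumed four-node example, using plain numpy):

```python
import numpy as np

# Toy graph: edges 0-1, 1-2, 2-0, 2-3.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

A2 = A @ A  # nonzero entries = vanilla 2-order proximity

# Vanilla 2-order proximity flags 0 and 1 via the walk 0-2-1,
# even though they are already direct neighbors.
assert A2[0, 1] > 0 and A[0, 1] == 1
# Rectified 2-order proximity would keep only true distance-2 pairs like (0, 3).
assert A2[0, 3] > 0 and A[0, 3] == 0
```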
2.2 Related Work
2.2.1 Preserving Highorder Proximity
Almost every network embedding method preserves first-order proximity, while higher-order proximity acts as complementary, global information about networks and is explored by a number of methods.
DeepWalk [Perozzi et al.2014] implicitly preserves proximity of order no more than $w$, where $w$ is the window size, and lower orders have higher weights. It can also be interpreted as factorizing a matrix $M$ [Yang et al.2015], where

$M = \frac{1}{w}\sum_{r=1}^{w} \hat{A}^r$  (1)
There are extensions and improvements of DeepWalk. Node2vec [Grover and Leskovec2016] substitutes random walks with breadth-first and depth-first walks. Walklets [Perozzi et al.2017] replaces the adjacency matrix used in DeepWalk with one or more different powers of $A$. GraRep [Cao et al.2015] obtains vertex embeddings by separately factorizing $\hat{A}^k$ for each order $k$ and concatenating the results at the end. HOPE [Ou et al.2016] constructs a framework that builds node embeddings according to high-order proximity measurements, including Katz Index, Rooted PageRank, Common Neighbors and Adamic-Adar, which are all determined by matrix-chain multiplications of $A$ or $\hat{A}$. AROPE [Zhang et al.2018b] proposes to exploit arbitrary-order proximity by factorizing a matrix $S$, where

$S = \sum_{k=1}^{q} \lambda_k A^k$  (2)

with order weights $\lambda_k$, and allows $q \to \infty$ if the summation converges.
The algorithms mentioned above derive high-order proximity more or less from powers of $A$ or $\hat{A}$, which is actually the vanilla high-order proximity.
Besides, SDNE [Wang et al.2016] preserves rectified second-order proximity by reconstructing adjacency matrices, and retains first-order proximity by a regularization term derived from Laplacian eigenmaps [Belkin and Niyogi2003]. LINE [Tang et al.2015] proposes to represent and restore first-order and second-order proximity in content and context representations respectively. Compared with our proposed method, these can only calculate fixed-order proximity rather than arbitrary-order proximity.
2.2.2 Capturing Edge Information
Existing NE methods that consider edge information [Tu et al.2017, Goyal et al.2018, Chen et al.2018] generally rely on the intrinsic edge information stored in network datasets. Instead, we derive edge weights from node degrees, which is independent of intrinsic edge information, and applicable to multifarious networks.
3 Enhanced Proximity Information Network Embedding
In this section, we describe in detail how to calculate the accurate $k$-order proximity matrix.
Suppose $W_{ij}^{(k)} = \{\omega \mid \omega \text{ is a length-}k \text{ walk between } v_i \text{ and } v_j\}$; then

$A^k_{ij} = \sum_{\omega \in W_{ij}^{(k)}} c(\omega)$  (3)

where

$c(\omega) = \prod_{e \in \omega} w_e$  (4)

is the chain multiplication of edge weights along the walk $\omega$.
In unweighted networks, every edge weight is set to 1, that is, $c(\omega) = 1$, and $A^k_{ij}$ states the number of $k$-step walks between $v_i$ and $v_j$.
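For unweighted graphs, this walk-counting interpretation of $A^k$ can be verified directly (same assumed toy graph, with walks that reuse edges):

```python
import numpy as np

# Toy graph (edges 0-1, 1-2, 2-0, 2-3), unweighted: every w_e = 1, so each
# walk has cost 1 and (A^k)[i, j] simply counts k-step walks from i to j.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

A3 = np.linalg.matrix_power(A, 3)
# Two closed 3-step walks start and end at node 0: 0-1-2-0 and 0-2-1-0.
assert A3[0, 0] == 2
# (A^2)[2, 2] = 3: node 2 can step to any of its three neighbors and back.
assert np.linalg.matrix_power(A, 2)[2, 2] == 3
```

This is exactly the "repetitive edges" problem noted in the introduction: none of these walks is a shortest path.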
We discover that $k$-step walks contain all length-$k$ shortest paths, hence we can extract the latter through a second-order deterministic process.
3.1 Calculating the Rectified Proximity
For convenience of presentation, we first define the reachable relationship.
Definition 3.
($k$-reachable). Node $v_j$ is $k$-reachable from $v_i$ if and only if there exists a shortest path of length $k$ from $v_i$ to $v_j$.
In the $i$-th row of $P^{(k)}$, each positive element stands for a node that is $k$-reachable from $v_i$. Then we can calculate the rectified $(k{+}1)$-order proximity based on the following theorem:
Theorem 1.
The matrix product $P^{(k)}A$ contains, and only contains, rectified proximity of order $k{-}1$, $k$ and $k{+}1$. The $(k{+}1)$-order part composes $P^{(k+1)}$.
Proof. Postmultiplying $P^{(k)}$ by $A$ actually performs one-step walks starting from every node. As illustrated in Figure 2(a), the grey, blue and yellow circles denote the nodes that are $(k{-}1)$-, $k$- and $(k{+}1)$-reachable, respectively. Suppose node $v_j$ is $k$-reachable from node $v_i$. Moving one step from node $v_j$ results in only three categories of situations: moving backwards into the grey circle, staying in the blue one, or walking farther into the yellow one. All three generate length-$(k{+}1)$ walks and actually correspond to rectified proximity of order $k{-}1$, $k$ and $k{+}1$, respectively.
On the other hand, any length-$(k{+}1)$ shortest path consists of a length-$k$ shortest path plus one more edge; hence walking farther from the blue circle into the yellow one forms all length-$(k{+}1)$ shortest paths, which compose $P^{(k+1)}$.
Consequently, we can extract $(k{+}1)$-order proximity by removing rectified proximity of order $k{-}1$ and $k$ from $P^{(k)}A$.
By Definition 2, any two rectified proximity matrices are mutually disjoint, which means that if $P^{(k_1)}_{ij}$ is nonzero for some order $k_1$, then $P^{(k_2)}_{ij} = 0$ for every other order $k_2$. That is, if we have $P^{(k-1)}$ and $P^{(k)}$, we can accurately calculate $P^{(k+1)}$ by applying a mask $M^{(k+1)}$ to $P^{(k)}A$, where the mask can be calculated by

$M^{(k+1)}_{ij} = \mathbb{1}\left[P^{(k-1)}_{ij} = 0\right] \cdot \mathbb{1}\left[P^{(k)}_{ij} = 0\right]$  (5)
Such a calculation can be regarded as a second-order deterministic process, which is described in Algorithm 1. As stated in line 1, we extend Definition 2 and define $P^{(0)} = I$, which treats self-loops as rectified $0$-order proximity. Then we obtain $P^{(k+1)}$ through

$P^{(k+1)} = f\left(P^{(k)}, A\right) \odot M^{(k+1)}$  (6)

where $f$ represents matrix multiplication (discussed later in Section 3.2), $\odot$ denotes the Hadamard product, and $M^{(k+1)}$ is the mask calculated by Equation (5). It is worth noting that in practice the maximum order always has a finite value, which depends on the longest shortest path in the network. Consequently, the algorithm should stop early if and only if $P^{(k+1)}$ in line 8 is a zero matrix.
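Algorithm 1 is not reproduced here, but the recurrence of Equations (5) and (6) can be sketched with dense matrices and ordinary multiplication (a sparse implementation and the additive product of Section 3.2 would be used in practice; function and variable names below are our own):

```python
import numpy as np

def rectified_proximities(A, K):
    """Return [P^(1), ..., P^(K)] via the masking recurrence:
    P^(k+1) = (P^(k) @ A) masked to entries where P^(k-1) and P^(k) are zero.
    P^(0) = I treats self-loops as rectified 0-order proximity."""
    n = A.shape[0]
    P_prev, P_cur = np.eye(n), A.astype(float)
    out = [P_cur]
    for _ in range(K - 1):
        raw = P_cur @ A                       # mixes orders k-1, k and k+1
        mask = (P_prev == 0) & (P_cur == 0)   # keep only new, (k+1)-order pairs
        P_next = np.where(mask, raw, 0.0)
        if not P_next.any():                  # early stop: no longer shortest paths
            break
        out.append(P_next)
        P_prev, P_cur = P_cur, P_next
    return out

# Path graph 0-1-2-3: rectified 2-order pairs are exactly (0,2) and (1,3).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
Ps = rectified_proximities(A, 3)
assert (Ps[1] != 0).sum() == 4   # (0,2), (2,0), (1,3), (3,1)
assert Ps[2][0, 3] != 0          # the only 3-order pair is (0,3)
```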
3.2 Rethinking the Matrix Multiplication
If we apply normal matrix multiplication as the implementation of the function $f$, then, similar to Equations (3) and (4), the weight (or cost) of a path $p$ would be

$c(p) = \prod_{e \in p} w_e$  (7)

and then

$\left[f\left(P^{(k)}, A\right)\right]_{ij} = \sum_{p} c(p)$  (8)

Note that the cost is obtained by the chain multiplication of edge weights along the path. When edge weights are larger than 1, this causes very large path costs, which might not be discriminative and effective enough in practice.
Among the sum, mean and max aggregators, sum has the best discriminative power [Xu et al.2019]. Hence we resort to an additive operation, that is, we calculate path costs through

$c(p) = \sum_{e \in p} w_e$  (9)

which also alleviates the explosion of path costs. For the sake of calculating Equation (9), we propose to adopt additive matrix multiplication as $f$, which can be formalized as (suppose $X \in \mathbb{R}^{m \times n}$ and $Y \in \mathbb{R}^{n \times q}$):

$\left[f(X, Y)\right]_{ij} = \sum_{l=1}^{n} \left(X_{il} + Y_{lj}\right) \cdot \mathbb{1}\left[X_{il} \cdot Y_{lj} \neq 0\right]$  (10)

where $n$ is the column number of $X$ and the row number of $Y$, and $\mathbb{1}[\cdot]$ is an indicator function such that

$\mathbb{1}[s] = \begin{cases} 1 & \text{if } s \text{ is true} \\ 0 & \text{otherwise} \end{cases}$  (11)
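A direct, unoptimized transcription of Equation (10) reads as follows (the name `additive_matmul` is our own; a production version would operate on sparse matrices):

```python
import numpy as np

def additive_matmul(X, Y):
    """Additive matrix 'multiplication' of Equation (10):
    out[i, j] = sum over l of (X[i, l] + Y[l, j]), restricted to l
    where both entries are nonzero (i.e., where a path actually exists)."""
    out = np.zeros((X.shape[0], Y.shape[1]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[1]):
            for l in range(X.shape[1]):
                if X[i, l] != 0 and Y[l, j] != 0:
                    out[i, j] += X[i, l] + Y[l, j]
    return out

# Two paths i -> l -> j with edge weights (2, 3) and (4, 5):
# additive cost (2+3) + (4+5) = 14, versus 2*3 + 4*5 = 26 multiplicatively.
X = np.array([[2.0, 4.0]])
Y = np.array([[3.0], [5.0]])
assert additive_matmul(X, Y)[0, 0] == 14.0
assert (X @ Y)[0, 0] == 26.0
```

The comparison at the end illustrates why the additive form keeps path costs from exploding when edge weights exceed 1.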
3.3 Calculating Edge Weights
Due to the close relation to edge weights in Equation (9), we calculate edge weights as inputs to Algorithm 1 for unweighted networks. Intuitively, edges connected to low-degree nodes are more decisive, which is in accordance with the degree penalty principle [Feng et al.2018]. Hence we evaluate edge weights simply by

$w_{ij} = \frac{1}{d_i \cdot d_j}$  (12)
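Assuming a reciprocal-degree form for the reweighting (an illustrative assumption, chosen only so that edges touching low-degree nodes receive larger weights), the computation is a one-liner:

```python
import numpy as np

# Toy unweighted graph: edges 0-1, 1-2, 2-0, 2-3.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

d = A.sum(axis=1)           # degrees: [2, 2, 3, 1]
# Assumed reweighting: w_ij = 1 / (d_i * d_j) on existing edges, 0 elsewhere.
W = A / np.outer(d, d)

# Edge 2-3 touches the degree-1 node 3, so it outweighs edge 0-1,
# matching the degree penalty intuition.
assert W[2, 3] > W[0, 1]
```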
3.4 The Proposed EPINE Similarity
Eventually, we define the EPINE similarity matrix as

$S = \sum_{k=1}^{K} \tilde{P}^{(k)}$  (13)
As discussed in Section 1, higher-order proximity indicates a weaker relationship, which is in line with the exponentially decaying weights of Katz similarity [Katz1953]. Hence we propose to decay the weights of rectified high-order proximity by

$\tilde{P}^{(k)} = \beta^{k-1} \frac{P^{(k)}}{\max\left(P^{(k)}\right)}$  (14)

where $\beta$ is the decay coefficient and $\max(\cdot)$ returns the maximum element of the input matrix.
As researched in previous work [Perozzi et al.2014, Feng et al.2018], the degree distribution of networks probably follows a power law. This suggests that only a few elements of $P^{(k)}$ may have overly large values, and dividing by them in Equation (14) would degenerate the information carried by $P^{(k)}$. Consequently, before calculating Equation (13), we truncate the largest elements of $P^{(k)}$ to the value of the next-smaller element.
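The decay of Equation (14) and the summation of Equation (13) can be sketched as follows (the truncation of outliers is omitted for brevity; `epine_similarity` is our own name):

```python
import numpy as np

def epine_similarity(P_list, beta=0.5):
    """Combine rectified proximity matrices P^(1)..P^(K) into an
    EPINE-style similarity matrix: each P^(k) is normalized by its
    maximum element, decayed by beta**(k-1), and the results summed."""
    S = np.zeros_like(P_list[0], dtype=float)
    for k, P in enumerate(P_list, start=1):
        m = P.max()
        if m > 0:
            S += (beta ** (k - 1)) * P / m
    return S

P1 = np.array([[0., 2.], [2., 0.]])   # rectified 1-order proximity
P2 = np.array([[4., 0.], [0., 4.]])   # rectified 2-order proximity
S = epine_similarity([P1, P2], beta=0.5)
# First-order entries keep full weight (2/2 = 1); second-order decays to 0.5 * 4/4.
assert S[0, 1] == 1.0 and S[0, 0] == 0.5
```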
3.5 Learning the Network Embedding
As $S$ can be regarded as a weighted adjacency matrix, we apply LINE [Tang et al.2015] (a scalable method suitable for undirected, directed, and weighted networks) to preserve this similarity information in vertex embeddings. To be specific, we feed the EPINE similarity matrix into LINE(1st) and LINE(2nd) to learn node embeddings.
3.6 Discussions
Complexity. For efficiency, we adopt a sparse implementation for EPINE. In Algorithm 1, the matrix multiplication at order $k$ has a time complexity of $O(|V|\bar{d}^{k+1})$, where $\bar{d}$ is the average node degree of the network; the masking step (lines 6–7) has the same cost. The time costs for edge weight calculation and for LINE are both $O(|E|)$. Eventually, the overall time complexity of EPINE is $O(|V|\bar{d}^{K+1})$, where $K$ is usually set to 2 in practice.
Online learning. For any specific network, we only have to calculate $S$ once. When a new node arrives, we can calculate its similarity with the existing nodes in $O(\bar{d}^{K})$ time, and obtain its embedding through LINE with the embeddings of existing nodes unchanged.
4 Experiments
In this section, we demonstrate the effectiveness of our method in three downstream machine learning tasks: network reconstruction, link prediction and node classification.
4.1 Datasets
We conduct experiments on four networks; their statistics are listed in Table 1. Wikipedia is weighted, the others are unweighted.

Name  #Nodes  #Edges  Avg. degree  #Labels
Wikipedia  4,777  184,812  38.69  40
BlogCatalog  10,312  333,983  64.78  39
Flickr  80,513  5,899,882  146.56  195
YouTube  1,138,499  2,990,443  5.25  47

Wikipedia [Mahoney2011]: A language network extracted from Wikipedia. The weight of each edge represents the number of co-occurrences of the two words. Labels are Part-of-Speech (POS) tags inferred using the Stanford POS Tagger.

BlogCatalog, Flickr [Tang and Liu2009a], YouTube [Tang and Liu2009b]: Social networks in which edges indicate friendships between users; labels represent blogger interests, user groups and user groups, respectively.
4.2 Baselines and Parameter Settings
In the experiments, we compare our method with several baselines that are competitive or preserve high-order proximity.

DeepWalk [Perozzi et al.2014] implicitly preserves high-order proximity and is a competitive method applicable to diverse networks.

LINE [Tang et al.2015] comprises two methods that preserve first-order and second-order proximity, called LINE(1st) and LINE(2nd) respectively. Besides, LINE(1st+2nd) concatenates their resulting embeddings, and LINE(rc) reconstructs networks by adding vanilla second-order neighbors to nodes' neighbor sets. In the experiments, we report the best results among them.

GraRep [Cao et al.2015] accurately calculates the vanilla high-order proximity.

node2vec [Grover and Leskovec2016] extends DeepWalk with breadth-first and depth-first walk strategies.

SDNE [Wang et al.2016] simultaneously preserves rectified first-order and second-order proximity with the help of deep learning.

AROPE [Zhang et al.2018b] preserves vanilla arbitrary-order proximity via efficient matrix factorization.
For all methods except GraRep, the dimension of the learned embeddings is set to 128. The remaining unspecified parameters are set to the values recommended by the original authors or manually fine-tuned to the best.
Similar to LINE, we have EPINE(1st), EPINE(2nd) and EPINE(1st+2nd). For all experiments, we set $K = 2$ (see Section 4.5). It is not easy to select the decay coefficient $\beta$ automatically, so we tune it per dataset; $\beta$ and the number of training samples are positively related to the density and the edge number of networks, respectively. Most of the remaining parameters are the same as in LINE.
Method  Network Reconstruction  Link Prediction
DeepWalk  0.9513  0.9434
LINE  0.9511  0.9491
GraRep  0.9555  0.9556
node2vec  0.9467  0.9420
SDNE  0.9510  0.9484
AROPE  0.9488  0.9437
EPINE(1st)  0.7743  0.7745
EPINE(2nd)  0.9605  0.9547
EPINE(1st+2nd)  0.9609  0.9556
Method  Wikipedia  BlogCatalog  Flickr  YouTube  
MicroF1  MacroF1  MicroF1  MacroF1  MicroF1  MacroF1  MicroF1  MacroF1  
DeepWalk  0.5030  0.1030  0.4274  0.2865  0.4216  0.3035  0.3028  0.2115 
LINE  0.5831  0.1731  0.4327  0.2919  0.4273  0.3162  0.3086  0.2285 
GraRep  0.5422  0.1244  0.4265  0.2875  —  —  —  — 
node2vec  0.5488  0.1243  0.4133  0.2769  0.4119  0.2872  0.3083  0.2209 
SDNE  0.4351  0.0649  0.3047  0.1319  0.3480  0.2054  —  — 
AROPE  0.5439  0.1488  0.3391  0.1723  0.3136  0.1587  —  — 
EPINE(1st)  0.5465  0.1224  0.4467  0.3068  0.4162  0.2914  0.3021  0.2016 
EPINE(2nd)  0.5971  0.1863  0.4396  0.3087  0.4273  0.3071  0.3106  0.2223 
EPINE(1st+2nd)  0.5950  0.1773  0.4450  0.3143  0.4350  0.3237  0.3089  0.2262 
4.3 Network Topological Information Preserving
The tasks of network reconstruction and link prediction evaluate whether node embeddings preserve the topological structure of the network, which is the most basic goal of NE. As in [Shi et al.2019], we represent each edge by concatenating the embeddings of its two endpoints.
For network reconstruction, we randomly sample 80% of the connected edges and the same number of unconnected node pairs to train the LIBLINEAR classifiers. The rest of the connected edges and the same number of unconnected pairs are used as test samples.
For link prediction, we randomly remove 40% of the edges without breaking the connectivity of the network. After network representation learning, we use the remaining edges and the same number of originally unconnected node pairs as training samples, and the removed links plus the same number of originally unconnected pairs as test samples.
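One simple way to implement connectivity-preserving edge removal is to test each candidate edge with a graph search before deleting it (a sketch under our own assumptions, not necessarily the paper's procedure):

```python
import random

def remove_edges_keep_connected(edges, ratio=0.4, seed=0):
    """Randomly remove a fraction of edges without disconnecting the graph:
    an edge is only removed if its endpoints remain connected afterwards.
    A simple O(|E| * (|V| + |E|)) sketch, rebuilding adjacency per check."""
    rng = random.Random(seed)
    kept = set(edges)
    removed = []
    target = int(len(edges) * ratio)

    def connected(u, v):
        # Search from u over currently kept edges, looking for v.
        adj = {}
        for a, b in kept:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            if x == v:
                return True
            for y in adj.get(x, []):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    for e in rng.sample(sorted(edges), len(edges)):
        if len(removed) >= target:
            break
        kept.discard(e)
        if connected(e[0], e[1]):
            removed.append(e)
        else:
            kept.add(e)   # removal would disconnect the graph: put it back
    return kept, removed

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
kept, removed = remove_edges_keep_connected(edges, ratio=0.25)
assert (2, 3) in kept     # the bridge to node 3 can never be removed
assert len(removed) == 1  # exactly one of the triangle edges is removed
```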
The results on BlogCatalog are reported in Table 2. EPINE outperforms others in network reconstruction and reaches comparable performance with GraRep in link prediction. Compared with GraRep, EPINE is more scalable.
4.4 Network Semantic Information Preserving
The node classification task predicts node categories based on node representations. It evaluates to what extent node representations preserve the high-level semantic information of networks. In the experiments, node embeddings are fed directly into LIBLINEAR classifiers. We use 90% of the labeled nodes as training samples and 10% as test samples. The results are reported in Table 3. We exclude some baseline results due to efficiency problems or memory errors; the server we use has two Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz processors and 320G of memory. All values are averaged over several runs (500, 250, 20 and 50 runs for Wikipedia, BlogCatalog, Flickr and YouTube, respectively) for the sake of result stability. EPINE achieves comparable performance on YouTube and outperforms all baselines on the other datasets.
4.5 Different Orders of Rectified Proximity
We set different values of $K$ for node classification and record the results in Figure 3. The higher-order rectified proximity matrices for Wikipedia and BlogCatalog quickly become all-zero, while those for Flickr and YouTube are too dense to learn node embeddings efficiently. In spite of this, we can see that only second-order rectified proximity brings significant improvements, hence we can simply set $K = 2$ in practice.
Calculation Step  Wikipedia  BlogCatalog
LINE  0.5831  0.4327
1  + reweighting  —  0.3861
2  + rectified second-order  0.5852  0.4358
2a  w/o reweighting  —  0.4200
3  + add-dot  0.5942  0.4359
3a  rectified → vanilla  0.5827  0.4416
4  + truncating (EPINE)  0.5950  0.4453

4.6 Ablation Studies
Taking LINE as the base, we construct EPINE step by step and report the Micro-F1 results of node classification at each step in Table 4. Rows 2a and 3a do not belong to the construction. Reweighting alone (row 1) or rectified second-order proximity alone (row 2a) degenerates the performance, but once we combine them (row 2), the Micro-F1 slightly exceeds LINE. We then substitute normal matrix multiplication with the additive one, and an evident improvement occurs on Wikipedia (compare rows 2 and 3). At this step, if we remove the masking, that is, substitute rectified proximity with the vanilla one (row 3a), performance gets worse on Wikipedia but better on BlogCatalog. This is because BlogCatalog is a social network, and vanilla second-order proximity strengthens the importance of edges that form triangular structures (see Figure 1(a)), which is a special case.
5 Conclusions
In this work, we propose EPINE, a novel approach that further exploits the information carried by adjacency matrices. Specifically, EPINE provides a feasible way to preserve edge weight information in node embeddings, and a scalable way to accurately calculate high-order proximity, which allows studying the effect of proximity of a specific order. Comprehensive experiments demonstrate that the enhanced proximity information yields consistent improvements.
In the future, we will focus on searching for better methods of edge reweighting and of preserving the EPINE similarity.
Acknowledgments
We are grateful to Chongjun Wang for his fruitful comments and advice.
References
 [Belkin and Niyogi2003] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
 [Cao et al.2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, pages 891–900. ACM, 2015.
 [Chen et al.2018] Haochen Chen, Xiaofei Sun, Yingtao Tian, Bryan Perozzi, Muhao Chen, and Steven Skiena. Enhanced network embeddings via exploiting edge labels. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1579–1582. ACM, 2018.
 [Feng et al.2018] Rui Feng, Yang Yang, Wenjie Hu, Fei Wu, and Yueting Zhang. Representation learning for scale-free networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
 [Goyal et al.2018] Palash Goyal, Homa Hosseinmardi, Emilio Ferrara, and Aram Galstyan. Capturing edge attributes via network embedding. IEEE Transactions on Computational Social Systems, 5(4):907–917, 2018.
 [Grover and Leskovec2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864. ACM, 2016.
 [Katz1953] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
 [Liu et al.2019] Xin Liu, Tsuyoshi Murata, KyoungSook Kim, Chatchawan Kotarasu, and Chenyi Zhuang. A general view for network embedding as matrix factorization. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 375–383. ACM, 2019.
 [Luo et al.2011] Dijun Luo, Feiping Nie, Heng Huang, and Chris H Ding. Cauchy graph embedding. In Proceedings of the 28th International Conference on Machine Learning (ICML11), pages 553–560, 2011.
 [Mahoney2011] Matt Mahoney. Large text compression benchmark. URL: http://www.mattmahoney.net/text/text.html, 2011.
 [Ou et al.2016] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1105–1114. ACM, 2016.
 [Perozzi et al.2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
 [Perozzi et al.2017] Bryan Perozzi, Vivek Kulkarni, Haochen Chen, and Steven Skiena. Don't walk, skip!: online learning of multi-scale network embeddings. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pages 258–265. ACM, 2017.
 [Shi et al.2019] Wei Shi, Ling Huang, Chang-Dong Wang, Juan-Hui Li, Yong Tang, and Chengzhou Fu. Network embedding via community based variational autoencoder. IEEE Access, 7:25323–25333, 2019.
 [Tang and Liu2009a] Lei Tang and Huan Liu. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 817–826. ACM, 2009.
 [Tang and Liu2009b] Lei Tang and Huan Liu. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1107–1116. ACM, 2009.
 [Tang et al.2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Largescale information network embedding. In Proceedings of the 24th international conference on world wide web, pages 1067–1077. International World Wide Web Conferences Steering Committee, 2015.
 [Tu et al.2017] Cunchao Tu, Zhengyan Zhang, Zhiyuan Liu, and Maosong Sun. Transnet: Translationbased network representation learning for social relation extraction. In IJCAI, pages 2864–2870, 2017.
 [Wang et al.2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1225–1234. ACM, 2016.
 [Xu et al.2019] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In 7th International Conference on Learning Representations (ICLR), 2019.
 [Yang et al.2015] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Chang. Network representation learning with rich text information. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
 [Zhang et al.2018a] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. Network representation learning: A survey. IEEE transactions on Big Data, 2018.
 [Zhang et al.2018b] Ziwei Zhang, Peng Cui, Xiao Wang, Jian Pei, Xuanrong Yao, and Wenwu Zhu. Arbitraryorder proximity preserved network embedding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2778–2786. ACM, 2018.