1 Introduction
Networks or graphs are commonly used to model complex interaction relations in real-world data and systems [86, 24], e.g., transportation, social patterns, cooperative behavior and metabolic phenomena. By convention, a network (or graph) is abstracted as a set of nodes and their complicated, elusive links. To understand such data, network analysis helps to explore the organization, analyze the structure, predict missing links and control the dynamics of complex systems. For a long time, researchers have proposed specially designed methods and models for different graph mining tasks, such as preference mechanisms, hierarchical structure and latent space models for link prediction [58]; grouping or aggregation, bit compression and influence-based methods for network summarization [52]; the generalized threshold model, independent cascade model and linear influence model for information diffusion [105]. Among the core issues and applications of network analysis and graph mining, clustering [83], i.e., dividing the nodes into distinct or overlapping groups, has attracted the most interest from different domains including machine learning and complex networks.
The field of network clustering has two main branches: community detection [26, 81, 12] and role discovery [78]. Community detection, the currently dominant branch, is devoted to finding cohesive groups in which nodes interact more intensively than with the outside [23]. In contrast, role discovery, which has a long research history in sociology [59] but had been inconspicuous in network science, groups nodes based on the similarity of their structural patterns [51], such as bridge or hub nodes [25]. In general, nodes in the same community are likely to be connected to each other, while nodes in the same role may be unconnected and often far away from each other. Since their rules for dividing nodes are fundamentally different, the two branches are usually considered orthogonal problems. A variety of algorithms and models have been proposed for both branches. For community detection, modularity optimization [10, 49], statistical models [85, 13], nonnegative matrix factorization [97, 65, 57, 56] and deep learning methods [87, 50] have been developed and exert crucial influence on other tasks and applications, such as recommendation [21, 106] and identifying criminal gangs [54]. Surveys on community detection can be found in [23, 22, 43]. For role discovery, traditional methods are usually graph-based and related to some notion of equivalence, such as structural [53], regular [92] and stochastic equivalence [38, 62]. Blockmodels [20] and mixed-membership stochastic blockmodels [2] are important and influential graph-based methods. Besides, there are also some combinatorial or heuristic methods [3] for this problem.

Here we take the Brazilian air-traffic network as an example. As shown in Fig. 1 (a), the nodes and edges denote the airports and their direct flights, respectively. The clustering labels are assigned based on the activity of nodes [75]. The size and color of each circle represent the degree and label of the node, respectively. It can be observed that nodes with the same label (color), i.e., structurally similar nodes, are usually not connected. To better illustrate the two clustering tasks, we choose two typical methods, Louvain [5] and RolX [34], which are specially designed for community detection and role discovery, respectively. The clustering results are presented in Fig. 1 (b) and (c). Community detection divides the network into tightly connected groups, i.e., airports with more flights among them belong to the same community. In contrast, the clustering result obtained by role discovery, which reflects the flow and scale of each airport, is closer to the ground truth.
In recent years, network embedding (NE) has become the focus of studying graph structure and has been demonstrated to achieve promising performance in many downstream tasks, e.g., node classification and link prediction. The motivation of NE is to transform network data into independently distributed representations in a latent space such that these representations preserve the topological structure and properties of the original network. On the whole, current methods for NE can be categorized into two types: shallow methods and deep learning methods (here we focus on unsupervised NE approaches unless explicitly mentioned otherwise). The former includes matrix factorization and random walk based methods. The goal of matrix factorization methods is to learn node embeddings via low-rank approximation of the adjacency matrix or a higher-order similarity matrix of the network, as in singular value decomposition, nonnegative matrix factorization and NetSMF [74]. With different random walk strategies, a series of methods have been proposed to optimize the co-occurrence probability of nodes and learn effective embeddings. The latter is mainly rooted in autoencoders and graph convolutional networks. These methods generally consist of an encoder, a similarity function and a decoder. Attributes, characteristics and constraints can also be incorporated to enhance the embeddings; VGAE, GAT and GraphSAGE are some representatives of such methods. There are also other types of embedding approaches, e.g., latent feature models, but a discussion of general NE methods is out of our scope. We refer interested readers to some recent survey papers on NE [14, 101, 104, 94, 27, 6, 32, 9].

However, most of these methods, whether or not designed for network clustering, model proximity, i.e., the embedding vectors are community-oriented. They fail to capture structural similarity, i.e., the role information [79]. This raises several inherent challenges for research on role-oriented network embedding. First and most important, the structural similarity of two nodes has nothing to do with their distance in the network, which makes it difficult to define an effective loss function. Second, strict role definitions, such as those based on equivalence, are difficult to implement in real-world networks, especially large-scale ones. Third, the distribution of nodes with the same role over the network is very complex, and the interaction patterns between different roles are unknown.
Nevertheless, a number of scattered methods have been proposed one after another in recent years. These methods use various embedding mechanisms. Struc2vec [75] leverages random walks on graphs in which edges are weighted based on structural distances. DRNE [88] develops a deep learning framework with a layer-normalized LSTM model to learn regular equivalence. REACT [67], generating embeddings via matrix factorization, focuses on capturing both community and role properties. Though the number of diverse role-oriented embedding methods is gradually increasing, there is still a lack of systematic understanding of role-oriented network embedding. Besides, we lack a taxonomy for thinking deeply about this problem, and there is a shortage of performance and efficiency comparisons of current methods. All these issues limit the development and application of role embedding.
In this survey, we systematically analyze role-oriented network embedding; the analysis can help to understand the internal mechanisms of current methods. First, we propose a two-level categorization scheme for existing methods based on their embedding mechanisms. Furthermore, we evaluate selected embedding methods from the perspectives of both efficiency and effectiveness on different tasks related to role discovery. Specifically, we conduct comprehensive experiments with representative methods on running time (efficiency), node classification and clustering (role discovery), top-k similarity search and visualization, using widely used benchmark networks. Last, we summarize the applications, challenges and future directions of role-oriented network embedding.
Several surveys on network embedding, community detection, role discovery, and deep learning on graphs have been conducted. Our survey has several essential differences compared to these works. [23, 22, 43] mainly study the problem of community detection with different focuses, from the perspectives of network analysis and machine learning. [78] is a seminal work reviewing the development and methodology of role discovery. However, that survey is relatively outdated: more advanced methods, e.g., deep learning based methods, are not discussed. Besides, these surveys focus on methods specific to the community or role task, while our work studies roles with a focus on network embedding approaches that can preserve role information. The surveys [32, 14, 101, 9] are influential works on network embedding from different perspectives. However, they all focus on community-oriented methodology. Similarly, some graph embedding reviews [27, 6] (we do not distinguish between network embedding and graph embedding), apart from some shared techniques, have nothing to do with role-oriented embedding. Meanwhile, surveys such as [94] and [104] introduce effective deep learning frameworks and methods on graphs or networks. They focus on the general problem of how to use machine learning on networks and are less relevant to our topic. One relevant work is [79]: it clarifies the difference between community-oriented and role-oriented network embedding for the first time, and proposes general mechanisms that help to determine whether a method is designed for communities or roles. However, it does not systematically discuss the series of role-oriented NE methods: some advanced methods are ignored, and some of the works introduced are not used for role discovery or role-related tasks. Moreover, it does not evaluate methods empirically by analyzing the relevant data, tasks and performance.
Another recent work [44], which introduces some structural node embedding methods and evaluates them empirically, is the most relevant to ours. In their analysis, the authors mainly focus on the relationships between NE methods and equivalences. In their evaluation, they assess the discovered roles on direct tasks such as role classification and clustering. In contrast, we concentrate on analyzing the advantages and disadvantages of different role-oriented approaches using a new two-level categorization from an analytical perspective. We conduct more comprehensive experiments to evaluate different methods w.r.t. both efficiency and effectiveness in role discovery and downstream tasks, including running time, classification, clustering, visualization, and top-k similarity search.
To sum up, our survey has several contributions as follows.

We present an overview of role-oriented network embedding and discuss its relationship with and differences from community-oriented embedding.

We propose a two-level categorization scheme for current role-oriented embedding methods and briefly describe their formalization, mechanisms, tasks, connections and differences.

We conduct comprehensive experiments with popular methods of each type on different role-oriented tasks, with detailed comparisons of effectiveness and efficiency.

We share all the open-source code and the widely used network datasets on GitHub, and point out open questions and directions for the development of role-oriented network embedding.
2 Notations and Framework
In this section, we give formal definitions of basic graph concepts and role-oriented network embeddings. In Table I, we summarize the main notations used throughout this paper. Then, we propose a unified framework for understanding the process of role-oriented network embedding.
Definition 1 (Network)
A network is denoted as $G=(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges. An edge $e_{ij}\in E$ denotes the link between nodes $v_i$ and $v_j$.
Usually, a network is represented by a weight matrix $A\in\mathbb{R}^{n\times n}$. If $e_{ij}\in E$, $A_{ij}>0$ ($A_{ij}=1$ for an unweighted network and $A_{ij}=A_{ji}$ for an undirected network); otherwise $A_{ij}=0$. Some networks may have an attribute matrix $X$ whose $i$-th row represents the attributes of $v_i$. For an undirected network, denote the degree of node $v_i$ as $d_i=\sum_j A_{ij}$, and we have the degree matrix $D=\mathrm{diag}(d_1,\dots,d_n)$. $L=D-A$ is called the Laplacian matrix; it can be decomposed as $L=U\Lambda U^{\top}$, where $\Lambda=\mathrm{diag}(\lambda_1,\dots,\lambda_n)$ is the matrix of eigenvalues satisfying $\lambda_1\le\lambda_2\le\dots\le\lambda_n$. Denote the $k$-hop ($k\ge 1$) reachable neighbor set of node $v_i$ as $N_k(v_i)$ ($k$ is omitted when $k=1$), i.e., the set of nodes whose shortest path to $v_i$ is less than or equal to $k$. For a directed network, use $d_i^{out}$, $d_i^{in}$, $N_k^{out}(v_i)$ and $N_k^{in}(v_i)$ to represent the out-/in-degree and $k$-hop reachable out-/in-neighborhood of $v_i$, respectively. Unless otherwise stated, models are discussed on unweighted, undirected networks without attributes in the remainder of this paper.
Notation  Definition
$G=(V,E)$  the network/graph with node set $V$ and edge set $E$
$N_k(v_i)$  the set of $v_i$'s $k$-hop reachable neighbors
$G_i^k$  the subgraph induced by $v_i$ and $N_k(v_i)$
$d_i$  the degree of node $v_i$
$p_{ij}$  the shortest path between $v_i$ and $v_j$
$X$  the attribute matrix
$I$  the identity matrix
$H$  the embedding matrix
$F^{(m)}$  the feature matrix extracted by or in method $m$
$S^{(m)}$  the similarity matrix obtained by or in method $m$
$\oplus$  the concatenation operator
*For convenience, the method notation $m$ is omitted in some descriptions.
Definition 2 (Motif/Graphlet)
A motif/graphlet is a small connected subgraph representing a particular pattern of edges on several nodes. The pattern can be repeated in or across networks, i.e., many subgraphs that are isomorphic to it can be sampled from networks. Nodes automorphic to each other, i.e., having the same connectivity patterns, are in the same orbit.
For unweighted networks, there are 9 motifs and 15 orbits with sizes of 2 to 4 nodes, as shown in Fig. 3. Because of their ability to model the smallest but most fundamental connectivity patterns, motifs are widely used for capturing structural similarities and discovering roles.
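To make the orbit idea concrete, here is a small sketch (the toy graph and counting scheme are our own illustration, not from the paper) that counts, for each node, participation in two of the simplest 3-node orbits: lying on a triangle versus being the center or an endpoint of an open wedge. Automorphically equivalent nodes (here 0 and 1) end up with identical counts:

```python
from itertools import combinations

# Toy undirected graph as adjacency sets: a triangle 0-1-2 plus pendant 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

triangle_orbit = {v: 0 for v in adj}  # node lies on a triangle
wedge_center   = {v: 0 for v in adj}  # node is the center of an open wedge
wedge_end      = {v: 0 for v in adj}  # node is an endpoint of an open wedge

for v in adj:
    for u, w in combinations(adj[v], 2):
        if u in adj[w]:               # closed: u-v-w plus edge u-w is a triangle
            triangle_orbit[v] += 1
        else:                         # open: v is the wedge center
            wedge_center[v] += 1
            wedge_end[u] += 1
            wedge_end[w] += 1

# Nodes 0 and 1 are automorphic and get identical orbit-count vectors.
assert (triangle_orbit[0], wedge_center[0], wedge_end[0]) == \
       (triangle_orbit[1], wedge_center[1], wedge_end[1])
```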
Definition 3 (Network Clustering)
A clustering of network $G$ is a group of node sets $\mathcal{C}=\{C_1,\dots,C_K\}$ satisfying $\bigcup_k C_k=V$ and $C_k\neq\emptyset$. In this paper, we discuss hard clustering, i.e., $C_k\cap C_l=\emptyset$ for $k\neq l$. If the sets may overlap, it is usually called overlapping or soft clustering.
For community detection, each set $C_k$ is a tightly interconnected collection of nodes, while for role discovery it is usually composed of unconnected nodes that have similar structural patterns or functions. Every network clustering algorithm is thus committed to achieving clustering results under different goal constraints. However, there is no common understanding of role equivalence or similarity, which leads to multifarious definitions of equivalence and designs of similarity computation. For example, two nodes are automorphically equivalent [37] if the subgraphs of their neighborhoods are isomorphic, while regular equivalence [91] means that if two nodes have the same role, their neighbors have the same roles.
Definition 4 (Network Embedding)
Network embedding is a process that maps the nodes of network $G$ to low-dimensional embeddings $H\in\mathbb{R}^{n\times d}$ with $d\ll n$. In general, for nodes $v_i$ and $v_j$, if they are similar in the network, their embedding vectors $\mathbf{h}_i$ and $\mathbf{h}_j$ will be close in the low-dimensional space.
With node embeddings, we can tackle different network tasks. If we focus on community detection or link prediction, we want to discriminate by the embeddings whether nodes are connected or likely to be connected. For role discovery, however, the embeddings should reflect structural patterns, including local properties like subgraph isomorphism, global properties like regular equivalence and higher-order properties like motifs.
Based on the discussion and notations above, we first propose a unified framework for understanding role-oriented network embedding. To our knowledge, it covers almost all existing methods and models in a unified way. The framework is illustrated in Fig. 2. The structure of networks is discrete, while embeddings are usually designed to lie in a continuous space. Thus, role-oriented embedding methods always take two steps, capturing structural properties and generating embeddings, to bridge the gulf between the two spaces:

Structural Property Extraction. The ways to extract structural information are diverse. Some methods such as RolX [34] and DRNE [88] leverage primary structural features such as node degree and triangle counts. Some of these methods, such as SPINE [29], further transform the features into distances or similarities. There are also methods capturing similarity between node-centric subgraphs. For example, struc2vec [75] computes structural distances between $k$-hop subgraphs based on their degree sequences, and SEGK [61] employs graph kernels, a graph isomorphism test technique, on subgraphs. As the result of these handcrafted processes, structural properties are contained in interim features or pairwise similarities.

Embedding. The extracted properties are then mapped into the embedding space via different mechanisms. In the process of embedding, the structural properties are used as inputs or as training guidance. For example, RolX [34] and SEGK [61] apply low-rank matrix factorization to the feature matrix and the similarity matrix, respectively, as these matrices implicitly or explicitly reflect whether nodes are structurally similar. Struc2vec [75] and SPINE [29] apply word embedding methods to similarity-biased random walks. DRNE [88] utilizes an LSTM [36] on degree-ordered node embedding sequences to capture regular equivalence with a degree-guided regularizer.
As the embeddings capture crucial structural properties, they can be used for downstream tasks such as role-based node classification and visualization. With this framework, we generalize the process of role-oriented embedding. As Fig. 2 shows, the core of designing role-oriented embedding methods is the way structural properties are extracted; in contrast, the embedding mechanisms for mapping structural features/similarities into a low-dimensional continuous vector space are much more regular. Thus, we introduce popular methods in the next section from the perspective of embedding mechanisms.
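The two-step framework can be sketched in a few lines of Python. This is a minimal illustration of the pipeline (degree-based recursive features in the spirit of ReFeX, then a plain SVD as the embedding step), not an implementation of any particular method surveyed here:

```python
import numpy as np

def extract_features(A, hops=2):
    """Step 1: structural property extraction (a tiny ReFeX-like sketch).

    Start from node degree and recursively append the sum and mean of
    neighbors' features, so later columns describe larger neighborhoods."""
    deg = A.sum(axis=1, keepdims=True)
    F = deg
    for _ in range(hops):
        nbr_sum = A @ F
        nbr_mean = nbr_sum / np.maximum(deg, 1)
        F = np.hstack([F, nbr_sum, nbr_mean])
    return F

def embed(F, dim=2):
    """Step 2: map extracted properties to a low-dimensional space (here SVD)."""
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :dim] * s[:dim]

# Two disconnected triangles: all six nodes play the same structural role.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1

H = embed(extract_features(A))
# Structurally identical nodes receive (near-)identical embeddings,
# even though the two triangles are disconnected from each other.
assert np.allclose(H[0], H[3], atol=1e-8)
```

The example highlights the key contrast with proximity-preserving embeddings: nodes 0 and 3 never co-occur in any neighborhood, yet their role-oriented embeddings coincide.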
3 Algorithm Taxonomy
In this section, we introduce existing approaches categorized according to their embedding mechanisms. In detail, we propose a two-level classification ontology for these popular methods. Similar to the taxonomy of community-oriented network embedding, at the first level we divide the methods into three categories: low-rank matrix factorization, random walk based methods and deep learning methods. Furthermore, based on their embedding mechanisms and constraint information, we give a more refined classification taxonomy, as shown in Table II. Next, we introduce these methods in detail.
Method  Embedding Mechanism  Year
RolX [34]  structural feature matrix factorization  2012
GLRD [25]  structural feature matrix factorization  2013
RIDRs [31]  structural feature matrix factorization  2017
GraphWave [16]  structural feature matrix factorization  2018
HONE [77]  structural feature matrix factorization  2020
xNetMF [33]  structural similarity matrix factorization  2018
EMBER [42]  structural similarity matrix factorization  2019
SEGK [61]  structural similarity matrix factorization  2019
REACT [67]  structural similarity matrix factorization  2019
SPaE [84]  structural similarity matrix factorization  2019
struc2vec [75]  structural similarity-biased random walks  2017
SPINE [29]  structural similarity-biased random walks  2019
struc2gauss [66]  structural similarity-biased random walks  2020
Role2Vec [1]  structural feature-based random walks  2019
RiWalk [96]  structural feature-based random walks  2019
NODE2BITS [41]  structural feature-based random walks  2019
DRNE [88]  deep learning  2018
GAS [30]  deep learning  2020
RESD [103]  deep learning  2021
GraLSP [46]  deep learning  2020
GCC [73]  deep learning  2020
3.1 Low-rank Matrix Factorization
Low-rank matrix factorization is the most commonly used mechanism for role-oriented embedding methods. These methods generate embeddings by factorizing matrices that preserve role similarities between nodes implicitly (i.e., feature matrices) or explicitly (i.e., similarity matrices).
3.1.1 Structural Feature Matrix Factorization
RolX [34]. RolX takes advantage of the feature extraction method ReFeX [35] by decomposing the ReFeX feature matrix $F$. ReFeX first computes some primary features, such as the degree and clustering coefficient, for each node. Then it aggregates neighbors' features with sum- and mean-aggregators recursively. Through the recursive steps, it can capture thorough features expressing the structure of the $k$-hop reachable neighborhood. Nonnegative Matrix Factorization (NMF) is used for generating embeddings, as it is efficient compared with other matrix decomposition methods and its nonnegativity constraints aid the interpretation of roles. Thus, RolX aims to obtain two low-rank matrices as follows:

(1)  $\min_{H\ge 0,\, W\ge 0} \|F - HW\|_F^2$

where $H\in\mathbb{R}^{n\times r}$ is the embedding matrix (or role assignment matrix) and $W\in\mathbb{R}^{r\times f}$ (role definition matrix) describes the contribution of each role to the structural features. $r$ is the number of hidden roles, which is determined by the Minimum Description Length (MDL) criterion [76].
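The factorization in Eq. (1) can be sketched with plain multiplicative-update NMF. The feature matrix below is hypothetical (two obvious structural profiles), the update rules are the standard Lee-Seung ones rather than RolX's exact solver, and MDL-based rank selection is omitted:

```python
import numpy as np

def nmf(F, r, iters=500, seed=0):
    """Standard multiplicative-update NMF: F ~ H @ W with H, W >= 0."""
    rng = np.random.default_rng(seed)
    n, f = F.shape
    H = rng.random((n, r)) + 1e-3   # role assignment matrix (n x r)
    W = rng.random((r, f)) + 1e-3   # role definition matrix (r x f)
    for _ in range(iters):
        H *= (F @ W.T) / (H @ W @ W.T + 1e-12)
        W *= (H.T @ F) / (H.T @ H @ W + 1e-12)
    return H, W

# Hypothetical node-by-feature matrix (e.g. degree, triangle count):
# nodes 0-1 share one structural profile, nodes 2-3 another.
F = np.array([[10.0, 0.1], [9.5, 0.2], [0.2, 8.0], [0.1, 9.0]])
H, W = nmf(F, r=2)
assert np.linalg.norm(F - H @ W) < 0.5   # near-exact rank-2 reconstruction
roles = H.argmax(axis=1)                 # hard role assignment per node
```

Each row of `H` gives a soft membership over the r roles; taking the argmax turns it into a hard assignment as done in the example's last line.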
GLRD [25]. GLRD extends RolX by adding different optional constraints to objective function (1). A sparsity constraint is defined for more definitive role assignments and definitions, while a diversity constraint reduces the overlap between roles. Given previously discovered role assignments and definitions, an alternativeness constraint can be used for mining previously unknown roles.
RIDRs [31]. RIDRs uses equitable refinements (ERs) to partition nodes into different cells and compute graph-based features. An equitable refinement partition $\{C_1,\dots,C_K\}$ of $V$ satisfies the following rule:

(2)  $deg(v_i, C_l) = deg(v_j, C_l), \quad \forall v_i, v_j \in C_k,\ \forall k, l$

where $deg(v_i, C_l)$ denotes the number of nodes in cell $C_l$ connected to node $v_i$. As nodes in the same cell have similar numbers of connections to the nodes in other cells, ERs can capture some connectivity patterns.

Based on the cells partitioned by ERs with a relaxation parameter $\epsilon$, a feature matrix is built for each $\epsilon$; after pruning and binning, the feature matrices for all considered values of $\epsilon$ are concatenated into the final feature matrix $F$. Finally, like RolX and GLRD, NMF is applied for embedding generation, while a right sparsity constraint (on $W$) is optional for more definitive role representations.
GraphWave [16]. GraphWave treats graph diffusion kernels as probability distributions over the network and obtains embeddings from the characteristic functions of these distributions. Specifically, taking the heat kernel $g_s(\lambda)=e^{-s\lambda}$ with scaling parameter $s$ as an example, the spectral graph wavelets are defined as:

(3)  $\Psi = U\, \mathrm{diag}\big(g_s(\lambda_1),\dots,g_s(\lambda_n)\big)\, U^{\top}$

where the scaling parameter is omitted from the notation. The $i$-th row of $\Psi$ represents the resulting signal from a Dirac signal around node $v_i$. Considering the empirical characteristic function:

(4)  $\phi_i(t) = \frac{1}{n}\sum_{m=1}^{n} e^{\mathrm{i}t\Psi_{im}}$

where $\mathrm{i}$ denotes the imaginary unit, $v_i$'s embedding vector is generated by concatenating pairs of $\mathrm{Re}(\phi_i(t))$ and $\mathrm{Im}(\phi_i(t))$ at evenly spaced points $t$.
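Eqs. (3)-(4) can be sketched directly with NumPy. This is an illustrative mini-implementation (the scale `s` and the grid of evaluation points `ts` are arbitrary choices here; the full method does more, e.g. multiscale aggregation):

```python
import numpy as np

def graphwave_embed(A, s=1.0, ts=np.linspace(0, 2, 8)):
    """Heat-kernel wavelets + empirical characteristic function sketch."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    lam, U = np.linalg.eigh(L)
    Psi = U @ np.diag(np.exp(-s * lam)) @ U.T   # spectral wavelets, Eq. (3)
    emb = []
    for i in range(A.shape[0]):
        # Empirical characteristic function phi_i(t), Eq. (4), on grid ts.
        phi = np.exp(1j * np.outer(ts, Psi[i])).mean(axis=1)
        emb.append(np.concatenate([phi.real, phi.imag]))
    return np.array(emb)

# Two disconnected triangles: all six nodes are structurally equivalent.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1
E = graphwave_embed(A)
assert np.allclose(E[0], E[3])   # same role -> (numerically) same embedding
```

Because the characteristic function depends only on the multiset of wavelet coefficients, structurally equivalent nodes in different parts of the network get matching embeddings.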
HONE [77]. HONE constructs weighted motif graphs in which the weight of an edge is the number of co-occurrences of its two endpoints in a specific motif. For a motif represented by its weighted motif adjacency matrix, HONE characterizes higher-order structure by deriving matrices from the k-step motif adjacency matrices. These new matrices are designed by imitating popular matrices based on the normal adjacency matrix, such as the transition matrix and the Laplacian matrix. The k-step embeddings are then learned as:

(5)

where the objective is a Bregman divergence under a matching function. The global embeddings are generated by minimizing the following objective:

(6)

where the target matrix is obtained by concatenating the k-step embeddings over all considered motifs and steps. If necessary, attributes diffused by the transition matrices of different motifs and steps can be added.
Remark. The aforementioned methods assume that nodes in similar roles have similar structural features. Thus, they apply matrix factorization to feature matrices to obtain role-based representations. RolX, GLRD and RIDRs directly obtain embeddings that give soft role assignments by factorizing feature matrices. As the feature matrices usually have low dimensionality, these methods are quite efficient. GraphWave uses eigendecomposition and empirical characteristic functions to characterize the structural patterns of each node, which leads to robust embeddings but high computational cost. The weighted motif adjacency matrices in HONE actually capture higher-order proximities, but they also encode structural information because each matrix represents one motif.
3.1.2 Structural Similarity Matrix Factorization
xNetMF [33]. xNetMF is an embedding method designed for REGAL, an embedding-based network alignment approach. It first obtains a node-to-node similarity matrix $S$ based on both structure and attributes:

(7)  $S_{ij} = \exp\big(-\gamma_s\, d_s(v_i,v_j) - \gamma_a\, d_a(v_i,v_j)\big)$

where $d_s(v_i,v_j)$ and $d_a(v_i,v_j)$ are the structure-based and attribute-based distances between nodes $v_i$ and $v_j$, and $\gamma_s$ and $\gamma_a$ are balance parameters for the two distances. $d_s$ is the Euclidean distance between node features, and $d_a$ counts the differing attributes between the two nodes. The feature matrix is defined by counting the nodes with the same logarithmically binned degree in each node's $k$-hop reachable neighborhood, where the count for the $k$-hop ring is discounted by a factor $\delta^{k-1}$ ($0<\delta<1$) to lessen the importance of higher-hop neighbors.
Then, given $S$, matrix factorization can be applied to obtain an embedding matrix $H$ satisfying $S\approx HH^{\top}$. As the high dimension and rank of $S$ lead to heavy computation, an implicit matrix factorization approach extending the Nyström method [17] is proposed as follows:

Select $p\ll n$ nodes as landmarks, randomly or based on node centralities.

Compute the node-to-landmark similarity matrix $C\in\mathbb{R}^{n\times p}$ with Eq. (7) and extract the landmark-to-landmark similarity matrix $W\in\mathbb{R}^{p\times p}$ from $C$.

Obtain the embedding matrix $H$ from the singular value decomposition of the pseudoinverse $W^{\dagger}$, and normalize its rows.

With the above method, embeddings are actually generated by factorizing a low-rank approximation of $S$, i.e., $\tilde{S}=CW^{\dagger}C^{\top}\approx HH^{\top}$. Meanwhile, the computation is reduced, as only a small $p\times p$ matrix is decomposed.
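The steps above can be sketched as follows. The feature vectors and the similarity function are hypothetical stand-ins for xNetMF's binned degree features and Eq. (7) (structure-only, i.e. $\gamma_a=0$); the point of the sketch is that only the $n\times p$ and $p\times p$ blocks are ever materialized, never the full $n\times n$ similarity matrix:

```python
import numpy as np

def sim(fi, fj, gamma=1.0):
    """Hypothetical structural similarity between feature vectors,
    in the style of Eq. (7) with the attribute term dropped."""
    return np.exp(-gamma * np.linalg.norm(fi - fj))

def nystrom_embed(F, landmarks, dim):
    """Implicit low-rank factorization: Y @ Y.T ~ C @ pinv(W) @ C.T."""
    C = np.array([[sim(F[i], F[j]) for j in landmarks] for i in range(len(F))])
    W = C[landmarks]                        # landmark-to-landmark block (p x p)
    U, s, _ = np.linalg.svd(np.linalg.pinv(W))
    Y = C @ U[:, :dim] * np.sqrt(s[:dim])   # implicit factor of C pinv(W) C^T
    return Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)

# Hypothetical per-node structural features (e.g. binned degree counts):
# nodes 0, 1, 4 form one structural group; nodes 2, 3 another.
F = np.array([[3.0, 1.0], [3.1, 1.0], [0.5, 4.0], [0.4, 4.2], [3.0, 1.1]])
Y = nystrom_embed(F, landmarks=[0, 2], dim=2)
assert Y.shape == (5, 2)
```

With row-normalized embeddings, the inner product acts as a cosine similarity, so nodes from the same structural group score higher with each other than across groups.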
EMBER [42]. EMBER is designed for mining professional roles in weighted, directed email networks. It defines a node outgoing feature matrix based on the products of edge weights along k-step shortest outgoing paths, and an incoming feature matrix analogously. By concatenating the incoming and outgoing feature matrices, the final feature matrix is obtained. The node-to-node similarities are computed through Eq. (7) without the attribute-based distance, i.e., $\gamma_a=0$. EMBER uses the same implicit matrix factorization approach to generate embeddings. Note that if the feature extraction part of EMBER is applied to an undirected, unweighted network, EMBER is equivalent to xNetMF without attributes.
SEGK [61]. SEGK leverages graph kernels to compute node structural similarities. To compare structures more carefully, it computes node similarities over different scales of neighborhood subgraphs as follows:

(10)

where the compared subgraphs are the k-hop neighborhood subgraphs of the two nodes and $\hat{k}$ denotes the normalized kernel, which is defined as:

(11)  $\hat{k}(G_1, G_2) = \frac{k(G_1, G_2)}{\sqrt{k(G_1, G_1)\, k(G_2, G_2)}}$

SEGK chooses the shortest path kernel, the Weisfeiler-Lehman subtree kernel, or the graphlet kernel as the practical choice of $k$. Then the Nyström method [93] is employed for the factorization of $S$, enabling efficient computation and low-dimensional embeddings as follows:

(12)  $H = U_d \Lambda_d^{1/2}$

where $U_d$ denotes the matrix of the first $d$ eigenvectors of $S$ and $\Lambda_d$ is the diagonal matrix of the corresponding eigenvalues.
REACT [67]. REACT aims to detect communities and discover roles simultaneously by applying nonnegative matrix tri-factorization to the RoleSim [45] matrix and the adjacency matrix, respectively. The RoleSim matrix, developed from the idea of regular equivalence, is a pairwise similarity matrix computed by iteratively updating the following scores:

(13)  $S_{ij} = (1-\beta)\max_{M(i,j)}\frac{\sum_{(u,v)\in M(i,j)} S_{uv}}{d_i + d_j - |M(i,j)|} + \beta$

where $M(i,j)$ is a matching between the neighborhoods of $v_i$ and $v_j$, and $\beta$ ($0<\beta<1$) is a decay factor. In addition, a norm-based regularization is leveraged to make the distribution of roles within communities as diverse as possible. Thus, the objective function of REACT is:

(14)

where the two tri-factorizations yield the embedding matrices for roles and communities, respectively, and small central matrices capture the interactions between roles/communities; a weight parameter controls the regularization. An orthogonality constraint on the embedding matrices is added for better interpretability.
SPaE [84]. SPaE also tries to capture communities and roles simultaneously. For node structural similarity, it computes the cosine similarity between the standardized graphlet degree vectors of nodes, and generates role-based embeddings via the Laplacian eigenmaps method as follows:

(15)

using the symmetric normalized matrix of the structural similarity matrix. SPaE obtains community-based embeddings similarly as follows:

(16)

using the symmetric normalized adjacency matrix. To map the role-based and community-based embeddings into a unified embedding space, SPaE generates hybrid embeddings by maximizing the following objective function:

(17)

where the hybrid embedding matrix balances agreement with the two embedding types through a balance parameter.
Remark. These methods all explicitly compute structural similarities based on features, graph kernels, role equivalences, and so on. Most of them consider the similarities between multiple hops of neighborhoods. Their effectiveness for role discovery depends on the quality of the similarity matrices. One major problem of these methods is efficiency: computing pairwise similarities and factorizing the high-dimensional similarity matrix are time-consuming. Hence, xNetMF, EMBER and SEGK apply the Nyström method to improve efficiency, as their similarity matrices are Gram matrices [17].
3.2 Shallow Models Using Random Walks
Random walks are a common way to capture node proximity in network embedding methods [72, 28]. Recently, two strategies have been proposed to adapt random walks to role-oriented tasks: (1) structural similarity-biased random walks make structurally similar nodes more likely to appear in the same sequence (as shown in Fig. 4(b)); (2) structural feature-based random walks, e.g., attributed random walks [1], map nodes with similar structural features to the same role indicator and replace node ids in the walk sequences with these indicators (see Fig. 4(c)). The first strategy preserves structural similarity in the co-occurrence relations of nodes in the walks, while the second preserves structural similarity in the role indicators and may also capture the proximity between roles through the co-occurrence relations of the indicators.
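As a toy illustration of strategy (2), the sketch below (the graph, the log-binned degree indicator, and the walk routine are our own minimal example, not a specific surveyed method) performs ordinary random walks but records role indicators instead of node ids:

```python
import math
import random

# Toy graph: a 4-node star (hub 0) glued to a 4-cycle through node 4.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0],
       4: [0, 5, 7], 5: [4, 6], 6: [5, 7], 7: [6, 4]}

# Map each node to a role indicator from a structural feature:
# here, the logarithmically binned degree.
role = {v: f"r{int(math.log2(len(nbrs)))}" for v, nbrs in adj.items()}

def feature_based_walk(start, length, rng):
    walk, v = [], start
    for _ in range(length):
        walk.append(role[v])          # record the role indicator, not the id
        v = rng.choice(adj[v])
    return walk

rng = random.Random(0)
print(feature_based_walk(0, 5, rng))  # first token is 'r2', the hub's indicator
```

All degree-1 leaves share indicator `r0` and all degree-2/3 cycle nodes share `r1`, so structurally similar nodes contribute identical tokens to the corpus regardless of where they sit in the graph.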
Usually, language models such as SkipGram [60] are applied to the generated random walks to map the similarities into embedding vectors [72, 28, 75, 1]. However, some different mapping mechanisms are also employed, such as the SimHash [8] used in NODE2BITS [41].

3.2.1 Structural Similarity-biased Random Walks
struc2vec [75]. Struc2vec generates structurally biased, id-based contexts via random walks on a hierarchy of constructed complete graphs. In detail, it first computes structural distances between pairs of nodes as follows:

(18)  $f_k(u,v) = f_{k-1}(u,v) + g\big(s(R_k(u)),\, s(R_k(v))\big)$

where $g(\cdot,\cdot)$ denotes Dynamic Time Warping (DTW), with $d(a,b)=\frac{\max(a,b)}{\min(a,b)}-1$ adopted as the distance function for DTW; $k$ ranges from 0 to the diameter of the network; and $s(R_k(u))$ is the ordered degree sequence of the nodes at exactly distance $k$ from $u$. Note that $f_{-1}$ is set to the constant 0.

Then a multilayer weighted context graph $M$ is built. Each layer $k$ is an undirected complete graph over all nodes, where the copy of node $u$ in layer $k$ is denoted as $u_k$. The weight of an in-layer edge is defined as follows:

(19)  $w_k(u,v) = e^{-f_k(u,v)}$

The neighboring layers are connected through directed edges between the corresponding node copies, i.e., $u_k$ is linked to $u_{k+1}$ and $u_{k-1}$. The edge weights between layers are defined as follows:

(20)  $w(u_k, u_{k+1}) = \log\big(\Gamma_k(u) + e\big), \quad w(u_k, u_{k-1}) = 1$

where $\Gamma_k(u)$ counts the edges incident to $u_k$ whose weight is larger than the average edge weight of layer $k$. That is:

(21)  $\Gamma_k(u) = \sum_{v} \mathbb{1}\big(w_k(u,v) > \bar{w}_k\big)$

where $\bar{w}_k$ denotes the average edge weight of the complete graph in layer $k$.

Then id-based random walks are employed on $M$, starting in layer 0, to generate the context of each node. In detail, the walk stays in the current layer with a given probability $q$. In this case, the probability of stepping from $u_k$ to $v_k$ is:

(22)  $p_k(u,v) = \frac{e^{-f_k(u,v)}}{Z_k(u)}$

where $Z_k(u)$ is the normalization factor for $u$ in layer $k$. With probability $1-q$, the walk steps across layers with the following probabilities:

(23)  $p(u_k, u_{k+1}) = \frac{w(u_k,u_{k+1})}{w(u_k,u_{k+1}) + w(u_k,u_{k-1})}, \quad p(u_k, u_{k-1}) = 1 - p(u_k, u_{k+1})$

Note that the copies $u_k$ of a node $u$ in different layers share the same id in the context. On the structural contexts, struc2vec leverages SkipGram with hierarchical softmax to learn the embeddings.
SPINE [29]. SPINE uses the largest values of the $i$-th row of the rooted PageRank matrix as $v_i$'s feature vector. In rooted PageRank, a walker steps to a neighbor with a certain probability, while with the complementary probability it teleports back to the root node. For the inductive setting, SPINE computes the features via a Monte Carlo approximation.

To simultaneously capture structural similarity and proximity, SPINE designs a biased random walk method. With a given probability, the walk steps to a structurally similar node according to a transition matrix whose entries are proportional to pairwise feature similarities, which can be computed via DTW or other methods based on node features; otherwise, a normal random walk step is taken. Thus, with a larger biasing probability, SPINE is more role-oriented. The embeddings are learned through SkipGram with Negative Sampling (SGNS). To leverage attributes, the embeddings are generated by combining, through a multilayer perceptron (MLP), the attribute vectors of the nodes corresponding to the largest rooted PageRank values.
struc2gauss [66]. For each node $i$, struc2gauss generates a Gaussian distribution $z_i = \mathcal{N}(\mu_i, \Sigma_i)$ to model both structural similarity and uncertainty. After calculating structural similarity via existing methods such as RoleSim [45], it samples the top-$k$ most similar nodes for a node $i$ as its positive set $\Gamma_+(i)$. The positive sampling of struc2gauss could be regarded as special random walks with mandatory restart on a star-shaped graph where the star center is the target node and the star edges link to its most similar nodes. The negative sample set $\Gamma_-(i)$ has the same size as $\Gamma_+(i)$ and is generated as in the normal random-walk based methods. To push the Gaussian embeddings of similar nodes closer and those of dissimilar nodes farther apart, struc2gauss uses the following max-margin ranking objective:
(26) $\mathcal{L} = \sum_{i \in V} \sum_{j \in \Gamma_+(i)} \sum_{j' \in \Gamma_-(i)} \max\big(0,\; m - E(z_i, z_j) + E(z_i, z_{j'})\big),$
where $m$ is the margin parameter to push dissimilar distributions apart, and $E(z_i, z_j)$ is the similarity measure between the distributions of $i$ and $j$. Different similarity measures can be used, such as the logarithmic inner product and KL divergence. For normal tasks, the mean vectors of those Gaussian distributions can be treated as embeddings, i.e., $h_i = \mu_i$.
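The max-margin ranking objective can be illustrated on point embeddings with an inner-product similarity (a toy sketch with our own names; struc2gauss itself scores full Gaussian distributions, e.g. via KL divergence):

```python
def ranking_loss(emb, pos, neg, margin=1.0):
    # Hinge-style ranking loss: every positive sample should score higher
    # than every negative sample by at least `margin`.
    sim = lambda x, y: sum(a * b for a, b in zip(x, y))  # inner product
    loss = 0.0
    for i, pos_set in pos.items():
        for j in pos_set:
            for k in neg[i]:
                loss += max(0.0, margin - sim(emb[i], emb[j]) + sim(emb[i], emb[k]))
    return loss
```

When positives already beat negatives by the margin, the hinge contributes zero, so only violating triples produce gradients.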
Remark. These methods reconstruct the edges between nodes based on structural similarities so that the context nodes obtained by random walks are structurally similar to the central nodes. Compared with SPINE and struc2gauss, struc2vec clearly constructs edges that better represent role information in the multilayer complete graphs, which leads to better embeddings but higher time and space complexities.
3.2.2 Structural Feature-based Random Walks
Role2Vec [1]. Role2Vec first maps nodes into several disjoint roles. Logarithmic binning, K-means with low-rank factorization and other methods on features and attributes can be chosen for the role mapping. Motif-based features, such as Graphlet Degree Vectors, are recommended since motifs can better capture high-order structural information. Then random walks are performed, but the node ids in the generated sequences are replaced with role indicators. With the feature-based role context, the language model CBOW can be used to obtain embeddings of roles. Nodes partitioned into the same role have the same embeddings.
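The role-indicator walks can be sketched as follows (a minimal version with uniform neighbor sampling; all names are ours):

```python
import random

def role_walks(adj, role, num_walks=2, walk_len=4, rng=None):
    # Plain random walks whose node ids are replaced by role indicators,
    # so the downstream language model sees role tokens, not node ids.
    rng = rng or random.Random(0)
    corpus = []
    for start in adj:
        for _ in range(num_walks):
            node, walk = start, []
            for _ in range(walk_len):
                walk.append(role[node])
                node = rng.choice(adj[node])
            corpus.append(walk)
    return corpus
```

Feeding this corpus to CBOW yields one embedding per role token; every node mapped to that role then shares the embedding.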
RiWalk [96]. RiWalk designs structural node indicators approximating graph kernels. In a given subgraph $G_u$, the indicator approximating the shortest path kernel for node $v$ is defined as the concatenation of the degrees of $u$ and $v$ and the shortest path length $s_{uv}$ between them:
(27) $l^{sp}_v = \big(b(d_u),\; b(d_v),\; s_{uv}\big),$
where $b(x) = \lfloor \log_2(x+1) \rfloor$ is a logarithmically binning function. The indicator approximating the Weisfeiler-Lehman subtree kernel is defined as:
(28) $l^{wl}_v = \big(b(d_u),\; b(X_v[0]),\; b(X_v[1]),\; \dots,\; b(X_v[m])\big),$
where $X_v$ is a vector of length $m+1$ ($m$ being the radius of $G_u$) whose $i$-th element is the count of $v$'s neighbors at distance $i$ to $u$ in $G_u$, i.e.:
(29) $X_v[i] = \big|\{w \in N(v) \mid s_{uw} = i\}\big|.$
Then random walks starting from $u$ are performed on each $G_u$. The nodes are relabeled as indicated by Eq.(27) or Eq.(28), while only $u$ itself is not relabeled. The embeddings are then learned via SGNS on the generated sequences.
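The shortest-path indicator can be sketched as a string label (the label formatting and log-base-2 binning are our assumptions for illustration):

```python
import math

def sp_label(deg_u, deg_v, dist_uv):
    # RiWalk-style shortest-path indicator: concatenate the log-binned
    # degrees of the anchor u and of node v with their shortest-path distance.
    b = lambda x: int(math.log2(x + 1))
    return f"{b(deg_u)}-{b(deg_v)}-{dist_uv}"
```

Nodes with similar (binned) degrees and the same distance to the anchor receive identical labels, so walks over relabeled subgraphs produce structurally comparable contexts across different parts of the network.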
NODE2BITS [41]. NODE2BITS is designed for entity resolution on temporal networks. Here we use $t(e)$ to denote the timestamp of edge $e$. To integrate temporal information, NODE2BITS utilizes temporal random walks in which edges are sampled with non-decreasing timestamps. The following stepping probability is defined to capture short-term transitions in temporal walks:
(30) $p(e_{i+1} \mid e_i) = \frac{\exp\big[-\big(t(e_{i+1}) - t(e_i)\big)/\Delta\big]}{\sum_{e' \in \Gamma(e_i)} \exp\big[-\big(t(e') - t(e_i)\big)/\Delta\big]},$
where $\Delta$ is the maximal duration between all timestamps. The stepping probability in the long-term policy is defined similarly but with positive signs. Multiple walks are generated for each edge, and the temporal context of different hops for a node can be extracted from the walks. Then structural features and attributes are fused in the temporal walks. For each node with a specific hop $k$, histograms are applied on multi-dimensional features (and node types if the network is heterogeneous) to aggregate information in the neighborhood, and they are concatenated as a vector $h^{(k)}_v$. SimHash [8] is applied by projecting the histogram $h^{(k)}_v$ onto several random hyperplanes to generate a binary hashcode $s^{(k)}_v$. The final embeddings are obtained via concatenation of $s^{(k)}_v$ across different hops $k$.
Remark. The above three methods differ greatly in their motivations for utilizing structural feature-based random walks. Role2Vec assigns roles first and then employs random walks with role indicators; it essentially captures proximity between the assigned roles. RiWalk relabels the walks in subgraphs to approximate graph kernels. NODE2BITS uses random walks as neighbor feature aggregators.
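The SimHash step used by NODE2BITS can be sketched in its generic form (standard SimHash with Gaussian hyperplanes; NODE2BITS' exact projection setup may differ):

```python
import random

def simhash(vec, n_bits=8, seed=0):
    # SimHash sketch: project the vector onto random hyperplanes and keep
    # the sign of each projection as one bit of the binary hashcode.
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in vec] for _ in range(n_bits)]
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]
```

Vectors pointing in similar directions tend to fall on the same side of most hyperplanes, so their hashcodes agree on most bits, which is what makes the compact codes useful for matching entities.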
3.3 Deep Learning Models
Recently, a few works have focused on leveraging deep learning techniques for role-oriented network representation learning. Though deep learning can provide more varied and powerful mapping mechanisms, it needs to be trained with more carefully designed structural guidance.
3.3.1 Structural Information Reconstruction/Guidance
DRNE [88]. DRNE is proposed to capture regular equivalence in networks, so it learns node embeddings in a recursive way with the following loss function:
(31) $\mathcal{L}_{rec} = \sum_{v \in V} \big\| h_v - \mathrm{Agg}\big(\{h_u \mid u \in N(v)\}\big) \big\|_2^2,$
where $\mathrm{Agg}(\cdot)$ is the aggregation of the neighbors' embeddings via a layer-normalized Long Short-Term Memory (LN-LSTM). To make the neighbor information available for the LN-LSTM, for each node $v$, DRNE downsamples a fixed number of neighbors with large degrees and orders them based on their degrees. Denoting their embeddings as $h_{u_1}, \dots, h_{u_T}$, the aggregating process is $s_t = \mathrm{LNLSTM}(h_{u_t}, s_{t-1})$ and finally $\mathrm{Agg}\big(\{h_u \mid u \in N(v)\}\big) = s_T$. Additionally, DRNE proposes a degree-guided regularizer to avoid the trivial solution where all embeddings are $\mathbf{0}$. The regularizer is as follows:
(32) $\mathcal{L}_{reg} = \sum_{v \in V} \big( \log(d_v + 1) - \mathrm{MLP}(h_v) \big)^2.$
The regularizer is weighed with a parameter $\lambda$ and the whole model is trained via the combined loss:
(33) $\mathcal{L} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{reg}.$
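A toy version of this combined objective, with a pluggable aggregator standing in for the LN-LSTM (all names and the mean aggregator below are ours, for illustration only):

```python
import math

def drne_loss(emb, neighbors, aggregate, predict_degree, degree, lam=1.0):
    # Reconstruction term: each embedding should match the aggregation of
    # its neighbors' embeddings (regular equivalence, recursively enforced).
    rec = sum(
        sum((e - a) ** 2
            for e, a in zip(emb[v], aggregate([emb[u] for u in neighbors[v]])))
        for v in emb
    )
    # Degree-guided regularizer: a predictor on the embedding should
    # recover log(degree + 1), ruling out the all-zero trivial solution.
    reg = sum((math.log(degree[v] + 1) - predict_degree(emb[v])) ** 2
              for v in emb)
    return rec + lam * reg
```

With a mean aggregator, two mutually connected nodes with identical embeddings incur zero reconstruction loss, illustrating why the regularizer is needed to keep embeddings informative.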
GAS [30]. Graph Neural Networks have the power to capture structure as they are closely related to the Weisfeiler-Lehman (WL) test in some ways [95]. GAS applies an $L$-layer graph convolutional encoder, in which each layer is:
(34) $H^{(l+1)} = \sigma\big( (A + I)\, H^{(l)} W^{(l)} \big),$
where $A$ is the adjacency matrix and $W^{(l)}$ is the parameter matrix of the $l$-th layer. The input $H^{(0)}$ could be the attribute matrix or an embedding lookup table. Here the sum-pooling propagation rule is applied instead of the original GCN rule [48] to better distinguish local structures. In fact, more powerful GNNs such as the Graph Isomorphism Network [95] may further improve the performance. The key idea of GAS is to use a few critical structural features as guidance information to train the model. The features $F$ are extracted in a way similar to ReFeX but aggregated only once, normalized and not binned. An MLP model is used as the decoder to approximate the features, i.e., $\hat{F} = \mathrm{MLP}(H^{(L)})$. The loss function is:
(35) $\mathcal{L} = \big\| F - \hat{F} \big\|_F^2.$
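The sum-pooling propagation rule can be sketched in plain Python (a dense toy version for clarity; real implementations use sparse matrix products and learned weights):

```python
def sum_pool_layer(adj, H, W, act=lambda x: max(x, 0.0)):
    # One sum-pooling graph-convolution layer, H' = ReLU((A + I) H W):
    # each node sums its own and its neighbors' features, then applies
    # the linear map W and the nonlinearity.
    n, d_in, d_out = len(H), len(H[0]), len(W[0])
    pooled = [
        [H[v][j] + sum(H[u][j] for u in adj[v]) for j in range(d_in)]
        for v in range(n)
    ]
    return [
        [act(sum(pooled[v][i] * W[i][j] for i in range(d_in)))
         for j in range(d_out)]
        for v in range(n)
    ]
```

Unlike mean or max pooling, the sum keeps neighborhood sizes visible in the output, which is why it distinguishes local structures better.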
RESD [103]. RESD also adopts ReFeX [35] to extract appropriate features $F$. It uses a Variational Auto-Encoder (VAE) [47] architecture to learn low-noise and robust representations:
(36) $h_v \sim \mathcal{N}\big(\mu_v, \mathrm{diag}(\sigma_v^2)\big), \qquad \mu_v = \mathrm{Enc}_\mu(F_v), \quad \log \sigma_v = \mathrm{Enc}_\sigma(F_v).$
The VAE model is trained via feature reconstruction. A degree-guided regularizer, Eq.(32), designed in DRNE [88] is introduced in RESD for preserving topological characteristics. The combined objective is as follows:
(37) $\mathcal{L} = \sum_{v \in V} \big\| F_v - \mathrm{Dec}(h_v) \big\|_2^2 + \mathcal{L}_{KL} + \lambda \mathcal{L}_{reg},$
where $\mathcal{L}_{KL}$ is the KL divergence between the variational posterior and the Gaussian prior.
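The sampling step of a VAE encoder is typically implemented with the reparameterization trick, sketched below (generic VAE machinery rather than RESD-specific code):

```python
import math
import random

def reparameterize(mu, log_var, rng=None):
    # VAE reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    # so the sampling stays differentiable with respect to mu and log_var.
    rng = rng or random.Random(0)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

As the predicted variance shrinks, the sample collapses onto the mean, so a well-trained encoder can trade off noise robustness against reconstruction fidelity.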
GraLSP [46]. GraLSP is a GNN framework integrating local structural patterns that can be employed on role-oriented tasks. For a node $v$, it captures structural patterns by generating random walks of length $l$ starting from $v$ and then anonymizing them [39]. Each anonymous walk $p$ is represented as an embedding $u_p$ via a lookup table. Then the aggregation of the neighborhood representation is designed as follows:
(38) $h_v^{(k)} = \mathrm{ReLU}\Big( W_1^{(k)} h_v^{(k-1)} + \mathop{\mathrm{MEAN}}_{p \in P_v}\; a_{vp} \big( \lambda_{vp} \odot W_2^{(k)} h_{p_{end}}^{(k-1)} \big) \Big),$
where $W_1^{(k)}$ and $W_2^{(k)}$ are trainable parameter matrices and $p_{end}$ denotes the end node of walk $p$. $a_{vp}$ is a learned attention value based on the local structure:
(39) $a_{vp} = \frac{\exp\big(\mathrm{SLP}(u_p)\big)}{\sum_{p' \in P_v} \exp\big(\mathrm{SLP}(u_{p'})\big)},$
where $\mathrm{SLP}(\cdot)$ denotes a single-layer perceptron. $\lambda_{vp}$ is the amplification coefficient:
(40) $\lambda_{vp} = \mathrm{sigmoid}\big( W_\lambda u_p + b_\lambda \big).$
To preserve proximities between nodes, the loss function in DeepWalk [72] is leveraged:
(41) $\mathcal{L}_{prox} = \sum_{v \in V} \sum_{u \in C_v} \Big[ -\log \sigma\big(h_u^\top h_v\big) - \sum_{n \in \mathrm{Neg}(v)} \log \sigma\big(-h_u^\top h_n\big) \Big],$
where $C_v$ is the walk context of $v$ and $\mathrm{Neg}(v)$ is a set of negative samples.
After $K$ aggregations, the embeddings are $h_v = h_v^{(K)}$. To capture structural similarities between nodes, GraLSP designs the following loss:
(42) $\mathcal{L}_{struct} = \sum_{(v, u, u')} \max\big(0,\; \gamma - h_v^\top h_u + h_v^\top h_{u'} \big),$
where the triplets $(v, u, u')$ are sampled such that $v$ and $u$ have similar anonymous walk distributions while $v$ and $u'$ do not, and $\gamma$ is a margin parameter.
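Walk anonymization [39], the core structural signal here, can be sketched as:

```python
def anonymize(walk):
    # Map a random walk to its anonymous walk: each node is replaced by the
    # index of its first occurrence, so (a, b, a, c) and (x, y, x, z) yield
    # the same pattern regardless of node identity.
    first_seen = {}
    out = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen)
        out.append(first_seen[node])
    return out
```

Because node identities are erased, the distribution of anonymous walk patterns around a node depends only on its local structure, which is exactly the role-relevant information GraLSP injects into its aggregation.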