
A Survey on Role-Oriented Network Embedding

07/18/2021
by   Pengfei Jiao, et al.
Tianjin University
TU Eindhoven

Recently, Network Embedding (NE) has become one of the most attractive research topics in machine learning and data mining. NE approaches have achieved promising performance in various graph mining tasks, including link prediction, node clustering and node classification. A wide variety of NE methods focus on the proximity of networks. They learn a community-oriented embedding for each node, where the representations of two nodes are similar if the nodes are close to each other in the network. Meanwhile, there is another type of structural similarity, i.e., role-based similarity, which is usually complementary to and completely different from proximity. To preserve role-based structural similarity, the problem of role-oriented NE has been raised. However, compared to the community-oriented NE problem, only a few role-oriented embedding approaches have been proposed recently. Although the topic is less explored, considering the importance of roles in analyzing networks and the many applications that role-oriented NE can shed light on, it is necessary and timely to provide a comprehensive overview of existing role-oriented NE methods. In this review, we first clarify the differences between community-oriented and role-oriented network embedding. Afterwards, we propose a general framework for understanding role-oriented NE and a two-level categorization to better classify existing methods. Then, we select some representative methods according to the proposed categorization and briefly introduce them by discussing their motivation, development and differences. Moreover, we conduct comprehensive experiments to empirically evaluate these methods on a variety of role-related tasks including node classification and clustering (role discovery), top-k similarity search and visualization using some widely used synthetic and real-world datasets...


1 Introduction

Networks or graphs are commonly used to model the complex interaction relations in real-world data and systems [86, 24], e.g., transportation, social patterns, cooperative behavior and metabolic phenomena. By convention, a network (or graph) is abstracted as a set of nodes and the complicated and elusive links among them. To understand such data, network analysis helps to explore the organization, analyze the structure, predict missing links and control the dynamics of complex systems. For a long time, researchers have proposed specially designed methods and models for different graph mining tasks, such as preference mechanisms, hierarchical structures and latent space models for link prediction [58]; grouping or aggregation, bit compression and influence-based methods for network summarization [52]; and the generalized threshold model, independent cascade model and linear influence model for information diffusion [105]. Among the core issues and applications of network analysis and graph mining, clustering [83], i.e., dividing the nodes into distinct or overlapping groups, has attracted the most interest from different domains including machine learning and complex networks.

The field of network clustering has two main branches: community detection [26, 81, 12] and role discovery [78]. Community detection, the currently dominant clustering branch, is devoted to finding groups in which nodes interact more intensively with each other than with nodes outside [23]. In contrast, role discovery, which has a long research history in sociology [59] but has been inconspicuous in network science, groups nodes based on the similarity of their structural patterns [51], such as bridge or hub nodes [25]. In general, nodes in the same community are likely to be connected to each other, while nodes in the same role may be unconnected and are often far away from each other. Since their rules for dividing nodes are fundamentally different, the two branches are usually considered orthogonal problems. A variety of algorithms and models have been proposed for both branches. For community detection, modularity optimization [10, 49], statistical models [85, 13], non-negative matrix factorization [97, 65, 57, 56] and deep learning methods [87, 50] have been developed and have shown crucial influence on other tasks and applications, such as recommendation [21, 106] and identifying criminal gangs [54]. Surveys on community detection can be found in [23, 22, 43]. For role discovery, traditional methods are usually graph based and related to some notion of equivalence, such as structural [53], regular [92] and stochastic equivalence [38, 62]. Blockmodels [20] and mixed-membership stochastic blockmodels [2] are important and influential graph-based methods. Besides, there are also some combinatorial or heuristic methods [3] for this problem.

(a) Truth label
(b) Community detection
(c) Role discovery
Fig. 1: Air Brazil network for understanding the community detection and role discovery in network clustering. Nodes with same clustering label have same color. (a) Brazil network with ground-truth clustering label; (b) Community detection with Louvain [5]; (c) Role discovery with RolX [34].

Here we take the Brazilian air-traffic network as an example. As shown in Fig. 1 (a), the nodes and edges denote airports and the direct flights between them, respectively. The clustering labels are assigned based on the activity of the nodes [75]. The size and color of each circle represent the degree and label of the node, respectively. It can be observed that nodes with the same label (color), i.e., structurally similar nodes, are usually not connected. To better illustrate the two clustering tasks, we choose two typical methods, Louvain [5] and RolX [34], which are specially designed for community detection and role discovery, respectively. The clustering results are presented in Fig. 1 (b) and (c). Community detection divides this network into tightly connected groups, which means that airports with more flights among them belong to the same community. However, the clustering result of role discovery, which is usually related to the flow and scale of the airports, is closer to the ground truth.

In recent years, network embedding (NE) has become the focus of studying graph structure and has been demonstrated to achieve promising performance in many downstream tasks, e.g., node classification and link prediction. The motivation of NE is to transform network data into independently distributed representations in a latent space that preserve the topological structure and properties of the original network. On the whole, current NE methods can be categorized into two types: shallow and deep learning methods (here we focus on unsupervised NE approaches unless explicitly mentioned otherwise). The former includes matrix factorization and random walk based methods. Matrix factorization methods learn node embeddings via a low-rank approximation of the adjacency matrix or a higher-order similarity matrix of the network, using techniques such as singular value decomposition, non-negative matrix factorization and NetSMF [74]. With different random walk strategies, a series of methods have been proposed to optimize the co-occurrence probability of nodes and learn effective embeddings. The latter type is mainly rooted in autoencoders and graph convolutional networks. These methods generally consist of an encoder, a similarity function and a decoder. Attributes, characteristics and constraints can also be combined to enhance the embedding; VGAE, GAT and GraphSAGE are representatives of such methods. There are also other types of embedding approaches, e.g., the latent feature model, but a discussion of general NE methods is out of our scope. We refer interested readers to some recent survey papers on NE [14, 101, 104, 94, 27, 6, 32, 9].

However, most of these methods, whether or not designed for network clustering, model proximity, i.e., the embedding vectors are community oriented. They fail to capture structural similarity, or role information [79]. Role-oriented network embedding therefore raises several inherent challenges. Firstly, and most importantly, the structural similarity of two nodes has nothing to do with their distance, which makes it difficult to define the loss function effectively. Secondly, strict role definitions, such as those based on equivalence, are difficult to apply to real-world networks, especially large-scale ones. Thirdly, the distribution of nodes with the same role in a network is very complex, and the interaction patterns between different roles are unknown.

Nevertheless, a number of scattered methods have been proposed one after another in recent years. These methods use various embedding mechanisms. Struc2vec [75] leverages random walks on graphs in which edges are weighted based on structural distances. DRNE [88] develops a deep learning framework with a layer-normalized LSTM model to learn regular equivalence. REACT [67], which generates embeddings via matrix factorization, focuses on capturing both community and role properties. Though the number of diverse role-oriented embedding methods is gradually increasing, there is still a lack of systematic understanding of role-oriented network embedding. Besides, a taxonomy for thinking deeply about this problem is also missing. Meanwhile, comparisons of the performance and efficiency of current methods are scarce. All of these limit the development and applications of role embedding.

So in this survey, we systematically analyze role-oriented network embedding, and the analysis can help to understand the internal mechanisms of current methods. First, we propose a two-level categorization scheme for existing methods, based on the embedding mechanisms of current methods and models. Furthermore, we evaluate selected embedding methods from the perspectives of both efficiency and effectiveness on different tasks related to role discovery. Specifically, we conduct comprehensive experiments with some representative methods on running time (efficiency), node classification and clustering (role discovery), top-k similarity search and visualization with widely used benchmark networks. Last, we summarize the applications, challenges and future directions of role-oriented network embedding.

Some surveys on network embedding, community detection, role discovery, and deep learning on graphs have been conducted. Our survey has several essential differences compared to these works. The surveys [23, 22, 43] mainly study the problem of community detection with different focuses from the perspectives of network analysis and machine learning. The survey [78] is a seminal work reviewing the development and methodology of role discovery. However, it is relatively outdated, and more advanced methods, e.g., deep learning based methods, are not discussed. Besides, these surveys focus on methods specific to the community or role task, while our work studies roles with a focus on network embedding approaches that can preserve role information. The surveys [32, 14, 101, 9] are influential works on network embedding from different principles. However, they all focus on community-oriented methodology. Similarly, some graph embedding reviews [27, 6] (we do not distinguish between network embedding and graph embedding) have, apart from some shared techniques, nothing to do with role-oriented embedding. Meanwhile, surveys such as [94] and [104] introduce effective deep learning frameworks and methods on graphs or networks. They focus on general problems of how to use machine learning on networks and are less relevant to our problem. One relevant work is [79], which clarifies the difference between community-oriented and role-oriented network embedding for the first time and proposes common mechanisms that help to understand whether a method is designed for communities or roles. However, it does not systematically discuss the series of role-oriented NE methods: some advanced methods are ignored, and some of the introduced works are not used for role discovery or role-related tasks. Moreover, it does not evaluate methods empirically by analyzing the relevant data, tasks and performance.
Another recent work [44], which introduces some structural node embedding methods and evaluates them empirically, is the most relevant to our work. In analysis, it mainly focuses on the relationships between NE methods and equivalence. In evaluation, it evaluates the discovered roles on direct tasks such as role classification and clustering. In contrast, we concentrate on analyzing the advantages and disadvantages of different role-oriented approaches using a new two-level categorization. We conduct more comprehensive experiments to evaluate different methods w.r.t. both efficiency and effectiveness in role discovery and downstream tasks, including running time, classification, clustering, visualization, and top-k similarity search.

To sum up, our survey has several contributions as follows.

  • We first summarize role-oriented network embedding and discuss its relationship to and differences from community-oriented network embedding.

  • We propose a two-level categorization scheme for current role-oriented embedding methods and briefly describe their formalization, mechanisms, tasks, connections and differences.

  • We provide thorough experiments with popular methods of each type on different role-oriented tasks, together with a detailed comparison of effectiveness and efficiency.

  • We share all the open-source code and widely used network datasets on Github and point out the developments and open questions of role-oriented network embedding.

2 Notations and Framework

In this section, we give formal definitions of basic graph concepts and role-oriented network embeddings. In Table I, we summarize the main notations used throughout this paper. Then, we propose a unified framework for understanding the process of role-oriented network embedding.

Definition 1 (Network)

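A network is denoted as $G = (V, E)$, where $V = \{v_1, \dots, v_n\}$ is the set of nodes and $E$ is the set of edges. An edge $e_{ij} \in E$ denotes the link between nodes $v_i$ and $v_j$.

Usually, a network is represented by an $n \times n$ weight matrix $A$. If $e_{ij} \in E$, then $A_{ij} > 0$ ($A_{ij} = 1$ for an unweighted network and $A_{ij} = A_{ji}$ for an undirected network); otherwise $A_{ij} = 0$. Some networks may have an attribute matrix $X$ whose $i$-th row represents the attributes of $v_i$. For an undirected network, denote the degree of node $v_i$ as $d_i = \sum_j A_{ij}$, so that the degree matrix is $D = \mathrm{diag}(d_1, \dots, d_n)$. $L = D - A$ is called the Laplacian matrix; it can be decomposed as $L = U \Lambda U^\top$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is the matrix of eigenvalues satisfying $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$.

Denote the $k$-hop ($k \ge 1$) reachable neighbor set of node $v_i$ as $N_i^k$ ($k$ is omitted when $k = 1$), where the length of the shortest path between $v_i$ and each node $v_j \in N_i^k$ is less than or equal to $k$. For a directed network, we use $d_i^{+}$, $d_i^{-}$, $N_i^{k+}$ and $N_i^{k-}$ to represent the out/in-degree and the $k$-hop reachable out/in-neighborhood of $v_i$, respectively. Unless otherwise stated, a model is discussed on unweighted undirected networks without attributes in the rest of this paper.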
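The quantities in Definition 1 can be computed directly on a small example. The following is a minimal sketch, assuming a dense NumPy adjacency matrix and a neighbor set that excludes the node itself; the function names and the toy path graph are illustrative and not taken from the paper.

```python
import numpy as np
from collections import deque

def laplacian(A):
    """Return the degree matrix D and the Laplacian L = D - A of an undirected graph."""
    D = np.diag(A.sum(axis=1))
    return D, D - A

def k_hop_neighbors(A, i, k):
    """Nodes other than i whose shortest-path distance from i is at most k (BFS)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == k:          # do not expand beyond k hops
            continue
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v in dist if v != i}

# Toy path graph 0 - 1 - 2 - 3 (hypothetical example data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D, L = laplacian(A)
eigvals, U = np.linalg.eigh(L)    # eigenvalues are returned in ascending order
two_hop = k_hop_neighbors(A, 0, 2)
```

Here `np.linalg.eigh` is used because the Laplacian of an undirected graph is symmetric, so the decomposition has real eigenvalues and orthonormal eigenvectors.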

Notation Definition
the network/graph with node set and edge set
the set of ’s -hop reachable neighbors
the subgraph induced by and
the degree of node
the shortest path between and
attribute matrix
identity matrix
embedding matrix
the feature matrix extracted by or in method
the similarity matrix obtained by or in method
, the concatenation operator

*For convenience, method notation is omitted in some descriptions.

TABLE I: Main Notations.
Fig. 2: The common framework of role-oriented network embedding methods includes two main steps: structural property extraction and embedding. The former can be accomplished in a variety of ways, some of which are feature-based and some similarity-based. On the extracted properties, role-oriented embedding methods then employ specific embedding mechanisms to generate embeddings. Note that the discussed role-oriented methods are unsupervised. Thus, though the generated embeddings can be applied to downstream tasks with ground truth, the whole process of embedding generation has no interaction with the target tasks.
Definition 2 (Motif/Graphlet)

A motif/graphlet is a small connected subgraph representing a particular pattern of edges on several nodes. The pattern can be repeated in or across networks, i.e., many subgraphs sampled from networks can be isomorphic to it. Nodes that are automorphic to each other, i.e., have the same connectivity patterns, are in the same orbit.

Fig. 3: Motifs and orbits (denoted by numbers) with size 2-4 nodes.

For unweighted networks, there are 9 motifs and 15 orbits with size 2-4 nodes as shown in Fig. 3. Because of their ability to model the smallest but most fundamental connectivity patterns, motifs are widely used for capturing structural similarities and discovering roles.
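As a small illustration of motif-based node statistics, the sketch below counts, for every node, how often it takes part in the two simplest motifs: the edge (its degree) and the triangle. It assumes an unweighted, undirected graph given as a dense 0/1 adjacency matrix; the function names and the toy graph are hypothetical, and the counts are not mapped to the orbit numbering of Fig. 3.

```python
import numpy as np

def edge_counts(A):
    """Number of edges each node participates in, i.e. its degree."""
    return np.asarray(A, dtype=np.int64).sum(axis=1)

def triangle_counts(A):
    """Number of triangles each node participates in."""
    A = np.asarray(A, dtype=np.int64)
    # (A^3)[i, i] counts closed walks of length 3 from i; every triangle
    # through i is traversed in two directions, hence the division by 2.
    return np.diag(A @ A @ A) // 2

# Triangle 0-1-2 with a pendant node 3 attached to node 2 (toy data).
A = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1
deg = edge_counts(A)
tri = triangle_counts(A)
```

Nodes 0 and 1 receive identical counts, which matches the intuition that they occupy the same position in the pattern, while the pendant node 3 is in no triangle.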

Definition 3 (Network Clustering)

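A clustering of a network with node set $V$ is a group of node sets $\mathcal{C} = \{C_1, \dots, C_K\}$ satisfying $C_k \subseteq V$ and $\bigcup_{k=1}^{K} C_k = V$. In this paper, we discuss hard clustering, i.e., $C_k \cap C_l = \emptyset$ for $k \ne l$. If $C_k \cap C_l \ne \emptyset$ for some $k \ne l$, it is usually called overlapping or soft clustering.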

For community detection, each set is a tightly interconnected collection of nodes. For role discovery, each set is usually composed of unconnected nodes that have similar structural patterns or functions. Every network clustering algorithm is thus committed to achieving its clustering results under different goal constraints. However, there is no common understanding of role equivalence or similarity, which leads to multifarious definitions of equivalence and designs of similarity computation. For example, two nodes are automorphically equivalent [37] if the subgraphs of their neighborhoods are isomorphic, while regular equivalence [91] means that if two nodes have the same role, their neighbors have the same roles.

Definition 4 (Network Embedding)

Network Embedding is a process to map nodes of network to low-dimensional embeddings so that . In general, for nodes and , if they are similar in the network, their embedding vectors and will be close in the low dimensional space.

Node embeddings can be used for different network tasks. If we focus on community detection or link prediction, the embeddings should discriminate whether nodes are connected or likely to be connected. For role discovery, however, the embeddings should reflect structural patterns, including local properties like subgraph isomorphism, global properties like regular equivalence and higher-order properties like motifs.

Based on the discussion and notations above, we propose a unified framework for understanding role-oriented network embedding. To our knowledge, it covers almost all existing methods and models in a unified way. The framework is illustrated in Fig. 2. The structure of networks is discrete, but embeddings are usually designed to lie in a continuous space. Thus, role-oriented embedding methods always take two steps, capturing structural properties and generating embeddings respectively, to bridge the gulf between the two spaces:

  • Structural Property Extraction. The ways to extract structural information are diverse. Some methods, such as RolX [34] and DRNE [88], leverage primary structural features including node degrees and triangle numbers. Some of these methods, such as SPINE [29], go on to transform the features into distances or similarities. There are also methods that capture similarity between node-centric subgraphs. For example, struc2vec [75] computes structural distances between $k$-hop subgraphs based on the degree sequences of the subgraphs. SEGK [61] applies graph kernels, a technique related to graph isomorphism testing, to the subgraphs. As a result of these hand-crafted processes, the structural properties are contained in interim features or pair-wise similarities.

  • Embedding. The extracted properties are then mapped into the embedding space via different mechanisms. In the embedding process, the structural properties are used as inputs or as training guidance. For example, RolX [34] and SEGK [61] apply low-rank matrix factorization to a feature matrix and a similarity matrix respectively, as these implicitly or explicitly reflect whether nodes are structurally similar. Struc2vec [75, 29] applies word embedding methods to similarity-biased random walks. DRNE [88] utilizes an LSTM [36] on degree-ordered node embedding sequences to capture regular equivalence with a degree-guided regularizer.
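The two steps above can be sketched end to end on a toy graph. This is a deliberately simplified illustration and not any of the surveyed methods: it assumes three hand-picked features (degree, mean neighbor degree, triangle count) for the extraction step and a plain truncated SVD for the embedding step. All names are hypothetical.

```python
import numpy as np

def structural_features(A):
    """Step 1: hand-crafted per-node features (degree, mean neighbor degree, triangles)."""
    deg = A.sum(axis=1)
    mean_nbr_deg = (A @ deg) / np.maximum(deg, 1)
    triangles = np.diag(A @ A @ A) / 2
    return np.column_stack([deg, mean_nbr_deg, triangles])

def embed(F, dim):
    """Step 2: map the features to a low-dimensional space with a truncated SVD."""
    F = F - F.mean(axis=0)
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :dim] * s[:dim]

# Barbell-like toy graph: two triangles joined by the path 2 - 3 - 4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
Z = embed(structural_features(A), dim=2)
```

Nodes 0 and 5 lie in different triangles and are three hops apart, yet they receive the same embedding because their structural features coincide. A proximity-based embedding would instead place node 0 close to nodes 1 and 2.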

As the embeddings capture crucial structural properties, they can be used in downstream tasks such as role-based node classification and visualization. With this framework, we generalize the process of role-oriented embedding. As we can learn from Fig. 2, the core of designing role-oriented embedding methods is the way structural properties are extracted. In contrast, the embedding mechanisms for mapping structural features/similarities into a low-dimensional continuous vector space are much more regular. Thus, we introduce the popular methods in the next section from the perspective of embedding mechanisms.

3 Algorithm Taxonomy

In this section, we introduce these approaches categorized according to their embedding mechanisms. In detail, we propose a two-level classification ontology for these popular methods. Similar to the taxonomy of community-oriented network embedding, at the first level we divide the methods into three categories: low-rank matrix factorization, random walk based methods and deep learning methods. Furthermore, based on their embedding mechanisms and constraint information, we give a more refined classification as shown in Table II. We also list the tasks that each method has been applied to. Next, we introduce these methods in detail.

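Method | Embedding Mechanism (first level) | Embedding Mechanism (second level) | Conducted Tasks (Vis, CLF/CLT, ER/NA/SS, LP) | Year
RolX [34] | low-rank matrix factorization (Sec. 3.1) | on structural feature matrix (Sec. 3.1.1) | | 2012
GLRD [25] | low-rank matrix factorization (Sec. 3.1) | on structural feature matrix (Sec. 3.1.1) | | 2013
RIDRs [31] | low-rank matrix factorization (Sec. 3.1) | on structural feature matrix (Sec. 3.1.1) | | 2017
GraphWave [16] | low-rank matrix factorization (Sec. 3.1) | on structural feature matrix (Sec. 3.1.1) | | 2018
HONE [77] | low-rank matrix factorization (Sec. 3.1) | on structural feature matrix (Sec. 3.1.1) | | 2020
xNetMF [33] | low-rank matrix factorization (Sec. 3.1) | on structural similarity matrix (Sec. 3.1.2) | | 2018
EMBER [42] | low-rank matrix factorization (Sec. 3.1) | on structural similarity matrix (Sec. 3.1.2) | | 2019
SEGK [61] | low-rank matrix factorization (Sec. 3.1) | on structural similarity matrix (Sec. 3.1.2) | | 2019
REACT [67] | low-rank matrix factorization (Sec. 3.1) | on structural similarity matrix (Sec. 3.1.2) | | 2019
SPaE [84] | low-rank matrix factorization (Sec. 3.1) | on structural similarity matrix (Sec. 3.1.2) | | 2019
struc2vec [75] | random walk-based methods (Sec. 3.2) | on similarity-biased random walks (Sec. 3.2.1) | | 2017
SPINE [29] | random walk-based methods (Sec. 3.2) | on similarity-biased random walks (Sec. 3.2.1) | | 2019
struc2gauss [66] | random walk-based methods (Sec. 3.2) | on similarity-biased random walks (Sec. 3.2.1) | | 2020
Role2Vec [1] | random walk-based methods (Sec. 3.2) | on feature-based random walks (Sec. 3.2.2) | | 2019
RiWalk [96] | random walk-based methods (Sec. 3.2) | on feature-based random walks (Sec. 3.2.2) | | 2019
NODE2BITS [41] | random walk-based methods (Sec. 3.2) | on feature-based random walks (Sec. 3.2.2) | | 2019
DRNE [88] | deep learning (Sec. 3.3) | via structural information reconstruction/guidance (Sec. 3.3.1) | | 2018
GAS [30] | deep learning (Sec. 3.3) | via structural information reconstruction/guidance (Sec. 3.3.1) | | 2020
RESD [103] | deep learning (Sec. 3.3) | via structural information reconstruction/guidance (Sec. 3.3.1) | | 2021
GraLSP [46] | deep learning (Sec. 3.3) | | | 2020
GCC [73] | deep learning (Sec. 3.3) | | | 2020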
TABLE II: A summary of role-oriented embedding methods. The task abbreviations CLF, CLT, LP, ER, NA, SS and Vis denote node classification, node clustering, link prediction, entity resolution, network alignment, top-k similarity search and visualization, respectively.

3.1 Low-rank Matrix Factorization

Low-rank matrix factorization is the most commonly used technique in role-oriented embedding methods. These methods generate embeddings by factorizing matrices that preserve the role similarities between nodes implicitly (i.e., feature matrices) or explicitly (i.e., similarity matrices).

3.1.1 Structural Feature Matrix Factorization

RolX [34]. RolX takes advantage of the feature extraction method ReFeX [35] by decomposing the ReFeX feature matrix $F$. ReFeX first computes some primary features, such as degree and clustering coefficient, for each node. It then recursively aggregates the neighbors' features with sum- and mean-aggregators. Through the recursive steps, it can capture very thorough features expressing the structure of each node's multi-hop reachable neighborhood. Non-negative Matrix Factorization (NMF) is used for generating embeddings as it is efficient compared with other matrix decomposition methods, and the non-negativity constraints make the roles easier to interpret. Thus, RolX aims to obtain two low-rank matrices as follows:

$$\min_{H, M} \; \| F - H M \|_F^2, \quad \text{s.t. } H \ge 0, \; M \ge 0, \qquad (1)$$

where $H$ is the embedding matrix (or role assignment matrix) and the matrix $M$ (role definition matrix) describes the contributions of each role to the structural features. The number of columns of $H$, denoted $r$, is the number of hidden roles, which is determined by Minimum Description Length (MDL) [76].
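The following is a minimal sketch of the RolX idea under simplifying assumptions: the only base feature is the degree, the feature pruning step of ReFeX is omitted, the number of roles is fixed by hand instead of being selected by MDL, and NMF is solved with the standard Lee-Seung multiplicative updates. The function names `recursive_features` and `nmf` are illustrative, not the authors' implementation.

```python
import numpy as np

def recursive_features(A, n_iter=2):
    """ReFeX-style sketch: start from degree, then append sums/means of neighbors' features."""
    deg = A.sum(axis=1, keepdims=True)
    feats = deg.copy()
    current = deg.copy()
    for _ in range(n_iter):
        nbr_sum = A @ current
        nbr_mean = nbr_sum / np.maximum(deg, 1)
        current = np.hstack([nbr_sum, nbr_mean])
        feats = np.hstack([feats, current])
    return feats

def nmf(F, r, n_iter=500, eps=1e-9, seed=0):
    """Factorize F ~= H @ M with H, M >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, f = F.shape
    H = rng.random((n, r)) + eps     # role assignment (embedding) matrix
    M = rng.random((r, f)) + eps     # role definition matrix
    for _ in range(n_iter):
        M *= (H.T @ F) / (H.T @ H @ M + eps)
        H *= (F @ M.T) / (H @ M @ M.T + eps)
    return H, M

# Two triangles joined by the path 2 - 3 - 4 (toy data).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
F = recursive_features(A)
H, M = nmf(F, r=2)
```

Each row of `H` can be read as a soft assignment of a node to the `r` roles, and each row of `M` describes one role in terms of the structural features.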

GLRD [25]. GLRD extends RolX by adding different optional constraints to objective function (1). Sparsity constraint () is defined for more definitive role assignments and definitions while diversity constraint () is for reducing the overlapping. and are previously discovered role assignments and definitions with which alternativeness constraint () can be used for mining roles unknown.

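RIDRs [31]. RIDRs uses $\epsilon$-equitable refinement ($\epsilon$ER) to partition nodes into different cells and compute graph-based features. An $\epsilon$-equitable refinement partition $\pi = \{C_1, C_2, \dots, C_K\}$ of the node set $V$ satisfies the following rule:

$$|\deg(v_i, C_l) - \deg(v_j, C_l)| \le \epsilon, \quad \forall\, v_i, v_j \in C_k, \; 1 \le k, l \le K, \qquad (2)$$

where $\deg(v_i, C_l)$ denotes the number of nodes in cell $C_l$ connected to node $v_i$. As nodes in the same cell have a similar number of connections to the nodes in any other cell, $\epsilon$ERs can capture some connectivity patterns.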
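The equitability rule can be checked mechanically for a given partition. The sketch below only verifies the condition; it does not implement the refinement procedure that finds such a partition. The function name and the star-graph example are assumptions made for illustration.

```python
import numpy as np

def is_epsilon_equitable(A, cells, eps):
    """Check that nodes of each cell differ by at most eps in their link counts to every cell."""
    A = np.asarray(A)
    for cell in cells:
        members = list(cell)
        for target in cells:
            # links[m] = number of neighbors that members[m] has inside `target`
            links = A[np.ix_(members, list(target))].sum(axis=1)
            if links.max() - links.min() > eps:
                return False
    return True

# Star graph: hub 0 connected to leaves 1..4 (toy data).
A = np.zeros((5, 5), dtype=int)
A[0, 1:] = 1
A[1:, 0] = 1
hub_vs_leaves = [{0}, {1, 2, 3, 4}]
one_cell = [{0, 1, 2, 3, 4}]
```

Separating the hub from the leaves is equitable even for a relaxation of 0, whereas putting all nodes into one cell requires a relaxation of at least 3, the gap between the hub degree and the leaf degree.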

Based on the cells partitioned by $\epsilon$ERs with a relaxation parameter $\epsilon$, the feature matrix is defined as . After a pruning and binning process, the feature matrices for all are concatenated as the final feature matrix . Finally, as in RolX and GLRD, NMF is applied for embedding generation, while a right sparsity constraint (on ) is optional for more definitive role representations.

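GraphWave [16]. GraphWave treats graph diffusion kernels as probability distributions over networks and obtains embeddings from the characteristic functions of these distributions. Specifically, taking the heat kernel $g_s(\lambda) = e^{-s\lambda}$ with scaling parameter $s$ as an example, the spectral graph wavelets are defined as:

$$\Psi = U \, \mathrm{diag}\big(g_s(\lambda_1), \dots, g_s(\lambda_n)\big) \, U^\top \delta, \qquad (3)$$

where $U$ and $\lambda_1, \dots, \lambda_n$ are the eigenvectors and eigenvalues of the graph Laplacian, $\delta$ is the one-hot encoding matrix on the node set, and the scaling parameter $s$ is omitted from $\Psi$. The $i$-th row $\Psi_i$ represents the resulting signal from a Dirac signal around node $v_i$. Considering the empirical characteristic function:

$$\phi_i(t) = \frac{1}{n} \sum_{j=1}^{n} e^{\mathrm{i} t \Psi_{ij}}, \qquad (4)$$

where $\mathrm{i}$ denotes the imaginary unit, $v_i$'s embedding vector is generated by concatenating the pairs $\mathrm{Re}(\phi_i(t))$ and $\mathrm{Im}(\phi_i(t))$ at $d$ evenly spaced points $t_1, \dots, t_d$.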
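A compact sketch of this pipeline is given below, assuming a dense adjacency matrix, an exact eigendecomposition of the unnormalized Laplacian, a single scale `s`, and a hand-picked grid of sample points `ts`. These choices, as well as the function name, are illustrative assumptions; the original method may use polynomial approximations and several scales.

```python
import numpy as np

def graphwave_embedding(A, s=1.0, ts=np.linspace(0.0, 10.0, 5)):
    """Heat-kernel wavelets followed by the empirical characteristic function per node."""
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)
    Psi = U @ np.diag(np.exp(-s * lam)) @ U.T     # Psi[i] is the wavelet centred at node i
    # phi[i, k] = mean over j of exp(1j * ts[k] * Psi[i, j])
    phi = np.exp(1j * Psi[:, :, None] * ts[None, None, :]).mean(axis=1)
    return np.hstack([phi.real, phi.imag])

# Two triangles joined by the path 2 - 3 - 4 (toy data).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
Z = graphwave_embedding(A)
```

Because the characteristic function depends only on the distribution of wavelet coefficients and not on which nodes carry them, automorphic nodes such as 0 and 5 obtain the same embedding even though they are far apart.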

HONE [77]. HONE constructs weighted motif graphs in which the weight of an edge is the number of co-occurrences of its two endpoints in a specific motif. For a motif represented by its weighted motif adjacency matrix , HONE characterizes the higher-order structure by deriving matrices from its k-step matrices . These new matrices are designed by imitating some popular matrices based on the normal adjacency matrix, such as the transition matrix and the Laplacian matrix . Here we use to denote the derived matrices. Then the k-step embeddings can be learned as:

(5)

where is the Bregman divergence and is a matching function. The global embeddings are generated by minimizing the following objective:

(6)

where is obtained by concatenating the with all the considered motifs and steps. If necessary, attributes diffused by transition matrix based on different motifs and steps can be added into .
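To make the weighted motif adjacency matrix concrete, the sketch below builds it for the triangle motif only and forms k-step matrices as matrix powers. Treating the k-step matrix as the k-th power is an assumption made for this illustration, and the function names are hypothetical. The derived matrices and the learning objectives (5) and (6) are not reproduced.

```python
import numpy as np

def triangle_motif_adjacency(A):
    """W[i, j] = number of triangles that contain the edge (i, j)."""
    A = np.asarray(A, dtype=np.int64)
    # (A @ A)[i, j] counts common neighbors of i and j; the elementwise
    # product with A keeps the count only where (i, j) is an actual edge.
    return (A @ A) * A

def k_step(W, k):
    """k-step matrix, taken here as the k-th power of the motif adjacency matrix."""
    return np.linalg.matrix_power(W, k)

# Triangle 0-1-2 with a pendant node 3 attached to node 2 (toy data).
A = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1
W = triangle_motif_adjacency(A)
W2 = k_step(W, 2)
```

The edge (2, 3) gets weight 0 because it lies in no triangle, so node 3 becomes isolated in the triangle motif graph.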

Remark. The aforementioned methods assume that nodes in similar roles have similar structural features. Thus, they apply matrix factorization to feature matrices to obtain role-based representations. RolX, GLRD and RIDRs directly obtain embeddings that give soft role assignments by factorizing feature matrices. As the feature matrices are usually low-dimensional, these methods are quite efficient. GraphWave uses eigen-decomposition and the empirical characteristic function to characterize the structural patterns of each node, which leads to robust embeddings but high computation cost. The weighted motif adjacency matrices in HONE actually capture higher-order proximities, yet they can still provide structural information because each matrix represents one motif.

3.1.2 Structural Similarity Matrix Factorization

xNetMF [33]. xNetMF is an embedding method designed for REGAL, an embedding-based network alignment approach. It first obtains a node-to-node similarity matrix based on both structures and attributes:

(7)

where and are structure-based and attribute-based distance between node and node while and are balance parameters of the two distances. is the Euclidean distance between node features. And counts different attributes between nodes, i.e., . The feature matrix is defined by counting nodes with the same logarithmically binned degree in each node’s -hop reachable neighborhood as follows:

(8)

where is a discount factor for lessening the importance of higher-hop neighbors.
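The structural part of this similarity can be sketched as follows. The code assumes that hop $k$ contributes the nodes at distance exactly $k$, that degrees are binned with a base-2 logarithm, and that the similarity is a Gaussian of the squared Euclidean feature distance without the attribute term. These are illustrative assumptions, as are the names `degree_histogram_features`, `delta` and `gamma`.

```python
import math
from collections import deque

def bfs_distances(adj, source, max_hop):
    """Shortest-path distances from source to all nodes within max_hop hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_histogram_features(adj, max_hop=2, delta=0.5):
    """Histogram of log-binned neighbor degrees, discounted by delta per extra hop."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    n_bins = int(math.log2(max_deg)) + 1
    feats = {}
    for u in adj:
        vec = [0.0] * n_bins
        for v, d in bfs_distances(adj, u, max_hop).items():
            if d > 0:
                vec[int(math.log2(len(adj[v])))] += delta ** (d - 1)
        feats[u] = vec
    return feats

def similarity(fu, fv, gamma=1.0):
    """Structural similarity exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(fu, fv)))

# Two triangles joined by the path 2 - 3 - 4, as adjacency lists (toy data).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
       4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
feats = degree_histogram_features(adj)
```

On this graph, nodes 0 and 5 have identical histograms and hence similarity 1, while the bridge node 3 is less similar to both.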

Then on computed , matrix factorization methods can be applied for obtaining embedding matrix satisfying . As the high dimension and rank of lead to high computation, an implicit matrix factorization approach extending Nyström method [17] is proposed as follows:

  1. Select $p \ll n$ nodes as landmarks randomly or based on node centralities.

  2. Compute a node-to-landmark similarity matrix $C$ with Eq.(7) and extract a landmark-to-landmark similarity matrix $W$ from $C$.

  3. Apply Singular Value Decomposition on the pseudoinverse $W^{\dagger}$ of $W$ so that $W^{\dagger} = U \Sigma V^\top$.

  4. Obtain the embedding matrix $Y$ by computing $Y = C U \Sigma^{1/2}$ and normalize $Y$.

With the above method, embeddings are actually generated by factorizing a low-rank approximation of the full similarity matrix, i.e., $C W^{\dagger} C^\top$. Meanwhile, the computation can be reduced, as only a small $p \times p$ matrix is decomposed.
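The four steps above can be sketched in a few lines of NumPy. The sketch assumes that node features are already available, that the similarity is a Gaussian of the squared Euclidean feature distance (no attribute term), and that landmarks are given as a list of row indices. The random feature matrix, the landmark choice and the function name are stand-ins for illustration.

```python
import numpy as np

def nystrom_embedding(features, landmarks, gamma=1.0):
    """Implicit factorization of a similarity matrix via landmark nodes."""
    X = np.asarray(features, dtype=float)
    # Step 2: node-to-landmark similarities C and landmark-to-landmark block W.
    diff = X[:, None, :] - X[landmarks][None, :, :]
    C = np.exp(-gamma * (diff ** 2).sum(axis=2))
    W = C[landmarks]
    # Step 3: SVD of the pseudoinverse of W.
    U, sigma, _ = np.linalg.svd(np.linalg.pinv(W))
    # Step 4: embeddings Y = C U Sigma^(1/2), followed by row normalization.
    Y = C @ U @ np.diag(np.sqrt(sigma))
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    return Y / np.maximum(norms, 1e-12), C, W

rng = np.random.default_rng(0)
features = rng.random((30, 4))        # stand-in for structural features
landmarks = [0, 7, 14, 21, 28]        # Step 1: five hand-picked landmarks
Y, C, W = nystrom_embedding(features, landmarks)
```

Only the 5 x 5 landmark block is decomposed, and the full 30 x 30 similarity matrix is never formed. On the landmark rows and columns, the implied low-rank approximation reproduces the exact similarities.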

EMBER [42]. EMBER is designed for mining professional roles in weighted directed email networks. It defines node outgoing feature matrix as:

(9)

where denotes the product of all edge weights in a -step shortest outgoing path . The incoming feature matrix is defined similarly. By concatenating the incoming and outgoing feature matrices, the final feature matrix is obtained. The node-to-node similarities are computed through Eq.(7) without attribute-based distance, i.e., . EMBER uses the same implicit matrix factorization approach to generate embeddings. Note that if the feature extraction part of EMBER is applied to an undirected unweighted network, EMBER will be equivalent to xNetMF without attributes.

SEGK [61]. SEGK leverages graph kernels to compute node structural similarities. To compare the structure more carefully, it computes node similarities with different scales of neighborhood as follows:

(10)

where and denotes the normalized kernel which is defined as:

(11)

SEGK chooses the shortest path kernel, Weisfeiler-Lehman subtree kernel, or graphlet kernel for practical use of . Then Nyström method [93] is employed on the factorization of for efficient computation and low dimensions of embeddings as follows:

(12)

where denotes the matrix of first eigenvectors and is the diagonal matrix of corresponding eigenvalues.
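The two generic ingredients of this step, kernel normalization and an embedding from the leading eigenpairs, can be sketched as below. The sketch uses the common cosine-style normalization $k(x, y) / \sqrt{k(x, x)\, k(y, y)}$ and scales the leading eigenvectors by the square roots of their eigenvalues; both are assumptions for illustration and may differ in detail from Eqs. (11) and (12). The Gram matrix is a stand-in built from toy histograms rather than a real graph kernel.

```python
import numpy as np

def normalize_kernel(K):
    """Cosine-style normalization: K[i, j] / sqrt(K[i, i] * K[j, j])."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def top_eigen_embedding(K, dim):
    """Embedding from the `dim` largest eigenpairs of a symmetric PSD matrix."""
    vals, vecs = np.linalg.eigh(K)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

# Stand-in Gram matrix: linear kernel on toy neighborhood histograms.
X = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [4.0, 0.0]])
K = X @ X.T
K_hat = normalize_kernel(K)
E = top_eigen_embedding(K_hat, dim=2)
```

After normalization, rows 0 and 2 have similarity 1 although their raw kernel values differ, since their histograms differ only in scale.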

REACT [67]. REACT aims to detect communities and discover roles by applying non-negative matrix tri-factorization on RoleSim [45] matrix and adjacency matrix , respectively. RoleSim matrix is developed with the idea of regular equivalence and is a pair-wise similarity matrix computed by iteratively updating the following scores:

(13)

where is a matching between the neighborhoods of and , and () is a decay factor. In addition, norm is leveraged as the regularization to make the distribution of roles within communities as diverse as possible. Thus, the objective function of REACT is:

(14)

where / denotes the embedding matrix for roles/communities, and / denotes the interaction between roles/communities. is the weight of regularization. Orthogonal constraint on embedding matrices is added for increased interpretability.
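The iterative RoleSim update that REACT relies on can be sketched in a few lines. This is a minimal illustration, not the REACT or RoleSim reference implementation. It assumes the update form from the RoleSim paper, in which the matched neighbor similarity is divided by the size of the larger neighborhood. It also replaces the maximal weighted matching with a greedy matching, which is only an approximation. The function name `rolesim`, the default decay factor, and the iteration count are hypothetical choices.

```python
def rolesim(adj, beta=0.15, iters=5):
    """Approximate RoleSim scores for an undirected graph given as {node: [neighbors]}.

    Assumption: the matching between neighborhoods is built greedily, which
    approximates (but does not guarantee) the maximal weighted matching.
    """
    nodes = list(adj)
    # All pairs start fully similar; iterations can only lower the scores.
    sim = {(u, v): 1.0 for u in nodes for v in nodes}
    for _ in range(iters):
        new_sim = {}
        for u in nodes:
            for v in nodes:
                # Candidate neighbor pairs, most similar first.
                pairs = sorted(
                    ((sim[(x, y)], x, y) for x in adj[u] for y in adj[v]),
                    key=lambda p: p[0],
                    reverse=True,
                )
                used_x, used_y, total = set(), set(), 0.0
                for score, x, y in pairs:
                    if x not in used_x and y not in used_y:
                        used_x.add(x)
                        used_y.add(y)
                        total += score
                denom = max(len(adj[u]), len(adj[v]))
                # Assumption: pairs of isolated nodes get only the decay term.
                ratio = total / denom if denom else 0.0
                new_sim[(u, v)] = (1.0 - beta) * ratio + beta
        sim = new_sim
    return sim


# A star graph: node 0 is the hub, nodes 1-3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = rolesim(star)
```

On the star graph, the leaves keep a similarity close to 1 with each other, while the hub-leaf similarity drops with every iteration, which matches the intuition that hub and leaf play different roles.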

SPaE [67]. SPaE also tries to capture communities and roles simultaneously. For node structural similarity, it computes the cosine similarity between the standardized Graphlet Degree Vectors of nodes, and generates role-based embeddings via the Laplacian eigenmaps method as follows:

(15)

where is the symmetric normalized matrix of structural similarity matrix . SPaE obtains community-based embeddings similarly as follows:

(16)

where is the symmetric normalized adjacency matrix. To map and into a unified embedding space, SPaE generates hybrid embeddings by maximizing the following objective function:

(17)

where denotes the hybrid embedding matrix and is the balance parameter. and .

Remark. These methods all explicitly compute structural similarities based on features, e.g., graph kernels, role equivalence, and so on. Most of them consider the similarities between multiple hops of neighborhoods. Their effectiveness on role discovery depends on the quality of the similarity matrices. The major problem of this kind of method is efficiency: computing pair-wise similarities and factorizing the high-dimensional similarity matrix are both time-consuming. For this reason, xNetMF, EMBER and SEGK apply the Nyström method to improve efficiency, which is possible because their similarity matrices are Gram matrices [17].

3.2 Shallow Models Using Random Walks

Random walks are a common way to capture node proximity in network embedding methods [72, 28]. Recently, two strategies have been proposed to adapt random walks to role-oriented tasks: (1) structural similarity-biased random walks make structurally similar nodes more likely to appear in the same sequence (as shown in Fig. 4(b)); (2) structural feature-based random walks, e.g., attributed random walks [1], map nodes with similar structural features to the same role indicator and replace ids in random walk sequences with the indicators (see Fig. 4(c)). The first strategy preserves structural similarity in the co-occurrence relations of nodes in the walks, while the second preserves structural similarity in the role indicators and may also capture the proximity between roles through the co-occurrence relations of the indicators.

Usually, language models such as Skip-Gram [60] are applied to the generated random walks to map the similarities into embedding vectors [72, 28, 75, 1]. However, other mapping mechanisms are also employed, such as the SimHash [8] used in NODE2BITS [41].

Fig. 4: Different types of random walks: (a) normal random walks, (b) structural similarity-biased random walks, (c) structural feature-based random walks. In (b), the transition matrix is biased by node structural similarities. In (c), each node id is replaced by a role indicator mapped from the node's structural features.

3.2.1 Structural Similarity-biased Random Walks

struc2vec [75]. Struc2vec generates structurally biased id-based contexts via random walks on a hierarchy of constructed complete graphs.

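In detail, it first computes the structural distance between a pair of nodes $u$ and $v$ as follows:

$f_k(u, v) = f_{k-1}(u, v) + g\big(s(R_k(u)), s(R_k(v))\big)$ (18)

where $g(\cdot, \cdot)$ denotes Dynamic Time Warping (DTW). $d(a, b) = \frac{\max(a, b)}{\min(a, b)} - 1$ is adopted as the distance function for DTW. $k^*$ is the diameter of the network. $s(R_k(u))$ is the ordered degree sequence of nodes at the exact distance $k$ from $u$. Note that $0 \le k \le k^*$ and $f_{-1}$ is set to the constant 0.

Then a multi-layer weighted context graph $M$ is built. Each layer $k$ is an undirected complete graph over all nodes, where the corresponding node of $u$ in the $k$-th layer is denoted as $u_k$. The weight of edge $(u_k, v_k)$ is defined as follows:

$w_k(u, v) = e^{-f_k(u, v)}$ (19)

The neighboring layers are connected through directed edges between the corresponding nodes. The edge weights between layers are defined as follows:

$w(u_k, u_{k+1}) = \log(\Gamma_k(u) + e), \quad w(u_k, u_{k-1}) = 1$ (20)

where $\Gamma_k(u)$ counts the edges incident to $u$ whose weight is larger than the average edge weight $\bar{w}_k$ of layer $k$. That is:

$\Gamma_k(u) = \sum_{v \in V} \mathbb{1}\big(w_k(u, v) > \bar{w}_k\big)$ (21)

Then id-based random walks can be employed on $M$, started in layer 0, for context generation of each node. In detail, the walk stays in the current layer with a given probability $q$. In this situation, the probability of a walk from $u_k$ to $v_k$ is:

$p_k(u, v) = \frac{e^{-f_k(u, v)}}{Z_k(u)}, \quad Z_k(u) = \sum_{v \neq u} e^{-f_k(u, v)}$ (22)

With probability $1 - q$, the walk steps across layers with the following stepping probability:

$p_k(u_k, u_{k+1}) = \frac{w(u_k, u_{k+1})}{w(u_k, u_{k+1}) + w(u_k, u_{k-1})}, \quad p_k(u_k, u_{k-1}) = 1 - p_k(u_k, u_{k+1})$ (23)

Note that the copies $u_k$ of a node in different layers $k$ have the same id in the context.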

On the structural context, struc2vec leverages Skip-Gram with Hierarchical Softmax to learn embeddings.
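The hierarchical structural distance used by struc2vec can be illustrated with a short sketch. It compares, hop by hop, the ordered degree sequences of two nodes with DTW and accumulates the costs. The element-wise cost `max(a, b) / min(a, b) - 1` is taken from the struc2vec paper and is an assumption here, as are the helper names `rings`, `dtw` and `structural_distances`. The sketch also assumes a graph without isolated nodes, so that no degree is zero.

```python
from collections import deque


def rings(adj, source):
    """BFS from source; map each hop distance k to the sorted degrees at that distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    by_hop = {}
    for node, k in dist.items():
        by_hop.setdefault(k, []).append(len(adj[node]))
    return {k: sorted(seq) for k, seq in by_hop.items()}


def dtw(s, t):
    """Dynamic Time Warping cost between two degree sequences."""
    inf = float("inf")
    table = [[inf] * (len(t) + 1) for _ in range(len(s) + 1)]
    table[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            a, b = s[i - 1], t[j - 1]
            step = max(a, b) / min(a, b) - 1.0  # assumed element-wise cost
            table[i][j] = step + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table[len(s)][len(t)]


def structural_distances(adj, u, v, max_k):
    """Cumulative distances for k = 0..max_k, stopping when either node has no ring."""
    ru, rv = rings(adj, u), rings(adj, v)
    out, total = [], 0.0
    for k in range(max_k + 1):
        if k not in ru or k not in rv:
            break
        total += dtw(ru[k], rv[k])
        out.append(total)
    return out


# A path graph 0-1-2-3-4: the two end nodes are structurally identical.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ends = structural_distances(path, 0, 4, 4)
end_vs_middle = structural_distances(path, 0, 2, 4)
```

The two end nodes of the path get distance zero at every hop even though they are four steps apart, while an end node and the middle node differ already at hop 0. This is the kind of similarity that the layer weights of the context graph are built from.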

SPINE [29]. SPINE uses the $k$ largest values of the $i$-th row of the Rooted PageRank matrix as the feature of node $v_i$. In Rooted PageRank, a walk steps to a neighbor with probability $\beta$ and steps back to the start node with probability $1 - \beta$. For the inductive setting, SPINE computes these values via a Monte Carlo approximation.

To simultaneously capture structural similarity and proximity, SPINE designs a biased random walk method. With a given probability, the walk steps to a structurally similar node based on the following transition matrix:

(24)

Here the structural similarity can be computed via DTW or other methods based on node features. Otherwise, a normal random walk step is applied. Thus, the larger this probability, the more role-oriented SPINE becomes. The embeddings are learned through Skip-Gram with Negative Sampling (SGNS). To leverage attributes, the embeddings are generated as:

(25)

where the attribute matrix has rows corresponding to the nodes with the largest Rooted PageRank values, and the weight matrix is that of the multi-layer perceptron (MLP).

struc2gauss [66]. For each node, struc2gauss generates a Gaussian distribution to model both structural similarity and uncertainty. After calculating structural similarity via existing methods such as RoleSim [45], it samples the top-$k$ most similar nodes for a node as its positive set. The positive sampling of struc2gauss can be regarded as a special random walk with mandatory restart on a star-shaped graph, where the star center is the target node and the star edges lead to the most similar nodes. The negative sample set has the same size as the positive set and is generated as in the normal random walk based methods. To push the Gaussian embeddings of similar nodes closer and those of dissimilar nodes farther apart, struc2gauss uses the following max-margin ranking objective:

(26)

where the margin parameter pushes dissimilar distributions apart, and the similarity measure compares the distributions of two nodes. Different similarity measures can be used, such as the logarithmic inner product and the KL divergence. For normal tasks, the mean vectors of those Gaussian distributions can be treated as embeddings.

Remark. These methods reconstruct the edges between nodes based on structural similarities so that the context nodes obtained by random walks are structurally similar to the central nodes. Compared with SPINE and struc2gauss, struc2vec constructs edges that better represent role information in the multi-layer complete graphs, which leads to better embeddings but higher time and space complexity.

3.2.2 Structural Feature-based Random Walks

Role2Vec [1]. Role2Vec first maps nodes into several disjoint roles. Logarithmic binning, K-means with low-rank factorization, and other methods on features and attributes can be chosen for the role mapping. Motif-based features, such as Graphlet Degree Vectors, are recommended since motifs can better capture high-order structural information.

Then random walks are performed, but the ids in the generated sequences are replaced with role indicators. With the feature-based role context, the language model CBOW can be used to obtain embeddings of roles. Nodes partitioned into the same role have the same embeddings.
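The role-indicator replacement described above can be sketched as follows. This is a simplified illustration rather than the Role2Vec implementation: it assumes the role mapping is a logarithmic binning of node degree (one of the options mentioned, and much simpler than motif-based features), and the names `role_of` and `role_walks` are hypothetical.

```python
import math
import random


def role_of(degree):
    """Map a degree to a role label by logarithmic binning (assumed role mapping)."""
    if degree <= 0:
        return "r_isolated"
    return "r%d" % int(math.log2(degree))


def role_walks(adj, walk_len, walks_per_node, seed=0):
    """Run uniform random walks and emit role indicators instead of node ids."""
    rng = random.Random(seed)
    roles = {u: role_of(len(nbrs)) for u, nbrs in adj.items()}
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            node, walk = start, [roles[start]]
            for _ in range(walk_len - 1):
                if not adj[node]:
                    break  # dead end: stop this walk early
                node = rng.choice(adj[node])
                walk.append(roles[node])
            walks.append(walk)
    return roles, walks


# A star graph with hub 0 and four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
roles, walks = role_walks(star, walk_len=5, walks_per_node=2)
```

The resulting sequences contain only role labels, so a language model trained on them learns one vector per role. In this toy graph every walk alternates between the leaf role and the hub role, regardless of which leaf is visited.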

RiWalk [96]. RiWalk designs structural node indicators that approximate graph kernels. In a given subgraph rooted at a node, the indicator approximating the shortest path kernel for another node of the subgraph is defined as the concatenation of the degrees of the root node and that node and the shortest path length between them:

(27)

where the degrees are passed through a logarithmic binning function. The indicator approximating the Weisfeiler-Lehman subtree kernel is defined as:

(28)

where the neighbor-count vector has one element per distance value, and its $i$-th element is the count of the node's neighbors at distance $i$ from the root node in the subgraph, i.e.:

(29)

Then random walks starting from the root node of each subgraph are performed on that subgraph. The nodes are relabeled with the indicators given by Eq. (27) or Eq. (28), and only the root node is not relabeled. Embeddings are learned via SGNS on the generated sequences.
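The shortest-path-style relabeling can be sketched as below. This is an illustration under several assumptions, not the RiWalk reference code: degrees are taken from the full graph, the binning function is `floor(log2(x + 1))`, the subgraph is the set of nodes within a fixed radius of the root, and the function name `sp_indicators` is hypothetical.

```python
import math
from collections import deque


def log_bin(x):
    """Assumed logarithmic binning function: floor(log2(x + 1))."""
    return int(math.log2(x + 1))


def sp_indicators(adj, root, radius):
    """Relabel nodes within `radius` hops of `root` by (binned root degree,
    binned node degree, shortest path length). The root keeps its own id."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the subgraph boundary
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    labels = {}
    for node, d in dist.items():
        if node == root:
            labels[node] = node
        else:
            labels[node] = (log_bin(len(adj[root])), log_bin(len(adj[node])), d)
    return labels


# A path graph 0-1-2-3-4 with the middle node as root.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = sp_indicators(path, root=2, radius=2)
```

Nodes that sit at the same distance from the root and fall into the same degree bin receive the same label, so walks on different subgraphs become comparable once the ids are replaced.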

NODE2BITS [41]. NODE2BITS is designed for entity resolution on temporal networks, where each edge carries a timestamp. To integrate temporal information, NODE2BITS utilizes temporal random walks in which edges are sampled with non-decreasing timestamps. The following stepping probability is defined to capture short-term transitions in temporal walks:

(30)

where the normalizing term is the maximal duration between all timestamps. The stepping probability in the long-term policy is defined similarly with positive signs. Multiple walks are generated for each edge, and the temporal context of different hops for a node can be extracted from the walks. Then structural features and attributes are fused in temporal walks. For each node and each hop, histograms are applied to the multi-dimensional features (and node types if the network is heterogeneous) to aggregate information in the neighborhood, and they are concatenated into a vector. SimHash [8] is applied by projecting the histogram vector onto several random hyperplanes to generate a binary hashcode. The final embeddings are obtained by concatenating the hashcodes across different hops.
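The SimHash step can be illustrated with a minimal sketch: each bit records on which side of a random hyperplane the histogram vector falls. The function name `simhash`, the Gaussian hyperplanes, and the seeding scheme are assumptions for illustration and are not taken from the NODE2BITS code.

```python
import random


def simhash(vec, n_bits, seed=0):
    """Binary code of `vec`: one sign bit per random hyperplane."""
    rng = random.Random(seed)  # same seed gives the same hyperplanes
    bits = []
    for _ in range(n_bits):
        plane = [rng.gauss(0.0, 1.0) for _ in vec]
        dot = sum(p * x for p, x in zip(plane, vec))
        bits.append(1 if dot >= 0 else 0)
    return bits


histogram = [3.0, 0.0, 1.0, 5.0]
code = simhash(histogram, n_bits=16)
scaled_code = simhash([2.0 * x for x in histogram], n_bits=16)
```

Because only the sign of each projection is kept, rescaling a histogram does not change its code, and vectors with a small angle between them agree on most bits. Concatenating such codes across hops would then yield a binary embedding of the kind described above.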

Remark. The above three methods differ considerably in their motivations for using structural feature-based random walks. Role2Vec first assigns roles and then employs random walks with role indicators, so it essentially captures the proximity between assigned roles. RiWalk relabels the walks in subgraphs to approximate graph kernels. NODE2BITS uses random walks as neighbor feature aggregators.

3.3 Deep Learning Models

Recently, a few works have focused on leveraging deep learning techniques for role-oriented network representation learning. Though deep learning can provide more varied and powerful mapping mechanisms, it needs to be trained with more carefully designed structural guidance information.

3.3.1 Structural Information Reconstruction/Guidance

DRNE [88]. DRNE is proposed to capture regular equivalence in networks, so it learns node embeddings in a recursive way with the following loss function:

(31)

where the aggregation of the neighbors' embeddings is computed via a layer-normalized Long Short-Term Memory (LNLSTM). To make the neighbor information usable by the LNLSTM, for each node it downsamples a fixed number of neighbors with large degrees and orders them by degree. Their embeddings are fed to the LNLSTM in that order, and the final output is taken as the aggregation.

Additionally, DRNE proposes a degree-guided regularizer to avoid the trivial solution where all embeddings are $\mathbf{0}$. The regularizer is as follows:

(32)

The regularizer is weighted by a parameter, and the whole model is trained via the combined loss:

(33)

GAS [30]. Graph Neural Networks have the power to capture structure, as they are closely related to the Weisfeiler-Lehman (WL) test in some ways [95]. GAS applies a multi-layer graph convolutional encoder, in which each layer is:

(34)

where each layer has its own trainable parameter matrix. The input could be a given feature matrix or an embedding lookup table. Here the sum-pooling propagation rule is applied instead of the original GCN [48] rule to better distinguish local structures. In fact, more powerful GNNs such as the Graph Isomorphism Network [95] may further improve the performance. The key idea of GAS is to use a few critical structural features as guidance information to train the model. The features are extracted in a way similar to that proposed in ReFeX, but they are aggregated only once, normalized, and not binned. An MLP model is used as the decoder to approximate the features, and the loss function is:

(35)

RESD [103]. RESD also adopts ReFeX [35] to extract appropriate features . It uses a Variational Auto-Encoder [47] architecture to learn the low-noise and robust representations:

(36)

The VAE model is trained via feature reconstruction. A degree-guided regularizer Eq.(32) designed in DRNE [88] is introduced in RESD for preserving topological characteristics. The combined objective is as follows:

(37)

GraLSP [46]. GraLSP is a GNN framework integrating local structural patterns that can be employed on role-oriented tasks. For a node, it captures structural patterns by generating fixed-length random walks starting from that node and then anonymizing them [39]. Each anonymous walk is represented via an embedding lookup table. Then the aggregation of the neighborhood representation is designed as follows:

(38)

where and are trainable parameter matrices. is learned attention values based on their local structure:

(39)

denotes a single-layer perceptron. is the amplification coefficients:

(40)

To preserve proximities between nodes, the loss function in DeepWalk [72] is leveraged:

(41)

After aggregations, the embeddings are . To capture structural similarities between nodes, GraLSP designs the following loss:

(42)