1. Introduction
Network (or graph) embedding, which involves learning low-dimensional feature representations of nodes and links, has become a popular research topic in recent years. Structural role embedding is one type of network embedding that focuses on identifying nodes serving different “functions” in a network (e.g., acting as a bridge between two communities or being the center of a community). Unlike local proximity (the focus of methods like DeepWalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016)), structural role identity can be shared by nodes that are far apart in a network and have different local contexts. Several structural role embedding methods have been proposed in recent years, of which struc2vec (Ribeiro et al., 2017) and GraphWave (Donnat et al., 2018) are two representative examples.
However, current methods approach node structural role embedding either through indirect modeling (such as GraphWave) or non-precise methods (such as struc2vec). For example, struc2vec constructs a weighted multilayer graph and uses random walks on it to generate a context for each node, which is then fed into a language model. The model thus rests on the assumption that two nodes are structurally similar if and only if the contexts generated by their random walks are similar. However, the randomness of the walks makes the embeddings imprecise, often leading to nodes with the exact same role having different (though admittedly similar) embeddings.
GraphWave, on the other hand, defines a wavelet coefficient matrix and uses the distribution of energy that comes from other nodes to model node roles. Though this method can perfectly preserve the roles of nodes, it cannot capture subtle dissimilarities between roles, given that it is based on indirect modeling of node roles (i.e., the energy distribution of nodes). Furthermore, this indirect modeling approach is not flexible, as it relies on specific structural similarity definitions based on the energy distributions of the nodes.
We propose a direct and precise (without approximation or indirect modeling) embedding method for node structural role identity using stress majorization. Though perfectly preserving role similarities between nodes in embedding space is impossible (since embedding reduces the dimensions, leading to inevitable information loss), our method minimizes this information loss. Moreover, our method is also flexible, in that it does not rely on specific structural similarity definitions. Specifically, in this paper, we make the following contributions:

We present a novel and flexible structural embedding framework, using stress majorization, that can directly and precisely capture the structural role identities and similarities of nodes in networks. The framework does not rely on any specific definition of structural similarity.

We prove mathematically that our method embeds nodes with the same roles into the exact same position in the embedding space.

We evaluate our method on the fundamental tasks of node classification, clustering, and visualization on three real-world and five synthetic networks. Our experiments show that our framework outperforms existing methods in learning node role representations.
2. Related Work
As mentioned in the introduction, microscopic-structure-preserving embedding methods (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Wang et al., 2021a, c) cannot capture the roles of nodes in networks. Besides struc2vec and GraphWave, discussed in the introduction, struc2gauss (Pei et al., 2020) is a newer method for structural-role-preserving embedding. struc2gauss first generates a structural context for each node and then learns representations via Gaussian embedding. A Gaussian distribution is used to represent each node: the mean represents the position and the covariance represents the uncertainty. This method, like struc2vec, is also imprecise. There are also several methods that focus on concepts related to node structural role embedding. DRNE (Tu et al., 2018) introduces a concept similar to structural roles called regular equivalence and uses a layer-normalized LSTM (Hochreiter and Schmidhuber, 1997) to learn the representations of nodes by recursively aggregating their neighborhoods. RolX (Henderson et al., 2012) takes a mixed-membership approach that uses non-negative matrix factorization to assign every node a distribution over the set of identified roles. Our own prior work tackles embedding node role identity into hyperbolic space (Wang et al., 2020) and embedding role identity over time (Wang et al., 2021b). Finally, SNS (Lyu et al., 2017) uses graphlets for structural similarity, and combines neighborhood information and local subgraph similarity to learn embeddings.
3. Framework
Let $G = (V, E)$ be an undirected and unweighted network, where $V$ is a set of vertices, and $E \subseteq V \times V$ is the set of unweighted edges between vertices in $V$. We consider the problem of representing all the nodes in $V$ as a set of $d$-dimensional vectors $X_1, \dots, X_{|V|}$, with $d \ll |V|$. Our framework consists of two parts: calculation of the structural role distances of nodes and the use of stress majorization to generate embeddings.
3.1. Calculation of Structural Role Distance
The structural role distance in our model can use any of the known measurements. In this paper, we use the similarity of $k$-hop degree sequences defined by struc2vec. Let $R_k(u)$ denote the ordered degree sequence of the nodes at exactly $k$ hops from $u$ in $G$. The structural role similarity of two nodes $u$ and $v$ considering their $k$-hop neighbors can be defined as the similarity of the two ordered sequences $R_k(u)$ and $R_k(v)$. Since these two sequences may not have equal sizes, we use Fast Dynamic Time Warping (FastDTW) (Salvador and Chan, 2007) to measure the distance between two ordered degree sequences. FastDTW finds a near-optimal alignment between two time series of arbitrary length while limiting both the time and space complexity to $O(n)$. Since the elements of the sequences $R_k(u)$ and $R_k(v)$ are degrees of nodes, we adopt the following cost function between the $i$-th element $a_i$ of $R_k(u)$ and the $j$-th element $b_j$ of $R_k(v)$ for FastDTW:
$$ d(a_i, b_j) = \frac{\max(a_i, b_j)}{\min(a_i, b_j)} - 1 \tag{1} $$
Instead of measuring the absolute difference of degrees, this cost measures the relative difference, which is more suitable for degrees (since the degrees can be arbitrarily large). The structural distance of two nodes $u$ and $v$ considering their $k$-hop neighborhoods can be defined as:
$$ D_k(u, v) = \sum_{i=0}^{k} \lambda_i \, \mathrm{FastDTW}\big(R_i(u), R_i(v)\big) \tag{2} $$
where $\lambda_i$ is the importance weight of hop $i$. In our experiments, we set all the $\lambda_i$ to be equal. When we set $k$ to be the diameter of the graph, $D_k(u, v)$ depicts the structural role distance of these two nodes.
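The distance computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes exact quadratic-time DTW for FastDTW, represents the graph as a plain adjacency dictionary, assumes every node has degree at least 1, and skips any hop at which one of the two nodes has no neighbors. The function names (`hop_degree_sequences`, `relative_cost`, `dtw`, `structural_distance`) and the `hop_weights` parameter are hypothetical.

```python
from collections import deque


def hop_degree_sequences(adj, source, max_hops):
    """Return [R_0, ..., R_max_hops]: sorted degrees of nodes at exactly k hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the last requested hop
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    rings = [[] for _ in range(max_hops + 1)]
    for node, k in dist.items():
        rings[k].append(len(adj[node]))
    return [sorted(ring) for ring in rings]


def relative_cost(a, b):
    """Relative degree difference used as the element-wise alignment cost."""
    return max(a, b) / min(a, b) - 1.0


def dtw(seq_a, seq_b, cost=relative_cost):
    """Exact dynamic time warping (stand-in for FastDTW), O(len_a * len_b)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            table[i][j] = step + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table[n][m]


def structural_distance(adj, u, v, max_hops, hop_weights=None):
    """Weighted sum over hops of the DTW distance between degree sequences."""
    weights = hop_weights or [1.0] * (max_hops + 1)
    rings_u = hop_degree_sequences(adj, u, max_hops)
    rings_v = hop_degree_sequences(adj, v, max_hops)
    total = 0.0
    for weight, ring_u, ring_v in zip(weights, rings_u, rings_v):
        if ring_u and ring_v:  # assumption: skip hops missing for either node
            total += weight * dtw(ring_u, ring_v)
    return total


# A path graph 0-1-2-3-4: the two end nodes play the same structural role.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(structural_distance(path, 0, 4, max_hops=4))
print(structural_distance(path, 0, 2, max_hops=4))
```

On the path graph, the two end nodes have identical hop degree sequences, so their distance is zero, while an end node and the middle node are separated by a positive distance.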
3.2. Stress Majorization
After we calculate the pairwise structural role distances, our goal is to embed the nodes into a low-dimensional space such that the pairwise distances between the embedded nodes are equal or close to the structural role distances. Notice that the structural role distances may not obey the triangle inequality, which means that classical Multidimensional Scaling (MDS) (Torgerson, 1958) cannot be used; we therefore adopt stress majorization (Kruskal, 1964; Borg and Groenen, 2005). With $n = |V|$, we use an $n \times d$ matrix $X$ to represent the $d$-dimensional embedding vectors, with the row vectors $X_1, \dots, X_n$. The $n \times n$ matrix $D$ is the structural role distance matrix, with entries $d_{ij}$. Given these definitions, we define the stress function as
$$ \mathrm{stress}(X) = \sum_{i<j} w_{ij} \big( \lVert X_i - X_j \rVert - d_{ij} \big)^2 \tag{3} $$
where $w_{ij} \ge 0$ is the weight assigned to the pair $(i, j)$. The problem can be formulated as seeking a matrix $X$ that minimizes the stress function given $D$. The theorem and corresponding proof below give a bound on the stress function.
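Theorem 3.1.
The weighted Laplacian matrix $L^w$ is defined as
$$ L^w_{ij} = \begin{cases} -w_{ij} & i \neq j \\ \sum_{k \neq i} w_{ik} & i = j \end{cases} $$
For any $n \times d$ matrix $Z$, we define the matrix $L^Z$ as
$$ L^Z_{ij} = \begin{cases} -w_{ij} \, d_{ij} \, \mathrm{inv}(\lVert Z_i - Z_j \rVert) & i \neq j \\ -\sum_{k \neq i} L^Z_{ik} & i = j \end{cases} $$
where $Z_i$ is the $i$-th row vector of matrix $Z$, and $\mathrm{inv}(x) = 1/x$ for $x \neq 0$ and $0$ otherwise. And the function $F^Z(X)$ is defined as
$$ F^Z(X) = \sum_{i<j} w_{ij} d_{ij}^2 + \mathrm{Tr}(X^T L^w X) - 2\,\mathrm{Tr}(X^T L^Z Z) \tag{4} $$
We must have
$$ \mathrm{stress}(X) \le F^Z(X) \tag{5} $$
The equality holds when $Z = X$.
Proof.
Expanding the definition of the stress function, we get
$$ \mathrm{stress}(X) = \sum_{i<j} w_{ij} d_{ij}^2 + \sum_{i<j} w_{ij} \lVert X_i - X_j \rVert^2 - 2 \sum_{i<j} w_{ij} d_{ij} \lVert X_i - X_j \rVert \tag{6} $$
Notice that the second term is a quadratic form, which can be written in matrix form:
$$ \sum_{i<j} w_{ij} \lVert X_i - X_j \rVert^2 = \mathrm{Tr}(X^T L^w X) \tag{7} $$
In this way, we just have to show that
$$ \sum_{i<j} w_{ij} d_{ij} \lVert X_i - X_j \rVert \ge \mathrm{Tr}(X^T L^Z Z) \tag{8} $$
According to the Cauchy-Schwarz inequality,
$$ \lVert X_i - X_j \rVert \, \lVert Z_i - Z_j \rVert \ge \langle X_i - X_j, \, Z_i - Z_j \rangle \tag{9} $$
Therefore, the third term can be bounded as follows
$$ \sum_{i<j} w_{ij} d_{ij} \lVert X_i - X_j \rVert \ge \sum_{i<j} w_{ij} d_{ij} \, \mathrm{inv}(\lVert Z_i - Z_j \rVert) \, \langle X_i - X_j, \, Z_i - Z_j \rangle \tag{10} $$
Writing the right-hand side in matrix form, we get
$$ \sum_{i<j} w_{ij} d_{ij} \, \mathrm{inv}(\lVert Z_i - Z_j \rVert) \, \langle X_i - X_j, \, Z_i - Z_j \rangle = \mathrm{Tr}(X^T L^Z Z) \tag{11} $$
Combining (10) and (11), we get (8), and thus the theorem has been proved. ∎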
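The bound in Theorem 3.1 can be checked numerically. The sketch below is only an illustration: it evaluates the bounding function pair by pair (the sum form of the trace expressions) rather than building the Laplacian matrices, and the distance matrix, the unit weights, and the random layouts are arbitrary choices made for the example. The names `stress` and `majorant` are hypothetical.

```python
import math
import random


def stress(X, D, W):
    """Weighted stress: sum over pairs of w_ij * (||X_i - X_j|| - d_ij)^2."""
    n = len(X)
    return sum(W[i][j] * (math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def majorant(X, Z, D, W):
    """Bounding function of the stress at X, built from a reference layout Z."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            gap_z = math.dist(Z[i], Z[j])
            inv = 1.0 / gap_z if gap_z > 0 else 0.0
            inner = sum((xa - xb) * (za - zb)
                        for xa, xb, za, zb in zip(X[i], X[j], Z[i], Z[j]))
            total += W[i][j] * (D[i][j] ** 2
                                + math.dist(X[i], X[j]) ** 2
                                - 2.0 * D[i][j] * inv * inner)
    return total


random.seed(0)
n, dim = 5, 2
D = [[float(abs(i - j)) for j in range(n)] for i in range(n)]  # toy distances
W = [[1.0] * n for _ in range(n)]                              # unit weights
X = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
Z = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

print("stress(X)  =", stress(X, D, W))
print("F^Z(X)     =", majorant(X, Z, D, W))
print("F^X(X)     =", majorant(X, X, D, W))
```

The first value should never exceed the second, and the third should match the first up to rounding, which mirrors the inequality and its equality case.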
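For a given $Z$, $F^Z(X)$ is minimized when its gradient with respect to $X$ vanishes. This means that we need to solve
$$ L^w X = L^Z Z \tag{12} $$
For a given layout $X(t)$, we take $X(t+1)$ which makes
$$ L^w X(t+1) = L^{X(t)} X(t) \tag{13} $$
If $X(t+1) \neq X(t)$, we must have
$$ \mathrm{stress}(X(t+1)) \le F^{X(t)}(X(t+1)) \le F^{X(t)}(X(t)) = \mathrm{stress}(X(t)) \tag{14} $$
Now we can design an iterative optimization process as shown below:
Step 1 Initialize $X(0)$.
Step 2 Calculate $X(t+1)$ from $X(t)$ by solving $L^w X(t+1) = L^{X(t)} X(t)$.
Step 3 Check whether the error is below the tolerance. If it is, terminate the process; otherwise go back to Step 2.
Run this algorithm until it converges. The final $X$ is the embedding matrix that we want.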
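The three steps above can be sketched as follows. This is a simplified illustration rather than the authors' code. It assumes unit weights for every pair, in which case the linear system of Step 2 has the closed-form centered solution $X_i = \frac{1}{n} \sum_{j \neq i} \frac{d_{ij}}{\lVert Z_i - Z_j \rVert} (Z_i - Z_j)$ (the Guttman transform). The stopping rule (absolute decrease in stress below `tol`), the default parameter values, and all function names are assumptions made for the example.

```python
import math
import random


def stress(X, D):
    """Unit-weight stress: sum over pairs of (||X_i - X_j|| - d_ij)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def majorization_step(Z, D):
    """Step 2 for unit weights: X_i = (1/n) * sum_j (d_ij / ||Z_i - Z_j||) (Z_i - Z_j)."""
    n, dim = len(Z), len(Z[0])
    X = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if j == i:
                continue
            gap = math.dist(Z[i], Z[j])
            ratio = D[i][j] / gap if gap > 0 else 0.0
            for a in range(dim):
                row[a] += ratio * (Z[i][a] - Z[j][a])
        X.append([value / n for value in row])
    return X


def stress_majorization(D, dim=2, tol=1e-9, max_iter=500, seed=0):
    """Iterate the majorization step from a random layout; return layout and stress history."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in D]  # Step 1
    history = [stress(X, D)]
    for _ in range(max_iter):
        X = majorization_step(X, D)                                # Step 2
        history.append(stress(X, D))
        if history[-2] - history[-1] < tol:                        # Step 3 (assumed rule)
            break
    return X, history


# Toy input: pairwise distances between the corners of a unit square.
root2 = math.sqrt(2.0)
D = [[0.0, 1.0, root2, 1.0],
     [1.0, 0.0, 1.0, root2],
     [root2, 1.0, 0.0, 1.0],
     [1.0, root2, 1.0, 0.0]]
X, history = stress_majorization(D)
print(f"stress: {history[0]:.4f} -> {history[-1]:.6f} after {len(history) - 1} iterations")
```

Because each step minimizes a function that bounds the stress from above and touches it at the current layout, the recorded stress values never increase from one iteration to the next. With non-uniform weights, Step 2 would instead require solving the full linear system.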
3.3. Proof of Structurally Equivalent Nodes
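In this section, we prove the claim made in the introduction, namely that nodes with the same role are embedded into the exact same position.
Theorem 3.2.
Assume that we have two nodes $u$ and $v$, with $d_{ui} = d_{vi}$ for any $i \neq u, v$, and $d_{uv} = 0$. The result $X$ of stress majorization must satisfy $X_u = X_v$.
Proof.
Suppose $X_u \neq X_v$. Without loss of generality, we can assume
$$ \sum_{i \neq u,v} w_{ui} \big( \lVert X_u - X_i \rVert - d_{ui} \big)^2 \le \sum_{i \neq u,v} w_{vi} \big( \lVert X_v - X_i \rVert - d_{vi} \big)^2 \tag{15} $$
Let us construct a new solution $X'$, with row vectors following
$$ X'_i = \begin{cases} X_u & i = v \\ X_i & \text{otherwise} \end{cases} \tag{16} $$
Since $d_{ui} = d_{vi}$ (and hence $w_{ui} = w_{vi}$ whenever the weights are determined by the distances), it is obvious that
$$ \sum_{i \neq u,v} w_{vi} \big( \lVert X'_v - X'_i \rVert - d_{vi} \big)^2 = \sum_{i \neq u,v} w_{ui} \big( \lVert X_u - X_i \rVert - d_{ui} \big)^2 \le \sum_{i \neq u,v} w_{vi} \big( \lVert X_v - X_i \rVert - d_{vi} \big)^2 \tag{17} $$
Considering $d_{uv} = 0$, the $(u, v)$ term of the stress drops from $w_{uv} \lVert X_u - X_v \rVert^2 > 0$ to $0$, and we get
$$ \mathrm{stress}(X') < \mathrm{stress}(X) $$
which contradicts the definition of stress majorization, whose result minimizes the stress. Therefore, nodes with the same role must be embedded into the same position. ∎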
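The construction used in this proof can be illustrated on a toy example. The sketch below assumes unit weights and a hand-picked 2-dimensional layout in which two structurally equivalent nodes (nodes 0 and 1) sit at different positions; moving the worse-placed of the two onto the other strictly lowers the stress. The helper names (`partial_stress`, `merge_equivalent`) and the example data are hypothetical.

```python
import math


def stress(X, D):
    """Unit-weight stress: sum over pairs of (||X_i - X_j|| - d_ij)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def partial_stress(X, D, node, skip):
    """Stress terms between `node` and every node not listed in `skip`."""
    return sum((math.dist(X[node], X[i]) - D[node][i]) ** 2
               for i in range(len(X)) if i not in skip)


def merge_equivalent(X, D, u, v):
    """Copy the better-placed of u, v onto the other, as in the proof's new solution."""
    pair = {u, v}
    if partial_stress(X, D, u, pair) <= partial_stress(X, D, v, pair):
        keep, move = u, v
    else:
        keep, move = v, u
    merged = [list(row) for row in X]
    merged[move] = list(X[keep])
    return merged


# Nodes 0 and 1 are equivalent: d_01 = 0 and identical distances to nodes 2 and 3.
D = [[0.0, 0.0, 1.0, 2.0],
     [0.0, 0.0, 1.0, 2.0],
     [1.0, 1.0, 0.0, 1.5],
     [2.0, 2.0, 1.5, 0.0]]
X = [[0.0, 0.0], [0.4, 0.3], [1.0, 0.0], [1.0, 1.5]]

X_merged = merge_equivalent(X, D, 0, 1)
print("stress before:", stress(X, D))
print("stress after: ", stress(X_merged, D))
```

Any layout that keeps two equivalent nodes apart can be improved in this way, which is why such a layout cannot be a minimizer of the stress.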
4. Experiment
In this section, we evaluate and compare our method on qualitative (visualization) and quantitative (node classification and clustering) tasks on real-world and synthetic networks.
4.1. Node Structural Role Embedding Visualization
The first set of experiments involves qualitative evaluation of the generated embeddings through visualization on a synthetic barbell graph. The barbell graph consists of two complete subgraphs connected by a “bridge”. Figure 1(a) shows the barbell graph used in our experiments, where each subgraph has 10 nodes and the bridge has a length of 11. In the figure, structurally equivalent nodes have the same color. Figure 1(b) shows the embeddings generated by node2vec, which is a representative example of local proximity methods. As can be seen, node2vec does not capture structural role identities. This is to be expected, as node2vec generates embeddings based on the local context of nodes (this can be seen in the figure, where the nodes of the two complete subgraphs are placed separately and next to the nodes in the bridge that are close to them). Figures 1(c), (d), and (e) show the results of the three structural role embedding methods, struc2vec, GraphWave, and our method, respectively. We use struc2vec as a representative of all the non-precise structural role embedding methods (like struc2gauss and DRNE). Compared to our method and GraphWave, struc2vec cannot embed structurally identical nodes precisely. This is because, as explained earlier, struc2vec uses the context generated by random walks to model similarity, which makes the embeddings stochastic and imprecise. GraphWave and our method can both embed nodes with the same role precisely (as shown by the overlap of nodes with the same color). However, whereas our model perfectly captures the distance between nodes in the “bridge” (gray, purple, yellow, teal, red, and green nodes), GraphWave does not. For instance, GraphWave embeds the gray-purple and teal-red pairs much closer to each other than the purple-yellow and yellow-teal pairs, which is not correct. Our method correctly positions these pairs at similar distances from each other.
Also, the green nodes have a special role, as they are the connections between the subgraphs and the bridge. Our method correctly captures all the role information of the green nodes. First, they are embedded close (but not identical) to the dark blue nodes, because they are all part of the subgraph. Second, since the green nodes also serve as the nodes on the bridge that are closest to the subgraphs, the role similarity between the green nodes and other nodes in the bridge should depend on their distance to the subgraph; our model perfectly captures this by placing the red nodes closest to the green nodes, followed by the teal, yellow, purple, and finally gray nodes. GraphWave, on the other hand, has embedded all the other bridge nodes at almost equal distance from the green nodes. This shortcoming of GraphWave may be attributed to its use of heat wavelet diffusion to indirectly model roles.


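Table 1: Node clustering results on the synthetic graphs.

Shapes              Method       Homogeneity  Completeness  Silhouette
House               node2vec     0.005        0.005         0.330
House               RolX         1.000        1.000         1.000
House               struc2vec    0.995        0.995         0.451
House               GraphWave    1.000        1.000         1.000
House               DRNE         0.697        0.850         0.832
House               struc2gauss  0.836        0.920         0.457
House               our method   1.000        1.000         1.000
House (perturbed)   node2vec     0.030        0.032         0.276
House (perturbed)   RolX         0.570        0.588         0.346
House (perturbed)   struc2vec    0.206        0.235         0.180
House (perturbed)   GraphWave    0.547        0.566         0.374
House (perturbed)   DRNE         0.493        0.564         0.880
House (perturbed)   struc2gauss  0.162        0.323         0.005
House (perturbed)   our method   0.525        0.603         0.902
Varied              node2vec     0.244        0.216         0.400
Varied              RolX         0.841        0.862         0.736
Varied              struc2vec    0.629        0.578         0.240
Varied              GraphWave    0.828        0.852         0.816
Varied              DRNE         0.630        0.904         0.737
Varied              struc2gauss  0.210        0.616         0.078
Varied              our method   0.888        0.938         0.991
Varied (perturbed)  node2vec     0.303        0.265         0.360
Varied (perturbed)  RolX         0.638        0.627         0.418
Varied (perturbed)  struc2vec    0.457        0.433         0.289
Varied (perturbed)  GraphWave    0.697        0.680         0.516
Varied (perturbed)  DRNE         0.488        0.651         0.728
Varied (perturbed)  struc2gauss  0.116        0.458         0.039
Varied (perturbed)  our method   0.506        0.713         0.920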

4.2. Node Clustering
The next set of experiments involves quantitative evaluation of the generated embeddings through node clustering. For these experiments, we use the same synthetic graphs and metrics used by GraphWave (Donnat et al., 2018). For evaluation, we use agglomerative clustering with single linkage to cluster the embeddings and report the average homogeneity, completeness (Rosenberg and Hirschberg, 2007), and silhouette score (Rousseeuw, 1987) over 25 runs (the random seeds to generate the cycles are set to 0 at the beginning of the first run). The results are shown in Table 1. For the baselines node2vec, struc2vec, RolX, and GraphWave, we report the results provided by Donnat et al. (2018). Our model, along with RolX and GraphWave, achieves perfect scores for the house setting. In all other settings, our model outperforms all the baselines for all the shapes and metrics with two exceptions (out of 9), where GraphWave and RolX generate better results in the homogeneity metric for the perturbed settings.
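The evaluation protocol described above can be sketched with scikit-learn. This is an illustrative sketch, not the evaluation script behind Table 1: the function name `evaluate_clustering`, the choice to set the number of clusters to the number of ground-truth roles, the choice to compute the silhouette score on the predicted clusters, and the toy embeddings are all assumptions made for the example.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import completeness_score, homogeneity_score, silhouette_score


def evaluate_clustering(embeddings, role_labels):
    """Cluster embeddings with single linkage and score them against role labels."""
    n_roles = len(set(role_labels))  # assumption: one cluster per ground-truth role
    model = AgglomerativeClustering(n_clusters=n_roles, linkage="single")
    predicted = model.fit_predict(embeddings)
    return {
        "homogeneity": homogeneity_score(role_labels, predicted),
        "completeness": completeness_score(role_labels, predicted),
        "silhouette": silhouette_score(embeddings, predicted),
    }


# Toy embeddings: three tight, well-separated groups standing in for three roles.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [10.0, 0.0], [10.1, 0.0], [10.0, 0.1]]
role_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

scores = evaluate_clustering(embeddings, role_labels)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Homogeneity rewards clusters that contain a single role, completeness rewards roles that end up in a single cluster, and the silhouette score measures how compact and well separated the clusters are in the embedding space.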
4.3. Node Classification


Table 2: Node classification results on the air-traffic networks.

Method       Brazilian  American  European
our method   0.784      0.657     0.601
GraphWave    0.778      0.631     0.571
DRNE         0.776      0.578     0.533
struc2gauss  0.314      0.351     0.310
struc2vec    0.732      0.651     0.577
node2vec     0.267      0.473     0.329

The final set of experiments involves node classification on three real-world datasets provided by Ribeiro et al. (2017): Brazilian, American, and European air-traffic networks. The nodes correspond to airports and are labeled with one of four possible labels, based on their activity. We use all the baselines and our method to extract embeddings from each network and then run 10-fold cross-validation using a support vector machine implemented in scikit-learn (Pedregosa et al., 2011). Table 2 shows the node classification results. Our method outperforms all the baselines across all three datasets. This is further quantitative evidence of our method’s superiority in embedding node structural roles.
5. Conclusion
In this paper, we introduced a novel and flexible structural role embedding framework using stress majorization, which can directly and precisely capture the structural role identities and similarities of nodes in networks. We also provided a rigorous mathematical proof that nodes with the same roles overlap perfectly in the embedding space when embedded using our framework. We validated our method through qualitative and quantitative evaluations on synthetic and real-world datasets, showing that our method outperforms other well-known related methods in learning node role representations across all tasks. The code and data for this paper will be made available upon request.
References
Borg and Groenen (2005). Modern multidimensional scaling: theory and applications. Springer Science & Business Media.
Donnat et al. (2018). Learning structural node embeddings via diffusion wavelets. In Proceedings of the 24th KDD, pp. 1320–1329.
Grover and Leskovec (2016). Node2vec: scalable feature learning for networks. In Proceedings of the 22nd KDD, pp. 855–864.
Henderson et al. (2012). RolX: structural role extraction & mining in large graphs. In Proceedings of the 18th KDD, pp. 1231–1239.
Hochreiter and Schmidhuber (1997). Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
Kruskal (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29 (1), pp. 1–27.
Lyu et al. (2017). Enhancing the network embedding quality with structural similarity. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 147–156.
Pedregosa et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
Pei et al. (2020). Struc2gauss: structural role preserving network embedding via Gaussian embedding. Data Mining and Knowledge Discovery.
Perozzi et al. (2014). DeepWalk: online learning of social representations. In Proceedings of the 20th KDD, pp. 701–710.
Ribeiro et al. (2017). Struc2vec: learning node representations from structural identity. In Proceedings of the 23rd KDD, pp. 385–394.
Rosenberg and Hirschberg (2007). V-measure: a conditional entropy-based external cluster evaluation measure. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp. 410–420.
Rousseeuw (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, pp. 53–65.
Salvador and Chan (2007). Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis 11 (5), pp. 561–580.
Tang et al. (2015). LINE: large-scale information network embedding. In Proceedings of the 24th international conference on world wide web, pp. 1067–1077.
Torgerson (1958). Theory and methods of scaling.
Tu et al. (2018). Deep recursive network embedding with regular equivalence. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2357–2366.
Wang et al. (2021a). Embedding heterogeneous networks into hyperbolic space without meta-path. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 10147–10155.
Wang et al. (2021b). Dynamic structural role node embedding for user modeling in evolving networks. ACM Trans. Inf. Syst.
Wang et al. (2021c). Hyperbolic node embedding for temporal networks. Data Mining and Knowledge Discovery, pp. 1–35.
Wang et al. (2020). Embedding node structural role identity into hyperbolic space. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 2253–2256.