# Embedding Node Structural Role Identity into Hyperbolic Space

Recently, there has been growing interest in embedding networks in hyperbolic space, which has been shown to capture graph and network structure well because it naturally reflects some properties of complex networks. However, work on network embedding in hyperbolic space has focused on microscopic node embedding. In this work, we are the first to present a framework to embed the structural roles of nodes into hyperbolic space. Our framework extends struc2vec, a well-known structural role preserving embedding method, by moving it to a hyperboloid model. We evaluated our method on four real-world networks and one synthetic network. Our results show that hyperbolic space is more effective than Euclidean space in learning latent representations for the structural role of nodes.

## 1. Introduction

Most network embedding methods focus on preserving local structure information among connected vertices in their neighborhoods, such as first-order, second-order, and high-order proximity. Using language models to preserve the microscopic structure of networks was first proposed by Perozzi et al. in their work DeepWalk (Perozzi et al., 2014). This method uses random walks to generate random sequences of nodes from the network, which are then treated as sentences by a Skip-Gram model (Mikolov et al., 2013). Grover et al. (Grover and Leskovec, 2016) demonstrated that DeepWalk cannot accurately capture the diversity of connectivity patterns in a network and introduced node2vec. They defined a flexible notion of a node’s network neighborhood and designed a second-order random walk strategy to sample the neighborhood nodes. The method can smoothly interpolate between breadth-first sampling (BFS) and depth-first sampling (DFS). However, a limitation of these methods is that they cannot capture structural role proximities.

The structural role proximity describes the similarity between vertices serving similar “roles” in the network, such as being the center of a community or a bridge between two communities. Unlike the $k$th-order proximity, which captures the local similarity between nodes, the structural role proximity tries to discover the similarity between nodes that are far away from each other (or even disconnected) but share equivalent structural roles. One of the early unsupervised methods for learning structural node embeddings is RolX (Henderson et al., 2012). The method is based on enumerating various structural features for nodes in a network, finding the best-suited basis vectors for this joint feature space, and then assigning every node a distribution over the identified roles.

struc2vec (Ribeiro et al., 2017) determines the structural similarity between each node pair in the graph by considering $k$-hop neighborhood sizes. It constructs a weighted multilayer graph to generate a context for each node. GraphWave (Donnat et al., 2018) uses a matrix factorization method based on the assumption that if two nodes in the network share similar structural roles, the graph wavelets starting at them will diffuse similarly across their neighbors.

There has also been a relatively recent push for embedding networks into hyperbolic space. This has come with the realization that complex networks may have an underlying hyperbolic geometry, because hyperbolic geometry can naturally reflect some properties of complex networks, such as hierarchical and scale-free structures (Krioukov et al., 2010). An emerging network embedding approach is therefore to embed networks into hyperbolic space (Nickel and Kiela, 2017; Alanis-Lobato et al., 2016; De Sa et al., 2018; Muscoloni et al., 2017; McDonald and He, 2019; Wang et al., 2019). For instance, HEAT (McDonald and He, 2019) learns embeddings from attributed networks, and HHNE (Wang et al., 2019) learns embeddings from heterogeneous information networks in hyperbolic space.

However, to the best of our knowledge, none of the existing hyperbolic embedding methods can capture structural role equivalence. To fill this gap, we present a framework to embed the structural roles of nodes into hyperbolic space. Our framework extends struc2vec, a well-known structural role preserving embedding method, by moving it to a hyperboloid model.

## 2. Our Framework

Let $G = (V, E)$ be an undirected and unweighted network, where $V$ is the set of vertices and $E$ is the set of unweighted edges between vertices in $V$. We consider the problem of representing the graph as a set of low-dimensional vectors $\{x_v \mid v \in V\}$ in the $n$-dimensional hyperboloid $\mathbb{H}^n$ with $n \ll |V|$. The described problem is unsupervised. Our framework consists of two parts: building the multi-layer graph, which measures the structural similarity between node pairs, and using the context of each node generated by a biased random walk to learn hyperboloid embeddings.

### 2.1. Constructing the Multi-layer graph

The architecture presented in this paper can use any of the known approaches for node structural embeddings to generate the node context. In this paper, we extend struc2vec, the framework proposed by Ribeiro et al. (Ribeiro et al., 2017). Let $\mathrm{Hop}_k(u)$ denote the ordered sequence of the degrees of the nodes at distance exactly $k$ from $u$ in $G$ (hop count). The structural role similarity of two nodes $u$ and $v$, considering the sets of nodes at distance $k$ from them, can be defined as the similarity of the two ordered sequences $\mathrm{Hop}_k(u)$ and $\mathrm{Hop}_k(v)$. Note that these two sequences may not have equal sizes and that their elements are integers in the range $[0, |V| - 1]$. We use Fast Dynamic Time Warping (FastDTW) (Salvador and Chan, 2007) to measure the distance between two ordered degree sequences. The dynamic time warping algorithm (DTW) finds the optimal alignment between two time series of arbitrary length, but it has quadratic time and space complexity, which limits its use to small time series data sets. FastDTW is an approximation of DTW that limits both the time and space complexity to $O(N)$ for sequences of length $N$. Since the elements of the sequences $\mathrm{Hop}_k(u)$ and $\mathrm{Hop}_k(v)$ are degrees of nodes, we adopt the following distance function between the $i$th and $j$th elements of the two sequences for FastDTW:

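$$\mathrm{DTWdis}\left(\mathrm{Hop}_k^i(u), \mathrm{Hop}_k^j(v)\right) = \frac{\max\left(\mathrm{Hop}_k^i(u), \mathrm{Hop}_k^j(v)\right)}{\min\left(\mathrm{Hop}_k^i(u), \mathrm{Hop}_k^j(v)\right)} - 1 \tag{1}$$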

Instead of measuring the absolute difference of degrees, this distance measures the relative difference, which is more suitable for comparing degrees. The structural role distance of two nodes $u$ and $v$, considering their $k$-hop neighborhoods, can be defined as:

$$\mathrm{distance}_k(u, v) = \sum_{i=0}^{k} \mathrm{DTW}\left(\mathrm{Hop}_i(u), \mathrm{Hop}_i(v)\right) \tag{2}$$
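Equations 1 and 2 can be illustrated with a minimal sketch. It uses exact quadratic-time DTW rather than FastDTW, so it is a stand-in for the approximation used in the paper, not a reproduction of it. The function names and the list-of-lists input format (one ordered degree sequence per hop level) are assumptions made for illustration.

```python
from typing import Sequence


def degree_cost(a: int, b: int) -> float:
    """Relative degree difference from Equation 1: max(a, b) / min(a, b) - 1."""
    # Assumption: all degrees are >= 1, so min(a, b) is never zero.
    return max(a, b) / min(a, b) - 1.0


def dtw(s: Sequence[int], t: Sequence[int]) -> float:
    """Exact DTW cost between two ordered degree sequences (not FastDTW)."""
    inf = float("inf")
    # acc[i][j] holds the cost of the best alignment of s[:i] with t[:j].
    acc = [[inf] * (len(t) + 1) for _ in range(len(s) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            step = degree_cost(s[i - 1], t[j - 1])
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[len(s)][len(t)]


def structural_distance(hops_u, hops_v, k: int) -> float:
    """Equation 2: sum the DTW costs over hop levels 0..k."""
    return sum(dtw(hops_u[i], hops_v[i]) for i in range(k + 1))


# Hypothetical degree sequences: level 0 is the node's own degree.
hops_u = [[2], [3, 3]]
hops_v = [[2], [3, 4]]
d = structural_distance(hops_u, hops_v, k=1)
```

Because the cost is relative, degrees 1 and 2 are as far apart as degrees 100 and 200, which is the property the text motivates.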

Next, we construct a multilayer weighted graph that encodes the structural similarity between nodes. Each layer $k$ is a weighted undirected complete graph on all the nodes of the original graph $G$. Writing $M_k^u$ for the copy of node $u$ in layer $k$, the edge weights inside layer $k$ are defined as:

$$w(M_k^u, M_k^v) = e^{-\mathrm{distance}_k(u, v)}, \quad k = 0, \dots, \text{diameter} \tag{3}$$

Note that if a node has too many or too few structurally similar nodes in the current layer $k$, then it should change layers to obtain a more refined context. By moving up one layer the number of similar nodes will decrease, and by moving down one layer the number of similar nodes will increase. Thus, we define the inter-layer edges as follows:

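$$\begin{aligned} w(M_k^u, M_{k+1}^u) &= \log\left(L_k(u) + e\right), & k &= 0, \dots, \text{diameter} - 1 \\ w(M_k^u, M_{k-1}^u) &= 1, & k &= 1, \dots, \text{diameter} \end{aligned} \tag{4}$$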

where $L_k(u)$ denotes how many nodes are structurally similar to $u$ in layer $k$, that is, the number of incoming edges to $u$ whose weight is larger than the average edge weight of layer $k$. More specifically:

$$L_k(u) = \sum_{v \in V} \mathbb{1}\left( w(M_k^u, M_k^v) > \frac{\sum_{u', v' \in V} w(M_k^{u'}, M_k^{v'})}{\binom{|V|}{2}} \right) \tag{5}$$
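The layer construction in Equations 3 to 5 can be sketched as follows. The dictionary keyed by `frozenset({u, v})` and all function names are hypothetical choices for illustration; the paper does not describe its data structures.

```python
import math


def layer_weights(dist_k):
    """Equation 3: w = exp(-distance_k(u, v)) for every unordered node pair.

    dist_k maps frozenset({u, v}) -> distance_k(u, v) (hypothetical format).
    """
    return {pair: math.exp(-d) for pair, d in dist_k.items()}


def similar_count(u, nodes, w_k):
    """Equation 5: number of edges at u heavier than the layer average."""
    # For a complete layer, len(w_k) equals C(|V|, 2).
    avg = sum(w_k.values()) / len(w_k)
    return sum(1 for v in nodes if v != u and w_k[frozenset((u, v))] > avg)


def up_weight(u, nodes, w_k):
    """Equation 4: weight of the edge from layer k up to layer k + 1."""
    return math.log(similar_count(u, nodes, w_k) + math.e)


DOWN_WEIGHT = 1.0  # Equation 4: weight of the edge from layer k down to k - 1

# Toy layer: a and b are structurally identical, c differs from both.
nodes = ["a", "b", "c"]
dist_k = {
    frozenset(("a", "b")): 0.0,
    frozenset(("a", "c")): 2.0,
    frozenset(("b", "c")): 2.0,
}
w_k = layer_weights(dist_k)
```

In this toy layer, node `a` has one above-average edge and node `c` has none, so `a` gets a larger upward weight and is more likely to move to a finer layer.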

We then adopt a random walk method to obtain the structure-preserving context of each node. At each step, the walk can either move inside one layer or move between layers. We define the layer-change constant $\alpha$, such that at each step the probability of staying in the current layer is $1 - \alpha$ and the probability of going up or down one layer is $\alpha$. Thus, given the current node $M_k^u$, the normalized probability of moving to a node $M_k^v$ in the current layer is:

$$p(M_k^v \mid M_k^u) = (1 - \alpha) \frac{w(M_k^u, M_k^v)}{\sum_{v' \in V,\, v' \neq u} w(M_k^u, M_k^{v'})} \tag{6}$$

The normalized probability of moving to the corresponding node in the layer above, $M_{k+1}^u$, is:

$$p(M_{k+1}^u \mid M_k^u) = \alpha \frac{w(M_k^u, M_{k+1}^u)}{w(M_k^u, M_{k+1}^u) + w(M_k^u, M_{k-1}^u)} \tag{7}$$

Similarly, the normalized probability of moving to the corresponding node in the layer below, $M_{k-1}^u$, is:

$$p(M_{k-1}^u \mid M_k^u) = \alpha \frac{w(M_k^u, M_{k-1}^u)}{w(M_k^u, M_{k+1}^u) + w(M_k^u, M_{k-1}^u)} \tag{8}$$
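One step of the walk described by Equations 6 to 8 might look like the sketch below. It normalizes the in-layer term over the neighbors of the current node so that all step probabilities sum to one, which is an assumption about how Equation 6 is meant to be normalized. The function names and the `("stay" | "up" | "down", node)` move encoding are hypothetical.

```python
import random


def transition_probs(u, nodes, w_k, w_up, w_down, alpha):
    """Step distribution from M_k^u.

    w_k maps frozenset({u, v}) -> intra-layer weight in the current layer.
    w_up and w_down are the inter-layer weights at u. Passing 0.0 for a
    missing layer at the bottom or top is an assumption, not from the paper.
    """
    probs = {}
    z = sum(w_k[frozenset((u, v))] for v in nodes if v != u)
    for v in nodes:
        if v != u:
            # Equation 6: stay in the layer and move to v.
            probs[("stay", v)] = (1 - alpha) * w_k[frozenset((u, v))] / z
    probs[("up", u)] = alpha * w_up / (w_up + w_down)      # Equation 7
    probs[("down", u)] = alpha * w_down / (w_up + w_down)  # Equation 8
    return probs


def sample_step(probs, rng):
    """Draw one move according to the step distribution."""
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


nodes = ["a", "b", "c"]
w_k = {
    frozenset(("a", "b")): 1.0,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.25,
}
probs = transition_probs("a", nodes, w_k, w_up=2.0, w_down=1.0, alpha=0.7)
move = sample_step(probs, random.Random(0))
```

With $\alpha = 0.7$, the walk changes layers in 70% of steps on average, so most of the context is gathered across layers rather than within one.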

### 2.2. Learning a Hyperboloid Model

Finally, we train a hyperboloid model on the generated random walk sequences to obtain structural role preserving embeddings. Hyperbolic space is a homogeneous space with constant negative curvature. It cannot be embedded into Euclidean space without distortion; however, there are several models of hyperbolic space that allow the calculation of gradients. The most commonly used ones are the hyperboloid, the Poincaré ball, and the Poincaré half-space. Unlike previous works that use the Poincaré ball model and approximate gradients, we use the hyperboloid model for network embedding because the gradient computation in this model is exact (Wilson and Leimeister, 2018) and we can apply a Support Vector Machine (SVM) to it (Cho et al., 2019).

#### 2.2.1. Review of the Hyperboloid Model

The hyperboloid model has many similarities to the sphere model. Analogous to the sphere in the ambient Euclidean space, the hyperboloid model can be viewed as a “pseudo-sphere” in an ambient space called Minkowski space. Consider an $(n+1)$-dimensional space equipped with an inner product of the form:

$$\langle u, v \rangle_M = \sum_{i=1}^{n} u_i v_i - u_{n+1} v_{n+1}. \tag{9}$$

We use $\mathbb{R}^{n:1}$ to denote Minkowski space. Analogous to the unit sphere in Euclidean space, the hyperboloid can be described by the following equation:

$$\mathbb{H}^n = \left\{ x \in \mathbb{R}^{n:1} \mid \langle x, x \rangle_M = -1 \right\}, \tag{10}$$

For a given point $p \in \mathbb{H}^n$, the tangent space at that point is the set of points of the form

$$T_p\mathbb{H}^n = \left\{ x \in \mathbb{R}^{n:1} \mid \langle p, x \rangle_M = 0 \right\}. \tag{11}$$
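The definitions in Equations 9 to 11 can be checked numerically with a short sketch. Points are plain Python lists whose last entry is the $(n+1)$th coordinate. The helper `lift_to_hyperboloid` is a hypothetical convenience for building valid points and is not part of the paper.

```python
import math


def minkowski_dot(u, v):
    """Equation 9: products of the first n coordinates, minus the last product."""
    return sum(a * b for a, b in zip(u[:-1], v[:-1])) - u[-1] * v[-1]


def lift_to_hyperboloid(y):
    """Hypothetical helper: append x_{n+1} = sqrt(1 + |y|^2) so <x, x>_M = -1."""
    return list(y) + [math.sqrt(1.0 + sum(c * c for c in y))]


def on_hyperboloid(x, tol=1e-9):
    """Equation 10: membership test up to a numerical tolerance."""
    return abs(minkowski_dot(x, x) + 1.0) < tol


def in_tangent_space(p, x, tol=1e-9):
    """Equation 11: x is tangent at p when <p, x>_M vanishes."""
    return abs(minkowski_dot(p, x)) < tol


origin = [0.0, 0.0, 1.0]             # the "bottom" of the 2-dimensional hyperboloid
p = lift_to_hyperboloid([0.3, -1.2])  # an arbitrary point on it
```

At the origin, the tangent space is simply the set of vectors whose last coordinate is zero, which makes it a convenient point for sanity checks.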

#### 2.2.2. Gradient calculation on the Hyperboloid Hn

Analogous to the case of the sphere, the calculation of the gradient of a given function $E$ defined on $\mathbb{H}^n$ has several steps (Wilson and Leimeister, 2018).

Calculate the gradient of $E$ in the ambient space, i.e.

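$$\nabla_p^{\mathbb{R}^{n:1}} E = \left( \left.\frac{\partial E}{\partial x_1}\right|_p, \dots, \left.\frac{\partial E}{\partial x_n}\right|_p, -\left.\frac{\partial E}{\partial x_{n+1}}\right|_p \right) \in \mathbb{R}^{n:1}. \tag{12}$$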

Project that vector onto the tangent space $T_p\mathbb{H}^n$. Notice that, compared with the sphere case, the sign is flipped in the expression of the projected vector:

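$$\nabla_p^{\mathbb{H}^n} E = \nabla_p^{\mathbb{R}^{n:1}} E + \left\langle p, \nabla_p^{\mathbb{R}^{n:1}} E \right\rangle_M \cdot p \in T_p\mathbb{H}^n. \tag{13}$$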

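Map the gradient vector onto the hyperboloid. This operation is called the exponential map:

$$\mathrm{Exp}_p(v) = \cosh(\|v\|)\, p + \sinh(\|v\|) \frac{v}{\|v\|} \in \mathbb{H}^n, \tag{14}$$

where $\|v\| = \sqrt{\langle v, v \rangle_M}$.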
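The three steps above can be combined into a single update, sketched below under stated assumptions. The update moves along the negative projected gradient scaled by a learning rate `lr`, which is the usual convention for Riemannian gradient descent but is not spelled out in the text. The function names are illustrative.

```python
import math


def minkowski_dot(u, v):
    """Minkowski inner product from Equation 9."""
    return sum(a * b for a, b in zip(u[:-1], v[:-1])) - u[-1] * v[-1]


def ambient_gradient(euclid_grad):
    """Equation 12: flip the sign of the last partial derivative."""
    return list(euclid_grad[:-1]) + [-euclid_grad[-1]]


def project_to_tangent(p, h):
    """Equation 13: h + <p, h>_M * p lies in the tangent space at p."""
    c = minkowski_dot(p, h)
    return [hi + c * pi for hi, pi in zip(h, p)]


def exp_map(p, v):
    """Equation 14: follow the geodesic from p in tangent direction v."""
    # Tangent vectors have non-negative Minkowski norm; max() guards rounding.
    norm = math.sqrt(max(minkowski_dot(v, v), 0.0))
    if norm < 1e-12:
        return list(p)
    return [math.cosh(norm) * pi + math.sinh(norm) * vi / norm
            for pi, vi in zip(p, v)]


def descent_step(p, euclid_grad, lr):
    """One update: p <- Exp_p(-lr * projected gradient) (assumed convention)."""
    g = project_to_tangent(p, ambient_gradient(euclid_grad))
    return exp_map(p, [-lr * gi for gi in g])


p = [0.0, 0.0, 1.0]
g = project_to_tangent(p, ambient_gradient([1.0, 0.0, 0.5]))
q = descent_step(p, [1.0, 0.0, 0.5], lr=0.1)
```

Because the exponential map is exact, `q` satisfies the hyperboloid constraint up to floating-point error without any re-normalization step.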

#### 2.2.3. Hyperboloid Embedding Learning

After generating the random walk sequences, we use a sliding window to scan all the sequences and add the pairs of nodes that appear within the window to a multi-set $O$ of positive sample pairs. Note that, unlike common sampling methods, each pair of nodes $u$ and $v$ can appear multiple times in $O$. Intuitively, the number of times a pair is sampled indicates the importance of that pair. In prior work (Nickel and Kiela, 2017), Nickel and Kiela used the distance between two nodes to define the probability of a link. Similarly, we define the structural role similarity of two nodes through their distance in the embedded hyperbolic space: nodes close to each other share a high similarity, and vice versa. We define the structural role similarity between nodes $u$ and $v$ as

$$\frac{\exp\left(-d^2_{\mathbb{H}^n}(x_u, x_v)\right)}{\sum_{v' \in V} \exp\left(-d^2_{\mathbb{H}^n}(x_u, x_{v'})\right)}, \tag{15}$$

where $x_u$ is the embedding of node $u$ in the hyperboloid model. The distance $d_{\mathbb{H}^n}(x_u, x_v)$ can be calculated as $\operatorname{arccosh}\left(-\langle x_u, x_v \rangle_M\right)$. However, computing the gradient of Equation 15 involves a summation over all the nodes of $V$ and is inefficient for large networks. To address this, we use negative sampling, which samples a small number of negative objects to enhance the influence of positive samples. As a result, our loss function $L(\Theta)$ for an embedding $\Theta$ can be written as follows:

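$$L(\Theta) = -\frac{1}{|O|} \sum_{(u,v) \in O} \log \frac{\exp\left(-d^2_{\mathbb{H}^n}(x_u, x_v)\right)}{\exp\left(-d^2_{\mathbb{H}^n}(x_u, x_v)\right) + \sum_{i \le M,\; v'_i \in P(u)} \exp\left(-d^2_{\mathbb{H}^n}(x_u, x_{v'_i})\right)} \tag{16}$$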

where $P(u)$ is the negative sampling set for $u$, drawn with probability proportional to the occurrence frequency of each node in $O$, and $M$ is the number of negative samples. The calculation of its gradient follows Equations 12, 13, and 14, which enables gradient descent for model learning.
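The loss in Equation 16 can be evaluated with the following sketch. It uses the standard hyperboloid distance $\operatorname{arccosh}(-\langle x, y \rangle_M)$. The callable `negatives_for`, which returns the sampled negative node ids for a source node, is a hypothetical stand-in for the paper's frequency-based sampler.

```python
import math


def minkowski_dot(u, v):
    """Minkowski inner product from Equation 9."""
    return sum(a * b for a, b in zip(u[:-1], v[:-1])) - u[-1] * v[-1]


def hyperbolic_distance(x, y):
    """d(x, y) = arccosh(-<x, y>_M); max() guards against rounding below 1."""
    return math.acosh(max(-minkowski_dot(x, y), 1.0))


def pair_loss(x_u, x_v, negatives):
    """Negative log of the sampled softmax term in Equation 16 for one pair."""
    pos = math.exp(-hyperbolic_distance(x_u, x_v) ** 2)
    neg = sum(math.exp(-hyperbolic_distance(x_u, x_n) ** 2) for x_n in negatives)
    return -math.log(pos / (pos + neg))


def total_loss(emb, positives, negatives_for):
    """Equation 16: average pair loss over the multi-set O of positive pairs.

    negatives_for(u) is a hypothetical callable returning sampled node ids.
    """
    losses = [
        pair_loss(emb[u], emb[v], [emb[n] for n in negatives_for(u)])
        for u, v in positives
    ]
    return sum(losses) / len(losses)


# Toy embedding on the 2-dimensional hyperboloid: b is near a, c is far from a.
emb = {
    "a": [0.0, 0.0, 1.0],
    "b": [math.sinh(0.1), 0.0, math.cosh(0.1)],
    "c": [math.sinh(2.0), 0.0, math.cosh(2.0)],
}
loss_near = total_loss(emb, [("a", "b")], lambda u: ["c"])
loss_far = total_loss(emb, [("a", "c")], lambda u: ["b"])
```

The loss is small when the positive partner is close and the negatives are far, and large in the opposite case, which is what drives structurally similar nodes together during training.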

## 3. Experiments

We use the same five datasets used by Ribeiro et al. (Ribeiro et al., 2017): one synthetic barbell graph and four real-world datasets, namely the Brazilian, American, and European air-traffic networks and the karate network.

### 3.1. Model Training

For random walks on the multi-layer graph, the layer-change constant $\alpha$ is set to 0.7, and we perform 8 random walks of length 10 from each node in the training set. (In contrast, the classic struc2vec in Euclidean space needs 80 random walks per node. Our method therefore reduces the number of random walks, which are computationally expensive, by 90%.) For training the hyperboloid model, we use a sliding window of size 3 to generate positive samples. For the hyperboloid embedding learning, we generate 20 negative samples for each positive one, and we train for 5 epochs with a learning rate of 1 and a batch size of 50.
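The positive-sample generation can be sketched as below. Several details are assumptions, since the text does not fix them: a window of size 3 is read as three consecutive walk positions, each co-occurrence is recorded in both directions, and pairs of identical nodes are skipped. A `Counter` serves as the multi-set, so repeated pairs keep their multiplicity.

```python
from collections import Counter


def positive_pairs(walks, window):
    """Collect co-occurring node pairs within `window` consecutive positions."""
    pairs = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            # Nodes that share a window with position i and come after it.
            for v in walk[i + 1 : i + window]:
                if u != v:  # assumption: a node is not its own context
                    pairs[(u, v)] += 1
                    pairs[(v, u)] += 1
    return pairs


# Hypothetical walk over three node ids; real walks have length 10.
walks = [["a", "b", "c", "a"]]
pairs = positive_pairs(walks, window=3)
```

With 8 walks of length 10 per node, each node contributes at most 8 × (9 + 8) unordered co-occurrences under this reading, which keeps the multi-set small compared with the 80-walk setting.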

### 3.2. Barbell Graph

We consider the barbell graph, which consists of two complete subgraphs connected by a long path. Figure 1(a) shows the barbell graph used in the experiment, where structurally equivalent nodes have the same color. Figure 1(b) shows the result of RolX: although RolX captures some structural role identity, the blue nodes are split across three different roles (0, 2, and 5). Also, role 4 contains all the nodes in the path, even though they are not all structurally equivalent. Figure 1(c) shows the results of node2vec: it does not capture structural role identities, and the nodes of the two complete subgraphs are placed separately, along with the path nodes close to them. Figure 1(e) shows our results on a 2-dimensional Poincaré ball. Compared with the struc2vec results in Euclidean space (Figure 1(d)), our method captures structural equivalence more accurately. Moreover, we perform only 8 random walks of length 10 from each node for the hyperboloid model, whereas struc2vec needs 80 random walks per node to generate an accurate result, which also indicates the advantage of hyperbolic space in learning structural role equivalence.
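The text does not say how the hyperboloid embeddings are drawn on the Poincaré ball. A common choice, assumed here, is the standard projection $y_i = x_i / (1 + x_{n+1})$, which maps the upper sheet of the hyperboloid onto the open unit ball. The function name is illustrative.

```python
import math


def to_poincare_ball(x):
    """Map a hyperboloid point (x_1, ..., x_n, x_{n+1}) to the Poincaré ball."""
    # Assumes x lies on the upper sheet, so 1 + x[-1] is positive.
    return [c / (1.0 + x[-1]) for c in x[:-1]]


origin = to_poincare_ball([0.0, 0.0, 1.0])
y = to_poincare_ball([math.sinh(1.0), 0.0, math.cosh(1.0)])
```

A point at hyperbolic distance $r$ from the origin lands at Euclidean radius $\tanh(r/2)$, so points far from the origin crowd toward the boundary of the disk in the plots.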

### 3.3. Karate Network

Zachary’s Karate Club (Zachary, 1977) is a network of 34 nodes: each node represents a club member, and an edge denotes that two members have interacted. The network used in the experiment (Figure 2(a)) is composed of two copies of the Karate Club network, where each node has a mirror node and one edge has been added between the mirrored node pair 1 and 37. Figure 2(b) shows the roles identified by RolX: only 7 out of 34 corresponding pairs are placed in the same role. The result of node2vec is shown in Figure 2(c). Since this method only captures microscopic structural information, the two parts of the network are placed separately, as only one edge connects them. The corresponding pairs in our result on a 2-dimensional Poincaré ball (Figure 2(e)) are closer than in the result of struc2vec (Figure 2(d)). Moreover, the embeddings of different roles generated by struc2vec are more likely to bunch together in Euclidean space. In hyperbolic space, however, these embeddings are spread out more, which indicates a better ability to distinguish different roles.

### 3.4. Node classification

We also test our method on three real-world datasets provided by Ribeiro et al. (Ribeiro et al., 2017): the Brazilian, American, and European air-traffic networks. The nodes correspond to airports, and edges indicate the existence of commercial flights. Each airport is assigned one of four possible labels corresponding to its activity (divided evenly into four quartiles). Thus, each class represents a “role” played by the airport (e.g., major hubs). The task here is to predict the role of an airport. We train all three models on each network to obtain embeddings and use 10-fold cross-validation for the evaluation. For our model, we use a hyperbolic SVM (Cho et al., 2019) as the classifier, and for the two Euclidean models, struc2vec and node2vec, we use the classic Euclidean SVM. Table 1 shows the node classification results, where our model outperforms the baselines.

## 4. Conclusion

In this paper, we present a novel method for embedding the nodes of a network into hyperbolic space while preserving structural role information. To the best of our knowledge, this is the first attempt at a hyperbolic model that can learn node structural role proximity. Our algorithm outperforms several baselines on a synthetic barbell graph and four real-world datasets for embedding visualization and node classification. The code and data for this paper will be made available upon request.

## References

• G. Alanis-Lobato, P. Mier, and M. A. Andrade-Navarro (2016) Manifold learning and maximum likelihood estimation for hyperbolic network embedding. Applied Network Science 1 (1), pp. 10.
• H. Cho, B. DeMeo, J. Peng, and B. Berger (2019) Large-margin classification in hyperbolic space. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1832–1840.
• C. De Sa, A. Gu, C. Ré, and F. Sala (2018) Representation tradeoffs for hyperbolic embeddings. Proceedings of Machine Learning Research 80, pp. 4460.
• C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec (2018) Learning structural node embeddings via diffusion wavelets. In Proceedings of the 24th KDD, pp. 1320–1329.
• A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd KDD, pp. 855–864.
• K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li (2012) RolX: structural role extraction & mining in large graphs. In Proceedings of the 18th KDD, pp. 1231–1239.
• D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguná (2010) Hyperbolic geometry of complex networks. Physical Review E 82 (3), pp. 036106.
• D. McDonald and S. He (2019) HEAT: hyperbolic embedding of attributed networks. arXiv preprint arXiv:1903.03036.
• T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In NIPS 2013, pp. 3111–3119.
• A. Muscoloni, J. M. Thomas, S. Ciucci, G. Bianconi, and C. V. Cannistraci (2017) Machine learning meets complex networks via coalescent embedding in the hyperbolic space. Nature Communications 8 (1), pp. 1615.
• M. Nickel and D. Kiela (2017) Poincaré embeddings for learning hierarchical representations. In NIPS 2017, pp. 6338–6347.
• B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th KDD, pp. 701–710.
• L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo (2017) struc2vec: learning node representations from structural identity. In Proceedings of the 23rd KDD, pp. 385–394.
• S. Salvador and P. Chan (2007) Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis 11 (5), pp. 561–580.
• X. Wang, Y. Zhang, and C. Shi (2019) Hyperbolic heterogeneous information network embedding. In Proceedings of AAAI 2019.
• B. Wilson and M. Leimeister (2018) Gradient descent in hyperbolic space. arXiv preprint arXiv:1805.08207.
• W. W. Zachary (1977) An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33 (4), pp. 452–473.