1. Introduction
Most network embedding methods focus on preserving local structural information among connected vertices in their neighborhoods, such as first-order, second-order, and high-order proximity. Using language models to preserve the microscopic structure of networks was first proposed by Perozzi et al. in DeepWalk (Perozzi et al., 2014). This method uses random walks to generate random sequences of nodes from the network, which are then treated as sentences by a SkipGram model (Mikolov et al., 2013). Grover and Leskovec (2016) demonstrated that DeepWalk cannot accurately capture the diversity of connectivity patterns in a network and introduced node2vec. They defined a flexible notion of a node's network neighborhood and designed a second-order random walk strategy to sample the neighborhood nodes. The method can smoothly interpolate between breadth-first sampling (BFS) and depth-first sampling (DFS). However, a limitation of these methods is that they cannot capture structural role proximities.
Structural role proximity depicts the similarity between vertices serving similar "roles" in the network, such as being the center of a community or a bridge between two communities. Different from k-th order proximity, which captures the local similarity between nodes, structural role proximity tries to discover the similarity between nodes that are far away from each other (or even disconnected) but share equivalent structural roles. One of the early unsupervised methods for learning structural node embeddings is RolX (Henderson et al., 2012). This method enumerates various structural features for the nodes in a network, finds the best-suited basis vectors for this joint feature space, and then assigns to every node a distribution over the identified roles.
struc2vec (Ribeiro et al., 2017) determines the structural similarity between each node pair in the graph, considering neighborhoods of increasing hop counts, and constructs a weighted multilayer graph to generate a context for each node. GraphWave (Donnat et al., 2018) leverages graph diffusion wavelets, based on the assumption that if two nodes in the network share similar structural roles, the wavelets centered at them will diffuse similarly across their neighbors. There has also been a relatively recent push for embedding networks into hyperbolic space, driven by the realization that complex networks may have an underlying hyperbolic geometry: hyperbolic geometry naturally reflects some properties of complex networks, such as their hierarchical and scale-free structures (Krioukov et al., 2010). Accordingly, an emerging approach is to embed networks into hyperbolic space (Nickel and Kiela, 2017; Alanis-Lobato et al., 2016; De Sa et al., 2018; Muscoloni et al., 2017; McDonald and He, 2019; Wang et al., 2019). For instance, HEAT (McDonald and He, 2019) learns embeddings of attributed networks in hyperbolic space, and HHNE (Wang et al., 2019) does so for heterogeneous information networks.
However, to the best of our knowledge, none of the existing hyperbolic embedding methods can capture structural role equivalence. To fill this gap, we present a framework to embed the structural roles of nodes into hyperbolic space. Our framework extends struc2vec, a well-known structural role preserving embedding method, by moving it to a hyperboloid model.
2. Our Framework
Let G = (V, E) be an undirected and unweighted network, where V is the set of vertices and E is the set of unweighted edges between vertices in V. We consider the problem of representing the graph as a set of low-dimensional vectors {x_v}_{v in V} on the n-dimensional hyperboloid H^n, with n << |V|. The described problem is unsupervised. Our framework consists of two parts: building the multilayer graph that measures the structural similarity between node pairs, and using the context of each node, generated by a biased random walk, to learn hyperboloid embeddings.
2.1. Constructing the Multilayer Graph
The architecture presented in this paper can use any of the known approaches for node structural embeddings to generate the node context. In this paper, we extend struc2vec, the framework proposed by Ribeiro et al. (2017). Let s(R_k(u)) denote the ordered degree sequence of the nodes at distance exactly k from u in G (hop count). The structural role similarity of two nodes u and v, considering the set of nodes at distance k from them, can be defined as the similarity of the two ordered sequences s(R_k(u)) and s(R_k(v)). Note that these two sequences may not have equal sizes, and their elements are integers in the range [0, |V| - 1]. We use Fast Dynamic Time Warping (FastDTW) (Salvador and Chan, 2007) to measure the distance between two ordered degree sequences. The dynamic time warping (DTW) algorithm can find the optimal alignment between two time series of arbitrary lengths, but it has a quadratic time and space complexity that limits its use to small data sets. FastDTW is an approximation of DTW that limits both the time and space complexity to O(n). Since the elements of the sequences s(R_k(u)) and s(R_k(v)) are node degrees, we adopt the following distance function between an element a of the first sequence and an element b of the second:

d(a, b) = max(a, b) / min(a, b) - 1.    (1)
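To make the measure concrete, the following sketch (our own illustration, not the authors' code) implements the relative degree distance of Equation 1 together with a plain O(nm) dynamic time warping; the paper itself uses FastDTW, which follows the same recurrence on a restricted search window.

```python
def degree_dist(a, b):
    # Relative degree difference from Equation 1: max/min - 1.
    # Identical degrees give 0; degrees 1 and 2 are as far apart
    # as degrees 100 and 200.
    return max(a, b) / min(a, b) - 1.0

def dtw(seq_u, seq_v):
    # Exact dynamic time warping between two ordered degree sequences.
    # The paper uses the FastDTW approximation; this O(n*m) version is
    # only a sketch of the same alignment cost.
    n, m = len(seq_u), len(seq_v)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = degree_dist(seq_u[i - 1], seq_v[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_u
                                 cost[i][j - 1],      # stretch seq_v
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Note that the sequences need not have the same length: identical degree multisets of different sizes, such as [2, 2, 2] and [2, 2], still align at zero cost.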
Instead of measuring the absolute difference of degrees, this distance measures the relative difference, which is more suitable for comparing degrees. The structural role distance f_k(u, v) of two nodes u and v, considering their k-hop neighborhoods, can be defined recursively as:

f_k(u, v) = f_{k-1}(u, v) + g(s(R_k(u)), s(R_k(v))),  k >= 0,    (2)

where g(., .) >= 0 is the FastDTW distance between the two ordered degree sequences and f_{-1} = 0.
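The recursion above can be sketched as follows (our own illustrative helpers, assuming the graph is an adjacency-list dict; the sequence distance g, FastDTW in the paper, is passed in as a function):

```python
from collections import deque

def hop_rings(adj, u, k_max):
    # R_0(u), R_1(u), ..., R_{k_max}(u): the nodes at each exact hop
    # distance from u, found with a breadth-first search.
    dist = {u: 0}
    rings = [[u]] + [[] for _ in range(k_max)]
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if dist[w] == k_max:
            continue
        for nb in adj[w]:
            if nb not in dist:
                dist[nb] = dist[w] + 1
                rings[dist[nb]].append(nb)
                queue.append(nb)
    return rings

def structural_distance(adj, u, v, k_max, g):
    # Equation 2: f_k(u, v) = f_{k-1}(u, v) + g(s(R_k(u)), s(R_k(v))).
    # s(...) sorts the degrees of a ring; the sum runs while both
    # rings are non-empty.
    ru, rv = hop_rings(adj, u, k_max), hop_rings(adj, v, k_max)
    f = 0.0
    for k in range(k_max + 1):
        if not ru[k] or not rv[k]:
            break
        su = sorted(len(adj[n]) for n in ru[k])
        sv = sorted(len(adj[n]) for n in rv[k])
        f += g(su, sv)
    return f
```

On a triangle graph, where every node is structurally equivalent, the distance between any two nodes is 0 for any choice of g with g(s, s) = 0.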
Next, we construct a multilayer weighted graph M that encodes the structural similarity between nodes. Each layer k = 0, ..., k* is a weighted undirected complete graph over all the nodes of the original graph G. The edge weights inside layer k are defined as:

w_k(u, v) = e^{-f_k(u, v)},  k = 0, ..., k*.    (3)
Note that if a node has too many or too few structurally similar nodes in the current layer k, it should change layers to obtain a more refined context. By moving up one layer the number of similar nodes will decrease, and by moving down one layer it will increase. Thus, we define the inter-layer edge weights as follows:

w(u_k, u_{k+1}) = log(Gamma_k(u) + e),  k = 0, ..., k* - 1,
w(u_k, u_{k-1}) = 1,  k = 1, ..., k*,    (4)
where Gamma_k(u) denotes how many nodes are structurally similar to u in layer k, measured as the number of edges incident to u in layer k whose weight is larger than the average edge weight of that layer, w_bar_k; more specifically:

Gamma_k(u) = sum_{v in V} 1(w_k(u, v) > w_bar_k).    (5)
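As a sketch of how the layer construction fits together (function names are ours, for illustration), the intra-layer weights of Equation 3, the similar-node count of Equation 5, and the upward inter-layer weight of Equation 4 can all be computed from the structural distances f_k:

```python
import math

def layer_weights(f_k):
    # f_k maps a node pair (u, v) to the structural distance f_k(u, v);
    # edge weights inside layer k follow Equation 3: w = exp(-f_k).
    return {pair: math.exp(-d) for pair, d in f_k.items()}

def gamma(w_k, nodes):
    # Gamma_k(u): the number of edges incident to u in layer k whose
    # weight exceeds the layer's average weight (Equation 5).
    avg = sum(w_k.values()) / len(w_k)
    g = {u: 0 for u in nodes}
    for (u, v), w in w_k.items():
        if w > avg:
            g[u] += 1
            g[v] += 1
    return g

def interlayer_up_weight(gamma_u):
    # Equation 4: w(u_k, u_{k+1}) = log(Gamma_k(u) + e);
    # the downward weight is a constant 1.
    return math.log(gamma_u + math.e)
```

A node with no strong similarities (Gamma_k(u) = 0) gets an upward weight of log(e) = 1, equal to the downward weight, so it is equally likely to move up or down.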
We then adopt a random walk method to obtain the structure-preserving context of each node. At each step, the walk can either move inside one layer or move between layers. We define the layer-change constant q in (0, 1), such that at each step the probability of staying in the current layer is q and the probability of going up or down one layer is 1 - q. Thus, given the current node u in layer k, the normalized probability of moving to a node v in the current layer is:

p_k(u, v) = e^{-f_k(u, v)} / Z_k(u),  Z_k(u) = sum_{v' != u} e^{-f_k(u, v')}.    (6)
The normalized probability of moving up, to the corresponding node u_{k+1} in the layer above, is:

p(u_k, u_{k+1}) = w(u_k, u_{k+1}) / (w(u_k, u_{k+1}) + w(u_k, u_{k-1})),    (7)
and, similarly, the normalized probability of moving down, to the corresponding node u_{k-1} in the layer below, is:

p(u_k, u_{k-1}) = 1 - p(u_k, u_{k+1}).    (8)
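One step of the biased walk described by Equations 6-8 can be sketched as follows (a minimal illustration under our own data layout: intra_prob holds the pre-normalized within-layer probabilities of Equation 6, and w_up/w_down the inter-layer weights of Equation 4):

```python
import random

def walk_step(u, k, intra_prob, w_up, w_down, q=0.7, rng=random):
    # With probability q, stay in layer k and move to a structurally
    # similar node, sampled from the normalized distribution of Eq. 6.
    if rng.random() < q:
        r, acc = rng.random(), 0.0
        for v, p in intra_prob[k][u]:
            acc += p
            if r <= acc:
                return v, k
        return intra_prob[k][u][-1][0], k  # guard against rounding
    # Otherwise change layer, choosing up vs. down with the
    # probabilities of Equations 7-8.
    up = w_up.get((u, k), 0.0)
    down = w_down.get((u, k), 0.0)
    if up + down == 0.0:
        return u, k
    if rng.random() < up / (up + down):
        return u, k + 1
    return u, k - 1
```

Setting q to 1.0 or 0.0 makes the step deterministic, which is convenient for checking the two branches in isolation.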
2.2. Learning a Hyperboloid Model
Finally, we train a hyperboloid model on the generated random walk sequences to obtain structural role preserving embeddings. Hyperbolic space is a homogeneous space with constant negative curvature. It cannot be embedded into Euclidean space without distortion; however, there are several models of hyperbolic space that allow the calculation of gradients. The most commonly used ones are the hyperboloid, the Poincaré ball, and the Poincaré half-space. Unlike previous works that use the Poincaré ball model and approximate gradients, we use the hyperboloid model for network embedding because the gradient computation in this model is exact (Wilson and Leimeister, 2018) and we can adopt a Support Vector Machine (SVM) on it (Cho et al., 2019).
2.2.1. Review of the Hyperboloid Model
The hyperboloid model has many similarities to the sphere model. Analogous to the sphere in ambient Euclidean space, the hyperboloid can be viewed as a "pseudo-sphere" in an ambient space called the Minkowski space. Consider an (n+1)-dimensional space equipped with an inner product of the form:

<x, y>_M = -x_0 y_0 + sum_{i=1}^{n} x_i y_i.    (9)
We use R^{n,1} to denote this Minkowski space. Analogous to the unit sphere in Euclidean space, the hyperboloid can be described by the following equation:

H^n = { x in R^{n,1} : <x, x>_M = -1, x_0 > 0 }.    (10)
For a given point x in H^n, the tangent space at that point is the set of vectors of the form:

T_x H^n = { v in R^{n,1} : <x, v>_M = 0 }.    (11)
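For concreteness, the Minkowski inner product of Equation 9 and the hyperboloid constraint of Equation 10 can be checked numerically (the lift helper is our own illustrative addition: it places an arbitrary point of R^n on H^n by solving the constraint for x_0):

```python
import math

def minkowski_dot(x, y):
    # Equation 9: negate the product of the first (time-like)
    # coordinates, then add the ordinary dot product of the rest.
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(z):
    # Map a point z in R^n onto H^n by solving <x, x>_M = -1
    # for the first coordinate, taking the positive root (x_0 > 0).
    x0 = math.sqrt(1.0 + sum(c * c for c in z))
    return [x0] + list(z)
```

Any lifted point satisfies the constraint of Equation 10 up to floating-point error, which is a convenient invariant to assert during training.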
2.2.2. Gradient calculation on the Hyperboloid
Analogous to the case of the sphere, the calculation of the gradient of a function E defined on H^n has several steps (Wilson and Leimeister, 2018).
First, calculate the gradient of E in the ambient Minkowski space, i.e.,

grad_M E = (-dE/dx_0, dE/dx_1, ..., dE/dx_n).    (12)

Second, project that vector onto the tangent space T_x H^n. Notice that, compared with the spherical case, the sign is flipped in the expression of the projected vector:

proj_x(v) = v + <x, v>_M x.    (13)

Finally, map the gradient vector back onto the hyperboloid. This operation is called the exponential map:

exp_x(v) = cosh(||v||_M) x + sinh(||v||_M) v / ||v||_M,    (14)

where ||v||_M = sqrt(<v, v>_M).
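The three steps above can be sketched in code (our own illustrative helpers, with embeddings as plain coordinate lists); projecting an ambient vector and applying the exponential map keeps the updated point exactly on the hyperboloid:

```python
import math

def minkowski_dot(x, y):
    # Minkowski inner product (Equation 9).
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def project(x, v):
    # Equation 13: project an ambient vector v onto the tangent space
    # at x. Note the flipped sign (+ where the sphere has -).
    c = minkowski_dot(x, v)
    return [vi + c * xi for vi, xi in zip(v, x)]

def exp_map(x, v):
    # Equation 14: move from x along the tangent vector v while
    # staying on the hyperboloid. ||v||_M is real for tangent vectors.
    norm = math.sqrt(max(minkowski_dot(v, v), 0.0))
    if norm < 1e-12:
        return list(x)
    return [math.cosh(norm) * xi + math.sinh(norm) * vi / norm
            for xi, vi in zip(x, v)]
```

The test below verifies the two invariants the derivation relies on: the projected vector is Minkowski-orthogonal to x (Equation 11), and the result of the exponential map satisfies the hyperboloid constraint of Equation 10.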
2.2.3. Hyperboloid Embedding Learning
After generating the random walk sequences, we use a sliding window to scan all the sequences and add the pairs of nodes that appear within the window to a multiset D of positive sample pairs. Note that, unlike in common sampling methods, each pair of nodes u and v can appear multiple times in D. Intuitively, the number of times a pair is sampled indicates the importance of that pair. In prior work, Nickel and Kiela (2017) used the distance between two nodes to define the probability of a link. Similarly, we define the structural role similarity of two nodes through their distance in the embedded hyperbolic space: nodes close to each other share a high similarity and vice versa. We define the structural role distance between nodes u and v as

d_H(x_u, x_v) = arccosh(-<x_u, x_v>_M),    (15)

where x_u is the embedding of node u in the hyperboloid model. The probability of observing node v in the context of node u can then be calculated by the softmax exp(-d_H(x_u, x_v)) / sum_{v' in V} exp(-d_H(x_u, x_{v'})). However, computing the gradient of this probability involves a summation over all the nodes of V and is inefficient for large networks. To address this, we leverage negative sampling, which samples a small number of negative objects to enhance the influence of the positive samples. As a result, our loss function for an embedding {x_v}_{v in V} can be written as follows:

L = - sum_{(u,v) in D} [ log sigma(-d_H(x_u, x_v)) + sum_{v' in N(u)} log sigma(d_H(x_u, x_{v'})) ],    (16)

where sigma is the sigmoid function, N(u) is the negative sampling set, drawn with probability proportional to the occurrence frequency of each node in D, and |N(u)| = m is the number of negative samples. The gradient is calculated following Equations 12, 13, and 14, which enables gradient descent for model learning.
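A minimal sketch of the distance of Equation 15 and the per-pair contribution to the loss of Equation 16 (function names are ours; embeddings are again coordinate lists on the hyperboloid):

```python
import math

def minkowski_dot(x, y):
    # Minkowski inner product (Equation 9).
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def hyperbolic_dist(x, y):
    # Equation 15: d_H(x, y) = arccosh(-<x, y>_M).
    # The max(..., 1.0) clamp guards against floating-point error,
    # since -<x, y>_M >= 1 holds exactly for points on H^n.
    return math.acosh(max(-minkowski_dot(x, y), 1.0))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def pair_loss(x_u, x_v, negatives):
    # One summand of Equation 16 for a positive pair (u, v) and a list
    # of negative-sample embeddings: pull the positive pair together,
    # push the negatives apart.
    loss = -math.log(sigmoid(-hyperbolic_dist(x_u, x_v)))
    for x_neg in negatives:
        loss -= math.log(sigmoid(hyperbolic_dist(x_u, x_neg)))
    return loss
```

For a positive pair embedded at the same point and no negatives, the loss reduces to -log sigma(0) = log 2, the floor of this term.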
3. Experiments
We use the same five datasets used by Ribeiro et al. (2017): one synthetic barbell graph, and four real-world datasets: the Brazilian, American, and European air-traffic networks and the Karate network.
3.1. Model Training
For random walks on the multilayer graph, the layer-change constant q is set to 0.7, and we perform 8 random walks of length 10 from each node in the training set. (Contrast this with classic struc2vec in Euclidean space, which needs 80 random walks per node. Our method reduces the need for random walks, which are computationally expensive, by 90%.) For training the hyperboloid model, we use a sliding window of size 3 to generate positive samples. For the hyperboloid embedding learning, we generate 20 negative samples for each positive one, and train for 5 epochs with a learning rate of 1 and a batch size of 50.
3.2. Barbell Graph
We consider the barbell graph, which consists of two complete subgraphs connected by a long path. Figure 1(a) shows the barbell graph used in the experiment, where structurally equivalent nodes have the same color. The result of RolX is in Figure 1(b): although RolX captures some structural role identity, the blue nodes are spread across three different roles (0, 2, and 5). Also, role 4 contains all the nodes in the path, even though these are not all exactly similar. Figure 1(c) shows the result of node2vec: it does not capture structural role identities, and the nodes of the two complete subgraphs are placed separately, along with the path nodes close to them. Figure 1(e) shows our result on a 2-dimensional Poincaré ball; compared with the struc2vec result in Euclidean space (Figure 1(d)), our method captures structural equivalence more accurately. Moreover, we only perform 8 random walks of length 10 from each node for the hyperboloid model, whereas struc2vec needs 80 random walks per node to generate an accurate result, which also indicates the superiority of hyperbolic space for learning structural role equivalence.
3.3. Karate Network
Zachary's Karate Club (Zachary, 1977) is a network of 34 nodes: each node represents a club member, and an edge denotes that two members have interacted. The network used in the experiment (Figure 2(a)) is composed of two copies of the Karate Club network, where each node has a mirror node and one edge has been added between the mirrored node pair 1 and 37. Figure 2(b) shows the roles identified by RolX: only 7 out of 34 corresponding pairs are placed in the same role. The result of node2vec is shown in Figure 2(c): since this method only captures microscopic structural information, the two parts of the network are placed separately, as only one edge connects them. The corresponding pairs in our result on a 2-dimensional Poincaré ball (Figure 2(e)) are closer together than in the result of struc2vec (Figure 2(d)). Moreover, the embeddings of different roles generated by struc2vec are more likely to bunch together in Euclidean space. In hyperbolic space, however, the embeddings are spread out more, which indicates a better ability to distinguish different roles.
3.4. Node classification
We also test our method on three real-world datasets provided by Ribeiro et al. (2017): the Brazilian, American, and European air-traffic networks. The nodes correspond to airports, and edges indicate the existence of commercial flights. Each airport is assigned one of four possible labels corresponding to its activity level (divided evenly into four quartiles). Thus, each class represents a "role" played by the airport (e.g., major hubs). The task is to predict the role of an airport. We train all three models on each network to obtain embeddings and use 10-fold cross-validation for the evaluation. For our model, we use a hyperbolic SVM (Cho et al., 2019) as the classifier; for the two Euclidean models, struc2vec and node2vec, we use the classic Euclidean SVM. Table 1 shows the node classification results, where our model outperforms the baselines.

Table 1. Node classification results.

Method        Brazilian   American   European
Hyperboloid   0.780       0.670      0.581
Struc2vec     0.732       0.651      0.577
Node2vec      0.267       0.473      0.329
4. Conclusion
In this paper, we present a novel method for embedding the nodes of a network into hyperbolic space that preserves structural role information. To the best of our knowledge, this is the first hyperbolic model that can learn node structural role proximity. Our algorithm outperforms several baselines on a synthetic barbell graph and four real-world datasets, both for embedding visualization and for node classification. The code and data for this paper will be made available upon request.
References

G. Alanis-Lobato, P. Mier, and M. A. Andrade-Navarro (2016). Manifold learning and maximum likelihood estimation for hyperbolic network embedding. Applied Network Science 1(1), pp. 10.
H. Cho, B. DeMeo, J. Peng, and B. Berger (2019). Large-margin classification in hyperbolic space. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1832–1840.
C. De Sa, A. Gu, C. Ré, and F. Sala (2018). Representation tradeoffs for hyperbolic embeddings. Proceedings of Machine Learning Research 80, pp. 4460.
C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec (2018). Learning structural node embeddings via diffusion wavelets. In Proceedings of the 24th KDD, pp. 1320–1329.
A. Grover and J. Leskovec (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd KDD, pp. 855–864.
K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li (2012). RolX: Structural role extraction & mining in large graphs. In Proceedings of the 18th KDD, pp. 1231–1239.
D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá (2010). Hyperbolic geometry of complex networks. Physical Review E 82(3), pp. 036106.
D. McDonald and S. He (2019). HEAT: Hyperbolic embedding of attributed networks. arXiv preprint arXiv:1903.03036.
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality. In NIPS 2013, pp. 3111–3119.
A. Muscoloni, J. M. Thomas, S. Ciucci, G. Bianconi, and C. V. Cannistraci (2017). Machine learning meets complex networks via coalescent embedding in the hyperbolic space. Nature Communications 8(1), pp. 1615.
M. Nickel and D. Kiela (2017). Poincaré embeddings for learning hierarchical representations. In NIPS 2017, pp. 6338–6347.
B. Perozzi, R. Al-Rfou, and S. Skiena (2014). DeepWalk: Online learning of social representations. In Proceedings of the 20th KDD, pp. 701–710.
L. F. R. Ribeiro, P. H. P. Saverese, and D. R. Figueiredo (2017). struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd KDD, pp. 385–394.
S. Salvador and P. Chan (2007). Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis 11(5), pp. 561–580.
X. Wang, Y. Zhang, and C. Shi (2019). Hyperbolic heterogeneous information network embedding. In Proceedings of AAAI 2019.
B. Wilson and M. Leimeister (2018). Gradient descent in hyperbolic space. arXiv preprint arXiv:1805.08207.
W. W. Zachary (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33(4), pp. 452–473.