Learning graph-structured data using Poincaré embeddings and Riemannian K-means algorithms

by Hatem Hajri, et al.

Recent literature has shown several benefits of hyperbolic embedding of graph-structured data (GSD) in representing their structures and latent relations. While several studies have explored the ability of hyperbolic embedding to represent data (for example, by quantifying their mean average precision) and to produce better visualisations of clusters, only a few works have exploited hyperbolic embedding to perform learning on the initial GSD. Motivated by ideas from the fields of brain-computer interfaces and radar processing, this paper presents a new scheme for learning GSD based on hyperbolic embedding, the Riemannian barycentre (i.e. Fréchet or geometric mean) and the K-means algorithms that derive from it. The main idea is as follows. Relying on the Riemannian barycentre, we define a notion of minimal variance that allows us to choose one embedding among several. This embedding is then used together with K-means algorithms to perform unsupervised clustering, and in combination with the nearest-neighbour rule to perform supervised learning. We demonstrate the performance of the proposed framework through several experiments on real-world social networks and hierarchical GSD. The obtained results outperform their counterparts in high-dimensional Euclidean spaces and recently proposed geometric approaches.




1 Introduction

In recent years, the idea of embedding data in new spaces has proven effective in many applications. Indeed, several techniques have become very popular because of their ability to represent data while reducing the complexity and dimensionality of the space. For instance, Word2vec [25] and GloVe [28] are widely used tools in natural language processing, while node2vec [19], Graph2vec [26] and DeepWalk [29] are commonly used for community detection, link prediction and node classification in social networks [15].
A major achievement in recent years has been the introduction of hyperbolic embeddings [27, 13, 31]. Although it had been conjectured for several years that hyperbolic spaces represent GSD better than Euclidean spaces [18, 22, 11, 2], only recently have concrete studies and applications confirmed this [27, 13, 31]. As outlined by [27], Euclidean embeddings require large dimensions to capture certain complex relations such as the WordNet noun hierarchy [16]. By contrast, this complexity can be captured by a simple model of hyperbolic geometry such as the two-dimensional Poincaré disk [31]. Hyperbolic embeddings also provide better visualisations of clusters on graphs than Euclidean embeddings [13].

The present paper is concerned with learning GSD represented by an adjacency matrix. Examples of such GSD include social networks, hierarchical lexical databases such as WordNet and lexical entailment datasets such as HyperLex [27, 13, 31]. In the state of the art, one can distinguish two approaches for clustering this type of data. The first applies clustering techniques directly on graphs, such as spectral clustering [32], power iteration clustering [24] and label propagation [38]. The second is a two-step approach that may be called Euclidean clustering after (Euclidean) embedding: it first embeds the data in a Euclidean space using techniques such as node2vec, Graph2vec and DeepWalk, and then applies traditional clustering techniques such as K-means algorithms. This approach appears notably in [37, 33, 35]. Our main objective throughout the paper is to present the hyperbolic counterpart of this approach.

The motivation for this paper comes from recent works in brain-computer interfaces [10] and radar processing [8]. In particular, [10] used the distance to the Riemannian barycentre on the space of covariance matrices to classify brain-computer interface signals, and [8] used the Riemannian median on the Poincaré disk to detect outliers in radar data based on a similar idea. In this paper, we apply these techniques, for the first time to the best of our knowledge, to hyperbolic embeddings of GSD. Our main contributions can be summarised as follows:

Unsupervised learning on GSD: We consider the task of clustering the nodes of a GSD with a known number of classes, denoted K. For this, we first generate several hyperbolic embeddings on the Poincaré disk following [27]. For each embedding, we run a Riemannian K-means algorithm. Finally, we keep the embedding with minimal total variance, a notion which we introduce. This procedure is evaluated on real-world social networks and compared with its analogue in the Euclidean space of 10 dimensions and with the recent clustering method proposed in [1]. Experiments show that our method outperforms these two approaches.

As another application, we consider the task of clustering a subtree of WordNet, a typical example of hierarchical GSD. We focus on the representation of the clusters and the interpretation of their contents.

Supervised learning on GSD: We adapt the previous idea to the context of supervised learning by keeping the embedding which has minimal total variance on the training dataset. Unlabelled nodes are then classified according to the nearest barycentre rule. This approach is compared with the recent hyperbolic supervised approach of [14], based on support vector machines. Experiments show the advantage of our method over that of [14].

The rest of the paper is organised as follows. Section 2 reviews the necessary optimisation tools on the Poincaré disk as a Riemannian manifold of negative curvature. In particular, we discuss the existence, uniqueness and numerical computation of the Riemannian barycentre and then deduce a Riemannian K-means clustering algorithm on this space. Using barycentres to run K-means algorithms has had several applications, including object detection and tracking, shape classification and image segmentation [34, 23, 20, 17, 9]. In these works, the reference space is the manifold of covariance matrices; here, we adapt this idea to the Poincaré disk. Section 3 reviews the Poincaré embedding as introduced in [27] and presents our approach to learning GSD. Finally, Section 4 provides experimental results and a comparison with the state of the art.¹

¹ The package generating these results will be made public in the near future.

2 Statistical learning on the Poincaré disk

This section reviews the Riemannian geometry of the Poincaré disk $\mathbb{D}$, discusses the existence and uniqueness of the Riemannian barycentre, focuses on its numerical computation and, as a consequence, derives a Riemannian K-means algorithm on this space.

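The Poincaré disk $\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$ is commonly equipped with the Riemannian metric known as the Poincaré metric, expressed as

$$g_z(u, v) = \frac{\langle u, v \rangle}{(1 - |z|^2)^2}, \tag{1}$$

where $z \in \mathbb{D}$, $u, v \in \mathbb{C}$ and $\langle \cdot, \cdot \rangle$ is the scalar product on $\mathbb{R}^2 \cong \mathbb{C}$. The Riemannian metric (1) induces a Riemannian distance between $z, w \in \mathbb{D}$ given as follows:

$$d(z, w) = \operatorname{atanh}\left(\frac{|z - w|}{|1 - \bar{z} w|}\right).$$

This distance can also be expressed as $d(z, w) = \frac{1}{2}\operatorname{arcosh}\left(1 + \frac{2|z - w|^2}{(1 - |z|^2)(1 - |w|^2)}\right)$, which is half the distance used in [27].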
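As a quick numerical check of the two expressions for $d$, the sketch below evaluates both on a pair of points and compares the atanh form with half of the arcosh distance of [27]. The function names are illustrative and are not taken from the authors' package.

```python
import math


def dist_atanh(z: complex, w: complex) -> float:
    """Distance induced by the metric |dz|^2 / (1 - |z|^2)^2 on the unit disk."""
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def dist_arcosh(z: complex, w: complex) -> float:
    """Poincare distance in the normalisation of [27] (curvature -1)."""
    ratio = 2 * abs(z - w) ** 2 / ((1 - abs(z) ** 2) * (1 - abs(w) ** 2))
    return math.acosh(1 + ratio)


z, w = 0.3 + 0.1j, -0.5 + 0.4j
# The two normalisations should differ exactly by a factor of two.
print(dist_atanh(z, w), 0.5 * dist_arcosh(z, w))
```

The agreement follows from the identity $\cosh(2a) = 1 + 2\tanh^2 a / (1 - \tanh^2 a)$ together with $|1 - \bar{z} w|^2 - |z - w|^2 = (1 - |z|^2)(1 - |w|^2)$.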

The Poincaré metric (1) turns $\mathbb{D}$ into a Riemannian manifold of negative sectional curvature [21, 36]. As a result, $\mathbb{D}$ enjoys the property of existence and uniqueness of the Riemannian barycentre [3]. More precisely, for every set of points $\{x_1, \dots, x_n\}$ in $\mathbb{D}$, the empirical Riemannian barycentre

$$\hat{x}_n = \arg\min_{x \in \mathbb{D}} \sum_{i=1}^{n} d^2(x, x_i)$$

exists, is unique and belongs to $\mathbb{D}$. Several stochastic gradient algorithms can be applied to numerically approximate $\hat{x}_n$ [12, 5, 7, 6]. In this paper, we use the algorithm of [8], which has proven effective for radar applications. For this, we first recall the definitions of the Riemannian exponential and logarithmic maps.

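For given $z \in \mathbb{D}$ and $v = |v| e^{i\theta} \in T_z\mathbb{D} \cong \mathbb{C}$, the exponential map is

$$\exp_z(v) = \frac{(z + e^{i\theta})\, e^{2s} + (z - e^{i\theta})}{(1 + \bar{z} e^{i\theta})\, e^{2s} + (1 - \bar{z} e^{i\theta})}, \qquad s = \frac{|v|}{1 - |z|^2}.$$

The exponential map is a diffeomorphism from $T_z\mathbb{D}$ to $\mathbb{D}$. Its inverse, called the logarithmic map and denoted by $\log_z$, is given for $y \neq z$ by

$$\log_z(y) = (1 - |z|^2)\, \operatorname{atanh}(|w|)\, \frac{w}{|w|},$$

with $w = \frac{y - z}{1 - \bar{z} y}$ (and $\log_z(z) = 0$), where atanh is the inverse of the hyperbolic tangent. For more details on the previous formulas, we refer to [4, 36]. For the numerical computation of the barycentre, we use Algorithm 1 below, from [8].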
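The following sketch implements the exponential and logarithmic maps for the metric (1), with points and tangent vectors stored as Python complex numbers. It writes $\exp_z$ through the Möbius transformation sending 0 to $z$, which is algebraically equivalent to the closed form above (multiply numerator and denominator by $e^{2s} + 1$, using $\tanh s$ as the radius). The names `exp_map` and `log_map` are hypothetical.

```python
import math


def exp_map(z: complex, v: complex) -> complex:
    """Riemannian exponential exp_z(v) on the disk with metric |dz|^2 / (1 - |z|^2)^2."""
    r = abs(v)
    if r == 0.0:
        return z
    u = v / r                      # unit direction e^{i theta}
    s = r / (1 - abs(z) ** 2)      # Riemannian length of v
    t = math.tanh(s)               # end point of the geodesic from 0, before moving it to z
    return (z + t * u) / (1 + z.conjugate() * t * u)


def log_map(z: complex, y: complex) -> complex:
    """Riemannian logarithm log_z(y), the inverse of exp_map(z, .)."""
    w = (y - z) / (1 - z.conjugate() * y)   # Mobius transform sending z to 0
    r = abs(w)
    if r == 0.0:
        return 0j
    return (1 - abs(z) ** 2) * math.atanh(r) * w / r


z, y = 0.2 - 0.3j, -0.6 + 0.1j
v = log_map(z, y)
print(v, exp_map(z, v))  # the second value should recover y up to rounding
```

A useful consistency property is that the Riemannian norm $|\log_z(y)| / (1 - |z|^2)$ equals $d(z, y)$.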

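Inputs: $\{x_1, \dots, x_n\}$: a subset of $\mathbb{D}$ (complex), $b_0$: barycentre initialisation (complex), $\eta$: step size (strictly positive float)
      Output: $b$: numerical approximation of the Riemannian barycentre (complex)

1: $b \leftarrow b_0$     ▷ random or a trickier initialisation (e.g. $x_1$, average mean)
2: $\varepsilon \leftarrow$ a significantly large number
3: repeat
4:     $b_{\text{new}} \leftarrow \exp_b\left(\frac{\eta}{n} \sum_{i=1}^{n} \log_b(x_i)\right)$, $\varepsilon \leftarrow d(b, b_{\text{new}})$, $b \leftarrow b_{\text{new}}$
5: until $\varepsilon$ is small
Algorithm 1 SGD on $\mathbb{D}$ for barycentre computation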
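A minimal Python sketch of Algorithm 1 follows. Each iteration moves $b$ along the averaged logarithmic map, which is a Riemannian gradient step on the Fréchet function $\frac{1}{2n}\sum_i d^2(b, x_i)$; this is a deterministic full-gradient variant, whereas a stochastic version would use one randomly drawn $x_i$ per step. The step size 0.5, the tolerance, the iteration cap and the function names are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def exp_map(z: complex, v: complex) -> complex:
    r = abs(v)
    if r == 0.0:
        return z
    t = math.tanh(r / (1 - abs(z) ** 2))
    u = v / r
    return (z + t * u) / (1 + z.conjugate() * t * u)


def log_map(z: complex, y: complex) -> complex:
    w = (y - z) / (1 - z.conjugate() * y)
    r = abs(w)
    return 0j if r == 0.0 else (1 - abs(z) ** 2) * math.atanh(r) * w / r


def dist(z: complex, w: complex) -> float:
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def barycentre(points: Sequence[complex], b0: Optional[complex] = None,
               step: float = 0.5, tol: float = 1e-10, max_iter: int = 10_000) -> complex:
    """Riemannian barycentre of points in the Poincare disk by gradient descent."""
    b = points[0] if b0 is None else b0        # simple initialisation: the first point
    for _ in range(max_iter):
        # Minus the Riemannian gradient of (1/2n) sum d^2(b, x_i) is the mean log map.
        direction = sum(log_map(b, x) for x in points) / len(points)
        b_new = exp_map(b, step * direction)
        eps = dist(b, b_new)                   # how far the iterate moved
        b = b_new
        if eps < tol:
            break
    return b
```

At the returned point the averaged logarithmic map is (numerically) zero, which is the first-order optimality condition of the barycentre; a step size well below 1 keeps the iteration stable when the points are spread out.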

A direct consequence of the existence and uniqueness of the Riemannian barycentre is the K-means algorithm described in Algorithm 2. In the description, we use the word centroid to denote the Riemannian barycentre.

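Inputs   : $K$: number of clusters (integer), $X = \{x_1, \dots, x_n\}$: set of complex numbers forming a subset of $\mathbb{D}$ (complex), $\eta$: barycentre approximation step size (strictly positive float)
      Output: $\{c_1, \dots, c_K\}$: set of centroids (complex), $L$: labels of the input data (table of integers)

1: Initialise centroids $c_1, \dots, c_K$ randomly in $X$
2: repeat
3:     for $i = 1, \dots, n$ do
4:         $L[i] \leftarrow \arg\min_{1 \le j \le K} d(x_i, c_j)$     ▷ $d$ is the Riemannian distance
5:     end for
6:     for $j = 1, \dots, K$ do
7:         $c_j \leftarrow$ barycentre of $\{x_i : L[i] = j\}$ computed by Algorithm 1 with step size $\eta$
8:     end for
9: until convergence; return $\{c_1, \dots, c_K\}$, $L$
Algorithm 2 K-means clustering on $\mathbb{D}$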
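A compact Lloyd-style sketch of Algorithm 2 is given below, reusing the exponential and logarithmic maps and the gradient-descent barycentre. The optional `init` argument, the rule of keeping a centroid unchanged when its cluster empties, and the convergence test (assignments no longer change) are assumptions made for illustration; all names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def exp_map(z: complex, v: complex) -> complex:
    r = abs(v)
    if r == 0.0:
        return z
    t = math.tanh(r / (1 - abs(z) ** 2))
    u = v / r
    return (z + t * u) / (1 + z.conjugate() * t * u)


def log_map(z: complex, y: complex) -> complex:
    w = (y - z) / (1 - z.conjugate() * y)
    r = abs(w)
    return 0j if r == 0.0 else (1 - abs(z) ** 2) * math.atanh(r) * w / r


def dist(z: complex, w: complex) -> float:
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def barycentre(points: Sequence[complex], b0: complex, step: float = 0.5,
               tol: float = 1e-10, max_iter: int = 10_000) -> complex:
    """Gradient-descent approximation of the Riemannian barycentre (cf. Algorithm 1)."""
    b = b0
    for _ in range(max_iter):
        b_new = exp_map(b, step * sum(log_map(b, x) for x in points) / len(points))
        moved, b = dist(b, b_new), b_new
        if moved < tol:
            break
    return b


def riemannian_kmeans(points: Sequence[complex], k: int, step: float = 0.5,
                      init: Optional[Sequence[complex]] = None, max_iter: int = 100,
                      seed: int = 0) -> Tuple[List[complex], List[int]]:
    """K-means on the Poincare disk: nearest-centroid assignment, barycentre update."""
    rng = random.Random(seed)
    # Initialise with K distinct data points unless explicit centroids are given.
    centroids = list(init) if init is not None else rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step with the Riemannian distance.
        new_labels = [min(range(k), key=lambda j: dist(x, centroids[j])) for x in points]
        if new_labels == labels:
            break                                   # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the barycentre of its cluster.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:                             # keep the old centroid if a cluster empties
                centroids[j] = barycentre(members, b0=centroids[j], step=step)
    return centroids, labels
```

Warm-starting each barycentre computation at the previous centroid usually keeps the inner loop short once the assignments settle.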

3 Learning GSD using Poincaré embeddings and Riemannian optimisation

This section starts by reviewing the approach of [27] to embed GSD in the Poincaré disk. Based on this embedding and the K-means algorithm introduced in the previous section, we present our algorithms for the unsupervised and supervised clustering tasks.
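Given a GSD $G = (V, E)$, where $V$ and $E \subset V \times V$ are the sets of nodes and edges, the probability of an edge $(u, v)$ is modelled in [27] using the Fermi-Dirac distribution

$$P\big((u, v) \in E\big) = \frac{1}{e^{(d(u, v) - r)/t} + 1},$$

where $r, t > 0$ are hyperparameters and $d(u, v)$ denotes the Riemannian distance between the embeddings of $u$ and $v$. To embed $G$ in $\mathbb{D}$, [27] learns a map $\varphi : V \to \mathbb{D}$ by minimising

$$\mathcal{L}(\varphi) = -\sum_{u \in V} \sum_{v \in N(u)} \log P\big((u, v) \in E\big), \tag{3}$$

where $N(u)$ is the set of neighbours of $u$. Following [25], (3) is optimised by selecting a small number of negative samples $N^{-}(u)$ according to a prior distribution $P_n$. Taking this into account, minimising (3) amounts to maximising

$$\mathcal{O}(\varphi) = \sum_{u \in V} \sum_{v \in N(u)} \Big[ \log \sigma\Big(\frac{r - d(u, v)}{t}\Big) + \sum_{v' \in N^{-}(u)} \log \sigma\Big(\frac{d(u, v') - r}{t}\Big) \Big], \tag{4}$$

with $\sigma(x) = 1/(1 + e^{-x})$ the logistic sigmoid function.² In practice, (4) is optimised by generating nodes on the graph using DeepWalk [29] and then sampling negative samples from these nodes using the unigram distribution raised to the power 3/4 [25].

² Since the Riemannian gradient of $x \mapsto d(x, y)$ is given by $-\log_x(y)/d(x, y)$ for $x \neq y$ (see [8]), we actually used $d^2$ instead of $d$ to avoid division by $d(x, y)$ and get better stability. We also note that [27] proposes to maximise (4) for social network GSD and to maximise

$$\sum_{(u, v) \in E} \log \frac{e^{-d(u, v)}}{\sum_{v' \in N^{-}(u) \cup \{v\}} e^{-d(u, v')}}$$

for hierarchical GSD. We found that with the latter loss function the embedded nodes quickly approach the boundary of the disk, and that using (4) also for hierarchical GSD gives more stable results.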
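For concreteness, the sketch below evaluates the contribution of one positive pair $(u, v)$ and its negative samples to objective (4), and builds a negative sampler from the unigram distribution of walk nodes raised to the power 3/4. The defaults $r = t = 1$, the `squared` switch (replacing $d$ by $d^2$ as described in the footnote) and all function names are assumptions for illustration; the gradient-based optimisation of the embedding itself is not shown.

```python
import math
import random
from collections import Counter
from collections.abc import Callable, Hashable, Sequence


def dist(z: complex, w: complex) -> float:
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pair_objective(phi: dict[Hashable, complex], u: Hashable, v: Hashable,
                   negatives: Sequence[Hashable], r: float = 1.0, t: float = 1.0,
                   squared: bool = False) -> float:
    """Term of objective (4) for the positive pair (u, v) and the negatives of u."""
    def d(a: Hashable, b: Hashable) -> float:
        value = dist(phi[a], phi[b])
        return value ** 2 if squared else value

    positive = log_sigmoid((r - d(u, v)) / t)                          # log P((u, v) in E)
    negative = sum(log_sigmoid((d(u, w) - r) / t) for w in negatives)  # log(1 - P((u, w) in E))
    return positive + negative


def unigram_sampler(walk_nodes: Sequence[Hashable], power: float = 0.75,
                    seed: int = 0) -> Callable[[int], list[Hashable]]:
    """Negative sampler drawing nodes with probability proportional to count ** power."""
    counts = Counter(walk_nodes)
    nodes = list(counts)
    weights = [counts[n] ** power for n in nodes]
    rng = random.Random(seed)
    return lambda m: rng.choices(nodes, weights=weights, k=m)


phi = {"a": 0.1 + 0j, "b": 0.15 + 0.05j, "c": -0.8 + 0j}
sample = unigram_sampler(["a", "b", "b", "c", "c", "c"])
print(pair_objective(phi, "a", "b", sample(2)))
```

Raising the counts to 3/4 flattens the distribution, so rare nodes are drawn as negatives more often than their raw frequency would suggest.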

Given a cluster $C = \{x_1, \dots, x_m\}$ in $\mathbb{D}$ with barycentre $b$, we define its variance as

$$\sigma^2(C) = \frac{1}{m} \sum_{i=1}^{m} d^2(b, x_i).$$

This definition is in accordance with the use of $d^2$ mentioned in the footnote. The variance will be used to choose one embedding among several. It is reasonable to place more confidence in the embedding for which the variance is minimal, that is, whose points are more concentrated around the barycentres. This idea will be justified empirically in the next section.
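The variance definition translates directly into code. In the sketch below, both toy clusters are symmetric under $z \mapsto -z$, so their barycentre is 0; the function name is hypothetical.

```python
import math
from typing import Sequence


def dist(z: complex, w: complex) -> float:
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def cluster_variance(cluster: Sequence[complex], b: complex) -> float:
    """Mean squared Riemannian distance of the cluster points to its barycentre b."""
    return sum(dist(b, x) ** 2 for x in cluster) / len(cluster)


tight = [0.05 + 0j, -0.05 + 0j]
loose = [0.6 + 0j, -0.6 + 0j]
print(cluster_variance(tight, 0j), cluster_variance(loose, 0j))
```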

Finally, based on the Riemannian barycentre, Algorithm 3 below presents our scheme to perform unsupervised clustering. The main idea is to embed the GSD in the Poincaré disk several times, run the Riemannian K-means algorithm on each embedding and keep the embedding with minimal total variance.

In what follows, the Poincare_Embedding function, given the adjacency matrix of an input GSD, maximises (4) and outputs the embedding of every node in $\mathbb{D}$.

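Inputs: $A$: adjacency matrix of a GSD with $n \times n$ entries, $P_{\text{emb}}$: an object containing the embedding parameters, $K$: number of classes, $P_{K\text{-means}}$: an object containing the K-means algorithm parameters, $NE$: number of experiments
Outputs: $E^{*}$: embedding of the input graph with minimal total variance, $C^{*}$: barycentres of each cluster, $L^{*}$: cluster labels for each node

1: repeat     ▷ Experimentally, each execution is run in parallel
2:     $E \leftarrow$ Poincare_Embedding($A$, $P_{\text{emb}}$)
3:     $C, L \leftarrow$ K-means($E$, $K$, $P_{K\text{-means}}$)     ▷ Algorithm 2
4:     $V \leftarrow$ total variance of the clusters of $E$ defined by $C$ and $L$
5:     store $(E, C, L, V)$
6: until $NE$ embeddings have been computed; return the stored $(E^{*}, C^{*}, L^{*})$ with minimal $V$
Algorithm 3 Unsupervised clustering algorithm.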
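The selection loop of Algorithm 3 can be sketched as follows, with the embedding and K-means steps abstracted into a caller-supplied function `run_experiment(seed)` that returns an embedding, its centroids and its labels. This interface is hypothetical, not the authors' package. We assume here that the total variance is the sum of the per-cluster variances, and runs are executed sequentially, whereas the paper runs them in parallel.

```python
import math
from collections.abc import Callable, Sequence

Run = tuple[Sequence[complex], Sequence[complex], Sequence[int]]  # (embedding, centroids, labels)


def dist(z: complex, w: complex) -> float:
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def total_variance(embedding: Sequence[complex], centroids: Sequence[complex],
                   labels: Sequence[int]) -> float:
    """Sum over clusters of the mean squared distance to the centroid (assumed definition)."""
    total = 0.0
    for j, c in enumerate(centroids):
        members = [x for x, lab in zip(embedding, labels) if lab == j]
        if members:
            total += sum(dist(c, x) ** 2 for x in members) / len(members)
    return total


def select_min_variance(run_experiment: Callable[[int], Run], n_experiments: int) -> Run:
    """Run NE experiments and keep the one whose clustering has minimal total variance."""
    runs = [run_experiment(seed) for seed in range(n_experiments)]
    return min(runs, key=lambda run: total_variance(*run))


def toy_run(seed: int) -> Run:
    # Stand-in for Poincare_Embedding followed by K-means: seed 1 gives the tightest cluster.
    spread = 0.02 if seed == 1 else 0.3
    return [spread + 0j, -spread + 0j], [0j], [0, 0]


best_embedding, best_centroids, best_labels = select_min_variance(toy_run, 3)
print(best_embedding)
```

Because the criterion uses only the embeddings and the K-means output, the selection needs no ground truth.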

Algorithm 4 below presents our scheme to perform supervised clustering. Implementation details and experimental results for Algorithms 3 and 4 are provided and discussed in the next section.

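Inputs: $A$: adjacency matrix of a GSD with $n \times n$ entries, $P_{\text{emb}}$: an object containing the embedding parameters (object), $GT$: the ground truth of each node in the GSD (table of integers), $\eta$: barycentre approximation step size (strictly positive float)
Output: $L$: computed cluster of each node used for testing (integer)

1: $E \leftarrow$ Poincare_Embedding($A$, $P_{\text{emb}}$)
2: $E_{\text{train}}, E_{\text{test}} \leftarrow$ split($E$)     ▷ Embedded nodes are divided into two parts
3: $\mathcal{C}_1, \dots, \mathcal{C}_K \leftarrow$ Compute_Clusters($E_{\text{train}}$, $GT$)     ▷ one cluster per ground-truth class
4: for $j = 1, \dots, K$ do
5:     $c_j \leftarrow$ barycentre of $\mathcal{C}_j$ computed by Algorithm 1 with step size $\eta$
6: end for
7: for each $x \in E_{\text{test}}$ do
8:     $L[x] \leftarrow \arg\min_{1 \le j \le K} d(x, c_j)$     ▷ $d$ is the Riemannian distance
9: end for
10: return $L$     ▷ Performances are then obtained by comparing the ground truth of the test nodes with their computed clusters $L$.
Algorithm 4 Supervised clustering algorithm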
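The core of Algorithm 4, one barycentre per ground-truth class followed by the nearest-barycentre rule, can be sketched as below. The functions `fit_barycentres` and `predict` are hypothetical names, and the barycentre routine is the gradient-descent sketch assumed for Algorithm 1.

```python
import math
from typing import Optional, Sequence


def exp_map(z: complex, v: complex) -> complex:
    r = abs(v)
    if r == 0.0:
        return z
    t = math.tanh(r / (1 - abs(z) ** 2))
    u = v / r
    return (z + t * u) / (1 + z.conjugate() * t * u)


def log_map(z: complex, y: complex) -> complex:
    w = (y - z) / (1 - z.conjugate() * y)
    r = abs(w)
    return 0j if r == 0.0 else (1 - abs(z) ** 2) * math.atanh(r) * w / r


def dist(z: complex, w: complex) -> float:
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def barycentre(points: Sequence[complex], b0: Optional[complex] = None, step: float = 0.5,
               tol: float = 1e-10, max_iter: int = 10_000) -> complex:
    b = points[0] if b0 is None else b0
    for _ in range(max_iter):
        b_new = exp_map(b, step * sum(log_map(b, x) for x in points) / len(points))
        moved, b = dist(b, b_new), b_new
        if moved < tol:
            break
    return b


def fit_barycentres(train_points: Sequence[complex], train_labels: Sequence[int],
                    step: float = 0.5) -> dict[int, complex]:
    """One barycentre per ground-truth class of the training nodes."""
    classes = sorted(set(train_labels))
    return {c: barycentre([x for x, lab in zip(train_points, train_labels) if lab == c], step=step)
            for c in classes}


def predict(centroids: dict[int, complex], test_points: Sequence[complex]) -> list[int]:
    """Nearest-barycentre rule for the Riemannian distance."""
    return [min(centroids, key=lambda c: dist(x, centroids[c])) for x in test_points]
```

The classifier needs only $K$ barycentres, so prediction costs $K$ distance evaluations per test node.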

4 Experimental results

The algorithms from the previous sections are implemented as a package that, given the adjacency matrix of a binary graph as input, performs the embedding in the Poincaré disk, applies Riemannian K-means clustering and provides visualisations of the computed clusters.

The package is implemented in Python and uses multiprocessing to run several experiments in parallel. All computations are performed on a four-core machine equipped with an Intel Core i5 processor running at 2.71 GHz. The threshold is set to for all experiments. The datasets used in the paper are given in Table 1 with their number of nodes (Nodes), number of edges (Edges) and number of classes (Classes) [30].

Dataset            Nodes    Edges    Classes
Karate                34       77          2
Polblogs            1224    16781          2
Polbooks             105      441          3
Football             115      613         12
Adjnoun              112      425          2
Mammals subtree     1179     6541         NA
Table 1: Datasets used in the paper and their characteristics.

4.1 Unsupervised clustering

Social Networks. In this part, we apply the previous algorithms to the social network datasets presented above.

Comparison criterion. For each dataset, 10 experiments are performed. Each experiment is conducted in two steps. In the first step, we generate a Poincaré embedding and a Euclidean embedding, the latter computed with DeepWalk [29]. An intermediate step of DeepWalk is the generation of random walks that capture the structure of the graph; the same random walks are used for both embeddings. In the second step, we apply the Riemannian (resp. Euclidean) K-means algorithm to the embedding. For the Euclidean embedding, we set the space dimension to 10 and use the Euclidean distance and barycentre. For each dataset, we select as the best embedding (in both the Riemannian and the Euclidean case) the one with minimal total variance, as in Algorithm 3. Finally, for each dataset, we compute the average performance of the embeddings (Riemannian and Euclidean). The results are presented in Table 2 with the following abbreviations:

PBPE: Performance of the best Poincaré embedding

PBEE: Performance of the best Euclidean embedding

APPE: Average performance of the Poincaré embeddings

APEE: Average performance of the Euclidean embeddings

Dataset     PBPE     PBEE     APPE     APEE     [1]
Karate      91.2%    70.6%    91.4%    65.8%    -
Polblogs    92.8%    51.9%    92.5%    53.5%    -
Polbooks    84.8%    77.1%    80%      62%      75%
Football    87%      67.8%    69.4%    56.8%    77%
Adjnoun     51.8%    51.8%    52.5%    51.6%    51%
Table 2: Comparative performance of Poincaré K-means clustering and Euclidean K-means clustering for different examples.

In addition to Table 2, Figure 1 provides visualisations of the clusters computed with the best Poincaré embedding for each dataset. Each cluster is represented with a different colour and its barycentre by a square symbol.

Performance increase with the number of experiments.

To justify our choice of the best embedding as the one with minimal total variance, we plot, for the Football dataset, the evolution of the PBPE with respect to the number of experiments NE. The resulting plot (Figure 2) shows that the PBPE increases as NE grows from 1 to 10.

(a) Karate.
(b) Polblogs.
(c) Polbooks.
(d) Football.
(e) Adjnoun.
Figure 1: Visualisation of the computed clusters for the experiments having the best embedding (according to the defined variance criterion), the barycentres are represented by square shapes.

Hierarchical GSD. In this part, we are interested in applications to hierarchical GSD. We consider an example from WordNet, the mammals subtree. Figure 3 illustrates the clusters obtained with K = 6. The barycentres are represented by squares. Figure 4 then explicitly shows some randomly chosen node labels, focusing on nodes near the barycentres on the one hand and at the boundaries between distinct clusters on the other.

Figure 2: PBPE performance increase with the number of experiments NE for the Football dataset.
Figure 3: Partitioning the mammals subtree into six clusters with the K-means algorithm.
Figure 4: Close-up of the mammals subtree labels: at the boundaries of distinct clusters and inside the clusters, in a small neighbourhood of the barycentre (the latter is represented by a square).

Notice that the obtained clusters distinguish between different types of mammals. For example, the blue cluster contains mostly canine mammals, while the orange one contains mostly larger mammals (lion, tigress and so on).

4.2 Supervised learning

In this section, we exploit Riemannian barycentres to perform supervised clustering. Each dataset from the previous list is divided into five parts of (almost) equal size. The ground truth of four parts is used to define the clusters and the remaining part serves for testing, in a cross-validation fashion. Following Algorithm 4, each element of the test set is assigned to one cluster according to the nearest barycentre rule. This experiment is performed 10 times. Finally, the performance of this method is evaluated using the ground truth of the test data. Table 3 presents the obtained results with the following abbreviation:
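A sketch of the five-fold protocol is given below: node indices are shuffled and split into five parts of (almost) equal size, each part serving once as the test set. The shuffling seed, the stride-based split and the accuracy definition (fraction of correctly assigned test nodes) are assumptions, since the text does not specify them; the predicted labels would come from the nearest barycentre rule of Algorithm 4.

```python
import random
from collections.abc import Iterator, Sequence


def kfold_splits(n_nodes: int, n_folds: int = 5,
                 seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_nodes))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]  # sizes differ by at most one
    for i in range(n_folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of test nodes whose computed cluster matches the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


splits = list(kfold_splits(34))  # e.g. the 34 nodes of Karate
print([len(test) for _, test in splits])  # [7, 7, 7, 7, 6]
```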

CVMAP: Average cross-validation performance over the performed experiments.

Dataset     CVMAP    [14]
Karate      93.9%    86%
Polblogs    92.4%    93%
Polbooks    83.3%    73%
Football    77.9%    24%
Adjnoun     57.8%    -
Table 3: Cross-validation performances of the supervised clustering algorithm.

4.3 Discussion and comparison with the state of the art

First, we point out that, using the notion of minimal variance, it is always possible to generate more embeddings than we did above (10 embeddings), and it is also possible to increase the dimension of the Poincaré ball as in [27]. This selection procedure exploits the geometry of hyperbolic manifolds and is completely unsupervised, in the sense that it does not require any ground truth. Thus, the results given above may still be improved in both the supervised and unsupervised settings.

Unsupervised clustering. In this paragraph, we comment on the results of Table 2. First, Poincaré clustering clearly outperforms Euclidean clustering, which supports the view that hyperbolic spaces represent GSD more suitably than Euclidean spaces. Regarding Poincaré clustering, notice that the best embedding as defined above is not necessarily the one with the highest clustering performance, since in two cases (Karate and Adjnoun) PBPE < APPE. However, in both cases the gaps are slight and do not exceed 0.7%. The advantage of PBPE is more visible for the Football dataset, where it exceeds APPE by more than 17%, and for the Polbooks dataset, where it exceeds APPE by 4.8%. Our results outperform those of [1], whose authors proposed an embedding on generalised surfaces and tested it on the Polbooks, Football and Adjnoun datasets. They obtained approximate success rates of 75%, 77% and 51% respectively. The improvements obtained with our proposed scheme are 9.8% and 10% for Polbooks and Football, and 0.8% for Adjnoun.

Supervised clustering. In this paragraph, we compare our results with those of [14], in which the authors used a generalisation of the SVM method to the Poincaré disk. [14] considered the Karate, Polbooks, Football and Polblogs datasets and reported approximate mean success rates of 86%, 73%, 24% and 93% respectively, averaged over cross-validation trials and different embeddings obtained with the method of [13]. We obtained significant improvements for Karate, Polbooks and Football, of 7.9%, 10.3% and 53.9% respectively, and a slight gap of 0.6% for Polblogs.
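The differences quoted in the two paragraphs above are percentage-point gaps between our results (PBPE in Table 2, CVMAP in Table 3) and the values reported by [1] and [14]. The short script below recomputes them from the tables.

```python
# Values from Tables 2 and 3 (percent).
ours_unsup = {"Polbooks": 84.8, "Football": 87.0, "Adjnoun": 51.8}                  # PBPE
ref_1 = {"Polbooks": 75.0, "Football": 77.0, "Adjnoun": 51.0}                       # [1]
ours_sup = {"Karate": 93.9, "Polblogs": 92.4, "Polbooks": 83.3, "Football": 77.9}  # CVMAP
ref_14 = {"Karate": 86.0, "Polblogs": 93.0, "Polbooks": 73.0, "Football": 24.0}    # [14]


def differences(ours: dict[str, float], ref: dict[str, float]) -> dict[str, float]:
    """Percentage-point difference (ours minus reference), rounded to one decimal."""
    return {name: round(ours[name] - ref[name], 1) for name in ref}


print(differences(ours_unsup, ref_1))  # {'Polbooks': 9.8, 'Football': 10.0, 'Adjnoun': 0.8}
print(differences(ours_sup, ref_14))   # {'Karate': 7.9, 'Polblogs': -0.6, 'Polbooks': 10.3, 'Football': 53.9}
```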

4.4 Multidimensional embedding and clustering

In a multidimensional setting, we considered K-means clustering after Poincaré embedding in the hyperbolic space given by the product of m Poincaré disks, equipped with the product metric. The current publicly available Python implementation provides this multidimensional setting. However, we did not observe a significant increase in performance for the values of m tested on the datasets used in this paper. This may be explained by the fact that these datasets are not very large. In future work, we aim to experiment with larger datasets while increasing the dimension of the hyperbolic space, in order to better study the scalability of our approach.
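For the product of $m$ Poincaré disks with the product metric, the Riemannian distance is the Euclidean norm of the vector of per-disk distances, and the Fréchet function splits into one term per factor, so barycentres can be computed disk by disk. A minimal sketch of the product distance follows, with points represented as tuples of complex numbers (an assumed representation; `product_dist` is a hypothetical name).

```python
import math
from collections.abc import Sequence


def dist(z: complex, w: complex) -> float:
    return math.atanh(abs(z - w) / abs(1 - z.conjugate() * w))


def product_dist(p: Sequence[complex], q: Sequence[complex]) -> float:
    """Distance on the product of len(p) Poincare disks with the product metric."""
    if len(p) != len(q):
        raise ValueError("points must live in the same number of disks")
    return math.sqrt(sum(dist(a, b) ** 2 for a, b in zip(p, q)))


p = (0.1 + 0.2j, -0.3 + 0j, 0.5j)
q = (0.0 + 0j, 0.2 - 0.1j, 0.4j)
print(product_dist(p, q))
```

With this distance, the K-means assignment step is unchanged, and the update step applies the single-disk barycentre routine to each coordinate separately.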

5 Conclusion

Several recent studies [27, 13, 14, 31] have concluded that hyperbolic spaces, even in small dimensions, are more suitable than Euclidean spaces for embedding GSD. In this paper, we used Poincaré embeddings [27] to propose a new method for clustering GSD. This method is based on Riemannian K-means algorithms and a notion of minimal variance that allows us to choose one embedding among several.

The proposed method has been tested on several datasets and has improved on the state of the art for both supervised and unsupervised clustering tasks. In particular, these performances were achieved at minimal cost:

  • We used the Poincaré disk, of only two dimensions.

  • We obtained visualisations that give a high-level representation of clusters on graphs.

Our results on clustering with the Euclidean DeepWalk embedding suggest that obtaining good performance with this approach, or with variants such as Graph2vec [26] and node2vec [19], would require very high-dimensional Euclidean representation spaces.

Acknowledgement. The first author is grateful to Jeanine Harb for drawing his attention to the paper [27] during the machine learning seminar at IRT SystemX.