1 Introduction
In recent years, the idea of embedding data in new spaces has proven effective in many applications. Indeed, several techniques have become very popular because of their ability to represent data while reducing the complexity and dimensionality of the space. For instance, word2vec [25] and GloVe [28] are widely used tools in natural language processing, while node2vec [19], graph2vec [26] and DeepWalk [29] are commonly used for community detection, link prediction and node classification in social networks [15]. A major achievement in recent years has been the discovery of hyperbolic embeddings [27, 13, 31]. Although it had been speculated for several years that hyperbolic spaces would represent graph-structured data (GSD) better than Euclidean spaces [18, 22, 11, 2], it is only recently that these speculations were confirmed through concrete studies and applications [27, 13, 31]. As outlined in [27], Euclidean embeddings require large dimensions to capture certain complex relations such as the WordNet noun hierarchy [16]. On the other hand, this complexity can be captured by a simple model of hyperbolic geometry such as the two-dimensional Poincaré disk [31]. Hyperbolic embeddings also provide better visualisation of clusters on graphs than Euclidean embeddings [13].
The present paper is concerned with learning GSD represented by an adjacency matrix. Examples of such GSD include social networks, hierarchical lexical databases such as WordNet and lexical entailment datasets such as Hyperlex [27, 13, 31]. In the state of the art, one can distinguish two approaches for clustering this type of data. The first applies pure graph-clustering techniques such as spectral clustering [32], power iteration clustering [24] and label propagation [38]. The second is two-step and may be called Euclidean clustering after (Euclidean) embedding: it first embeds the data in a Euclidean space using techniques such as node2vec, graph2vec and DeepWalk, and then applies traditional clustering techniques such as K-means. This approach appeared notably in [37, 33, 35]. Our main objective throughout the paper is to present the hyperbolic counterpart of this approach. The motivation for this paper comes from recent works in brain-computer interfaces [10] and radar processing [8]. In particular, [10] used the distance to the Riemannian barycentre on the space of covariance matrices to classify brain-computer signals, and [8] used the Riemannian median on the Poincaré disk to detect outliers in radar data based on a similar idea. In this paper, we apply for the first time (to the best of our knowledge) these techniques in conjunction with hyperbolic embeddings of GSD. Our main contributions can be summarised as follows:
Unsupervised learning on GSD: We consider the task of clustering the nodes of a GSD with a known number of classes K. For this, we first generate several hyperbolic embeddings on the Poincaré disk following [27]. For each embedding, we run a Riemannian K-means algorithm. Finally, we keep the embedding with minimal total variance, a notion which we introduce. This procedure is evaluated on real-world social networks and compared with its analogue in a Euclidean space of dimension 10 and with the recent clustering method proposed in [1]. Experiments show that our method outperforms these two approaches.
As another application, we consider the task of clustering a typical example of hierarchical GSD, namely a subtree of WordNet. We focus on the representation of the clusters and the interpretation of their contents.
Supervised learning on GSD: We adapt the previous idea to the context of supervised learning by keeping the embedding which has minimal total variance on the training dataset. Unlabelled nodes are then classified according to the nearest-barycentre rule. This approach is compared with the recent hyperbolic supervised approach of [14] based on support vector machines. Experiments show the advantage of our method over that of [14].
The rest of the paper is organised as follows. Section 2 reviews the necessary optimisation tools on the Poincaré disk as a Riemannian manifold of negative curvature. In particular, we discuss existence, uniqueness and numerical computation of the Riemannian barycentre, and then deduce a Riemannian K-means clustering algorithm on this space. Using barycentres to run K-means algorithms has had several applications, including object detection and tracking, shape classification and image segmentation [34, 23, 20, 17, 9]. In these works, the reference space is the manifold of covariance matrices; here, we adapt this idea to the Poincaré space. Section 3 reviews the Poincaré embedding as introduced in [27] and presents our approach to learning GSD. Finally, Section 4 provides experimental results and a comparison with the state of the art. (The package generating these results will be made public in the near future.)
2 Statistical learning on the Poincaré disk
This section reviews the Riemannian geometry of the Poincaré disk D, discusses the existence and uniqueness of the Riemannian barycentre, focuses on its numerical computation and, as a consequence, derives a Riemannian K-means algorithm on this space.
The Poincaré disk D = {x ∈ ℝ² : ‖x‖ < 1} is commonly equipped with the Riemannian metric known as the Poincaré metric and expressed as

g_x(u, v) = ⟨u, v⟩ / (1 − ‖x‖²)²        (1)

where u, v are tangent vectors at x ∈ D and ⟨·, ·⟩ is the scalar product on ℝ². The Riemannian metric (1) induces a Riemannian distance between x and y given as follows:

d(x, y) = (1/2) arccosh( 1 + 2 ‖x − y‖² / ((1 − ‖x‖²)(1 − ‖y‖²)) ).

Identifying D with the open unit disk of the complex plane, this distance can also be expressed as d(x, y) = atanh |(x − y) / (1 − x̄ y)|, where x̄ denotes complex conjugation; it is half the distance used in [27].
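As a quick sanity check, the equality of the two expressions for the distance can be verified numerically. The sketch below is our own illustration, not the package's code:

```python
import numpy as np

def poincare_dist(x, y):
    """Distance induced by (1): half the arccosh expression of [27]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sq = np.sum((x - y) ** 2)
    return 0.5 * np.arccosh(1.0 + 2.0 * sq / ((1.0 - x @ x) * (1.0 - y @ y)))

def poincare_dist_complex(x, y):
    """Same distance written as atanh |(x - y) / (1 - conj(x) y)|,
    with the points viewed as complex numbers."""
    zx, zy = complex(x[0], x[1]), complex(y[0], y[1])
    return float(np.arctanh(abs((zx - zy) / (1.0 - zx.conjugate() * zy))))
```

For a point y at Euclidean radius r, both expressions reduce to atanh(r), the hyperbolic distance from the origin.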
The Poincaré metric (1) turns D into a Riemannian manifold of negative sectional curvature [21, 36]. As a result, D enjoys the property of existence and uniqueness of the Riemannian barycentre [3]. More precisely, for every set of points x₁, …, x_n in D, the empirical Riemannian barycentre

x̄ = argmin_{x ∈ D} Σ_{i=1}^{n} d²(x, x_i)        (2)

exists, is unique and belongs to D. Several stochastic gradient algorithms can be applied to numerically approximate x̄ [12, 5, 7, 6]. In this paper, we use the algorithm of [8], which has proven effective for radar applications. For this, we first recall the definitions of the Riemannian exponential and logarithmic maps.
For given z ∈ D and v ∈ T_zD, identifying points and tangent vectors with complex numbers, the exponential map is

Exp_z(v) = (z + w) / (1 + z̄ w)

with w = tanh(‖v‖_z) v/|v| and ‖v‖_z = |v| / (1 − |z|²) the Riemannian norm of v at z. The exponential map is a diffeomorphism from T_zD to D. Its inverse, called the logarithmic map and denoted by Log_z, is given by

Log_z(y) = (1 − |z|²) atanh(|w|) w/|w|

with w = (y − z) / (1 − z̄ y), and atanh is the inverse of the hyperbolic tangent. For more details on the previous formulas, we refer to [4, 36]. For the numerical computation of the barycentre, we will use Algorithm 1 below from [8].
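In complex coordinates, these two maps take a compact form. The sketch below is our own transcription, consistent with the Poincaré distance used here; the round trip Log then Exp recovers the target point:

```python
import numpy as np

def _to_c(x):
    return complex(x[0], x[1])

def _to_vec(z):
    return np.array([z.real, z.imag])

def exp_map(x, v):
    """Exponential map Exp_x(v) for the metric (1), via Mobius
    addition in complex coordinates."""
    z, u = _to_c(x), _to_c(v)
    if u == 0:
        return np.asarray(x, dtype=float)
    riem_norm = abs(u) / (1.0 - abs(z) ** 2)  # Riemannian norm of v at x
    w = np.tanh(riem_norm) * u / abs(u)
    return _to_vec((z + w) / (1.0 + z.conjugate() * w))

def log_map(x, y):
    """Logarithmic map Log_x(y), the inverse of Exp_x."""
    z, t = _to_c(x), _to_c(y)
    w = (t - z) / (1.0 - z.conjugate() * t)
    if w == 0:
        return np.zeros(2)
    return _to_vec((1.0 - abs(z) ** 2) * np.arctanh(abs(w)) * w / abs(w))
```

Note that the Riemannian norm of Log_x(y) equals the distance d(x, y), which is the basis of the barycentre iteration below.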
A direct consequence of the existence and uniqueness of the Riemannian barycentre is the K-means algorithm illustrated in Algorithm 2. In its description, we use the word centroid to denote the Riemannian barycentre.
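The two algorithms can be sketched as follows. The barycentre routine is a deterministic Karcher-flow variant (the stochastic scheme of [8] is not reproduced here), and the farthest-point initialisation of the centroids is our own illustrative choice; points are complex numbers in the unit disk:

```python
import numpy as np

# Compact complex-number versions of the maps of Section 2.
def _exp(z, u):
    if abs(u) == 0:
        return z
    w = np.tanh(abs(u) / (1 - abs(z) ** 2)) * u / abs(u)
    return (z + w) / (1 + z.conjugate() * w)

def _log(z, y):
    w = (y - z) / (1 - z.conjugate() * y)
    if abs(w) == 0:
        return 0j
    return (1 - abs(z) ** 2) * np.arctanh(abs(w)) * w / abs(w)

def _dist(z, y):
    return np.arctanh(abs((z - y) / (1 - z.conjugate() * y)))

def barycentre(points, eps=1e-10, max_iter=1000):
    """Karcher flow for (2): move the estimate along the mean of the
    log maps until the gradient is negligible."""
    c = points[0]
    for _ in range(max_iter):
        g = sum(_log(c, p) for p in points) / len(points)
        if abs(g) < eps:
            break
        c = _exp(c, g)
    return c

def riemannian_kmeans(points, k, iters=100):
    """K-means on the Poincare disk: hyperbolic distance for assignment,
    Riemannian barycentres as centroids."""
    cents = [points[0]]
    while len(cents) < k:  # deterministic farthest-point initialisation
        cents.append(max(points, key=lambda p: min(_dist(p, c) for c in cents)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _dist(p, cents[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                cents[j] = barycentre(members)
    return labels, cents
```

By symmetry, the barycentre of two points placed symmetrically about the origin is the origin itself, which gives a simple correctness check.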
3 Learning GSD using Poincaré embeddings and Riemannian optimisation
This section starts by reviewing the approach of [27] to embed GSD in the Poincaré disk. Based on this embedding and the K-means algorithm introduced in the previous section, we present our algorithms for performing supervised and unsupervised clustering tasks.
Given a GSD G = (V, E), where V and E are the node and edge sets, the probability of an edge (u, v) is modelled in [27] using the Fermi-Dirac distribution

P((u, v) ∈ E) = 1 / ( e^{(d(θ_u, θ_v) − r)/t} + 1 )

with hyperparameters r, t > 0. To embed G in D, [27] learns a map θ : V → D, u ↦ θ_u, by minimising

L(θ) = Σ_{(u,v)∈E} − log ( e^{−d(θ_u, θ_v)} / Σ_{v′∈N(u)} e^{−d(θ_u, θ_{v′})} )        (3)
where N(u) is the set of negative samples of u, i.e. nodes that are not neighbours of u. Following [25], (3) is optimised by selecting a small number of negative samples according to a prior distribution P_n. Taking this into account, the objective function (3) can be written as

L(θ) = − Σ_{(u,v)∈E} [ log σ(−d(θ_u, θ_v)) + Σ_{i=1}^{m} E_{n_i∼P_n} log σ(d(θ_u, θ_{n_i})) ]        (4)

with σ(x) = 1/(1 + e^{−x}) the sigmoid function. ([27] uses the loss (3) for hierarchical GSD; we found that with the latter loss function the embedded nodes quickly approach the boundary of the disk, and that using (4) also for hierarchical GSD gives us more stable results.) In practice, L is optimised by generating nodes on the graph using DeepWalk [29] and then sampling negatives from these nodes using the unigram distribution raised to the power 3/4 [25]. Given a cluster C in D, with barycentre c, we define its variance as
var(C) = Σ_{x∈C} d²(c, x). This definition is in accordance with the choice of loss discussed in the footnote. The variance will be used to choose one embedding among several. In fact, it is reasonable to place more confidence in the embedding for which the variance is minimal, that is, the one whose points are most concentrated around the barycentres. This idea will be justified empirically in the next section.
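To make the role of the objective concrete, a per-edge version of such a negative-sampling loss can be sketched as follows. This is illustrative only: the function names are ours, and the expectation over P_n is replaced by explicitly supplied samples:

```python
import numpy as np

def poincare_dist(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    sq = np.sum((u - v) ** 2)
    return 0.5 * np.arccosh(1.0 + 2.0 * sq / ((1.0 - u @ u) * (1.0 - v @ v)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_loss(theta_u, theta_v, negative_thetas):
    """Contribution of one observed edge (u, v) to a loss of the form (4):
    attract the endpoints, repel the sampled negatives."""
    loss = -np.log(sigmoid(-poincare_dist(theta_u, theta_v)))
    for theta_n in negative_thetas:
        loss -= np.log(sigmoid(poincare_dist(theta_u, theta_n)))
    return loss
```

An embedding that places connected nodes close and negative samples far incurs a lower loss than one doing the opposite.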
Finally, based on the Riemannian barycentre, Algorithm 3 below presents our scheme for unsupervised clustering. The main idea is to embed the GSD in the Poincaré disk several times, form clusters on each embedding with the Riemannian K-means algorithm, and keep the embedding and clustering achieving the minimal total variance.
In what follows, the Poincare_Embedding function, given the adjacency matrix of an input GSD, minimises (4) and outputs the embedding of every node in D.
Inputs: the adjacency matrix of the GSD, an object containing the embedding parameters, the number of classes K, an object containing the K-means algorithm parameters, and the number of experiments NE.
Outputs: the embedding of the input graph with minimal total variance, the barycentres of each cluster, and the cluster labels of each node.
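The loop behind these inputs and outputs can be sketched as follows, where embed_fn, kmeans_fn and variance_fn are hypothetical stand-ins for Poincare_Embedding and the Riemannian routines of Section 2 (they are illustrative parameter names, not the package's API):

```python
def cluster_gsd(adjacency, embed_fn, kmeans_fn, variance_fn, n_classes, n_experiments):
    """Skeleton of Algorithm 3: run NE embedding + K-means experiments
    and keep the run with minimal total variance (sum of the cluster
    variances var(C))."""
    best = None
    for seed in range(n_experiments):
        embedding = embed_fn(adjacency, seed)
        labels, barycentres = kmeans_fn(embedding, n_classes)
        tv = variance_fn(embedding, labels, barycentres)
        if best is None or tv < best[0]:
            best = (tv, embedding, barycentres, labels)
    _, embedding, barycentres, labels = best
    return embedding, barycentres, labels
```

The selection step is fully unsupervised: only the total variance, not the ground truth, decides which run is kept.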
4 Experimental results
The algorithms from the previous sections are implemented as a package that does the following: given the adjacency matrix of a binary graph as input, it performs the embedding on the Poincaré disk, applies Riemannian K-means clustering and provides a visualisation of the computed clusters.
The package is implemented in Python and makes use of multiprocessing to run a number of experiments in parallel. All computations are performed on a machine with four cores equipped with an Intel Core i5 running at 2.71 GHz. The same convergence threshold is used for all experiments. The datasets used in the paper are given in Table 1 with their number of nodes (Nodes) and their number of edges (Edges) [30].
Dataset  Nodes  Edges  Classes 

Karate  34  77  2 
Polblogs  1224  16781  2 
Polbooks  105  441  3 
Football  115  613  12 
Adjnoun  112  425  2 
Mammals subtree  1179  6541  NA 
4.1 Unsupervised clustering
Social Networks. In this part, we are interested in applying the previous algorithms to the datasets presented above.
Comparison criterion.
For each dataset, 10 experiments are performed. Each experiment is conducted in two steps. In the first step, we generate a Poincaré embedding and a Euclidean embedding based on DeepWalk [29]. An intermediate step of the latter algorithm is the generation of random walks that capture the structure of the graph; the same random walks are used for both embeddings. In the second step, we apply the Riemannian (resp. Euclidean) K-means algorithm to the embedding. For the Euclidean embedding, we set the space dimension to 10 and use the Euclidean distance and barycentre. For each dataset, we choose as the best embedding (in both the Riemannian and the Euclidean case) the one having the minimal total variance, as in Algorithm 3. Finally, for each dataset, we compute the average performance of the embeddings (Riemannian and Euclidean). The results are presented in Table 2 with the following abbreviations:
PBPE: Performance of the best Poincaré embedding
PBEE: Performance of the best Euclidean embedding
APPE: Average performance of the Poincaré embeddings
APEE: Average performance of the Euclidean embeddings
Dataset  PBPE  PBEE  APPE  APEE  [1] 

Karate  91.2%  70.6%  91.4%  65.8%   
Polblogs  92.8%  51.9%  92.5%  53.5%   
Polbooks  84.8%  77.1%  80%  62%  75% 
Football  87%  67.8%  69.4%  56.8%  77% 
Adjnoun  51.8%  51.8%  52.5%  51.6%  51% 
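Since predicted cluster labels are only defined up to relabelling, performance figures of this kind can be computed as the best accuracy over label permutations. This is one common convention, assumed here since the text does not spell out its scoring rule:

```python
from itertools import permutations

def clustering_accuracy(pred, truth, k):
    """Best agreement between predicted cluster labels (0..k-1) and the
    ground truth over all relabelings. Fine for small k; for larger k
    (e.g. Football's 12 classes) the Hungarian algorithm is preferable."""
    n = len(truth)
    return max(sum(int(perm[p] == t) for p, t in zip(pred, truth)) / n
               for perm in permutations(range(k)))
```

For example, a prediction that exactly swaps the two class labels still scores 100%.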
Performance increase with the number of experiments.
In order to justify our choice of the best embedding as the one having minimal total variance, we plot, for the Football dataset, the evolution of the PBPE with respect to the number of experiments NE. The obtained plot (Figure 2) shows that the PBPE indeed increases as NE grows from 1 to 10.
Hierarchical GSD. In this part, we are interested in applications to hierarchical GSD. We consider an example from WordNet, namely the mammals subtree. Figure 3 illustrates the obtained clusters; the barycentres are represented by squares. Figure 4 then shows explicitly some of the node labels, chosen randomly. A focus is given to nodes near barycentres on the one hand and at boundaries between distinct clusters on the other hand.
Notice that the obtained clusters distinguish between different types of mammals. For example, the blue cluster contains mostly canine mammals, while the orange one contains mostly larger mammals (lion, tigress and so on).
4.2 Supervised learning
In this section, we exploit Riemannian barycentres to perform supervised clustering. Each dataset from the previous list is divided into five parts of (almost) equal size. The ground truth of four parts is used to define the clusters and the remaining part serves for testing, in a cross-validation fashion. Following Algorithm 4, each element of the test dataset is assigned to one cluster according to the nearest-barycentre rule. This experiment is performed 10 times. Finally, the performance of this method is evaluated using the ground truth of the test data. Table 3 presents the obtained results with the following abbreviation:
CVMAP: Average performance obtained using cross-validation over the performed experiments.
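Once the class barycentres are computed from the training part, the nearest-barycentre rule itself is a one-liner. A sketch with hypothetical precomputed barycentres:

```python
import numpy as np

def poincare_dist(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    sq = np.sum((x - y) ** 2)
    return 0.5 * np.arccosh(1.0 + 2.0 * sq / ((1.0 - x @ x) * (1.0 - y @ y)))

def classify(point, barycentres):
    """Assign a test node's embedding to the class whose training
    barycentre is closest in the hyperbolic distance."""
    return min(barycentres, key=lambda label: poincare_dist(point, barycentres[label]))
```

Here barycentres is a dictionary mapping each class label to its barycentre on the disk.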
Dataset  CVMAP  [14] 

Karate  93.9%  86% 
Polblogs  92.4%  93% 
Polbooks  83.3%  73% 
Football  77.9%  24% 
Adjnoun  57.8%   
4.3 Discussion and comparison with the state of the art
First, we point out that, using the notion of minimal variance, it is always possible to generate more embeddings than we did previously (10 embeddings), and it is also possible to increase the dimension of the Poincaré ball as in [27]. This reasoning takes advantage of the nice properties of hyperbolic manifolds and is completely unsupervised, in the sense that it does not require any ground truth. Thus, further improvements over the results given above remain possible in both the supervised and unsupervised settings.
Unsupervised clustering. In this paragraph, we comment on the results of Table 2. First, the advantages of Poincaré clustering over Euclidean clustering are straightforward and confirm that hyperbolic spaces represent GSD more suitably than Euclidean spaces. Regarding Poincaré clustering, notice that the best embedding as defined before is not necessarily the one with the highest clustering performance, since in two situations PBPE < APPE. However, in both cases, the gaps are slight and do not exceed 0.7%. The advantage of PBPE is more visible for the Football dataset, where it largely exceeds APPE (by more than 17%), and for the Polbooks dataset, where it exceeds APPE by 4.8%. Our results outperform those of [1], whose authors proposed an embedding on generalised surfaces and considered the datasets Polbooks, Football and Adjnoun for testing. They obtained approximate success rates of 75%, 77% and 51% respectively. The improvements for these datasets using our proposed scheme are 9.8%, 10% and 0.8% respectively.
Supervised clustering. In this paragraph, we compare our results with those of [14], in which the authors used a generalisation of the SVM method on the Poincaré disk. [14] considered the datasets Karate, Polbooks, Football and Polblogs and reported mean approximate success rates of 86%, 73%, 24% and 93% respectively over cross-validation trials over different embeddings, using the embedding of [13]. We obtained significant improvements for the datasets Karate, Polbooks and Football of 7.9%, 10.3% and 53.9% respectively, with a slight gap of 0.6% for the Polblogs dataset.
4.4 Multidimensional embedding and clustering
In a multidimensional setting, we considered K-means clustering after Poincaré embeddings in the hyperbolic space given by the product of p Poincaré disks, equipped with the product metric. The current publicly available Python implementation provides this multidimensional setting. However, experimentally, we did not observe a significant increase in performance over the range of p considered for the datasets used in this paper. This may be explained by the fact that these datasets are not very large. In future work, we aim to experiment with larger datasets while increasing the dimension of the hyperbolic space, in order to better study the scalability of our approach.
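The product metric combines the per-disk distances in the usual l2 way; a minimal sketch, with each coordinate a point in one Poincaré disk:

```python
import numpy as np

def poincare_dist(x, y):
    """Distance on one Poincare disk, points as complex numbers."""
    sq = abs(x - y) ** 2
    return 0.5 * np.arccosh(1.0 + 2.0 * sq / ((1.0 - abs(x) ** 2) * (1.0 - abs(y) ** 2)))

def product_dist(xs, ys):
    """Distance in the product of p Poincare disks with the product
    metric: the l2 combination of the per-disk distances."""
    return np.sqrt(sum(poincare_dist(x, y) ** 2 for x, y in zip(xs, ys)))
```

When two points agree in all but one coordinate, the product distance reduces to the single-disk distance in that coordinate.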
5 Conclusion
Several recent studies [27, 13, 14, 31] have concluded that hyperbolic spaces, even in small dimensions, are more suitable embedding spaces than Euclidean spaces for representing GSD. In this paper, we used Poincaré embeddings [27] to propose a new method for clustering GSD. This method is based on Riemannian K-means algorithms and on a notion of minimal variance that allows us to choose one embedding among several.
The proposed method has been tested on several datasets and has shown improvements over the state of the art for both supervised and unsupervised clustering tasks. In particular, these performances were achieved at minimal cost:
- We used the Poincaré disk of only two dimensions.
- We obtained visualisations with a high-level representation of clusters on graphs.
Our results on clustering with the DeepWalk Euclidean embedding suggest that obtaining good performances with this approach, or other varieties of it such as graph2vec [26] and node2vec [19], would require very high-dimensional Euclidean representation spaces.
Acknowledgement. The first author is grateful to Jeanine Harb for drawing his attention to the paper [27]
during the machine learning seminar at IRT SystemX.
References
 [1] Aalto, M., Verma, N.: Metric learning on manifolds. CoRR abs/1902.01738 (2019)
 [2] Adcock, A.B., Sullivan, B.D., Mahoney, M.W.: Tree-like structure in large social and information networks. In: 2013 IEEE 13th International Conference on Data Mining. pp. 1–10 (Dec 2013)
 [3] Afsari, B.: Riemannian center of mass: existence, uniqueness and convexity. Proc. Amer. Math. Soc. 139(2), 655–673 (2011)
 [4] Anderson, J.W.: Hyperbolic geometry; 2nd ed. Springer undergraduate mathematics series, Springer, Berlin (2005), https://cds.cern.ch/record/1164418
 [5] Arnaudon, M., Dombry, C., Phan, A., Yang, L.: Stochastic algorithms for computing means of probability measures. Stoch. Proc. Appl. 122(4), 1437–1455 (2012)
 [6] Arnaudon, M., Miclo, L.: Means in complete manifolds: uniqueness and approximation. ESAIM Probability and Statistics 18, 185–206 (2014)
 [7] Arnaudon, M., Yang, L., Barbaresco, F.: Stochastic algorithms for computing p-means of probability measures, geometry of Radar Toeplitz covariance matrices and applications to HR Doppler processing. In: International Radar Symposium (IRS). pp. 651–656 (2011)
 [8] Arnaudon, M., Barbaresco, F., Yang, L.: Riemannian medians and means with applications to radar signal processing. J. Sel. Topics Signal Processing 7(4), 595–604 (2013)
 [9] Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (Dec 2005), http://dl.acm.org/citation.cfm?id=1046920.1194902
 [10] Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Multiclass brain-computer interface classification by Riemannian geometry. IEEE Trans. Biomed. Engineering 59(4), 920–928 (2012)
 [11] Boguñá, M., Papadopoulos, F., Krioukov, D.: Sustaining the Internet with Hyperbolic Mapping. Nature Communications 1(62) (Oct 2010)

 [12] Bonnabel, S.: Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Autom. Control 58(9), 2217–2229 (2013)
 [13] Chamberlain, B.P., Clough, J., Deisenroth, M.P.: Neural embeddings of graphs in hyperbolic space. 13th International Workshop on Mining and Learning with Graphs. Available at https://arxiv.org/abs/1705.10359 (2017)
 [14] Cho, H., Demeo, B., Peng, J., Berger, B.: Largemargin classification in hyperbolic space. CoRR abs/1806.00437 (2018)
 [15] Cui, P., Wang, X., Pei, J., Zhu, W.: A survey on network embedding. CoRR abs/1711.08752 (2017)
 [16] Fellbaum, C. (ed.): WordNet: an electronic lexical database. MIT Press (1998)
 [17] Goh, A., Vidal, R.: Clustering and dimensionality reduction on Riemannian manifolds. In: CVPR. IEEE Computer Society (2008)
 [18] Gromov, M.: Hyperbolic Groups, pp. 75–263. Springer New York, New York, NY (1987)
 [19] Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 855–864 (2016)
 [20] Gu, X., Deng, J.D., Purvis, M.K.: Improving superpixel-based image segmentation by incorporating color covariance matrix manifolds. In: ICIP. pp. 4403–4406. IEEE (2014)
 [21] Helgason, S.: Differential geometry, Lie groups, and symmetric spaces. American Mathematical Society (2001)
 [22] Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A., Boguñá, M.: Hyperbolic geometry of complex networks. Phys. Rev. E 82, 036106 (Sep 2010)
 [23] Kurtek, S., Klassen, E., Ding, Z., Jacobson, S., Jacobson, J.B., Avison, M., Srivastava, A.: Parameterizationinvariant shape comparisons of anatomical surfaces. IEEE Trans. Med. Imaging 30(3), 849–858 (2011)
 [24] Lin, F., Cohen, W.W.: Power iteration clustering. In: ICML (2010)

 [25] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
 [26] Narayanan, A., Chandramohan, M., Venkatesan, R., Chen, L., Liu, Y., Jaiswal, S.: graph2vec: Learning distributed representations of graphs. CoRR abs/1707.05005 (2017)
 [27] Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. In: Advances in Neural Information Processing Systems 30, pp. 6338–6347. Curran Associates, Inc. (2017)
 [28] Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: EMNLP (2014)
 [29] Perozzi, B., Al-Rfou, R., Skiena, S.: Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 701–710. KDD '14, ACM, New York, NY, USA (2014). https://doi.org/10.1145/2623330.2623732

 [30] Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015), http://networkrepository.com
 [31] Sala, F., Sa, C.D., Gu, A., Ré, C.: Representation tradeoffs for hyperbolic embeddings. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. pp. 4457–4466 (2018), http://proceedings.mlr.press/v80/sala18a.html
 [32] Spielman, D.A., Teng, S.H.: Spectral partitioning works: Planar graphs and finite element meshes. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science. pp. 96–105. FOCS '96, IEEE Computer Society, Washington, DC, USA (1996), http://dl.acm.org/citation.cfm?id=874062.875505
 [33] Tu, C., Wang, H., Zeng, X., Liu, Z., Sun, M.: Communityenhanced network representation learning for network analysis. CoRR abs/1611.06645 (2016), http://arxiv.org/abs/1611.06645
 [34] Tuzel, O., Porikli, F., Meer, P.: Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 30(10), 1713–1727 (2008)
 [35] Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1225–1234. KDD '16, ACM, New York, NY, USA (2016). https://doi.org/10.1145/2939672.2939753
 [36] Yang, L.: Médianes de mesures de probabilité dans les variétés riemanniennes et applications à la détection de cibles radar. PhD thesis (2011), http://www.theses.fr/2011POIT2311
 [37] Zheng, V.W., Cavallari, S., Cai, H., Chang, K.C., Cambria, E.: From node embedding to community embedding. CoRR abs/1610.09950 (2016), http://arxiv.org/abs/1610.09950
 [38] Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Tech. rep. (2002)