GOT: An Optimal Transport framework for Graph comparison

06/05/2019 · by Hermina Petric Maretic, et al. · EPFL, ESIEE Paris

We present a novel framework based on optimal transport for the challenging problem of comparing graphs. Specifically, we exploit the probabilistic distribution of smooth graph signals defined with respect to the graph topology. This allows us to derive an explicit expression of the Wasserstein distance between graph signal distributions in terms of the graph Laplacian matrices. This leads to a structurally meaningful measure for comparing graphs, which is able to take into account the global structure of graphs, while most other measures merely observe local changes independently. Our measure is then used to formulate a new graph alignment problem, whose objective is to estimate the permutation that minimizes the distance between two graphs. We further propose an efficient stochastic algorithm based on Bayesian exploration to accommodate the non-convexity of the graph alignment problem. We finally demonstrate the performance of our novel framework on different tasks such as graph alignment, graph classification and graph signal prediction, and we show that our method leads to significant improvements with respect to state-of-the-art algorithms.


1 Introduction

With the rapid development of digitisation in various domains, the volume of data is increasing rapidly, and much of it takes the form of structured data. Such information is often represented by graphs that capture potentially complex structures. It remains, however, quite challenging to analyze, classify or predict graph data, due to the lack of efficient measures for comparing graphs. In particular, the mere comparison of graph matrices is not necessarily a meaningful distance, as different edges can have a very different importance in the graph. Spectral distances have also been proposed (Jovanović and Stanić, 2012; Gera et al., 2018), but they usually do not take into account all the information provided by the graphs, focusing only on the Laplacian eigenvalues and ignoring a large portion of the structure encoded in the eigenvectors. In addition to the lack of effective distances, a major difficulty with graph representations is that their nodes may not be aligned, which further complicates graph comparisons.

In this paper, we propose a new framework for graph comparison, which permits computing both the distance between two graphs under unknown permutations, and the transportation plan for data from one graph to another. Instead of comparing graph matrices directly, we propose to look at the smooth graph signal distributions associated to each graph, and to relate the distance between graphs to the distance between the graph signal distributions. We resort to optimal transport for computing the Wasserstein distance between distributions, as well as the corresponding transportation plan. Optimal transport (OT) was introduced by Monge (1781), and reformulated in a more tractable way by Kantorovich (1942). It has been a topic of great interest from both theoretical and practical points of view (Villani, 2008), and has recently been largely revisited with new applications in image processing, data analysis, and machine learning (Peyré and Cuturi, 2018). Interestingly, the Wasserstein distance takes a closed-form expression in our settings, which essentially depends on the Laplacian matrices of the graphs under comparison. We further show that the Wasserstein distance has the important advantage of capturing the main structural information of the graphs.

Equipped with this distance, we formulate a new graph alignment problem for finding the permutation that minimizes the mass transportation between a "fixed" distribution and a "permuted" distribution. This yields a nonconvex optimization problem that we solve efficiently with a novel stochastic gradient descent algorithm. It permits efficient graph alignment and comparison, and it outputs a structurally meaningful distance as well as a transport plan. These are important elements in graph analysis, comparison, and graph signal prediction tasks. We finally illustrate the benefits of our new graph comparison framework in representative tasks such as noisy graph alignment, graph classification, and graph signal transfer. Our results show that the proposed distance outperforms both the Gromov-Wasserstein and the Euclidean distance in graph alignment and graph clustering tasks. In addition, we show the use of transport plans to predict graph signals. To the best of our knowledge, this is the only framework for graph comparison that includes the possibility of adapting graph signals to another graph.

In the literature, many methods have formulated graph matching as a quadratic assignment problem (Yan et al., 2016; Jiang et al., 2017), under the constraint that the solution is a permutation matrix. As this is an NP-hard problem, different relaxations have been proposed to find approximate solutions. In this context, spectral clustering (Caelli and Kosinov, 2004; Srinivasan et al., 2007) emerged as a simple relaxation, which consists of finding the orthogonal matrix whose squared entries sum to one; its drawback is a suboptimal matching accuracy. To improve on this behavior, a semi-definite programming relaxation was adopted to tackle the graph matching problem by relaxing the non-convex constraint into a semi-definite one (Schellewald and Schnörr, 2005). Based on the fact that the set of doubly-stochastic matrices is the convex hull of permutation matrices, the graph matching problem was relaxed into a non-convex quadratic problem in Cho et al. (2010); Zhou and Torre (2016). A related approach was recently proposed to asymptotically approximate discrete graph matching in the continuous domain by using separable functions (Yu et al., 2018). Along similar lines, Gumbel-Sinkhorn networks were proposed to infer permutations from data (Mena et al., 2018; Emami and Ranka, 2018). The approach consists of producing a discrete permutation from a continuous doubly-stochastic matrix obtained with the Sinkhorn operator.

Closer to our framework, some recent works have studied the graph alignment problem from an optimal transport perspective. For example, Flamary et al. (2014) propose a method based on optimal transport for empirical distributions with a graph-based regularization. The objective of this work is to compute an optimal transportation plan by controlling the displacement of pairs of points. Graph-based regularization encodes neighborhood similarity between samples, either on the final position of the transported samples or on their displacement (Ferradans et al., 2013). Gu et al. (2015) define a spectral distance by assigning a probability measure to the nodes via the spectrum representation of each graph, and by using Wasserstein distances between probability measures. This approach however does not take into account the full graph structure in the alignment problem. Nikolentzos et al. (2017) propose instead to match graph embeddings, where the latter are represented as bags of vectors, and the Wasserstein distance is computed between them. The authors also propose a heuristic to take into account possible node labels or signals.

Another line of work has looked at more specific graph comparison settings. Mémoli (2011) investigates the Gromov-Wasserstein distance for object matching, and Peyré et al. (2016) propose an efficient algorithm to compute the Gromov-Wasserstein distance and the barycenter of pairwise dissimilarity matrices. The algorithm uses entropic regularization and Sinkhorn projections, as proposed by Cuturi (2013). The work has many interesting applications, including multimedia, with point-cloud averaging and matching, but also natural language processing, with the alignment of word embedding spaces (Alvarez-Melis and Jaakkola, 2018). Vayer et al. (2018) build on this work and propose a distance for graphs and the signals living on them. The problem is formulated as a combination of the Gromov-Wasserstein distance between graph distance matrices and the Wasserstein distance between graph signals. However, while the above methods solve the alignment problem using optimal transport, the simple distances between aligned graphs do not take into account their global structure, and the methods do not consider the transportation of signals between graphs.

In this paper, we propose to resort to smooth graph signal distributions in order to compare graphs, and we develop an effective algorithm to align graphs under a priori unknown permutations. The paper is organized as follows. Section 2 details graph alignment with optimal transport. Section 3 presents the algorithm for solving the proposed problem via a stochastic gradient technique. Section 4 provides an experimental validation in the context of graph alignment, graph classification, and graph signal transfer. Finally, conclusions are given in Section 5.

2 Graph Alignment with Optimal Transport

Figure 1: Illustration of the structural differences captured by the Wasserstein distance between graphs defined in (5). The graphs $G_2$ (b) and $G_3$ (c) are both copies of $G_1$ (a), with 2 edges removed. The modification in $G_2$ is very influential, as the two communities become almost disconnected; here, both the Euclidean and the Wasserstein distances measure a significant difference w.r.t. $G_1$. Conversely, the modification in $G_3$ is hardly noticeable; here, the Euclidean distance still measures a significant difference, whereas the Wasserstein distance does not. The latter is a desirable property in the context of graph comparison.

Despite recent advances in the analysis of graph data, it remains quite challenging to define a meaningful distance between graphs. Moreover, a major difficulty with graph representations is the lack of node alignment, which prevents direct quantitative comparisons between graphs. In this section, we propose a new distance based on Optimal Transport (OT) to compare graphs through smooth graph signal distributions. Then, we use this distance to formulate a new graph alignment problem, which aims at finding the permutation matrix that minimizes the distance between graphs.

Preliminaries

We denote by $G = (V, E)$ a graph with a set of vertices $V = \{v_1, \ldots, v_N\}$ and a set of edges $E$. The graph is assumed to be connected, undirected, and edge weighted. The adjacency matrix is denoted by $W$. The degree of a vertex $v_i \in V$, denoted by $d(v_i)$, is the sum of the weights of all the edges incident to $v_i$ in the graph $G$. The degree matrix $D$ is then defined as:

$D = \mathrm{diag}\big(d(v_1), \ldots, d(v_N)\big)$.   (1)

Based on $W$ and $D$, the Laplacian matrix of $G$ is

$L = D - W$.   (2)

Moreover, we consider additional attributes modelled as features on the graph vertices. Assuming that each node is associated to a scalar feature, the graph signal takes the form of a vector $x \in \mathbb{R}^N$.
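As a quick illustration, the construction in (1)-(2) can be sketched in a few lines of NumPy (the adjacency matrix below is an arbitrary toy example, not taken from the paper):

```python
import numpy as np

# Toy weighted adjacency matrix of a 3-node undirected graph (illustrative only).
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 2.0],
              [0.5, 2.0, 0.0]])

D = np.diag(W.sum(axis=1))  # degree matrix, eq. (1)
L = D - W                   # combinatorial Laplacian, eq. (2)

assert np.allclose(L.sum(axis=1), 0.0)  # each row of L sums to zero
```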

Smooth graph signals

Following (Rue and Held, 2005), we interpret graphs as key elements that drive the probability distributions of signals. Specifically, we consider two graphs $G_1$ and $G_2$ with Laplacian matrices $L_1$ and $L_2$, and we consider signals that follow the normal distributions defined as¹ (Dong et al., 2016)

$\nu^1 = \mathcal{N}\big(0, L_1^{\dagger}\big)$   (3)

$\nu^2 = \mathcal{N}\big(0, L_2^{\dagger}\big)$   (4)

¹ Note that $\dagger$ denotes the pseudoinverse operator.

The above formulation means that the graph signal values vary slowly between strongly connected nodes (Dong et al., 2016). This assumption holds for most common graph and network datasets. It is further used in many graph inference algorithms that implicitly represent a graph through its smooth signals (Dempster, 1972; Friedman et al., 2008; Dong et al., 2018). Furthermore, the smoothness assumption is used as a regularizer in many graph applications, such as robust principal component analysis (Shahid et al., 2015) and label propagation (Zhu et al., 2003).
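To make the model concrete, here is a minimal sketch of how such smooth signals could be sampled, assuming a Laplacian L built as above (the helper name is ours, not the paper's):

```python
import numpy as np

def sample_smooth_signals(L, n_samples, seed=0):
    """Draw signals x ~ N(0, L^+), as in eqs. (3)-(4). The covariance is
    the Moore-Penrose pseudoinverse of L, so most of the signal energy
    lies on the low-frequency (smooth) eigenvectors of the graph."""
    rng = np.random.default_rng(seed)
    cov = np.linalg.pinv(L)  # L^+ (pseudoinverse)
    return rng.multivariate_normal(np.zeros(L.shape[0]), cov, size=n_samples)
```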

Wasserstein distance between graphs

Instead of comparing graphs directly, we propose to look at the signal distributions, which are governed by the graphs. Specifically, we measure the dissimilarity between two graphs $G_1$ and $G_2$ through the Wasserstein distance between the respective distributions $\nu^1$ and $\nu^2$. More precisely, the 2-Wasserstein distance corresponds to the minimal "effort" required to transport one probability measure to another with respect to the Euclidean norm (Monge, 1781), that is

$\mathcal{W}_2^2(\nu^1, \nu^2) = \inf_{T \# \nu^1 = \nu^2} \int_{\mathbb{R}^N} \|x - T(x)\|^2 \, d\nu^1(x)$,   (5)

where $T \# \nu^1$ denotes the push-forward of $\nu^1$ by the transport map $T \colon \mathbb{R}^N \to \mathbb{R}^N$. For normal distributions such as $\nu^1$ and $\nu^2$, the 2-Wasserstein distance can be explicitly written in terms of their covariance matrices (Takatsu, 2011), yielding

$\mathcal{W}_2^2(\nu^1, \nu^2) = \mathrm{Tr}\big(L_1^{\dagger} + L_2^{\dagger}\big) - 2\, \mathrm{Tr}\Big(\big(L_1^{\dagger/2} L_2^{\dagger} L_1^{\dagger/2}\big)^{1/2}\Big)$,   (6)

and the optimal transportation plan is $T = L_1^{1/2}\big(L_1^{\dagger/2} L_2^{\dagger} L_1^{\dagger/2}\big)^{1/2} L_1^{1/2}$.
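For intuition, the closed form (6) and the associated map can be sketched directly with NumPy and SciPy; this is a sketch under the assumption of connected graphs, and the function names are ours:

```python
import numpy as np
from scipy.linalg import sqrtm

def got_distance(L1, L2):
    """Squared 2-Wasserstein distance between N(0, L1^+) and N(0, L2^+), eq. (6)."""
    S1, S2 = np.linalg.pinv(L1), np.linalg.pinv(L2)  # covariances L^+
    S1_half = np.real(sqrtm(S1))                     # L1^{+/2}
    cross = np.real(sqrtm(S1_half @ S2 @ S1_half))
    return np.trace(S1 + S2) - 2.0 * np.trace(cross)

def transport_map(L1, L2):
    """Linear optimal map T from nu^1 to nu^2 for the Gaussians above:
    T = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}, with pseudo-
    inverses taken on the subspace orthogonal to the constant vector."""
    S1, S2 = np.linalg.pinv(L1), np.linalg.pinv(L2)
    S1_half = np.real(sqrtm(S1))
    inv_half = np.linalg.pinv(S1_half)
    return inv_half @ np.real(sqrtm(S1_half @ S2 @ S1_half)) @ inv_half

# Usage sketch: x2_pred = transport_map(L1, L2) @ x1 adapts a signal x1
# from graph 1 to graph 2 (see the discussion of signal transportation below).
```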

The Wasserstein distance captures the structural information of the graphs under comparison. Namely, it is sensitive to differences that cause a global change in the connection between graph components, while it gives less importance to differences that have a small impact on the whole graph structure. Intuitively, this is a direct consequence of the definition of the Wasserstein distance, with bigger changes in the expected behaviour of smooth signals resulting in a larger graph distance. This behaviour is illustrated in Figure 1 by comparison with a simple distance, namely the Euclidean norm between the Laplacian matrices of the graphs.

The optimal transportation plan enables the movement of signals from one graph to another. Namely, this continuous Lipschitz mapping adapts a graph signal to the distribution of another graph, while preserving similarity with the original signal. This results in a simple but efficient prediction of the signal on another graph.

Note that in our setting a possible alternative to the Wasserstein distance could be the Kullback-Leibler divergence, whose expression is explicit for normal distributions. However, the KL divergence goes to infinity when the covariance matrices are singular. Furthermore, the OT framework also produces a transportation plan between distributions, which can prove beneficial in some graph analysis tasks.

Graph alignment

Equipped with a measure to compare graphs through signal distributions, we now propose a new formulation of the graph alignment problem. It is important to note that the graph signal distributions depend on the enumeration of nodes chosen to build $L_1$ and $L_2$. While in some cases (e.g., dynamically changing graphs, multilayer graphs, etc.) a consistent enumeration can be trivially chosen for all graphs, it generally leads to the challenging problem of estimating an a priori unknown permutation between graphs. In our approach, we are given two connected graphs $G_1$ and $G_2$, each with $N$ distinct vertices and with different sets of edges. We aim at finding the optimal transportation plan from $\nu^1$ to $\nu^2$. Furthermore, in order to take all possible enumerations into account, we define the probability measure of a permuted representation for the graph $G_2$ as

$\nu_P^2 = \mathcal{N}\big(0, (P^\top L_2 P)^{\dagger}\big)$,   (7)

where $P$ is a permutation matrix. Consequently, our graph alignment problem consists in finding the permutation that minimizes the mass transportation between $\nu^1$ and $\nu_P^2$, which reads

$\min_{P \in \mathcal{P}} \; \mathcal{W}_2^2\big(\nu^1, \nu_P^2\big)$,   (8)

where $\mathcal{P} = \big\{P \in \{0,1\}^{N \times N} \;\big|\; P\mathbf{1} = \mathbf{1},\; P^\top P = I\big\}$ and $I$ is the identity matrix. According to (3), (6), (7), the above distance boils down to

$\mathcal{W}_2^2\big(\nu^1, \nu_P^2\big) = \mathrm{Tr}\big(L_1^{\dagger}\big) + \mathrm{Tr}\big(L_2^{\dagger}\big) - 2\, \mathrm{Tr}\Big(\big(L_1^{\dagger/2}\, P^\top L_2^{\dagger} P\, L_1^{\dagger/2}\big)^{1/2}\Big)$.   (9)

The optimal permutation allows us to compare $G_1$ and $G_2$ when a consistent enumeration of nodes is not available. This is however a non-convex optimization problem that cannot be easily solved with standard tools. In the next section, we present an efficient algorithm to tackle this problem.
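As a sanity check, the alignment cost (9) for a given candidate permutation can be evaluated directly; this is a sketch (the minimization over P is what Section 3 addresses):

```python
import numpy as np
from scipy.linalg import sqrtm

def alignment_cost(L1, L2, P):
    """Eq. (9): squared 2-Wasserstein distance between nu^1 and nu^2_P,
    where P is an N x N permutation matrix."""
    S1, S2 = np.linalg.pinv(L1), np.linalg.pinv(L2)
    S1_half = np.real(sqrtm(S1))
    cross = np.real(sqrtm(S1_half @ P.T @ S2 @ P @ S1_half))
    return np.trace(S1 + S2) - 2.0 * np.trace(cross)

# Brute force over all permutations is factorial in N, hence intractable;
# for a tiny graph one could nonetheless write:
# from itertools import permutations
# best = min(permutations(range(N)),
#            key=lambda p: alignment_cost(L1, L2, np.eye(N)[list(p)]))
```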

3 GOT Algorithm

(a) Graph 1
(b) Graph 2
(c) Solution $X$ to (14)
(d) Matrix $S_\tau(X)$
Figure 2: Illustrative example of the graph alignment problem. The solution $X$ to (14) is a matrix whose rows may be interpreted as assignment log-likelihoods. Applying the Sinkhorn operator to $X$ yields a matrix $S_\tau(X)$ whose rows are assignment probabilities from Graph 1 (columns) to Graph 2 (rows).

We propose to solve the OT-based graph alignment problem described in the previous section via stochastic gradient descent. The latter is summarized in Algorithm 1, and its derivation is presented in the remainder of this section.

1: Graphs $G_1$ and $G_2$
2: Sampling size $S$, learning rate $\gamma$, and constant $\tau$
3: Random initialization of matrices $\mu^{(0)}$ and $\sigma^{(0)}$
4: for $k = 0, 1, 2, \ldots$ do
5:     Draw samples $z_1, \ldots, z_S$ from the distribution $\mathcal{N}(0, I)$
6:     Define the stochastic approximation of the cost function as $\widehat{f}(\mu, \sigma) = \frac{1}{S} \sum_{s=1}^{S} f\big(\mu + \sigma \odot z_s\big)$
7:     $g^{(k)} \leftarrow$ gradient of $\widehat{f}$ evaluated at $(\mu^{(k)}, \sigma^{(k)})$
8:     $(\mu^{(k+1)}, \sigma^{(k+1)}) \leftarrow$ update of $(\mu^{(k)}, \sigma^{(k)})$ using $g^{(k)}$
9: return $S_\tau(\mu^{(k)})$
Algorithm 1 Approximate solution to the graph alignment problem defined in (8).

Optimization

The main difficulty in solving Problem (8) arises from the constraint that $P$ is a permutation matrix, since it leads to a discrete optimization problem with a factorial number of feasible solutions. We propose to circumvent this issue through an implicit constraint reformulation. The idea is that the constraints in (8) can be enforced implicitly by using the Sinkhorn operator (Sinkhorn, 1964; Cuturi, 2013; Genevay et al., 2018; Mena et al., 2018). Given a square matrix $X$ (not necessarily a permutation) and a small constant $\tau > 0$, the Sinkhorn operator normalizes the rows and columns of $\exp(X/\tau)$ via the multiplication by two diagonal matrices $D_1$ and $D_2$, yielding²

$S_\tau(X) = D_1 \exp(X/\tau)\, D_2$.   (10)

² Note that $\exp$ is applied element-wise to ensure the positivity of the matrix entries.

The diagonal matrices $D_1$ and $D_2$ are computed iteratively as follows:

$D_1^{(t+1)} = \mathrm{diag}\big(\exp(X/\tau)\, D_2^{(t)} \mathbf{1}\big)^{-1}$   (11)

$D_2^{(t+1)} = \mathrm{diag}\big(\exp(X/\tau)^\top D_1^{(t+1)} \mathbf{1}\big)^{-1}$   (12)

$D_2^{(0)} = I$,   (13)

with $\mathbf{1} = [1, \ldots, 1]^\top$. It can be shown (Mena et al., 2018) that the operator $S_\tau$ yields a permutation matrix in the limit $\tau \to 0^+$. Consequently, with a slight abuse of notation (as $S_\tau(X)$ no longer denotes an exact permutation), we can rewrite Problem (8) as follows

$\min_{X \in \mathbb{R}^{N \times N}} \; \mathrm{Tr}\big(L_1^{\dagger}\big) + \mathrm{Tr}\big(L_2^{\dagger}\big) - 2\, \mathrm{Tr}\Big(\big(L_1^{\dagger/2}\, S_\tau(X)^\top L_2^{\dagger}\, S_\tau(X)\, L_1^{\dagger/2}\big)^{1/2}\Big)$.   (14)

The above cost function is differentiable (Luise et al., 2018), and can thus be optimized by gradient descent. An illustrative example of a solution obtained with the proposed approach is presented in Fig. 2.
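A direct NumPy sketch of the iterations (10)-(13) could look as follows; the truncation to a fixed number of iterations and the default values are our choices, and for very small τ working in the log domain would be numerically safer:

```python
import numpy as np

def sinkhorn_operator(X, tau=0.1, n_iter=50):
    """Approximate Sinkhorn operator S_tau(X), eqs. (10)-(13)."""
    K = np.exp(X / tau)           # element-wise exp keeps entries positive
    d2 = np.ones(K.shape[0])      # diagonal of D2^(0) = I
    for _ in range(n_iter):
        d1 = 1.0 / (K @ d2)       # eq. (11): row normalization
        d2 = 1.0 / (K.T @ d1)     # eq. (12): column normalization
    return d1[:, None] * K * d2[None, :]   # D1 exp(X/tau) D2
```

As τ → 0⁺ the output approaches a hard permutation matrix; for moderate τ it is an approximately doubly-stochastic matrix.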

Stochastic exploration

Problem (14) is highly nonconvex, which may cause gradient descent to converge towards a local minimum. Hence, instead of directly optimizing the cost function in (14), denoted by $f$ in the following, we can optimize its expectation w.r.t. the parameters $(\mu, \sigma)$ of some distribution $q_{\mu,\sigma}$, yielding

$\min_{\mu, \sigma} \; \mathbb{E}_{X \sim q_{\mu,\sigma}}\big[f(X)\big]$.   (15)

The optimization of the expectation w.r.t. the parameters $(\mu, \sigma)$ aims at shaping the distribution $q_{\mu,\sigma}$ so as to put all its mass on a minimizer of the original cost function, thus integrating Bayesian exploration into the optimization process.

A standard choice for $q_{\mu,\sigma}$ in continuous optimization is the multivariate normal distribution, thus leading to parameters $\mu \in \mathbb{R}^{N \times N}$ and $\sigma \in \mathbb{R}^{N \times N}$. By leveraging the reparameterization trick (Kingma and Welling, 2014; Figurnov et al., 2018), which boils down to the equivalence

$X \sim \mathcal{N}(\mu, \sigma^2) \;\Longleftrightarrow\; X = \mu + \sigma \odot Z \;\text{ with }\; Z \sim \mathcal{N}(0, I)$,   (16)

the above problem can be reformulated as³

$\min_{\mu, \sigma} \; \mathbb{E}_{Z \sim \mathcal{N}(0, I)}\big[f(\mu + \sigma \odot Z)\big]$,   (17)

where $\mathcal{N}(0, I)$ denotes the multivariate normal distribution with zero mean and unit variance. The advantage of this reformulation is that the gradient of the above stochastic function can be approximated by sampling $z_1, \ldots, z_S$ from the parameterless distribution $\mathcal{N}(0, I)$, yielding

$\nabla_{\mu, \sigma}\, \mathbb{E}_{Z}\big[f(\mu + \sigma \odot Z)\big] \;\approx\; \frac{1}{S} \sum_{s=1}^{S} \nabla_{\mu, \sigma}\, f\big(\mu + \sigma \odot z_s\big)$.   (18)

³ Note that $\odot$ is the entry-wise (Hadamard) product between matrices.

The problem can thus be solved by stochastic gradient descent (Khan et al., 2017). An illustrative application of this approach to a simple one-dimensional nonconvex function is presented in Fig. 3.

(a) Plot of $f$
(b) Contours of $\mathbb{E}[f]$
Figure 3: Illustrative example of stochastic exploration. The white circles mark the iterates produced by optimizing $\mathbb{E}[f]$ (the expectation of $f$) via stochastic gradient descent. As this optimization is performed in the space of the parameters $\mu$ and $\sigma$ (see the right panel), the algorithm avoids local minima and successfully converges to the global minimum of both $f$ and $\mathbb{E}[f]$.
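Putting the pieces together, a compact PyTorch sketch of Algorithm 1 might look as follows. This is our illustrative reading of the method, not the authors' released code: the hyper-parameter values are arbitrary, σ is parameterized through its logarithm for positivity, and the PSD matrix square root is taken via an eigendecomposition so that gradients can flow through it.

```python
import torch

def sqrtm_psd(M):
    """Differentiable square root of a symmetric PSD matrix via eigh."""
    w, V = torch.linalg.eigh(M)
    return V @ torch.diag(torch.clamp(w, min=0.0).sqrt()) @ V.T

def sinkhorn(X, tau, n_iter=30):
    """Sinkhorn operator S_tau(X), eqs. (10)-(13), in PyTorch."""
    K = torch.exp(X / tau)
    d2 = torch.ones(K.shape[0])
    for _ in range(n_iter):
        d1 = 1.0 / (K @ d2)
        d2 = 1.0 / (K.T @ d1)
    return d1[:, None] * K * d2[None, :]

def got_align(L1, L2, tau=0.2, lr=0.5, n_samples=10, n_steps=500):
    """Stochastic-exploration alignment in the spirit of Algorithm 1
    (illustrative hyper-parameter values, not the paper's)."""
    N = L1.shape[0]
    S1, S2 = torch.linalg.pinv(L1), torch.linalg.pinv(L2)
    S1_half = sqrtm_psd(S1)
    mu = torch.randn(N, N, requires_grad=True)         # mean parameter
    log_sigma = torch.zeros(N, N, requires_grad=True)  # log std-dev parameter
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = 0.0
        for _ in range(n_samples):
            z = torch.randn(N, N)                      # reparameterization, eq. (16)
            P = sinkhorn(mu + torch.exp(log_sigma) * z, tau)
            cross = sqrtm_psd(S1_half @ P.T @ S2 @ P @ S1_half)
            # cost of eq. (14), Monte Carlo averaged as in eq. (18)
            loss = loss + (torch.trace(S1 + S2) - 2.0 * torch.trace(cross)) / n_samples
        loss.backward()
        opt.step()
    return sinkhorn(mu.detach(), tau)  # approximate permutation matrix
```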

4 Experimental results

We illustrate the behaviour of our approach, named GOT, in terms of both distance metric computation and transportation map inference. We show how, thanks to the ability of our distance metric to capture structural properties, it is beneficial for computing alignments between structured graphs even when they are very different. For similar reasons, the metric is able to properly separate instances of random graphs according to their original model. Finally, we show illustrations of the use of transportation plans for signal prediction on simple image classes.

Alignment of structured graphs

Figure 4: Alignment and community detection performance for distorted stochastic block model graphs as a function of the edge removal probability. The first three plots show different error measures (the closer to 0, the better); the last one shows the community detection performance in terms of Normalized Mutual Information (NMI; the closer to 1, the better).

We generate a stochastic block model graph with 40 nodes and 4 communities. A noisy version of this graph is created by randomly removing edges within communities with a fixed probability, and edges between communities with increasing probabilities. We then generate a random permutation to change the order of nodes in the noisy graph. We investigate the influence of the distance metric on alignment recovery, comparing three different methods for graph alignment: the proposed method based on the suggested Wasserstein distance between graphs (GOT), the proposed stochastic algorithm with the Euclidean distance (L2), and the state-of-the-art Gromov-Wasserstein distance for graphs (GW) (Peyré et al., 2016; Vayer et al., 2018), based on the Euclidean distance between shortest-path matrices, as proposed in Vayer et al. (2018). We repeat each experiment 50 times, after adjusting the parameters of all compared methods, and show the results in Figure 4.

Apart from analysing the distance between the aligned graphs with all three error measures, we also evaluate the structural recovery of these community-based models by inspecting the normalized mutual information (NMI) for community detection. While GW slightly outperforms GOT in terms of its own error measure, GOT clearly performs better in terms of all other inspected metrics. In particular, the last plot shows that the structural information is well captured by GOT, and communities are successfully recovered even when the graphs contain a large number of perturbations.

Graph classification

We tackle the task of graph classification on random graph models. We create 100 graphs following five different models (20 per model), namely the Stochastic Block Model (Holland et al., 1983) with 2 blocks (SBM2), the Stochastic Block Model with 3 blocks (SBM3), the random regular graph (RG) (Steger and Wormald, 1999), the Barabási-Albert model (BA) (Barabási and Albert, 1999), and the Watts-Strogatz model (WS) (Watts and Strogatz, 1998). All graphs have 20 nodes and a similar number of edges to make the task more meaningful, and are randomly permuted. We use GOT to align graphs, and then use a simple non-parametric 1-NN algorithm to classify them. We compare against several methods for graph alignment: GW (Peyré et al., 2016; Vayer et al., 2018), FGM (Zhou and De la Torre, 2013), IPFP (Leordeanu et al., 2009) and RRWM (Cho et al., 2010). We present the results in terms of confusion matrices in Figure 5, accompanied by their accuracy scores. GOT clearly outperforms the other methods in terms of overall accuracy, with GW and RRWM also performing well, but having more difficulties with the SBMs and the WS model. This once again suggests that GOT is able to capture the structural information of graphs.

Figure 5: Confusion matrices for 1-NN classification results on random graph models. Rows represent actual classes, while columns are predicted classes: SBM2, SBM3, RG, BA, WS respectively.

Graph signal transportation

Finally, we look at the relevance of the transportation plans produced by GOT in illustrative experiments with simple images. We use the MNIST dataset, which contains tens of thousands of images of size 28×28 displaying handwritten digits from 0 to 9, split roughly evenly across classes. For each class, we stack all the available images into a feature matrix, so that each pixel is described by its values across the images of that class, and we build a 20-nearest-neighbour graph over the resulting feature vectors. Hence, each class of digits is represented by a graph of 784 nodes (i.e., image pixels), yielding ten aligned graphs $G_0, \ldots, G_9$.
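As an illustration of this construction, the per-class pixel graph could be built as follows; this is a sketch with scikit-learn, and the symmetrization and binary weights are our simplifications:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def pixel_graph(images, k=20):
    """Build a k-NN graph over pixels for one digit class.

    `images` is an (n_images, 784) array; each of the 784 pixels becomes
    a node, described by its intensity across all images of the class."""
    feats = images.T                                       # (784, n_images)
    A = kneighbors_graph(feats, n_neighbors=k, mode='connectivity')
    W = np.asarray((A + A.T).todense() > 0, dtype=float)   # symmetrized adjacency
    L = np.diag(W.sum(axis=1)) - W                         # Laplacian, eq. (2)
    return W, L
```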

Each image of a given class can be seen as a smooth signal that lives on the corresponding graph. A transportation plan is then constructed between the source graph (e.g., $G_0$) and all other graphs ($G_1, \ldots, G_9$). Figure 6 shows two original "zero" digits with different inclinations, transported to the graphs of all other digits. We can see that the predicted digits are recognisable, because they are adapted to their corresponding graphs, while they further keep the similarity with the original digit in terms of inclination.

We repeated the same experiment on Fashion MNIST, and report the results in Figure 6. By transporting a "Shirt" image to the graphs of the classes "T-shirt", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Sneaker", "Bag", and "Ankle boot", we observe that the predicted images are still recognisable with a good degree of fidelity. Furthermore, the white shirt translates to white clothing items, while the textured shirt leads to textured items. This experiment confirms the potential of GOT in graph signal prediction through the adaptation of a graph signal to another graph.

Figure 6: First two rows: Original “zero” digits in MNIST dataset, and their images transported to graphs of different digits. The transported digits in each row follow the inclination of the original zero digit. Last two rows: Original “Shirt” images in Fashion MNIST dataset, and their images transported to the graphs of other classes (“T-shirt”, “Trouser”, “Pullover”, “Dress”, “Coat”, “Sandal”, “Sneaker”,“Bag”, “Ankle boot”).

5 Conclusion

We presented an optimal transport based approach for computing the distance between two graphs and the associated transportation plan. Equipped with this distance, we formulated the problem of finding the permutation between two unaligned graphs, and we proposed to solve it with a novel stochastic gradient descent algorithm. We evaluated the proposed approach in the context of graph alignment, graph classification, and graph signal transportation. Our experiments confirmed that GOT can efficiently capture the structural information of graphs, and the proposed transportation plan leads to promising results for the transfer of signals from one graph to another.

References

  • Jovanović and Stanić (2012) I. Jovanović and Z. Stanić. Spectral distances of graphs. Linear Algebra and its Applications, 436(5):1425–1435, 2012.
  • Gera et al. (2018) R. Gera, L. Alonso, B. Crawford, J. House, J. A. Mendez-Bermudez, T. Knuth, and R. Miller. Identifying network structure similarity using spectral graph theory. Applied Network Science, 3(1):2, January 2018.
  • Monge (1781) G. Monge. Mémoire sur la théorie des déblais et des remblais. De l'Imprimerie Royale, 1781.
  • Kantorovich (1942) L. Kantorovich. On the transfer of masses. Doklady Akademii Nauk USSR, pages 227–229, 1942.
  • Villani (2008) C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
  • Peyré and Cuturi (2018) G. Peyré and M. Cuturi. Computational optimal transport. Preprint arXiv:1803.00567, 2018.
  • Yan et al. (2016) J. Yan, X. Yin, W. Lin, C. Deng, H. Zha, and X. Yang. A short survey of recent advances in graph matching. In International Conference on Multimedia Retrieval, pages 167–174, New York, NY, USA, 2016. ACM.
  • Jiang et al. (2017) B. Jiang, J. Tang, C. Ding, Y. Gong, and B. Luo. Graph matching via multiplicative update algorithm. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 3187–3195. Curran Associates, Inc., 2017.
  • Caelli and Kosinov (2004) T. Caelli and S. Kosinov. An eigenspace projection clustering method for inexact graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(4):515–519, 2004.
  • Srinivasan et al. (2007) P. Srinivasan, T. Cour, and J. Shi. Balanced graph matching. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, pages 313–320. MIT Press, 2007.
  • Schellewald and Schnörr (2005) C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In A. Rangarajan, B. Vemuri, and A. L. Yuille, editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 171–186, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
  • Cho et al. (2010) M. Cho, J. Lee, and K. M. Lee. Reweighted random walks for graph matching. In European conference on Computer vision, pages 492–505. Springer, 2010.
  • Zhou and Torre (2016) F. Zhou and F. D. Torre. Factorized graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1774–1789, Sep. 2016.
  • Yu et al. (2018) T. Yu, J. Yan, Y. Wang, W. Liu, and B. Li. Generalizing graph matching beyond quadratic assignment model. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 853–863. Curran Associates, Inc., 2018.
  • Mena et al. (2018) G. Mena, D. Belanger, S. Linderman, and J. Snoek. Learning latent permutations with gumbel-sinkhorn networks. In International Conference on Learning Representations, 2018.
  • Emami and Ranka (2018) P. Emami and S. Ranka. Learning permutations with sinkhorn policy gradient. Preprint arXiv:1805.07010, 2018.
  • Flamary et al. (2014) R. Flamary, N. Courty, A. Rakotomamonjy, and D. Tuia. Optimal transport with Laplacian regularization. In NIPS 2014, Workshop on Optimal Transport and Machine Learning, Montréal, Canada, December 2014.
  • Ferradans et al. (2013) S. Ferradans, N. Papadakis, J. Rabin, G. Peyré, and J.-F. Aujol. Regularized discrete optimal transport. In A. Kuijper, K. Bredies, T. Pock, and H. Bischof, editors, Scale Space and Variational Methods in Computer Vision, pages 428–439, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
  • Gu et al. (2015) J. Gu, B. Hua, and S. Liu. Spectral distances on graphs. Discrete Applied Mathematics, 190–191:56–74, 2015.
  • Nikolentzos et al. (2017) G. Nikolentzos, P. Meladianos, and M. Vazirgiannis. Matching node embeddings for graph similarity. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • Mémoli (2011) F. Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.
  • Peyré et al. (2016) G. Peyré, M. Cuturi, and J. Solomon. Gromov-Wasserstein averaging of kernel and distance matrices. In M. F. Balcan and K. Q. Weinberger, editors, International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2664–2672, New York, NY, USA, 2016.
  • Cuturi (2013) M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, pages 2292–2300. Curran Associates, Inc., 2013.
  • Alvarez-Melis and Jaakkola (2018) D. Alvarez-Melis and T. S. Jaakkola. Gromov-Wasserstein alignment of word embedding spaces. Preprint arXiv:1809.00013, 2018.
  • Vayer et al. (2018) T. Vayer, L. Chapel, R. Flamary, R. Tavenard, and N. Courty. Optimal transport for structured data. Preprint arXiv:1805.09114, 2018.
  • Rue and Held (2005) H. Rue and L. Held. Gaussian Markov random fields: theory and applications. Chapman and Hall/CRC, 2005.
  • Dong et al. (2016) X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing, 64(23):6160–6173, 2016.
  • Dempster (1972) A. P. Dempster. Covariance selection. Biometrics, pages 157–175, 1972.
  • Friedman et al. (2008) J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
  • Dong et al. (2018) X. Dong, D. Thanou, M. Rabbat, and P. Frossard. Learning graphs from data: A signal representation perspective. Preprint arXiv:1806.00848, 2018.
  • Shahid et al. (2015) N. Shahid, V. Kalofolias, X. Bresson, M. Bronstein, and P. Vandergheynst. Robust principal component analysis on graphs. In Proceedings of the IEEE International Conference on Computer Vision, pages 2812–2820, 2015.
  • Zhu et al. (2003) X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In International Conference on Machine Learning, pages 912–919, 2003.
  • Takatsu (2011) A. Takatsu. Wasserstein geometry of Gaussian measures. Osaka Journal of Mathematics, 48(4):1005–1026, 2011.
  • Sinkhorn (1964) R. Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. The Annals of Mathematical Statistics, 35(2):876–879, 1964.
  • Genevay et al. (2018) A. Genevay, G. Peyré, and M. Cuturi. Learning generative models with Sinkhorn divergences. In A. Storkey and F. Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 1608–1617, 2018.
  • Luise et al. (2018) G. Luise, A. Rudi, M. Pontil, and C. Ciliberto. Differential properties of sinkhorn approximation for learning with wasserstein distance. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 5859–5870. 2018.
  • Kingma and Welling (2014) D. P. Kingma and M. Welling. Auto-encoding variational Bayes. Preprint arXiv:1312.6114, 2014.
  • Figurnov et al. (2018) M. Figurnov, S. Mohamed, and A. Mnih. Implicit reparameterization gradients. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 441–452. Curran Associates, Inc., 2018.
  • Khan et al. (2017) M. E. Khan, W. Lin, V. Tangkaratt, Z. Liu, and D. Nielsen. Variational adaptive-newton method for explorative learning. Preprint arXiv:1711.05560, 2017.
  • Holland et al. (1983) P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
  • Steger and Wormald (1999) A. Steger and N. C. Wormald. Generating random regular graphs quickly. Combinatorics, Probability and Computing, 8(4):377–396, 1999.
  • Barabási and Albert (1999) A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
  • Watts and Strogatz (1998) D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440–442, June 1998.
  • Zhou and De la Torre (2013) F. Zhou and F. De la Torre. Deformable graph matching. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2922–2929, June 2013.
  • Leordeanu et al. (2009) M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and map inference. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1114–1122. Curran Associates, Inc., 2009.