1 Introduction
With the rapid digitisation of many domains, the volume of data grows quickly, and much of it takes the form of structured data. Such information is often represented by graphs, which capture potentially complex structures. It remains challenging, however, to analyse, classify, or predict graph data, due to the lack of efficient measures for comparing graphs. In particular, the mere comparison of graph matrices is not necessarily a meaningful distance, as different edges can have very different importance in the graph. Spectral distances have also been proposed (Jovanović and Stanić, 2012; Gera et al., 2018), but they usually do not take into account all the information provided by the graphs, focusing only on the Laplacian matrix eigenvalues and ignoring a large portion of the structure encoded in the eigenvectors. In addition to the lack of effective distances, a major difficulty with graph representations is that their nodes may not be aligned, which further complicates graph comparisons.
In this paper, we propose a new framework for graph comparison, which permits the computation of both the distance between two graphs under unknown permutations and the transportation plan for data from one graph to another. Instead of comparing graph matrices directly, we propose to look at the smooth graph signal distributions associated with each graph, and to relate the distance between graphs to the distance between these signal distributions. We resort to optimal transport for computing the Wasserstein distance between distributions, as well as the corresponding transportation plan. Optimal transport (OT) was introduced by Monge (1781), and reformulated in a more tractable way by Kantorovich (1942). It has been a topic of great interest from both theoretical and practical points of view (Villani, 2008), and has recently been largely revisited with new applications in image processing, data analysis, and machine learning (Peyré and Cuturi, 2018). Interestingly, the Wasserstein distance takes a closed-form expression in our setting, which essentially depends on the Laplacian matrices of the graphs under comparison. We further show that the Wasserstein distance has the important advantage of capturing the main structural information of the graphs. Equipped with this distance, we formulate a new graph alignment problem for finding the permutation that minimises the mass transportation between a "fixed" distribution and a "permuted" distribution. This yields a non-convex optimization problem that we solve efficiently with a novel stochastic gradient descent algorithm. The resulting method efficiently aligns and compares graphs, and it outputs a structurally meaningful distance as well as a transport plan. These are important elements in graph analysis, comparison, or graph signal prediction tasks. We finally illustrate the benefits of our new graph comparison framework in representative tasks such as noisy graph alignment, graph classification, and graph signal transfer. Our results show that the proposed distance outperforms both the Gromov-Wasserstein and the Euclidean distance for graph alignment and graph clustering. In addition, we show the use of transport plans to predict graph signals. To the best of our knowledge, this is the only framework for graph comparison that includes the possibility of adapting graph signals to another graph.
In the literature, many methods have formulated graph matching as a quadratic assignment problem (Yan et al., 2016; Jiang et al., 2017), under the constraint that the solution is a permutation matrix. As this is an NP-hard problem, different relaxations have been proposed to find approximate solutions. In this context, spectral relaxation (Caelli and Kosinov, 2004; Srinivasan et al., 2007) emerged as a simple approach, which consists of finding the orthogonal matrix whose squared entries sum to one, but its drawback is that the matching accuracy is suboptimal. To improve on this behaviour, a semidefinite programming relaxation was adopted to tackle the graph matching problem by relaxing the non-convex constraint into a semidefinite one (Schellewald and Schnörr, 2005). Based on the fact that the space of doubly stochastic matrices is the convex hull of the permutation matrices, the graph matching problem was relaxed into a non-convex quadratic problem in Cho et al. (2010); Zhou and Torre (2016). A related approach was recently proposed to approximate discrete graph matching in the continuous domain asymptotically by using separable functions (Yu et al., 2018). Along similar lines, a Gumbel-Sinkhorn network was proposed to infer permutations from data (Mena et al., 2018; Emami and Ranka, 2018). The approach consists of producing a discrete permutation from a continuous doubly stochastic matrix obtained with the Sinkhorn operator.
Closer to our framework, some recent works have studied the graph alignment problem from an optimal transport perspective. For example, Flamary et al. (2014) propose a method based on optimal transport for empirical distributions with a graph-based regularization. The objective of this work is to compute an optimal transportation plan by controlling the displacement of pairs of points. Graph-based regularization encodes neighbourhood similarity between samples on either the final position of the transported samples or their displacement (Ferradans et al., 2013). Gu et al. (2015) define a spectral distance by assigning a probability measure to the nodes via the spectrum representation of each graph, and by using Wasserstein distances between probability measures. This approach, however, does not take into account the full graph structure in the alignment problem. Nikolentzos et al. (2017) proposed instead to match graph embeddings, where the latter are represented as bags of vectors, and the Wasserstein distance is computed between them. The authors also propose a heuristic to take into account possible node labels or signals.
Another line of work has looked at more specific settings. Mémoli (2011) investigates the Gromov-Wasserstein distance for object matching, and Peyré et al. (2016) propose an efficient algorithm to compute the Gromov-Wasserstein distance and the barycenter of pairwise dissimilarity matrices. The algorithm uses entropic regularization and Sinkhorn projections, as proposed by Cuturi (2013). The work has many interesting applications, including multimedia with point-cloud averaging and matching, but also natural language processing with the alignment of word embedding spaces (Alvarez-Melis and Jaakkola, 2018). Vayer et al. (2018) build on this work and propose a distance for graphs and the signals living on them. The problem is posed as a combination of the Gromov-Wasserstein distance between graph distance matrices and the Wasserstein distance between graph signals. However, while the above methods solve the alignment problem using optimal transport, the simple distances between aligned graphs do not take into account their global structure, and the methods do not consider the transportation of signals between graphs. In this paper, we propose to resort to smooth graph signal distributions in order to compare graphs, and develop an effective algorithm to align graphs under a priori unknown permutations. The paper is organized as follows. Section 2 details graph alignment with optimal transport. Section 3 presents the algorithm for solving the proposed approach via a stochastic gradient technique. Section 4 provides an experimental validation of graph matching in the context of graph classification and graph signal transfer. Finally, the conclusion is given in Section 5.
2 Graph Alignment with Optimal Transport
Despite recent advances in the analysis of graph data, it remains challenging to define a meaningful distance between graphs. Moreover, a major difficulty with graph representations is the lack of node alignment, which prevents direct quantitative comparisons between graphs. In this section, we propose a new distance based on Optimal Transport (OT) to compare graphs through smooth graph signal distributions. Then, we use this distance to formulate a new graph alignment problem, which aims at finding the permutation matrix that minimizes the distance between graphs.
Preliminaries
We denote by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ a graph with a set of vertices $\mathcal{V}$ and a set of edges $\mathcal{E}$. The graph is assumed to be connected, undirected, and edge-weighted. The adjacency matrix is denoted by $W$. The degree $d(i)$ of a vertex $i$ is the sum of the weights of all the edges incident to $i$ in the graph $\mathcal{G}$. The degree matrix $D$ is then defined as:

$$D = \mathrm{diag}\big(d(1), \dots, d(n)\big). \qquad (1)$$

Based on $W$ and $D$, the Laplacian matrix of $\mathcal{G}$ is

$$L = D - W. \qquad (2)$$
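As a minimal numerical illustration of these definitions (a sketch using an assumed toy adjacency matrix, not part of the original paper):

```python
import numpy as np

# Assumed toy example: adjacency matrix of a small weighted, undirected graph.
W = np.array([[0., 1., 0.],
              [1., 0., 2.],
              [0., 2., 0.]])

# Degree matrix: diagonal of the node degrees (row sums of W), as in Eq. (1).
D = np.diag(W.sum(axis=1))

# Graph Laplacian, as in Eq. (2).
L = D - W
```

The rows of $L$ sum to zero, and $L$ is symmetric positive semidefinite for undirected graphs.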
Moreover, we consider additional attributes modelled as features on the graph vertices. Assuming that each node is associated with a scalar feature, the graph signal takes the form of a vector $x \in \mathbb{R}^n$, where $n$ is the number of vertices.
Smooth graph signals
Following (Rue and Held, 2005), we interpret graphs as key elements that drive the probability distributions of signals. Specifically, we consider two graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ with Laplacian matrices $L_1$ and $L_2$, and we consider signals that follow the normal distributions defined as¹ (Dong et al., 2016)

$$\nu_1 = \mathcal{N}\big(0,\, L_1^{\dagger}\big), \qquad (3)$$

$$\nu_2 = \mathcal{N}\big(0,\, L_2^{\dagger}\big). \qquad (4)$$

¹Note that $\dagger$ denotes the pseudo-inverse operator.
The above formulation means that the graph signal values vary slowly between strongly connected nodes (Dong et al., 2016). This assumption is verified for most common graph and network datasets. It is further used in many graph inference algorithms that implicitly represent a graph through its smooth signals (Dempster, 1972; Friedman et al., 2008; Dong et al., 2018). Furthermore, the smoothness assumption is used as a regularizer in many graph applications, such as robust principal component analysis (Shahid et al., 2015) and label propagation (Zhu et al., 2003).
Wasserstein distance between graphs
Instead of comparing graphs directly, we propose to look at the signal distributions, which are governed by the graphs. Specifically, we measure the dissimilarity between two graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ through the Wasserstein distance between the respective distributions $\nu_1$ and $\nu_2$. More precisely, the 2-Wasserstein distance corresponds to the minimal "effort" required to transport one probability measure to another with respect to the Euclidean norm (Monge, 1781), that is

$$\mathcal{W}_2^2(\nu_1, \nu_2) = \inf_{T \colon T\#\nu_1 = \nu_2} \int_{\mathbb{R}^n} \|x - T(x)\|^2 \, d\nu_1(x), \qquad (5)$$

where $T\#\nu_1$ denotes the push-forward of $\nu_1$ by the transport map $T$ defined on the metric space $(\mathbb{R}^n, \|\cdot\|)$. For normal distributions such as $\nu_1$ and $\nu_2$, the 2-Wasserstein distance can be explicitly written in terms of their covariance matrices (Takatsu, 2011), yielding

$$\mathcal{W}_2^2(\nu_1, \nu_2) = \mathrm{Tr}\big(L_1^{\dagger} + L_2^{\dagger}\big) - 2\,\mathrm{Tr}\Big(\big(L_1^{\dagger/2}\, L_2^{\dagger}\, L_1^{\dagger/2}\big)^{1/2}\Big), \qquad (6)$$

and the optimal transportation plan is $T(x) = L_1^{1/2}\big(L_1^{\dagger/2}\, L_2^{\dagger}\, L_1^{\dagger/2}\big)^{1/2} L_1^{1/2}\, x$.
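The closed form in Eq. (6) can be sketched numerically as follows. This is an illustrative implementation, not the authors' code: `graph_w2` and `psd_sqrt` are hypothetical helper names, and the matrix square roots are computed through a numpy eigendecomposition of the (symmetric, positive semidefinite) covariances.

```python
import numpy as np

def psd_sqrt(A):
    """Principal square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh((A + A.T) / 2)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def graph_w2(L1, L2):
    """2-Wasserstein distance between N(0, L1^+) and N(0, L2^+) for two
    aligned graph Laplacians, following the Gaussian closed form of Eq. (6)."""
    S1, S2 = np.linalg.pinv(L1), np.linalg.pinv(L2)   # covariances L^+
    R1 = psd_sqrt(S1)                                  # L1^{+/2}
    cross = psd_sqrt(R1 @ S2 @ R1)                     # (L1^{+/2} L2^+ L1^{+/2})^{1/2}
    w2_sq = np.trace(S1 + S2) - 2 * np.trace(cross)
    return np.sqrt(max(w2_sq, 0.0))
```

As expected for a distance, the value for a graph compared with itself is zero, and the expression is symmetric in the two graphs.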
The Wasserstein distance captures the structural information of the graphs under comparison. Namely, it is sensitive to differences that cause a global change in the connections between graph components, while it gives less importance to differences that have a small impact on the whole graph structure. Intuitively, this is a direct consequence of the definition of the Wasserstein distance: larger changes to the expected behaviour of smooth signals result in a larger graph distance. This behaviour is illustrated in Figure 1 by a comparison with a simple distance, namely the Euclidean norm between the Laplacian matrices of the graphs.
The optimal transportation plan enables the movement of signals from one graph to another. Namely, this continuous Lipschitz mapping adapts a graph signal to the distribution of another graph, while preserving similarity. This results in simple but efficient prediction of the signal on another graph.
Note that in our setting a possible alternative to the Wasserstein distance could be the Kullback-Leibler divergence, whose expression is also explicit for normal distributions. However, the KL divergence goes to infinity when the covariance matrices are singular. Furthermore, the OT framework also produces a transportation plan between distributions, which can prove beneficial in some graph analysis tasks.
Graph alignment
Equipped with a measure to compare graphs through signal distributions, we now propose a new formulation of the graph alignment problem. It is important to note that the graph signal distributions depend on the enumeration of nodes chosen to build $L_1$ and $L_2$. While in some cases (e.g., dynamically changing graphs, multilayer graphs, etc.) a consistent enumeration can be trivially chosen for all graphs, in general one faces the challenging problem of estimating an a priori unknown permutation between graphs. In our approach, we are given two connected graphs $\mathcal{G}_1$ and $\mathcal{G}_2$, each with $n$ distinct vertices and with different sets of edges. We aim at finding the optimal transportation plan from $\mathcal{G}_1$ to $\mathcal{G}_2$. Furthermore, in order to take all possible enumerations into account, we define the probability measure of a permuted representation for the graph $\mathcal{G}_2$ as

$$\nu_2^{P} = \mathcal{N}\big(0,\, P^{\top} L_2^{\dagger} P\big), \qquad (7)$$

where $P$ is a permutation matrix. Consequently, our graph alignment problem consists in finding the permutation that minimizes the mass transportation between $\nu_1$ and $\nu_2^{P}$, which reads

$$\min_{P \in \mathcal{P}} \; \mathcal{W}_2^2\big(\nu_1, \nu_2^{P}\big), \qquad (8)$$

where $\mathcal{P} = \{P \in \{0,1\}^{n \times n} : P\mathbf{1} = \mathbf{1},\; P^{\top}\mathbf{1} = \mathbf{1}\}$, so that $P P^{\top} = I$, with $I$ the identity matrix. According to (3), (6), (7), the above distance boils down to

$$\mathcal{W}_2^2\big(\nu_1, \nu_2^{P}\big) = \mathrm{Tr}\big(L_1^{\dagger} + L_2^{\dagger}\big) - 2\,\mathrm{Tr}\Big(\big(L_1^{\dagger/2}\, P^{\top} L_2^{\dagger} P\, L_1^{\dagger/2}\big)^{1/2}\Big). \qquad (9)$$
The optimal permutation allows us to compare $\mathcal{G}_1$ and $\mathcal{G}_2$ when a consistent enumeration of nodes is not available. This is, however, a non-convex optimization problem that cannot be easily solved with standard tools. In the next section, we present an efficient algorithm to tackle this problem.
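For a fixed candidate permutation, the objective of Eq. (9) can be evaluated directly. The sketch below uses a hypothetical `alignment_cost` helper and assumes the permuted covariance $P^{\top} L_2^{\dagger} P$; it illustrates that aligning a graph with itself under the identity attains cost zero, while a wrong permutation does not.

```python
import numpy as np

def psd_sqrt(A):
    """Principal square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh((A + A.T) / 2)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def alignment_cost(L1, L2, P):
    """Squared 2-Wasserstein cost between nu_1 and the permuted nu_2^P,
    evaluated via the Gaussian closed form (a sketch of Eq. (9))."""
    S1 = np.linalg.pinv(L1)
    S2p = P.T @ np.linalg.pinv(L2) @ P    # permuted covariance
    R1 = psd_sqrt(S1)
    return np.trace(S1 + S2p) - 2 * np.trace(psd_sqrt(R1 @ S2p @ R1))
```

Minimizing this cost over all permutations is exactly the combinatorial problem (8) that the next section relaxes.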
3 GOT Algorithm
We propose to solve the OT-based graph alignment problem described in the previous section via stochastic gradient descent. The procedure is summarized in Algorithm 1, and its derivation is presented in the remainder of this section.
Optimization
The main difficulty in solving Problem (8) arises from the constraint that $P$ is a permutation matrix, since it leads to a discrete optimization problem with a factorial number of feasible solutions. We propose to circumvent this issue through an implicit constraint reformulation. The idea is that the constraints in (8) can be enforced implicitly by using the Sinkhorn operator (Sinkhorn, 1964; Cuturi, 2013; Genevay et al., 2018; Mena et al., 2018). Given a square matrix $M$ (not necessarily a permutation) and a small constant $\tau > 0$, the Sinkhorn operator normalizes the rows and columns of $\exp(M/\tau)$ via multiplication by two diagonal matrices $D_1$ and $D_2$, yielding²

$$S_\tau(M) = \lim_{k \to \infty} D_1^{(k)} \exp(M/\tau)\, D_2^{(k)}. \qquad (10)$$

The diagonal matrices $D_1^{(k)}$ and $D_2^{(k)}$ are computed iteratively as follows:

$$u^{(k+1)} = \mathbf{1} \oslash \big(\exp(M/\tau)\, v^{(k)}\big), \qquad (11)$$

$$v^{(k+1)} = \mathbf{1} \oslash \big(\exp(M/\tau)^{\top} u^{(k+1)}\big), \qquad (12)$$

$$D_1^{(k)} = \mathrm{diag}\big(u^{(k)}\big), \qquad D_2^{(k)} = \mathrm{diag}\big(v^{(k)}\big), \qquad (13)$$

with $v^{(0)} = \mathbf{1}$ and $\oslash$ denoting elementwise division. It can be shown (Mena et al., 2018) that the operator $S_\tau$ yields a permutation matrix in the limit $\tau \to 0$. Consequently, with a slight abuse of notation (as $P$ no longer denotes a permutation), we can rewrite Problem (8) as follows

²Note that $\exp(\cdot)$ is applied elementwise to ensure the positivity of the matrix entries.
$$\min_{P \in \mathbb{R}^{n \times n}} \; \mathcal{W}_2^2\big(\nu_1, \nu_2^{S_\tau(P)}\big). \qquad (14)$$
The above cost function is differentiable (Luise et al., 2018), and can thus be optimized by gradient descent. An illustrative example of a solution of the proposed approach is presented in Fig. 2.
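The Sinkhorn normalization of Eqs. (10)–(13) can be sketched as follows; this is a simple numpy version with assumed values of $\tau$ and the iteration count, not the authors' implementation:

```python
import numpy as np

def sinkhorn(M, tau=0.5, n_iter=300):
    """Approximate Sinkhorn operator: alternately rescale the rows and
    columns of exp(M / tau). As tau decreases, the output approaches a
    permutation matrix."""
    K = np.exp((M - M.max()) / tau)          # elementwise exp, shifted for stability
    for _ in range(n_iter):
        K /= K.sum(axis=1, keepdims=True)    # row normalization
        K /= K.sum(axis=0, keepdims=True)    # column normalization
    return K
```

The result is (approximately) doubly stochastic: all entries are non-negative and every row and column sums to one.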
Stochastic exploration
Problem (14) is highly non-convex, which may cause gradient descent to converge towards a local minimum. Hence, instead of directly optimizing the cost function $w(P)$ in (14), we can optimize its expectation w.r.t. the parameters $\theta$ of some distribution $q_\theta$, yielding

$$\min_{\theta} \; \mathbb{E}_{P \sim q_\theta}\big[w(P)\big]. \qquad (15)$$
The optimization of the expectation w.r.t. the parameters $\theta$ aims at shaping the distribution so as to put all its mass on a minimizer of the original cost function, thus integrating Bayesian exploration into the optimization process.
A standard choice for $q_\theta$ in continuous optimization is the multivariate normal distribution, thus leading to $q_\theta = \mathcal{N}(\mu, \sigma^2)$ and $\theta = \{\mu, \sigma\}$. By leveraging the reparameterization trick (Kingma and Welling, 2014; Figurnov et al., 2018), which boils down to the equivalence

$$P \sim \mathcal{N}(\mu, \sigma^2) \;\Longleftrightarrow\; P = \mu + \sigma \odot \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I), \qquad (16)$$

the above problem can be reformulated as³

$$\min_{\mu, \sigma} \; \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I)}\big[w(\mu + \sigma \odot \varepsilon)\big], \qquad (17)$$

where $\mathcal{N}(0, I)$ denotes the multivariate normal distribution with zero mean and unitary variance. The advantage of this reformulation is that the gradient of the above stochastic function can be approximated by sampling from the parameterless distribution $\mathcal{N}(0, I)$, yielding

$$\nabla_{\mu,\sigma}\, \mathbb{E}_{\varepsilon}\big[w(\mu + \sigma \odot \varepsilon)\big] \;\approx\; \frac{1}{S} \sum_{s=1}^{S} \nabla_{\mu,\sigma}\, w(\mu + \sigma \odot \varepsilon_s), \quad \varepsilon_s \sim \mathcal{N}(0, I). \qquad (18)$$

³Note that $\odot$ is the entrywise (Hadamard) product between matrices.
The problem can thus be solved by stochastic gradient descent (Khan et al., 2017). An illustrative application of this approach to a simple one-dimensional non-convex function is presented in Fig. 3.
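To give a concrete flavour of this exploration mechanism on a one-dimensional toy problem (in the spirit of Fig. 3), the sketch below applies the reparameterized updates of Eqs. (16)–(18) to an assumed non-convex cost. The cost function, its derivative, and all hyperparameters are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def w(x):
    # Assumed toy non-convex cost with two local minima (near x ~ 1.6 and x ~ -1.8).
    return 0.05 * x**4 - 0.3 * x**2 + 0.1 * x

def dw(x):
    # Analytic derivative of the toy cost.
    return 0.2 * x**3 - 0.6 * x + 0.1

mu, sigma = 3.0, 2.0          # initial mean and exploration width
lr, S = 0.05, 64              # step size and number of Monte Carlo samples
for _ in range(500):
    eps = rng.standard_normal(S)          # eps ~ N(0, 1), as in Eq. (16)
    x = mu + sigma * eps                  # reparameterized samples
    g = dw(x)                             # pathwise sample gradients, as in Eq. (18)
    mu -= lr * g.mean()                   # update the mean
    sigma = max(1e-3, sigma - lr * (g * eps).mean())  # update the exploration width
```

As the iterations proceed, sigma shrinks and the search distribution concentrates its mass on a minimizer of w, which is exactly the behaviour targeted by Eq. (15).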
4 Experimental results
We illustrate the behaviour of our approach, named GOT, in terms of both distance metric computation and transportation map inference. We show how, owing to the ability of our distance metric to capture structural properties, it can be beneficial in computing the alignment between structured graphs even when they are very different. For similar reasons, the metric is able to properly separate instances of random graphs according to their original model. Finally, we illustrate the use of transportation plans for signal prediction on simple image classes.
Alignment of structured graphs
We generate a stochastic block model graph with 40 nodes and 4 communities. A noisy version of this graph is created by randomly removing edges within communities with a fixed probability, and edges between communities with increasing probabilities. We then generate a random permutation to change the order of nodes in the noisy graph. We investigate the influence of the distance metric on alignment recovery. We compare three different methods for graph alignment: the proposed method based on the suggested Wasserstein distance between graphs (GOT), the proposed stochastic algorithm with the Euclidean distance (L2), and the state-of-the-art Gromov-Wasserstein distance for graphs (GW) (Peyré et al., 2016; Vayer et al., 2018), based on the Euclidean distance between shortest-path matrices, as proposed in Vayer et al. (2018). We repeat each experiment 50 times, after adjusting the parameters of all compared methods, and show the results in Figure 4.
Apart from analysing the distance between aligned graphs with all three error measures, we also evaluate the structural recovery of these communitybased models by inspecting the normalized mutual information (NMI) for community detection. While GW slightly outperforms GOT in terms of its own error measure, GOT clearly performs better in terms of all other inspected metrics. In particular, the last plot shows that the structural information is well captured in GOT, and communities are successfully recovered even when the graphs contain a large amount of introduced perturbations.
Graph classification
We tackle the task of graph classification on random graph models. We create 100 graphs following five different models (20 per model), namely the Stochastic Block Model with 2 blocks (SBM2) (Holland et al., 1983), the Stochastic Block Model with 3 blocks (SBM3), the random regular graph (RG) (Steger and Wormald, 1999), the Barabási-Albert model (BA) (Barabási and Albert, 1999), and the Watts-Strogatz model (WS) (Watts and Strogatz, 1998). All graphs have 20 nodes and a similar number of edges to make the task more meaningful, and are randomly permuted. We use GOT to align the graphs, and then use a simple non-parametric 1-NN algorithm to classify them. We compare against several methods for graph alignment: GW (Peyré et al., 2016; Vayer et al., 2018), FGM (Zhou and De la Torre, 2013), IPFP (Leordeanu et al., 2009), and RRWM (Cho et al., 2010). We present the results as confusion matrices in Figure 5, accompanied by their accuracy scores. GOT clearly outperforms the other methods in terms of overall accuracy, with GW and RRWM also performing well but having more difficulty with the SBMs and the WS model. This once again suggests that GOT is able to capture the structural information of graphs.
Graph signal transportation
Finally, we look at the relevance of the transportation plans produced by GOT in illustrative experiments with simple images. We use the MNIST dataset, which contains images of size 28×28 displaying handwritten digits from 0 to 9. For each class, we stack the available images into a feature matrix, and we build a 20-nearest-neighbour graph over the resulting feature vectors. Hence, each class of digits is represented by a graph whose nodes are the image pixels, yielding ten aligned graphs $\mathcal{G}_0, \dots, \mathcal{G}_9$.
Each image of a given class can be seen as a smooth signal living on the corresponding graph. A transportation plan is then constructed between the source graph (e.g., $\mathcal{G}_0$) and all the other graphs ($\mathcal{G}_1, \dots, \mathcal{G}_9$). Figure 6 shows two original "zero" digits with different inclinations, transported to the graphs of all other digits. We can see that the predicted digits are recognisable, because they are adapted to their corresponding graphs, while keeping the similarity with the original digit in terms of inclination.
We repeated the same experiment on Fashion-MNIST, with the results also reported in Figure 6. By transporting a "Shirt" image to the graphs of the classes "T-shirt", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Sneaker", "Bag", and "Ankle boot", we observe that the predicted images are still recognisable with a good degree of fidelity. Furthermore, the white shirt translates to white clothing items, while the textured shirt leads to textured items. This experiment confirms the potential of GOT for graph signal prediction through the adaptation of a graph signal to another graph.
5 Conclusion
We presented an optimal transport based approach for computing the distance between two graphs and the associated transportation plan. Equipped with this distance, we formulated the problem of finding the permutation between two unaligned graphs, and we proposed to solve it with a novel stochastic gradient descent algorithm. We evaluated the proposed approach in the context of graph alignment, graph classification, and graph signal transportation. Our experiments confirmed that GOT can efficiently capture the structural information of graphs, and the proposed transportation plan leads to promising results for the transfer of signals from one graph to another.
References
 Jovanović and Stanić (2012) I. Jovanović and Z. Stanić. Spectral distances of graphs. Linear Algebra and its Applications, 436(5):1425 – 1435, 2012.
 Gera et al. (2018) R. Gera, L. Alonso, B. Crawford, J. House, J. A. MendezBermudez, T. Knuth, and R. Miller. Identifying network structure similarity using spectral graph theory. Applied Network Science, 3(1):2, January 2018.
 Monge (1781) G. Monge. Mémoire sur la théorie des déblais et des remblais. De l'Imprimerie Royale, 1781.
 Kantorovich (1942) L. Kantorovich. On the transfer of masses. Doklady Akademii Nauk USSR, pages 227–229, 1942.
 Villani (2008) C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
 Peyré and Cuturi (2018) G. Peyré and M. Cuturi. Computational optimal transport. Preprint arXiv:1803.00567, 2018.
 Yan et al. (2016) J. Yan, X. Yin, W. Lin, C. Deng, H. Zha, and X. Yang. A short survey of recent advances in graph matching. In International Conference on Multimedia Retrieval, pages 167–174, New York, NY, USA, 2016. ACM.
 Jiang et al. (2017) B. Jiang, J. Tang, C. Ding, Y. Gong, and B. Luo. Graph matching via multiplicative update algorithm. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 3187–3195. Curran Associates, Inc., 2017.

 Caelli and Kosinov (2004) T. Caelli and S. Kosinov. An eigenspace projection clustering method for inexact graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(4):515–519, 2004.
 Srinivasan et al. (2007) P. Srinivasan, T. Cour, and J. Shi. Balanced graph matching. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, pages 313–320. MIT Press, 2007.

 Schellewald and Schnörr (2005) C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In A. Rangarajan, B. Vemuri, and A. L. Yuille, editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 171–186, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
 Cho et al. (2010) M. Cho, J. Lee, and K. M. Lee. Reweighted random walks for graph matching. In European Conference on Computer Vision, pages 492–505. Springer, 2010.
 Zhou and Torre (2016) F. Zhou and F. D. Torre. Factorized graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1774–1789, Sep. 2016.
 Yu et al. (2018) T. Yu, J. Yan, Y. Wang, W. Liu, and B. Li. Generalizing graph matching beyond quadratic assignment model. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. CesaBianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 853–863. Curran Associates, Inc., 2018.
 Mena et al. (2018) G. Mena, D. Belanger, S. Linderman, and J. Snoek. Learning latent permutations with Gumbel-Sinkhorn networks. In International Conference on Learning Representations, 2018.
 Emami and Ranka (2018) P. Emami and S. Ranka. Learning permutations with sinkhorn policy gradient. Preprint arXiv:1805.07010, 2018.
 Flamary et al. (2014) R. Flamary, N. Courty, A. Rakotomamonjy, and D. Tuia. Optimal transport with Laplacian regularization. In NIPS 2014, Workshop on Optimal Transport and Machine Learning, Montréal, Canada, December 2014.
 Ferradans et al. (2013) S. Ferradans, N. Papadakis, J. Rabin, G. Peyré, and J.F. Aujol. Regularized discrete optimal transport. In A. Kuijper, K. Bredies, T. Pock, and H. Bischof, editors, Scale Space and Variational Methods in Computer Vision, pages 428–439, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
 Gu et al. (2015) J. Gu, B. Hua, and S. Liu. Spectral distances on graphs. Discrete Applied Mathematics, 190–191:56–74, 2015.

 Nikolentzos et al. (2017) G. Nikolentzos, P. Meladianos, and M. Vazirgiannis. Matching node embeddings for graph similarity. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
 Mémoli (2011) F. Mémoli. Gromov–Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.
 Peyré et al. (2016) G. Peyré, M. Cuturi, and J. Solomon. Gromov-Wasserstein averaging of kernel and distance matrices. In M. F. Balcan and K. Q. Weinberger, editors, International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2664–2672, New York, NY, USA, 2016.
 Cuturi (2013) M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, pages 2292–2300. Curran Associates, Inc., 2013.
 Alvarez-Melis and Jaakkola (2018) D. Alvarez-Melis and T. S. Jaakkola. Gromov-Wasserstein alignment of word embedding spaces. Preprint arXiv:1809.00013, 2018.
 Vayer et al. (2018) T. Vayer, L. Chapel, R. Flamary, R. Tavenard, and N. Courty. Optimal transport for structured data. Preprint arXiv:1805.09114, 2018.
 Rue and Held (2005) H. Rue and L. Held. Gaussian Markov random fields: theory and applications. Chapman and Hall/CRC, 2005.
 Dong et al. (2016) X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing, 64(23):6160–6173, 2016.
 Dempster (1972) A. P. Dempster. Covariance selection. Biometrics, pages 157–175, 1972.
 Friedman et al. (2008) J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
 Dong et al. (2018) X. Dong, D. Thanou, M. Rabbat, and P. Frossard. Learning graphs from data: A signal representation perspective. Preprint arXiv:1806.00848, 2018.
 Shahid et al. (2015) N. Shahid, V. Kalofolias, X. Bresson, M. Bronstein, and P. Vandergheynst. Robust principal component analysis on graphs. In Proceedings of the IEEE International Conference on Computer Vision, pages 2812–2820, 2015.
 Zhu et al. (2003) X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In International Conference on Machine Learning, pages 912–919, 2003.
 Takatsu (2011) A. Takatsu. Wasserstein geometry of Gaussian measures. Osaka Journal of Mathematics, 48(4):1005–1026, 2011.
 Sinkhorn (1964) R. Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. The Annals of Mathematical Statistics, 35(2):876–879, 1964.
 Genevay et al. (2018) A. Genevay, G. Peyré, and M. Cuturi. Learning generative models with Sinkhorn divergences. In A. Storkey and F. Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 1608–1617, Playa Blanca, Lanzarote, Canary Islands, 2018.
 Luise et al. (2018) G. Luise, A. Rudi, M. Pontil, and C. Ciliberto. Differential properties of Sinkhorn approximation for learning with Wasserstein distance. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 5859–5870, 2018.
 Kingma and Welling (2014) D. P. Kingma and M. Welling. Auto-encoding variational Bayes. Preprint arXiv:1312.6114, 2014.
 Figurnov et al. (2018) M. Figurnov, S. Mohamed, and A. Mnih. Implicit reparameterization gradients. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. CesaBianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 441–452. Curran Associates, Inc., 2018.
 Khan et al. (2017) M. E. Khan, W. Lin, V. Tangkaratt, Z. Liu, and D. Nielsen. Variational adaptive-Newton method for explorative learning. Preprint arXiv:1711.05560, 2017.
 Holland et al. (1983) P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
 Steger and Wormald (1999) A. Steger and N. C. Wormald. Generating random regular graphs quickly. Combinatorics, Probability and Computing, 8(4):377–396, 1999.
 Barabási and Albert (1999) A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
 Watts and Strogatz (1998) D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440–442, June 1998.
 Zhou and De la Torre (2013) F. Zhou and F. De la Torre. Deformable graph matching. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2922–2929, June 2013.
 Leordeanu et al. (2009) M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and map inference. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1114–1122. Curran Associates, Inc., 2009.