1 Introduction
The problem of estimating the correspondence between two graphs has a long history and a wide range of applications, including multiple-layer social network analysis, pattern recognition and computer vision, biomedical image analysis, and document processing and analysis. For a comprehensive review of these and more applications, see
Conte et al. (2004) and Fishkind et al. (2012). The prototype graph matching problem has the following basic form. Suppose we have $n$ participating individuals, numbered $1, \dots, n$, in the network (or "graph" – in this paper, we shall use "network" and "graph" interchangeably) as nodes or vertices. The data we collect describe the interactions or relationships between them, called edges. The edges may be binary or weighted, depending on the context. The same set of individuals form two networks, but at least one of the networks has the order of its nodes shuffled, and the correspondence between the nodes of the two networks is missing. The primary goal of the graph matching problem is to recover the lost node mapping, such that aligning the order of nodes in one network to those in the other would result in the same or a similar network. The data we observe are the two networks with their node orders shuffled; furthermore, the data may be contaminated by random noises, as we shall explain next.
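As a small, self-contained illustration of this setup (the toy graph, random seed, and notation below are ours, not the paper's), the following sketch builds a symmetric binary graph, shuffles its node order by a hidden permutation, and checks that undoing the permutation recovers the original graph:

```python
# Toy illustration of the prototype graph matching problem (our notation):
# two adjacency matrices that differ only by a hidden relabeling of the nodes.
import numpy as np

rng = np.random.default_rng(0)
n = 6
A1 = rng.integers(0, 2, size=(n, n))
A1 = np.triu(A1, 1)
A1 = A1 + A1.T                      # symmetric binary graph, zero diagonal

perm = rng.permutation(n)           # hidden true node correspondence
P = np.eye(n, dtype=int)[perm]      # permutation matrix with P[i, perm[i]] = 1
A2 = P @ A1 @ P.T                   # the same graph with shuffled node order

# Exact matching succeeds iff some permutation makes the two graphs identical.
inv = np.argsort(perm)
assert np.array_equal(A2[np.ix_(inv, inv)], A1)
```

In the noiseless ("exact") version of the problem, recovering `perm` from `A1` and `A2` alone is the whole task; the inexact version replaces the equality above with an approximate one.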
There are two main versions of this problem. The exact graph matching problem assumes no randomness or noise in graph generation, but only that the two graphs are exactly identical under the hidden true node correspondence. The task is to recover the true map. It is well known to be a computationally hard problem, despite recent significant advances (Babai, 2016; Svensson and Tarnawski, 2017)
showing that it can be solved in quasi-polynomial time. The other version is inexact graph matching. This version assumes that the data are observed with random noise. For example, a popular assumption is that the true graphs are edge probability matrices and only their Bernoulli realizations are observable; moreover, the generations of the corresponding edges in the two graphs may be dependent, such as in the model studied in
Lyzinski et al. (2014b). The existing research on the exact and inexact graph matching problems is not nested as their appearance may suggest, but rather points in distinct directions. Research on the former largely focuses on worst-case complexity, while the latter has usually been discussed under structural assumptions. Network data with $n$ nodes apparently has high complexity, but with structures assumed, this is significantly reduced. In this paper, we shall concentrate our attention on the low-rank case, in which one may roughly think that the data essentially reside in a $d$-dimensional space, where $d$ is the dimension of the assumed structures; solving the problem efficiently then becomes possible. The low-rank model is more universal than it seems. Letting the rank grow, we may hope to consistently approach very general network structures (Bickel and Chen, 2009; Gao et al., 2015; Xu, 2017). Recent advancements in matrix analysis further suggest that low-rank models are decent approximations to much more general models; for example, Udell and Townsend (2017) show that smooth structures can be approximated by models whose rank grows only logarithmically.
So far, we have been discussing the unsupervised graph matching problem. Its counterpart, the seeded graph matching problem, is also a popular topic. Seeds refer to subsets of nodes in the two networks between whose members the true correspondence is known. It is intuitively understandable that a "representative" pair of seed node sets, even with small cardinality, may dramatically lower the difficulty of the problem. Indeed, it is known that the seeded graph matching problems for graphs with low-rank structures are usually efficiently solvable (Lyzinski et al., 2014b, a, c). Learning any seed node in the unseeded context, however, seems difficult, as an efficient method that solves this problem may lead to P=NP.
In the existing literature to date, the difficulty of the unseeded graph matching problem remains unknown. Despite the conjecture that it should be efficiently solvable, currently there exists no provable such method. The unknown node correspondence has been the main obstacle in the way. The problem can be translated into a point registration problem in $\mathbb{R}^d$, but the two point clouds to be matched are further separated by an unknown orthonormal transformation on one of them; thus it cannot be solved by directly applying a Hungarian algorithm. Attempts to solve this problem so far almost all involve an optimization partially over the unknown correspondence, and the focus has been on relaxing it. This makes these methods hard to analyze and many of their computations costly.
In this paper, we present a novel method that solves this problem in polynomial time with theoretical guarantees under quite mild structural assumptions. Our approach is distinct from the majority, if not all, of existing methods in that we directly pursue the low-rank structure, thus completely avoiding the optimization over the permutation matrix in the main stages of our method; we only run the Hungarian algorithm or its alternative once at the end. Our method is simple, scalable and convenient to analyze, not only facilitating our own analysis, but potentially also enabling a rich variety of subsequent estimations and inferences.
2 Problem formulation
We represent a graph of $n$ nodes by its $n \times n$ adjacency matrix $A$, where $A_{ij} = 1$ if there is an edge from node $i$ to node $j$, and $A_{ij} = 0$ otherwise. For simplicity, in this paper we only discuss binary and symmetric graphs with independent edge generations; that is, for every pair $(i,j)$ such that $1 \le i < j \le n$, generate data by $A_{ij} \sim \mathrm{Bernoulli}(P_{ij})$, where $P$ is the edge probability matrix, set $A_{ji} = A_{ij}$, and let $A_{ij}$ be independent of $A_{i'j'}$ for every other pair $(i', j')$ with $i' < j'$. This is a popular model basis studied in many network inference papers such as Bickel and Chen (2009); Wolfe and Olhede (2013); Gao et al. (2015) and Zhang et al. (2017).
If the edge probabilities are completely arbitrary numbers unrelated to each other, no meaningful inference would be possible, so we now quantitatively define what we mean by "network structures". According to the Aldous-Hoover representation (Aldous, 1981; Hoover, 1979), the edge probability matrix of an exchangeable network can be written as follows:
Definition 1 (Aldous-Hoover).
For any exchangeable network, there exists a symmetric function
$f: [0,1]^2 \to [0,1]$, called the "graphon function", and a set of i.i.d. random variables
$U_1, \dots, U_n \sim \mathrm{Uniform}[0,1]$, such that the edge probability matrix can be represented by $P_{ij} = f(U_i, U_j)$. For directed co-exchangeable networks, simply remove the symmetry requirement on $f$; then $P$ can be represented by $P_{ij} = f(U_i, V_j)$, where the $U_i$'s and $V_j$'s are independent standard uniform random variables. Notice that both $f$ and the latent positions $U_i$'s, and $V_j$'s if applicable, are not estimable due to identifiability issues, unless some strong additional model assumptions are made (Airoldi et al., 2013). Indeed, much existing work on graphon estimation tends to assume smoothness on $f$, and so will this paper, but such assumptions usually only help us in indirect ways, as elucidated in Gao et al. (2015) and Zhang et al. (2017). Notice that smoothness in $f$ does not mean that the resulting distribution of the elements of $P$ is continuous – a quick example is the Erdős-Rényi model, in which $f$ is a constant.
The Aldous-Hoover representation has a more specific form for low-rank networks. Here we impose our low-rank assumption on $f$, and the low-rankness is straightforwardly inherited by the probability matrix $P$ generated based on $f$. We have the functional spectral decomposition of $f$ as follows:

$f(u, v) = \sum_{k=1}^{d} \lambda_k \phi_k(u) \phi_k(v)$,   (1)

where $\lambda_k$ is the
$k$th largest-in-magnitude nonzero eigenvalue and
$\phi_k$ is its corresponding eigenfunction, defined on
$[0, 1]$. In this paper, we only consider piecewise Lipschitz $f$ universally bounded between 0 and 1, and one can show that this implies that all the eigenfunctions $\phi_k$ are piecewise Lipschitz and thus universally bounded. Now, based on (1), we may represent $P$ from a low-rank graphon as follows:

$P = X I_{p,q} X^\top$,   (2)

where $X \in \mathbb{R}^{n \times d}$ and $I_{p,q}$ are defined as $X_{ik} = |\lambda_k|^{1/2} \phi_k(U_i)$ and $I_{p,q} = \mathrm{diag}\big(\mathrm{sign}(\lambda_1), \dots, \mathrm{sign}(\lambda_d)\big)$, respectively. A similar representation also appeared recently in Lei (2018b). When the graph is positive semidefinite (PSD), then $I_{p,q} = I_d$ and simply $P = X X^\top$.
This model is called the random dot-product graph (RDPG) (Young and Scheinerman, 2007; Athreya et al., 2017). For general $I_{p,q}$ with $p$ positive and $q$ negative signs, where $p + q = d$, we may separately estimate the positive- and negative-semidefinite parts. In the statement of our method and the theory, we focus on the PSD case for simplicity of illustration.
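The PSD case above can be sketched in a few lines of code (the latent distribution, sizes, and seed below are our illustrative choices, not the paper's): latent positions $X$ give $P = XX^\top$, and the adjacency matrix is an elementwise Bernoulli realization of $P$.

```python
# Minimal sketch of the PSD random dot-product graph (RDPG) model:
# bounded latent positions X, edge probability matrix P = X X^T of rank d,
# adjacency matrix A drawn elementwise Bernoulli(P_ij). Toy sizes, our choices.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.uniform(0.1, 0.6, size=(n, d))   # bounded latent node positions
P = X @ X.T                              # PSD edge probability matrix, rank d
upper = rng.random((n, n)) < P           # Bernoulli(P_ij) draws
A = np.triu(upper, 1)
A = (A + A.T).astype(int)                # symmetric adjacency, zero diagonal

assert np.all((P >= 0) & (P <= 1))       # valid probabilities (dot products <= 0.72)
assert np.linalg.matrix_rank(P) == d     # low-rank structure of P
```

The bound on the latent positions guarantees valid edge probabilities; this universal boundedness is exactly the property the point registration theory later relies on.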
Now we are ready to formally introduce the graph matching problem in the low-rank setting. Suppose we have two graphs $A^{(1)}$ and $A^{(2)}$ generated based on the same low-rank probability matrix $P$ as in (2), but the rows and columns of the second network are permuted by an unknown permutation matrix $\Pi^*$. Denoting the two edge probability matrices by $P^{(1)}$ and $P^{(2)}$, we have
$P^{(1)} = P$ and $P^{(2)} = \Pi^* P (\Pi^*)^\top$, and defining $X^{(1)} = X$ and $X^{(2)} = \Pi^* X$, the induced data generation can be described as follows:

$A^{(1)}_{ij} \sim \mathrm{Bernoulli}\big(P^{(1)}_{ij}\big)$,   (3)
$A^{(2)}_{ij} \sim \mathrm{Bernoulli}\big(P^{(2)}_{ij}\big)$,   (4)

where $(i,j)$ ranges over all index pairs satisfying $1 \le i < j \le n$.
If we have access to $P^{(1)}$ and $P^{(2)}$ and the nonzero eigenvalues are distinct, then we may exactly recover $X^{(1)}$ and $X^{(2)}$, only up to a $\pm 1$ multiplier on each of their columns. For not-too-large $d$, we can exhaust all $2^d$ possible sign flip combinations on the columns of $X^{(1)}$ and, for each of them, run a Hungarian algorithm to match the rows of the column-wise sign-flipped $X^{(1)}$ to those of $X^{(2)}$. This further leads to an exact recovery of the true node correspondence $\Pi^*$. But the problem grows significantly less trivial, even in the oracle case, if $P$ has repeated nonzero eigenvalues. The estimation may only get to the linear space spanned by the corresponding columns of $X$, and now the rows of $X^{(1)}$ and $X^{(2)}$ are only matchable up to an unknown orthonormal transformation on the columns, that is, $X^{(2)} = \Pi^* X^{(1)} W^*$ for some orthonormal $W^*$. Another source that contributes to the introduction of the latent orthonormal transformation is the concentration inequalities regarding $A^{(1)}$ and $A^{(2)}$. In practice, we never observe $X^{(1)}$ or $X^{(2)}$ and may only work with the estimates $\hat X^{(1)}$ and $\hat X^{(2)}$ obtained from decomposing $A^{(1)}$ and $A^{(2)}$. By Davis-Kahan type theorems (Yu et al., 2014) and concentration results for eigenvalues, we can only approximate $X^{(\ell)}$ by $\hat X^{(\ell)}$ up to an unknown orthonormal transform $W^{(\ell)}$ such that $\hat X^{(\ell)} W^{(\ell)} \approx X^{(\ell)}$.
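The oracle sign-flip strategy described above can be sketched as follows (toy data and sizes are ours; `linear_sum_assignment` plays the role of the Hungarian algorithm): with distinct eigenvalues, the latent positions are known up to a $\pm 1$ flip per column, so we try all $2^d$ sign patterns and run one assignment step for each.

```python
# Sketch of the oracle matching step: exhaust 2^d column sign flips, and for
# each run a Hungarian (assignment) match of the rows. Hypothetical toy data.
from itertools import product
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
n, d = 50, 3
X1 = rng.normal(size=(n, d))
perm = rng.permutation(n)
signs_true = np.array([1.0, -1.0, 1.0])
X2 = X1[perm] * signs_true                # shuffled rows, flipped column signs

best_cost, best_map = np.inf, None
for signs in product([1.0, -1.0], repeat=d):
    C = cdist(X2, X1 * np.array(signs))   # pairwise distances after sign flip
    rows, cols = linear_sum_assignment(C) # Hungarian matching of the rows
    cost = C[rows, cols].sum()
    if cost < best_cost:
        best_cost, best_map = cost, cols

assert np.array_equal(best_map, perm)     # recovers the true correspondence
```

With repeated eigenvalues, the finite set of $2^d$ sign flips is replaced by a continuum of orthonormal transforms, which is exactly what makes the general problem harder.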
Now it is clear that the unseeded low-rank graph matching problem can be translated into an unsupervised point registration problem. Suppose there are two sets of points in a bounded subset of $\mathbb{R}^d$. The two data sets $\{Y_i\}_{i=1}^n$ and $\{Z_j\}_{j=1}^n$
are i.i.d. samples of random vectors
$Y$ and $Z$, respectively, where the distributions of $Z$ and $Y$ agree after an unknown orthonormal transformation $W^*$ is applied to $Y$. In this paper, distinct from most existing work, we do not impose any smoothness assumption on the distribution of $Y$, but instead only assume its universal boundedness, which is naturally satisfied when the point registration problem originates from the low-rank graph matching problem. The main task is to estimate both the transform $W$ and the permutation matrix $\Pi$ that minimize the MSE loss function:

$L(W, \Pi) = \frac{1}{n} \big\| \mathbf{Y} W - \Pi \mathbf{Z} \big\|_F^2$,   (5)

where the rows of $\mathbf{Y}$ and $\mathbf{Z}$ are the $Y_i$'s and $Z_j$'s. As mentioned earlier, we may not have access to the $Y_i$'s and $Z_j$'s, but instead only observe error-contaminated versions of them. Moreover, the measurement errors may be dependent across sample points, but in fact this does not pose an additional challenge to our method. For this reason, when introducing our method, we focus our attention on the vanilla form of the unsupervised point registration problem (5).
3 Related work
In this section, we briefly review some popular existing methods for point registration and graph matching, respectively. Arguably one of the most popular point registration methods is Iterative Closest Point (ICP) (Ezra et al., 2006; Du et al., 2010; Maron et al., 2016). It solves the optimization problem (5) by iteratively optimizing over $W$ and $\Pi$. This method is simple yet popular. An ICP equipped with a Hungarian algorithm costs
$O(n^3)$ in each iteration, making it hard to apply to large data sets. Another popular method is kernel correlation (KC). KC matches the two distributions by minimizing the integrated difference between their density functions, empirically approximated by kernel density estimation (KDE). KC is originally designed only for continuous distributions, and it has a distinct form for discrete distributions. It is substantively difficult to apply KC to distributions of mixed continuity types.
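The ICP alternation for problem (5) can be sketched as follows. Everything here is our toy construction: we deliberately initialize at the true rotation purely to exhibit the fixed point of the alternation, since, as discussed, ICP is sensitive to initialization.

```python
# Bare-bones ICP-style alternation: fix the correspondence and solve
# orthogonal Procrustes for W, then fix W and re-match points (Hungarian step).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
n, d = 40, 2
Y = rng.normal(size=(n, d))
theta = 0.3
W_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
perm = rng.permutation(n)
Z = (Y @ W_true)[perm]                  # rotated, then shuffled copy of Y

W = W_true.copy()                       # illustrative warm start at the truth
for _ in range(5):
    C = cdist(Z, Y @ W)                 # pairwise costs under current W
    _, match = linear_sum_assignment(C) # match[i]: row of Y paired with Z[i]
    U, _, Vt = np.linalg.svd(Y[match].T @ Z)
    W = U @ Vt                          # orthogonal Procrustes update

assert np.array_equal(match, perm)      # the alternation stays at the truth
```

Each iteration costs a full assignment solve on an $n \times n$ cost matrix, which is the per-iteration bottleneck noted above.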
Many existing methods for graph matching are based on seed nodes. Representative seed nodes may significantly reduce the difficulty of the problem and allow for efficient methods for estimating the matching of graphs of general structures. On the other hand, most existing methods for unseeded graph matching focus on relaxing $\Pi$ in the following optimization problem:

$\min_{\Pi} \big\| A^{(1)} - \Pi^\top A^{(2)} \Pi \big\|_F^2$,   (6)

from a permutation matrix into a continuum, such as doubly stochastic relaxations; see Lyzinski (2016) and Vogelstein et al. (2015). As suggested by Lyzinski et al., convex relaxations of $\Pi$ almost never find the global optimum unless initialized already close to the optimal solution.
4 Our method
To introduce our method, we start with the observation that the main challenge in solving (5) lies in the optimization over the permutation matrix $\Pi$. This is a chicken-and-egg problem. Notice that if either the optimal $W$ or even part of the optimal $\Pi$ is known or well estimated, the estimation of the remaining parameters would be greatly simplified. This motivates us to consider the possibility of estimating only one of them and bypassing the optimization over the other. Between $W$ and $\Pi$, clearly $W$ is the more "essential" parameter, because $\Pi$ may look very different from realization to realization, and may even have different dimensions if we consider the more general version of the point registration problem with different sample sizes; whereas $W$ determines how the two distributions should be distorted to match up with each other.
The core idea of our method is that instead of aiming at matching up the individual points, we match the two distributions. To serve this purpose, we design a discrepancy measure that describes the difference between the two distributions as a function of $W$. This naturally introduces an optimization problem over $W$ only, circumventing the optimization over $\Pi$, since the empirical version of any such discrepancy measure would depend on the data only through the empirical distributions of $\{Y_i\}$ and $\{Z_j\}$, invariant to the order in which we observe the individual points and thus invariant to $\Pi$.
We now focus on the design of the discrepancy measure between distributions. Recall that we desire this measure to be well defined for all distribution continuity types. One natural choice is to match their moments. Specifically, we want to match all their moments simultaneously, since for any $m$, one may always find random vectors $Y$ and $Z$ such that all their first $m$ moments match, but at least one of their $(m+1)$st moments does not. This naturally leads us to consider moment-generating transformations. Among the arguably most popular choices, including the moment generating function (MGF), the Laplace transform and the characteristic function (CF), we choose to work with the Laplace transform for its convenient inversion formula, which significantly facilitates theoretical analysis. The MGF's known inversion formula (Post, 1930; Widder, 2015) is an infinite series, while the CF's inversion formula for recovering the cumulative distribution function (CDF) is defined in a limit form (Lévy's theorem; see Durrett (2010)). The complicated CDF inversion formula of the CF brings technical obstacles to analysis. Conventional Laplace transforms are defined only for positive random variables and vectors, but we will see that both the Laplace transform and its inversion formula are well defined for universally bounded random variables and vectors, too. For a random vector $Y$ satisfying $\|Y\|_\infty \le c$ for a universal constant $c$, its Laplace transform is defined by

$L_Y(t) = \mathbb{E}\big[ e^{-\langle t, Y \rangle} \big]$,   (7)
where $t \in \mathbb{C}^d$. The inversion formula for (7) that recovers $Y$'s joint CDF is

$F_Y(y) = \lim_{T \to \infty} \frac{1}{(2\pi i)^d} \int_{\sigma - iT}^{\sigma + iT} \cdots \int_{\sigma - iT}^{\sigma + iT} \frac{e^{\langle t, y \rangle}}{t_1 \cdots t_d}\, L_Y(t) \, dt$,   (8)

where each integration limit means integrating on the line segment connecting the two points $\sigma - iT$ and $\sigma + iT$. Given two random vectors $Y$ and $Z$, where the former is tuned by an orthonormal transform $W$, we wish to estimate the $W$ that matches these two distributions. For this purpose, we define a loss function that describes the discrepancy between the two functions $L_{WY}$ and $L_Z$. Inspired by (8), we design the population version of our loss function as follows:

$\mathcal{L}(W) = \int_{\mathcal{T}_\sigma} \big| L_{WY}(t) - L_Z(t) \big|^2 \, d\mu(t)$,   (9)
where $\mathcal{T}_\sigma$ denotes the set of $t$'s whose coordinates all have real part $\sigma$, $\mu$ is a measure on $\mathcal{T}_\sigma$, and $\sigma$ is a tuning parameter that will be set by the theory. Clearly, under our assumption that the two distributions under study are matchable, the only $W$'s that achieve the minimum of 0 of (9) are those that match the distribution of $WY$ to that of $Z$. The form (9) is intractable since it contains the unknown components $L_{WY}$ and $L_Z$ and their integration over a continuum. Therefore, in practice, we work with its sample version. In order to realize the integration over $\mathcal{T}_\sigma$, we sample $t$ by the following importance sampling scheme. For each $l = 1, \dots, m$ and each coordinate $k = 1, \dots, d$, set $\mathrm{Re}\big(t^{(l)}_k\big) = \sigma$, where $\sigma$ is a preset constant, and sample the imaginary part $\mathrm{Im}\big(t^{(l)}_k\big)$, independently across $l$ and $k$, from a continuous distribution with density function $g$. Clearly, the weighted empirical average over the sampled $t^{(l)}$'s approximates the integral in (9) as $m \to \infty$ with a fixed $\sigma$.
The importance sampling scheme reflects the fact that as the imaginary part of $t$ drifts away from 0, the influence of the Laplace transform on the shape of the CDF decreases. We are now ready to define the sample version of our loss function (9) as follows:

$\hat{\mathcal{L}}(W) = \frac{1}{m} \sum_{l=1}^{m} \frac{1}{g\big(\mathrm{Im}(t^{(l)})\big)} \Big| \hat L_{WY}\big(t^{(l)}\big) - \hat L_Z\big(t^{(l)}\big) \Big|^2$,   (10)

where $\hat L_{WY}(t) = \frac{1}{n} \sum_{i=1}^{n} e^{-\langle t, W Y_i \rangle}$ and $\hat L_Z(t) = \frac{1}{n} \sum_{j=1}^{n} e^{-\langle t, Z_j \rangle}$ are the empirical Laplace transforms computed from independent samples from the distributions of $Y$ and $Z$, respectively.
In practice, the factor $1/g$ introduced by importance sampling in (10) can be ignored. Notice that $\hat{\mathcal{L}}(W)$ is a smooth function of $W$. After $\hat W$ is estimated, we may simply run a Hungarian algorithm to obtain the mapping between the points. In the unseeded low-rank graph matching context, we may obtain estimated latent node positions by directly decomposing the adjacency matrices:
$A^{(\ell)} = \hat U^{(\ell)} \hat\Lambda^{(\ell)} \big(\hat U^{(\ell)}\big)^\top + \text{remainder}$,   (11)
$\hat X^{(\ell)} = \hat U^{(\ell)} \, \big|\hat\Lambda^{(\ell)}\big|^{1/2}$,   (12)

where $\hat\Lambda^{(\ell)}$ is the diagonal matrix of the $d$ largest-in-magnitude eigenvalues of $A^{(\ell)}$ and $\hat U^{(\ell)}$ collects their eigenvectors, for $\ell = 1, 2$. We then solve the following point registration problem:

$\min_{\Pi, W} \big\| \hat X^{(1)} W - \Pi^\top \hat X^{(2)} \big\|_F^2$,   (13)

for permutation matrix $\Pi$ and orthonormal matrix $W$.
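The spectral embedding step (11)-(12) can be sketched as follows (the simulated RDPG, sizes, and seed are our choices; $d$ is assumed known). The estimate recovers the latent positions only up to an orthonormal transform, so we compare after a Procrustes alignment:

```python
# Sketch of the adjacency spectral embedding in (11)-(12), as we read it:
# take the d leading-magnitude eigenpairs and form hat X = U |Lambda|^{1/2}.
import numpy as np

def spectral_embed(A, d):
    """d-dimensional adjacency spectral embedding of a symmetric matrix A."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:d]        # d largest-magnitude eigenvalues
    return vecs[:, order] * np.sqrt(np.abs(vals[order]))

rng = np.random.default_rng(5)
n, d = 400, 2
centers = np.array([[0.8, 0.1], [0.1, 0.8]])     # two latent-position clusters
X = centers[np.repeat([0, 1], n // 2)]
P = X @ X.T                                      # rank-2 PSD probability matrix
A = rng.random((n, n)) < P
A = np.triu(A, 1); A = (A + A.T).astype(float)

Xhat = spectral_embed(A, d)
# Xhat estimates X only up to an orthonormal transform: align by Procrustes.
U, _, Vt = np.linalg.svd(Xhat.T @ X)
err = np.linalg.norm(Xhat @ (U @ Vt) - X) / np.linalg.norm(X)
assert err < 0.3                                 # small relative recovery error
```

The residual orthonormal ambiguity visible here is precisely the nuisance transform that the point registration step (13) must estimate.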
Computationally, our method demands optimization of the function $\hat{\mathcal{L}}(W)$ over $\mathcal{O}(d)$, the collection of all $d \times d$ orthonormal matrices. For each $W$, the cost to evaluate $\hat{\mathcal{L}}(W)$ is $O(mnd)$, where recall that $n$ is the sample size of the data sets and we have control over $m$, the number of $t$'s we sample elementwise from $g$. This contrasts with the cost of estimating the best match within each iteration in ICP, and moreover gives us the flexibility of controlling the trade-off between computation time and accuracy. Compared to KC, our method can handle continuous, discrete or mixed distributions in a unified formulation. Compared to both ICP and KC, our method is backed by a consistency guarantee with an explicit error rate. The results will be presented in Section 5.
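A minimal sketch of evaluating the sample criterion (10) follows. The Cauchy sampler for the imaginary parts and all constants are stand-ins of ours, not the paper's choices, and we drop the importance weight $1/g$, which the paper notes can be ignored in practice:

```python
# Sketch of the sample loss: compare empirical Laplace transforms of W Y and Z
# at m random complex points t whose real part is a preset constant sigma.
import numpy as np

def sample_loss(Y, Z, W, ts):
    """Average squared gap between empirical Laplace transforms at points ts."""
    LY = np.exp(-(Y @ W) @ ts.T).mean(axis=0)   # hat L_{WY}(t), one per t
    LZ = np.exp(-Z @ ts.T).mean(axis=0)         # hat L_Z(t)
    return np.mean(np.abs(LY - LZ) ** 2)

rng = np.random.default_rng(4)
n, d, m, sigma = 300, 2, 64, 0.5
Y = rng.uniform(0, 1, size=(n, d))
theta = 0.7
W_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Z = Y @ W_true                                  # perfectly matchable samples
ts = sigma + 1j * rng.standard_cauchy(size=(m, d))  # stand-in sampler for Im(t)

# The loss vanishes at the matching rotation and not at a wrong one.
assert sample_loss(Y, Z, W_true, ts) < 1e-20
assert sample_loss(Y, Z, np.eye(d), ts) > sample_loss(Y, Z, W_true, ts)
```

One evaluation costs $O(mnd)$ arithmetic, matching the complexity discussion above, and the criterion depends on the samples only through their empirical distributions, so no permutation appears anywhere.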
Before concluding the description of our method, we briefly explain two small but important details. First, our criterion (10) does not require equal sample sizes. If the sample sizes are different, we may simply bootstrap the smaller sample, and the Hungarian algorithm will naturally produce a many-to-one estimated map, which is desired. The second detail is the choice of $\sigma$. At first it may seem natural to choose $\sigma$ as an increasing function of $n$, as was done in an earlier version of this paper. However, doing so would likely greatly depreciate the guaranteed error rate. If we recall the idea of matching moments that motivated our method, we realize an arguable intuition that all the moments can be determined by the curvature of the Laplace transform around $0$, and as we travel far away with large $|t|$ into the tail, the shape of the Laplace transform there might grow less relevant. In Section 5, we shall see that fixing $\sigma$ helps us to achieve a nearly tight error bound.
5 Theory
By Zhang et al. (2014) and Anderson et al. (1986), with probability $1 - \delta_n$, where $\delta_n \to 0$ as $n \to \infty$, for the $k$th largest-in-magnitude nonzero eigenvalue of the matrix $A^{(\ell)}$, denoted by $\hat\lambda^{(\ell)}_k$, we have

$\big| \hat\lambda^{(1)}_k - \lambda_k\big(P^{(1)}\big) \big| \le C \sqrt{n}$,   (14)
$\big| \hat\lambda^{(2)}_k - \lambda_k\big(P^{(2)}\big) \big| \le C \sqrt{n}$,   (15)

for a universal constant $C$. Without loss of generality, we may organize the columns of $X^{(\ell)}$ and $\hat X^{(\ell)}$ in (3), (4) and (11), (12) to be put in the order aligned to the true or estimated leading nonzero eigenvalues of the corresponding matrices from which those columns are decomposed; then (14) and (15) hold for each $\ell$ and $k$.
Next, combining the results of Yu et al. (2014), Lei et al. (2015) and (14), (15), there exist unknown orthonormal matrices $W^{(1)}, W^{(2)} \in \mathcal{O}(d)$ such that, with high probability,

$\big\| \hat X^{(1)} W^{(1)} - X^{(1)} \big\|_F \le C'$,   (16)
$\big\| \hat X^{(2)} W^{(2)} - X^{(2)} \big\|_F \le C'$,   (17)

for a constant $C'$ not depending on $n$. Moreover, if the eigenvalues split into positive and negative groups so that $W^{(1)}$ and $W^{(2)}$ are block-diagonal, then we may further shrink the sample spaces of $W^{(1)}$ and $W^{(2)}$ from $\mathcal{O}(d)$ to $\mathcal{O}(p) \times \mathcal{O}(q)$, where $p$ and $q$ are the numbers of positive and negative eigenvalues, respectively, because we can apply Yu et al. (2014)
to the positive and negative eigenvalues and their corresponding eigenvectors separately. This reduction was not explicitly emphasized in the network analysis literature, mostly works on community detection, because the orthonormal transform
is a nuisance there and has no impact on the subsequent estimation and inference steps. But in the matching problem, the dimensionality of the space of transforms is decisive for both accuracy and computation cost.

We now present the consistency theory of our method. The proofs are in the Appendix. First we present the uniform concentration inequality of our proposed criterion around its population version.
Theorem 1 (Uniform concentration of the loss function).
Given the distributions of universally bounded random vectors $Y$ and $Z$ in $\mathbb{R}^d$, there are universal constants $C_1, C_2 > 0$ such that

$\mathbb{P}\Big( \sup_{W \in \mathcal{O}(d)} \big| \hat{\mathcal{L}}(W) - \mathcal{L}(W) \big| > C_1 \sqrt{\log n / n} \Big) \le C_2\, n^{-1}$.   (18)

Moreover,

$\sup_{W \in \mathcal{O}(d)} \Big| \hat{\mathcal{L}}\big(W^{(1)} W (W^{(2)})^\top\big) - \mathcal{L}(W) \Big| \le C_3 \sqrt{\log n / n}$ with high probability,   (19)

where $C_3$ is a constant depending only on $d$.
It is worth noting that (19) is stated with the unknown transforms $W^{(1)}$ and $W^{(2)}$. This means that we know that with high probability $\hat{\mathcal{L}}$ and $\mathcal{L}$ will be close, but we cannot disentangle $W^{(1)}$ and $W^{(2)}$ mixed inside the optimal $W$; this, however, is fine. Recall that our ultimate goal is to accurately estimate the point registration $\Pi$. Our theory only demands that $\hat W$ be close to some optimal solution, in case the optimal solution is not unique.
Next we state a crucial regularity condition regarding the shape of our loss function near its minimum. This property is satisfied by a wide range of frequently used distributions in practice.

Definition 2 (Sharp Slope Condition).
A function $\mathcal{L}: \mathcal{W} \to [0, \infty)$, for some compact $\mathcal{W}$, is said to satisfy the Sharp Slope Condition if there exist universal constants $c_1, c_2 > 0$ such that the following properties hold:

1. The function is minimized to 0: $\min_{W \in \mathcal{W}} \mathcal{L}(W) = 0$, and the minimum is attained at only a finite number of $W$'s, collected in the set $\mathcal{W}^*$.

2. For any $\epsilon \in (0, c_2]$ and any $W$ with $\mathrm{dist}(W, \mathcal{W}^*) \ge \epsilon$, we have $\mathcal{L}(W) \ge c_1 \epsilon$.
restricted in orthonormal transformations for many distributions, examples include multinomial distribution and multivariate normal distribution – notice the latter is not within the range of the consideration of our current theory as the distribution is unbounded, but we believe the it can be expanded to subgaussian distributions. If a function satisfies Sharp Slope Condition, then optimizing a sample version of the function that has uniform convergence would yield an estimation decently close to the true optimal solution.
Theorem 2.
Suppose the function $\mathcal{L}$ satisfies the Sharp Slope Condition, and it has a uniformly concentrating sample version $\hat{\mathcal{L}}$ such that $\sup_{W \in \mathcal{W}} \big| \hat{\mathcal{L}}(W) - \mathcal{L}(W) \big| \le \epsilon_n$ with high probability. Assume that the time cost to evaluate $\hat{\mathcal{L}}(W)$ is polynomial in $n$, where $n$ is the sample size associated with $\hat{\mathcal{L}}$. Then there exists a polynomial-time algorithm such that its output
$\hat W$ is close to the optimal solution, in the sense that $\mathrm{dist}(\hat W, \mathcal{W}^*) \le C \epsilon_n$ with high probability.
If $\mathcal{L}$ satisfies the Sharp Slope Condition, where we can regard $\mathcal{L}$ as a function of some parameterization of $W$, such as when $d = 2$ and we can parameterize $\mathrm{SO}(2)$ by the rotation angle $\theta$, then using the results of Levina and Bickel (2001), Fournier and Guillin (2015) and Lei (2018a), for the equal sample size case $n_1 = n_2 = n$, we have the following.
Theorem 3.
Suppose the parameterization of $\mathcal{L}$ as a function of $W$ satisfies the Sharp Slope Condition. Let $\hat W$ be the output of the algorithm in Theorem 2, and define $\hat\Pi$ to be the optimal permutation under MSE estimated by the Hungarian algorithm to match the rows of $\hat X^{(1)} \hat W$ and $\hat X^{(2)}$; then we have

$\frac{1}{n} \big\| \hat X^{(1)} \hat W - \hat\Pi^\top \hat X^{(2)} \big\|_F^2 = O_P\big( n^{-1/2} + n^{-2/d} \big)$.

Notice that when $d > 4$, the term $n^{-2/d}$ is dominating. The error bound in Theorem 3 seems tight in this regime among methods that assume population structures, as the concentration of the empirical distribution to the population distribution is likely unavoidable.
Theorem 3 immediately implies a control on the graph matching error:

$\frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\big\{ \hat\pi(i) \ne \pi^*(i) \big\} = o_P(1)$,   (20)

where $\hat\pi$ and $\pi^*$ denote the node maps induced by $\hat\Pi$ and $\Pi^*$.
6 Numerical examples
In this section, we test our method and two other popular benchmark methods for unseeded graph matching on three example low-rank graphons. Graphon 1 is the graphon of a stochastic block model with communities of equal sizes, in which the within-community edge probabilities exceed the between-community edge probabilities. Graphon 2 is a more general low-rank graphon with distinct nonzero eigenvalues. Graphon 3 is relatively the most difficult for all methods, as it has repeated nonzero leading eigenvalues. Graphs generated from graphon 1 elementwise follow Bernoulli distributions, and graphs generated from both graphons 2 and 3 are elementwise contaminated by additive random noises. We repeat the experiment 30 times for each graphon. In each experiment, we randomly generate two independent realizations from the graphon and shuffle the node order of one of the adjacency matrices by a randomly chosen permutation matrix unknown to all the compared methods. We measure the performance of the methods by RMSE.

In all these experiments, we run our method with random starts: we initialize the orthonormal matrix $W$ in our loss function from Givens rotations (Merchant et al., 2018), where the Givens rotation $G(i, j, \theta)$ is defined by $G_{ii} = G_{jj} = \cos\theta$ and $G_{ij} = -G_{ji} = \sin\theta$, and the matrix excluding the $i$th and $j$th rows and columns is an identity matrix. Notice that this initialization scheme is feasible since all the tested graphons have only a few nonzero leading eigenvalues. If $d$ is large, we may reduce the number of random starts and also consider subsampling from all the possible configurations of rotation planes and angles. In this simulation study, we fix the tuning parameters across all settings. For the largest network size, we used LAPJV to perform the final Hungarian algorithm match, and for all the other (smaller) sizes, we used MUNKRES. The benchmark methods we compare to are SGM (Fishkind et al., 2012) and FAQ (Vogelstein et al., 2015), using the MATLAB codes downloaded from the authors' websites.
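The Givens-rotation random starts described above can be sketched as follows (using random angles and a product over all coordinate planes as our illustrative scheme):

```python
# Sketch of Givens-rotation random starts: G(i, j, theta) equals the identity
# except in the (i, j) coordinate plane; products of such rotations are
# orthonormal and serve as valid starting points for the optimizer.
import numpy as np

def givens(d, i, j, theta):
    """d x d Givens rotation acting in the (i, j) coordinate plane."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = -s; G[j, i] = s
    return G

rng = np.random.default_rng(7)
d = 4
W0 = np.eye(d)
for i in range(d):
    for j in range(i + 1, d):
        W0 = W0 @ givens(d, i, j, rng.uniform(0, 2 * np.pi))

# Any product of Givens rotations is orthonormal, hence a valid random start.
assert np.allclose(W0 @ W0.T, np.eye(d))
```

For small $d$, a modest grid of angles per plane already covers the rotation group reasonably well, which is why only a few random starts are needed for the tested graphons.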
Graphon  net size  Our method  SGM  FAQ 

RMSE  5.02(0.28)  6.03(0.24)  31.48(0.01)  
0.00(0.09)  0.00(0.05)  31.89(0.00)  
0.00(0.00)  0.00(0.00)  31.59(0.00)  
0.00(0.00)  0.00(0.00)  31.60(0.00)  
0.00(0.00)  0.00(0.00)  31.61(0.00)  
Time  5.57(0.36)  0.14(0.00)  0.17(0.00)  
19.57(0.07)  0.96(0.01)  1.69(0.01)  
27.25(0.06)  4.87(0.02)  8.64(0.02)  
34.84(0.08)  25.22(0.04)  49.98(0.08)  
55.31(0.08)  141.21(0.11)  327.48(0.26) 
Graphon  net size  Our method  SGM  FAQ 

RMSE  8.62(0.17)  6.99(0.06)  70.30(0.01)  
6.18(0.11)  4.96(0.02)  70.52(0.00)  
4.38(0.04)  3.91(0.01)  70.60(0.00)  
3.06(0.02)  3.18(0.00)  70.64(0.00)  
2.35(0.01)  2.46(0.00)  70.67(0.00)  
Time  5.33(0.02)  0.16(0.00)  0.28(0.00)  
12.55(0.03)  1.16(0.00)  1.90(0.01)  
18.20(0.10)  5.96(0.01)  9.72(0.02)  
29.26(0.07)  30.67(0.03)  57.29(0.04)  
50.30(0.04)  189.78(0.14)  376.96(0.18) 
Graphon  net size  Our method  SGM  FAQ 

RMSE  3.10(0.05)  5.88(0.20)  11.63(0.06)  
2.46(0.07)  5.74(0.17)  11.62(0.02)  
1.99(0.10)  6.03(0.15)  11.64(0.01)  
1.74(0.08)  12.23(0.13)  11.88(0.01)  
1.51(0.07)  12.24(0.03)  11.99(0.00)  
Time  18.81(0.38)  0.14(0.00)  0.82(0.01)  
42.30(0.45)  1.01(0.02)  5.74(0.03)  
68.00(0.56)  5.95(0.09)  27.87(0.17)  
109.94(0.61)  16.95(0.32)  159.75(0.62)  
194.93(0.87)  121.93(0.39)  1028.84(2.98) 
Graphon 1 is relatively easy for all methods, and we observed that our method and SGM quickly became perfectly accurate from a quite small network size onward. Graphon 2 is slightly harder, and our method performed similarly to SGM. On the most challenging model, Graphon 3, our method shows its advantage in exploiting the low-rank structure and has a diminishing RMSE, whereas SGM seemed to become increasingly disoriented in its growing search space of optimization. In all examples, FAQ did not perform well, possibly due to poor initialization.
On computational efficiency, we observe that with a fixed $m$, our method's time complexity increases linearly with the sample size $n$. Recall that we have the flexibility to tune the increment rate of $m$ with $n$, so in the extreme case, if we have to handle an increasing sample size with a fixed target error bound, we may use a fixed $m$ to achieve linear computation time. Also recall that increasing $m$ faster than needed will not further improve the error rate, since the concentration of the empirical distribution is then the bottleneck factor.
7 Discussions
In this section, we present discussions on various aspects of our method and some future directions along the two lines of point registration and graph matching.
First, on the point registration side: our method can handle more general invertible linear transformations than orthonormal ones, but in order to retain the Sharp Slope Condition and our analysis, some regularity assumptions on the family of transformations would be necessary. Similarly, a further extension to parameterized nonlinear transformations can be considered. We envision that the even further extension to general nonparametric transformations will remain challenging.

In this paper, we have focused solely on matching points from universally bounded distributions, which is indeed a natural feature of positions obtained by decomposing moderately regular graphons. We conjecture that our theoretical results might be generalizable to sub-Gaussian and possibly other light-tailed distributions, as much preliminary empirical evidence encouragingly suggests, and we are currently working in this direction. On the other hand, if the distributions to be matched are so extremely heavy-tailed that even the first moment does not exist, then we may need a new goodness measurement, as the population Wasserstein distance may not always be well defined unless the two distributions are already perfectly matched up. If the population version of some criterion is not defined, the meaningfulness of its sample version, despite its possible existence, would be in doubt.
Matching the points generated from different graphons, however, remains a major challenge. Notice that our criterion, as a discrepancy measure between the two distributions' Laplace transforms, is by itself a valid statistical distance, as it satisfies the triangle inequality and the other requirements for a distance. With non-matchable distributions, optimizing our criterion and optimizing other criteria such as the Wasserstein distance may find different estimated transforms of one data set that "best matches" the other in their own senses, and subsequently, the resulting matches are likely different. The potential presence of outliers is a similar but distinct topic: the two underlying point generating distributions may still be matchable, except for a few outliers. Our method might not be robust against outliers, and we recommend that users detect and eliminate outliers in the data preprocessing procedure.
Then, on the graph matching side. First, we made the assumption that there exist an optimal $W$ and $\Pi$ that can perfectly match up the true latent node positions, that is, $X^{(2)} = \Pi X^{(1)} W$, under equal sample sizes. This assumption is by no means substantive, and it could be easily relaxed to networks whose latent node positions follow the same graphon model but with arbitrary dependence structure between pairs of nodes belonging to different networks, as long as the internal independence within each network holds.
Our method is certainly not designed to match up general graphons of full rank, although it may find competitive solutions under some full-rank graphons if the leading few eigenfunctions can already differentiate the different roles of the nodes in the networks well. In such cases, although the other eigenvalues and eigenfunctions may matter much for purposes such as graphon estimation, the leading ones may already suffice to provide accurate node matching. But the general problem of matching full-rank graphons is outside the scope of this paper.
References
 Airoldi et al. (2013) E. M. Airoldi, T. B. Costa, and S. H. Chan. Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. In Advances in Neural Information Processing Systems, pages 692–700, 2013.

 Aldous (1981) D. J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581–598, 1981.
 Anderson et al. (1986) J. B. Anderson, T. Aulin, and C.-E. Sundberg. Introduction. In Digital Phase Modulation, pages 1–14. Springer, 1986.
 Athreya et al. (2017) A. Athreya, D. E. Fishkind, K. Levin, V. Lyzinski, Y. Park, Y. Qin, D. L. Sussman, M. Tang, J. T. Vogelstein, and C. E. Priebe. Statistical inference on random dot product graphs: a survey. arXiv preprint arXiv:1709.05454, 2017.

 Babai (2016) L. Babai. Graph isomorphism in quasi-polynomial time. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 684–697. ACM, 2016.
 Bickel and Chen (2009) P. J. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068–21073, 2009.

 Conte et al. (2004) D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(3):265–298, 2004.
 Du et al. (2010) S. Du, N. Zheng, S. Ying, and J. Liu. Affine iterative closest point algorithm for point set registration. Pattern Recognition Letters, 31(9):791–799, 2010.
 Durrett (2010) R. Durrett. Probability: Theory and Examples. Cambridge University Press, 2010.
 Ezra et al. (2006) E. Ezra, M. Sharir, and A. Efrat. On the ICP algorithm. In Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, pages 95–104. ACM, 2006.
 Fishkind et al. (2012) D. E. Fishkind, S. Adali, H. G. Patsolic, L. Meng, V. Lyzinski, and C. E. Priebe. Seeded graph matching. arXiv preprint arXiv:1209.0367, 2012.
 Fournier and Guillin (2015) N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3–4):707–738, 2015.
 Gao et al. (2015) C. Gao, Y. Lu, and H. H. Zhou. Rate-optimal graphon estimation. The Annals of Statistics, 43(6):2624–2652, 2015.
 Hoover (1979) D. N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 2, 1979.
 Lei (2018a) J. Lei. Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces. arXiv preprint arXiv:1804.10556, 2018a.
 Lei (2018b) J. Lei. Network representation using graph root distributions. arXiv preprint arXiv:1802.09684, 2018b.

Lei et al. (2015)
J. Lei, A. Rinaldo, et al.
Consistency of spectral clustering in stochastic block models.
The Annals of Statistics, 43(1):215–237, 2015.  Levina and Bickel (2001) E. Levina and P. Bickel. The earth mover’s distance is the mallows distance: Some insights from statistics. In null, page 251. IEEE, 2001.
 Lyzinski (2016) V. Lyzinski. Information recovery in shuffled graphs via graph matching. arXiv preprint arXiv:1605.02315, 2016.
 Lyzinski et al. (2016) V. Lyzinski, D. Fishkind, M. Fiori, J. Vogelstein, C. Priebe, and G. Sapiro. Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):60–73, 2016.
 Lyzinski et al. (2014a) V. Lyzinski, S. Adali, J. T. Vogelstein, Y. Park, and C. E. Priebe. Seeded graph matching via joint optimization of fidelity and commensurability. arXiv preprint arXiv:1401.3813, 2014a.

Lyzinski et al. (2014b) V. Lyzinski, D. E. Fishkind, and C. E. Priebe. Seeded graph matching for correlated Erdős–Rényi graphs. Journal of Machine Learning Research, 15(1):3513–3540, 2014b.
 Lyzinski et al. (2014c) V. Lyzinski, D. L. Sussman, D. E. Fishkind, H. Pao, J. T. Vogelstein, and C. E. Priebe. Seeded graph matching for large stochastic block model graphs. stat, 1050:12, 2014c.
 Maron et al. (2016) H. Maron, N. Dym, I. Kezurer, S. Kovalsky, and Y. Lipman. Point registration via efficient convex relaxation. ACM Transactions on Graphics (TOG), 35(4):73, 2016.
 Merchant et al. (2018) F. Merchant, T. Vatwani, A. Chattopadhyay, S. Raha, S. Nandy, R. Narayan, and R. Leupers. Efficient realization of Givens rotation through algorithm-architecture co-design for acceleration of QR factorization. arXiv preprint arXiv:1803.05320, 2018.
 Post (1930) E. L. Post. Generalized differentiation. Transactions of the American Mathematical Society, 32(4):723–781, 1930.
 Svensson and Tarnawski (2017) O. Svensson and J. Tarnawski. The matching problem in general graphs is in quasi-NC. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 696–707. IEEE, 2017.
 Udell and Townsend (2017) M. Udell and A. Townsend. Nice latent variable models have log-rank. arXiv preprint arXiv:1705.07474, 2017.
 Vogelstein et al. (2015) J. T. Vogelstein, J. M. Conroy, V. Lyzinski, L. J. Podrazik, S. G. Kratzer, E. T. Harley, D. E. Fishkind, R. J. Vogelstein, and C. E. Priebe. Fast approximate quadratic programming for graph matching. PLOS one, 10(4):e0121002, 2015.
 Widder (2015) D. V. Widder. Laplace Transform (PMS-6). Princeton University Press, 2015.
 Wolfe and Olhede (2013) P. J. Wolfe and S. C. Olhede. Nonparametric graphon estimation. arXiv preprint arXiv:1309.5936, 2013.
 Xu (2017) J. Xu. Rates of convergence of spectral methods for graphon estimation. arXiv preprint arXiv:1709.03183, 2017.
 Young and Scheinerman (2007) S. J. Young and E. R. Scheinerman. Random dot product graph models for social networks. In International Workshop on Algorithms and Models for the WebGraph, pages 138–149. Springer, 2007.
 Yu et al. (2014) Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2014.
 Zhang et al. (2014) Y. Zhang, E. Levina, and J. Zhu. Detecting overlapping communities in networks using spectral methods. arXiv preprint arXiv:1412.3432, 2014.
 Zhang et al. (2017) Y. Zhang, E. Levina, and J. Zhu. Estimating network edge probabilities by neighbourhood smoothing. Biometrika, 104(4):771–783, 2017.