Unseeded low-rank graph matching by transform-based unsupervised point registration

07/12/2018 ∙ by Yuan Zhang, et al. ∙ The Ohio State University 0

The problem of learning a correspondence relationship between nodes of two networks has drawn much attention of the computer science community and recently that of statisticians. The unseeded version of this problem, in which we do not know any part of the true correspondence, is a long-standing challenge. For low-rank networks, the problem can be translated into an unsupervised point registration problem, in which two point sets generated from the same distribution are matchable by an unknown orthonormal transformation. Conventional methods generally lack consistency guarantee and are usually computationally costly. In this paper, we propose a novel approach to this problem. Instead of simultaneously estimating the unknown correspondence and orthonormal transformation to match up the two point sets, we match their distributions via minimizing our designed loss function capturing the discrepancy between their Laplace transforms, thus avoiding the optimization over all possible correspondences. This dramatically reduces the dimension of the optimization problem from Ω(n^2) parameters to O(d^2) parameters, where d is the fixed rank, and enables convenient theoretical analysis. In this paper, we provide arguably the first consistency guarantee and explicit error rate for general low-rank models. Our method provides control over the computational complexity ranging from ω(n) (any growth rate faster than n) to O(n^2) while pertaining consistency. We demonstrate the effectiveness of our method through several numerical examples.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The problem of estimating the correspondence between two graphs has a long history and has a wide range of applications, including multiple-layer social network analysis, pattern recognition and computer vision, biomedical image analysis, document processing and analysis and so on. For a comprehensive review of these and more applications, see

Conte et al. (2004) and Fishkind et al. (2012).

The prototype graph matching problem has the following basic form. Suppose we have participating individuals numbered as in the network (or called “graph” – in this paper, we shall use “network” and “graph” interchangeably) as nodes or vertices. The data we collect describe the interactions or relationships between them, called edges. The edges may be binary or weighted, depending on the context, and the same set of individuals form two networks, but at least one of the networks has the order of its nodes shuffled, and the correspondence between the nodes of the two networks is missing. The primary goal of graph matching problem is to recover the lost node mapping, such that after aligning the order of nodes in one network to those in the other network would result the same or a similar networks. The data we observe are the two networks with their node orders shuffled, and further the data may be contaminated by random noises, as we shall explain right next.

There are two main versions of this problem. The exact graph matching problem assumes no randomness or noise in graph generation but only that the two graphs are exactly identical under the hidden true node correspondence. The task is to recover the true map. It is well-known to be an NP problem, despite the recent significant advances (Babai, 2016; Svensson and Tarnawski, 2017)

showing that it could be solved in quasi-polynomial time. The other version is inexact graph matching. This version assumes that data are observed with random noise. For example, a popular assumption is that the true graphs are edge probability matrices and only their Bernoulli realizations are observable – moreover, the generations of the corresponding edges in the two graphs may be dependent, such as the model studied in

Lyzinski et al. (2014b).

The existing research on the exact and inexact graph matching problems is not superficially nested as the appearance of them may suggest, but rather pointing to distinct directions. Research on the former largely focused on the worst-case complexity; while the latter has usually been discussed with structural assumptions. Network data with nodes involved apparently has complexity. But with structures assumed, this is significantly reduced. In this paper, we shall concentrate our attentions on the low-rank case, in which one may roughly think that data essentially reside in space, where is the dimension of assumed structures, thus solving the problem efficiently is possible. The low-rank model is more universal than it seems. Letting the rank grow, we may hope to consistently approach very general network structures (Bickel and Chen, 2009; Gao et al., 2015; Xu, 2017). Recent advancements in matrix analysis further suggest that low-rank models are decent approximations to much more general models, for example, Udell and Townsend (2017) claims that smooth structures can be approximated by models with rank growth rate.

By far, we have been discussing the unsupervised graph matching problem. Its counterpart – the seeded graph matching problem is also a popular topic. Seeds refer to subsets of nodes in the two networks, the true correspondence relationship between the members of which are known. It is intuitively understandable that a “representative” pair of seed node set, even with small cardinality, may dramatically lower the difficulty of the problem, making it not only polynomial. It is known that the seeded graph matching problems for graphs with low-rank structures are usually efficiently solvable (Lyzinski et al., 2014b, a, c). Learning any seed node in the unseeded context, however, seems difficult, as an efficient method that solves this problem may lead to P=NP.

In existing literature to date, the difficulty of the unseeded graph matching problem remains unknown. Despite the conjecture that it should be efficiently solvable, currently there exists no such provable method. The unknown node correspondence has been playing the main obstacle in the way. The problem can be translated into a point registration problem in space, but the two point clouds to be matched are further gapped by an unknown orthonormal transformation on one of them, thus it cannot be solved by directly applying a Hungarian algorithm. Attempts to solve this problem by far almost all involve an optimization partially over the unknown correspondence, and the focuses have been relaxing it. This makes these methods hard to analyze and many of their computations costly.

In this paper, we present a novel method that solves this problem in polynomial time with theoretical guarantees under quite mild structural assumptions. Our approach is distinct from the majority, if not all, of existing methods in that we directly pursue the low-rank structure thus completely avoided the optimization over the permutation matrix in the main stages of our method – only except that we run the Hungarian algorithm or its alternative just once at the end. Our method is simple, scalable and convenient to analyze, not only utilizing our analysis, but potentially also enabling a rich variety of subsequent estimations and inferences.

2 Problem formulation

We represent a graph of nodes by its by adjacency matrix , where if there is an edge from node to node , and otherwise. For simplicity, in this paper we only discuss binary and symmetric graphs with independent edge generations, that is, for every pair such that , generate data by  Bernoulli, where is the edge probability matrix, and for every , where and , is independent of . This is a popular model basis studied in many network inference papers such as Bickel and Chen (2009); Wolfe and Olhede (2013); Gao et al. (2015) and Zhang et al. (2017).

If the edge probabilities are completely arbitrary numbers and unrelated to each other, no meaningful inferences would be possible, so now we quantitatively define what we mean by “network structures”. According to the Aldous-Hoover representation (Aldous, 1981; Hoover, 1979), the edge probability matrix of an exchangeable network can be written as:

Definition 1 (Aldous-Hoover).

For any exchangeable network, there exists a symmetric function

, called the “graphon function”, and a set of i.i.d. random variables

 Uniform, such that the edge probability matrix can be represented by

For directed co-exchangeable networks, simply remove the symmetry requirement on and can be represented by , where and ’s are independent standard uniform random variables. Notice that both and the latent positions , and if applicable, are not-estimable due to identifiability issues, unless some strong additional model assumptions are assumed (Airoldi et al., 2013). Indeed, many existing work on graphon estimation tend to assume smoothness on , so will this paper, but such assumptions usually only help us in indirect ways, such as elucidated in Gao et al. (2015) and Zhang et al. (2017). Notice that the smoothness in does not mean that the resulting distribution of the elements of is continuous – a quick example is the Erdös-Renyi model, in which .

The Aldous-Hoover representation has a more specific form for low-rank networks. Here we impose our low-rank assumption on , and the low-rankness is straightforwardly inherited by the probability matrix generated based on . We have the functional spectral decomposition of as follows:


where is the

th largest nonzero eigenvalue and

is its corresponding eigenfunction that is defined on

and . In this paper, we only consider piece-wise Lipschitz universally bounded between 0 and 1, and one can show that this implies the universal boundedness of all the eigenfunction ’s piece-wise Lipschitz and thus their are universal boundedness.

Now, based on (1), we may represent from a low-rank graphon as follows:


where and are defined as , , and , respectively. Similar representation also appeared recently in Lei (2018b). When the graph is positive semi-definite (PSD) then and simply

This model is called random dot-product graph (RDPG) (Young and Scheinerman, 2007; Athreya et al., 2017). For general , where , we may separately estimate for the positive- and negative-semidefinite parts. For simplicity of illustration, in the statement of our method and the theory, we focus on the PSD case for simplicity.

Now we are ready to formally introduce the graph matching problem in the low-rank setting. Suppose we have two graphs and generated based on the same low-rank probability matrix as in (2), but the rows and columns of the second network is permuted by an unknown permutation . Denoting the two edge probability matrices by and , we have

and defining , the induced data generation can be described as follows:


where ranges over all index pairs satisfying .

If we have access to and the nonzero eigenvalues are distinct, then we may exactly recover and only up to an multiplier on each of its columns. For not-too-large , we can exhaust all possible sign flip combinations on the columns of , and, for each of them, we run a Hungarian algorithm to match the rows of the column-wise sign flipped to . This further leads to an exact recovery of the correspondence true node correspondence . But the problem would seemingly grow significantly less trivial, even in the oracle, if has repeated nonzero eigenvalues. The estimation may only get to the spanning linear space of the corresponding columns in , and now the rows of and are only matchable up to an unknown orthonormal transformation on the columns, that is . Another source that contribute to the introduction of the latent orthonormal transformation is the concentration inequalities regarding and . In practice, we never observe and may only work and the estimated from decomposing . By Davis-Kahan type theorem (Yu et al., 2014) and concentration results of eigenvalues, we can only approximate by from up to an unknown transform such that .

Now it is clear that the unseeded low-rank graph matching problem can be translated into an unsupervised point registration problem. Suppose there are two sets of points in a bounded set of . The two data sets and

are i.i.d. samples random vectors

and , respectively, where is an unknown orthonormal transformation. In this paper, distinct from most existing work, we do not impose any smoothness assumption on the distribution of , but instead only assume its universal boundedness, which is naturally satisfied when the point registration problem origins from the low-rank graph matching problem. The main task is to estimate both the transform and the permutation matrix that minimize the MSE loss function:


where the rows of and are and ’s. As mentioned earlier, we may not have access to and , but instead only observe error-contaminated versions of them. Moreover, the measurement errors may be dependent across sample points, but in fact this does not pose additional challenge to our method. For this reason, when introducing our method, we focus our attentions on the vanilla form of the unsupervised point registration problem (5).

3 Related work

In this section, we briefly review some popular existing methods for point registration and graph matching, respectively. Arguably one of the most popular point registration methods is Iterative Closest Point (ICP) (Ezra et al., 2006; Du et al., 2010; Maron et al., 2016). It solves the optimization problem (5) by iteratively optimizing over and . This method is simple yet popular. An ICP equipped with Hungarian algorithm costs

in each iteration, making it hard for large data sets. Another popular method is kernel correlation (KC). KC matches the two distributions by minimizing the integrated difference between their density functions empirically approximated by kernel density estimations (KDE). KC is originally designed only for continuous distributions, and it has a distinct form for discrete distributions. It is substantively difficult to apply KC to distributions of mixed continuity types.

Many existing methods on graph matching are based on seed nodes. Representative seed nodes may significantly reduce the difficulty of the problem and allows for efficient method for estimating the matching of graphs of general structures. On the other hand, most existing methods for unseeded graph matching focus on relaxing in the following optimization problem


from permutation into a continuum such as doubly stochastic relaxations, see Lyzinski (2016) and Vogelstein et al. (2015). As suggested by Lyzinski et al. , convex relaxations on almost never find the global optimality unless initialized already close to the optimal solution.

4 Our method

To introduce our method, we start with the observation that the main challenge in solving (5) lies in the optimization over the permutation matrix . This is a chicken-and-egg problem. Notice that if either the optimal or even part of the optimal is known or well-estimated, the estimation of the remaining parameters would be greatly simplified. This motivates us to consider the possibility of estimating only one of them and bypassing the optimization over the other one. Between and , clearly is a more “essential” parameter, because may look very differently from realization to realization and even have different dimensions if we consider the more general version of the point registration problem with different sample sizes; where as determines how the two distributions should be distorted to match up with each other.

The core idea of our method is that instead of aiming at matching up the individual points, we match the two distributions. To serve this purpose, we design a discrepancy measure that describes the difference between the two distributions as a function of . This naturally introduces an optimization problem over only , circumventing the optimization over since the empirical version of any such discrepancy measure would depend on data only through the empirical distributions of and , invariant to the order in which we observe the individual points and thus invariant to .

We now focus on the design of the discrepancy measure between distributions. Recall that we desire this measure to be well-defined for all distribution continuity types. One natural choice is to match their moments. Specifically, we want to match

all their moments simultaneously, since for any , one may always find random vectors and such that all their th moments match, but at least one of their st moments do not match. This naturally leads us to consider moment-generating transformations.

Among the arguably most popular choices, including moment generating function (MGF), Laplace transform and characteristic function (CF), we choose to work with Laplace transform for its convenient inversion formula form that significantly facilitates theoretical analysis. MGF’s known inversion formula

(Post, 1930; Widder, 2015)

is an infinite series; while CF’s inversion formula for recovering cumulative distribution function (CDF) is defined in a limit form (Lévy’s theorem, see

Durrett (2010)). The complicated CDF inversion formula of CF brings technical obstacles in analysis. Conventional Laplace transforms are defined only for positive random variables and vectors, but we will see that both the Laplace transform and its inversion formula are well-defined for universally bounded random variables and vectors, too. For a random vector satisfying for a universal constant , its Laplace transform is defined by


where . The inversion formula for (7) that recovers ’s joint CDF is


where the integration limit means integrating on the line segment connecting the two points . Given two random vectors and , where the former is tuned by an orthonormal transform , we wish to estimate that matches these two distributions. For this purpose, we define a loss function that describes the discrepancy between the two functions and . Inspired by (8), we design the population version of our loss function as follows


where is a tuning parameter that will be set by the theory. Clearly, under our assumption that the two distributions under study are matchable, the only ’s that achieve the minimum of 0 of (9) for all are those that match the distribution of to , that is, . The form (9) is intractable since it contains unknown components and and their integration over a continuum. Therefore, in practice, we work with its sample version. In order to realize the integration over , we sample by the following importance sampling:

  • Define


    as with a fixed .

  • , where is independent of any other , for all

  • For each , set , where is a preset constant, and the imaginary part is sampled from the continuous distribution with the density function :

The importance sampling scheme reflects the fact that as the imaginary part of drifts away from 0, the influence of the Laplace transform on the shape of the CDF function decreases. We are now ready to define the sample version of our loss function (9) as follows:


where the rows of and are independent samples from the distributions of and , respectively, that is:

In practice, the factor introduced by importance sampling in (10) can be ignored. Notice that is smooth for all that . After is estimated, we may simply run a Hungarian algorithm to obtain the mapping between the points. In the unseeded low-rank graph matching context, we may obtain estimated latent node positions by directly decomposing the adjacency matrices:


where and . We then solve the following point registration problem:


for permutation matrix and .

Computationally, our method demands optimization of the function over , where is the collection of all orthonormal matrices. For each , the cost to evaluate is , where recall that and are sample sizes of the data sets and we have control over , the number of ’es we shall sample element-wise from . This contrasts the cost of estimating the best match within each iteration in ICP, and moreover gives us the flexibility of controlling the trade-off between computation time and accuracy. Compared to KC, our method can handle continuous, discrete or mixed distributions by a unified formulation. Compared to both ICP and KC, our method is backed by a consistency guarantee with an explicit error rate. The results will be presented in Section 5.

Before concluding the description of our method, we briefly explain two small but important details. First, our criterion (10) does not require equal sample sizes. If the sample sizes are different, we may simply bootstrap the smaller sample, and the Hungarian algorithm will naturally produce a many-to-one estimated map, which is desired. The second topic is the choice of . At first it may seem natural to choose as an increasing function of , as did in an earlier version of this paper. However, doing so would likely greatly depreciate the guaranteed error rate. If we recall the idea of matching moments that motivated our method, we realize an arguable intuition that all the moments can be determined by the curvature of the Laplace transform around , and as we travel far away with large in the tail, the shape of the Laplace transforms there might possibly grow less relevant. In Section 5, we shall see that fixing helps us to achieve a nearly tight error bound.

5 Theory

By Zhang et al. (2014) and Anderson et al. (1986), with probability where as , for the th largest nonzero eigenvalue of the matrix , denoted by , we have


Without loss of generality, we may organize the columns of and in (3), (4) and (11), (12) to be put in the order aligned to the true or estimated leading nonzero eigenvalues of the corresponding matrices from which those ’s are decomposed, then by (14) and (15), we have

Next, combining the results of Yu et al. (2014), Lei et al. (2015), (14) and (15), there exists unknown orthonormal matrices , such that with high probability,


Moreover, if  BlockDiagonal, then we may further shrink the sample spaces of and as follows:

where , and , respectively, because we can apply Yu et al. (2014)

on positive and negative eigenvalues and their corresponding eigenvectors, respectively. This reduction was not explicitly emphasized in network analysis literature, mostly works on community detection, because the orthonormal transform

is nuisance and has no impact on the subsequent estimation and inference steps. But in the matching problem, the dimensionality of is determining on both accuracy and computation cost.

We now present the consistency theory of our method. The proofs are in the Appendix. First we present the uniform concentration inequality of our proposed criterion to its population version.

Theorem 1 (Uniform concentration of the loss function).

Given the distributions of universally bounded random vectors and in , there are universal constants such that




where is a constant depending only on .

It is worth noting that (19) is stated with unknown transforms and . This means that we know that with high probability and will be close, but we cannot disentangle and mixed inside the optimal , but this is fine. Recall that our ultimate goal is to accurately estimate the point registration . Our theory only demands that is close to some optimal solution , if the optimal solution is not unique.

Next we state a crucial regularity condition regarding the shape of our loss function near its minimum. This property is satisfied by a wide range of frequently used distributions in practice.

Definition 2 (Sharp Slope Condition).

A function is said to satisfy Steep Slope Condition, if there exist universal constants such that for all , for some compact , the following properties hold:

  • The function is minimized to 0:

    and the minimum is attained only at a finite number of ’s

  • For any

    and , we have

The Sharp Slope Condition is satisfied by with respect to

restricted in orthonormal transformations for many distributions, examples include multinomial distribution and multivariate normal distribution – notice the latter is not within the range of the consideration of our current theory as the distribution is unbounded, but we believe the it can be expanded to sub-gaussian distributions. If a function satisfies Sharp Slope Condition, then optimizing a sample version of the function that has uniform convergence would yield an estimation decently close to the true optimal solution.

Theorem 2.

Suppose the function satisfies Sharp Slope Condition, and it has a uniformly concentrating sample version such that

Assume that and the time cost to evaluate is polynomial in , when is the sample size associated with . Then there exists a polynomial algorithm, such its output

is close to the optimal solution, in the sense that

If satisfies Sharp Slope Condition, where we can regard and to be some parameterization of , such as when we can parameterize  SO by the rotation angle , then using the results of Levina and Bickel (2001), Fournier and Guillin (2015) and Lei (2018a), for the equal sample size case , we have

Theorem 3.

Suppose the parameterization of as a function of satisfies the Sharp Slope Condition. Let be the output of the algorithm in Theorem 2, and then define to be the optimal permutation under MSE estimated by the Hungarian algorithm to match the rows of and , we have

Notice that when , the term is dominating. The error bound by Theorem 3 seems tight when among methods that assume population structures, as the concentration of the empirical distribution to the population distribution is likely unavoidable.

Theorem 3 immediately implies the control on graph matching error:


6 Numerical examples

In this section, we test our method and two other popular benchmark methods for unseeded graph matching on three example low-rank graphs. Graph 1 is generated from the graphon of a stochastic block model with communities of equal sizes. Within-community probabilities of community is and between-community probabilities are . Graphon 2 is a more general low-rank graph with distinct nonzero eigenvalues. The graphon function is defined by . Graphon 3 is relatively most difficult for all methods, as it has repeated nonzero eigenvalues. The leading eigenvalues are and their corresponding eigenfunctions are , , and

. Graphs generated from graphon 1 element-wise follow Bernoulli distributions, and graphs generated from both graphons 2 and 3 are element-wise contaminated by

random noises. We repeat the experiment 30 times for each graphon. In each experiment, we randomly generate two independent realizations from the graphon, and shuffle the node order of one of the adjacency matrices by a randomly chosen permutation matrix unknown to all the compared methods. We measure the performance of the methods by RMSE:

In all these experiments, we run our method with the following random starts: we initialize the orthonormal matrix in our loss function from Givens rotations (Merchant et al., 2018):

Where are defined as follows; where and , and the matrix excluding the st and

th rows and columns is an identity matrix

. We start our method with , where we choose . Notice that this is feasible since all the tested graphons have at most nonzero leading eigenvalues. If is large, we may reduce to and also considering sub-sampling from all the possible configurations of and ’s. In this simulation study, we fix and . For , we used LAPJV to perform the final Hungarian algorithm match, and for all the other (smaller) ’s, we used MUNKRES.

The bench mark methods we compared to are Fishkind et al. (2012) and Vogelstein et al. (2015) using the MATLAB codes downloaded from the authors’ websites.

Graphon net size Our method SGM FAQ
RMSE 5.02(0.28) 6.03(0.24) 31.48(0.01)
0.00(0.09) 0.00(0.05) 31.89(0.00)
0.00(0.00) 0.00(0.00) 31.59(0.00)
0.00(0.00) 0.00(0.00) 31.60(0.00)
0.00(0.00) 0.00(0.00) 31.61(0.00)
Time 5.57(0.36) 0.14(0.00) 0.17(0.00)
19.57(0.07) 0.96(0.01) 1.69(0.01)
27.25(0.06) 4.87(0.02) 8.64(0.02)
34.84(0.08) 25.22(0.04) 49.98(0.08)
55.31(0.08) 141.21(0.11) 327.48(0.26)
Table 1: Graphon 1 (100*Frobenius/n, repeat = 30 times)
Graphon net size Our method SGM FAQ
RMSE 8.62(0.17) 6.99(0.06) 70.30(0.01)
6.18(0.11) 4.96(0.02) 70.52(0.00)
4.38(0.04) 3.91(0.01) 70.60(0.00)
3.06(0.02) 3.18(0.00) 70.64(0.00)
2.35(0.01) 2.46(0.00) 70.67(0.00)
Time 5.33(0.02) 0.16(0.00) 0.28(0.00)
12.55(0.03) 1.16(0.00) 1.90(0.01)
18.20(0.10) 5.96(0.01) 9.72(0.02)
29.26(0.07) 30.67(0.03) 57.29(0.04)
50.30(0.04) 189.78(0.14) 376.96(0.18)
Table 2: Graphon 2 (100*Frobenius/n, repeat = 30 times)
Graphon net size Our method SGM FAQ
RMSE 3.10(0.05) 5.88(0.20) 11.63(0.06)
2.46(0.07) 5.74(0.17) 11.62(0.02)
1.99(0.10) 6.03(0.15) 11.64(0.01)
1.74(0.08) 12.23(0.13) 11.88(0.01)
1.51(0.07) 12.24(0.03) 11.99(0.00)
Time 18.81(0.38) 0.14(0.00) 0.82(0.01)
42.30(0.45) 1.01(0.02) 5.74(0.03)
68.00(0.56) 5.95(0.09) 27.87(0.17)
109.94(0.61) 16.95(0.32) 159.75(0.62)
194.93(0.87) 121.93(0.39) 1028.84(2.98)
Table 3: Graphon 3 (100*Frobenius/n, repeat = 30 times)

Graphon 1 is relatively easy for all methods, and we observed that our method and SGM quickly became perfectly accurate from a quite small sample size onward. Graphon 2 is slightly harder and our method performed similarly to SGM. On the most challenging model Graphon 3, our method shows its advantage in exploiting the low-rank structure and has a diminishing MSE, whereas SGM seemed to become increasingly disoriented in its growing search space of optimization. In all examples, FAQ did not perform well, possibly due to the poor initialization at .

On computational efficiency, we observe that with a fixed , our method’s time complexity increases linearly with the sample size . Recall that we have the flexibility to tune the increment rate of with , so in the extreme case if we have to handle an increasing sample size with a fixed target error bound, then we may use a fixed to achieve linear computational time. Also recall that increasing faster than , however, will not further improve error rate since is now the bottleneck factor.

7 Discussions

In this section, we present discussions on various aspects of our method and some future directions along the two lines of point registration and graph matching.

First, on the point registration side: our method can handle more general invertible linear transformations than orthonormal

, but in order to retain the satisfaction of the Steep Slope Condition and our analysis, some regularity assumptions on the family of transformations would be necessary. Similarly, further extension to parameterized nonlinear transformation can be considered. We envision the even further extension to general nonparametric transformations to remain challenging.

In this paper, we have been placing our attentions solely to matching points from universally bounded distributions, which is indeed a natural feature of positions by decomposing moderately regular graphons. We conjecture that our theoretical results might be generalizable to sub-Gaussian and and possibly other light-tailed distributions, as many preliminary empirical evidences encouragingly suggest, and we are currently working on this direction. On the other hand, if the distributions to be matched are extremely heavy-tailed that even the first moment does not exist, then we may need a new goodness measurement as the population Wasserstein distance may not be always well-defined unless the two distributions are already perfectly matched up. If the population version of some criterion is not-defined, the meaningfulness of its sample version, despite its possible existence, would be under doubts.

Matching the points generated from different graphons, however, remains a major challenge. Notice that our criterion as a discrepancy measure between the two distributions’ Laplace transforms is, by itself, a valid statistical distance, as it satisfies triangular inequality and other requirements for a distance. With non-matchable distributions, optimizing our criterion and optimizing other criteria such as Wasserstein distance may find different estimated transforms of one data set that “best matches” the other in their own senses, and subsequently, the resulting matches are likely different. The potential presence of outliers is a similar but different topic – the two underlying point generating distributions may still be matchable, but now except for a few outliers. Another closely related but different topic is outliers – our method might not be robust against outliers, and we recommend users to detect and eliminate outliers in the data pre-processing procedure.

Then, on the graph matching side. First, we made the assumption that there are optimal and that can perfectly match up the true latent node positions, that is, , under equal sample sizes. This assumption is by no means substantive and the assumption could be easily relaxed to networks whose latent node positions in the graphon model and or arbitrary dependence structure between each pair of nodes belonging to different networks, while the internal independence within each network holds: and for any and .

Our method is certainly not designed to match up general graphons of full rank, despite it may find competitive solutions under some full rank graphons, if the leading few eigenfunctions can already differentiate the different roles of the nodes in the networks well. In such cases, despite the other eigenvalues and eigenfunctions may matter much for purposes such as graphon estimation, the leading ones may already suffice to provide accurate node matching. But the general problem of matching full rank graphons is outside the scope of this paper.