Inference for multiple heterogeneous networks with a common invariant subspace

06/24/2019 ∙ by Jesús Arroyo, et al. ∙ Johns Hopkins University 0

The development of models for multiple heterogeneous network data is of critical importance both in statistical network theory and across multiple application domains. Although single-graph inference is well-studied, multiple graph inference is largely unexplored, in part because of the challenges inherent in appropriately modeling graph differences and yet retaining sufficient model simplicity to render estimation feasible. This paper addresses exactly this gap, by introducing a new model, the common subspace independent-edge (COSIE) multiple random graph model, which describes a heterogeneous collection of networks with a shared latent structure on the vertices but potentially different connectivity patterns for each graph. The COSIE model encompasses many popular network representations, including the stochastic blockmodel. The model is both flexible enough to meaningfully account for important graph differences and tractable enough to allow for accurate inference in multiple networks. In particular, a joint spectral embedding of adjacency matrices - the multiple adjacency spectral embedding (MASE) - leads, in a COSIE model, to simultaneous consistent estimation of underlying parameters for each graph. Under mild additional assumptions, MASE estimates satisfy asymptotic normality and yield improvements for graph eigenvalue estimation. In both simulated and real data, the COSIE model and the MASE embedding can be deployed for a number of subsequent network inference tasks, including dimensionality reduction, classification, hypothesis testing and community detection. Specifically, when MASE is applied to a dataset of connectomes constructed through diffusion magnetic resonance imaging, the result is an accurate classification of brain scans by patient and a meaningful determination of heterogeneity across scans of different subjects.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 19

page 20

page 27

Code Repositories

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Random graph inference has witnessed extraordinary growth over the last several decades (Goldenberg et al., 2009; Kolaczyk, 2017; Abbe, 2017). To date, the majority of work has focused primarily on inference for a single random graph, leaving largely unaddressed the critical problem of modeling the structure of multiple-graph data, including multi-layered and time-varying graphs (Kivelä et al., 2014; Boccaletti et al., 2014; Holme and Saramäki, 2012). Such multiple graph data arises naturally in a wide swath of application domains, including neuroscience (Bullmore and Sporns, 2009), biology (Han et al., 2004; Li et al., 2011), and the social sciences (Szell et al., 2010; Mucha et al., 2010).

Several existing models for multiple-graph data require strong assumptions that limit their flexibility (Holland et al., 1983; Wang et al., 2017; Levin et al., 2017; Nielsen and Witten, 2018) or scalability to the size of real-world networks (Durante et al., 2017; Durante and Dunson, 2018). Indeed, principled approaches to multiple-graph inference (Tang et al., 2017b, a; Levin et al., 2017; Ginestet et al., 2017; Arroyo Relión et al., 2019; Kim and Levina, 2019) are rather scarce, in part precisely because of the challenges inherent in constructing a multiple-graph model that adequately captures real-world graph heterogeneity while remaining analytically tractable. The aim, here, is to address exactly this gap.

In this paper, we resolve the following questions: first, can we construct a simple, flexible multiple random network in networks that can be used to approximate real-world data? Second, for such a model, can we leverage common network structure for accurate, scalable estimation of model parameters, while also being able to describe distributional properties of our estimates? Third, how can we use our estimators in subsequent inference tasks, such as multiple-network testing, community detection, or graph classification? Fourth, how well do such modeling, estimation, and testing procedures perform on simulated and real data compared to state-of-the-art methods for such tasks?

Each of these questions is of theoretical and practical significance in its own right. We address all of them, and our unified framework allows for a consistent and comprehensive approach to multiple-network inference. To begin, we present a semiparametric model for multiple graphs with aligned vertices that is based on a common subspace structure between their expected adjacency matrices, but with allowance for heterogeneity within and across the graphs. The common subspace structure has a meaningful interpretation and generalizes several existing models for multiple networks; moreover, the estimation of such a common subspace in a set of networks is an inherent part of well-known graph inference problems such as community detection, graph classification, or eigenvalue estimation (Lyzinski et al., 2017).

Our model describes a class of independent-edge random graphs whose Bernoulli adjacency matrices , have expectation of the form , where is a low-dimensional matrix and is a matrix of orthonormal columns. Because the invariant subspaces defined by are common to all of the graphs, we call this model the common-subspace independent edge (COSIE) random graph model. The matrices are dimension , need not be diagonal, and can vary with each graph. In particular, despite the shared expectation matrix factor , each graph can have a different distribution. As we show in Prop. 1

, our model setting includes stochastic blockmodels (SBMs) with common block assignments but different block connection probability matrices

.

Next, we explore estimation of model parameters in COSIE, specifically the common invariant subspace and the individual matrices . In Theorem 3, we leverage techniques from Fan et al. (2017) to prove that a joint spectral embedding of the adjacency matrices leads to an accurate estimate of , up to an orthogonal transformation. We call this joint spectral embedding procedure the multiple adjacency spectral embedding (i.e., MASE), and we denote by the estimate for generated by the MASE procedure. In the special case when all the graphs have a shared community structure in the stochastic blockmodel, Theorem 5 shows that an additional clustering step applied to the rows of can be employed to accurately recover vertex community labels. In Theorem 7, we show that can also be used to estimate the matrices accurately. Under delocalization assumptions on , which are satisfied in, for example, stochastic blockmodel graphs, we prove that the estimates

have entries that are asymptotically normal with a constant variance and with a bounded bias that decays as the number of graphs

increases.

Subsequently, we compare estimation and testing of MASE to other state-of-the-art procedures for latent space modeling of multiple graphs, notably a 1) joint embedding first described in Wang et al. (2017); 2) a model for multiple random dot product graphs in Nielsen and Witten (2018); 3) the omnibus embedding of Levin et al. (2017)

in simulated stochastic blockmodels. We show that for certain graphs arising from the COSIE model, MASE empirically outperforms all of these methods for the estimation of block probability matrices, community detection, and graph classification. We also demonstrate that two-sample test statistics derived from MASE estimators outperform the omnibus embedding for graph hypothesis testing. Specifically, when the inference task is to detect whether two stochastic blockmodels have the same block probability matrices

, the MASE test statistic provides a nontrivial improvement in power. We illustrate such improvement in statistical power in a challenging setting where the block probability matrices between null and alternative differ only slightly.

Next, we use COSIE to model a collection of brain networks, specifically data from the HNU1 study (Zuo et al., 2014) consisting of diffusion magnetic resonance imaging (dMRI) of 30 different healthy human subjects. Each subject was scanned ten times over a one-month period, resulting in 300 scans. Using NeuroData’s NDMG pipeline Kiar et al. (2018) and a registration to the CC200 atlas of Craddock et al. (2012), we construct a collection of 200-vertex graphs. We apply the MASE procedure to this collection of graphs, obtaining low-dimensional representations of each graph, and then generate a dissimilarity matrix of pairwise distances between such representations. Performing classical multi-dimensional scaling (CMDS) (Borg and Groenen, 2003) on this dissimilarity results in a two-dimensional plot of these graphs, with clear separation between different subjects. Thus the composition of CMDS with the MASE procedure elucidates differences across heterogeneous networks but also recognizes important similarities, and provides a rigorous statistical framework within which biologically relevant differences can be assessed.

The paper is organized as follows. In Section 2, the goal is to briefly and succinctly encapsulate the principal contributions of this paper. As such, in this section, we introduce the COSIE model and provide basic context, and we describe spectral graph inference and the MASE procedure. Thereafter, we present our main theoretical results on consistency and asymptotic normality for the MASE procedure. We then move on to describe the use of MASE on simulated and real data, with which we conclude Section 2. In Section 3 and Section 4, we give a formal treatment of the COSIE model and MASE procedure. In Section 5, we study the theoretical statistical performance of MASE and provide a bound on the error of estimating the common subspace, and we establish asymptotic normality of the estimates of the individual graph parameters. In Section 6, we investigate the empirical performance of MASE for estimation in testing when compared to other benchmark procedures. In Section 7, we use MASE to analyze connectomic networks of human brain scans. Specifically, we show the ability of our model and estimation procedure to characterize and discern the differences in brain connectivity across subjects. In Section 8, we conclude with a discussion of open problems for future work.

2 Summary of Key Results

We begin this section by summarizing our multiple random graph model, keeping in mind the overarching goal of establishing a tractable, flexible model for graph heterogeneity. Consider a sample of observed graphs , with , where denotes a set of labeled vertices, and is the set of edges corresponding to graph . Assume the vertex sets are the same (or at least aligned). Assume the graphs are undirected, with no self-loops (it is worth emphasizing however, that these results can be easily extended to allow for loops, directed or weighted graphs). For each graph , denote by to the adjacency matrix that represents the edges; the matrix is binary, symmetric and hollow, and if . Assume that the above-diagonal entries , , of the adjacency matrix for graph

are independent Bernoulli random variables with

the probability of an edge between vertex and vertex . Consolidate these probabilities in the matrix .

In defining a model for multiple graphs, we adopt a low-rank assumption on the expected adjacency matrices. To leverage the information content across multiple graphs, we further assume common structure in that the expected adjacency matrices of the independent edge graphs share a common invariant subspace. We call this the common subspace independent edge random graph (COSIE). Specifically, we say that a collection of independent random graphs are distributed according to the bounded rank COSIE graph model with common subspace and score matrices , if each of the connection probability matrices for is given by

Under certain conditions on the rank of the score matrices, the invariant subspace of the model is identifiable up to an orthogonal transformation (see Proposition 2).

Letting denote the adjacency matrix for graph , we write

In the COSIE model, there are three primary estimation goals. First, we want to obtain an accurate estimate of the common subspace of the unobserved probability matrices . Second, we want to estimate the individual matrices .

COSIE also provides the foundations for the critical (and wide-open) question of multi-sample graph hypothesis testing, whereby we can use these estimates for and to build principled tests to compare different populations of networks.

2.1 COSIE is a flexible and estimable multiple graph model

The COSIE model is appropriate for modeling a nontrivial swath of real network phenomena. Indeed, the ubiquitous SBM is COSIE, and COSIE encompasses the important case of a collection of vertex-aligned SBMs (Holland et al., 1983) whose block assignments are the same but whose block probability matrices may differ, as we show in Proposition 1. COSIE also provides a generalization to the multiple-graph setting of latent position models such as the random dot product graph (Young and Scheinerman, 2007).

Now, to estimate and , we rely on the low-rank structure of the expectation matrices in COSIE. Specifically, we first spectrally decompose each of the adjacency matrices to obtain the adjacency spectral embedding (Sussman et al., 2012) , defined as

where is the diagonal matrix of the top eigenvectors of , sorted by magnitude, and the matrix of associated orthonormal eigenvectors.

In the COSIE model, we will leverage the common subspace structure and use these eigenvector estimates to obtain an improved estimate of the true common subspace . In fact, we use to build the multiple adjacency spectral embedding (MASE), as follows. We define the matrix by

and we let be the matrix of the top

leading left singular vectors of

. We set

The MASE algorithm, then, outputs and . Figure 1 provides a graphical illustration of the MASE procedure for a collection of stochastic blockmodels. The key point, here, is that COSIE is a flexible, tractable model for random graphs that retains enough homogeneity—via the common subspace—for ease of estimation, and permits sufficient heterogeneity—via the potentially distinct score matrices —to model important collections of different graphs.

Figure 1: Graphical representation of Algorithm 1 for estimating the common invariant subspace in a multilayer stochastic blockmodel (see Definition 3).

2.2 MASE provides a consistent estimate for the COSIE parameters

Sussman et al. (2012); Lyzinski et al. (2014); Athreya et al. (2016) show that is a consistent, asymptotically normal estimate for underlying graph parameters for the random dot product graph model (Young and Scheinerman, 2007), a model in which the connection probability matrix is given by an outer product of a low-rank matrix with itself. In Levin et al. (2017) it shown that such a spectral embedding can be profitably deployed for estimation and testing in multiple independent random dot product graphs. We show in Theorem 3 that under mild assumptions on the sparsity of the graphs,

where indicates the space of orthogonal matrices of size . This result employs bounds introduced by Fan et al. (2017) for subspace estimation in distributed covariance matrices.

Furthermore, under delocalization assumptions on , we can show that the entries of the MASE estimates are asymptotically normal, with a bias term that decays as increases. Namely, there exists a sequence of orthogonal matrices such that

as , where

is a random matrix that satisfies

. These results and simulation evidence (see Figure 5 in Section 6) suggest that the multiplicity of graphs in a COSIE can profitably reduce the bias in eigenvalue estimation as compared to a recent work of Tang (2018).

The second key point, then, is that MASE provides consistent estimates for the common subspace and the score matrices .

2.3 MASE estimates yield state-of-the-art performance on subsequent inference tasks

Section 6 examines a number of graph inference procedures, including community detection, graph classification, and testing. Here, we provide a preview: a glimpse of the impact MASE can have on graph hypothesis testing. Hypothesis testing on populations of graphs is a relatively nascent area in which methodologically sound procedures for graph comparison are scarce.

Our goal is to test the hypothesis that for a given pair of random adjacency matrices and , the underlying probability matrices and are the same. Using the COSIE model, for any pair of matrices and there exists an embedding dimension and a matrix with orthonormal columns such that . Therefore, our framework reduces the problem to testing the hypothesis . We evaluate the performance of our method and compare it with the omnibus embedding method of Levin et al. (2017), which is one of the few principled methods for this problem in latent space models.

We proceed by generating with a mixed membership SBM with three communities (Airoldi et al., 2007), such that for each , the row that corresponds to vertex is distributed as . The matrix of the first model is fixed as . We simulate two scenarios

  1. Same community assignments, but different connectivity: fix , and vary by increasing .

  2. Different community assignments, same connectivity: fix , and sample memberships of vertices independently while keeping the rest to be the same, that is, are i.i.d. random variables, and for .

The first scenario can be represented as a COSIE model with dimension , where is the rank of the matrix , but for the second one, to correctly represent these graphs in the COSIE model we use and .

For both MASE and OMNI, we obtain the distribution of the test statistic via Monte Carlo simulations using the correct expected value of each graph , and for MASE we also evaluate the parametric bootstrap approach proposed by Tang et al. (2017b), as an alternative to use this method in practice when the probability matrix is not known.

Figure 2:

Power for rejecting the null hypothesis that two graphs have the same distribution as a function of the difference in their parameters at a

significance level (black dotted line). Both graphs are sample from a mixed-membership SBM . In the left figure, the community assignments and are the same, while the connection probability changes on the x-axis. In the right figure, while the community assignments of a few nodes change. Both methods improve their power as the difference between the graphs get larger, but MASE can effectively use the common subspace structure to outperform in the empirical power.

Figure 2 shows the result in terms of power for rejecting the null hypothesis that the two graphs are equal. As the parameters increase their difference, both methods approach full power. The first scenario shows an advantadge for MASE, which exploits the fact that the common subspace is well represented on each graph, and having a smaller number of parameters than OMNI. The second scenario is more challenging for MASE since the number of parameters is increased, but it still shows a competitive performance. In both cases the bootstrap method performed very similar to the true Monte Carlo power. Thus the third key point is that MASE performs competitively with respect to (and in some cases significantly better than) a number of existing procedures for estimation, community detection, graph classification, and testing in simulated stochastic blockmodels.

2.4 COSIE and MASE perform well on real data challenges

COSIE and MASE can be deployed effectively in a pressing real-data problem: that of identifying differences in brain connectivity in medical imaging on human subjects. The graphs we examine arise from the HNU1 study (Zuo et al., 2014) which consists of diffusion magnetic resonance imaging (dMRI) records of 30 different healthy subjects that were scanned ten times each over a period of one month. The pre-processing of the data (see Section 7 for details) resulted in 300 graphs with 200 aligned vertices .

After applying the MASE method to the HNU1 data, we obtain a matrix of joint latent positions and a set of symmetric score matrices of size for each individual graph, yielding 120 parameters to represent each graph. Figure 3 shows the latent positions of the vertices for the first three dimensions of . Most of the vertices of the CC200 atlas are labeled according to their spatial location as left or right hemisphere, and this structure is reflected in Figure 3, which suggests that the embedding obtained by MASE is anatomically meaningful.

Figure 3: Estimated latent positions obtained by MASE on the HNU1 data graphs with a 3-dimensional embedding.

Now, the matrix of Frobenius distances such that , is invariant to any rotations on according to Proposition 2. To visualize the geometry of imposed by the distances on the space of subjects, we perform multidimensional scaling (MDS) on the matrix , with 5 dimensions for the scaling. Figure 4 shows scatter plots of the positions discovered by MDS for a subset of the graphs corresponding to 5 subjects, chosen because the distances between them represent a useful snapshot of the variability in the data. We observe that the points representing graphs of the same subject usually cluster together. The location of the points corresponding to subjects 12 and 14 suggests some similarity between their connectomes, while subject 13 shows a larger spread, and hence more variability in the brain network representations.

To conclude, the final key point is that COSIE and MASE are robust to the challenges of real data, and provide domain-relevant insights for real data analysis.

Figure 4: Multidimensional scaling of the individual graph representations obtained by MASE for the HNU1 data. The plot shows the positions discovered by MDS for the graphs corresponding to 5 subjects of the data, showing that graphs corresponding to the same subject are more similar between each other.

3 The model: Common subspace independent-edge random graphs (COSIE)

To model a sample of graphs with aligned vertices we consider an independent-edge random graph framework, in which the edges of a graph are conditionally independent given a probability matrix , so that each entry of denotes the probability of a Bernoulli random variable representing an edge between the corresponding vertices and , and hence, the probability of observing a graph is given by

We also write . This framework encompasses many popular statistical models for graphs (Holland et al., 1983; Hoff et al., 2002; Bickel and Chen, 2009) which differ in the way the structure of the matrix is defined. In the single graph setting, imposing some structure in the matrix is necessary to make the estimation problem feasible, as there is only one graph observation. Under this framework, we can characterize the distribution of a sample of graphs by modeling their expectations . As our goal is to model a heterogenous population of graphs with possible different distributions, we do not assume that the expected matrices are all equal, so as in the single graph setting, further assumptions on the structure of these matrices are necessary.

To introduce our model, we start by reviewing some models for a single graph that motivate our approach. One of such models is the random dot product graph (RDPG) defined below.

Definition 1 (Random dot product graph model (Young and Scheinerman, 2007)).

Let be a matrix such that the inner product of any two rows satisfy . We say that a random adjacency matrix is distributed as a random dot product graph with latent positions , and write , if the conditional distribution of given is

The RDPG is a type of latent space model (Hoff et al., 2002) in which the rows of correspond to latent positions of the vertices, and the probability of an edge between a pair of vertices is proportional to the angle between their corresponding latent vectors, which gives an appealing interpretation of the matrix of latent positions . Under the RDPG model, the matrix of edge probabilities satisfies , so the RDPG framework contains the class of independent-edge graph distributions for which is a positive semidefinite matrix with rank at most . More recently, Rubin-Delanchy et al. (2017) introduced the generalized RDPG (GRDPG) model, which extends this framework to the whole class of matrices with rank at most , by introducing a diagonal matrix to the model, such that is of size and the diagonal entries are either or . Given and , the edge probabilities are modeled as , which keeps the interpretation of the latent space while extending the class of graphs that can be represented within this formulation.

The stochastic blockmodel (SBM) (Holland et al., 1983) is a particular example of a RDPG in which there are only different rows in , aiming to model community structure in a graph. In a SBM, vertices are partitioned into different groups, so each vertex has a corresponding label . The probability of an edge appearing between two vertices only depends on their corresponding labels, and these probabilities are encoded in a matrix such that

To write the SBM as a RDPG, let be a binary matrix that denotes the community memberships of the vertices, with if , and 0 otherwise. Then, the edge probability matrix of the SBM is

(1)

and denote . Let be the eigendecomposition of . From this representation, it is easy to see that if we write , then the distribution of a SBM corresponds to a GRPDG graph with latent positions . In particular, if is a positive semidefinite matrix, this is also a RDPG. Multiple extensions to the SBM have been proposed in order to produce a more realistic and flexible model, including degree heterogeneity (Karrer and Newman, 2011), multiple community membership (Airoldi et al., 2007), or hierarchical partitions (Lyzinski et al., 2017). These extensions usually fall within the same framework of Equation (1) by placing different constraints on the matrix , and hence they can also be studied within the RDPG model.

In defining a model for multiple graphs, we adopt a low-rank assumption on the expected adjacency matrices, as the RDPG model and all the other special cases do. To leverage the information of multiple graphs, a common structure among them is necessary. We thus assume that all the expected adjacency matrices of the independent edge graphs share a common invariant subspace, but allow each individual matrix to be different within that subspace.

Definition 2 (Common Subspace Independent Edge graphs, a.k.a COSIE).

Let

be a matrix with orthonormal columns, and be symmetric matrices such that for all , . We say that the random adjacency matrices

are jointly distributed according to the

common subspace independent-edge graph model with bounded rank and parameters and if for each , given and the entries of each are independent and distributed according to

We denote by to the joint model for the adjacency matrices.

Under the COSIE graph model, the expected adjacency matrices of the graphs share a common invariant subspace that can be interpreted, upon scaling by the score matrices, as the common joint latent positions of the graphs. This invariant subspace can only be identified up to an orthogonal transformation, so the interpretation of the latent positions is preserved. Note that each graph is marginally distributed as a GRDPG. The score matrices can be expressed as as the eigendecomposition of , such that

is a orthogonal matrix of size

, and is a diagonal matrix containing the eigenvalues. Then, the corresponding latent positions of graph in the GRDPG model are given by , with .

The model also introduces individual score matrices that control the connection probabilities between the edges of each graph. When an is diagonal, its entries contain the eigenvalues of the corresponding probability matrix , but in general does not have to be diagonal. Formal identifiability conditions of the model are discussed in Section 3.1.

The COSIE model can capture any distribution of multiple graphs with aligned vertices, so for any set of probability matrices there exist an embedding dimension such that those matrices can be represented with the COSIE model. Indeed, it is trivial to note that if , we can set and . However, for many classes of interest, the embedding dimension necessary to exactly or approximately represent the graphs is usually much smaller than . In those cases, the COSIE model can effectively reduce the dimensionality of the problem from different parameters to only .

To motivate our model, we consider the multilayer stochastic blockmodel for multiple graphs, originally introduced by Holland et al. (1983). In this model, the community labels of the vertices remain fixed across all the graphs in the population, but the connection probability between and within communities can be different on each graph. The parsimony and simplicity of the model, while maintining heterogeneity across the graphs, has allowed its use in different statistical tasks, including community detection (Han et al., 2014), multiple graph inference (Pavlović et al., 2019) and modelling time-varying networks (Matias and Miele, 2017). Formally, the model is defined as follows.

Definition 3 (Multilayer stochastic blockmodel (Holland et al., 1983)).

Let be a matrix such that for each , and be symmetric matrices. The random adjacency matrices are jointly distributed as a multilayer SBM, denoted by , if each is independently distributed as .

The next proposition formalizes our intuition statement about the semiparametric aspect of the model. Namely, the COSIE graph model can represent a multilayer SBM with communities using a dimension that is at most . The proof can be found in the appendix.

Proposition 1.

Suppose that and are the parameters of the multilayer SBM. Then, for some , there exists a matrix with orthonormal columns and symmetric matrices such that

The previous result shows that the multilayer SBM is a special case of the COSIE model. Conversely, if is the invariant subspace of the COSIE model, and has only different rows, then the COSIE model is equivalent to the multilayer SBM. Furthermore, several extensions of the single-graph SBM can be translated directly into the multilayer setting, and are also contained within the COSIE model. By allowing the matrix to have more than one non-zero value on each row, overlapping memberships can be incorporated (Latouche et al., 2011), and if the rows of are nonnegative real numbers such that then we obtain an extension of the mixed membership model (Airoldi et al., 2007), which can further incorporate degree heterogeneity by multiplying the rows of by a constant (Zhang et al., 2014). More broadly, if the rows of in a COSIE graph are characterized by a hierarchical structure, then the graphs correspond to a multilayer extension of the hierarchical SBM (Lyzinski et al., 2017). Some of these extensions have not been presented before, and hence our work provides a road for studying these models, which are interesting in various applications.

Other latent space approaches for multiple graph data setting have been presented before, but they either tend to limit the heterogeneity in the distributions across the graphs, or include a large number of parameters, which complicate the scalability and interpretation. Levin et al. (2017) presents a method to estimate the latent positions for a set of graphs. Although the method can in principle obtain different latent positions for each graph that do not require any further Procrustes aligment, the method is only studied under a joint RDPG model which assumes that the latent positions of all the graphs are the same. Wang et al. (2017) introduced a semiparametric model for graphs that is able to effectively handle heterogeneous distributions. However, their model usually requires a larger number of parameters to represent the same distributions than COSIE, which complicates the interpretation, and the non-convexity of the problem makes estimation harder. In fact, an equivalent statement to Proposition 1 will require to embed the graphs in a latent space of dimension , compared to

for COSIE. Other related models that are based on tensor decompositions

(Zhang et al., 2019; Wang et al., 2019) present similar challenges on estimation and interpretation. More recently, Nielsen and Witten (2018) studied a multiple RDPG model that is a special case of the model of Wang et al. (2017) by imposing further identifiability constraints. These constraints limit the ability of their model to represent heterogeneous distributions, and, in fact, it is not possible to obtain equivalent statements to Proposition 1 under this model. Bayesian formulations of this problem have also been introduced (Durante et al., 2017; Durante and Dunson, 2018), but computational methods to fit these models limit their applicability to much smaller graphs, with vertices numbering only in the dozens.

3.1 Identifiability

In the COSIE model, note that any orthogonal transformation of the parameters keeps the probability matrix of the model unchanged. Indeed, observe that

and therefore the most we can hope is that the parameters of the model are identifiable within the equivalence class

This non-identifiability is unavoidable in many latent space models, including the RDPG. However, with multiple graphs the situation is more nuanced, as we do not require the probability matrix of each graph to have the same rank. Note that Definition 2 does not restrict the matrices to be full rank, so the rank of each individual graph can be smaller than the dimension of the joint model.

The following proposition characterizes the identifiability of the model. The proof is given on the appendix.

Proposition 2 (Model identifiability.).

Let be a matrix with orthonormal columns and be symmetric matrices such that these are the parameters of the bounded rank COSIE model.

  1. For any for any pair of indices , the pairwise spectral and Frobenius distances

    are identifiable.

  2. Define . If , then is identifiable up to an orthogonal transformation.

  3. Given , the matrices are identifiable.

The previous results present identifiability characterizations at different levels. At the weakest level, the first part of Proposition 2 ensures that even if the parameters are not identifiable, the Frobenius or spectral distances between the score matrices of the graphs are unique. This property allows the use of distance-based methods for any subsequent inference in multiple graph problems, such as multidimensional scaling, -means or -nearest neighbors; some examples are shown in Section 6 and 7.

Proposition 2 also provides an identifiability condition for the invariant subspace . The matrices do not need to have the same rank, but the joint model may require a larger dimension to represent all the graphs, which is given by the rank of . To illustrate this scenario, consider the multilayer SBM with 3 communities. Let be real matrices, a community membership matrix, and some constants so that

Although the model is represented with three communities, each matrix is rank 2, so each graph individually can in fact be represented with only two communities. However, since the concatenated matrix whose columns are given by has full rank (i.e rank 3), the three communities of the model can be identified in the joint model, and thus this can be represented as a rank-3 COSIE model.

The last part of Proposition 2 shows that the individual parameters of a graph

are only identifiable with respect to a given basis of the eigenspace

, and hence, all the interpretations that can be derived from depending only on . Some instances of the COSIE model, including the multilayer SBM, provide specific characterizations of that can facilitate further interpretation of the parameters.

4 Fitting COSIE by spectral embedding of multiple adjacency matrices

This section presents a method to fit the model provided in Definition 2 to a sample of adjacency matrices . Fitting the model requires an estimator for the common subspace , for which we present a spectral approach. Given an estimated common subspace, we estimate the individual score matrices for each graph by least squares, which has a simple solution in out setting. In all of our analysis, we assume that the dimension of the model is known in advance, but we present a method to estimate it in practice.

We start by defining the adjacency spectral embedding (ASE) of an adjacency matrix , which is a standard tool for estimating the latent positions of a RDPG.

Definition 4 (Adjacency spectral embedding (Sussman et al., 2012; Rubin-Delanchy et al., 2017)).

For an adjacency matrix , let be the eigendecomposition of such that is the orthogonal matrix of eigenvectors, with , , and is a diagonal matrix containing the largest eigenvalues in magnitude. The scaled adjacency spectral embedding of is defined as . We refer to as the unscaled adjacency spectral embedding, or simply as the leading eigenvectors of .

The ASE provides a consistent and asymptotically normal estimator of the corresponding latent positions under the RDPG and GRDPG models as the number of vertices increases (Sussman et al., 2014; Athreya et al., 2017; Rubin-Delanchy et al., 2017). Therefore, under the COSIE model for which , the matrix obtained by ASE is a consistent estimator of , up to an orthogonal transformation, provided that the corresponding has full rank. This suggests that given a sample of adjacency matrices, one can simply use the unscaled ASE of any single graph to obtain an estimator of . However, this method is not leveraging the information about on all the graphs.

To give an intuition in how to fit the joint model for the data, we first consider working with the expected probability matrix of each graph . For simplicity in all our analysis here, we assume that all the matrices have full rank, so each matrix has exactly non-zero eigenvalues. Let and be the unscaled and scaled ASE of the matrix . Then

for some orthogonal matrix , and a diagonal matrix containing the eigenvalues of . Note that the matrices are possibly different for each graph, which is problematic when we want to leverage the information of all the graphs. Consider the matrix

that is formed by concatenating the unscaled ASEs of each graph, resulting in matrices of size . Alternatively, consider the same matrix but concatenating the scaled ASEs, given by

Note that both matrices and are rank , and the left singular vectors of either of them corresponds to the common subspace , up to some orthogonal transformation. The SVD step effectively aligns the multiple ASEs to a common spectral embedding. In the scaled ASE, the SVD also eliminates the effect of the eigenvalues of each graph, which are nuisance parameters in the common subspace estimation problem.

In practice, we do not have access to the expected value of the matrices , but we use the procedure described above with the adjacency matrices to obtain an estimator of the common subspace. Given , we proceed to estimate the individual parameters of the graphs by minimizing a least squares function,

and so, the estimator has a closed form given by

We call this procedure the multiple adjacency spectral embedding (MASE), and it is summarized in Algorithm 1.

Sample of graphs ; embedding dimensions and .
  1. For each , obtain the adjacency spectral embedding of on dimensions, and denote it by .

  2. Let be the matrix of concatenated spectral embeddings.

  3. Define as the matrix containing the

    leading left singular values of

    .

  4. For each , set .

.
Algorithm 1 Multiple adjacency spectral embedding (MASE)

The MASE method presented for estimating the parameters of the COSIE model only relies on singular value decompositions, and hence it is simple and computationally scalable. This method extends the ideas of a single spectral graph embeeding to a multiple graph setting. The first step of Algorithm 

1, which is tipically the major burden of the method, can be carried in parallel. The estimation of the score matrices only requires to know an estimate for , and hence allows to easily compute an out-of-sample-embedding for a new graph without having to update all the parameter estimates.

Figure 1 shows a graphical representation of MASE for estimating the common subspace of a set of four graphs. The graphs correspond to the multilayer SBM with two communities, and the connection matrices are selected in a way that the graphs have an heterogeneous structure within the sample. Note that after step 1 of Algorithm 1 the latent positions obtained by ASE are slightly rotated relatively to each graph. After estimating the common subspace by SVD, a common set of latent positions is found, which look tightly clustered within their community, showing that MASE is able to leverage the information of all the graphs in this example.

The first step of Algorithm 1 corresponds to a separate ASE of each graph. We stated the algorithm using the unscaled version of the ASE, and later in Section 5, we show that this version of the method is consistent in estimating the common subspace . In practice, this step can be replaced with other spectral embeddings of , including the scaled version described above. Note that the scaled ASE also makes use of the eigenvalues, and thus it puts more weight onto the columns of the matrix that correspond to the eigenvectors associated with the largest eigenvalues, which can be convenient, especially if the dimension is overestimated. Other spectral embeddings that also aim to estimate the eigenspace of the adjacency matrices, including the Laplacian spectral embedding or a regularized version of it (Priebe et al., 2019; Le et al., 2017), might be preferred in some circumstances, but we do not explore this direction further.

The method for estimating the common invariant subspace is related to other similar methods in the literature. Crainiceanu et al. (2011) proposed a population singular value decomposition method for representing a sample of rectangular matrices with the same dimensions, and use it to study arrays of images. In a different work, Fan et al. (2017) introduced a method for estimating the principal components of distributed data. Their method computes the leading eigenvectors of the covariance matrix on each server, and then obtains the leading eigenvectors of the average of the subspace projections, which is shown to converge to the solution of PCA with the same rate as if the data were not distributed. Both methods correspond to the unscaled ASE version of MASE. In our case, we show that this particular way of estimating the parameters of the COSIE model results in a consistent estimator of the common invariant subspace, and asymptotically normal estimators of the individual parameters.

4.1 Choice of embedding dimension

In practice, the dimension of the common subspace is usually unknown. Moreover, each individual might not be full rank, and so estimating a different embedding dimension for each graph might be necessary. The actual values of and correspond to the ranks of and respectively, and so they can be approximated by estimating the ranks of and . The scree plot method, which consists in looking for an elbow in the plot of ordered singular values of a matrices, provides an estimator of these quantities, and can be automatically performed using the method proposed by Zhu and Ghodsi (2006). We use this method to fit the model to real data in Section 7.

5 Theoretical results

In this section, we study the statistical performance of Algorithm 1 in estimating the parameters of the COSIE model when a sample of graphs is given. We first study the expected error in estimating the common subspace, and show that this error decreases as a function of the number of graphs. This result demonstrates that our method is able to leverage the information of the multiple graphs to improve the estimation of .

To study the estimation error of our method, we consider a sequence of parameters of the COSIE model . The magnitude of the entries of the parameters typically changes with , and in order to obtain consistent estimators we will require that . This is a natural assumption considering the fact that contains the eigenvalues of the expected adjacency matrix of a graph, and requiring these eigenvalues to grow as increases is necessary in order to control the error of ASE (Athreya et al., 2017). In the following, we omit the dependence of the parameters in . To simplify the analysis, we assume that all the score matrices have full rank.

Our next theorem introduces a bound on the expected error in estimating the common subspace of the COSIE model, which is the basis of our theoretical results. The proof of this result relies on bounds from Fan et al. (2017) for studying the eigenvector estimation error in distributed PCA. However, our setting presents some substantial differences, including the distribution of the data and the fact that the expected adjacency matrices are not jointly diagonalizable. Moreover, the scaled version of the algorithm adds the complication of working with eigenvalues, and extending the theory to this case is more challenging. We thus work with the unscaled version of the ASE in Algorithm 1. Before stating the theorem, we introduce some notation. For a given square symmetric matrix , denote by to the ordered eigenvalues of , and define and as the smallest and largest eigenvalue in magnitude of , and the largest sum of the rows of .

Theorem 3.

Let be a collection of full rank symmetric matrices of size , a matrix with orthonormal columns, and a sample of random adjacency matrices, and set . Define

(2)

Let be the estimator of obtained by Algorithm 1. Suppose that and . Then,

(3)

The above theorem requires two conditions on the expected adjacency matrices, that control the estimation error of each ASE for a single graph. First, the largest expected vertex degree needs to grow at rate to ensure the concentration in spectral norm of the adjacency matrix to its expectation (see Theorem 20 of Athreya et al. (2017)). Second, the condition is required to control the average estimation error of each ASE. For this condition to hold, it is enough to have that for each graph . In particular, observe that

so it is sufficient that . Specific conditions for the multilayer SBM that relate the community sizes and the eigenvalues of the connection probability matrices are presented in Section 5.1.

Theorem 3 theorem bounds the expected error in estimating the subspace of by the sum of two terms in Equation (3). To understand these terms, note that the error can be partitioned as

(4)

where is the matrix containing the leading eigenvectors of . The first term of the right hand side of Equation (4) measures the variance of the estimated projection matrix, which corresponds to the first term in Equation (3). Thus, this term converges to zero as the number of graphs increases. The estimated projection is not unbiased, and therefore the second term of Equation 4 is always positive, but its order is usually smaller than the variance. This bound is a function of , which quantifies the average error in estimating by using the leading eigenvectors of on each network. Note that the inclusion of an orthogonal transformation has to do with the fact that the common subspace is only identifiable up to such transformation.

The above result establishes that when the sample size is small (), the variance term dominates the error as observed in the left hand side of Equation (3), but as the sample size increases this term vanishes. The second term is the bias of estimating by using the leading eigenvectors of each network, and this term dominates when the sample size is large. These type of results are common in settings when a global estimator is obtained from the average of local estimators (Lee et al., 2017; Arroyo and Hou, 2016; Fan et al., 2017).

To illustrate the result of Theorem 3, consider the following example on the Erdös-Rényi (ER) model (Erdős and Rényi, 1959; Gilbert, 1959). In the ER random graph model , the edges between any pair of the vertices are formed independently with a constant probability . We consider a sample of adjacency matrices , each of them having a different edge probability so that . By defining , where is the -dimensional vector with all entries equal to one, we can represent these graphs under the COSIE model by setting . Hence, and

Therefore, if , Theorem 3 implies that

In the dense regime when , the error of estimating the common subspace is of order . An alternative estimator for the common subspace is given by the ASE of the mean adjacency matrix, , for which the expected subspace estimation error is (Tang et al., 2018). When , the error rate of both estimators coincide. However, the estimator of Tang et al. (2018) is only valid when all the graphs have the same expected adjacency matrices, and can perform poorly in heterogeneous populations of graphs (see Section 6). The ER model described above can be regarded as a special case of the multilayer SBM with only one community, which we consider next.

5.1 Perfect recovery in community detection

Estimation of the common subspace in the multilayer SBM is of particular interest due to its relation with the community detection problem (Girvan and Newman, 2002; Abbe, 2017). In the multilayer SBM with communities, the matrix of the common subspace basis has different rows, each of them corresponding to a different community. An accurate estimator will then reflect this structure, and when the community labels of the vertices are not known, clustering the rows of into groups can reveal these labels. Different clustering procedures can be used for this goal, and in this section we focus on -means clustering defined next. Suppose that is the set of valid membership matrices for vertices and communities, that is,

The community assignments are obtained by solving the -means problem

(5)

The matrix thus contains the estimated community assignments, and their corresponding centroids.

To understand the performance in community detection of the procedure described above, consider a multilayer SBM with parameters and . Denote the community sizes by , such that , and define the quantities , , and . To obtain a consistent estimator of the common invariant subspace, the following assumption is required.

Assumption 1.

There exist some absolute constants such that for all ,

The previous assumption requires a uniform control on ratios of the community sizes and the eigenvalues of the connectivity matrices. When all communities have comparable sizes (i.e., for some absolute constant ), then the smallest eigenvalue of each is allowed to approach zero at a rate

On the other hand, if the smallest eigenvalue of each is bounded away from zero by an absolute constant, then the size of the largest community can be at most .

The next Corollary is a consequence of Theorem 3 for the multilayer SBM setting. The proof is given on the Appendix.

Corollary 4.

Consider a sample of adjacency matrices from the multilayer SBM

and let the estimated subspace from Algorithm 1. Suppose that , , and that Assumption 1 holds. Then, for sufficiently large ,

(6)

The above corollary presents sufficient conditions for consistent estimation of the common invariant subspace, which contains the information of the community labels encoded in the matrix . The corollary requires the size of the smallest community to grow as increases, which is something commonly assumed in the literature (Rohe et al., 2011; Lyzinski et al., 2014).

The following result provides a bound on the expected number of misclustered vertices after clustering the rows of according to Equation (5). Due to the non-identifiability of the community labels, these are recovered up to a permutation , where