# On spectral embedding performance and elucidating network structure in stochastic block model graphs

Statistical inference on graphs often proceeds via spectral methods involving low-dimensional embeddings of matrix-valued graph representations, such as the graph Laplacian or adjacency matrix. In this paper, we analyze the asymptotic information-theoretic relative performance of Laplacian spectral embedding and adjacency spectral embedding for block assignment recovery in stochastic block model graphs by way of Chernoff information. We investigate the relationship between spectral embedding performance and underlying network structure (e.g. homogeneity, affinity, core-periphery, (un)balancedness) via a comprehensive treatment of the two-block stochastic block model and the class of K-block models exhibiting homogeneous balanced affinity structure. Our findings support the claim that, for a particular notion of sparsity, loosely speaking, "Laplacian spectral embedding favors relatively sparse graphs, whereas adjacency spectral embedding favors not-too-sparse graphs." We also provide evidence in support of the claim that "adjacency spectral embedding favors core-periphery network structure."


## 1 Preface

The stochastic block model (SBM) (Holland et al., 1983) is a simple yet ubiquitous network model capable of capturing community structure that has been widely studied via spectral methods in the mathematics, statistics, physics, and engineering communities. Each vertex in an $n$-vertex $K$-block SBM graph belongs to one of the $K$ blocks (communities), and the probability of any two vertices sharing an edge depends exclusively on the vertices' block assignments (memberships).

This paper provides a detailed comparison of two popular spectral embedding procedures by synthesizing recent advances in random graph limit theory. We undertake an extensive investigation of network structure for stochastic block model graphs by considering sub-models exhibiting various functional relationships, symmetries, and geometric properties within the inherent parameter space consisting of block membership probabilities and block edge probabilities. We also provide a collection of figures depicting relative spectral embedding performance as a function of the SBM parameter space for a range of sub-models exhibiting different forms of network structure, specifically homogeneous community structure, affinity structure, core-periphery structure, and (un)balanced block sizes (see Section 5).

The rest of this paper is organized as follows.

• Section 2 introduces the formal setting considered in this paper and contextualizes this work with respect to the existing statistical network analysis literature.

• Section 3 establishes notation, presents the generalized random dot product graph model of which the stochastic block model is a special case, defines the adjacency and Laplacian spectral embeddings, presents the corresponding spectral embedding limit theorems, and specifies the notion of sparsity considered in this paper.

• Section 4 motivates and formulates a measure of large-sample relative spectral embedding performance via Chernoff information.

• Section 5 presents a treatment of the two-block SBM and certain $K$-block SBMs whereby we elucidate the relationship between spectral embedding performance and network model structure.

• Section 6 offers further discussion and some concluding remarks.

• Section 7 provides additional details intended to supplement the main body of this paper.

## 2 Introduction

Formally, we consider the following stochastic block model setting.

###### Definition 1 (K-block stochastic block model (SBM)).

Let $K$ be a positive integer and $\pi$ be a vector in the interior of the $(K-1)$-dimensional unit simplex in $\mathbb{R}^K$. Let $B \in (0,1)^{K \times K}$ be a symmetric matrix with distinct rows. We say $(A, \tau) \sim \mathrm{SBM}(B, \pi)$ with scaling factor $0 < \rho_n \le 1$ provided the following conditions hold. Firstly, $\tau \equiv (\tau_1, \dots, \tau_n)$ where $\tau_1, \dots, \tau_n$ are independent and identically distributed (i.i.d.) random variables with $\mathbb{P}[\tau_i = k] = \pi_k$. Then, $A \in \{0,1\}^{n \times n}$ denotes a symmetric (adjacency) matrix such that, conditioned on $\tau$, for all $i \le j$, the entries $A_{ij}$ are independent Bernoulli random variables with $\mathbb{E}[A_{ij}] = \rho_n B_{\tau_i, \tau_j}$. If only $A$ is observed, namely if $\tau$ is integrated out from $(A, \tau)$, then we write $A \sim \mathrm{SBM}(B, \pi)$.¹

¹ The distinct row assumption removes potential redundancy with respect to block connectivity and labeling. Namely, if rows $k$ and $k'$ of $B$ are identical, then their corresponding blocks are indistinguishable and can without loss of generality be merged to form a reduced block edge probability matrix with corresponding combined block membership probability $\pi_k + \pi_{k'}$. We also remark that Definition 1 implicitly permits vertex self-loops, a choice that we make for mathematical expediency. Whether or not self-loops are disallowed does not alter the asymptotic results and conclusions presented here.
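The generative mechanism in Definition 1 can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `sample_sbm`, its arguments, and the example parameter values are assumptions chosen here, blocks are 0-indexed, and self-loops are permitted as in the definition.

```python
import numpy as np

def sample_sbm(n, B, pi, rho=1.0, seed=None):
    """Draw (A, tau) following Definition 1; names here are illustrative."""
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # i.i.d. block assignments with P[tau_i = k] = pi_k (blocks are 0-indexed).
    tau = rng.choice(len(pi), size=n, p=pi)
    # Edge probabilities conditional on tau: P_ij = rho * B[tau_i, tau_j].
    P = rho * B[np.ix_(tau, tau)]
    # Independent Bernoulli draws for i <= j, mirrored to enforce symmetry.
    upper = np.triu(rng.random((n, n)) < P)
    A = (upper | upper.T).astype(int)
    return A, tau

# Hypothetical two-block example.
A, tau = sample_sbm(200, [[0.5, 0.2], [0.2, 0.4]], [0.6, 0.4], seed=0)
```

Returning `A` alone corresponds to the case in which the block assignments are integrated out.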

The SBM is an example of an inhomogeneous Erdős–Rényi random graph model (Bollobás et al., 2007) and reduces to the classical Erdős–Rényi model (Erdős and Rényi, 1959) in the degenerate case when all the entries of $B$ are identical. This model enjoys an extensive body of literature focused on spectral methods (von Luxburg, 2007) for statistical estimation, inference, and community detection, including Fishkind et al. (2013); McSherry (2001); Lei and Rinaldo (2015); Rohe et al. (2011); Sussman et al. (2014); Sarkar and Bickel (2015). Considerable effort has also been devoted to the information-theoretic and computational investigation of the SBM as a result of interest in the community detection problem; for an overview see Abbe (2018). Popular variants of the SBM include the mixed-membership stochastic block model (Airoldi et al., 2008) and the degree-corrected stochastic block model (Karrer and Newman, 2011).

Within the statistics literature, substantial attention has been paid to the class of $K$-block SBMs with positive semidefinite block edge probability matrices $B$. This is due in part to the extensive study of the random dot product graph (RDPG) model (Nickel, 2006; Young and Scheinerman, 2007; Athreya et al., 2018), a latent position random graph model (Hoff et al., 2002) which includes positive semidefinite SBMs as a special case. Notably, it was recently shown that for the random dot product graph model, both Laplacian spectral embedding (LSE; see Definition 3) and adjacency spectral embedding (ASE; see Definition 3) behave approximately as random samples from Gaussian mixture models (Athreya et al., 2016; Tang and Priebe, 2016). In tandem with these limit results, the concept of Chernoff information (Chernoff, 1952) was employed in Tang and Priebe (2016) to demonstrate that neither Laplacian nor adjacency spectral embedding dominates the other for subsequent inference as a spectral embedding method when the underlying inference task is to recover vertices' latent block assignments. In doing so, the results in Tang and Priebe (2016) clarify and complete the groundbreaking work in Sarkar and Bickel (2015) on comparing spectral clusterings for stochastic block model graphs.

In Tang and Priebe (2016) the authors leave open the problem of comprehensively investigating Chernoff information as a measure of relative spectral embedding performance for stochastic block model graphs. Moreover, they do not investigate how relative spectral embedding performance corresponds to underlying network model structure. This is understandable, since the positive semidefinite restriction on $B$ limits the possible network structure that can be investigated under the random dot product graph model.

More recently, the limit theory in Tang and Priebe (2016) was extended in Rubin-Delanchy et al. (2017) to hold for all SBMs within the more flexible framework of the generalized random dot product graph (GRDPG) model. These developments now make it possible to conduct a more comprehensive Chernoff-based analysis, and that is precisely the aim of this paper. We set forth to formulate and analyze a criterion based on Chernoff information for quantifying relative spectral embedding performance which we then further consider in conjunction with underlying network model structure. The investigation carried out in this paper is, to the best of our knowledge, among the first of its kind in the study of statistical network analysis and random graph inference.

This paper focuses on the following two models which have garnered widespread interest (e.g. see Abbe (2018) and the references therein).

1. The two-block SBM with $B = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ and $\pi = (\pi_1, 1 - \pi_1)$ where $a, b, c, \pi_1 \in (0,1)$;

2. The $K \ge 2$ block SBM exhibiting homogeneous balanced affinity structure, i.e. $B_{ij} = a$ for all $i = j$, $B_{ij} = b$ for all $i \neq j$, $0 < b < a < 1$, and $\pi_i = 1/K$ for all $i$.

Using the concept of Chernoff information (Section 4), we obtain an information-theoretic summary characteristic $\rho^\star \equiv \rho^\star(B, \pi)$ such that the cases $\rho^\star > 1$, $\rho^\star < 1$, and $\rho^\star = 1$ correspond to the preference of spectral embedding procedure based on approximate large-sample relative performance, summarized as ASE > LSE, ASE < LSE, and ASE = LSE, respectively. The above models' low-dimensional parameter spaces facilitate visualizing and analyzing the relationship between network structure (i.e. $(B, \pi)$) and embedding performance (i.e. $\rho^\star$).

This paper considers the task of performing inference on a single large graph. As such, we interpret the notion of sparsity in reference to the magnitudes of probability parameters, namely the magnitudes of the entries of $B$. This notion of sparsity corresponds to the interpretation and intuition of a practitioner wanting to do statistics with an observed graph. We shall, with this understanding in mind, subsequently demonstrate that LSE is preferred as an embedding method in relatively sparse regimes, whereas ASE is preferred as an embedding method in not-too-sparse regimes.

By way of contrast, the scaling factor $\rho_n$ in Definition 1, which is included for the purpose of general presentation, indexes a sequence of models wherein edge probabilities change with $n$. We take $\rho_n$ to be constant in $n$, which by rescaling is equivalent to setting $\rho_n \equiv 1$. Limit theorems are known for regimes where $\rho_n \to 0$ as $n \to \infty$, but these regimes are uninteresting for single graph inference from the perspective of relative spectral embedding performance (Tang and Priebe, 2016).

## 3 Preliminaries

### 3.1 Notation

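In this paper, all vectors and matrices are real-valued. The symbols $:=$ and $\equiv$ are used to assign definitions and to denote formal equivalence, respectively. Given a symmetric positive definite matrix $M \in \mathbb{R}^{d \times d}$, let $\langle \cdot, \cdot \rangle_M$ denote the real inner product induced by $M$, namely $\langle x, y \rangle_M := x^\top M y$. Similarly, define the induced norm as $\|x\|_M := \sqrt{\langle x, x \rangle_M}$. In particular, given the identity matrix $I$, denote the standard Euclidean inner product and Euclidean norm by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|_2$, respectively. Given an underlying matrix, $\det(\cdot)$ and $\operatorname{tr}(\cdot)$ denote the matrix determinant and matrix trace operator, respectively. Given a diagonal matrix $D$, $|D|$ denotes the entrywise absolute value (matrix) of $D$.

The vector of all ones in $\mathbb{R}^n$ is denoted by $1_n$, whereas the zero matrix in $\mathbb{R}^{m \times n}$ is denoted by $0_{m,n}$. We suppress the indices for convenience when the underlying dimensions are understood, writing instead $1$ and $0$.

Let $\mathbb{N}$ denote the set of natural numbers so that for $n \in \mathbb{N}$, $[n] := \{1, 2, \dots, n\}$. For integers $d^+ \ge 1$, $d^- \ge 0$, and $d := d^+ + d^- \ge 1$, let $I_{d^+}^{d^-} := I_{d^+} \oplus (-I_{d^-})$ be the direct sum (diagonal) matrix with identity matrices $I_{d^+}$ and $I_{d^-}$ together with the convention that $I_{d^+}^{0} \equiv I_{d^+}$. For example, $I_1^1 \equiv \operatorname{diag}(1, -1)$.

For integers $n \ge d \ge 1$, the set of all $n \times d$ real matrices with orthonormal columns shall be denoted by $\mathbb{O}_{n,d}$. Let $\mathbb{O}(d^+, d^-)$ denote the indefinite orthogonal group with signature $(d^+, d^-)$, and let $\mathbb{O}_d \equiv \mathbb{O}_{d,d}$ denote the orthogonal group in $\mathbb{R}^{d \times d}$. In particular, $M \in \mathbb{O}(d^+, d^-)$ has the characterization $M^\top I_{d^+}^{d^-} M = I_{d^+}^{d^-}$. In the case of the orthogonal group, this characterization reduces to the relationship $M^\top = M^{-1}$.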

### 3.2 The generalized random dot product graph model

A growing corpus has emerged within the statistics literature focused on the development of theory and applications for the random dot product graph (RDPG) model (Nickel, 2006; Young and Scheinerman, 2007). This latent position random graph model associates to each vertex in a graph an underlying low-dimensional vector. These vectors may be viewed as encoding structural information or attributes possessed by their corresponding vertices. In turn, the probability of two vertices sharing an edge is specified through the standard Euclidean inner (dot) product of the vertices’ latent position vectors. While simple in concept and design, this model has proven successful in real-world applications in the areas of neuroscience and social networks (Lyzinski et al., 2017). On the theoretical side, the RDPG model enjoys some of the first-ever statistical theory for two-sample hypothesis testing on random graphs, both semiparametric (Tang et al., 2017) and nonparametric (Tang et al., 2017). For more on the RDPG model, see the survey Athreya et al. (2018) and the references therein.

More recently, the generalized random dot product graph (GRDPG) model was introduced as an extension of the RDPG model that includes as special cases the mixed membership stochastic block model as well as all (single membership) stochastic block models (Rubin-Delanchy et al., 2017). Effort towards the development of theory for the GRDPG model has already raised new questions and produced new findings related to the geometry of spectral methods, embeddings, and random graph inference. The present paper further contributes to these efforts.

###### Definition 2 (The generalized random dot product graph (GRDPG) model).

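For integers $d^+ \ge 1$ and $d^- \ge 0$ such that $d := d^+ + d^- \ge 1$, let $F$ be a distribution on a set $\mathcal{X} \subset \mathbb{R}^d$ such that $\langle I_{d^+}^{d^-} x, y \rangle \in [0,1]$ for all $x, y \in \mathcal{X}$. We say that $(A, X) \sim \mathrm{GRDPG}(F)$ with signature $(d^+, d^-)$ and scaling factor $0 < \rho_n \le 1$ if the following hold. Let $X_1, \dots, X_n \sim F$ be independent and identically distributed random (latent position) vectors with

$$X := [X_1 | \cdots | X_n]^\top \in \mathbb{R}^{n \times d} \quad \text{and} \quad P := \rho_n X I_{d^+}^{d^-} X^\top \in [0,1]^{n \times n}. \tag{1}$$

For each $i \le j$, the entries $A_{ij}$ of the symmetric adjacency matrix $A \in \{0,1\}^{n \times n}$ are then generated in a conditionally independent fashion given the latent positions, namely

$$\{A_{ij} \mid X_i, X_j\} \sim \mathrm{Bernoulli}\left(\rho_n \langle I_{d^+}^{d^-} X_i, X_j \rangle\right). \tag{2}$$

In this setting, the conditional probability $\mathbb{P}[A \mid X]$ can be computed explicitly as a product of Bernoulli probabilities.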

To reiterate, we consider the regime $\rho_n \equiv 1$ and therefore suppress dependencies on $\rho_n$ later in the text. When no confusion can arise, we also use adorned versions of the symbol $\rho$ to denote Chernoff-related quantities unrelated to $\rho_n$ in a manner consistent with the notation in Tang and Priebe (2016) (see Section 4).

When $d^- = 0$, the GRDPG model reduces to the RDPG model. When the distribution $F$ is a discrete distribution on a finite collection of vectors in $\mathbb{R}^d$, then the GRDPG model coincides with the SBM, in which case the edge probability matrix $P$ arises as an appropriate dilation of the block edge probability matrix $B$. Given any valid $B \in (0,1)^{K \times K}$ as in Definition 1, there exist integers $d^+$, $d^-$ and a matrix $X \in \mathbb{R}^{K \times d}$ such that $B$ has the (not necessarily unique) factorization $B \equiv X I_{d^+}^{d^-} X^\top$, which follows since the spectral decomposition of $B$ can be written as $B \equiv U_B \Lambda U_B^\top = (U_B |\Lambda|^{1/2}) I_{d^+}^{d^-} (U_B |\Lambda|^{1/2})^\top$. This demonstrates the ability of the GRDPG framework in Definition 2 to model all possible stochastic block models formulated in Definition 1.
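The factorization argument above can be checked numerically. The following sketch assumes NumPy; the helper name `grdpg_factor` and the tolerance used to discard zero eigenvalues are illustrative choices, not part of the paper.

```python
import numpy as np

def grdpg_factor(B, tol=1e-10):
    """Return (X, I_pq, (d_plus, d_minus)) with B = X @ I_pq @ X.T."""
    B = np.asarray(B, dtype=float)
    evals, evecs = np.linalg.eigh(B)      # B = U diag(evals) U^T
    keep = np.abs(evals) > tol            # drop numerically zero eigenvalues
    evals, evecs = evals[keep], evecs[:, keep]
    order = np.argsort(-evals)            # positive eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    X = evecs * np.sqrt(np.abs(evals))    # U |Lambda|^{1/2}; rows are latent positions
    signs = np.sign(evals)
    signature = (int((signs > 0).sum()), int((signs < 0).sum()))
    return X, np.diag(signs), signature

# Indefinite two-block example: eigenvalues are 0.8 and -0.4.
B = np.array([[0.2, 0.6], [0.6, 0.2]])
X, I_pq, signature = grdpg_factor(B)
print(signature, np.allclose(X @ I_pq @ X.T, B))
```

Any other valid factor is related to this one by an indefinite orthogonal transformation, which is the non-uniqueness noted in the text.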

###### Remark 1 (Non-identifiability in the GRDPG model).

The GRDPG model possesses two intrinsic sources of non-identifiability, summarized as “uniqueness up to indefinite orthogonal transformations” and “uniqueness up to artificial dimension blow-up”. More precisely, for $(A, X) \sim \mathrm{GRDPG}(F)$ with signature $(d^+, d^-)$, the following considerations must be taken into account.

1. For any $Q \in \mathbb{O}(d^+, d^-)$, $A \overset{d}{=} B$ whenever $(B, Y) \sim \mathrm{GRDPG}(F \circ Q)$, where $F \circ Q$ denotes the distribution of the latent position vector $Y = QX$ and $\overset{d}{=}$ denotes equality in distribution. This source of non-identifiability cannot be mitigated. See Eq. (2).

2. There exists a distribution $F'$ on $\mathbb{R}^{d'}$ for some $d' > d$ such that $A \overset{d}{=} B$ where $(B, Y) \sim \mathrm{GRDPG}(F')$. This source of non-identifiability can be avoided by assuming, as we do in this paper, that $F$ is non-degenerate in the sense that for $X_1 \sim F$, the second moment matrix $\mathbb{E}[X_1 X_1^\top]$ is full rank.

###### Definition 3 (Adjacency and Laplacian spectral embeddings).

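Let $A \in \{0,1\}^{n \times n}$ be a symmetric adjacency matrix with eigendecomposition $A \equiv \sum_{i=1}^{n} \lambda_i u_i u_i^\top$ and with ordered eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$ corresponding to orthonormal eigenvectors $u_1, u_2, \dots, u_n$. Given a positive integer $d$ such that $d \le n$, let $\hat{S} := \operatorname{diag}(\lambda_1, \dots, \lambda_d) \in \mathbb{R}^{d \times d}$ and $\hat{U} := [u_1 | \cdots | u_d] \in \mathbb{O}_{n,d}$. The adjacency spectral embedding (ASE) of $A$ into $\mathbb{R}^d$ is then defined to be the matrix $\hat{X} := \hat{U} |\hat{S}|^{1/2} \in \mathbb{R}^{n \times d}$. The matrix $\hat{X}$ serves as a consistent estimator for $X$ up to indefinite orthogonal transformation as $n \to \infty$.

Along similar lines, define the normalized Laplacian of $A$ as

$$\mathcal{L}(A) := (\operatorname{diag}(A 1_n))^{-1/2} A (\operatorname{diag}(A 1_n))^{-1/2} \in \mathbb{R}^{n \times n} \tag{3}$$

whose eigendecomposition is given by $\mathcal{L}(A) \equiv \sum_{i=1}^{n} \tilde{\lambda}_i \tilde{u}_i \tilde{u}_i^\top$ with ordered eigenvalues $|\tilde{\lambda}_1| \ge |\tilde{\lambda}_2| \ge \cdots \ge |\tilde{\lambda}_n|$ corresponding to orthonormal eigenvectors $\tilde{u}_1, \tilde{u}_2, \dots, \tilde{u}_n$. Given a positive integer $d$ such that $d \le n$, let $\tilde{S} := \operatorname{diag}(\tilde{\lambda}_1, \dots, \tilde{\lambda}_d) \in \mathbb{R}^{d \times d}$ and let $\tilde{U} := [\tilde{u}_1 | \cdots | \tilde{u}_d] \in \mathbb{O}_{n,d}$. The Laplacian spectral embedding (LSE) of $A$ into $\mathbb{R}^d$ is then defined to be the matrix $\breve{X} := \tilde{U} |\tilde{S}|^{1/2} \in \mathbb{R}^{n \times d}$. The matrix $\breve{X}$ serves as a consistent estimator for the matrix $(\operatorname{diag}(X I_{d^+}^{d^-} X^\top 1_n))^{-1/2} X$ up to indefinite orthogonal transformation as $n \to \infty$.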
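Both embeddings in Definition 3 amount to a truncated eigendecomposition, as the following NumPy sketch illustrates. The function names `ase` and `lse` are chosen here for illustration, and the handling of isolated vertices (their rows are set to zero) is an assumption made for numerical convenience, since Eq. (3) is undefined when a degree is zero.

```python
import numpy as np

def _top_d_embedding(M, d):
    """Scale the d eigenvectors with largest |eigenvalue| by |eigenvalue|^{1/2}."""
    evals, evecs = np.linalg.eigh(M)
    idx = np.argsort(-np.abs(evals))[:d]
    return evecs[:, idx] * np.sqrt(np.abs(evals[idx]))

def ase(A, d):
    """Adjacency spectral embedding of a symmetric matrix A into R^d."""
    return _top_d_embedding(np.asarray(A, dtype=float), d)

def lse(A, d):
    """Laplacian spectral embedding using the normalization in Eq. (3)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    pos = deg > 0                          # assumption: isolated vertices map to zero
    inv_sqrt[pos] = deg[pos] ** -0.5
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return _top_d_embedding(L, d)

# Toy graph with two disjoint cliques (self-loops included), so rank(A) = 2.
A = np.kron(np.eye(2), np.ones((2, 2)))
X_hat, X_breve = ase(A, 2), lse(A, 2)
```

For large graphs a sparse partial eigensolver would normally replace the dense `eigh` call; the dense version is kept here for clarity.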

###### Remark 2 (Consistent estimation and parametrization involving latent positions).

The matrices $X$ and $(\operatorname{diag}(X I_{d^+}^{d^-} X^\top 1_n))^{-1/2} X$, which are one-to-one invertible transformations of each other, may be viewed as providing different parametrizations of GRDPG graphs. As such, comparing $\hat{X}$ and $\breve{X}$ as estimators is non-trivial. In order to carry out such a comparison, we subsequently adopt an information-theoretic approach in which we consider a particular choice of $f$-divergence which is both analytically tractable and statistically interpretable in the current setting.

For the subsequent purposes of the present work, Theorems 4 and 5 (below) state slightly weaker formulations of the corresponding limit theorems obtained in Rubin-Delanchy et al. (2017) for adjacency and Laplacian spectral embedding.

###### Theorem 4 (ASE limit theorem for GRDPG, adapted from Rubin-Delanchy et al. (2017)).

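Assume the $d$-dimensional GRDPG setting in Definition 2 with $\rho_n \equiv 1$. Let $\hat{X}$ be the adjacency spectral embedding into $\mathbb{R}^d$ with $i$-th row denoted by $\hat{X}_i$. Let $\Phi(\cdot, \Sigma)$ denote the cumulative distribution function of the centered multivariate normal distribution in $\mathbb{R}^d$ with covariance matrix $\Sigma$. Then, with respect to the adjacency spectral embedding, there exists a sequence of matrices $Q \equiv Q_n \in \mathbb{O}(d^+, d^-)$ such that, for any $z \in \mathbb{R}^d$,

$$\mathbb{P}\left[\sqrt{n}\left(Q \hat{X}_i - X_i\right) \le z\right] \to \int_{\mathcal{X}} \Phi(z, \Sigma(x)) \, dF(x) \tag{4}$$

as $n \to \infty$, where for $X_1 \sim F$,

$$\Sigma(x) := I_{d^+}^{d^-} \Delta^{-1} \mathbb{E}\left[g(x, X_1) X_1 X_1^\top\right] \Delta^{-1} I_{d^+}^{d^-},$$

with $\Delta := \mathbb{E}[X_1 X_1^\top]$ and $g(x, X_1) := \langle I_{d^+}^{d^-} x, X_1 \rangle \left(1 - \langle I_{d^+}^{d^-} x, X_1 \rangle\right)$.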

###### Theorem 5 (LSE limit theorem for GRDPG, adapted from Rubin-Delanchy et al. (2017)).

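Assume the $d$-dimensional GRDPG setting in Definition 2 with $\rho_n \equiv 1$. Let $\breve{X}$ be the Laplacian spectral embedding into $\mathbb{R}^d$ with $i$-th row denoted by $\breve{X}_i$. Let $\Phi(\cdot, \Sigma)$ denote the cumulative distribution function of the centered multivariate normal distribution in $\mathbb{R}^d$ with covariance matrix $\Sigma$. Then, with respect to the Laplacian spectral embedding, there exists a sequence of matrices $\tilde{Q} \equiv \tilde{Q}_n \in \mathbb{O}(d^+, d^-)$ such that, for any $z \in \mathbb{R}^d$,

$$\mathbb{P}\left[n\left(\tilde{Q} \breve{X}_i - \frac{X_i}{\sqrt{\sum_j \langle I_{d^+}^{d^-} X_i, X_j \rangle}}\right) \le z\right] \to \int_{\mathcal{X}} \Phi(z, \tilde{\Sigma}(x)) \, dF(x) \tag{5}$$

as $n \to \infty$, where for $X_1 \sim F$ and $\mu := \mathbb{E}[X_1]$,

$$\tilde{\Sigma}(x) := I_{d^+}^{d^-} \tilde{\Delta}^{-1} \mathbb{E}\left[\tilde{g}(x, X_1) \left(\frac{X_1}{\langle I_{d^+}^{d^-} \mu, X_1 \rangle} - \frac{\tilde{\Delta} I_{d^+}^{d^-} x}{2 \langle I_{d^+}^{d^-} \mu, x \rangle}\right) \left(\frac{X_1}{\langle I_{d^+}^{d^-} \mu, X_1 \rangle} - \frac{\tilde{\Delta} I_{d^+}^{d^-} x}{2 \langle I_{d^+}^{d^-} \mu, x \rangle}\right)^\top\right] \tilde{\Delta}^{-1} I_{d^+}^{d^-},$$

with $\tilde{\Delta} := \mathbb{E}\left[\frac{X_1 X_1^\top}{\langle I_{d^+}^{d^-} \mu, X_1 \rangle}\right]$ and $\tilde{g}(x, X_1) := \frac{\langle I_{d^+}^{d^-} x, X_1 \rangle \left(1 - \langle I_{d^+}^{d^-} x, X_1 \rangle\right)}{\langle I_{d^+}^{d^-} \mu, x \rangle}$.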

## 4 Spectral embedding performance

We wish to compare the large-sample relative performance of adjacency and Laplacian spectral embedding for subsequent inference, where the subsequent inference task is naturally taken to be the problem of recovering latent block assignments. Here, measuring spectral embedding performance will correspond to estimating the large-sample optimal error rate for recovering the underlying block assignments following each of the spectral embeddings. Towards this end, we now introduce Chernoff information and Chernoff divergence as appropriate information-theoretic quantities.

Given independent and identically distributed random vectors $Y_1, Y_2, \dots, Y_m$ arising from one of two absolutely continuous multivariate distributions $F_1$ and $F_2$ on $\Omega = \mathbb{R}^d$ with density functions $f_1$ and $f_2$, respectively, we are interested in testing the simple null hypothesis $H_0 : F = F_1$ against the simple alternative hypothesis $H_A : F = F_2$. In this framework, a statistical test $T$ can be viewed as a sequence of mappings $T_m : \Omega^m \to \{1, 2\}$ indexed according to sample size $m$ such that $T_m$ returns the value two when $H_0$ is rejected in favor of $H_A$ and correspondingly returns the value one when $H_0$ is favored. For each $m$, the corresponding significance level and type-II error are denoted by $\alpha_m$ and $\beta_m$, respectively.

Assume that the prior probability of $H_0$ being true is given by $\pi \in (0,1)$. For a given $\alpha_m^\star \in (0,1)$, let $\beta_m^\star \equiv \beta_m^\star(\alpha_m^\star)$ denote the type-II error associated with the corresponding likelihood ratio test when the type-I error is at most $\alpha_m^\star$. Then, the Bayes risk in deciding between $H_0$ and $H_A$ given $m$ independent random vectors $Y_1, Y_2, \dots, Y_m$ is given by

$$\inf_{\alpha_m^\star \in (0,1)} \pi \alpha_m^\star + (1 - \pi) \beta_m^\star. \tag{6}$$

The Bayes risk is intrinsically related to Chernoff information (Chernoff, 1952, 1956), $C(F_1, F_2)$, namely

$$\lim_{m \to \infty} \frac{1}{m} \left[\inf_{\alpha_m^\star \in (0,1)} \log\left(\pi \alpha_m^\star + (1 - \pi) \beta_m^\star\right)\right] = -C(F_1, F_2), \tag{7}$$

where

$$C(F_1, F_2) := -\log\left[\inf_{t \in (0,1)} \int_{\mathbb{R}^d} f_1^t(x) f_2^{1-t}(x) \, dx\right] = \sup_{t \in (0,1)} \left[-\log \int_{\mathbb{R}^d} f_1^t(x) f_2^{1-t}(x) \, dx\right].$$

In words, the Chernoff information between $F_1$ and $F_2$ is the exponential rate at which the Bayes risk decreases as $m \to \infty$. Note that the Chernoff information is independent of the prior probability $\pi$. A version of Eq. (7) also holds when considering $K \ge 3$ hypotheses with distributions $F_1, F_2, \dots, F_K$, thereby introducing the quantity $\min_{k \neq l} C(F_k, F_l)$ (see for example Tang and Priebe (2016)).

Chernoff information can be expressed in terms of the Chernoff divergence between distributions $F_1$ and $F_2$, defined for $t \in (0,1)$ as

$$C_t(F_1, F_2) = -\log \int_{\mathbb{R}^d} f_1^t(x) f_2^{1-t}(x) \, dx, \tag{8}$$

which yields the relation

$$C(F_1, F_2) = \sup_{t \in (0,1)} C_t(F_1, F_2). \tag{9}$$

The Chernoff divergence is an example of an $f$-divergence and as such satisfies the data processing lemma (Liese and Vajda, 2006) and is invariant with respect to invertible transformations (Devroye et al., 2013). One could instead use another $f$-divergence for the purpose of comparing the two embedding methods, such as the Kullback–Leibler divergence. Our choice is motivated by the aforementioned relationship with Bayes risk in Eq. (7).

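In this paper we explicitly consider multivariate normal distributions as a consequence of Theorems 4 and 5 when conditioning on the individual underlying latent positions for stochastic block model graphs. In particular, given $F_1 = \mathcal{N}(\mu_1, \Sigma_1)$, $F_2 = \mathcal{N}(\mu_2, \Sigma_2)$, and $t \in (0,1)$, then for $\Sigma_t := t\Sigma_1 + (1-t)\Sigma_2$, the Chernoff information between $F_1$ and $F_2$ is given by

$$\begin{aligned} C(F_1, F_2) &= \sup_{t \in (0,1)} \left[\frac{t(1-t)}{2} (\mu_2 - \mu_1)^\top \Sigma_t^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \log\left(\frac{\det(\Sigma_t)}{\det(\Sigma_1)^t \det(\Sigma_2)^{1-t}}\right)\right] \\ &= \sup_{t \in (0,1)} \left[\frac{t(1-t)}{2} \|\mu_2 - \mu_1\|^2_{\Sigma_t^{-1}} + \frac{1}{2} \log\left(\frac{\det(\Sigma_t)}{\det(\Sigma_1)^t \det(\Sigma_2)^{1-t}}\right)\right]. \end{aligned}$$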
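The supremum in the Gaussian expression above is one-dimensional and can be approximated numerically. The sketch below assumes NumPy and SciPy; the function name `gaussian_chernoff_information` is an illustrative choice, and the bounded scalar optimizer is one reasonable option among several rather than the method used in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gaussian_chernoff_information(mu1, S1, mu2, S2):
    """Approximate C(F1, F2) for F_i = N(mu_i, S_i); returns (value, maximizing t)."""
    diff = np.asarray(mu2, dtype=float) - np.asarray(mu1, dtype=float)
    S1 = np.atleast_2d(np.asarray(S1, dtype=float))
    S2 = np.atleast_2d(np.asarray(S2, dtype=float))
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)

    def neg_objective(t):
        St = t * S1 + (1 - t) * S2                      # interpolated covariance
        quad = diff @ np.linalg.solve(St, diff)         # squared norm induced by St^{-1}
        _, logdet_t = np.linalg.slogdet(St)
        log_term = logdet_t - t * logdet1 - (1 - t) * logdet2
        return -(0.5 * t * (1 - t) * quad + 0.5 * log_term)

    res = minimize_scalar(neg_objective, bounds=(0.0, 1.0), method="bounded")
    return -res.fun, res.x

# Equal covariances: the log term vanishes and the optimum sits at t = 1/2.
C, t_opt = gaussian_chernoff_information([0.0, 0.0], np.eye(2), [2.0, 0.0], np.eye(2))
```

With equal covariances $\Sigma$ the value reduces to $\frac{1}{8}\|\mu_2 - \mu_1\|^2_{\Sigma^{-1}}$, which offers a quick sanity check on the implementation.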

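Let $B \in (0,1)^{K \times K}$ and $\pi \in (0,1)^K$ denote the matrix of block edge probabilities and the vector of block assignment probabilities for a $K$-block stochastic block model as before. This corresponds to a special case of the GRDPG model with signature $(d^+, d^-)$, $d^+ + d^- = \operatorname{rank}(B)$, and latent positions $\nu_k \in \mathbb{R}^{\operatorname{rank}(B)}$. For an $n$-vertex SBM graph with parameters $(B, \pi)$, the large-sample optimal error rate for recovering block assignments when performing adjacency spectral embedding can be characterized by the quantity $\rho_A \equiv \rho_A(n)$ defined by

$$\rho_A := \min_{k \neq l} \sup_{t \in (0,1)} \left[\frac{n t(1-t)}{2} \|\nu_k - \nu_l\|^2_{\Sigma_{kl}^{-1}(t)} + \frac{1}{2} \log\left(\frac{\det(\Sigma_{kl}(t))}{\det(\Sigma_k)^t \det(\Sigma_l)^{1-t}}\right)\right], \tag{10}$$

where $\Sigma_{kl}(t) := t\Sigma_k + (1-t)\Sigma_l$ with $\Sigma_k := \Sigma(\nu_k)$, for $t \in (0,1)$.

Similarly, for Laplacian spectral embedding, $\rho_L \equiv \rho_L(n)$, one has

$$\rho_L := \min_{k \neq l} \sup_{t \in (0,1)} \left[\frac{n t(1-t)}{2} \|\tilde{\nu}_k - \tilde{\nu}_l\|^2_{\tilde{\Sigma}_{kl}^{-1}(t)} + \frac{1}{2} \log\left(\frac{\det(\tilde{\Sigma}_{kl}(t))}{\det(\tilde{\Sigma}_k)^t \det(\tilde{\Sigma}_l)^{1-t}}\right)\right], \tag{11}$$

where $\tilde{\Sigma}_{kl}(t) := t\tilde{\Sigma}_k + (1-t)\tilde{\Sigma}_l$ with $\tilde{\Sigma}_k := \tilde{\Sigma}(\nu_k)$, and $\tilde{\nu}_k := \nu_k / \left(\sum_{k'} \pi_{k'} \langle I_{d^+}^{d^-} \nu_{k'}, \nu_k \rangle\right)^{1/2}$.

The factor $n$ in Eqs. (10) and (11) arises from the implicit consideration of the appropriate (non-singular) theoretical sample covariance matrices. To assist in the comparison and interpretation of the quantities $\rho_A$ and $\rho_L$, we assume throughout this paper that $n_k = n\pi_k$ for $k \in [K]$, where $n_k$ denotes the number of vertices in block $k$. The logarithmic terms in Eqs. (10) and (11) as well as the deviations of each term $n_k$ from $n\pi_k$ are negligible for large $n$, collectively motivating the following large-sample measure of relative performance, $\rho^\star$, where

$$\frac{\rho_A}{\rho_L} \equiv \frac{\rho_A(n)}{\rho_L(n)} \to \rho^\star \equiv \frac{\rho_A^\star}{\rho_L^\star} := \frac{\min_{k \neq l} \sup_{t \in (0,1)} \left[t(1-t) \|\nu_k - \nu_l\|^2_{\Sigma_{kl}^{-1}(t)}\right]}{\min_{k \neq l} \sup_{t \in (0,1)} \left[t(1-t) \|\tilde{\nu}_k - \tilde{\nu}_l\|^2_{\tilde{\Sigma}_{kl}^{-1}(t)}\right]}. \tag{12}$$

Here we have suppressed the functional dependence on the underlying model parameters $B$ and $\pi$. For large $n$, observe that as $\rho_A^\star$ increases, $\rho_A$ also increases, and therefore the large-sample optimal error rate corresponding to adjacency spectral embedding decreases in light of Eq. (7) and its generalization. Similarly, large values of $\rho_L^\star$ correspond to good theoretical performance of Laplacian spectral embedding. Thus, if $\rho^\star > 1$, then ASE is to be preferred to LSE, whereas if $\rho^\star < 1$, then LSE is to be preferred to ASE. The case when $\rho^\star = 1$ indicates that neither ASE nor LSE is superior for the given parameters $B$ and $\pi$. To reiterate, we summarize these preferences as ASE > LSE, ASE < LSE, and ASE = LSE, respectively.

In what follows, we focus on the asymptotic quantity $\rho^\star$. For the two-block SBM and certain $K$-block SBMs exhibiting symmetry, Eq. (12) reduces to the simpler form

$$\rho^\star = \frac{\sup_{t \in (0,1)} \left[t(1-t) \|\nu_1 - \nu_2\|^2_{\Sigma_{1,2}^{-1}(t)}\right]}{\sup_{t \in (0,1)} \left[t(1-t) \|\tilde{\nu}_1 - \tilde{\nu}_2\|^2_{\tilde{\Sigma}_{1,2}^{-1}(t)}\right]} \tag{13}$$

for canonically specified latent positions $\nu_1$ and $\nu_2$. In some cases it is possible to concisely obtain analytic expressions for both the numerator and denominator. In other cases this is not possible. A related challenge with respect to Eq. (12) is analytically inverting the interpolated block conditional covariance matrices $\Sigma_{1,2}(t)$ and $\tilde{\Sigma}_{1,2}(t)$. Section 7 provides additional technical details and discussion addressing these issues.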
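When closed forms are unavailable, the ratio in Eq. (12) can be evaluated numerically for a given $(B, \pi)$. The sketch below is one possible implementation under stated assumptions: it uses the block covariance formulas of Theorems 4 and 5 with $F$ taken to be the discrete distribution placing mass $\pi_k$ on $\nu_k$, obtains latent positions from an eigendecomposition of $B$, and approximates each supremum with SciPy's bounded scalar minimizer. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize_scalar

def latent_positions(B, tol=1e-10):
    """Rows nu_k of N and signature matrix I_pq with B = N @ I_pq @ N.T."""
    evals, evecs = np.linalg.eigh(np.asarray(B, dtype=float))
    keep = np.abs(evals) > tol
    evals, evecs = evals[keep], evecs[:, keep]
    return evecs * np.sqrt(np.abs(evals)), np.diag(np.sign(evals))

def ase_cov(N, I_pq, pi, k):
    """Sigma(nu_k) of Theorem 4 for F supported on the rows of N."""
    Delta_inv = np.linalg.inv(N.T @ (pi[:, None] * N))
    p = N @ I_pq @ N[k]                                # edge probabilities B[k, :]
    mid = N.T @ ((pi * p * (1 - p))[:, None] * N)
    return I_pq @ Delta_inv @ mid @ Delta_inv @ I_pq

def lse_cov(N, I_pq, pi, k):
    """Sigma-tilde(nu_k) of Theorem 5 for F supported on the rows of N."""
    deg = N @ I_pq @ (pi @ N)                          # <I mu, nu_m> for each block m
    Dt = N.T @ ((pi / deg)[:, None] * N)               # Delta-tilde
    Dt_inv = np.linalg.inv(Dt)
    p = N @ I_pq @ N[k]
    g = p * (1 - p) / deg[k]
    W = N / deg[:, None] - (Dt @ I_pq @ N[k]) / (2 * deg[k])
    mid = W.T @ ((pi * g)[:, None] * W)
    return I_pq @ Dt_inv @ mid @ Dt_inv @ I_pq

def sup_term(diff, Sk, Sl):
    """Approximate sup over t of t(1-t) * diff' (t Sk + (1-t) Sl)^{-1} diff."""
    def neg(t):
        return -(t * (1 - t)) * (diff @ np.linalg.solve(t * Sk + (1 - t) * Sl, diff))
    return -minimize_scalar(neg, bounds=(0.0, 1.0), method="bounded").fun

def rho_star(B, pi):
    """Numerical approximation of the ratio in Eq. (12)."""
    pi = np.asarray(pi, dtype=float)
    N, I_pq = latent_positions(B)
    deg = N @ I_pq @ (pi @ N)
    N_tilde = N / np.sqrt(deg)[:, None]                # nu-tilde_k
    pairs = list(combinations(range(len(pi)), 2))
    S = [ase_cov(N, I_pq, pi, k) for k in range(len(pi))]
    St = [lse_cov(N, I_pq, pi, k) for k in range(len(pi))]
    num = min(sup_term(N[k] - N[l], S[k], S[l]) for k, l in pairs)
    den = min(sup_term(N_tilde[k] - N_tilde[l], St[k], St[l]) for k, l in pairs)
    return num / den

r = rho_star([[0.5, 0.3], [0.3, 0.5]], [0.5, 0.5])
```

Values of `r` above one indicate ASE > LSE and values below one indicate ASE < LSE. The result does not depend on which valid latent position representation is used, since the ratio is a function of $(B, \pi)$ alone.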

## 5 Elucidating network structure

### 5.1 The two-block stochastic block model

Consider the set of two-block SBMs with parameters $\pi \equiv (\pi_1, 1 - \pi_1)$ and $B \equiv \begin{bmatrix} a & b \\ b & c \end{bmatrix}$. For $\pi_1 = 1/2$, then $a \ge c$ without loss of generality by symmetry. In general, for any fixed choice of $\pi$, the class of models $\mathcal{B}$ can be partitioned according to matrix rank, namely

$$\mathcal{B} \equiv \mathcal{B}_1 \sqcup \mathcal{B}_2 := \{B : \operatorname{rank}(B) = 1;\ a, b, c \in (0,1)\} \sqcup \{B : \operatorname{rank}(B) = 2;\ a, b, c \in (0,1)\}.$$

The collection of sub-models $\mathcal{B}_1$ further decomposes into the disjoint union of the Erdős–Rényi model with homogeneous edge probability $a = b = c$ and its relative complement in $\mathcal{B}_1$ satisfying the determinant constraint $\det(B) \equiv ac - b^2 = 0$. These partial sub-models can be viewed as one-dimensional and two-dimensional (parameter) regions in the open unit cube, $(0,1)^3$, respectively.

Similarly, the collection of sub-models $\mathcal{B}_2$ further decomposes into the disjoint union of its positive definite members and its indefinite members. Here only $I_2$ and $I_1^1$ are necessary for computing edge probabilities via inner products of the latent positions. Both of these partial sub-models can be viewed as three-dimensional (parameter) regions in $(0,1)^3$.

###### Remark 3 (Latent position parametrization).

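One might ask whether or not for our purposes there exists a “best” latent position representation for some or even every SBM. To this end and more generally, for any $K \ge 2$ and positive definite $B \in (0,1)^{K \times K}$, there exists a unique lower-triangular matrix $X$ with positive diagonal entries such that $B = X X^\top$ by the Cholesky matrix decomposition. This yields a canonical choice for the matrix of latent positions when $B$ is positive definite. In particular, for positive definite $B \in \mathcal{B}_2$, then $B = X I_2 X^\top$ with $X := \begin{bmatrix} \sqrt{a} & 0 \\ b/\sqrt{a} & \sqrt{ac - b^2}/\sqrt{a} \end{bmatrix}$. In contrast, for indefinite $B \in \mathcal{B}_2$, then $B = X I_1^1 X^\top$ with $X := \begin{bmatrix} \sqrt{a} & 0 \\ b/\sqrt{a} & \sqrt{b^2 - ac}/\sqrt{a} \end{bmatrix}$, keeping in mind that in this case $ac - b^2 < 0$. The latter factorization may be viewed informally as an indefinite Cholesky decomposition under $I_1^1$. For the collection of rank one sub-models $\mathcal{B}_1$, the latent positions $\nu_1$ and $\nu_2$ are simply taken to be scalar-valued.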

#### 5.1.1 Homogeneous balanced network structure

We refer to the two-block SBM sub-model with $a = c$ and $\pi_1 = 1/2$ as the homogeneous balanced two-block SBM. The cases when $a > b$, $a < b$, and $a = b$ correspond to the cases when $B$ is positive definite, indefinite, and reduces to Erdős–Rényi, respectively. The positive definite parameter regime has the network structure interpretation of being assortative in the sense that the within-block edge probability $a$ is larger than the between-block edge probability $b$, consistent with the affinity-based notion of community structure. In contrast, the indefinite parameter regime has the network structure interpretation of being disassortative in the sense that between-block edge density exceeds within-block edge density, consistent with the “opposites attract” notion of community structure.

For this SBM sub-model, $\rho^\star$ can be simplified analytically (see Section 7 for additional details) and can be expressed as a translation with respect to the value one, namely

$$\rho^\star \equiv \rho^\star_{a,b} = 1 + \frac{(a-b)^2 \left(3a(a-1) + 3b(b-1) + 8ab\right)}{4(a+b)^2 \left(a(1-a) + b(1-b)\right)} := 1 + c_{a,b} \times \psi_{a,b}, \tag{14}$$

where $\psi_{a,b} := 3a(a-1) + 3b(b-1) + 8ab$ and $c_{a,b} := \frac{(a-b)^2}{4(a+b)^2 (a(1-a) + b(1-b))} > 0$. By recognizing that $\psi_{a,b}$ functions as a discriminating term, it is straightforward to read off the relative performance of ASE and LSE according to Table 1.

Further investigation of Eq. (14) leads to the observation that ASE < LSE for all $0 < b < a \le 3/7$, thereby yielding a parameter region for which LSE dominates ASE. On the other hand, for any fixed $b \in (0,1)$ there exist values $a_1 < a_2$ such that ASE < LSE under $a_1$, whereas ASE > LSE under $a_2$. Figure 1 demonstrates that for homogeneous balanced network structure, LSE is preferred to ASE when the entries in $B$ are sufficiently small, whereas conversely ASE is preferred to LSE when the entries in $B$ are not too small.
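Because Eq. (14) is explicit, the preferred embedding for any $(a, b)$ with $a \neq b$ can be read off directly from the sign of the discriminating term. A small sketch follows, with an illustrative function name and hypothetical parameter values:

```python
def rho_star_homogeneous_balanced(a, b):
    """Closed form of Eq. (14) for B = [[a, b], [b, a]], pi = (1/2, 1/2), a != b."""
    psi = 3 * a * (a - 1) + 3 * b * (b - 1) + 8 * a * b     # discriminating term
    c = (a - b) ** 2 / (4 * (a + b) ** 2 * (a * (1 - a) + b * (1 - b)))
    return 1 + c * psi

# Hypothetical parameter values ranging from sparse to dense.
for a, b in [(0.2, 0.1), (0.5, 0.3), (0.9, 0.5)]:
    r = rho_star_homogeneous_balanced(a, b)
    verdict = "ASE > LSE" if r > 1 else "ASE < LSE" if r < 1 else "ASE = LSE"
    print(f"a={a}, b={b}: rho* = {r:.4f} ({verdict})")
```

The first two parameter pairs have a negative discriminating term and so favor LSE, while the third has a positive one and favors ASE, in line with the sparsity interpretation given above.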

###### Remark 4 (Model spectrum and ASE dominance I).

In the current setting $\lambda_{\max}(B) = a + b$, hence $\lambda_{\max}(B) > 1$ implies ASE > LSE by Eq. (14). This observation amounts to a network structure-based (i.e. $B$-based) spectral sufficient condition for determining when ASE is preferred to LSE.

###### Remark 5 (A balanced one-dimensional SBM restricted sub-model).

When $b = 1 - a$, the homogeneous balanced sub-model further reduces to a one-dimensional parameter space such that $\rho^\star$ simplifies to

$$\rho^\star = 1 + \frac{1}{4}(2a - 1)^2 \ge 1, \tag{15}$$

demonstrating that ASE uniformly dominates LSE for this restricted sub-model. Additionally, it is potentially of interest to note that in this setting the marginal covariance matrices from Theorem 4 for ASE coincide for each block. In contrast, the same behavior is not true for LSE.
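For completeness, the reduction from Eq. (14) to Eq. (15) can be spelled out, assuming the restriction in question is $b = 1 - a$ (the choice that reproduces Eq. (15)). Under this restriction $a - 1 = -b$, $b - 1 = -a$, and $a + b = 1$, so that

```latex
\begin{aligned}
3a(a-1) + 3b(b-1) + 8ab &= -3ab - 3ab + 8ab = 2ab, \\
4(a+b)^2\bigl(a(1-a) + b(1-b)\bigr) &= 4 \cdot 1 \cdot (ab + ab) = 8ab, \\
\rho^\star &= 1 + \frac{(a-b)^2 \cdot 2ab}{8ab} = 1 + \frac{1}{4}(2a-1)^2 .
\end{aligned}
```

The last step uses $a - b = 2a - 1$.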

#### 5.1.2 Core-periphery network structure

We refer to the two-block SBM sub-model with $c = b$, i.e. $B = \begin{bmatrix} a & b \\ b & b \end{bmatrix}$, and $\pi = (\pi_1, 1 - \pi_1)$ as the core-periphery two-block SBM. We explicitly consider the balanced (block size) regime in which $\pi_1 = 1/2$ and an unbalanced regime in which $\pi_1 \neq 1/2$. Here, the cases $a > b$, $a < b$, and $a = b$ correspond to the cases when $B$ is positive definite, indefinite, and reduces to the Erdős–Rényi model, respectively.

For this sub-model, the ratio $\rho^\star$ is not analytically tractable in general. That is to say, simple closed-form solutions do not simultaneously exist for the numerator and denominator in the definition of $\rho^\star$. As such, Figure 2 is obtained numerically by evaluating $\rho^\star$ on a grid of points in $(0,1)^2$ followed by smoothing.

For , graphs generated from this SBM sub-model exhibit the popular interpretation of core-periphery structure in which vertices forming a dense core are attached to surrounding periphery vertices with comparatively smaller edge connectivity. Provided the core is sufficiently dense, namely for in the balanced regime and in the unbalanced regime, Figure 2 demonstrates that ASE  LSE. Conversely, ASE  LSE uniformly in for small enough values of in both the balanced and unbalanced regime.

In contrast, when , the sub-model produces graphs whose network structure is interpreted as having a comparatively sparse induced subgraph which is strongly connected to all vertices in the graph but for which the subgraph vertices exhibit comparatively weaker connectivity. Alternatively, the second block may itself be viewed as a dense core which is simultaneously densely connected to all vertices in the graph. Figure 2 illustrates that for the balanced regime, LSE is preferred for sparser induced subgraphs. Put differently, for large enough dense core with dense periphery, then ASE is the preferable spectral embedding procedure. LSE is preferred to ASE in only a relatively small region corresponding approximately to the triangular region where , which as a subset of the unit square has area . Similar behavior holds for the unbalanced regime for approximately the (enlarged) triangular region of the parameter space where , which as a subset of the unit square has area .

Figure 2 suggests that as decreases from to , LSE is favored in a growing region of the parameter space, albeit still in a smaller region than that for which ASE is to be preferred. Together with the observation that LSE dominates in the lower-left corner of the plots in Figure 2 where and have small magnitude, we are led to say in summary that LSE favors relatively sparse core-periphery network structure. To reiterate, sparsity is interpreted with respect to the parameters and , keeping in mind the underlying simplifying assumption that for .

###### Remark 6 (Model spectrum and ASE dominance II).

For , then . Numerical evaluation (not shown) yields that implies ASE  LSE. Along the same lines as the discussion in Section 5.1.1, this observation provides a network structure (i.e. -based) spectral sufficient condition for this sub-model for determining the relative embedding performance ASE  LSE.

#### 5.1.3 Two-block rank one sub-model

The sub-model for which with and can be re-parameterized according to the assignments and , yielding with . Here and is positive semidefinite, corresponding to the one-dimensional RDPG model with latent positions given by the scalars and with associated probabilities and , respectively. Explicit computation yields the expression

 (16)

whereby is given as an explicit, closed-form function of the parameter values , , and with . The simplicity of this sub-model together with its analytic tractability with respect to both and makes it particularly amenable to study for the purpose of elucidating network structure. Below, consideration of this sub-model further illustrates the relationship between (parameter-based) sparsity and relative embedding performance.