
# Optimal Bayesian Estimation for Random Dot Product Graphs

We propose a Bayesian approach, called the posterior spectral embedding, for estimating the latent positions in random dot product graphs, and prove its optimality. Unlike the classical spectral-based adjacency/Laplacian spectral embedding, the posterior spectral embedding is a fully likelihood-based graph estimation method that takes advantage of the Bernoulli likelihood information of the observed adjacency matrix. We develop a minimax lower bound for estimating the latent positions, and show that the posterior spectral embedding achieves this lower bound: it both results in a minimax-optimal posterior contraction rate and yields a point estimator achieving the minimax risk asymptotically. The convergence results are subsequently applied to clustering in stochastic block models, and the resulting bound strengthens an existing result concerning the number of mis-clustered vertices. We also study a spectral-based Gaussian spectral embedding as a natural Bayesian analogue of the adjacency spectral embedding, but the resulting posterior contraction rate is sub-optimal by an extra logarithmic factor. The practical performance of the proposed methodology is illustrated through extensive synthetic examples and the analysis of a Wikipedia graph dataset.


## 1 Introduction

Graphs, as a data structure representing network data with vertices denoting entities and edges encoding relationships between them, have become increasingly important in a broad range of applications, including social networks (Young and Scheinerman, 2007), brain imaging (Priebe et al., 2017), and neuroscience (Lyzinski et al., 2017b; Tang et al., 2018). For example, in a Facebook network, vertices represent users, and an edge linking two users indicates that they are friends on Facebook. When collecting random graph data, it may be costly or even infeasible to collect individual-specific attributes that are heterogeneous across individuals, and often only the adjacency matrix of the graph is accessible. For example, in studying the structure of a Wikipedia page network, collecting the hyperlinks among articles is much more feasible than collecting the attributes associated with each individual article. To address this challenge arising in real-world network data, Hoff et al. (2002) proposed latent positions graphs, in which each vertex is associated with an unobserved Euclidean vector called the latent position, and the edge probability between any two vertices depends only on their latent positions. Formally, each vertex $i$ is associated with a vector $x_i$ in some latent space $\mathcal{X}$, and there exists a symmetric function $\kappa:\mathcal{X}\times\mathcal{X}\to[0,1]$, called a graphon (Lovász, 2012), such that an edge between vertices $i$ and $j$ occurs with probability $\kappa(x_i,x_j)$, and the occurrences of these edges are independent given the latent positions. There is a vast literature addressing statistical inference on latent positions graphs; for an incomplete list of references, see Bickel and Chen (2009); Goldenberg et al. (2010); Fortunato (2010); Bickel et al. (2011); Choi et al. (2012), among others.

In this paper we focus on a specific example of latent positions graphs: the random dot product graph model (Young and Scheinerman, 2007), in which the graphon is simply the dot product of the two latent positions, i.e., vertices $i$ and $j$ with latent positions $x_i$ and $x_j$ are connected with probability $x_i^Tx_j$. The random dot product graph model enjoys several nice properties. First, the well-known stochastic block model, in which the vertices are grouped into several blocks, is a special case of the random dot product graph model, obtained by letting vertices in the same block share identical latent positions. Second, the structure of the random dot product graph is simple, as the expected value of the adjacency matrix is a symmetric low-rank matrix, motivating the use of a wide range of tractable spectral-based techniques for statistical analysis. Furthermore, the random dot product graph can accurately approximate more general latent positions graphs when the dimension of the latent positions grows with the number of vertices at a certain rate (Tang et al., 2013). In addition, its modeling mechanism is convenient for interpretation: in a social network, the latent position of an individual could represent the amount of social activities he/she tends to join, and individuals who are more involved in the same activities are more likely to become acquainted. In light of its structural simplicity and approximation power, the random dot product graph model has become an object worth studying in its own right, as well as a useful building block for inference in more general latent positions graphs. For a thorough review of recent advances in statistical inference on the random dot product graph model, readers are referred to Athreya et al. (2018a).

The techniques for statistical analysis of the random dot product graph model have so far focused on spectral methods based on the observed adjacency matrix or its graph Laplacian matrix. For example, Sussman et al. (2014) proposed to directly estimate the latent positions using the adjacency spectral embedding and proved its consistency. Athreya et al. (2016) further established the asymptotic distribution of the adjacency spectral embedding. For the normalized graph Laplacian matrix of the adjacency matrix, Tang and Priebe (2018) developed the asymptotic distribution of the Laplacian spectral embedding, and made a thorough comparison between the adjacency spectral embedding and the Laplacian spectral embedding in various contexts. The well-developed theory of spectral methods for the random dot product graph model lays a theoretical foundation for a variety of subsequent inference tasks, including spectral clustering for stochastic block models (Sussman et al., 2012; Lyzinski et al., 2014, 2017b), vertex classification and nomination (Sussman et al., 2014; Lyzinski et al., 2017a, b), nonparametric graph hypothesis testing (Tang et al., 2017a), multiple graph inference (Levin et al., 2017; Wang et al., 2017; Tang et al., 2017b), and manifold learning (Athreya et al., 2018b).

Despite the marvelous success of spectral methods for the random dot product graph model, it remains open whether these spectral estimators are minimax-optimal for estimating the latent positions with respect to suitable loss functions. Taking one step back, a more fundamental question is: what is the minimax risk for estimating the latent positions, and how can one construct an estimator that achieves it? In this paper we provide a detailed answer to this question. Unlike the aforementioned spectral-based approaches, we take advantage of the Bernoulli likelihood information of the observed adjacency matrix and design a fully likelihood-based Bayesian approach, referred to as the posterior spectral embedding. Not only do we establish a minimax lower bound for estimating the latent positions, but we also show that this lower bound is achievable through the proposed estimation procedure. Specifically, we show that under mild conditions, for a wide class of prior distributions of the latent positions, the posterior spectral embedding both yields a rate-optimal posterior contraction and produces a minimax-optimal point estimator for the latent positions. To the best of our knowledge, our work represents the first effort in the literature on the random dot product graph model that leverages a likelihood-based Bayesian approach with theoretical guarantees. In addition, as an application, we improve an existing result regarding clustering in stochastic block models by showing that the number of mis-clustered vertices can be reduced from $O(\log n)$ (Sussman et al., 2012) to $O(1)$ using the proposed posterior spectral embedding.

The rest of this paper is arranged as follows. In Section 2 we review the random dot product graph model and establish a minimax lower bound for estimating the latent positions. Section 3 elaborates on the proposed posterior spectral embedding for the random dot product graph model and its theoretical properties. The application to clustering in stochastic block models is discussed in Section 4. Section 5 presents the analysis of a spectral-based Gaussian spectral embedding approach that can be treated as a Bayesian analogue of the adjacency spectral embedding. We illustrate the proposed approach through extensive simulation studies and the analysis of a Wikipedia graph dataset in Section 6. Further discussion is provided in Section 7.

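Notations: We use $I_d$ to denote the $d\times d$ identity matrix. For a positive integer $n$, we denote the set of integers $[n]=\{1,\ldots,n\}$. For an integer $p\ge1$ and a $d$-dimensional Euclidean vector $x=[x_1,\ldots,x_d]^T$, we use $\|x\|_p$ to denote its $\ell_p$-norm, and when $p=\infty$, $\|x\|_\infty=\max_{k\in[d]}|x_k|$. For a vector $x$, the vector inequality $x\succeq0$ represents $x_k\ge0$ for $k\in[d]$. For an $n\times d$ matrix $X$ we use $X_{*k}$ to denote the $n$-dimensional vector formed by the $k$th column of $X$. For any two positive integers $n,d$ with $n\ge d$, $\mathbb{O}(n,d)$ denotes the set of all orthogonal $d$-frames in $\mathbb{R}^n$, i.e., $\mathbb{O}(n,d)=\{U\in\mathbb{R}^{n\times d}:U^TU=I_d\}$, and when $n=d$, we use the notation $\mathbb{O}(d)=\mathbb{O}(d,d)$. The symbols $\lesssim$ and $\gtrsim$ mean the corresponding inequality up to a constant, i.e., $a\lesssim b$ ($a\gtrsim b$, resp.) if $a\le Cb$ ($a\ge Cb$, resp.) for some constant $C>0$. We write $a\asymp b$ if $a\lesssim b$ and $a\gtrsim b$. For a positive definite matrix $\Delta$, we use $\lambda_k(\Delta)$ to denote its $k$th largest eigenvalue, and for any rectangular matrix $A$, we use $\sigma_k(A)$ to denote its $k$th largest singular value. We say that a sequence of events $(E_n)_{n\ge1}$ occurs “almost always” if $\mathbb{P}(\liminf_{n\to\infty}E_n)=1$, i.e., with probability one, $E_n$ occurs for all but finitely many $n$.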

## 2 Preliminaries

We first introduce the background of the random dot product graph model. Let the space of -dimensional latent positions be , where is the -norm of a Euclidean vector. Let $X=[x_1,\ldots,x_n]^T$ be an $n\times d$ matrix, where $x_1,\ldots,x_n$ represent the latent positions of $n$ vertices in a graph. A symmetric random binary matrix $Y=(y_{ij})_{n\times n}$ is said to be the adjacency matrix of a random dot product graph with latent position matrix $X$, denoted by $Y\sim\mathrm{RDPG}(X)$, if the random variables $y_{ij}\sim\mathrm{Bernoulli}(x_i^Tx_j)$ independently for $1\le i\le j\le n$. Namely,

$$p(Y\mid X)=\prod_{i\le j}(x_i^Tx_j)^{y_{ij}}(1-x_i^Tx_j)^{1-y_{ij}}.$$
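To make the definition concrete, the following Python sketch (assuming NumPy; the function names `sample_rdpg` and `rdpg_loglik` are illustrative, not from the paper) draws a symmetric adjacency matrix from the random dot product graph with latent position matrix $X$ and evaluates $\log p(Y\mid X)$ over the pairs $i\le j$, diagonal included as in the product above. It only assumes that every $x_i^Tx_j$ lies in $[0,1]$; it does not encode the paper's exact latent position space.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Draw symmetric Y with y_ij ~ Bernoulli(x_i^T x_j) independently for i <= j."""
    P = X @ X.T                                  # edge probability matrix
    n = P.shape[0]
    draws = rng.random((n, n)) < P               # independent Bernoulli draws
    Y = np.triu(draws).astype(int)               # keep pairs i <= j (diagonal included)
    return Y + np.triu(Y, 1).T                   # mirror the strict upper triangle

def rdpg_loglik(Y, X, eps=1e-12):
    """log p(Y | X) = sum_{i<=j} y_ij log(x_i^T x_j) + (1 - y_ij) log(1 - x_i^T x_j)."""
    P = np.clip(X @ X.T, eps, 1 - eps)           # guard against log(0)
    iu = np.triu_indices(Y.shape[0])             # index pairs with i <= j
    y, p = Y[iu], P[iu]
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Hypothetical latent positions: entries in [0.2, 0.6], so x_i^T x_j lies in [0.08, 0.72].
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.6, size=(50, 2))
Y = sample_rdpg(X, rng)
print(rdpg_loglik(Y, X))
```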
###### Example (Stochastic block model).

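The most popular example of the random dot product graph model is the stochastic block model with a positive semidefinite block probability matrix. Formally, a symmetric random adjacency matrix $Y=(y_{ij})_{n\times n}$ is drawn from a stochastic block model with a block probability matrix $B\in[0,1]^{K\times K}$ and a block assignment function $\tau:[n]\to[K]$, denoted by $Y\sim\mathrm{SBM}(B,\tau)$, if the random variables $y_{ij}\sim\mathrm{Bernoulli}(B_{\tau(i)\tau(j)})$ independently for $1\le i\le j\le n$. Namely, vertices in the same block have the same connecting probabilities. When $B$ is positive semidefinite with rank $d$ (and hence implicitly $d\le K$), there exists a $K\times d$ matrix $\nu$ such that $B=\nu\nu^T$. By converting the block assignment function $\tau$ into an $n\times K$ matrix $Z$ with $Z_{ik}=1$ if $\tau(i)=k$ and $Z_{ik}=0$ otherwise, we obtain $\mathbb{E}(Y)=ZBZ^T=(Z\nu)(Z\nu)^T$, and therefore $\mathrm{SBM}(B,\tau)$ coincides with the random dot product graph model with latent position matrix $X=Z\nu$ through this reparametrization. The positive semidefinite stochastic block model will be revisited in Section 4.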
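A minimal sketch of this reparametrization, assuming NumPy (`sbm_to_rdpg` is a hypothetical helper; $B$, $\nu$, $Z$ and the labels are named here for illustration). It factors a positive semidefinite block probability matrix as $B=\nu\nu^T$ via its top-$d$ eigenpairs, builds the $n\times K$ membership matrix $Z$, and returns $X=Z\nu$, so that $XX^T$ reproduces the block probabilities. The eigen-factorization is only one valid choice: any $\nu W$ with orthogonal $W$ works equally well, and its rows need not be entrywise nonnegative.

```python
import numpy as np

def sbm_to_rdpg(B, tau, d):
    """Return X = Z nu with B = nu nu^T (nu is K x d) and Z the one-hot membership matrix."""
    evals, evecs = np.linalg.eigh(B)                      # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:d]                     # indices of the d largest eigenvalues
    nu = evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))
    K = B.shape[0]
    Z = np.eye(K)[tau]                                    # row i is the indicator of block tau[i]
    return Z @ nu

B = np.array([[0.5, 0.2], [0.2, 0.4]])                    # positive definite, rank d = 2
tau = np.array([0, 0, 1, 1, 0])                           # block labels coded as 0, ..., K-1
X = sbm_to_rdpg(B, tau, d=2)
print(np.allclose(X @ X.T, B[np.ix_(tau, tau)]))          # True
```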

###### Remark 1 (Intrinsic non-identifiability).

We remark that the latent position matrix $X$ cannot be uniquely determined by the distribution of $Y$, i.e., $X$ is not identifiable. In fact, for any orthogonal matrix $W\in\mathbb{O}(d)$, the random dot product graphs with latent position matrices $X$ and $XW$ have identical distributions, since for any $i,j\in[n]$, $(W^Tx_i)^T(W^Tx_j)=x_i^TWW^Tx_j=x_i^Tx_j$. In addition, any $d$-dimensional random dot product graph model can be embedded into a $d'$-dimensional random dot product graph model for any $d'>d$, in the sense that there exists a $d'$-dimensional latent position matrix $X'$ (for example, $X$ padded with $d'-d$ zero columns) such that the two distributions are identical. The latter source of non-identifiability, however, can be eliminated by requiring the columns of $X$ to be linearly independent.
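Both sources of non-identifiability can be checked numerically. In this sketch (NumPy assumed; the helper name and the latent positions are illustrative) a random orthogonal matrix is drawn via a sign-corrected QR decomposition, and we verify that rotating $X$, or padding it with a zero column, leaves the edge probability matrix $XX^T$, and hence the distribution of $Y$, unchanged.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Haar-distributed d x d orthogonal matrix via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    return Q * np.sign(np.diag(R))               # fix column signs so the draw is uniform on O(d)

rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.6, size=(6, 2))           # hypothetical latent positions, n = 6, d = 2
W = random_orthogonal(2, rng)
X_rot = X @ W                                    # rows are W^T x_i
X_pad = np.hstack([X, np.zeros((6, 1))])         # embed into d' = 3 with a zero column

print(np.allclose(X_rot @ X_rot.T, X @ X.T))     # True: same edge probabilities
print(np.allclose(X_pad @ X_pad.T, X @ X.T))     # True: same edge probabilities
```

The padded matrix has linearly dependent columns, which is exactly what the rank condition above rules out.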

###### Remark 2 (Choice of orthogonal transformation and loss function).

Since the latent position matrix can only be identified up to an orthogonal transformation, one needs to properly rotate any estimator $\hat X$ to align it with the underlying true $X$. The alignment matrix can be found as the solution to the orthogonal Procrustes problem $\inf_{W}\|\hat X-XW\|_F$, where the infimum ranges over the set $\mathbb{O}(d)$ of all $d\times d$ orthogonal matrices (Athreya et al., 2018a). In particular, the minimizer has a closed-form expression. Consequently, in this work we consider the loss function

$$L_F(\hat X,X)=\frac{1}{n}\inf_{W\in\mathbb{O}(d)}\|\hat X-XW\|_F^2=\inf_{W\in\mathbb{O}(d)}\frac{1}{n}\sum_{i=1}^n\|\hat x_i-W^Tx_i\|_2^2,$$

where $\hat X=[\hat x_1,\ldots,\hat x_n]^T$. This loss function can also be interpreted as the average squared error of the estimated latent positions of all vertices after the appropriate orthogonal alignment.
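The closed-form alignment is the standard orthogonal Procrustes solution: with the singular value decomposition $X^T\hat X=U\Sigma V^T$, the minimizer is $W=UV^T$. A minimal NumPy sketch computing the loss $L_F$ this way (the function name is ours):

```python
import numpy as np

def procrustes_loss(X_hat, X):
    """Return (L_F(X_hat, X), W), where W = U V^T solves min_{W in O(d)} ||X_hat - X W||_F
    and U, V come from the SVD X^T X_hat = U S V^T."""
    U, _, Vt = np.linalg.svd(X.T @ X_hat)
    W = U @ Vt                                   # optimal orthogonal alignment
    residual = X_hat - X @ W                     # row i equals x_hat_i - W^T x_i
    return float(np.sum(residual ** 2) / X.shape[0]), W

# An exactly rotated copy of X has zero loss, and the recovered W is the rotation itself.
theta = 0.7
W0 = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
X = np.random.default_rng(0).uniform(0.2, 0.6, size=(10, 2))
loss, W = procrustes_loss(X @ W0, X)
```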

Since the adjacency matrix $Y$ can be viewed as the sum of a low-rank signal matrix $XX^T$ and a noise matrix $E=Y-XX^T$, the elements of which are centered Bernoulli random variables ($y_{ij}-x_i^Tx_j$, independent for $i\le j$), Sussman et al. (2014) argued for embedding the adjacency matrix into $\mathbb{R}^{n\times d}$ by solving the least-squares problem $\min_{X\in\mathbb{R}^{n\times d}}\|Y-XX^T\|_F^2$. The resulting estimator is referred to as the adjacency spectral embedding of $Y$ (Sussman et al., 2012) and denoted by $\hat X_{ASE}$. Theoretical properties of the adjacency spectral embedding have been explored extensively (Sussman et al., 2012; Lyzinski et al., 2014, 2017b). Notably, the following convergence result for $\hat X_{ASE}$ was established in Sussman et al. (2014).
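A minimal NumPy sketch of the adjacency spectral embedding as the least-squares solution $\min_{X\in\mathbb{R}^{n\times d}}\|Y-XX^T\|_F^2$ (the helper name is illustrative). Because $XX^T$ is positive semidefinite with rank at most $d$, the minimizer keeps the $d$ largest eigenvalues of $Y$, truncated at zero, and scales the corresponding eigenvectors by their square roots.

```python
import numpy as np

def adjacency_spectral_embedding(Y, d):
    """Return X_hat = U_d diag(max(lambda_k, 0))^{1/2} from the d largest eigenpairs of Y."""
    evals, evecs = np.linalg.eigh(Y.astype(float))   # symmetric eigendecomposition, ascending
    top = np.argsort(evals)[::-1][:d]                # d largest eigenvalues
    lam = np.clip(evals[top], 0.0, None)             # X X^T is PSD, so negative parts are dropped
    return evecs[:, top] * np.sqrt(lam)
```

The estimate is only defined up to an orthogonal transformation, so it should be compared with the truth after the Procrustes alignment described in Remark 2.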

###### Theorem 1 (Sussman et al., 2014).

Suppose for some and for some positive definite with distinct eigenvalues as . Assume that there exists such that and . Then with probability greater than ,

$$\frac{1}{n}\inf_{W\in\mathbb{O}(d)}\|\hat X_{ASE}-XW\|_F^2\le\frac{12d^2\log n}{\delta^3n}.\tag{1}$$

Theorem 1 implies that after an orthogonal alignment of $\hat X_{ASE}$ towards $X$, the adjacency spectral embedding yields a convergence rate $\omega_n\log n/n$ for arbitrary $\omega_n\to\infty$ ($\omega_n$ should be interpreted as a sequence diverging to $\infty$ arbitrarily slowly). Nevertheless, as will be seen in Section 3, this rate is sub-optimal and, interestingly, can be improved by a Bayes estimator. Furthermore, it is unclear what the minimax risk for estimating the latent position matrix with respect to the loss $L_F$ is, and how to construct an estimator achieving the minimax rate; we address both questions in this paper. The distinct eigenvalues condition will also be relaxed in Section 3. We begin by establishing the following minimax lower bound.

###### Theorem 2.

Let for some , . Assume that is fixed and does not change with . Let be an estimator of the latent position matrix satisfying with probability one. Then

$$\inf_{\hat X}\sup_{X\in\mathcal{X}_n}\mathbb{E}_X\Big\{\frac{1}{n}\inf_{W\in\mathbb{O}(d)}\|\hat X-XW\|_F^2\Big\}\gtrsim\frac{1}{n}.\tag{2}$$

The above minimax lower bound does not by itself yield a minimax rate of convergence for estimating the latent positions. Nevertheless, if we assume the existence of an estimator $\hat X$ with $\mathbb{E}_X\{L_F(\hat X,X)\}\le C/n$ for some constant $C>0$ (which will be rigorously proved in Section 3), then Markov's inequality yields $\mathbb{P}_X\{L_F(\hat X,X)>\omega_n/n\}\le(n/\omega_n)\,\mathbb{E}_X\{L_F(\hat X,X)\}\le C/\omega_n\to0$ for an arbitrary sequence $\omega_n\to\infty$. This observation suggests that the convergence rate derived in Sussman et al. (2014) for the adjacency spectral embedding might be sub-optimal, and motivates us to pursue an estimator achieving the minimax lower bound (2).

## 3 A Likelihood-based Posterior Spectral Embedding

Although it is intuitive and computationally convenient to directly estimate the latent position matrix by the popular spectral-based approaches (e.g., the adjacency spectral embedding), these approaches neglect the Bernoulli likelihood information of the adjacency matrix. On the other hand, likelihood-based methods for the random dot product graph model remain under-explored. In particular, neither the existence nor the uniqueness of the maximum likelihood estimator for $X$ has been addressed. In this section, we develop a Bayesian approach for estimating the latent positions by taking advantage of the Bernoulli likelihood information.

Recall the space of latent positions introduced in Section 2. Let $X_0$ be the true latent position matrix, and let $X$ be the latent position matrix to be assigned a prior distribution $\Pi$. Under the prior, $X$ is treated as a random matrix taking values in the space $\mathcal{X}_n$ of $n\times d$ latent position matrices. The prior distribution $\Pi$ on $X$ is constructed by assuming that $x_1,\ldots,x_n$ independently follow a distribution with a density function supported on the latent position space. In this work we only require this density to be bounded away from $0$ and $\infty$ over the latent position space (e.g., the uniform distribution on that space). It follows directly from the Bayes formula that the posterior distribution of $X$ is

$$\Pi(X\in A\mid Y)=\frac{N_n(A)}{D_n},\quad\text{where } N_n(A)=\int_A\prod_{i\le j}\frac{p(y_{ij}\mid X)}{p(y_{ij}\mid X_0)}\,\Pi(\mathrm{d}X),\quad D_n=N_n(\mathcal{X}_n),$$

and $p(y_{ij}\mid X)=(x_i^Tx_j)^{y_{ij}}(1-x_i^Tx_j)^{1-y_{ij}}$, for any measurable set $A\subset\mathcal{X}_n$. Clearly, the posterior distribution of $X$ incorporates the Bernoulli likelihood information through the Bayes formula, and we refer to $\Pi(\cdot\mid Y)$ as the posterior spectral embedding of $Y$.

The following theorem, which is the key result of this work, shows that under mild regularity conditions, the posterior contraction of the latent positions is minimax-optimal. The proof is deferred to the supplementary material.

###### Theorem 3.

Let for some , and the prior be described as above. Assume that as for some positive definite . If is fixed (i.e., does not change with ), and for some constant independent of , then there exist some large constants depending on and the prior , such that

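$$\mathbb{E}_0\Big\{\Pi\Big(\frac{1}{n}\|XX^T-X_0X_0^T\|_F>\frac{M_1}{\sqrt{n}}\,\Big|\,Y\Big)\Big\}\le8\exp\Big(-\frac{1}{2}nd\Big),$$

$$\mathbb{E}_0\Big\{\Pi\Big(\frac{1}{n}\inf_{W\in\mathbb{O}(d)}\|X-X_0W\|_F^2>\frac{M_2}{n}\,\Big|\,Y\Big)\Big\}\le8\exp\Big(-\frac{1}{2}nd\Big)$$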

for sufficiently large .

###### Remark 3.

The assumption $X_0^TX_0/n\to\Delta$ as $n\to\infty$ in Theorem 3 can be equivalently written as $\frac{1}{n}\sum_{i=1}^nx_{0i}x_{0i}^T\to\Delta$ as $n\to\infty$ for some positive definite $\Delta$, where $x_{0i}^T$ is the $i$th row of $X_0$. An intuitive interpretation of this condition is that the true latent positions $x_{01},\ldots,x_{0n}$ can be regarded as “random” samples drawn from some non-degenerate distribution with a positive definite second-moment matrix $\Delta$. By the law of large numbers, the “sample” second-moment matrix “converges” to the “population” second-moment matrix. An illustrative example is the stochastic block model: suppose the distinct latent positions of $X_0$ are $\nu_1,\ldots,\nu_K$, and let $n_k$ be the number of vertices with latent position $\nu_k$. Assume that $K$ is fixed, $n_k/n\to\pi_k$ as $n\to\infty$, and the $\nu_k$'s and $\pi_k$'s are fixed (do not vary with $n$), $k\in[K]$. Then

$$\frac{1}{n}X_0^TX_0=\sum_{k=1}^K\frac{n_k}{n}\nu_k\nu_k^T\to\sum_{k=1}^K\pi_k\nu_k\nu_k^T$$

as $n\to\infty$. Therefore, under the above assumption, the stochastic block model satisfies this condition provided that $\sum_{k=1}^K\pi_k\nu_k\nu_k^T$ is positive definite.

Theorem 3 claims that under appropriate regularity conditions, the posterior spectral embedding yields a rate-optimal posterior contraction for the latent positions in the Bayesian sense. The following theorem shows that one can use the posterior spectral embedding to construct a (frequentist) estimator that exactly achieves the minimax lower bound (2).

###### Theorem 4.

Let the conditions in Theorem 3 hold, and let constant be given by Theorem 3. Consider the posterior mean of the edge probability matrix

$$\tilde P=\int_{X\in\mathcal{X}_n}XX^T\,\Pi(\mathrm{d}X\mid Y).$$

Suppose yields spectral decomposition , where are eigenvalues of arranged in non-increasing order, and

are the associated eigenvectors. Let

, , , and be the left-singular vector matrix of . Then for sufficiently large ,

$$\mathbb{E}_0\Big\{\frac{1}{n}\inf_{W\in\mathbb{O}(d)}\|\hat X-X_0W\|_F^2\Big\}\lesssim\frac{1}{n}.\tag{3}$$

Furthermore, for sufficiently large ,

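$$\mathbb{P}_0\Big(\inf_{W\in\mathbb{O}(d)}\|\hat U-U_0W\|_F^2>\frac{128M_1^2d}{\lambda_d^2(\Delta)\,n}\Big)\le2\exp\Big(-\frac{1}{4}M_1d\sqrt{n}\Big).\tag{4}$$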
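In practice $\tilde P$ is approximated from Monte Carlo draws $X^{(1)},\ldots,X^{(T)}$ of the posterior by averaging $X^{(t)}X^{(t)T}$, and a point estimate is read off its leading eigenpairs. The sketch below assumes, as the spectral decomposition in Theorem 4 suggests, that the estimator has the form $\hat X=\tilde U_d\tilde\Lambda_d^{1/2}$ built from the $d$ largest eigenpairs of $\tilde P$; that exact form and the function name are our assumptions. Averaging $XX^T$ rather than $X$ sidesteps the orthogonal non-identifiability of the individual draws.

```python
import numpy as np

def posterior_point_estimate(samples, d):
    """From posterior draws of X, approximate P_tilde = E(X X^T | Y) and return
    (X_hat, U_hat) with X_hat = U_d Lambda_d^{1/2} from the top-d eigenpairs (assumed form)."""
    P_tilde = np.mean([X @ X.T for X in samples], axis=0)   # Monte Carlo posterior mean of X X^T
    evals, evecs = np.linalg.eigh(P_tilde)
    top = np.argsort(evals)[::-1][:d]                        # d largest eigenvalues
    lam = np.clip(evals[top], 0.0, None)                     # P_tilde is PSD; clip round-off
    U_hat = evecs[:, top]                                    # left-singular vectors of X_hat
    return U_hat * np.sqrt(lam), U_hat
```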

We briefly compare the results of Theorem 4 with those in Sussman et al. (2014). The convergence rate (3) shows that $\hat X$ not only achieves the minimax lower bound (2), but also yields a convergence rate $\omega_n/n$ for any $\omega_n\to\infty$, improving the rate (1) obtained in Sussman et al. (2014). The convergence rate of the unscaled eigenvectors given by (4) also improves its counterpart in Sussman et al. (2014), as explained below. Denote by $\hat U_{ASE}$ the left-singular vector matrix of $\hat X_{ASE}$, and by $U_0$ that of $X_0$. Sussman et al. (2014) show that, under the assumptions of Theorem 1, there exists a diagonal matrix $W$ whose diagonal entries are either $+1$ or $-1$, such that

$$\mathbb{P}_0\Big(\|(\hat U_{ASE})_{*k}-(U_0W)_{*k}\|_2^2>\frac{3\log n}{\delta^2n}\Big)\le\frac{2(d^2+1)}{n^2}\tag{5}$$

for $k\in[d]$. In contrast, the eigenvector estimate derived from the posterior spectral embedding improves the convergence rate (5): not only do we improve the rate from $\log n/n$ to $1/n$, but we also sharpen the large deviation probability from $O(n^{-2})$ to $\exp(-c\sqrt{n})$ for some constant $c>0$. The distinct eigenvalues condition on $\Delta$ required in Sussman et al. (2014) is also relaxed.

## 4 Application: Clustering in Stochastic Block Models

This section presents an application of the posterior spectral embedding to clustering in stochastic block models. In particular, we show that the result obtained in this section strengthens an existing result on the number of mis-clustered vertices. In preparation, let us first review the $K$-means clustering procedure in general (see, for example, Lloyd, 1982). Suppose that $n$ data points $x_1,\ldots,x_n$ in $\mathbb{R}^d$ are to be assigned to $K$ clusters, and let $X=[x_1,\ldots,x_n]^T$ denote the corresponding data matrix. The $K$-means clustering centroids of $X$, represented by an $n\times d$ matrix $C_X$ with $K$ distinct rows, are given by

$$C_X=\arg\min_{C\in\mathbb{R}^{n\times d}:\ C\text{ has }K\text{ distinct rows}}\|C-X\|_F^2.$$

The corresponding cluster assignment function is defined to be any function $\tau(\cdot;X):[n]\to[K]$ such that $\tau(i;X)=\tau(j;X)$ if and only if the $i$th and $j$th rows of $C_X$ are identical. Given two cluster assignment functions $\tau_1$ and $\tau_2$, the Hamming distance between them is defined by $d_H(\tau_1,\tau_2)=\sum_{i=1}^n\mathbb{1}\{\tau_1(i)\ne\tau_2(i)\}$. To avoid the labeling issue, we use $\inf_{\sigma\in S_K}d_H(\sigma\circ\tau_1,\tau_2)$ as the measure of clustering performance, where $S_K$ is the set of all permutations of $[K]$.
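The measure $\inf_{\sigma\in S_K}d_H(\sigma\circ\tau_1,\tau_2)$, where $d_H$ counts the vertices on which two assignments disagree, is the number of mis-clustered vertices after the best relabeling. A brute-force Python sketch (labels coded as $0,\ldots,K-1$; function names are ours) enumerates all $K!$ permutations, which is fine for small $K$; for larger $K$ one would instead solve a linear assignment problem on the $K\times K$ confusion matrix.

```python
from itertools import permutations

def hamming(tau1, tau2):
    """d_H(tau1, tau2): number of vertices i with tau1[i] != tau2[i]."""
    return sum(a != b for a, b in zip(tau1, tau2))

def misclustered(tau1, tau2, K):
    """min over permutations sigma of [K] of d_H(sigma o tau1, tau2); labels are 0, ..., K-1."""
    return min(hamming([sigma[t] for t in tau1], tau2)
               for sigma in permutations(range(K)))

print(misclustered([0, 0, 1, 1, 2], [1, 1, 0, 0, 2], 3))  # 0: identical up to relabeling
print(misclustered([0, 0, 1, 1, 2], [1, 1, 0, 2, 2], 3))  # 1
```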

A clustering procedure for stochastic block models is called consistent if the resulting fraction of mis-clustered vertices is asymptotically zero. Consistent clustering procedures for stochastic block models have been investigated in earlier work, including likelihood-based methods (Choi et al., 2012), spectral clustering based on the Laplacian spectral embedding (Rohe et al., 2011), $K$-means clustering based on the adjacency spectral embedding (Sussman et al., 2012), and modularity maximization (Girvan and Newman, 2002), among others. In particular, Sussman et al. (2012) argue that by directly applying the $K$-means procedure to the adjacency spectral embedding (i.e., replacing the aforementioned data matrix $X$ by $\hat X_{ASE}$), the number of mis-clustered vertices can be upper bounded by $O(\log n)$. In what follows we show that this result can be strengthened by taking advantage of the convergence rate of the posterior spectral embedding established in Section 3.

Our method for clustering is straightforward: similar to the $K$-means clustering based on $\hat X_{ASE}$, we directly apply the $K$-means clustering procedure to the posterior samples collected from the posterior spectral embedding. Specifically, for each realization $X$ drawn from the posterior spectral embedding, we obtain a cluster assignment function $\tau(\cdot;X)$ by applying the aforementioned $K$-means clustering procedure to $X$. This results in a posterior distribution of the cluster assignment function, induced by the map $X\mapsto\tau(\cdot;X)$ and the posterior spectral embedding $\Pi(\cdot\mid Y)$. The following theorem shows that we can recover the clustering structure through the $K$-means procedure even when the working model is the random dot product graph model, which is more general than the stochastic block model.
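The per-draw clustering can be sketched as follows. This is a plain NumPy implementation of Lloyd's algorithm with random restarts, standing in for any $K$-means solver (Lloyd's algorithm only finds a local optimum of the $K$-means objective, hence the restarts); the function names are hypothetical. Since $K$-means depends only on pairwise distances, it is invariant to orthogonal transformations of the rows, so the posterior draws need no Procrustes alignment before clustering.

```python
import numpy as np

def kmeans(X, K, rng, n_init=10, n_iter=100):
    """Lloyd's algorithm with random restarts; returns the labels of the run with the
    smallest within-cluster sum of squares."""
    best_labels, best_ss = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=K, replace=False)]       # K distinct data points
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # n x K
            labels = dist.argmin(axis=1)
            new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                    else centers[k] for k in range(K)])  # keep empty clusters' centers
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        ss = float(((X - centers[labels]) ** 2).sum())
        if ss < best_ss:
            best_labels, best_ss = labels, ss
    return best_labels

def posterior_cluster_assignments(samples, K, rng):
    """Push each posterior draw X through the map X -> tau(.; X)."""
    return [kmeans(X, K, rng) for X in samples]
```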

###### Theorem 5.

Assume the conditions in Theorem 3 hold, and let the constants be provided by Theorem 3. Further assume that has distinct rows for some , they satisfy for some , and as for all . Then for sufficiently large ,

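$$\mathbb{E}_0\Big\{\Pi\Big(\inf_{\sigma\in S_K}d_H\big(\sigma\circ\tau(\cdot;X_0),\tau(\cdot;X)\big)\ge\frac{16M_2^2}{\xi^2}\,\Big|\,Y\Big)\Big\}\le8\exp\Big(-\frac{1}{2}nd\Big).$$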

Let be the left-singular vector matrix of defined in Theorem 4, and be that of . Then it almost always holds that

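$$\inf_{\sigma\in S_K}d_H\big(\sigma\circ\tau(\cdot;\hat U),\tau(\cdot;U_0)\big)\le\frac{16}{\xi^2}\Big\{\frac{8M_1\sqrt{2d}}{\lambda_d(\Delta)}\Big\}^2.$$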
###### Remark 4.

In Sussman et al. (2012), the authors directly apply the $K$-means clustering procedure to the adjacency spectral embedding, and show that the number of incorrectly clustered vertices is $O(\log n)$ almost always. The results obtained in Theorem 5 are stronger, since they show that this number can be further reduced to $O(1)$ in the following two senses. First, if the $K$-means clustering procedure is applied to the posterior samples drawn from the posterior spectral embedding, then with posterior probability tending to one in $\mathbb{P}_0$-probability, the posterior number of mis-clustered vertices is upper bounded by a constant. Second, if the $K$-means clustering procedure is directly applied to the unscaled left-singular vectors of the point estimator obtained in Theorem 4, then it almost always holds that this number is upper bounded by a constant as well.

## 5 A Spectral-based Gaussian Spectral Embedding

We have seen in Sections 3 and 4 the advantages of the posterior spectral embedding over the adjacency spectral embedding for the random dot product graph model. The major difference is that the posterior spectral embedding is a fully likelihood-based approach taking the Bernoulli likelihood information into account, while the adjacency spectral embedding only leverages the low-rank structure of the expected value $\mathbb{E}(Y)=XX^T$ of the adjacency matrix. Recall that the adjacency spectral embedding is the solution to the minimization problem $\min_{X\in\mathbb{R}^{n\times d}}\|Y-XX^T\|_F^2$. Equivalently, we can view $\hat X_{ASE}$ as the maximum likelihood estimator of $X$ under a Gaussian likelihood function:

$$\hat X_{ASE}=\arg\min_{X\in\mathbb{R}^{n\times d}}\|Y-XX^T\|_F^2=\arg\max_{X\in\mathbb{R}^{n\times d}}\sum_{i=1}^n\sum_{j=1}^n\Big\{-\frac{1}{2}\log(2\pi)-\frac{1}{2}(y_{ij}-x_i^Tx_j)^2\Big\}.$$

The above maximum likelihood interpretation of the adjacency spectral embedding through the Gaussian likelihood function motivates us to study a Bayesian analogue of the adjacency spectral embedding, referred to as the Gaussian spectral embedding, introduced as follows.

Assume that $\Pi_G$ is some prior distribution on the latent position matrix $X\in\mathbb{R}^{n\times d}$. We consider the following pseudo-posterior distribution by taking the Gaussian density as the working model:

$$\Pi_G(X\in A\mid Y)=\frac{N_n^G(A)}{D_n^G},\quad\text{where } N_n^G(A)=\int_A\prod_{i,j\in[n]}\frac{\phi(y_{ij}-x_i^Tx_j)}{\phi(y_{ij}-x_{0i}^Tx_{0j})}\,\Pi_G(\mathrm{d}X),\quad D_n^G=N_n^G(\mathbb{R}^{n\times d}),\tag{6}$$

for any measurable set $A\subset\mathbb{R}^{n\times d}$, where $\phi$ is the density function of the standard normal distribution $\mathrm{N}(0,1)$. The formulation (6) relies only on the low-rank (spectral) structure of $\mathbb{E}(Y)$ and does not incorporate the Bernoulli likelihood information. We refer to the pseudo-posterior distribution (6) as the Gaussian spectral embedding of $Y$. Observe that when

$$\Pi_G(\mathrm{d}X)=\prod_{i=1}^n\Big(\frac{1}{\sqrt{2\pi\sigma^2}}\Big)^d\exp\Big(-\frac{x_i^Tx_i}{2\sigma^2}\Big)\,\mathrm{d}x_i\tag{7}$$

for some $\sigma>0$, the maximum a posteriori estimator of (6) coincides with the solution to the minimization problem $\min_{X\in\mathbb{R}^{n\times d}}\{\|Y-XX^T\|_F^2+\sigma^{-2}\|X\|_F^2\}$. In particular, as $\sigma\to\infty$ (corresponding to a non-informative flat prior), the maximum a posteriori estimator of (6) reduces to the adjacency spectral embedding $\hat X_{ASE}$. Therefore, one can heuristically view the Gaussian spectral embedding defined through (6) as a direct Bayesian analogue of the adjacency spectral embedding.
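Completing the square gives a closed form for this maximum a posteriori estimator (a short derivation of ours, not stated in the text): for $A=XX^T$ we have $\|X\|_F^2=\operatorname{tr}(A)$ and $\|Y-A\|_F^2+\sigma^{-2}\operatorname{tr}(A)=\|Y-\tfrac{1}{2\sigma^2}I_n-A\|_F^2+\text{const}$, so the minimizer keeps the top-$d$ eigenpairs of $Y$ with eigenvalues shrunk by $1/(2\sigma^2)$ and truncated at zero, recovering the adjacency spectral embedding as $\sigma\to\infty$. A NumPy sketch (function names are illustrative):

```python
import numpy as np

def gse_objective(Y, X, sigma):
    """-2 x log pseudo-posterior of (6) under prior (7), up to an additive constant."""
    return float(np.linalg.norm(Y - X @ X.T) ** 2 + np.sum(X ** 2) / sigma ** 2)

def gse_map(Y, d, sigma):
    """MAP of the Gaussian spectral embedding: shrink the top-d eigenvalues by 1/(2 sigma^2)."""
    evals, evecs = np.linalg.eigh(Y.astype(float))
    top = np.argsort(evals)[::-1][:d]                       # d largest eigenvalues
    lam = np.clip(evals[top] - 1.0 / (2.0 * sigma ** 2), 0.0, None)
    return evecs[:, top] * np.sqrt(lam)
```

The shrinkage makes the Gaussian prior act like a ridge penalty on the spectrum; the estimate remains identifiable only up to an orthogonal transformation.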

###### Remark 5 (Generality of the Gaussian spectral embedding).

Recall that the random dot product graph model can alternatively be regarded as a low-rank matrix model: $Y=XX^T+E$ for some low-rank matrix $XX^T$ and some noise matrix $E$. Note that in the formulation of the Gaussian spectral embedding, we do not constrain the latent positions to lie in the latent position space of Section 2, and do not assume a parametric form for the distribution of the entries of $E$. Namely, the Gaussian spectral embedding (6) is well defined not only for the random dot product graph model, but also for a more general class of low-rank matrix models. In the theoretical analysis below, we also assume that the sampling model for $Y$ is a more general low-rank matrix model $Y=X_0X_0^T+E$ for some $X_0\in\mathbb{R}^{n\times d}$, and the entries of $E$ are only required to be sub-Gaussian.

###### Theorem 6.

Let be a symmetric random matrix with being independent, and let for some , where . Assume that as for some positive definite , and the entries of are sub-Gaussian, i.e., there exists some constant , such that for all with , and all , . Then there exist some and a constant only depending on and , such that for sufficiently large ,

$$\mathbb{E}\Big\{\Pi_G\Big(\frac{1}{n}\inf_{W\in\mathbb{O}(d)}\|X-X_0W\|_F^2>\frac{Md\log n}{n}\,\Big|\,Y\Big)\Big\}\le14\exp(-C_\tau M^2n\log n).$$

On the one hand, when the sampling model is restricted to the random dot product graph model, the posterior contraction rate for the latent positions under the Gaussian spectral embedding is slower than the optimal rate by an extra logarithmic factor, whereas the posterior spectral embedding yields a rate-optimal contraction. On the other hand, the Gaussian spectral embedding can be applied to more general low-rank matrix models, while the posterior spectral embedding is specifically designed for the random dot product graph model. In addition, the posterior spectral embedding requires the latent positions to lie in the latent position space of Section 2. Such a restriction could potentially lead to a cumbersome Markov chain Monte Carlo sampler for posterior inference. In contrast, the Gaussian spectral embedding places no constraint on the latent positions, making the corresponding posterior computation relatively convenient.

## 6 Numerical Examples

In this section, we evaluate the performance of the proposed posterior spectral embedding in comparison with the spectral-based Gaussian/adjacency spectral embedding through synthetic examples and the analysis of a Wikipedia graph dataset. For each of the numerical setups, the posterior inferences for the posterior spectral embedding and the Gaussian spectral embedding are carried out through a standard Metropolis-Hastings sampler with iterations, where the first iterations are discarded as burn-in, and post-burn-in samples are collected every iterations. Throughout this section, the prior distribution on the latent positions is set to be the uniform distribution for the posterior spectral embedding, and the Gaussian prior in (7) with for the Gaussian spectral embedding.
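The sampler is not spelled out here, so the following is only a sketch of one standard option: a random-walk Metropolis-within-Gibbs update that proposes a new latent position for one vertex at a time. Under a uniform prior the acceptance ratio reduces to a likelihood ratio, and proposals outside the support are rejected. The support check is an assumption: by default it uses the nonnegative part of the unit $\ell_2$ ball, which guarantees $x_i^Tx_j\in[0,1]$ but may differ from the latent position space used in the paper; the step size, iteration counts and function names are also illustrative.

```python
import numpy as np

def in_unit_orthant_ball(x):
    """Assumed support: x >= 0 entrywise and ||x||_2 <= 1, so all dot products lie in [0, 1]."""
    return bool(np.all(x >= 0) and np.linalg.norm(x) <= 1)

def vertex_loglik(Y, X, i, xi, eps=1e-12):
    """Terms of log p(Y | X) involving vertex i, with its latent position set to xi."""
    p = X @ xi
    p[i] = xi @ xi                                   # the pair (i, i) uses the new position twice
    p = np.clip(p, eps, 1 - eps)
    y = Y[i]
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def metropolis_within_gibbs(Y, X_init, n_iter, step, rng, burn_in=0, thin=1,
                            in_space=in_unit_orthant_ball):
    """Random-walk updates of one row at a time under a uniform prior on the assumed support."""
    X = X_init.astype(float).copy()
    n, d = X.shape
    samples, accepted = [], 0
    for t in range(n_iter):
        for i in range(n):
            prop = X[i] + step * rng.standard_normal(d)      # symmetric proposal
            if not in_space(prop):                           # zero prior density: reject
                continue
            log_ratio = vertex_loglik(Y, X, i, prop) - vertex_loglik(Y, X, i, X[i])
            if np.log(rng.random()) < log_ratio:
                X[i] = prop
                accepted += 1
        if t >= burn_in and (t - burn_in) % thin == 0:       # keep every thin-th post-burn-in sweep
            samples.append(X.copy())
    return samples, accepted / (n_iter * n)
```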

### 6.1 Stochastic Block Models

We first consider stochastic block models with positive semidefinite block probability matrices. Three simulation setups are considered, and the number of communities and the unique values of their latent positions are tabulated in Table 1. In each simulation setup, the numbers of vertices in different clusters are drawn from a multinomial distribution with the probability vector .

For the posterior spectral embedding, we compute the point estimator given in Theorem 4. A point estimator for the Gaussian spectral embedding is obtained in a similar fashion. Note that although the data generating models are stochastic block models, the posterior inferences are performed under the more general random dot product graph model as the working model. We perform the subsequent clustering based on the $K$-means procedure, as described in Section 4.

Rand (1971) suggested using the Rand index to evaluate clustering performance. Specifically, given two partitions $\mathcal{P}=\{P_1,\ldots,P_r\}$ and $\mathcal{Q}=\{Q_1,\ldots,Q_s\}$ of $[n]$ (i.e., the $P_k$'s are disjoint and their union is $[n]$, and likewise for the $Q_l$'s), denote by $a$ the number of pairs of elements in $[n]$ that are in the same set in $\mathcal{P}$ and in the same set in $\mathcal{Q}$, and by $b$ the number of pairs in $[n]$ that are neither in the same set in $\mathcal{P}$ nor in the same set in $\mathcal{Q}$. Then the Rand index is defined as $R=(a+b)/\binom{n}{2}$. The Rand index lies between $0$ and $1$, with a higher value indicating better agreement between the two partitions. In particular, when $\mathcal{P}$ and $\mathcal{Q}$ are identical up to relabeling, $R=1$.
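The Rand index counts, over all unordered pairs of vertices, how often two clusterings agree on whether the pair belongs together. A direct Python sketch from that definition (function name ours; partitions are given as label lists):

```python
from itertools import combinations

def rand_index(labels1, labels2):
    """(a + b) / C(n, 2): a = pairs together in both partitions, b = pairs apart in both."""
    n = len(labels1)
    agree = 0
    for i, j in combinations(range(n), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        agree += same1 == same2          # counts both kinds of agreement
    return agree / (n * (n - 1) / 2)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0: identical up to relabeling
```

This loop is quadratic in $n$; for large graphs the same quantity can be computed from the contingency table of the two partitions.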

The comparisons of the Rand indices and the embedding errors for the three embedding approaches are tabulated in Table 2 and Table 3, respectively. We see that the point estimates of the posterior spectral embedding are superior to the other two competitors, with higher Rand indices and lower embedding errors, whereas the point estimates of the Gaussian spectral embedding perform the worst in all three setups. All three embedding approaches perform better as the number of vertices increases. In particular, the Gaussian spectral embedding does not produce satisfactory results when and , but performs decently well when .

We also visualize the three embeddings of the observed adjacency matrix for the three setups in Figures 1, 2, and 3, respectively. The estimation errors of the point estimates under the Gaussian spectral embedding can be clearly recognized from the figures when and . We also observe that for the underlying true latent position when , the adjacency spectral embedding and the point estimator of the Gaussian spectral embedding produce estimates that may fall outside the latent position space, whereas the point estimates of the posterior spectral embedding always lie in this space. This agrees with the fact that the posterior spectral embedding requires the latent positions to stay inside this space, whereas the Gaussian spectral embedding and the adjacency spectral embedding impose no such constraint.

### 6.2 A Hardy-Weinberg Curve Example

We next consider the Hardy-Weinberg curve example presented in Athreya et al. (2018b). Specifically, the observed adjacency matrix $\mathbf{A}$ is drawn from the random dot product graph model with a latent position matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]^{\mathrm{T}} \in \mathbb{R}^{n \times 3}$. The latent positions $\mathbf{x}_i$'s are drawn from the Hardy-Weinberg curve as follows: $\mathbf{x}_i = (t_i^2, 2t_i(1 - t_i), (1 - t_i)^2)^{\mathrm{T}}$, where $t_1, \ldots, t_n$ are independently drawn from $\mathrm{Unif}(0, 1)$. The latent positions can therefore be viewed as random samples from the one-dimensional Hardy-Weinberg curve $\{(t^2, 2t(1 - t), (1 - t)^2) : t \in [0, 1]\}$, as depicted in Panel (a) of Figure 4. Panels (b), (c), and (d) of Figure 4 plot the embeddings of the observed adjacency matrix under the three approaches, showing that the point estimates of the posterior spectral embedding are closer to the true latent positions than those of the other two competitors. In particular, the point estimates of the Gaussian spectral embedding fail to capture the shape of the Hardy-Weinberg curve. The embedding errors of the three approaches, presented in Table 4, are consistent with this observation.
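A simulation of this kind can be sketched as follows. We assume the standard parametrization of the Hardy-Weinberg curve, $\mathbf{x}_i = (t_i^2, 2t_i(1 - t_i), (1 - t_i)^2)$ with $t_i \sim \mathrm{Unif}(0, 1)$, and an undirected graph without self-loops. The function name and the choice of `numpy.random.Generator` are ours, not the authors'.

```python
import numpy as np


def sample_hardy_weinberg_rdpg(n, rng):
    """Draw an RDPG adjacency matrix whose latent positions lie on the
    Hardy-Weinberg curve (t^2, 2t(1-t), (1-t)^2), with t ~ Uniform(0, 1).

    Each row of X is nonnegative and sums to (t + (1 - t))^2 = 1, so
    0 <= x_i^T x_j <= 1 and X X^T is a valid edge-probability matrix.
    """
    t = rng.uniform(0.0, 1.0, size=n)
    X = np.column_stack([t**2, 2 * t * (1 - t), (1 - t) ** 2])
    P = X @ X.T
    # Independent Bernoulli(P_ij) draws for i < j, then symmetrize.
    upper = np.triu(rng.uniform(size=(n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, X


rng = np.random.default_rng(1)
A, X = sample_hardy_weinberg_rdpg(200, rng)
print(A.shape, A.sum() // 2)  # matrix size and number of edges
```

Since every latent position lies on the probability simplex, the sampled positions automatically satisfy the dot-product constraint, which makes this a convenient test case for comparing embeddings that do and do not enforce it.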

### 6.3 Wikipedia Graph Data

Our final example is the analysis of a Wikipedia graph dataset available at http://www.cis.jhu.edu/~parky/Data/data.html. Specifically, the dataset consists of a network of articles that are within two hyperlinks of the article “Algebraic Geometry”, resulting in $n$ vertices. In addition, each article is manually labeled as one of six classes: People, Places, Dates, Things, Math, and Categories.

We first estimate the embedding dimension using an ad hoc method: we examine the plot of the singular values of the observed adjacency matrix (see Figure 5) and locate an “elbow” that separates the signal dimensions from the noise dimensions. For this Wikipedia dataset, the “elbow” is located at .
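The elbow is located here by visual inspection. As a crude, hypothetical automated stand-in (not the procedure used for Figure 5), one can pick the dimension just before the largest drop among the leading singular values. Because the top singular value of an adjacency matrix often dominates, this rule can return $1$, so its output is best treated as a starting point for inspecting the plot.

```python
import numpy as np


def elbow_by_largest_gap(A, k_max=20):
    """Hypothetical elbow heuristic: return d such that the gap
    sigma_d - sigma_{d+1} is largest among the first k_max singular values.
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    k_max = min(k_max, len(s) - 1)
    gaps = s[:k_max] - s[1:k_max + 1]
    return int(np.argmax(gaps)) + 1


# Two large singular values followed by small ones: the elbow is at d = 2.
print(elbow_by_largest_gap(np.diag([10.0, 9.0, 1.0, 0.5, 0.2])))
```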

We then conduct posterior inference under the posterior spectral embedding and the Gaussian spectral embedding, and also compute the adjacency spectral embedding, to obtain estimates of the latent positions based on the selected embedding dimension. To obtain the clustering results, we further apply the mclust package in R (Fraley et al., 2012) to these embedding estimates with , as discussed in Section 4, and compute their Rand indices with respect to the manually labeled classes. The results are presented in Table 5, where the point estimates of the posterior spectral embedding outperform the other two approaches.
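For reference, the adjacency spectral embedding baseline can be sketched in a few lines: take the $d$ eigenvalues of $\mathbf{A}$ with the largest magnitude and scale the corresponding eigenvectors by the square roots of their absolute values. This follows a common convention for the adjacency spectral embedding (some authors use the largest algebraic eigenvalues instead). The posterior and Gaussian spectral embeddings, as well as the mclust clustering step, are not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def adjacency_spectral_embedding(A, d):
    """Rows of U_d |S_d|^{1/2}, where S_d holds the d eigenvalues of the
    symmetric matrix A with largest magnitude and U_d their eigenvectors."""
    vals, vecs = np.linalg.eigh(A.astype(float))
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


# Sanity check on a noiseless rank-2 probability matrix P = X X^T:
# the embedding recovers X up to an orthogonal transformation.
rng = np.random.default_rng(2)
X = rng.uniform(0.1, 0.6, size=(8, 2))
P = X @ X.T
X_hat = adjacency_spectral_embedding(P, 2)
print(np.allclose(X_hat @ X_hat.T, P))
```

The rows of `X_hat` are what a Gaussian mixture routine (mclust in the paper) would then cluster. Nothing in this construction keeps $\hat{\mathbf{x}}_i^{\mathrm{T}}\hat{\mathbf{x}}_j$ inside $[0, 1]$ when $\mathbf{A}$ is a noisy adjacency matrix.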

## 7 Discussion

There are several potential extensions of the proposed methodology and the corresponding theory. Firstly, the framework considered so far assumes that the entries of the observed adjacency matrix are Bernoulli random variables (i.e., the network is unweighted). Weighted network data are also common in a wide range of applications (Schein et al., 2016; Tang et al., 2017b). Our theory and method, tailored to Bernoulli-distributed unweighted adjacency matrices, can be readily extended to weighted adjacency matrices, whose elements typically follow more general distributions. In particular, for a weighted adjacency matrix with a specific distribution, the posterior spectral embedding can be generalized in the same way to incorporate the corresponding likelihood. Alternatively, the Gaussian spectral embedding proposed in Section 5 can be applied when the centered elements of the weighted adjacency matrix are sub-Gaussian random variables. Secondly, the latent positions of the vertices are treated as deterministic parameters to be estimated throughout this work. It is also useful, however, to model the latent positions in the random dot product graph model as random variables independently sampled from an underlying distribution supported on (Tang et al., 2017a). The estimation technique developed in this work can be applied directly to the case where the latent positions are random, but more effort is required to establish the theoretical properties of the resulting estimator. Last but not least, we assume that the embedding dimension