Exponential error rates of SDP for block models: Beyond Grothendieck's inequality

05/23/2017 ∙ by Yingjie Fei, et al. ∙ Cornell University

In this paper we consider the cluster estimation problem under the Stochastic Block Model. We show that the semidefinite programming (SDP) formulation for this problem achieves an error rate that decays exponentially in the signal-to-noise ratio. The error bound implies weak recovery in the sparse graph regime with bounded expected degrees, as well as exact recovery in the dense regime. An immediate corollary of our results yields error bounds under the Censored Block Model. Moreover, these error bounds are robust, continuing to hold under heterogeneous edge probabilities and a form of the so-called monotone attack. Significantly, this error rate is achieved by the SDP solution itself without any further pre- or post-processing, and improves upon existing polynomially-decaying error bounds proved using Grothendieck's inequality. Our analysis has two key ingredients: (i) showing that the graph has a well-behaved spectrum, even in the sparse regime, after discounting an exponentially small number of edges, and (ii) an order-statistics argument that governs the final error rate. Both arguments highlight the implicit regularization effect of the SDP formulation.


1 Introduction

In this paper, we consider the cluster/community estimation problem (the words cluster and community are used interchangeably in this paper) under the Stochastic Block Model (SBM) [33] with a growing number of clusters. In this model, a set of n nodes is partitioned into k unknown clusters of equal size; a random graph is generated by independently connecting each pair of nodes with probability p if they are in the same cluster, and with probability q otherwise. Given one realization of the graph represented by its adjacency matrix A, the goal is to estimate the underlying clusters.
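For concreteness, the sampling procedure just described can be sketched in a few lines of pure Python (the function name and label encoding are ours, chosen for illustration):

```python
import random

def sbm_adjacency(n, k, p, q, seed=0):
    """Sample a symmetric adjacency matrix from the standard SBM:
    n nodes split into k equal-size clusters; an edge appears with
    probability p within a cluster and q across clusters."""
    assert n % k == 0, "clusters are assumed to have equal size n/k"
    rng = random.Random(seed)
    labels = [i // (n // k) for i in range(n)]  # ground-truth cluster of each node
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                A[i][j] = A[j][i] = 1
    return labels, A

labels, A = sbm_adjacency(n=8, k=2, p=0.9, q=0.1)
```

Only the upper triangle is sampled; the matrix is then mirrored, matching the symmetry of an undirected graph.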

Much recent progress has been made on this problem, particularly in identifying the sharp thresholds for exact/weak recovery when there are a few communities of size linear in the number of nodes n. Moving beyond this regime, however, the understanding of the problem is much more limited, especially in characterizing its behavior with a growing (in n) number of clusters with sublinear sizes, and in how the cluster errors depend on the signal-to-noise ratio (SNR) in between the exact and weak recovery regimes [1, 44]. We focus on precisely these questions.

Let the ground-truth clusters be encoded by the cluster matrix Y* ∈ {0, 1}^(n×n) defined by

Y*_ij = 1 if nodes i and j are in the same cluster, and Y*_ij = 0 otherwise.
We consider a now standard semidefinite programming (SDP) formulation for estimating the ground truth Y*:

(1)  Ŷ ∈ arg max_Y  ⟨A − λJ, Y⟩
s.t.  Y ⪰ 0;  Y_ii = 1 for all i;  0 ≤ Y_ij ≤ 1 for all i, j,

where J is the all-one matrix, λ := (p + q)/2, and ⟨·, ·⟩ denotes the trace inner product. We seek to characterize the accuracy of the SDP solution Ŷ as an estimator of the true clustering. Our main focus is the error ‖Ŷ − Y*‖₁, where ‖·‖₁ denotes the entry-wise ℓ₁ norm. This error is a natural metric that measures a form of pairwise cluster/link errors. In particular, note that the matrix Y* represents the pairwise cluster relationships between nodes; an estimator of these relationships is given by the matrix Ȳ obtained by rounding Ŷ element-wise. The above error satisfies ‖Ȳ − Y*‖₁ ≤ 2‖Ŷ − Y*‖₁, and therefore controls the number of pairs whose relationships are incorrectly estimated by the SDP.
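The correspondence between a labeling, its cluster matrix, and the pairwise error count can be illustrated in pure Python (function names are ours):

```python
def cluster_matrix(labels):
    """Ground-truth cluster matrix: entry (i, j) is 1 iff i and j share a cluster."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]

def l1_error(Y1, Y2):
    """Entry-wise l1 distance between two n-by-n matrices."""
    n = len(Y1)
    return sum(abs(Y1[i][j] - Y2[i][j]) for i in range(n) for j in range(n))

def round_entries(Y):
    """Element-wise rounding of a fractional solution to {0, 1}."""
    return [[1 if v >= 0.5 else 0 for v in row] for row in Y]

Ystar = cluster_matrix([0, 0, 1, 1])
Yhat = [[0.9, 0.8, 0.1, 0.2],
        [0.8, 0.9, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.7],
        [0.2, 0.1, 0.7, 0.9]]
# A wrongly rounded entry differs from the truth by at least 1/2, so the
# number of mis-estimated pairs is at most twice the fractional l1 error.
pair_errors = l1_error(round_entries(Yhat), Ystar)
```

In this toy instance the fractional solution rounds to the exact cluster matrix, so the pair-error count is zero.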

In a seminal paper [29], Guédon and Vershynin exhibited a remarkable use of Grothendieck's inequality, and obtained a high-probability error bound for the solution Ŷ of the SDP of the form

(2)  ‖Ŷ − Y*‖₁ ≲ n²/√s,

where s is a measure of the signal-to-noise ratio (cf. equation (5) below). This bound holds even in the sparse graph regime with bounded expected degrees, manifesting the power of Grothendieck's inequality.

In this paper, we go beyond the above results, and show that Ŷ in fact satisfies (with high probability) an exponentially-decaying error bound of the form

(3)  ‖Ŷ − Y*‖₁ ≲ (n²/k) · exp(−s/C)

as long as s is larger than a universal constant (Theorem 1). The bound is valid in both the sparse and dense regimes. Significantly, this error rate is achieved by the SDP (1) itself, without the need for a multi-step procedure, even though we are estimating a discrete structure by solving a continuous optimization problem. In particular, the SDP approach does not require pre-processing of the graph (such as trimming and splitting) or an initial estimate of the clusters, nor any non-trivial post-processing of Ŷ (such as local cluster refinement or randomized rounding).

If an explicit clustering of the nodes is desired, the result above also yields an error bound for estimating σ*, the vector of true cluster labels. In particular, an explicit cluster labeling σ̂ can be obtained efficiently from Ŷ. Let ℓ(σ̂, σ*) denote the fraction of nodes that are labeled differently by σ̂ and σ* (up to permutation of the labels). This mis-classification error can be shown to be upper bounded by a rescaled version of the error ‖Ŷ − Y*‖₁, and therefore satisfies the same type of exponential bound (Theorem 2):

(4)  ℓ(σ̂, σ*) ≲ exp(−s/C).

Specialized to different error levels, this single error bound (3) implies sufficient conditions for achieving exact recovery (strong consistency), almost exact recovery (weak consistency) and weak recovery; see Section 1.2 for the definitions of these recovery types. More generally, the above bound yields SNR conditions sufficient for achieving any prescribed error level. As discussed in detail in Section 3.1.1, these conditions are (at least order-wise) optimal, and improve upon existing results especially when the number of clusters is allowed to scale with n. In addition, we prove that the above guarantees for SDP are robust against deviations from the standard SBM. The same exponential bounds continue to hold in the presence of heterogeneous edge probabilities as well as a form of monotone attack where an adversary can modify the graph (Theorem 3). Moreover, we show that our results readily extend to the Censored Block Model, in which only partially observed data are available (Corollary 1).

In addition to improved error bounds, our results also involve the development of several new analytical techniques, as discussed below. We expect these techniques to be more broadly useful in the analysis of SDP and other algorithms for SBM and related statistical problems.

1.1 Technical highlights

Our analysis of the SDP formulation builds on two key ingredients. The first involves showing that the graph can be partitioned into two components, one with a well-behaved spectrum, and the other with an exponentially small number of edges; cf. Proposition 2. Note that this partitioning is done in the analysis, rather than in the algorithm. It ensures that the SDP produces a useful solution all the way down to the sparse regime with bounded expected degrees. The second ingredient is an order-statistics argument that characterizes the interplay between the error matrix Ŷ − Y* and the randomness in the graph; cf. Proposition 1. Upper bounds on the sum of the top order statistics are what ultimately dictate the exponential decay of the error. In both arguments, we make crucial use of the entry-wise boundedness of the SDP solution Ŷ, which is a manifestation of the implicit regularization effect of the SDP formulation.
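The order-statistics quantity alluded to here, the sum of the largest few entries of a centered noise sample, is easy to experiment with numerically. The following toy snippet is purely illustrative (it is not the paper's Proposition 1):

```python
import random

def top_order_statistics_sum(values, m):
    """Sum of the m largest entries: the sum of the top-m order statistics."""
    return sum(sorted(values, reverse=True)[:m])

rng = random.Random(1)
# Centered Bernoulli(0.1) samples, mimicking centered entries of a sparse graph.
noise = [(1 if rng.random() < 0.1 else 0) - 0.1 for _ in range(1000)]
top_100 = top_order_statistics_sum(noise, 100)  # close to the total positive mass
```

Although the sample has mean zero, the top order statistics concentrate the positive fluctuations; bounding such sums is what drives the exponential decay in the analysis.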

Our results are non-asymptotic in nature, valid for finite values of n; letting n → ∞ gives asymptotic results. All other parameters (k, p and q) are allowed to scale arbitrarily with n. In particular, the number of clusters k may grow with n, the clusters may have size sublinear in n, and the edge probabilities p and q may range from the sparse case of bounded expected degrees to the densest case of constant probabilities. Our results therefore provide a general characterization of the relationship between the SNR, the cluster sizes and the recovery errors. This is particularly important in the regime of sublinear cluster sizes, in which case all values of p and q are of interest. The price of such generality is that we do not seek to obtain optimal values of the multiplicative constants in the error bounds, doing which typically requires asymptotic analysis with scaling restrictions on the parameters. In this sense, our results complement recent work on the fundamental limits and sharp recovery thresholds of SBM [1].

1.2 Related work

The SBM [33, 13], also known as the planted partition model in the computer science community, is a standard model for studying community detection and graph clustering. There is a large body of work on the theoretical and algorithmic aspects of this model; see for example [20, 1, 45, 6] and the references therein. Here we only briefly discuss the most relevant work, and defer to Section 3 for a more detailed comparison after stating our main theorems.

Existing work distinguishes between several types of recovery [1, 23], including: (a) weak recovery, where the fraction of mis-clustered nodes is bounded away from that of random guessing; (b) almost exact recovery (weak consistency), where this fraction tends to zero; (c) exact recovery (strong consistency), where all nodes are correctly clustered with high probability. The SDP relaxation approach to SBM has been studied in [10, 9, 18, 20, 49, 36, 17, 19], which mostly focus on exact recovery in the logarithmic-degree regime. Using Grothendieck's inequality, the work in [29] proves for the first time that SDP achieves a non-trivial error bound in the sparse regime with bounded expected degrees. In the two-cluster case, it is further shown in [43] that SDP in fact achieves the optimal weak recovery threshold as long as the expected degree is large (but still bounded). Our single error bound implies exact and weak recovery in the logarithmic-degree and bounded-degree regimes, respectively. Our result in fact goes beyond these existing ones and applies to every setting in between the two extreme regimes, capturing the exponential decay of error rates from the trivial level to zero.

A very recent line of research aims to precisely characterize the fundamental limits and phase-transition behaviors of SBM, in particular the sharp SNR thresholds (including the leading constants) for achieving the different recovery types discussed above. When the number of clusters is bounded, many of these questions now have satisfactory answers. Without exhausting this still-growing line of remarkable work, we refer to the papers [45, 41, 5, 46, 47] for weak recovery, [48, 4, 10, 26, 55] for almost exact recovery, and [48, 3, 4] for exact recovery. SDP has in fact been shown to achieve the optimal exact recovery threshold [31, 50, 11, 7]. Our results imply sufficient conditions for SDP to achieve these various types of recovery, and moreover interpolate between them. As mentioned, we are mostly concerned with the non-asymptotic setting with a growing number of clusters, without attempting to optimize the values of the leading constants. Our results therefore focus on somewhat different regimes from the work above.

Particularly relevant to us is the work in [21, 55, 56, 4, 26, 29, 40], which provides explicit bounds on the error rates of other algorithms for estimating the ground-truth clustering in SBM. The Censored Block Model is studied in the papers [2, 30, 32, 21, 37, 53]. Robustness issues in SBM are considered in the work in [15, 25, 24, 36, 50, 51, 22, 42, 40]. We discuss these results in more detail in Section 3.

1.3 Notations

Column vectors are denoted by lower-case bold letters such as a, where a_i is its i-th entry. Matrices are denoted by bold capital letters such as A, with A^T denoting the transpose of A, trace(A) its trace, A_ij its (i, j)-th entry, and diag(A) the vector of its diagonal entries. For a matrix A, ‖A‖₁ is its entry-wise ℓ₁ norm, ‖A‖_∞ the entry-wise ℓ_∞ norm, and ‖A‖ the spectral norm (the maximum singular value). Denote by A_i· the i-th row of the matrix A and A_·j its j-th column. We write A ⪰ 0 if A is symmetric and positive semidefinite. With another matrix B of the same dimension as A, we let ⟨A, B⟩ := trace(A^T B) denote their trace inner product, and use A ≤ B to mean that A_ij ≤ B_ij for all i, j. Let I and J be the identity matrix and all-one matrix, respectively, and 1_n the all-one column vector of length n.

We use Bern(p) to denote the Bernoulli distribution with rate p. For a positive integer m, let [m] := {1, 2, ..., m}. For a real number x, ⌈x⌉ denotes its ceiling. Throughout, a universal constant C means a fixed number that is independent of the model parameters (n, k, p, q, etc.) and the graph distribution. We use the following standard notations for the order comparison of two non-negative sequences {a_n} and {b_n}: we write a_n ≲ b_n, a_n = O(b_n) or b_n = Ω(a_n) if there exists a universal constant C such that a_n ≤ C·b_n for all n. We write a_n ≍ b_n or a_n = Θ(b_n) if a_n ≲ b_n and b_n ≲ a_n. We write a_n ≪ b_n or a_n = o(b_n) if a_n/b_n → 0.

2 Problem setup

In this section, we formally set up the problem of cluster estimation under SBM and describe the SDP approach.

2.1 The Stochastic Block Model

Given n nodes, we assume that each node belongs to exactly one of k ground-truth clusters, where the clusters have equal size n/k. This ground truth is encoded in the cluster matrix Y* as defined in Section 1. We do not know Y*, but we observe the adjacency matrix A of a graph generated from the following Stochastic Block Model (SBM).

Model 1 (Standard Stochastic Block Model).

The graph adjacency matrix A is symmetric, with its entries generated independently by

A_ij ~ Bern(p) if nodes i and j are in the same cluster, and A_ij ~ Bern(q) otherwise,

where 0 ≤ q < p ≤ 1.

The values of the diagonal entries of A are inconsequential for the SDP formulation (1) due to the unit-diagonal constraint. Therefore, we assume without loss of generality that A_ii ~ Bern(p) independently for all i, which simplifies the presentation of the analysis.

Our goal is to estimate Y* given the observed graph A. Let σ* ∈ [k]^n be the vector of ground-truth cluster labels, where σ*_i is the index of the cluster that contains node i. (The cluster labels are unique only up to permutation; here σ* is defined with respect to an arbitrary fixed permutation.) Playing a crucial role in our results is the quantity

(5)  s := (n/k) · (p − q)² / (p(1 − q)),

which is a measure of the SNR of the model. In particular, the numerator of the fraction above is the squared difference between the in- and cross-cluster edge probabilities, and the denominator is essentially the largest variance of the entries of A. This type of quantity has been shown to capture the hardness of SBM, and is closely related to the celebrated Kesten-Stigum threshold [47]. To avoid cluttered notation, we assume throughout the paper that p > q, and that there exists a universal constant C such that p ≤ Cq; this setting encompasses most interesting regimes of the problem, as clustering is more challenging when p and q are of the same order.
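For concreteness, the following snippet evaluates one common normalization of this SNR, s = (n/k)(p − q)²/(p(1 − q)); this exact normalization is our assumption for illustration and may differ from the paper's definition by constants:

```python
def snr(n, k, p, q):
    """Evaluate s = (n/k) * (p - q)**2 / (p * (1 - q)); the normalization is
    an assumption made for this illustration."""
    assert 0 <= q < p <= 1
    return (n / k) * (p - q) ** 2 / (p * (1 - q))

# Sparse regime with bounded expected degrees: p = a/n, q = b/n.
n, k, a, b = 10_000, 2, 20, 5
s = snr(n, k, a / n, b / n)  # roughly (a - b)**2 / (k * a) for large n
```

In the bounded-degree parametrization p = a/n, q = b/n, the quantity stays of constant order, which is exactly the regime where only weak recovery is possible.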

2.2 Semidefinite programming relaxation

We consider the SDP formulation in (1), whose optimal solution Ŷ serves as an estimator of the ground-truth cluster matrix Y*. This SDP can be interpreted as a convex relaxation of the maximum likelihood estimator, the modularity maximization problem, the optimal subgraph/cut problem, or a variant of the robust/sparse PCA problem; see [10, 13, 18, 20, 17] for such derivations. Our goal is to study the recovery error of Ŷ in terms of the number of nodes n, the number of clusters k and the SNR measure s defined above.

Note that there is nothing special about the particular formulation in (1). All our results apply to, for example, the alternative SDP formulation below:

(6)  Ŷ ∈ arg max_Y  ⟨A, Y⟩
s.t.  Y ⪰ 0;  Y_ii = 1 for all i;  0 ≤ Y_ij ≤ 1 for all i, j;  ⟨J, Y⟩ = n²/k.

This formulation was previously considered in [20]. We may also replace the last constraint above with row-wise constraints on the entry sums of Y, akin to the formulation in [10] motivated by the weakly assortative SBM. Under the standard assumption of equal-sized clusters, the value n²/k is known. Therefore, the formulation (6) has the advantage that it does not require knowledge of the edge probabilities p and q, but instead only the number of clusters k. (Note that the constraint Y_ij ≤ 1 in the formulations (1) and (6) is in fact redundant, as it is implied by the constraints Y ⪰ 0 and Y_ii = 1. We still keep this constraint, as the property 0 ≤ Ŷ_ij ≤ 1 plays a crucial role in our analysis.)
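One useful fact in this context (assuming, as is standard for such formulations, a positive-semidefiniteness constraint together with a unit diagonal): any PSD matrix with unit diagonal is the Gram matrix of unit-norm vectors, so Cauchy-Schwarz bounds every entry by 1 in absolute value. A small pure-Python check (names ours):

```python
import math

def normalize(v):
    r = math.sqrt(sum(a * a for a in v))
    return [a / r for a in v]

def gram(vectors):
    """Gram matrix of a list of vectors; PSD by construction, with unit
    diagonal when the vectors have unit norm."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return [[dot(u, v) for v in vectors] for u in vectors]

vecs = [normalize(v) for v in ([1.0, 2.0], [2.0, 1.0], [-1.0, 3.0])]
Y = gram(vecs)
# Cauchy-Schwarz: |<u, v>| <= |u| * |v| = 1, so every entry of Y lies in [-1, 1].
in_range = all(abs(Y[i][j]) <= 1 + 1e-12 for i in range(3) for j in range(3))
```

The entry-wise boundedness obtained this way is exactly the implicit-regularization property exploited in the analysis.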

The optimization problems in (1) and (6) can be solved in polynomial time using any general-purpose SDP solver or first-order algorithms. Moreover, this SDP approach continues to motivate, and benefit from, the rapid development of efficient algorithms for solving structured SDPs. For example, the algorithms considered in [34, 51] can solve instances with a large number of nodes within seconds on a laptop. In addition to computational efficiency, the SDP approach also enjoys several other desirable properties, including robustness, conceptual simplicity and applicability to sparse graphs, making it an attractive option among clustering and community detection algorithms. The empirical performance of SDP has been extensively studied, both under SBM and with real data; see for example the work in [10, 17, 19, 34, 51]. Here we focus on the theoretical guarantees of this SDP approach.

2.3 Explicit clustering by -medians

After solving the SDP formulation (1) or (6), the cluster membership can be extracted from the solution Ŷ. This can be done using many simple procedures. For example, when ‖Ŷ − Y*‖₁ < 1/2, simply rounding the entries of Ŷ will exactly recover Y*, from which the true clusters can be extracted easily. In the case with two clusters, one may use the signs of the entries of the first eigenvector of Ŷ, a procedure analyzed in [29, 43]. More generally, our theoretical results guarantee that the SDP solution Ŷ is already close to the true cluster matrix Y*; in this case, we expect that many local rounding/refinement procedures, such as Lloyd's-style greedy algorithms [39], will be able to extract a high-quality clustering.

For the sake of retaining focus on the SDP formulation, we choose not to separately analyze these possible extraction procedures, but instead consider a more unified approach. In particular, we view the rows of Ŷ as points in R^n, and apply k-medians clustering to them to extract the clusters. While exactly solving the k-medians problem is computationally hard, there exist polynomial-time constant-factor approximation schemes, such as the approximation algorithm in [16], which suffices for our purpose. Note that this algorithm may not be the most efficient way to extract an explicit clustering from Ŷ; rather, it is intended as a simple vehicle for deriving a clustering error bound that can be readily compared with existing results.

Formally, we use σ̂ to denote the output of the approximate k-medians procedure applied to the rows of Ŷ; the details are provided in Section A. The output σ̂ is a vector in [k]^n such that node i is assigned to the σ̂_i-th cluster by the procedure. We are interested in bounding the clustering error of σ̂ relative to the ground truth σ*. Let S_k denote the symmetric group consisting of all permutations of [k]; we consider the metric

(7)  ℓ(σ̂, σ*) := min_{π ∈ S_k} (1/n) · #{i ∈ [n] : π(σ̂_i) ≠ σ*_i},

which is the proportion of nodes that are mis-classified, modulo permutations of the cluster labels.
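For small k this metric can be computed exactly by brute force over label permutations; a pure-Python sketch (names ours, with labels taken in {0, ..., k−1}):

```python
from itertools import permutations

def misclassification_rate(sigma_hat, sigma_star, k):
    """Fraction of mis-labeled nodes, minimized over all permutations of the
    k cluster labels (labels are identifiable only up to permutation).
    Brute force over k! permutations; fine for small k."""
    n = len(sigma_star)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(sigma_hat, sigma_star) if perm[a] != b)
        best = min(best, mismatches)
    return best / n

# [1, 1, 0, 0] and [0, 0, 1, 1] describe the same partition, so the error is 0.
err = misclassification_rate([1, 1, 0, 0], [0, 0, 1, 1], k=2)
```

The minimization over permutations is what makes the metric invariant to relabeling the clusters.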


Before proceeding, we briefly mention several possible extensions of the setting discussed above. The number λ = (p + q)/2 in the SDP (1) can be replaced by a tuning parameter; as will become evident from the proof, our theoretical results in fact hold for an entire range of values of λ strictly between q and p. Our theory also generalizes to the setting with unequal cluster sizes; in this case the same theoretical guarantees hold with the cluster size n/k replaced by any lower bound on the sizes of the clusters.

3 Main results

We present in Section 3.1 our main theorems, which provide exponentially-decaying error bounds for the SDP formulation under SBM. We also discuss the consequences of our results, including their implications for robustness in Section 3.2 and applications to the Censored Block Model in Section 3.3. In the sequel, Ŷ denotes any optimal solution to the SDP formulation in either (1) or (6).

3.1 Error rates under standard SBM

In this section, we consider the standard SBM setting in Model 1. Recall that n and k are respectively the numbers of nodes and clusters, and Y* is the ground-truth cluster matrix defined in Section 1, with σ* being the corresponding vector of true cluster labels. Our results are stated in terms of the SNR measure s given in equation (5).

The first theorem, proved in Section 4, shows that the SDP solution achieves an exponential error rate.

Theorem 1 (Exponential Error Rate).

Under Model 1, there exist universal constants c, C for which the following holds. If s ≥ C, then we have

‖Ŷ − Y*‖₁ ≤ (n²/k) · exp(−s/C)
with probability at least .

Our next result concerns the explicit clustering σ̂ extracted from Ŷ using the approximate k-medians procedure given in Section 2.3. As we show in the proof of the following theorem, the mis-classification error of σ̂ is always upper bounded by a rescaled version of the ℓ₁ error of Ŷ:

ℓ(σ̂, σ*) ≲ (k/n²) · ‖Ŷ − Y*‖₁;

cf. Proposition 3. Consequently, the number of misclassified nodes also exhibits an exponential decay.

Theorem 2 (Clustering Error).

Under Model 1, there exist universal constants c, C for which the following holds. If s ≥ C, then we have

ℓ(σ̂, σ*) ≤ exp(−s/C)
with probability at least .

We prove this theorem in Section D.

Theorems 1 and 2 are applicable in the sparse graph regime with bounded expected degrees. For example, suppose that p = a/n and q = b/n for two constants a > b; the results above guarantee a non-trivial accuracy for SDP (i.e., a mis-classification error bounded away from the trivial level) as long as the SNR s exceeds a sufficiently large constant. Another interesting regime that our results apply to is when there is a large number of clusters; in this regime, SDP achieves exact recovery (namely ℓ(σ̂, σ*) = 0) provided that s grows logarithmically in n.

Below we provide additional discussion of our results, and compare with existing ones.

3.1.1 Consequences and Optimality

Theorems 1 and 2 immediately imply sufficient conditions for the various recovery types discussed in Section 1.2.

  • Exact recovery (strong consistency): When s ≥ C log n for a sufficiently large constant C, Theorem 1 guarantees that ‖Ŷ − Y*‖₁ < 1/2 with high probability, in which case element-wise rounding exactly recovers the true cluster matrix Y*. (In fact, a simple modification of our analysis proves the exact equality Ŷ = Y*; we omit the details of such refinement for the sake of a more streamlined presentation.) This result matches the best known exact recovery guarantees for SDP (and other polynomial-time algorithms) when the number of clusters is allowed to grow with n; see [20, 10] for a review of these results.

  • Almost exact recovery (weak consistency): Under the condition s → ∞, Theorem 2 ensures that ℓ(σ̂, σ*) = o(1) with high probability as n → ∞, hence SDP achieves weak consistency. This condition is optimal (necessary and sufficient), as has been proved in [48, 4].

  • Weak recovery: When s ≥ C for a sufficiently large constant C, Theorem 2 ensures that ℓ(σ̂, σ*) is bounded away from the trivial level with high probability, hence SDP achieves weak recovery. In particular, in the setting with two clusters, SDP recovers a clustering that is positively correlated with the ground truth under the condition s ≳ 1. This condition matches, up to constants, the so-called Kesten-Stigum (KS) threshold, which is known to be optimal [45, 41, 5, 46, 47].

  • Recovery with δ error: More generally, for any number δ ∈ (0, 1), Theorem 2 implies that if s ≥ C log(1/δ), then ℓ(σ̂, σ*) ≤ δ with high probability. In this setting, the minimax rate result in [26] implies that an SNR condition of this order is necessary for any algorithm to achieve a δ clustering error. Our results are thus optimal up to a multiplicative constant.

Our results therefore cover these different recovery regimes by a unified error bound, using a single algorithm. This can be contrasted with the previous error bound (2) proved using the Grothendieck-inequality approach, which fails to identify the exact recovery condition above. In particular, the bound (2) decays only polynomially with the SNR measure s; since s grows at most polynomially in n, the smallest possible error that can be derived from this bound remains polynomially large in n, whereas exact recovery requires the error to fall below 1/2.

Our results apply to general values of k, which is allowed to scale with n; hence the size of the clusters can be sublinear in n. We note that in this regime, a computational-barrier phenomenon seems to take place: there may exist instances of SBM in which cluster recovery is information-theoretically possible but cannot be achieved by computationally efficient algorithms. For example, the work in [20] proves that the intractable maximum likelihood estimator succeeds in exact recovery under an SNR condition strictly weaker than the one derived above; it also provides evidence suggesting that all efficient algorithms fail under such weaker conditions. Note that the latter is consistent with the condition derived above from our theorems.

The above discussion has the following implications for the optimality of Theorems 1 and 2. On the one hand, the general minimax rate result in [26] shows that all algorithms (regardless of their computational complexity) incur an error that is at least exponentially small in the SNR. Our exponential error rate matches this information-theoretic lower bound. On the other hand, in view of the computational barrier discussed in the last paragraph, our SNR condition is likely to be unimprovable if only efficient algorithms are considered.

3.1.2 Comparison with existing results

We discuss some prior work that also provides efficient algorithms attaining an exponentially-decaying rate for the clustering error ℓ(σ̂, σ*). To be clear, these algorithms are very different from ours, often involving a two-step procedure that first computes an accurate initial estimate (typically by spectral clustering) followed by a "clean-up" process to obtain the final solution. Some of them require additional steps of sample splitting and graph trimming/regularization. As discussed in Section 3.2 below, many of these procedures rely on delicate properties of the standard SBM, and are therefore not robust against model deviations.

Most relevant to us is the work in [21], which develops a spectral algorithm with sample splitting; as stated in their main theorem, their algorithm achieves an exponential error rate under additional conditions on the SNR and the number of clusters. The work in [55] and [56] also considers spectral algorithms, which attain exponential error rates assuming a bounded number of clusters. The algorithms in [26, 27] involve obtaining an initial clustering using spectral algorithms; a post-processing step (e.g., using a Lloyd's-style algorithm [39]) then outputs a final solution that asymptotically achieves the minimax error rate, which is governed by an appropriate form of Renyi divergence between the edge distributions. The work in [4] proposes an efficient algorithm called Sphere Comparison, which achieves an exponential error rate in the constant-degree regime. The work in [40] uses SDP to produce an initial clustering solution to be fed to another clustering algorithm; their analysis extends the techniques in [29] to the setting with corrupted observations, and their overall algorithm attains an exponential error rate under a suitable SNR condition.

3.2 Robustness

Compared to other clustering algorithms, one notable advantage of the SDP approach lies in its robustness under various challenging settings of SBM. For example, standard spectral clustering is known to be inconsistent in the sparse graph regime with bounded expected degrees, due to the existence of nodes with atypical degrees, and alleviating this difficulty generally requires sophisticated algorithmic techniques. In contrast, as shown in Theorem 1 as well as other recent work [43, 29, 17], the SDP approach is applicable without change in this sparse regime. SDP is also robust against the existence of outlier nodes and/or edge modifications, whereas standard spectral clustering is fairly fragile in these settings [15, 50, 42, 51, 43, 40].

Here we focus on another remarkable form of robustness enjoyed by SDP with respect to heterogeneous edge probabilities and monotone attack, captured in the following generalization of the standard SBM.

Model 2 (Heterogeneous Stochastic Block Model).

Given the ground-truth clustering (encoded in the cluster matrix Y*), the entries of the graph adjacency matrix A are generated independently, with A_ij ~ Bern(P_ij), where

P_ij ≥ p if nodes i and j are in the same cluster, and P_ij ≤ q otherwise.

The above model imposes no constraints on the edge probabilities besides the upper/lower bounds, and in particular the probabilities can be non-uniform. This model encompasses a variant of the so-called monotone attack studied extensively in the computer science literature [24, 36, 22]: here an adversary can arbitrarily set some edge probabilities to 1 or 0, which is equivalent to adding edges between nodes in the same cluster and removing edges across clusters. (We do note that here the addition/removal of edges is determined before the realization of the random edge connections, which is more restrictive than the standard monotone attack model. We believe this restriction is an artifact of the analysis, and leave further improvements to future work.) Note that the adversary can make a very large number of edge modifications, albeit in a restricted way that seems to strengthen the clustering structure (hence the name). Monotone attack, however, does not necessarily make the clustering problem easier. On the contrary, the adversary can significantly alter some predictable structures that arise in the standard SBM (such as the graph spectrum, node degrees, subgraph counts and the non-existence of dense spots [24]), and hence foil algorithms that over-exploit such structures. For example, some spectral algorithms provably fail in this setting [22, 51]. More generally, Model 2 allows for unpredictable, non-random deviations (not necessarily due to an adversary) from the standard SBM setting, whose clean statistical properties are rarely possessed by real-world graphs.
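As an illustration of how a monotone attack fits into this heterogeneous model, the following pure-Python sketch (names and encoding ours) builds an edge-probability matrix in which the adversary forces chosen within-cluster pairs to be connected and chosen cross-cluster pairs to be disconnected:

```python
def heterogeneous_probabilities(labels, p, q, attacked_pairs):
    """Edge-probability matrix for a heterogeneous SBM: baseline p within /
    q across clusters, with a monotone attack that (before edges are sampled)
    forces chosen within-cluster pairs to probability 1 and chosen
    cross-cluster pairs to probability 0."""
    n = len(labels)
    P = [[p if labels[i] == labels[j] else q for j in range(n)] for i in range(n)]
    for i, j in attacked_pairs:
        P[i][j] = P[j][i] = 1.0 if labels[i] == labels[j] else 0.0
    return P

labels = [0, 0, 1, 1]
# (0, 1) is within-cluster -> forced edge; (0, 2) is cross-cluster -> forced non-edge.
P = heterogeneous_probabilities(labels, p=0.8, q=0.2, attacked_pairs=[(0, 1), (0, 2)])
```

Every attacked entry still respects the one-sided bounds of the model (within-cluster probabilities only increase, cross-cluster ones only decrease), which is exactly the "monotone" restriction.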

It is straightforward to show that when exact recovery is concerned, SDP is unaffected by the heterogeneity in Model 2; see [24, 31, 19]. The following theorem, proved in Section 5, shows that SDP in fact achieves the same exponential error rates in the presence of heterogeneity.

Theorem 3 (Robustness).

The conclusions in Theorems 1 and 2 continue to hold under Model 2.

Consequently, under the same conditions discussed in Section 3.1.1, the SDP approach achieves exact recovery, almost exact recovery, weak recovery and recovery with any prescribed error level in the more general Model 2.

As a passing note, the results in [42] show that when exact constant values are concerned, the optimal weak recovery threshold changes in the presence of monotone attack, and there may exist a fundamental tradeoff between optimal recovery in standard SBM and robustness against model deviation.

3.3 Censored block model

The Censored Block Model [2] is a variant of the standard SBM that represents the scenario with partially observed data, akin to the settings of matrix completion [35] and graph clustering with measurement budgets [20]. In this section, we show that Theorems 1 and 2 immediately yield recovery guarantees for the SDP formulation under this model.

Concretely, again assume a ground-truth set of k equal-size clusters over n nodes, with the corresponding label vector σ*. These clusters can be encoded by the cluster matrix Y* as defined in Section 1, but it is more convenient to work with its ±1-valued version Z* := 2Y* − J. Under the Censored Block Model, one observes the entries of Z* restricted to the edges of an Erdős-Rényi graph, but with each observed entry flipped with some probability. The model is described formally below.

Model 3 (Censored Block Model).

The observed matrix R is symmetric, with its entries generated independently across all pairs (i, j) by

R_ij = Z*_ij with probability α(1 − ε),  R_ij = −Z*_ij with probability αε,  and R_ij = 0 (unobserved) with probability 1 − α,

where α ∈ (0, 1] is the observation probability and ε ∈ (0, 1/2) is the flip probability.

The goal is again to recover Y* (equivalently, Z* or σ*), given the observed matrix R.

One may reduce this problem to the standard SBM by constructing an adjacency matrix A with A_ij := 1{R_ij = 1}; that is, we zero out the unobserved and negative entries in the binary representation of R. The upper-triangular entries of A are then independent Bernoulli variables with

A_ij ~ Bern(α(1 − ε)) if nodes i and j are in the same cluster, and A_ij ~ Bern(αε) otherwise.

Therefore, the matrix A can be viewed as generated from the standard SBM (Model 1) with p = α(1 − ε) and q = αε. We can then obtain an estimate Ŷ of Y* by solving the SDP formulation (1) or (6) with A as the input, possibly followed by the approximate k-medians procedure to get an explicit clustering σ̂. The error rates of Ŷ and σ̂ can be derived as a corollary of Theorems 1 and 2.

Corollary 1 (Censored Block Model).

Under Model 3, there exist universal constants c, C for which the following holds. If (n/k) · α(1 − 2ε)² ≥ C, then

ℓ(σ̂, σ*) ≤ exp(−(n/k) · α(1 − 2ε)²/C)
with probability at least .

Specializing this corollary to the different types of recovery defined in Section 1.2, we immediately obtain the following sufficient conditions for SDP under the Censored Block Model: (a) exact recovery is achieved when (n/k) · α(1 − 2ε)² ≳ log n; (b) almost exact recovery is achieved when (n/k) · α(1 − 2ε)² → ∞; (c) weak recovery is achieved when (n/k) · α(1 − 2ε)² ≥ C for a sufficiently large constant C; (d) a clustering error δ is achieved when (n/k) · α(1 − 2ε)² ≳ log(1/δ).

Several existing results focus on the Censored Block Model with two clusters in the asymptotic regime n → ∞. In this setting, the work in [2] characterizes the sharp threshold (in terms of α and ε) for exact recovery, and provides an SDP-based algorithm that succeeds at twice the above threshold; a more precise threshold is given in [30]. For weak recovery under a sparse observation graph, it is conjectured in [32] that the problem is solvable if and only if nα(1 − 2ε)² exceeds an explicit constant. The converse and achievability parts of the conjecture are proved in [37] and [53], respectively. Corollary 1 shows that SDP achieves (up to constants) the above exact and weak recovery thresholds; moreover, our results apply to the more general setting with k clusters.

4 Proof of Theorem 1

In this section we prove our main theoretical result, Theorem 1, for Model 1 (Standard SBM). While Model 1 is a special case of Model 2 (Heterogeneous SBM), we choose not to deduce Theorem 1 as a corollary of Theorem 3, which concerns the more general model. Instead, to highlight the main ideas of the analysis and avoid technicalities, we provide a separate proof of Theorem 1. In Section 5 to follow, we show how to adapt the proof to Model 2.

Before going into the details, we make a few observations that simplify the proof. First, it suffices to prove the theorem for the first SDP formulation (1). Indeed, the ground-truth matrix $Y^*$ is also feasible to the second formulation (6); moreover, thanks to the equality constraint $\langle J, Y \rangle = n^2/r$ in (6), subtracting the constant-valued term $\frac{p+q}{2}\langle J, Y \rangle$ from the objective of (6) does not affect its optimal solutions. The two formulations are therefore identical except for the above equality constraint, which is never used in the proof below. Secondly, under the assumption $p \le c_0$ for a universal constant $c_0 < 1$, we have $1 - p \asymp 1$. Therefore, it suffices to prove the theorem with the SNR redefined as $\frac{(p-q)^2}{p} \cdot \frac{n}{r^2}$, which only affects the universal constant in the exponent of the error bound. Thirdly, it is in fact sufficient to prove the bound

(8)

Suppose that this bound holds; under the premise of the theorem with the constant $C$ sufficiently large, the RHS of the bound (8) can be further bounded, which implies the error bound in the theorem statement, again up to a change in the universal constant in the exponent. Finally, we define the convenient shorthands $E := \widehat{Y} - Y^*$ for the error and $K := n/r$ for the cluster size, which will be used throughout the proof.

Our proof begins with a basic inequality using optimality. Since $Y^*$ is feasible to the SDP (1) and $\widehat{Y}$ is optimal, we have

$$0 \;\le\; \big\langle \mathbb{E}A - \tfrac{p+q}{2}J,\; \widehat{Y} - Y^*\big\rangle + \big\langle W,\; \widehat{Y} - Y^*\big\rangle,$$

(9)

where $J$ denotes the all-ones matrix and $W := A - \mathbb{E}A$ is the noise matrix.

A simple observation is that the entries of the matrix $\mathbb{E}A - \frac{p+q}{2}J$ have matching signs with those of $Y^* - \frac{1}{2}J$. This observation implies the following relationship between the first term on the RHS of equation (9) and the error $\|\widehat{Y} - Y^*\|_1$.

Fact 1.

We have the inequality

$$\big\langle \mathbb{E}A - \tfrac{p+q}{2}J,\; \widehat{Y} - Y^*\big\rangle \;\le\; -\frac{p-q}{2}\,\big\|\widehat{Y} - Y^*\big\|_1.$$

(10)

The proof of this fact is deferred to Section C.1. Taking this fact as given and combining with the inequality (9), we obtain that

$$\frac{p-q}{2}\,\big\|\widehat{Y} - Y^*\big\|_1 \;\le\; \big\langle W,\; \widehat{Y} - Y^*\big\rangle.$$

(11)
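The sign-matching observation behind Fact 1 is elementary and can be checked numerically. The sketch below assumes the Model 1 mean matrix (entries $p$ within clusters, $q$ across, zero diagonal) and centering by $(p+q)/2$; the cluster layout and parameter values are illustrative:

```python
import numpy as np

n, K = 12, 4
r = n // K                       # r = 3 clusters of size K = 4
p, q = 0.6, 0.2

sigma = np.repeat(np.arange(r), K)
same = sigma[:, None] == sigma[None, :]
Ystar = same.astype(float)       # ground-truth cluster matrix (diagonal = 1)
J = np.ones((n, n))

EA = np.where(same, p, q) - p * np.eye(n)   # E[A]: p within, q across, 0 on diagonal
M = EA - (p + q) / 2 * J                    # centered mean matrix

off = ~np.eye(n, dtype=bool)
# Off the diagonal, M has entries +/-(p - q)/2 whose signs match those of Y* - J/2
assert np.allclose(np.abs(M[off]), (p - q) / 2)
assert np.all(np.sign(M[off]) == np.sign((Ystar - J / 2)[off]))
```

Pairing these signed entries with the entries of a feasible $Y$ (which lie between the two values $Y^*_{ij} \in \{0, 1\}$ and $1/2$-centering) is what turns the first term of (9) into a negative multiple of the $\ell_1$ error.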

To bound the error $\|\widehat{Y} - Y^*\|_1$, it suffices to control the RHS of equation (11), and it is here that we depart from existing analysis. The seminal work in [29] bounds the RHS by a direct application of Grothendieck's inequality. As we discuss below, this argument fails to expose the fast, exponential decay of the error. Our analysis develops a more precise bound. To describe our approach, some additional notation is needed. Let $U \in \mathbb{R}^{n \times r}$ be the matrix of the left singular vectors of $Y^*$. Define the projection $\mathcal{P}_T(M) := UU^\top M + M UU^\top - UU^\top M UU^\top$ and its orthogonal complement $\mathcal{P}_{T^\perp}(M) := (I - UU^\top) M (I - UU^\top)$ for any matrix $M \in \mathbb{R}^{n \times n}$. Our crucial observation is that we should control $\langle W, \widehat{Y} - Y^* \rangle$ by separating the contributions from the two projected components of $\widehat{Y} - Y^*$ defined by $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$. In particular, we rewrite the inequality (11) as

$$\frac{p-q}{2}\,\big\|\widehat{Y} - Y^*\big\|_1 \;\le\; \big\langle W,\; \mathcal{P}_T(\widehat{Y} - Y^*)\big\rangle + \big\langle W,\; \mathcal{P}_{T^\perp}(\widehat{Y} - Y^*)\big\rangle.$$

(12)

The first term involves the component of $\widehat{Y} - Y^*$ that is "aligned" with $Y^*$; in particular, $\mathcal{P}_T$ is the orthogonal projection onto the subspace spanned by matrices with the same column or row space as $Y^*$. The second term involves the orthogonal component $\mathcal{P}_{T^\perp}(\widehat{Y} - Y^*)$, whose column and row spaces are orthogonal to those of $Y^*$. The main steps of our analysis consist of bounding the two terms on the RHS of (12) separately.
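A minimal numerical sketch of this decomposition, assuming a consecutive-blocks cluster layout and the standard form of these projections:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 12, 4
r = n // K

# U: normalized cluster-indicator matrix (clusters assumed to be consecutive
# blocks); its columns are the left singular vectors of Y*
U = np.zeros((n, r))
for a in range(r):
    U[a * K:(a + 1) * K, a] = 1 / np.sqrt(K)
P = U @ U.T


def proj_T(M):
    """Projection onto matrices sharing a column or row space with Y*."""
    return P @ M + M @ P - P @ M @ P


def proj_Tperp(M):
    """Projection onto the orthogonal complement."""
    Q = np.eye(n) - P
    return Q @ M @ Q


M = rng.standard_normal((n, n))
assert np.allclose(proj_T(M) + proj_Tperp(M), M)         # exact decomposition
assert np.allclose(proj_T(proj_T(M)), proj_T(M))         # idempotent
assert np.isclose(np.sum(proj_T(M) * proj_Tperp(M)), 0)  # orthogonal components
```

The three assertions verify that every matrix splits exactly into the two pieces, that each map is a genuine projection, and that the pieces are orthogonal in the trace inner product, which is what licenses treating the two terms of (12) separately.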

The following proposition bounds the term $\langle W, \mathcal{P}_T(\widehat{Y} - Y^*)\rangle$ and is proved in Section 4.2 to follow.

Proposition 1.

Under the conditions of Theorem 1, with high probability at least one of the following inequalities holds:

(13)

where .

Our next proposition, proved in Section 4.3 to follow, controls the term $\langle W, \mathcal{P}_{T^\perp}(\widehat{Y} - Y^*)\rangle$.

Proposition 2.

Under the conditions of Theorem 1, with high probability at least one of the following inequalities holds:

(14)

where $c$ is a universal constant.

Equipped with these two propositions, the desired bound (8) follows easily. If the first inequality in either of the two propositions holds, then we are done. Otherwise, the inequalities (13) and (14) must both hold, and they can be plugged into the RHS of equation (12) to get

Under the premise of Theorem 1, we know that , whence

Doing some algebra then shows that the desired bound (8) again holds.

The rest of this section is devoted to establishing Propositions 1 and 2. Before proceeding to the proofs, we remark on the above arguments and contrast them with alternative approaches.

Comparison with the Grothendieck inequality approach: The arguments in the work [29] also begin with a version of the inequality (11), and proceed by observing that

$$\big\langle W,\; \widehat{Y} - Y^*\big\rangle \;\overset{(i)}{\le}\; 2 \max_{Y \text{ feasible}} \big|\langle W, Y \rangle\big|,$$

(15)

where step (i) follows from the triangle inequality and the feasibility of $\widehat{Y}$ and $Y^*$. Therefore, this argument reduces the problem to bounding the RHS of (15), which can be done using the celebrated Grothendieck inequality. One can already see at this point that this approach yields sub-optimal bounds. For example, SDP is known to achieve exact recovery ($\widehat{Y} = Y^*$) under certain conditions, yet the inequality (15) can never guarantee a zero error. The sub-optimality arises in step (i): the quantity $\langle W, \widehat{Y} - Y^* \rangle$ diminishes when $\widehat{Y} - Y^*$ is small, but the triangle inequality and the worst-case bound used in step (i) are too crude to capture this behavior. In comparison, our proof takes advantage of the structure of the error matrix $\widehat{Y} - Y^*$ and its interplay with the noise matrix $W$.

Bounding the $\langle W, \mathcal{P}_T(\widehat{Y} - Y^*)\rangle$ term: A common approach involves using the generalized Hölder inequality $\langle \mathcal{P}_T(W), \widehat{Y} - Y^*\rangle \le \|\mathcal{P}_T(W)\|_\infty \|\widehat{Y} - Y^*\|_1$. Under the SBM, one can show that $\|\mathcal{P}_T(W)\|_\infty$ is small with high probability, hence yielding a bound proportional to $\|\widehat{Y} - Y^*\|_1$. Variants of this approach are in fact common (sometimes implicitly) in the proofs of exact recovery for SDP [8, 18, 10, 31, 20]. However, in the regime where exact recovery is impossible, applying this bound to the inequality (12) would yield a vacuous bound on $\|\widehat{Y} - Y^*\|_1$. In comparison, Proposition 1 gives a strictly sharper bound (13), which correctly characterizes the behavior of $\langle W, \mathcal{P}_T(\widehat{Y} - Y^*)\rangle$ beyond the exact recovery regime.
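Both the entrywise Hölder step and the averaging effect that makes the entries of $\mathcal{P}_T(W)$ small can be seen in a quick simulation. The sizes, the homogeneous edge probability, and the stand-in error matrix below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, p = 400, 200, 0.5
r = n // K
sigma = np.repeat(np.arange(r), K)

# Centered Bernoulli noise W = A - E[A] (homogeneous p for simplicity)
upper = np.triu((rng.random((n, n)) < p).astype(float) - p, k=1)
W = upper + upper.T

U = np.zeros((n, r))
U[np.arange(n), sigma] = 1 / np.sqrt(K)
P = U @ U.T
PT_W = P @ W + W @ P - P @ W @ P          # P_T(W)

E = rng.random((n, n))                    # stand-in for the error matrix

# Entrywise Hoelder: <P_T(W), E> <= ||P_T(W)||_inf * ||E||_1
assert np.sum(PT_W * E) <= np.abs(PT_W).max() * np.abs(E).sum()
# Averaging over clusters of size K shrinks the entrywise noise magnitude
assert np.abs(PT_W).max() < np.abs(W).max()
```

Each entry of `PT_W` is (up to the correction term) an average of $K$ independent centered entries of `W`, which is why its entrywise maximum is markedly smaller than that of `W` itself.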

Bounding the $\langle W, \mathcal{P}_{T^\perp}(\widehat{Y} - Y^*)\rangle$ term: Note that since $\mathcal{P}_{T^\perp}(Y^*) = 0$, we have the equality $\langle W, \mathcal{P}_{T^\perp}(\widehat{Y} - Y^*)\rangle = \langle \mathcal{P}_{T^\perp}(W), \mathcal{P}_{T^\perp}(\widehat{Y})\rangle$. It is easy to show that the matrix $\mathcal{P}_{T^\perp}(\widehat{Y})$ is positive semidefinite and has diagonal entries at most $1$ (cf. Fact 3). Therefore, one may again attempt to control this term using Grothendieck's inequality, which would yield a bound of the form $f(n, p)$ for some function $f$ whose exact form is not important for now. The bound (14) in Proposition 2 is much stronger: it depends on $\|\widehat{Y} - Y^*\|_1$, which is in turn proportional to the trace of the matrix $\mathcal{P}_{T^\perp}(\widehat{Y})$ (cf. Fact 2).

4.1 Preliminaries and additional notation

Recall that $U$ is the matrix of the left singular vectors of $Y^*$. We observe that $U_{ia} = 1/\sqrt{K}$ if node $i$ is in cluster $a$, and $U_{ia} = 0$ otherwise. Therefore, $UU^\top$ is a block diagonal matrix with all entries inside each diagonal block equal to $1/K$.
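In code (a consecutive-blocks cluster layout assumed for concreteness), these observations about $U$ read:

```python
import numpy as np

n, K = 12, 4
r = n // K                                   # 3 clusters of size 4

# U[i, a] = 1/sqrt(K) if node i is in cluster a, and 0 otherwise
U = np.zeros((n, r))
for a in range(r):
    U[a * K:(a + 1) * K, a] = 1 / np.sqrt(K)

Ystar = np.kron(np.eye(r), np.ones((K, K)))  # ground-truth cluster matrix

assert np.allclose(U.T @ U, np.eye(r))        # orthonormal columns
assert np.allclose(Ystar, K * (U @ U.T))      # Y* = U (K I_r) U^T, singular values K
assert np.allclose((U @ U.T)[:K, :K], 1 / K)  # diagonal blocks are constant 1/K
```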

Define the "noise" matrix $W := A - \mathbb{E}A$. The matrix $W$ is symmetric, which introduces some minor dependency among its entries. To handle this, we let $W_1$ be the matrix obtained from $W$ with its entries in the lower triangular part set to zero. Note that $W = W_1 + W_1^\top$, and $W_1$ has independent entries above the diagonal (with the remaining entries identically zero). Similarly, we define $A_1$ as the upper triangular part of the adjacency matrix $A$.
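This symmetrization device is easy to sketch; a homogeneous edge probability is used below for simplicity, with `W1` and `A1` as in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 0.3

# Symmetric adjacency matrix with zero diagonal, and its centered version W
upper = np.triu((rng.random((n, n)) < p).astype(float), k=1)
A = upper + upper.T
W = A - p * (np.ones((n, n)) - np.eye(n))    # E[A] = p off the diagonal

W1 = np.triu(W, k=1)   # upper-triangular part: mutually independent entries
A1 = np.triu(A, k=1)

assert np.allclose(W, W1 + W1.T)   # holds because W is symmetric with zero diagonal
assert np.array_equal(A, A1 + A1.T)
```

Splitting $W$ this way lets one apply concentration results for matrices with independent entries to $W_1$ and then transfer them back to $W$ by a factor of two.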

In the proof we frequently use the inequalities . Consequently, the assumption implies that . We also record an elementary inequality that is used multiple times.

Lemma 1.

For any number , there exists a number such that if , then

Proof.

Note that As long as is sufficiently large, we have . These inequalities imply that

Multiplying both sides by yields the claimed inequality. ∎

Finally, we need a simple pilot bound, which ensures that the SDP solution $\widehat{Y}$ satisfies a non-trivial error bound.

Lemma 2.

Under Model 1, if then we have

with probability at least , where . In particular, if , we have with high probability with

This lemma is a variation of Theorem 1.3 in [29], and a special case of Theorem 1 in [17] applied to the non-degree-corrected setting. For completeness we provide the proof in Section B. The proof uses Grothendieck's inequality, an approach pioneered in [29].

4.2 Proof of Proposition 1

In this section we prove Proposition 1, which controls the quantity $\langle W, \mathcal{P}_T(\widehat{Y} - Y^*)\rangle$. Using the symmetry of $W$ and the cyclic invariance of the trace, we obtain the identity

$$\big\langle W,\; \mathcal{P}_T(\widehat{Y} - Y^*)\big\rangle \;=\; \big\langle \mathcal{P}_T(W),\; \widehat{Y} - Y^*\big\rangle.$$

It follows that

(16)

Note that since