# Local Statistics, Semidefinite Programming, and Community Detection

We propose a new hierarchy of semidefinite programming relaxations for inference problems, inspired by recent ideas of ‘pseudocalibration’ in the Sum-of-Squares literature. As a test case, we consider the problem of community detection in a distribution of random regular graphs we'll call the Degree Regular Block Model, wherein the vertices are partitioned into k communities, and a graph is sampled conditional on a prescribed number of inter- and intra-community edges. The problem of detection, where we are to decide with high probability whether a graph was drawn from this model or the uniform distribution on regular graphs, is conjectured to undergo a computational phase transition at a point called the Kesten-Stigum (KS) threshold, and we show (i) that sufficiently high constant levels of our hierarchy can perform detection arbitrarily close to this point, (ii) that our algorithm is robust to o(n) adversarial edge perturbations, and (iii) that below Kesten-Stigum no constant level can do so. In the more-studied case of the (irregular) Stochastic Block Model, it is known that efficient algorithms exist all the way down to this threshold, although none are robust to adversarial perturbations of the graph when the average degree is small. More importantly, there is little complexity-theoretic evidence that detection is hard below Kesten-Stigum. In the DRBM with more than two groups, it has not to our knowledge been proven that any algorithm succeeds down to the KS threshold, let alone that one can do so robustly, and there is a similar dearth of evidence for hardness below this point. Our SDP hierarchy is highly general and applicable to a wide range of hypothesis testing problems.

## 1 Introduction

Community detection in graphs is a canonical and widely applicable problem in computer science and machine learning. The setup is both simple and flexible: we are shown a graph and asked for a coarse-grained description in the form of a partition of the vertices into ‘communities’ with atypically many internal edges. The literature contains innumerable algorithms and approaches for this task, but perhaps the most fruitful has been a Bayesian perspective wherein we treat the graph as the output of some generative model, whose unknown parameters we attempt to estimate. In other words, we assume that there are some true and hidden community labels, and that the graph has been drawn probabilistically in a way that respects this ‘planted’ structure.

Much of the existing literature on community detection concerns the stochastic block model (SBM). For now let us discuss the symmetric setting where we first partition $n$ vertices into $k$ groups, and include each edge independently and with probability $p_{\mathrm{in}}$ or $p_{\mathrm{out}}$ depending on whether or not the labels of its endpoints coincide. Research in this area spans several decades, and it will not be fruitful to attempt a thorough review of the literature here; we refer the reader to [Abb17] for a survey. Most salient to us, however, is a rich theory of computational threshold phenomena which has emerged out of the past several years of collaboration between computer scientists, statisticians, and statistical physicists.

The key computational tasks associated with the SBM are recovery and detection: we attempt either to reconstruct the planted communities from the graph, or to decide whether a graph was drawn from the planted model or the Erdős-Rényi model with the same average degree. A set of fascinating conjectures was posed by Decelle et al. [DKMZ11] regarding these tasks in the case of ‘sparse’ models where and the average degree is as the number of vertices diverges.

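It is typical to parametrize the symmetric SBM in terms of $k$, the average degree

$$d = \frac{n p_{\mathrm{in}} + (k-1)\, n p_{\mathrm{out}}}{k},$$

and a ‘signal-to-noise ratio’

$$\lambda \triangleq \frac{n p_{\mathrm{in}} - n p_{\mathrm{out}}}{k d}.$$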
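As a quick numerical illustration of this parametrization, the following minimal sketch evaluates the two formulas above. The function name `sbm_parameters` and the sample values are illustrative assumptions and do not come from the text.

```python
def sbm_parameters(n, k, p_in, p_out):
    """Return (d, lam) for the symmetric SBM, following the two formulas above."""
    # Average degree: d = (n*p_in + (k-1)*n*p_out) / k
    d = (n * p_in + (k - 1) * n * p_out) / k
    # Signal-to-noise ratio: lam = (n*p_in - n*p_out) / (k*d)
    lam = (n * p_in - n * p_out) / (k * d)
    return d, lam


# Hypothetical sparse example: two groups, edge probabilities of order 1/n.
n, k = 10_000, 2
d, lam = sbm_parameters(n, k, p_in=8 / n, p_out=2 / n)
```

Directly from the formulas, the ratio vanishes when the two edge probabilities coincide, equals one when there are no edges between groups, and equals -1/(k-1) when there are no edges inside groups.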

In this setup, it is believed that if we hold and constant, there is an information-theoretic threshold , in the sense that when both detection and recovery are impossible for any algorithm. Moreover, Decelle et al. conjecture that efficient algorithms for both tasks exist only when the degree is larger than a point known as the Kesten-Stigum threshold . Much of this picture is now rigorous [MNS18, Mas14, BLM15, ABH16]. Still, fundamental questions remain unanswered. What evidence can we furnish that detection and recovery are indeed intractable in the so-called ‘hard regime’ ? How robust are these thresholds to adversarial noise or small deviations from the model?

Zooming out, this discrepancy between information-theoretic and computational thresholds is conjectured to be quite universal among planted problems, where we are to reconstruct or detect a structured, high-dimensional signal observed through a noisy channel. The purpose behind our work is to begin developing a framework capable of providing evidence for average-case computational intractability in such settings. To illustrate this broader motivation, consider a different average-case problem also conjectured to be computationally intractable: refutation of random -SAT. A random instance of -SAT with literals and, say clauses is unsatisfiable with high probability. However, it is widely conjectured that the problem of certifying that a given random -SAT instance is unsatisfiable is computationally intractable (all the way up to clauses) [Fei02]. While proving intractability remains out of reach, the complexity-theoretic literature now contains ample evidence in support of this conjecture. Most prominently, exponential lower bounds are known for the problem in restricted computational models such as linear and semidefinite programs [Gri01] and resolution-based proofs [BSW01]. Within the context of combinatorial optimization, the Sum-of-Squares (SoS) SDPs yield a hierarchy of successively more powerful and complex algorithms which capture and unify many other known approaches. A lower bound against the SoS SDP hierarchy such as [Gri01] provides strong evidence that this refutation problem is computationally intractable. This paper is a step towards developing a similar framework to reason about the computational complexity of detection and recovery in stochastic block models specifically, and planted problems generally.

A second motivation is the issue of robustness of computational thresholds under adversarial perturbations of the graph. Spectral algorithms based on the non-backtracking walk matrix [BLM15] achieve weak detection as soon as , but are not robust in this sense. Conversely, robust algorithms for recovery are known, but only when the edge densities are significantly higher than Kesten-Stigum [GV16, MMV16, CSV17, SVC16]. The positive result that gets closest to robustly achieving the conjectured computational phase transition at is the work of Montanari and Sen [MS15], who observe that their SDP-based algorithm for testing whether the input graph comes from the Erdős-Rényi distribution or a Stochastic Block Model with communities also works in the presence of edge outlier errors. On the negative side, Moitra et al. [Moi12] consider the problem of weak recovery in an SBM with two communities and in the presence of monotone errors that add edges within communities and delete edges between them. Their main result is a statistical lower bound indicating that the phase transition for weak recovery changes in the presence of monotone errors. This still leaves open the question of whether there exist algorithms that weakly recover right at the threshold and are robust to perturbations in the graph.

## 2 Main Results

We define a new hierarchy of semidefinite programming relaxations for inference problems that we refer to as the Local Statistics hierarchy, denoted and indexed by parameters . This family of SDPs is inspired by the technique of pseudocalibration in proving lower bounds for sum-of-squares (SoS) relaxations, as well as subsequent work of Hopkins and Steurer [HS17] extending it to an SoS SDP based approach to inference problems. The hierarchy can be defined for a broad range of inference problems involving a joint distribution on an observation and hidden parameter.

As a test case, we apply our SDP relaxations to community detection in the Degree Regular Block Model (DRBM), a family of distributions over degree regular graphs with planted community structure. The degree-regularity will simplify some aspects of our analysis, allowing us to illustrate key features of the hierarchy without a proliferation of technicalities. We will comment later on the possibilities for extension to the irregular case. As an aside, we cannot help but editorialize briefly that, although the DRBM is less useful in practice than the standard block model discussed above, its combinatorics are intricate and beautiful in their own right, and the related case of -regular graphs with planted colorings has been quite well-studied.

We will specify the DRBM on vertices in full generality by several parameters: the number of communities , degree , and a transition matrix for a reversible Markov chain, with stationary distribution . In other words, has row sums equal to one, and is a symmetric matrix. To sample a graph (we will use bold-face type for random objects throughout the paper), first partition the vertices randomly into groups with , and then choose a -regular random graph conditional on there being edges between groups and internal to each group . As is symmetric, this process is well-defined. We will assume always that the parameters are set to make these quantities integer-valued; settings for which this holds infinitely often as are dense in the parameter space.
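The conditions on the parameters can be checked mechanically. The sketch below is only an illustration: the function name `check_drbm_parameters` and the example values are hypothetical, and the edge counts are read off from the M-good constraint stated later in the paper, π(i) M_{i,j} d n, interpreted here (as an assumption) as a count of ordered adjacent pairs, so that each internal edge is counted twice.

```python
from fractions import Fraction


def check_drbm_parameters(pi, M, d, n):
    """Validate (pi, M, d, n) and return the ordered edge-count matrix.

    Entry [i][j] is pi[i] * M[i][j] * d * n, read here as the number of
    ordered pairs (u, v) with u ~ v, u in group i and v in group j.
    """
    k = len(pi)
    assert sum(pi) == 1, "pi must be a probability vector"
    for i in range(k):
        assert sum(M[i]) == 1, "rows of M must sum to one"
        for j in range(k):
            # Reversibility: pi[i] * M[i][j] == pi[j] * M[j][i].
            assert pi[i] * M[i][j] == pi[j] * M[j][i], "chain is not reversible"
    counts = [[pi[i] * M[i][j] * d * n for j in range(k)] for i in range(k)]
    for i in range(k):
        assert (pi[i] * n).denominator == 1, "group sizes must be integers"
        for j in range(k):
            assert counts[i][j].denominator == 1, "edge counts must be integers"
        # Each internal edge is counted twice among ordered pairs.
        assert counts[i][i] % 2 == 0, "internal ordered count must be even"
    return counts


# Hypothetical example: two groups of unequal size on n = 12 vertices, d = 3.
pi = [Fraction(1, 3), Fraction(2, 3)]
M = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
counts = check_drbm_parameters(pi, M, d=3, n=12)
```

Exact rational arithmetic is used so that the integrality checks are not affected by floating-point rounding.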

###### Remark 2.1.

The DRBM as we have defined it differs from the Regular Stochastic Block Model of [BDG16], in which each vertex has a prescribed number of neighbors in every community. Although superficially similar, the behavior of this ‘equitable’ model (as it is known in the physics literature [NM14]) is quite different from ours. For instance, [BDG16] show that whenever detection is possible, one can recover the community labels exactly. This is not true in our case.

The DRBM contains several more familiar distributions as special cases, and the reader is welcome to focus on her favorite for concreteness. When for every , we have the DRBM with equal groups. Setting and , we are in a somewhat restrictive case of the planted -coloring model, where each pair of color classes has the same number of edges between them. We will refer to the case when and otherwise as the symmetric DRBM. As describes a reversible Markov chain, its spectrum is real, and we will write its eigenvalues as . The second eigenvalue can be thought of as a kind of signal-to-noise ratio, and will be repeatedly important to our analysis. One can verify, for instance, that in the case of the symmetric DRBM, .
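To make the role of the second eigenvalue concrete, here is a small sketch under an explicit assumption: we take a transition matrix with one common diagonal value `a` and one common off-diagonal value `(1 - a) / (k - 1)`. This parametrization and the helper names are ours, chosen for illustration. For such a matrix every vector summing to zero is an eigenvector, with eigenvalue `a - (1 - a) / (k - 1)`, which the code checks exactly on one such vector.

```python
from fractions import Fraction


def symmetric_transition_matrix(k, a):
    """k x k matrix with diagonal a and off-diagonal (1 - a) / (k - 1)."""
    b = (1 - Fraction(a)) / (k - 1)
    return [[Fraction(a) if i == j else b for j in range(k)] for i in range(k)]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


k, a = 3, Fraction(1, 2)
M = symmetric_transition_matrix(k, a)
lam2 = a - (1 - a) / (k - 1)           # non-trivial eigenvalue of this M

ones = [Fraction(1)] * k               # eigenvector for eigenvalue 1
v = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (k - 2)  # entries sum to zero

assert matvec(M, ones) == ones
assert matvec(M, v) == [lam2 * x for x in v]
```

Taking `a = 0` forbids internal edges and gives the eigenvalue `-1 / (k - 1)`.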

It is widely believed that the threshold behavior of the DRBM is similar to that of the SBM, though the inhomogeneities in group size and edge density we allow for make the situation somewhat more complicated than in the symmetric case discussed earlier. This phenomenology includes an information-theoretic threshold for the symmetric DRBM (and a more complicated characterization in general that will not be relevant to us here). In the general model, the Kesten-Stigum threshold for detection is , and we expect recovery of all communities once . However, most formal treatment in the literature has been limited to the distribution of -regular graphs conditional on having a planted -coloring, a case not fully captured by our model. Characterization of the information-theoretic threshold, even for the symmetric DRBM, remains largely folklore, and in Appendix [ref] we will for good measure provide a few rigorous pieces of the picture.

Our main theorem is that the Local Statistics hierarchy can robustly solve the detection problem on the DRBM whenever , but that otherwise any constant level fails to do so.

###### Theorem 2.2.

For every , and set of parameters satisfying , there exists sufficiently large so that with probability the SDP, given an input graph , can distinguish in time [need] whether

• is a uniformly random -regular graph

• is sampled from the DRBM with parameters

and is robust to adversarial addition or deletion of edges. On the other hand, for any constant and , the SDP fails with probability to distinguish.

We also prove a stronger robustness guarantee, in particular that can tolerate adversarial edge perturbations, although as we move up the hierarchy. This creates a trade-off between robustness, which we lose as added information is incorporated to the SDP at each successive level, and fidelity to the threshold, which we approach as .

###### Theorem 2.3.

For every , there exists and sufficiently large, so that even given a graph which is a -perturbation of the edges of some , can be used to distinguish whether is a uniformly random -regular graph or was drawn from a DRBM -away from the threshold.

Along the way we will inadvertently prove that standard spectral detection using the adjacency matrix succeeds above , but cannot have the same robustness guarantee. It is a now-classic result of Friedman that, with probability , the spectrum of a uniformly random -regular graph is within of . Conversely, we show:

###### Corollary 2.4.

Let be drawn from the DRBM with parameters satisfying . There exists some such that, for each eigenvalue of satisfying , the adjacency matrix is guaranteed one eigenvalue satisfying .

Regrettably, we do not resolve to similar satisfaction the issue of efficient or robust recovery above Kesten-Stigum. However, in Appendix A we will reduce some central aspects of this issue to the following conjecture regarding the spectrum of for drawn from the planted model.

###### Conjecture 2.5.

Let be any DRBM with . Then, for any , with high probability, has only eigenvalues with modulus larger than .

We will discuss in Appendix A that, conditional on this conjecture (or even a weaker version in which we are guaranteed only constantly many eigenvalues outside the bulk), (i) the span of the corresponding eigenvectors is correlated to the community structure, and (ii) the Local Statistics hierarchy can robustly produce vectors with macroscopic correlation to this span. From weak convergence of the empirical spectral distribution of to the Kesten-McKay law, we know that there must be eigenvalues with modulus larger than , but it will take substantial technical work to push this down to . We believe the most feasible approach is a careful mirror of the techniques in [BLM15], but the execution of this is beyond the scope of this paper. These issues and a related conjecture are discussed in [mossel2015reconstruction] in the context of the DRBM with two groups.

#### Related Work.

Semidefinite programming approaches have been most studied in the dense, irregular case, where exact recovery is possible (for instance [ABH16, AS15]), and it has been shown that an SDP relaxation can achieve the information-theoretically optimal threshold [HWX16]. However, in the sparse regime we consider, the power of SDP relaxations for weak recovery remains unclear. Guedon and Vershynin [GV16] show upper bounds on the estimation error of a standard SDP relaxation in the sparse, two-community case of the SBM, but only when the degree is roughly times the information theoretic threshold. More recently, in a tour-de-force, Montanari and Sen [MS15] showed that for two communities, the SDP of Guedon and Vershynin achieves the information theoretically optimal threshold for large but constant degree, in the sense that the performance approaches the threshold if we send the number of vertices, and then the degree, to infinity. Semi-random graph models have been intensively studied in [BS95, FK00, FK01, CO04, KV06, CO07, MMV12, CJSX14, GV16] and we refer the reader to [MMV16] for a more detailed survey. In the logarithmic-degree regime, robust algorithms for community detection are developed in [CL15, KK10, AS12]. Far less is known in the case of regular graphs.

## 3 Technical Overview

Denote by the uniform distribution on -vertex -regular graphs, and write the DRBM. We will use bold face font for random objects sampled from these distributions. Because we care only about the case when the number of vertices is very large, we will use with high probability (w.h.p.) to describe any sequence of events with probability in or as . We will write , and in general use the letters to refer to elements of and for elements of . The identity matrix will be denoted by , and we will write for the transpose of a matrix , for the standard matrix inner product, and for the associated Frobenius norm. Positive semidefiniteness will be indicated with the symbol . The standard basis vectors will be denoted , the all-ones vector written as , and the all-ones matrix as . Finally, let be the function extracting the diagonal of a matrix, and be the one which populates the nonzero elements of a diagonal matrix with the vector it is given as input.

### 3.1 Detection, Refutation, and Sum-of-Squares

We will begin the discussion of the Local Statistics algorithm by briefly recalling Sum-of-Squares programming. Say we have a constraint satisfaction problem presented as a system of polynomial equations in variables that we are to simultaneously satisfy. In other words, we are given a set

$$S = \{x \in \mathbb{R}^n : f_1(x), \ldots, f_m(x) = 0\}$$

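and we need to decide if it is non-empty. Whenever the problem is satisfiable, any probability distribution supported on gives rise to an operator mapping a polynomial to its expectation. Trivially, obeys

$$\begin{array}{llr}
\text{Normalization} & \mathbb{E}\, 1 = 1 & (1) \\
\text{Satisfaction of } S & \mathbb{E}\, f_i(x) \cdot p(x) = 0 \quad \forall i \in [m],\ \forall p \in \mathbb{R}[x] & (2) \\
\text{Positivity} & \mathbb{E}\, p(x)^2 \ge 0 \quad \forall p \in \mathbb{R}[x] & (3)
\end{array}$$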

In general, we will say that an operator mapping some subset of to the reals is normalized, satisfies , or is positive if it obeys (1), (2), or (3), respectively, on all polynomials in its domain.
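Properties (1)-(3) can be seen on a toy instance. The sketch below is purely illustrative and makes its own assumptions: a single variable, the single constraint f(x) = x² - x (so that the solution set is {0, 1}), and a genuine expectation under a Bernoulli distribution rather than a pseudoexpectation found by an SDP. Polynomials are stored as coefficient lists, and all names are hypothetical.

```python
import random


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = degree)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def expectation(p, prob_one):
    """E[p(x)] for x ~ Bernoulli(prob_one): E[x^j] = prob_one for every j >= 1."""
    return p[0] + prob_one * sum(p[1:])


f = [0.0, -1.0, 1.0]                      # f(x) = x^2 - x, so S = {0, 1}
rng = random.Random(0)
checks = []
for _ in range(100):
    p = [rng.uniform(-1, 1) for _ in range(4)]
    satisfied = abs(expectation(poly_mul(f, p), 0.3)) < 1e-12   # property (2)
    positive = expectation(poly_mul(p, p), 0.3) >= -1e-12       # property (3)
    checks.append(satisfied and positive)
normalized = expectation([1.0], 0.3) == 1.0                     # property (1)
```

The small tolerances only absorb floating-point rounding; in exact arithmetic the three properties hold without slack for any distribution supported on the solution set.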

Proving that , and thus that our problem is unsatisfiable, is equivalent to showing that no operator obeying (1)-(3) can exist. The key insight of SoS is that, at least sometimes, one can do this by focusing only on polynomials of some bounded degree. Writing for the polynomials of degree at most , we call an operator a degree- pseudoexpectation if it is normalized, and for every polynomial in its domain satisfies and is positive. It is well-known that one can search for a degree pseudoexpectation with a semidefinite program of size , and if this smaller, relaxed problem is infeasible, we’ve shown that is empty. This is the degree- Sum-of-Squares relaxation of our CSP.

A naive way to employ SoS for hypothesis testing or reconstruction problems such as community detection is to choose some statistic known to distinguish the planted and null distributions, and write down a relaxed sum-of-squares search algorithm for this statistic. In the case of the DRBM, a graph drawn from the planted model is guaranteed a partition of the vertices into groups of sizes , with edges between groups and . Let us refer to such a partition as -good. A routine first moment calculation shows that when is sufficiently large, uniformly random -regular graphs from the null distribution, , are exponentially unlikely to have an -good partition.

###### Proposition 3.1.

With probability (in fact, exponentially close to one) a graph from the null model has no -good partitions whenever

$$d - 1 > \frac{H(\pi) + H(\pi, M)}{H(\pi) - H(\pi, M)}, \tag{4}$$

where is the standard Shannon entropy, and is the average with respect to of the entropy of the rows of .
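The right-hand side of (4) is easy to evaluate numerically. The following sketch is illustrative only: the function names are ours, natural logarithms are used (the base cancels in the ratio), and the example parameters (three equal groups with no internal edges) are a hypothetical choice.

```python
import math


def entropy(p):
    """Shannon entropy of a probability vector (natural log; 0 log 0 = 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def first_moment_threshold(pi, M):
    """Right-hand side of (4): (H(pi) + H(pi, M)) / (H(pi) - H(pi, M))."""
    h_pi = entropy(pi)
    # H(pi, M): average over i ~ pi of the entropy of row i of M.
    h_pi_m = sum(pi[i] * entropy(M[i]) for i in range(len(pi)))
    return (h_pi + h_pi_m) / (h_pi - h_pi_m)


# Hypothetical example: k = 3 equal groups, zero diagonal, uniform off-diagonal.
pi = [1 / 3, 1 / 3, 1 / 3]
M = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
rhs = first_moment_threshold(pi, M)
```

In this example the entropies are log 3 and log 2, so the ratio is log 6 / log 1.5, roughly 4.42, and condition (4) holds for integer degrees of at least 6. The ratio is undefined when the two entropies coincide, for instance when every row of M equals the uniform distribution on equal groups.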

Thus we can solve detection in exponential time above this first moment threshold by exhaustively searching for even one -good division of the vertices. In other words, detection in this regime is no harder than refutation of an -good partition. This refutation problem can be encoded with variables , describing whether each vertex is in group , subject to the polynomial constraints

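$$\begin{array}{lll}
\text{Boolean} & x_{u,i}^2 = x_{u,i} & \forall u \in [n] \text{ and } i \in [k] \\
\text{Single Color} & \sum_i x_{u,i} = 1 & \forall u \in [n] \\
\text{Group size} & \sum_u x_{u,i} = \pi(i)\, n & \forall i \in [k] \\
M\text{-good} & \sum_{(u,v) \in E} x_{u,i}\, x_{v,j} = \pi(i)\, M_{i,j}\, d n & \forall i, j \in [k]
\end{array}$$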

It will be useful later to denote by the set described by the Boolean and Single Color equations above. Each level of the SoS Hierarchy, applied to the polynomial system described above, immediately gives us a one-sided detection algorithm: if given a graph the degree- SoS relaxation is infeasible, we can be sure that there are no -good partitions, and thus that graph came from the null model and not the planted one. However, as it is a relaxation, if this SDP is feasible we have not a priori learned anything at all. For a two-sided test we need to prove that with high probability there is no feasible solution for graphs drawn from the null model.
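For a concrete labelling, the Group size and M-good constraints above can be verified directly. The sketch below is a brute-force check, not the SoS relaxation; the function name and the example are hypothetical, and the sum over (u, v) in E is read as running over ordered pairs, an assumption consistent with the right-hand sides summing to dn.

```python
from fractions import Fraction


def is_m_good(edges, labels, pi, M, d):
    """Check the Group size and M-good constraints for a labelling.

    edges  : list of undirected edges (u, v) on vertices 0..n-1
    labels : labels[u] in {0..k-1}; using one label per vertex enforces the
             Boolean and Single Color constraints automatically
    """
    n, k = len(labels), len(pi)
    sizes = [0] * k
    for lab in labels:
        sizes[lab] += 1
    if any(sizes[i] != pi[i] * n for i in range(k)):
        return False
    # Count ordered pairs (u, v) with u ~ v by label pair; each undirected
    # edge contributes once in each direction.
    counts = [[0] * k for _ in range(k)]
    for u, v in edges:
        counts[labels[u]][labels[v]] += 1
        counts[labels[v]][labels[u]] += 1
    return all(counts[i][j] == pi[i] * M[i][j] * d * n
               for i in range(k) for j in range(k))


# Hypothetical example: the 4-cycle with its bipartition (k = 2, d = 2).
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
pi = [Fraction(1, 2), Fraction(1, 2)]
M = [[0, 1], [1, 0]]
good = is_m_good(cycle, [0, 1, 0, 1], pi, M, d=2)
bad = is_m_good(cycle, [0, 0, 1, 1], pi, M, d=2)
```

Here the alternating labelling is accepted, while the labelling that places adjacent vertices in the same group is rejected because it creates internal edges that this M forbids.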

There are two fundamental limitations to this approach. First, statistics like existence of an -good partition are in some cases not optimal for differentiating the null and planted distributions. Consider for simplicity a less constrained version of the symmetric DRBM, where for a parameter we partition the vertices into equal sized groups, and sample a -regular graph conditional on there being edges among vertices in the same community, with the remaining connecting vertices in different groups. Both the information theoretic and Kesten-Stigum thresholds in this case occur when . Such graphs are guaranteed to have a maximum cut of at least , so we can distinguish the null and planted models for any making this larger than the maximum cut in a -regular random graph. However, we know from work of Dembo et al. [DMS17] that the maximum cut in -regular random graphs is, with high probability,

$$\left(1 + \frac{2 P_*}{\sqrt{d}} + o_d\!\left(\frac{1}{\sqrt{d}}\right)\right) \frac{dn}{4} + o_n(n),$$

where is twice the vaunted Parisi constant from statistical physics. Thus, when is large, the maximum cut cannot distinguish the null and planted distributions until roughly , i.e. . This same phenomenon holds in the irregular SBM with two groups.

The second issue is that even in regimes where we know detection can be performed by exhaustive search for an -good partition, low-degree SoS relaxations of this search problem are known to fail. In the case of the symmetric DRBM, with , a similar first moment bound to the one above shows that at roughly the same threshold, random -regular graphs are exponentially unlikely to have any -way cut with the same total number of between-group edges as the hidden partition in the planted model. Banks et al. [BKM17] show that, for the degree-two SoS relaxation of -way cut, detection is only possible once : for smaller degree, when is sampled from the null model, there exists a feasible degree-two pseudoexpectation. A similar result for a slightly weaker SDP holds in the case of Erdős-Rényi graphs with planted -colorings [BT19].

This is not the only case where degree-two SoS for refutation does not succeed all the way down to the conjectured computational threshold for detection. Consider for instance the Rademacher-spiked Wigner model, where our goal is to distinguish whether an observed matrix is either (Null) an Wigner matrix , with and , or (Planted) of the form for some uniformly random hypercube vector . Results of Féral and Péché [FP07] tell us that detection is possible simply by examining the spectrum of , whenever , and Perry et al. [PWBM16] show that this is in fact the information-theoretic threshold. On the other hand, the planted model satisfies , so we could try to solve detection by refuting the existence of a hypercube vector with a large quadratic form. Unfortunately, in the null model , degree-two SoS can only refute the existence of some satisfying [MS15]. Bandeira et al. provide evidence, using ideas of Hopkins and Steurer regarding low-degree test statistics, that there is a fundamental computational barrier to outperforming degree-two SoS at this refutation task [BKW19]; quite recently, [KB19] show that this gap persists for degree-four SoS, and conjecture that refutation of any smaller maximum is impossible for SoS of constant degree.

These results fit into a broader current in the literature probing the nature and origin of computational barriers in random refutation problems. In the preceding discussion, we were attempting to solve detection in the DRBM, for in the conjectured computationally feasible regime, by refuting the existence of some combinatorial structure in the observed graph. However, refutation is essentially a prior-free task! There are, at least potentially, many planted distributions for producing graphs with -good partitions (just as there are many ways to produce a Gaussian random matrix whose maximum quadratic form over the hypercube is atypically large), and they need not all have the same computational phase transition. The idea is that refutation in the null model is hard exactly when it would allow us to solve detection in the computationally hard or information-theoretically impossible regime of some ‘quietly’ planted distribution, whose low degree moments mimic those of the null model (see [BKW19], for example).

All of this is bad news for refutation, but not necessarily for detection. The problem of detection and the related one of reconstruction are in a Bayesian setting, where the prior distribution is completely specified. Yet the semidefinite programs described above use little information from the prior distribution in their formulation. Why not include information about the prior distribution in our SDP?

### 3.2 The Local Statistics Hierarchy

Let us regard the planted model as a joint distribution on random variables encoding the group labels, and indexed by and describing which edges of the graph are present. Instead of our somewhat ad-hoc SDP relaxing the problem of searching for an -good partition, we will try to find a pseudoexpectation on the variables which (i) satisfies (the Boolean and Single Color constraints) and (ii) matches certain low-degree moments of the planted distribution. To a first approximation, we will add constraints of the form

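$$\tilde{\mathbb{E}}\, p(G, x) \simeq \mathbb{E}_{(\boldsymbol{G}, \boldsymbol{x}) \sim P}\, p(\boldsymbol{G}, \boldsymbol{x}),$$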

for a restricted class of polynomials in variables and . The exact meaning of will depend on the concentration of with respect to the randomness in and ; we will make it precise below.

The DRBM has a natural symmetry: we can freely permute the vertices, and the distribution is unchanged. This gives us an action of , the symmetric group on elements, on the random variables and describing our random graphs, and their non-random counterparts and appearing in the polynomials in the domain of . In particular, acts as and . It is only meaningful to consider polynomials in and that are fixed under this action; these roughly correspond to counting the instances of subgraphs of with vertices constrained to have particular labels. Note that unless we are in the case of the symmetric DRBM, the community labels do not have a similar symmetry.

Since the random variables are all zero-one indicators, we need only consider polynomials that are multilinear in . We claim that every such polynomial in fixed under this action, and with degrees and in the and variables respectively, is of the following form. Let be a graph with at most edges, a designated subset of at most vertices, and a set of labels on these distinguished vertices. Write for the set of all injective homomorphisms , i.e. maps for which (1) for every distinct and (2) implies . The image of each is a copy of inside . For each, there is a corresponding polynomial

$$p_{H,S,\tau}(x, G) = \sum_{\varphi \in \Phi_H} \prod_{u \in S} x_{\varphi(u), \tau(u)}, \tag{5}$$

that counts occurrences in which conform, on the vertices in , to the labels specified by . One can check that these polynomials are a basis for the vector space of polynomials in fixed under the action above.
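The polynomial (5) can be evaluated by brute force on small examples, which may help in parsing the definition. The sketch below is an illustration under our own conventions: the function name and argument layout are hypothetical, the labelling is a genuine 0/1 assignment (one label per vertex) rather than a pseudoexpectation, and injective maps are enumerated exhaustively, so it is only practical for very small graphs.

```python
from itertools import permutations


def local_statistic(h_vertices, h_edges, tau, g_n, g_edges, labels):
    """Evaluate p_{H,S,tau}(x, G) for a 0/1 labelling x given by `labels`.

    h_vertices : number of vertices of H (named 0..h_vertices-1)
    h_edges    : edges of H
    tau        : dict mapping each distinguished vertex in S to a label
    g_n        : number of vertices of G;  g_edges : edges of G
    labels     : labels[u] = community of u, so x[u][i] = (labels[u] == i)
    """
    adj = set()
    for u, v in g_edges:
        adj.add((u, v))
        adj.add((v, u))
    total = 0
    # Enumerate injective maps phi: V(H) -> V(G) by brute force.
    for phi in permutations(range(g_n), h_vertices):
        if all((phi[a], phi[b]) in adj for a, b in h_edges):
            # Product over S of x_{phi(u), tau(u)} is 1 iff all labels agree.
            if all(labels[phi[u]] == t for u, t in tau.items()):
                total += 1
    return total


# Hypothetical example: G is the 4-cycle with alternating labels.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 1, 0, 1]
group_size = local_statistic(1, [], {0: 0}, 4, cycle, labels)
cross_edges = local_statistic(2, [(0, 1)], {0: 0, 1: 1}, 4, cycle, labels)
paths = local_statistic(3, [(0, 1), (1, 2)], {}, 4, cycle, labels)
```

With H a single labelled vertex the statistic is a group size, and with H a single edge with both endpoints labelled it counts ordered adjacent pairs with the prescribed labels; an empty label set simply counts copies of H.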

###### Definition 3.2.

The degree level of the Local Statistics hierarchy is the following SDP: find a degree- pseudoexpectation satisfying , such that

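$$\tilde{\mathbb{E}}\, p_{H,S,\tau}(x, G) \approx \mathbb{E}_{(\boldsymbol{x}, \boldsymbol{G}) \sim P}\, p_{H,S,\tau}(\boldsymbol{x}, \boldsymbol{G}) \tag{6}$$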

for every and .

Note that, among many new constraints that this SDP imposes on , it recovers the conditions on group size and -good-ness from our earlier SoS relaxation, as

$$\sum_u x_{u,i} \quad \text{and} \quad \sum_{(u,v) \in E} x_{u,i}\, x_{v,j}$$

are both of the form (5). We obtain the first when is the graph on one vertex with label , and the second when is a single edge, with endpoints labeled and .

###### Remark 3.3.

Although we have stated it in the specific context of the DRBM, the local statistics framework extends readily to any planted problem involving a joint distribution on pairs of a hidden structure and observed signal, if we take appropriate account of the natural symmetries in . For a broad range of such problems, including spiked random matrix models [AKJ18, PWBM16], compressed sensing [ZK16, Ran11, KGR11] and generalized linear models [BKM19] (to name only a few), there are conjectured computational thresholds where the underlying problem goes from being efficiently solvable to computationally intractable, and the algorithms which are proven or conjectured to attain this threshold are often not robust. We hope that the local statistics hierarchy can be harnessed to design robust algorithms up to these computational thresholds, as well as to provide evidence for computational intractability in the conjectured hard regime. The relation (if any) between the local statistics SDP hierarchy and iterative methods such as belief propagation or AMP is also worth investigating.

The remainder of the paper will be laid out as follows. In Section 4 we will collect some preliminary results, including several standard and useful observations on non-backtracking walks and reversible Markov chains. Section 5 contains the proof that our SDP can distinguish the null and planted models above the KS threshold, and Section 6 adapts this proof to show that spectral distinguishing is possible in this regime as well. In Section 7 we prove the other half of Theorem 2.2, namely that no constant level of our hierarchy succeeds below this threshold. Section 8 concerns the robustness guarantees of our algorithm. Finally, in Appendix B, we will perform several calculations on the DRBM, including the first moment bound of Proposition 3.1, and the explicit computation of the local statistics appearing in the LoSt hierarchy.

## 4 Preliminaries

### 4.1 Nonbacktracking Walks and Orthogonal Polynomials

The central tool in our proofs will be non-backtracking walks on , that is, walks which at every step are forbidden from returning to the vertex they occupied two steps previously. We will collect here some known results on these walks specific to the case of -regular graphs. Write for the matrix whose entry counts the number of length- non-backtracking walks between vertices and in . One can check that the satisfy a two-term linear recurrence,

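$$
\begin{aligned}
A_G^{(0)} &= \mathbb{1} \\
A_G^{(1)} &= A_G \\
A_G^{(2)} &= A_G^2 - d\,\mathbb{1} \\
A_G^{(s)} &= A_G A_G^{(s-1)} - (d-1)\,A_G^{(s-2)} \qquad s > 2,
\end{aligned}
$$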

since to enumerate non-backtracking walks of length , we can first extend each such walk of length in every possible way, and then remove those extensions that backtrack.
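As a sanity check on this recurrence, the following is a minimal sketch (not part of the original argument) that compares it against brute-force enumeration on the complete graph on four vertices, which is 3-regular. The helper names `nonbacktracking_matrices` and `brute_force_count` are hypothetical, and the choice of test graph is ours.

```python
import itertools
import numpy as np

def nonbacktracking_matrices(A, d, s_max):
    """Return [A^(0), ..., A^(s_max)] for a d-regular graph via the recurrence."""
    n = A.shape[0]
    I = np.eye(n, dtype=np.int64)
    mats = [I, A.copy()]
    if s_max >= 2:
        mats.append(A @ A - d * I)
    for s in range(3, s_max + 1):
        mats.append(A @ mats[s - 1] - (d - 1) * mats[s - 2])
    return mats[: s_max + 1]

def brute_force_count(A, s):
    """Count length-s non-backtracking walks between every pair by enumeration."""
    n = A.shape[0]
    counts = np.zeros((n, n), dtype=np.int64)
    for walk in itertools.product(range(n), repeat=s + 1):
        # Every consecutive pair must be an edge of the graph.
        steps_ok = all(A[walk[t], walk[t + 1]] for t in range(s))
        # A walk backtracks if it returns to the vertex it left two steps ago.
        no_backtrack = all(walk[t] != walk[t + 2] for t in range(s - 1))
        if steps_ok and no_backtrack:
            counts[walk[0], walk[-1]] += 1
    return counts

# Assumed test graph: K_4, which is 3-regular.
A = np.ones((4, 4), dtype=np.int64) - np.eye(4, dtype=np.int64)
mats = nonbacktracking_matrices(A, d=3, s_max=4)
agree = [np.array_equal(mats[s], brute_force_count(A, s)) for s in range(5)]
```

The enumeration is exponential in the walk length and is only meant for tiny graphs; the recurrence costs one matrix product per step.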

On -regular graphs, the above recurrence immediately shows that for a family of monic, scalar ‘non-backtracking polynomials’ , where . To avoid a collision of symbols, we will use as the variable in all univariate polynomials appearing in the paper. It is well known that these polynomials are an orthogonal polynomial sequence with respect to the Kesten-McKay measure

$$
d\mu_{\mathrm{KM}}(z) = \frac{d}{2\pi}\,\frac{\sqrt{4(d-1) - z^2}}{d^2 - z^2}\,\mathbb{1}\!\left[\,|z| < 2\sqrt{d-1}\,\right] dz,
$$

with its associated inner product

$$
\langle f, g\rangle_{\mathrm{KM}} \triangleq \int f(z)\,g(z)\,d\mu_{\mathrm{KM}}(z)
$$

on the vector space of square integrable functions on . One can again check that

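$$
\|q_s\|_{\mathrm{KM}}^2 \triangleq \int q_s(z)^2\,d\mu_{\mathrm{KM}} = q_s(d) = \begin{cases} 1 & s = 0 \\ d(d-1)^{s-1} & s \ge 1 \end{cases} \;=\; \frac{1}{n}\left(\#\text{ length-}s\text{ n.b. walks on } G\right)
$$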

in the normalization we have chosen [ABLS07]. Thus any function in this vector space can be expanded as

$$
f = \sum_{s \ge 0} \frac{\langle f, q_s\rangle_{\mathrm{KM}}}{\|q_s\|_{\mathrm{KM}}^2}\, q_s.
$$
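The orthogonality relations and norms above can be checked numerically. The sketch below assumes the standard form of the Kesten-McKay density, $\frac{d}{2\pi}\sqrt{4(d-1)-z^2}/(d^2-z^2)$ on $|z| < 2\sqrt{d-1}$, and uses the substitution $z = 2\sqrt{d-1}\cos\theta$ with a midpoint rule; the function names and the choice $d = 3$ are ours.

```python
import numpy as np

def q_polys(z, d, s_max):
    """Evaluate q_0, ..., q_{s_max} at the points z using the scalar recurrence."""
    qs = [np.ones_like(z), z.copy()]
    if s_max >= 2:
        qs.append(z * z - d)
    for s in range(3, s_max + 1):
        qs.append(z * qs[s - 1] - (d - 1) * qs[s - 2])
    return np.array(qs[: s_max + 1])

def km_gram(d, s_max, num_nodes=4000):
    """Gram matrix of q_0..q_{s_max} under the (assumed) Kesten-McKay measure."""
    theta = (np.arange(num_nodes) + 0.5) * np.pi / num_nodes
    z = 2.0 * np.sqrt(d - 1) * np.cos(theta)
    # Density times |dz/dtheta|; the square-root factor becomes sin(theta)^2.
    weight = d * 4.0 * (d - 1) * np.sin(theta) ** 2 / (2.0 * np.pi * (d * d - z * z))
    weight *= np.pi / num_nodes  # midpoint-rule cell width
    Q = q_polys(z, d, s_max)
    return (Q * weight) @ Q.T

d = 3
gram = km_gram(d, s_max=4)
# Claimed squared norms: 1 for s = 0 and d (d-1)^(s-1) for s >= 1.
expected = np.diag([1.0] + [d * (d - 1) ** (s - 1) for s in range(1, 5)])
```

Up to quadrature error, `gram` should be diagonal and agree with `expected`, which is the statement that the $q_s$ are orthogonal with $\|q_s\|^2_{\mathrm{KM}} = q_s(d)$.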

We will also need the following lemma of Alon et al. [ABLS07, Lemma 2.3] bounding the size of the polynomials :

###### Lemma 4.1.

For any , there exists an such that for ,

$$
|q_s(z)| \le 2(s+1)\,\|q_s\|_{\mathrm{KM}} + \varepsilon.
$$

The behavior of the non-backtracking polynomials with respect to the inner product idealizes that of the under the trace inner product. In particular, if

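$$
\langle A_G^{(s)}, A_G^{(t)}\rangle = n\,\langle q_s, q_t\rangle_{\mathrm{KM}} = \begin{cases} \left(\#\text{ length-}s\text{ n.b. walks on } G\right) & s = t \\ 0 & s \ne t. \end{cases}
$$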

This is because the diagonal entries of count pairs of non-backtracking walks with length and respectively: if any such pair induces a cycle of length at most , or perhaps is a pair of identical walks in the case . Above the girth, if we can control the number of cycles, we can quantify how far the are from orthogonal in the trace inner product.
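To illustrate the orthogonality below the girth, here is a small sketch on the Petersen graph, which is 3-regular with girth 5 and 10 vertices, so that every pair $s, t \le 2$ has $s + t$ below the girth. The graph choice and helper names are ours, and we read the condition above as $s + t$ being smaller than the girth.

```python
import numpy as np

def petersen_adjacency():
    """Adjacency matrix of the Petersen graph (3-regular, girth 5)."""
    A = np.zeros((10, 10), dtype=np.int64)
    for i in range(5):
        edges = [(i, (i + 1) % 5),            # outer 5-cycle
                 (i, i + 5),                  # spokes
                 (i + 5, (i + 2) % 5 + 5)]    # inner pentagram
        for u, v in edges:
            A[u, v] = A[v, u] = 1
    return A

def nonbacktracking_matrices(A, d, s_max):
    """Return [A^(0), ..., A^(s_max)] for a d-regular graph via the recurrence."""
    I = np.eye(A.shape[0], dtype=np.int64)
    mats = [I, A.copy()]
    if s_max >= 2:
        mats.append(A @ A - d * I)
    for s in range(3, s_max + 1):
        mats.append(A @ mats[s - 1] - (d - 1) * mats[s - 2])
    return mats[: s_max + 1]

A = petersen_adjacency()
n, d = 10, 3
mats = nonbacktracking_matrices(A, d, s_max=2)
# Trace inner product <X, Y> = tr(X^T Y) = sum of entrywise products.
gram = np.array([[np.sum(mats[s] * mats[t]) for t in range(3)] for s in range(3)])
# Predicted values: n * ||q_s||^2 on the diagonal, zero elsewhere.
expected = np.diag([n, n * d, n * d * (d - 1)])
```

Here `gram` equals `expected` exactly, since no pair of walks of total length at most four can close up into a cycle in a graph of girth five.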

Luckily for us, sparse random graphs have very few cycles. To make this precise, call a vertex bad if it is at most steps from a cycle of length at most . These are exactly the vertices for which the diagonal entries of are nonzero, when .

###### Lemma 4.2.

For any constant and , with high probability any graph has at most bad vertices.

We will defer the proof of this lemma to the appendix, but two nice facts follow from it immediately. First, from the above discussion,

$$
\langle A_G^{(s)}, A_G^{(t)}\rangle = O(\log n)
$$

for any . The second useful corollary is more or less that in random graphs we can use non-backtracking walks as a proxy for self-avoiding ones.

###### Lemma 4.3.

Write for the matrix whose entry is a one exactly when and are connected by a self-avoiding walk of length . Then with high probability, for any graph ,

$$
\left\| A_G^{\langle s\rangle} - A_G^{(s)} \right\|_F^2 = O(\log n) \tag{7}
$$

###### Proof.

Every row of both and has norm , and they differ only in the rows corresponding to, say, the -bad vertices, of which there are only . ∎

### 4.2 Reversible Markov Chains

We will need a standard fact about reversible Markov chains. Let us maintain the notation for , its eigenvalues , and its stationary distribution . Recall from above that , , and the reversibility condition on means is symmetric.

###### Lemma 4.4.

Let be the matrix of right eigenvectors, normalized so that the columns have unit norm (note that the first column of is, up to scaling, the all-ones vector). Then .

###### Proof.

First, reversibility tells us is symmetric, and thus by the spectral theorem that it satisfies

$$
\operatorname{Diag}(\pi)^{1/2}\, M\, \operatorname{Diag}(\pi)^{-1/2}\, O = O\Lambda
$$

for some orthogonal . It is readily seen that , so contains, up to scaling, the right eigenvectors of . ∎
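The construction in this proof is easy to carry out numerically. The sketch below builds a hypothetical reversible chain from symmetric edge weights `W` (our choice, not a parameter matrix from the paper), symmetrizes it by conjugating with $\operatorname{Diag}(\pi)^{1/2}$, and forms $F = \operatorname{Diag}(\pi)^{-1/2} O$. We assume the normalization intended by the lemma is $F^T \operatorname{Diag}(\pi) F = \mathbb{1}$, which is the identity used later in the change of basis.

```python
import numpy as np

# Hypothetical reversible chain: M = D^{-1} W for symmetric positive weights W.
W = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 2.0, 1.0]])
row_sums = W.sum(axis=1)
M = W / row_sums[:, None]           # row-stochastic transition matrix
pi = row_sums / row_sums.sum()      # stationary distribution of M

# Symmetrization: S = Diag(pi)^{1/2} M Diag(pi)^{-1/2} has the same spectrum as M.
sqrt_pi = np.sqrt(pi)
S = (sqrt_pi[:, None] * M) / sqrt_pi[None, :]
eigvals, O = np.linalg.eigh(S)
order = np.argsort(-eigvals)        # list the eigenvalue 1 first
eigvals, O = eigvals[order], O[:, order]

F = O / sqrt_pi[:, None]            # F = Diag(pi)^{-1/2} O
Lambda = np.diag(eigvals)
```

The columns of `F` are right eigenvectors of `M`, the first one is constant, and `F.T @ np.diag(pi) @ F` is the identity up to rounding.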

### 4.3 Local Statistics in the Planted Model

The Local Statistics SDP that we are studying includes constraints that our pseudoexpectation match certain low-degree moments in the planted distribution. As we discussed in the technical overview, these correspond to the counts of partially labelled subgraphs in . To set some notation, a partially labelled graph is a graph , together with a distinguished subset of vertices , and a labelling of these distinguished vertices. We’ll say a graph is unlabeled or fully labelled if or , and in these cases abuse notation and simply refer to or respectively. At times it will also be useful to refer to graphs with distinguished vertices, but no labelling; we will write these as . An occurrence of a partially labelled graph in a fully labelled one is an injective homomorphism , that agrees on labels, i.e. vertices in are mapped to ones in with the same label.

The low-degree moment constraints in are exactly the counts of occurrences of partially labelled subgraphs in a graph , for which has at most edges and distinguished vertices. The following theorem characterizes these counts in any planted model; we will discuss it briefly below and remit the proof to the appendix.

###### Definition 4.5.

Let be a connected graph on edges, with distinguished vertices . Define to be the number of occurrences of in an infinite -regular tree in which some vertex in is mapped to the root. If , choose some distinguished vertex arbitrarily—the count will be the same no matter which one is chosen; we will at times use as shorthand in this case. Finally, if has connected components, take . We note for later use that if contains a cycle, , and if it is a path of length with endpoints distinguished, , the number of vertices at depth in the tree.

###### Theorem 4.6 (Local Statistics).

If is a partially labelled graph with edges, then in any planted model ,

1. If is unlabelled, i.e. , then

2. If is labelled, with , , and , then

$$
n^{-\ell}\,\mathbb{E}\, p_{H,S,\tau}(x, G) \to \pi(i)\, M^{\operatorname{dist}(\alpha,\beta)}_{i,j}\, C_{H,S,d},
$$

and enjoys concentration up to an additive . We say that if these two vertices lie in disjoint components of , and we interpret .

###### Remark 4.7.

In our Local Statistics SDP 3.2, we promised to formalize the symbol appearing in the affine moment-matching constraints on the pseudoexpectation; let’s do so now. Throughout the paper, fix a very small error tolerance , and write to mean “equal up to ”. Then the constraint for each partially labelled subgraph with connected components should read . We will write instead of whenever there is no chance for confusion. Finally, because we have defined our model quite rigidly, whenever consists of a single vertex with label , . Similarly when consists of two distinguished vertices with labels respectively,

$$
p_{H,S,\tau}(x, G) = \begin{cases} \pi(i)\pi(j)\,n^2 & i \ne j \\ \pi(i)^2 n^2 - \pi(i)\,n & i = j \end{cases}
$$

and the moment-matching constraints in our SDP will accordingly include instead of .

Let’s take a moment and get a feel for Theorem 4.6. As a warm-up, consider the case when is a path of length with the endpoints labelled as , and we simply need to count the number of pairs of vertices in with labels and respectively that are connected by a path of length . As -regular random graphs from models like have very few short cycles, assume for simplicity that the girth is in fact much larger than , so that the depth- neighborhood about every vertex is a tree. If we start from a vertex and follow a uniformly random edge, the parameter matrix from our model says that, on average at least, the probability of arriving at a vertex in group is roughly , and similarly if we take (non-backtracking) steps, this probability is roughly . There are starting vertices in group , and vertices at distance from any such vertex.

If is a tree in which the two distinguished vertices are at distance , then we can enumerate occurrences of in by first choosing the image of the path connecting these two, and then counting the ways to place the remaining vertices. If we again assume that the girth is sufficiently large, it isn’t too hard to see that the number of ways to do this second step is a constant independent of the number of ways to place the path, so we’ve reduced to the case above. The idea for the cases is similar. We’ll prove Theorem 4.6 in Appendix B.1.
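The heuristic count for labelled paths can be written out in a few lines. The sketch below evaluates the leading-order prediction $n\,\pi(i)\,d(d-1)^{s-1}\,(M^s)_{i,j}$ for $s \ge 1$; the function name and the two-group parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def predicted_path_count(n, d, pi, M, s, i, j):
    """Leading-order count of length-s paths from a group-i vertex to a group-j vertex (s >= 1)."""
    paths_per_start = d * (d - 1) ** (s - 1)             # vertices at depth s in the d-regular tree
    prob_end_in_j = np.linalg.matrix_power(M, s)[i, j]   # s-step transition probability
    return n * pi[i] * paths_per_start * prob_end_in_j

# Hypothetical two-group model with equal group sizes.
pi = np.array([0.5, 0.5])
M = np.array([[0.75, 0.25],
              [0.25, 0.75]])
count = predicted_path_count(n=1000, d=3, pi=pi, M=M, s=2, i=0, j=1)
```

With these numbers there are 500 starting vertices in the first group, 6 endpoints at distance two from each, and a two-step transition probability of 0.375, for a predicted count of 1125.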

## 5 Distinguishing with Local Statistics

Throughout this section, fix the parameters of a planted model . We’ll prove half of our main theorem, namely that for any , if

$$
d > d_{\mathrm{KS}} + \epsilon = 1 + \frac{1}{\lambda_2^2} + \epsilon
$$

then there exists some so that the SDP can distinguish the planted and null models. When , the SDP is surely feasible as we can simply set

$$
\tilde{\mathbb{E}}\, p(x, G) = p(x, G)
$$

for any polynomial we choose. We will thus be done if we can show infeasibility when is above the KS threshold, is sufficiently large, and . Our strategy will be to first reduce to the problem of designing a univariate polynomial with particular properties, and then to solve this design problem using some elementary results from Section 4.

Let , and assume we had a viable pseudoexpectation for the SDP. Write for the matrix whose entry is (it is routine that implies positive semidefiniteness of ). It will at times be useful to think of as a matrix of blocks , and at others as an matrix of blocks . Recall also the matrices from Section 4 that count self-avoiding walks of length . Our strategy will be to first write the moment-matching constraints on as affine constraints of the form , and then combine these affine constraints to contradict feasibility of .

###### Lemma 5.1.

For any , and any , recalling that is the matrix counting non-backtracking walks of length , and is the all-ones matrix,

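$$
\begin{aligned}
\langle X_{i,j}, A_G^{(s)}\rangle &\simeq \pi(i)\, M^s_{i,j}\, \|q_s\|^2_{\mathrm{KM}}\, n \\
\langle X_{i,j}, J\rangle &= \pi(i)\,\pi(j)\, n^2.
\end{aligned}
$$

###### Proof.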

For the first assertion, let be the path of length whose endpoints are labelled . Each self-avoiding walk of length in is an occurrence of , so from Theorem 4.6

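$$
\langle X_{i,j}, A_G^{\langle s\rangle}\rangle = \tilde{\mathbb{E}}\, p_{H,S,\tau}(x, G) \simeq \pi(i)\, M^s_{i,j}\, \|q_s\|^2_{\mathrm{KM}}\, n.
$$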

We can now use Lemma 4.3 to replace the self-avoiding walk matrices with their non-backtracking counterparts. The matrix has diagonal elements by the Boolean constraint, and by the Single Color constraint. By PSD-ness of , every is nonnegative, so each is between zero and one. It is a standard fact that the off-diagonal entries of such a PSD matrix have magnitude at most one, so from Lemma 4.3

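$$
\langle X_{i,j}, A_G^{(s)}\rangle = \langle X_{i,j}, A_G^{\langle s\rangle}\rangle + \langle X_{i,j}, A_G^{(s)} - A_G^{\langle s\rangle}\rangle = \langle X_{i,j}, A_G^{\langle s\rangle}\rangle \pm O(\log n) \simeq \pi(i)\, M^s_{i,j}\, \|q_s\|^2_{\mathrm{KM}}\, n
$$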

for . For the second assertion, when take to be the partially labelled graph on two disconnected vertices, with labels and respectively. From Remark 4.7 we have

$$
\langle X_{i,j}, J\rangle = \tilde{\mathbb{E}}\, p_{H,S,\tau}(x, G) = \pi(i)\,\pi(j)\, n^2.
$$

When , take as above and to be a single vertex labelled . ∎

We will now apply a fortuitous change of basis furnished to us by the parameter matrix . Recall that is the matrix whose columns are the right eigenvectors of , satisfying and . Now define a matrix , by which we mean that

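$$
\check{X} = \begin{pmatrix} F_{1,1}\mathbb{1} & \cdots & F_{1,k}\mathbb{1} \\ \vdots & \ddots & \vdots \\ F_{k,1}\mathbb{1} & \cdots & F_{k,k}\mathbb{1} \end{pmatrix}^{\!T} \begin{pmatrix} X_{1,1} & \cdots & X_{1,k} \\ \vdots & \ddots & \vdots \\ X_{k,1} & \cdots & X_{k,k} \end{pmatrix} \begin{pmatrix} F_{1,1}\mathbb{1} & \cdots & F_{1,k}\mathbb{1} \\ \vdots & \ddots & \vdots \\ F_{k,1}\mathbb{1} & \cdots & F_{k,k}\mathbb{1} \end{pmatrix}.
$$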

We will think of , analogous to , as a matrix of blocks . Note that we can also think of this as a change of basis directly on the variables appearing in polynomials accepted by our pseudoexpectation.

###### Lemma 5.2.

For any , if , and

$$
\langle \check{X}_{i,i}, A_G^{(s)}\rangle \simeq \lambda_i^s\, \|q_s\|^2_{\mathrm{KM}}\, n.
$$

Furthermore,

$$
\langle \check{X}_{i,j}, J\rangle = \begin{cases} n^2 & i = j = 1 \\ 0 & \text{else.} \end{cases}
$$

###### Proof.

Our block-wise change of basis commutes with taking inner products between the blocks and the non-backtracking walk matrices. In other words,

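$$
\begin{aligned}
\begin{pmatrix} \langle \check{X}_{1,1}, A_G^{(s)}\rangle & \cdots & \langle \check{X}_{1,k}, A_G^{(s)}\rangle \\ \vdots & \ddots & \vdots \\ \langle \check{X}_{k,1}, A_G^{(s)}\rangle & \cdots & \langle \check{X}_{k,k}, A_G^{(s)}\rangle \end{pmatrix}
&= F^T \begin{pmatrix} \langle X_{1,1}, A_G^{(s)}\rangle & \cdots & \langle X_{1,k}, A_G^{(s)}\rangle \\ \vdots & \ddots & \vdots \\ \langle X_{k,1}, A_G^{(s)}\rangle & \cdots & \langle X_{k,k}, A_G^{(s)}\rangle \end{pmatrix} F \\
&\simeq F^T \operatorname{Diag}(\pi)\, M^s F \cdot \|q_s\|^2_{\mathrm{KM}}\, n \\
&= F^T \operatorname{Diag}(\pi)\, F \Lambda^s \cdot \|q_s\|^2_{\mathrm{KM}}\, n \\
&= \Lambda^s \cdot \|q_s\|^2_{\mathrm{KM}}\, n.
\end{aligned}
$$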

A parallel calculation gives us

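$$
\begin{pmatrix} \langle \check{X}_{1,1}, J\rangle & \cdots & \langle \check{X}_{1,k}, J\rangle \\ \vdots & \ddots & \vdots \\ \langle \check{X}_{k,1}, J\rangle & \cdots & \langle \check{X}_{k,k}, J\rangle \end{pmatrix} = F^T \pi \pi^T F \cdot n^2 = e_1 e_1^T\, n^2,
$$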

where is the first standard basis vector. The final line comes since , being the left eigenvector associated to , is (up to scaling) the first row of . ∎
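Both identities used in this proof, $F^T \operatorname{Diag}(\pi) M^s F = \Lambda^s$ and $F^T \pi \pi^T F = e_1 e_1^T$, can be verified on a small example. The sketch below reuses a hypothetical reversible chain built from symmetric weights `W`; the helper `eigenbasis` and the specific numbers are ours and serve only as an illustration.

```python
import numpy as np

def eigenbasis(W):
    """Return (M, pi, F, eigenvalues) for the reversible chain M = D^{-1} W."""
    row_sums = W.sum(axis=1)
    M = W / row_sums[:, None]
    pi = row_sums / row_sums.sum()
    sqrt_pi = np.sqrt(pi)
    S = (sqrt_pi[:, None] * M) / sqrt_pi[None, :]   # symmetric conjugate of M
    eigvals, O = np.linalg.eigh(S)
    order = np.argsort(-eigvals)                    # eigenvalue 1 comes first
    return M, pi, O[:, order] / sqrt_pi[:, None], eigvals[order]

# Hypothetical symmetric weights defining the chain.
W = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 2.0, 1.0]])
M, pi, F, lam = eigenbasis(W)
D = np.diag(pi)

# Change of basis applied to Diag(pi) M^s for several walk lengths s.
walk_blocks = [F.T @ D @ np.linalg.matrix_power(M, s) @ F for s in range(5)]
# Change of basis applied to pi pi^T, the coefficient matrix of the all-ones constraints.
ones_block = F.T @ np.outer(pi, pi) @ F
e1 = np.zeros(3)
e1[0] = 1.0
```

Each entry of `walk_blocks` is diagonal with entries `lam ** s`, and `ones_block` is supported on its top-left entry, matching the two displays above after dividing out the factors of $n$.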

The remainder of the proof will amount to combining the constraints on the diagonal blocks of . As is PSD, is as well, so any PSD linear combination must satisfy

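$$
0 \le \frac{1}{n}\left\langle \sum_{s=0}^{m} c_s A_G^{(s)},\, \check{X}_{i,i} \right\rangle \simeq \sum_{s=0}^{m} c_s\, \lambda_i^s\, \|q_s\|^2_{\mathrm{KM}}.
$$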

We can show that no satisfying the given constraints exists, and thus that the SDP is infeasible, by producing constants that make the right-hand side of the above equation negative for at least one of . Notice also

$$
\sum_{s=0}^{m} c_s A_G^{(s)} = \sum_{s=0}^{m} c_s\, q_s(A_G) \triangleq f(A_G)
$$

for some polynomial of degree . Because is a scalar polynomial in , its eigenvalues are applied to those of , and we get when is nonnegative on . By Friedman’s Theorem [Fri08], this spectrum consists of the ‘trivial’ eigenvalue , together with remaining eigenvalues whose magnitudes with high probability are at most for any . In fact, it is not necessary even that . To see this, note that from our discussion above,

 ⟨f(AG−dnJ),ˇXi,i⟩