In this paper we study the planted clique problem, first introduced in [Jer92]. In this problem one observes an $n$-vertex undirected graph $G$ sampled in two stages: in the first stage, the graph is sampled according to an Erdos-Renyi graph $G(n, \frac{1}{2})$, and in the second stage, $k$ out of the $n$ vertices are chosen uniformly at random and all the edges between these $k$ vertices are deterministically added (if they did not already exist from the first-stage sampling). We call the $k$-vertex subgraph chosen in the second stage the planted clique $\mathcal{PC}$. The inference task of interest is to recover $\mathcal{PC}$ from observing $G$. The focus is on the asymptotic setting where both $k = k_n$ and $n$ tend to infinity, and the recovery should hold with probability tending to one as $n \to +\infty$ (w.h.p.).
It is a standard result in the literature that as long as $k \geq (2+\epsilon)\log_2 n$ for some fixed $\epsilon > 0$, $\mathcal{PC}$ is the only $k$-clique in $G$ w.h.p. (see e.g. [Bol01]). In particular, under this assumption $\mathcal{PC}$ is recoverable w.h.p. by the brute-force algorithm which checks, for every $k$-vertex subset of $G$, whether it induces a $k$-clique. Note that the exhaustive algorithm requires $\binom{n}{k}$ time to terminate, making it in principle not polynomial-time for the values of $k$ of interest. For any $k \geq (2+\epsilon)\log_2 n$, a relatively simple quasipolynomial-time algorithm, that is an algorithm with termination time $n^{O(\log n)}$, can also be proven to recover $\mathcal{PC}$ correctly w.h.p. as $n \to +\infty$ (see e.g. the discussion in [FGR17] and references therein). Note that a quasipolynomial termination time outperforms the termination time of the exhaustive search for $k = \omega(\log n)$.
The first polynomial-time (greedy) recovery algorithm of $\mathcal{PC}$ came out of the observation in [Kuč95] that when $k \geq C\sqrt{n \log n}$ for some sufficiently large $C > 0$, the $k$ highest-degree nodes in $G$ are the vertices of $\mathcal{PC}$ w.h.p. A fundamental work [AKS98] proved that a polynomial-time algorithm based on spectral methods recovers $\mathcal{PC}$ when $k \geq c\sqrt{n}$ for any fixed $c > 0$ (see also [FR10], [DM13], [DGGP14] and references therein). Furthermore, in the regime $k/\sqrt{n} \to 0$, various computational barriers have been established for the success of certain classes of polynomial-time algorithms, such as the Sum of Squares Hierarchy [BHK16], the Metropolis Process [Jer92] and statistical query algorithms [FGR17]. Nevertheless, no general algorithmic barrier such as NP-hardness has been proven for recovering $\mathcal{PC}$ when $k/\sqrt{n} \to 0$. The absence of polynomial-time algorithms together with the absence of an NP-hardness explanation in the regime where $k \geq (2+\epsilon)\log_2 n$ and $k/\sqrt{n} \to 0$ gives rise to arguably one of the most celebrated and well-studied computational-statistical gaps in the literature, known as the planted clique problem.
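The degree-based observation attributed to [Kuč95] above can be illustrated with a minimal Python sketch. The function name `top_degree_vertices` and the adjacency-set representation are assumptions made for this illustration only; the heuristic is expected to return the planted clique only in the large-clique regime stated above, not in general.

```python
def top_degree_vertices(adj, k):
    """Return the k vertices of largest degree.

    adj maps each vertex to the set of its neighbours (hypothetical
    representation chosen for this sketch). Ties are broken by vertex label
    so that the output is deterministic.
    """
    by_degree = sorted(adj, key=lambda v: (-len(adj[v]), v))
    return set(by_degree[:k])
```

On a graph where the clique vertices dominate the degree sequence, the call `top_degree_vertices(adj, k)` returns exactly the clique; when the clique is small relative to the degree fluctuations of the random part, there is no such guarantee.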
Computational gaps between what existential or brute-force methods promise and what computationally efficient algorithms achieve are a ubiquitous phenomenon in the analysis of algorithmic tasks in random environments. Such gaps arise for example in the study of several “non-planted” models like the maximum-independent-set problem in sparse random graphs [GSa], [COE11], the largest submatrix problem of a random Gaussian matrix [GL16], the diluted 4-spin-model [CGPR17] and the study of random k-SAT [MMZ05], [ACO08]. Recently, such computational gaps started appearing in “planted” inference algorithmic tasks in the statistics literature such as the high dimensional linear regression problem [GZ17a], [GZ17b], tensor PCA [BAGJ18], sparse PCA [BR13], the stochastic block model (see [Abb17], [BBH18] and references therein) and, of course, the planted clique problem described above. Towards the fundamental study of such computational gaps the following two methods have been considered.
Computational gaps: Average-Case Complexity Theory and the central role of Planted Clique
None of the above gaps has been proven to correspond to an NP-hard algorithmic task. Nevertheless, in correspondence with the well-studied worst-case NP-completeness complexity theory (see e.g. [Kar72]), some very promising attempts have been made towards building a similar theory for planted inference algorithmic tasks (see e.g. [BR13], [CLR17], [WBP16], [BBH18] and references therein). The goal of this line of research is to show that for two conjecturally computationally hard statistical tasks the existence of a polynomial-time algorithm for one task implies a polynomial-time recovery algorithm for the other. In particular, (computational hardness of) the latter task reduces to (computational hardness of) the former. Notably, the planted clique problem seems to play a central role in these developments, similar to the role the boolean-satisfiability problem played in the development of the worst-case NP-completeness theory. Specifically, in the context of statistical reductions, multiple statistical tasks in their conjecturally hard regime such as Sparse-PCA [BR13], submatrix localization [CLR17], RIP certification [WBP16], rank-1 Submatrix Detection and Biclustering [BBH18] have been proven to reduce to the planted clique problem in the regime $k = o(\sqrt{n})$.
Computational Gaps: A Spin Glass Perspective (Overlap Gap Property)
For several of the above-mentioned computational gaps, an inspiring connection has been drawn between the geometry of their solution space, appropriately defined, and their algorithmic difficulty. Specifically, it has been repeatedly observed that the appearance of a certain disconnectivity property in the solution space called the Overlap Gap Property (OGP), which originated in spin glass theory, coincides with the conjectured algorithmically hard phase of the problem. Furthermore, it has also been seen that in the absence of this property even greedy algorithms can exploit the smooth geometry and succeed.
The connection between algorithmic performance and OGP was initially made in the study of the celebrated example of random k-SAT (independently by [MMZ05], [ACORT11]) and has since been established for other “non-planted” models such as maximum independent set in random graphs [GSa], [RV14], as well as for “planted” models such as high dimensional linear regression [GZ17a], [GZ17b] and tensor PCA [BAGJ18]. Despite the fundamental nature of the planted clique problem in the development of average-case complexity theory, OGP has not been studied for the planted clique problem. The study of OGP in the context of the planted clique problem is the main focus of this work.
We start by providing some intuition on what OGP is in the context of “non-planted” problems. Motivated by the study of concentration of the associated Gibbs measures [Tal10] for low enough temperature, the OGP concerns the geometry of the near-optimal solutions. It has been observed that any two “near-optimal” solutions for many such models exhibit the disconnectivity property stating that their overlap, measured as a rescaled Hamming distance, is either very large or very small, which we call the Overlap Gap Property (OGP) [ACORT11], [ACO08], [MRT11], [COE11], [GSa], [RV14], [CGPR17], [GSb]. For example, the independent sets achieving nearly maximal size in sparse random graphs exhibit the OGP [GSa]. An interesting rigorous link also appears between OGP and the power of local algorithms. For example, OGP has been used in [GSa] to establish fundamental barriers on the power of a class of local algorithms called i.i.d. factors for finding nearly largest independent sets in sparse random graphs (see also [RV14] for a tighter later result). Similar negative results have been established in the context of the random NAE-K-SAT problem for the Survey propagation algorithm [GSb], of random NAE-K-SAT for the Walksat algorithm [COHH16] and of the max-cut problem in random hypergraphs for the family of i.i.d. factors [CGPR17]. As mentioned above, when OGP disappears the picture changes and, for many of these problems, greedy methods work successfully [ACO08], [AKKT02]. Importantly, because of this connection it is conjectured that the phase transition point for the presence of OGP corresponds to the onset of algorithmic hardness.
It is worth mentioning that other properties such as the shattering property and condensation, which have been extensively studied in the context of random constraint satisfaction problems such as random K-SAT, are topological properties of the solution space which have been linked with algorithmic difficulty (see e.g. [ACO08], [KMRT07] for appropriate definitions). We would like to point out that neither of them is identical to OGP. OGP implies the shattering property for trivial reasons, but the converse implication does not hold. For example, consider the model of random linear equations [ACOGM17], where recovery can be obtained efficiently via Gaussian elimination when the system is satisfiable. In [ACOGM17] it is established that OGP never appears, as the overlaps concentrate on a single point, but the shattering property does hold in a part of the satisfiability regime. Furthermore, OGP is also not the same as condensation. For example, in the solution space of random k-SAT, OGP appears for multioverlaps around ratio clauses to variables about (up to poly- factors) [GSb] which is far below condensation which appears around ratio [KMRT07]. It should be noted that in random k-SAT the onset of the apparent algorithmic hardness also occurs around [GSb], [Het16]. The exact connection between each of these properties and algorithmic hardness is an ongoing and fascinating research direction.
Recently the study of OGP has been initiated for “planted” problems as well, for example for the high dimensional linear regression problem [GZ17a], [GZ17b]. For this “planted” problem, the goal is to recover a hidden $k$-sparse binary vector from noisy linear observations of it. The strategy followed in these papers comprises two steps. First the task is reduced to an average-case optimization task associated with a natural empirical risk objective. Then, as a second step, a geometric analysis of the region of feasible solutions is performed and the OGP (or the lack of it) is established. Interestingly, in this line of work the “overlaps” considered are between the “near-optimal” solutions of the optimization task and the planted structure itself. In the present paper we follow a similar path to identify the OGP phase transition point for the planted clique problem.
Contribution and Discussion
In this paper we analyze the presence of OGP for the planted clique problem. We first turn the inference goal into an average-case optimization problem by adopting an “empirical risk” objective and then perform the OGP analysis on the landscape of near-optimal solutions. The first natural choice for the empirical risk is the log-likelihood of the recovery problem which assigns to any -subset the risk value . A relatively straightforward analysis of this choice implies that when the only -subset obtaining a non-trivial log-likelihood is the planted clique itself, since there are no other cliques of size in the graph w.h.p. as . In particular, this perspective of studying the near-optimal solutions and OGP fails to provide anything fruitful.
The Dense Subgraphs Landscape and OGP
We adopt the “relaxed” -Densest-Subgraph objective of the observed graph which assigns to any -subset the empirical risk , that is we would like to solve
where by we refer to the set of edges in the induced subgraph defined by . Notice that is equivalent to maximizing the log-likelihood of a similar recovery problem, the planted -dense subgraph problem where the edges of are only placed with some specific probability and the rest of the edges are still drawn with probability as before (see e.g. [BBH18] and references therein). Also, notice that, interestingly, does not depend on the value of ; that is, it is universal for all values of . Now the planted clique model we are interested in can be seen as the extreme case of the planted -dense subgraph problem when . In this work we analyze the overparametrized version of , the -densest-subgraph problem, where for some parameter the focus is on
while importantly the planted clique in remains of size . In this work we study the following question:
How much can a near-optimal solution of intersect the planted clique ?
The Overlap Gap Property (-OGP) for the -Densest subgraph problem would mean that near-optimal solutions of (sufficiently dense -subgraphs of ) have either a large or a small intersection with the planted clique (see Definition 1 below for more details on the notion).
To study the presence of -OGP we focus on the monotonicity of the overlap-restricted optimal values for
where Note that we define the overlaps beginning from as this level of overlap with is trivially obtained from a uniformly at random chosen -vertex subgraph.
Monotonicity and OGP
It is not hard to see that the monotonicity (or lack thereof) of might be linked with the presence or absence of -OGP. For example, assume that for some realization of the curve satisfies that for some ,
then -OGP holds. Indeed, choosing any with
we notice that (1) since any “dense” -subgraph with at least edges cannot overlap at exactly vertices with and (2) since there exist both zero and full overlap “dense” -subgraphs with that many edges. On the other hand, when the curve is monotonic with respect to the overlap , -OGP does not hold, by similar reasoning. Furthermore, note that when the curve is monotonically increasing the near-optimal solutions of have almost full intersection with (hence, they are considered relevant for recovery), while when it is monotonically decreasing the near-optimal solutions of have almost empty intersection with (hence, they are considered irrelevant for recovery).
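The link between a non-monotonic curve and an overlap gap can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the curve is passed as a list of `(overlap, value)` pairs sorted by overlap, the threshold is taken to be the smaller of the two endpoint values, and the function name `find_overlap_gap` is hypothetical.

```python
def find_overlap_gap(curve):
    """Look for overlaps that no sufficiently dense subgraph can realise.

    curve: list of (overlap, optimal_value) pairs sorted by overlap.
    The threshold T is the smaller endpoint value, so both the lowest and
    the highest overlap admit a subgraph with at least T edges. Returns
    (T, forbidden_overlaps) if some intermediate overlap falls strictly
    below T, and None otherwise (for instance, for a monotonic curve).
    """
    values = [value for _, value in curve]
    threshold = min(values[0], values[-1])
    forbidden = [z for z, value in curve[1:-1] if value < threshold]
    return (threshold, forbidden) if forbidden else None
```

For a curve with a valley, such as `[(0, 10), (1, 8), (2, 7), (3, 12)]`, the sketch reports the threshold 10 and the excluded overlaps 1 and 2; for a monotonic curve it reports no gap.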
Monotonicity of the First Moment Curve
Using an optimized union-bound argument (first moment method) we obtain a deterministic upper bound function (we call it first moment curve) such that for all overlap values ,
which is also provably tight, up-to-lower order terms, at the end-point (Proposition 1). For this reason, with the hope that provides a tight upper bound in (3), we perform a monotonicity analysis of .
We discover that when , and relatively small (including ) is non-monotonic satisfying a relation similar to (2) for some , while for relatively large it is decreasing. On the other hand, when for relatively small is non-monotonic satisfying a relation similar to (2) for some , while for relatively large it is increasing. In particular, an exciting phase transition is taking place at the critical size and high overparametrization . A summary is produced in Table 1. Theorem 1 and the discussion that follows provide exact details of the above statements.
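Table 1: Monotonicity behavior of the first moment curve.
| | Low Overparametrization | High Overparametrization |
|---|---|---|
| Algorithmically hard regime | non-monotonic | decreasing |
| Algorithmically tractable regime | non-monotonic | increasing |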
Assuming the tightness of in (3) we arrive at a conjecture regarding the -OGP of the landscape. In the apparently algorithmically hard regime the landscape either exhibits -OGP or is uninformative. On the other hand, in the algorithmically tractable regime for appropriately large there is no -OGP and the optimal solutions of have almost full overlap with . Of course this is only a prediction for the monotonicity of , as the function corresponds only to an upper bound. For this reason we establish results proving parts of the picture suggested by the monotonicity of .
Overlap Gap Property for
We establish that under the assumption , for some indeed -OGP holds for (notice ) in the regime. The result holds for all values of (up-to- factors) where the curve is proven non-monotonic (Theorem 2). Specifically, we establish that for some constants any -subgraph of which is “sufficiently dense” will either intersect in at most nodes or in at least nodes. Our proof is based on a delicate second moment method argument for dense subgraphs of Erdos Renyi graphs. We believe that the second moment method argument can be further improved to extend the result to the case for arbitrary . We leave this important step as an open question.
Greedy Local Search Success
We also establish that when and sufficiently large (much bigger than ) a greedy local search algorithm (algorithm (LSPC)) on the space of -vertex subgraphs of can exploit the monotonicity property and recover in polynomial-time the planted clique (Theorem 3). This is in accordance with the expectations in the literature that in the absence of -OGP greedy local search algorithms work successfully (see Introduction). The algorithm is of “gradient-descent” nature. Its success follows from a simple lemma: starting with a -subgraph chosen uniformly at random, the densest neighboring (with respect to the vertex-Hamming distance) -subgraph always has strictly higher intersection with the planted clique. The choice of stated in Theorem 3 coincides, up-to- factors, with the regime in which becomes increasing and appears necessary for the analysis of the algorithm to go through. In particular, the analysis does not work with simply when lies between , suggesting that over-parametrization is essential for local search to succeed for the planted clique problem, all the way down to .
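To illustrate the kind of greedy local search described above, the following Python sketch repeatedly moves to the densest neighboring vertex subset, where two subsets are neighbors if they differ by one swapped vertex. It is a generic sketch and not the algorithm (LSPC) analyzed in Theorem 3: the names `induced_edges` and `greedy_local_search`, the stopping rule and the tie-breaking are all assumptions made for illustration.

```python
def induced_edges(adj, subset):
    """Number of edges of the subgraph induced by subset (adj: vertex -> set of neighbours)."""
    s = set(subset)
    return sum(len(adj[v] & s) for v in s) // 2


def greedy_local_search(adj, start, max_steps=10_000):
    """Swap one vertex at a time, always moving to the densest neighbouring subset.

    Stops when no single swap strictly increases the number of induced edges
    (or after max_steps swaps). Returns the final subset and its edge count.
    """
    current = set(start)
    score = induced_edges(adj, current)
    for _ in range(max_steps):
        best_score, best_set = score, None
        for out_v in sorted(current):
            for in_v in sorted(set(adj) - current):
                candidate = (current - {out_v}) | {in_v}
                candidate_score = induced_edges(adj, candidate)
                if candidate_score > best_score:
                    best_score, best_set = candidate_score, candidate
        if best_set is None:  # local optimum: no improving swap exists
            break
        current, score = best_set, best_score
    return current, score
```

Each step scans every single-vertex swap, so it costs on the order of the subset size times the number of vertices in edge-count evaluations. This is affordable for small examples but is not optimized.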
The use of Overparametrization
The ability to choose is paramount in all the results described here. Had we opted for the arguably more natural choice , and focused solely on -vertex subgraphs, the monotonicity of the curve would exhibit a phase transition at the peculiar threshold (see Remark 2). To make this more precise, no landscape phase transition is suggested around the apparent algorithmic threshold if we focus on -vertex dense subgraphs (see for example the identical nature of Figure 1(a) and Figure 2(a) where is chosen near from below and above respectively). For this reason, the use of overparametrization is fundamental.
Significant inspiration for this overparametrization approach is derived from its recent success in “smoothening” bad local behavior in landscapes arising predominantly in the context of deep learning [SS17], [VBB18], [LMZ18] but also beyond it (e.g. [XHM18] in the context of learning mixtures of Gaussians). We consider this to be a novel conceptual contribution to this line of research on computational-statistical gaps with potentially various extensions.
-Dense Subgraphs of
for any where . The study of is a natural question in random graph theory which, to the best of our knowledge, remains not well-understood even for moderately large values of . For small enough values of , specifically , it is well-known w.h.p. as (originally established in [GM75]). On the other hand when , trivially follows and hence for any , w.h.p. as . If we choose for the sake of argument the following natural question can be posed;
How transitions from for to for ?
A recent result in the literature studies the case for [BBSV18] and establishes (it is an easy corollary of the main result of the aforementioned paper),
w.h.p. as $n \to +\infty$. Here $\log$ is the natural logarithm and $h^{-1}$ is the inverse of the (rescaled) binary entropy $h$, which is defined by
Notice that which means that the result from [BBSV18] agrees with the first order behavior of at “ very large” such as . The proof from [BBSV18] is based on a careful and elegant application of the second moment method, where special care is made to control the way “sufficiently dense” subgraphs overlap.
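For readers who want to evaluate the inverse entropy numerically, here is a minimal Python sketch. It assumes the binary entropy is taken with natural logarithms and that the inverse is taken on the branch $[1/2, 1]$, where the entropy decreases from $\log 2$ to $0$; both the branch and the function names are assumptions of this sketch rather than statements from the text.

```python
import math


def binary_entropy(x):
    """h(x) = -x ln x - (1 - x) ln(1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def binary_entropy_inverse(y, tol=1e-12):
    """Inverse of h restricted to [1/2, 1] (assumed branch), by bisection.

    On [1/2, 1] the entropy is strictly decreasing from ln 2 to 0, so for
    each y in [0, ln 2] there is a unique x in [1/2, 1] with h(x) = y.
    """
    if not 0.0 <= y <= math.log(2.0):
        raise ValueError("y must lie in [0, ln 2]")
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) > y:
            lo = mid  # entropy still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions, arguments close to $\log 2$ are mapped to values close to $1/2$, and arguments close to $0$ are mapped to values close to $1$.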
We study the behavior of for any , for . Specifically, we build and improve on the second moment method technique from [BBSV18] and establish tight results for first and second order behavior of when is a power of strictly less than . Specifically in Theorem 4 we show that for any for there exists some positive constant such that
w.h.p. as .
First notice that, as our result is established when is a power, it does not apply in the logarithmic regime. Nevertheless, it is in agreement with the result of [BBSV18] since for ,
Finally, by Taylor expanding around : for (Lemma 12), using our result we can identify the second order behavior of
w.h.p. as . See Corollary 1 for the exact statement. Note that the second order behavior is of a different order than in the case . We leave the analysis of the behavior of in the regime for between and as an intriguing open question.
Connection with -OGP
Notice that our result (6) holds for any , but in the discussion above we only claimed to use this result to prove -OGP for . This is because to establish -OGP using our non-monotonicity arguments and this result (for ) we need to make sure the error term in (6) is , which from our result can only be established if . The reason is that to transfer the non-monotonicity of the first moment curve to the non-monotonicity of the actual curve we need the error term in our approximation gap between and not to alter the non-monotonicity behavior of . We quantify the non-monotonicity via its “depth”, that is via
The latter “depth” quantity can be proven to grow with order similar to leading to the necessary order for the error term to make the argument go through.
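Throughout the paper we use standard big $O$ notations, e.g., for any real-valued sequences $\{a_n\}_{n \in \mathbb{N}}$ and $\{b_n\}_{n \in \mathbb{N}}$, $a_n = \Theta(b_n)$ if there exists an absolute constant $c > 0$ such that $\frac{1}{c} \leq |\frac{a_n}{b_n}| \leq c$; $a_n = \Omega(b_n)$ or $b_n = O(a_n)$ if there exists an absolute constant $c > 0$ such that $|\frac{a_n}{b_n}| \geq c$; $a_n = \omega(b_n)$ or $b_n = o(a_n)$ if $\lim_n |\frac{a_n}{b_n}| = +\infty$.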
For an undirected graph $G$ on $n$ vertices we denote by $V(G)$ the set of its vertices and by $E(G)$ the set of its edges. For a subset $S \subseteq V(G)$ we refer to the set of all vertices in $V(G)$ connected with an edge to each vertex of $S$ as the common neighborhood of $S$.
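As a small illustration of the common neighborhood just defined, the following Python sketch computes it from an adjacency-set representation. The function name and the convention for the empty subset (every vertex qualifies vacuously) are assumptions of this sketch.

```python
def common_neighborhood(adj, subset):
    """Vertices adjacent to every vertex of subset (adj: vertex -> set of neighbours)."""
    subset = set(subset)
    if not subset:
        # Assumed convention: with no constraints, every vertex qualifies.
        return set(adj)
    return set.intersection(*(adj[v] for v in subset))
```

Since the graph has no self-loops, no vertex of the subset itself belongs to the returned set.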
2 Main Results
2.1 The Planted Clique Model and Overlap Gap Property
We start with formally defining the Planted Clique Model and the recovery goal of interest.
Let with . We assume that both are known. All of our results focus on the regime where grows with as with .
The Generative Process
First sample an vertex undirected graph according to the Erdos-Renyi distribution. Then choose out of vertices of uniformly at random and connect all pairs of these vertices with an undirected edge, creating what we call the planted clique of size . We denote the resulting -vertex undirected graph by or for simplicity.
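The two-stage generative process above can be sketched in a few lines of Python. The function name `sample_planted_clique`, the adjacency-set representation and the default edge probability `p=0.5` are assumptions of this sketch, chosen for illustration.

```python
import itertools
import random


def sample_planted_clique(n, k, p=0.5, seed=None):
    """Sample an n-vertex graph with a planted k-clique.

    Returns (adj, clique), where adj maps each vertex to its set of
    neighbours and clique is the planted vertex set. The edge probability
    p of the first stage is an assumed parameter of this sketch.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Stage 1: each pair becomes an edge independently with probability p.
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    # Stage 2: choose k vertices uniformly at random and connect all pairs.
    clique = set(rng.sample(range(n), k))
    for u, v in itertools.combinations(sorted(clique), 2):
        adj[u].add(v)
        adj[v].add(u)
    return adj, clique
```

The recovery goal then amounts to reconstructing `clique` from `adj` alone.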
The Recovery Goal
Given one sample of recover the vertices of the planted clique .
2.2 The -Densest Subgraph Problem for
We study the landscape of the sufficiently dense subgraphs in . Besides we introduce an additional parameter with that will be optimized. The dense subgraphs we consider are of vertex size . We study overlaps between the sufficiently dense -dense subgraphs and the planted clique . Specifically we focus on the -densest subgraph problem on , defined in (7).
We define the -Overlap Gap Property of .
Definition 1 (-OGP).
exhibits -Overlap Gap Property (-OGP) if there exists with and such that:
(1) There exist -subsets with
(2) For any -subset with it holds,
Here, the first part of the definition ensures that there are sufficiently dense -subgraphs of with both “low” and “high” overlap with . The second condition ensures that any sufficiently dense -subgraph of will have either “low” overlap or “high” overlap with , implying gaps in the realizable overlap sizes.
To study -OGP we study the following curve. For every let
with optimal value denoted by . In words, corresponds to the number of edges of the densest -vertex subgraph with vertex-intersection with the planted clique of cardinality . Notice that, as explained in the previous section, we restrict ourselves to overlap at least since this level of intersection with is achieved simply by sampling uniformly at random a -vertex subgraph of .
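On very small instances the overlap-restricted optimum described above can be computed by exhaustive search, which may help build intuition for the curve. The sketch below assumes an adjacency-set representation; the function names are hypothetical, and the running time is exponential, so it is meant only for toy graphs.

```python
from itertools import combinations


def induced_edges(adj, subset):
    """Number of edges of the subgraph induced by subset."""
    s = set(subset)
    return sum(len(adj[v] & s) for v in s) // 2


def overlap_restricted_optimum(adj, clique, kbar, z):
    """Largest edge count over kbar-subsets sharing exactly z vertices with clique.

    Returns None when no such subset exists. Exhaustive search, so only
    suitable for tiny graphs.
    """
    inside = sorted(clique)
    outside = sorted(set(adj) - set(clique))
    if z < 0 or z > len(inside) or kbar - z < 0 or kbar - z > len(outside):
        return None
    best = None
    for a in combinations(inside, z):
        for b in combinations(outside, kbar - z):
            edges = induced_edges(adj, a + b)
            if best is None or edges > best:
                best = edges
    return best
```

Evaluating the function over all admissible overlaps gives one realization of the curve, whose shape can then be inspected directly.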
2.3 Monotonicity Behavior of the First Moment Curve
The following deterministic curve will be of distinct importance in what follows.
Definition 2 (First moment curve).
We define the first moment curve to be the real-valued function , where for ,
Here is defined in (5). We establish the following proposition relating and .
Proposition 1. Let with .
with high probability as .
Suppose for . For ,
with high probability as .
We explain here how Part (1) of Proposition 1 is established with a goal to provide intuition for the first moment curve definition. Fix some . For
we consider the counting random variable for the number of subgraphs with vertices, vertices common with the planted clique and at least edges;
Notice that first moment method, or simply Markov’s inequality, yields
In particular, if for some it holds we conclude that w.h.p. and hence all dense subgraphs have at most edges, that is
w.h.p. as . Therefore the pursuit of finding the tightest upper bound using this technique, consists of finding the .
Note that for any subset
the number of its induced edges follows a shifted Binomial distribution. In particular, we have
From this point on, standard identities connecting the tail of the Binomial distribution with the binary entropy function (see for example Lemma 11 below) yield the optimal choice to be
which yields Part (1) of Proposition 1. More details are in Section 4. Part (2) follows from a much more elaborate second moment method, the discussion of which we defer to Subsection 2.6 and Section 3.
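The first moment computation sketched above can be written out as follows. The notation is assumed for this illustration and need not match the paper's: $n$ is the number of vertices, $k$ the planted clique size, $\bar{k}$ the subgraph size, $z$ the overlap with the planted clique, $\gamma$ the edge threshold, $Z_{\gamma,z}$ the counting random variable, and the random edges are assumed to be present independently with probability $\frac{1}{2}$.

```latex
% Number of candidate subsets: choose z vertices inside the planted clique
% and \bar{k} - z vertices outside it. The \binom{z}{2} pairs inside the
% overlap are edges deterministically; the remaining pairs are independent
% fair coin flips (assumed edge probability 1/2).
\mathbb{E}\left[Z_{\gamma,z}\right]
  = \binom{k}{z}\binom{n-k}{\bar{k}-z}\,
    \mathbb{P}\left[\binom{z}{2}
      + \mathrm{Bin}\left(\binom{\bar{k}}{2}-\binom{z}{2},\tfrac{1}{2}\right)
      \geq \gamma\right],
\qquad
\mathbb{P}\left[Z_{\gamma,z} \geq 1\right] \leq \mathbb{E}\left[Z_{\gamma,z}\right].
```

Under these assumptions, any $\gamma$ for which the right-hand side tends to zero is, w.h.p., an upper bound on the number of edges of every $\bar{k}$-vertex subgraph with overlap $z$; the smallest such $\gamma$ gives the tightest bound this technique can provide.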
We study the monotonicity property of the first moment curve. We establish the following theorem, which proves that for an appropriate choice of the overparametrization level of , the first moment curve exhibits a monotonicity phase transition at the predicted algorithmic threshold .
Theorem 1 (Monotonicity Phase Transition at ).
Let with and an arbitrarily small constant. Suppose and furthermore . There exists a sufficiently large constant such that for the discretized interval the following are true for sufficiently large ,
for any the function is non-monotonic (Figure 1(a)).
for any the function is decreasing (Figure 1(b)).
for any the function is non-monotonic (Figure 2(a)).
for any the function is increasing (Figure 2(b)).
Furthermore, in the regime that the function is non-monotonic there are constants such that for and and large enough the following are true.
The proof of the Theorem can be found in Section 5.
In the special case where , it can be straightforwardly checked from Theorem 1 that exhibits a monotonicity phase transition at . In particular, for , the monotonicity of exhibits no phase transition around .
Note that the monotonicity analysis in Theorem 1 is performed in the slightly “shrunken” interval for some constant and arbitrarily small . The restriction is made purely for technical reasons as it allows for an easier analysis of the curve’s monotonicity behavior. We leave the monotonicity analysis near the endpoints as a topic for future work.
Theorem 1 suggests that there are four regimes of interest for the pair and the monotonicity behavior of . We explain here the implication of Theorem 1 under the assumption that is a tight approximation of .
Let us focus first on the regime where the size of the planted clique is . Assume first that the level of overparametrization is relatively small, namely , including the case . In that case the curve is non-monotonic and (9) holds (the case of Figure 1(a)). Now this implies that -OGP appears for the model. The reason is that under the tightness assumption, (9) translates to
Using that we conclude easily that for sufficiently small constant any -vertex subgraph with number of edges at least must have either at most intersection with or at least intersection with , and there exist both empty and full overlap dense subgraphs with at least that many edges.