Given an undirected graph, the decision problem of checking whether it contains a -clique, i.e., a subgraph of size which contains all the possible edges, is famously an NP-hard problem and appears in the list of twenty-one NP-complete problems in the early work of Karp [Karp1972]. It is a notoriously hard problem in the worst case. The best known approximation algorithm, due to Boppana and Halldórsson, has an approximation factor of [cite-bopanna]. The results of Håstad and Zuckerman show that no polynomial-time algorithm can approximate this to a factor better than for every , unless [DBLP:journals/eccc/ECCC-TR97-038, 10.1145/1132516.1132612]. This was improved by Khot et al. [10.1007/11786986_21], who showed that there is no algorithm which approximates the maximum clique problem (in the general case) to a factor better than for any constant assuming .
This led to studying this problem in the average case, i.e., we plant a clique of size in an Erdős-Rényi random graph , and study the ranges of the parameters and for which this problem can be solved. We give a brief survey in sec:related_work.
Another direction is to consider the problem in a restricted family of graphs. This allows us to design new and interesting algorithms with much better guarantees (as compared to the worst case or even the average case) and may help us get away from the adversarial examples which cause the problem to be hard in the first place. This way of studying hard problems falls under the area of “Beyond worst-case analysis”. We take this approach, and in this work we study the Planted Clique problem in a semi-random model. This is a model generated in multiple stages via a combination of adversarial and random steps. Such generative models have been studied in the early works of [MR1894527, cite-key-spencer, 10.1016/j.jalgor.2004.07.003]. We refer the reader to [DBLP:journals/corr/abs-2004-13978] and the references therein for a survey of the variety of graph problems which have been subjected to such a study.
In this section, we describe our semi-random model.
An instance of our input graph Clique is generated as follows,
We partition the vertex set into two sets, and with . We further partition arbitrarily into sets . We add edges between pairs in and pairs in for , independently with probability and of weight .
We add edges between pairs of vertices in such that the graph induced on is a clique. For the sake of brevity, we also add a self loop on each of the vertices of ; this makes the arithmetic cleaner (for example, the average degree of is now instead of ) and has no severe consequences. Note that we assume the subgraph is unweighted.
For each , we add edges of arbitrary non-negative weights between arbitrary pairs of vertices in , such that the graph induced on has the following property,
Or in other words, for each , the maximum average degree of the subgraph is at most for some and . Here for an edge denotes the weight of edge .
(Monotone adversary step) Arbitrarily delete any of the edges added in step:one and step:three.
Output the resulting graph.
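The generation steps above can be sketched in code. The following is a minimal illustration, not the construction used in our proofs: the names `n`, `k`, `r`, `p` and `delete_prob` are hypothetical stand-ins for the model parameters, the partition is taken round-robin (the model allows any partition), the parts are left empty inside (which trivially satisfies any upper bound on the maximum average degree in step:three), self loops are omitted, and the monotone adversary is replaced by random deletions.

```python
import itertools
import random


def generate_instance(n, k, r, p, seed=0, delete_prob=0.0):
    """Sample a toy instance of the semi-random planted clique model.

    Hypothetical interface: vertices 0..k-1 form the planted set S and the
    remaining n-k vertices are split into r parts.
    Returns (weights, S, parts), where weights maps frozenset({u, v}) -> weight.
    """
    rng = random.Random(seed)
    S = set(range(k))
    rest = list(range(k, n))
    # Step 1 (assumption): a round-robin partition of the non-clique vertices.
    parts = [rest[i::r] for i in range(r)]
    part_of = {v: i for i, part in enumerate(parts) for v in part}

    weights = {}
    removable = []  # edges the adversary is allowed to delete
    for u, v in itertools.combinations(range(n), 2):
        in_S = (u < k, v < k)
        if all(in_S):
            # Step 2: the planted set induces an unweighted clique.
            weights[frozenset((u, v))] = 1.0
        elif any(in_S) or part_of[u] != part_of[v]:
            # Step 1: unit-weight random edges leaving S or crossing parts.
            if rng.random() < p:
                weights[frozenset((u, v))] = 1.0
                removable.append(frozenset((u, v)))
    # Step 3 (assumption): no edges are added inside any part.
    # Step 4: random deletions stand in for the monotone adversary.
    for e in removable:
        if rng.random() < delete_prob:
            del weights[e]
    return weights, S, parts
```

For example, `generate_instance(n=30, k=6, r=3, p=0.5, delete_prob=0.3)` returns a graph in which vertices 0 to 5 form the planted clique. A real adversary may choose both the deletions and the edges inside the parts far less benignly than this sketch does.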
See also the figure below (fig:fig) for a pictorial representation of the model.
In this paper, we study the following problem: given a graph generated from the model described above (def:model), the goal is to recover the planted clique with high probability. We show that for a “large” range of the input parameters, we can indeed solve this problem.
Our algorithm is based on rounding a standard semidefinite programming relaxation of the Densest -subgraph problem, to which we add the following set of constraints,
For the sake of completeness, we state the complete SDP in app:sdp. This is a key difference as compared to the Densest -subgraph problem, and we will use the above set of constraints crucially in our analysis, much of which is inspired by [DBLP:journals/corr/abs-2004-13978]. We will describe this in more detail in sec:recovery.
1.2 Main Result
In this section, we describe our main result and its interpretation.
There exist universal constants and a deterministic polynomial time algorithm, which takes an instance of Clique where
satisfying , and , and recovers the planted clique with high probability (over the randomness of the input).
It is important to note that our results do not depend on the size of the subgraphs ’s but only on their count, i.e. parameter . Even our model is parameterized by but not by the sizes of ’s. That is, all the ’s can be of different sizes, but as long as they form a partition of and the average degree requirement of subgraphs (the one stated in step:three) is met, our results hold.
We make some observations about the above theorem (thm:main). First, there are two conditions for the algorithm to work:
, or to be verbose, should be “large enough”. (Footnote 2: This condition is more of a technical requirement than an interesting setting.)
The function (which is dependent on the input parameters) should lie in the range , or stated in other words, should be “small”.
A setting of input parameters when the value of is “small” is as follows:
We now compare our results to a few models already studied in the literature.
For , i.e. the case when there are no such ’s, or in other words when is nothing but a random graph on vertices and probability parameter , the lower bound on translates to . Thus in this case, our problem reduces to recovering the planted clique in a random graph and we get a threshold value of similar to the one already studied in the literature [planted-random-graph, 7782957].
Recall the model, DSReg introduced in the work of [DBLP:journals/corr/abs-2004-13978]. In this model, the subgraph is an arbitrary -regular graph of size , is a random graph with parameter , and the subgraph , has the following property,
Clearly, this is analogous to the case when we have only one such comprising the whole of such that the maximum average degree of any subgraph of is at most . Now when , our model reduces to the case when DSReg has a clique on (instead of a -regular subgraph). Note that this case can be solved using our algorithm efficiently and we can recover the planted clique i.e. w.h.p. This is a much stronger guarantee as compared to the one in [DBLP:journals/corr/abs-2004-13978] where they output a vertex set with a large intersection with the planted set (but not completely), with the same threshold on , i.e., .
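The maximum average degree condition used in these comparisons can be checked by brute force on toy graphs. The sketch below assumes the convention that the average degree of a vertex set is twice its total induced edge weight divided by its size; the exact normalization in our model may differ by the self-loop convention, and the function name `max_average_degree` is illustrative only.

```python
from itertools import combinations


def max_average_degree(vertices, weights):
    """Brute-force maximum, over non-empty vertex subsets U, of 2*w(E(U))/|U|.

    Exponential time: intended only for sanity checks on tiny graphs.
    `weights` maps frozenset({u, v}) -> non-negative edge weight.
    """
    vertices = list(vertices)
    best = 0.0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            # Total weight of the edges with both endpoints in the subset.
            total = sum(w for e, w in weights.items() if e <= chosen)
            best = max(best, 2.0 * total / size)
    return best
```

On a path with three vertices and unit weights this returns 4/3, attained by the whole path. A single edge of weight 5 gives 5, which shows how heavy edges are counted against the bound.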
1.3 Related Work
Random models for the clique problem.
For the Erdős-Rényi random graph , it is known that the largest clique has a size approximately [matula]. There are several poly-time algorithms which find a clique of size , i.e., with an approximation factor roughly [grimmett_mcdiarmid_1975]. It is a long-standing open problem to give an algorithm which finds a clique of size for any fixed . This conjecture has a few interesting cryptographic consequences as well [cite-key].
Planted models for the clique problem.
In the planted clique problem, we plant a clique of size in and study the ranges of for which this problem can be solved. The work by Kučera [10.1016/0166-218X(94)00103-K] shows that if , then the planted clique essentially consists of the vertices of the largest degree. Alon, Krivelevich, and Sudakov [planted-random-graph] give a spectral algorithm to find the clique when . There is also a nearly linear time algorithm which succeeds w.h.p. when for any [deshpande-montanari]. When , the work by Barak et al. rules out the possibility of a sum-of-squares algorithm succeeding.
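To illustrate the degree-based observation for very large planted cliques, here is a minimal sketch of the idea: pick the vertices of largest degree and check that they form a clique. It is not Kučera's exact procedure, and the helper names `top_degree_candidates` and `is_clique` are hypothetical.

```python
def top_degree_candidates(adj, k):
    """Return the k vertices of largest degree (ties broken by vertex id).

    `adj` maps each vertex to the set of its neighbours.
    """
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    return set(ranked[:k])


def is_clique(adj, vertices):
    """Check that every pair of the given vertices is adjacent."""
    vertices = list(vertices)
    return all(
        v in adj[u]
        for i, u in enumerate(vertices)
        for v in vertices[i + 1:]
    )
```

The clique check matters because the heuristic gives no guarantee by itself: when the planted clique is small compared to the typical degree fluctuations, the top-degree vertices need not be the planted ones.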
Semi-random models for related problems.
The semi-random model studied in this paper is inspired by a combination of two works. The first is the Feige-Kilian model [MR1894527], which is a very generic model. In this model, we plant an independent set on (), the subgraph is a random graph with parameter , while the subgraph can be an arbitrary graph. Then an adversary is allowed to add edges anywhere without disturbing the planted independent set. McKenzie, Mehta, and Trevisan [DBLP:conf/soda/McKenzieMT20] show that for , their algorithm finds a “large” independent set. And for the range , their algorithm outputs a list of independent sets, one of which is with high probability. Restrictions of this model have also been studied in the works of [DBLP:journals/eccc/Steinhardt17, 10.1145/3055399.3055491].
The second relevant model is studied by Khanna and Louis [DBLP:journals/corr/abs-2004-13978] for the Densest -subgraph problem. They plant an arbitrary dense subgraph on , the subgraph is a random subgraph, and the subgraph has a property (step:three of model construction) like the one of ’s of this paper. A monotone adversary can delete edges outside . Our algorithm, model, and analysis are inspired by their work. We study the problem in the case when is a clique on vertices instead of an arbitrary -regular graph. We get a full recovery of the clique in this paper instead of a “large” recovery of the planted set, for a “large” range of input parameters. A key result by Charikar [Charikar:2000:GAA:646688.702972] is used to prove our bounds.
The idea of using SDP based algorithms for solving semi-random models of instances has been explored in multiple works for a variety of graph problems, some of which are [6108205, Makarychev:2012:AAS:2213977.2214013, Makarychev:2014:CFA:2591796.2591841, mossel2016, 7523889, DBLP:conf/icalp/LouisV18, DBLP:conf/fsttcs/LouisV19, DBLP:conf/soda/McKenzieMT20, DBLP:journals/corr/abs-2004-13978].
1.4 Proof Idea
Our algorithm is based on rounding an SDP solution. The basic idea is to show that the vectors corresponding to the planted set are “clustered” together. This is shown by bounding the contribution of the vectors towards the SDP mass from the rest of the graph (i.e. everything except ). This allows us to exploit the geometry of vectors to recover a part of the planted clique. This is possible only because the subgraph is a sparse graph by construction (step:three) and the random bipartite subgraph (step:one) will not have dense sets either. Thus qualitatively the SDP should put most of the mass on the edges of . We show that for a large range of input parameters, this indeed happens.
The rest of the vertices can be recovered using a greedy algorithm. Note that for this to work, we crucially use the orthogonality constraints added for each non-edge pair (equation eq:extra_constraint) and this additional recovery step works only because the planted set is a clique and not an arbitrary dense subgraph.
Our algorithm (alg:cliques) is based on rounding an SDP (sdp:dks) and is robust against a monotone adversary (step:four of the model construction). This is an important point because many algorithms based on spectral or combinatorial methods are not always robust and do not work in the presence of such adversaries.
In this section, we bound the SDP mass corresponding to different subgraphs. The idea is to show that the SDP (sdp:dks) puts a large fraction of its total mass on . Most of the bounds closely follow those from [DBLP:journals/corr/abs-2004-13978]. A monotonicity argument can be used to ignore the action of the adversary. We omit the details in this version of the paper. We decompose the SDP objective into three corresponding parts and bound each of them separately (sec:bound_sdp) and then combine these bounds in the end (sec:putting). We also highlight the lemma where we bound the corresponding sum.
Note that the first and the last terms in equation eq:sum1 correspond to the subgraphs and respectively, while the second and the third terms (i.e. the contribution from the random subgraph) can be further split as follows.
Note that there are two kinds of terms in equation eq:sum2: one which depends only on the SDP constraints, and a second which uses the adjacency matrix of .
Let be a sized centered matrix (i.e. ) defined as follows. Here denotes the adjacency matrix of the input graph (before the action of monotone adversary).
This definition (def:matrix_b) allows us to rewrite the centered terms as follows.
2.1 Bounding the SDP terms
In this section, we show an upper bound on the various terms of the SDP objective (discussed above). First, we introduce some notation.
Remark 2.2 (Restatement of Notation from [DBLP:journals/corr/abs-2004-13978]).
We define probability distributions over finite sets . For a random variable (r.v.), its expectation is denoted by . In particular, we define the distribution which we use below. For a vertex set , we define a probability (uniform) distribution on the vertex set as follows. For a vertex , . We use to denote for clarity.
Lemma 2.3 (Restatement of Lemma 3.2 from [DBLP:journals/corr/abs-2004-13978] with the value ).
Lemma 2.4 (Restatement of Lemma 2.5 from [DBLP:journals/corr/abs-2004-13978]).
Note that for all ,
The first inequality just follows from the SDP constraint eq:sdp5 (non-negativity) and the second one follows from the constraint eq:sdp4. Summing up for all ,
Lemma 2.6 (Restatement of Lemma 2.6 from [DBLP:journals/corr/abs-2004-13978]).
Lemma 2.7 (Restatement of Corollary 2.8 from [DBLP:journals/corr/abs-2004-13978]).
There exist universal constants such that if , then
with high probability (over the randomness of the input). (Footnote 3: Note that the matrix is defined differently in the two papers; however, this is not a critical issue and the spectral norm bound still holds.)
With high probability (over the randomness of the input),
if , where are universal constants.
By a calculation similar to the one in lem:four, we can easily show that, ,
Summing up for and ,
Lemma 2.9 (Restatement of Proposition 3.12 from [DBLP:journals/corr/abs-2004-13978] with the value ).
For all ,
Summing up for all and using lem:six for each sum, we get,
2.2 Putting things together
In this section we combine the above bounds.
With high probability (over the randomness of the input),
if , where are universal constants.
Since our SDP is a maximization relaxation (see app:sdp), we have that,
where we used equations eq:sum2, eq:sum3, and the results from sec:bound_sdp. Rearranging, cancelling terms, and using the fact that the function is increasing, we get,
Let be a function over the input parameters defined as, for the sake of brevity.
3 Recovering the planted clique
In the previous section (sec:analysis), we showed that under some mild conditions on the input parameters (namely, when is “large” and is “small”) and with high probability (over the randomness of the input), we have,
We define a vertex set
where is a parameter to be chosen later.
We will next show that, for a cleverly chosen value of , the set is also a clique, and using the fact that the boundary of the subgraph is random, we further show that . Once we have established this, it is easy to recover the rest of the vertices of using a simple greedy heuristic. Before that, we recall two important technical results from [DBLP:journals/corr/abs-2004-13978].
Let be any feasible solution of the SDP and such that for all where , then for all , .
We defer the proof of lem:apx2 to app:proof since it is standard in the literature.
Lemma 3.2 (Restatement of Lemma 3.5 from [DBLP:journals/corr/abs-2004-13978]).
With high probability (over the randomness of the input),
The next lemma (lem:clique1) is perhaps the most important technical result of this paper.
For and , with high probability (over the randomness of the input), the subgraph is a clique and moreover, .
By applying lem:apx2 to the set , we get for all . We set such that . Thus we can set . It does satisfy the bounds on , namely when . By the SDP constraints (the extra added constraint, namely equation eq:extra_constraint), the subgraph induced on is a clique. To see this, consider any two vertices such that there is no edge between and . Then by the above SDP constraint, ; however, by the definition of the set T, . This is a contradiction, and thus is indeed a clique.
Next we prove that w.h.p. . By lem:size_T, when .
where we used the union bound in step 2 and the lower bound on in step 3. ∎
We now have all the ingredients to prove our main result.
Proof of thm:main.
In lem:clique1 we showed that ; we can now use a greedy strategy to recover the rest of . We iterate over all vertices in and add a vertex to our set if it has edges to all of . A calculation similar to the one shown above can be used to ensure that no vertex of enters in this greedy step. Also note that
Here is nothing but a normalization of for a cleaner representation. We summarize this in the algorithm below (alg:cliques). It is easy to see that the output of this algorithm, the set is nothing but the planted clique itself. ∎
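The greedy step used in the proof above can be sketched as follows. This is a minimal illustration under the assumption that the graph is given as a dictionary `adj` from vertices to neighbour sets and that `seed_set` plays the role of the set obtained from the SDP rounding; the function name `greedy_complete` is hypothetical and this is not a full transcription of alg:cliques.

```python
def greedy_complete(adj, seed_set):
    """Add every vertex outside `seed_set` that is adjacent to all of it.

    Membership is tested against the original seed set only, matching the
    description of the greedy step, not against the growing output set.
    """
    seed = set(seed_set)
    recovered = set(seed)
    for v in adj:
        if v not in seed and seed <= adj[v]:
            recovered.add(v)
    return recovered
```

If the seed is a large enough subset of the planted clique, every remaining clique vertex passes the test, while the analysis above is what rules out (with high probability) a vertex outside the clique being adjacent to the whole seed.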
In this paper, we looked at a semi-random model for the Planted Clique problem. We showed that the natural SDP relaxation of the -clique places the vectors corresponding to the planted clique “close” together. This allows us to recover a part of the planted solution. The rest of the solution can be recovered by using a greedy algorithm. Our model is inspired by the seminal work of Feige and Kilian [MR1894527]. Our algorithm and analysis closely follow the work on the Densest -subgraph problem by Khanna and Louis [DBLP:journals/corr/abs-2004-13978].
YK thanks Akash Kumar, Anand Louis, and Rameesh Paul for helpful discussions. YK was supported by the Ministry of Education, Government of India during his stay at IISc.
Appendix A Semidefinite Program
We use the following SDP relaxation to solve this problem.
It is easy to see that when is a clique, then the integral solution corresponding to does satisfy the above constraints. We state it now,
where is any unit vector and this feasible solution gives an objective value of .
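As a small sanity check of the feasibility claim, the sketch below builds the integral solution (a fixed unit vector on the planted set and the zero vector elsewhere) and verifies only the orthogonality constraints on non-edges. The remaining SDP constraints are not checked here, and the helper names `integral_solution` and `orthogonality_violations` are hypothetical.

```python
def integral_solution(n, S, dim=2):
    """Assign a fixed unit vector to vertices in S and the zero vector elsewhere."""
    unit = (1.0,) + (0.0,) * (dim - 1)
    zero = (0.0,) * dim
    return [unit if i in S else zero for i in range(n)]


def orthogonality_violations(vectors, adj):
    """List the non-adjacent pairs (i, j), i < j, whose vectors are not orthogonal."""
    n = len(vectors)
    violations = []
    for i in range(n):
        for j in range(i + 1, n):
            if j not in adj[i]:
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                if abs(dot) > 1e-12:
                    violations.append((i, j))
    return violations
```

When the chosen set is a clique, no two of its vertices form a non-edge, so the list of violations is empty. Choosing a set that is not a clique produces a violation at each of its missing edges, which is exactly why these constraints are useful for exact recovery.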
Appendix B Proof of lem:apx2
The proof of this lemma follows along the lines of Lemma 2.3 of [DBLP:journals/corr/abs-2004-13978]; however, we restate it for completeness.
We first introduce vectors and scalars (for all ) such that and . Using SDP constraints eq:sdp7 and eq:sdp8 we get,
Also note that, (13)
Since , using this in the above equation, we get