A New Algorithm for the Robust Semi-random Independent Set Problem

08/10/2018
by   Theo McKenzie, et al.

In this paper, we study a semi-random version of the planted independent set problem in a model initially proposed by Feige and Kilian, which has a large proportion of adversarial edges. We give a new deterministic algorithm that finds a list of independent sets, one of which, with high probability, is the planted one, provided that the planted set has size $k = \Omega(n^{2/3})$.


1 Introduction

The maximum independent set problem is, given a graph $G$, to find the largest set of mutually non-adjacent vertices. The associated decision problem, to determine whether a graph contains an independent set of size at least $k$, was one of Karp’s twenty-one problems originally proved to be NP-complete [13]. More recent hardness of approximation results [19] show that, for every $\epsilon > 0$, it is impossible to approximate maximum independent set to within $n^{1-\epsilon}$ in the worst case, unless $\mathrm{P} = \mathrm{NP}$.

The worst-case hardness of this problem has motivated the study of its average-case complexity and of its complexity in semi-random models that are intermediate between average-case analysis and worst-case analysis.

A classical model for the average-case analysis of graph algorithms is the Erdős-Rényi model $G_{n,p}$, where each edge is independently present with some probability $p$. In such a model, the largest independent set has size about $2\log_b n$ with high probability, for $b = 1/(1-p)$ and $n$ the number of vertices in the graph [15]. A simple greedy algorithm finds, with high probability, an independent set of size about $\log_b n$. It has been a long-standing open problem to give an algorithm that finds an independent set of size $(1+\epsilon)\log_b n$.
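The greedy heuristic mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`sample_gnp`, `greedy_independent_set`, `is_independent`) and the adjacency-set representation are hypothetical choices, not taken from the paper, and vertices are scanned in index order.

```python
import random


def sample_gnp(n, p, rng):
    """Sample an Erdos-Renyi graph G(n, p) as a list of adjacency sets."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_independent_set(adj):
    """Scan vertices in index order; keep a vertex if no kept vertex is adjacent to it."""
    chosen = []
    blocked = set()  # neighbours of vertices chosen so far
    for v in range(len(adj)):
        if v not in blocked:
            chosen.append(v)
            blocked |= adj[v]
    return chosen


def is_independent(adj, vertices):
    """Check that no two of the given vertices are adjacent."""
    vs = set(vertices)
    return all(not (adj[v] & vs) for v in vs)
```

The output is always a maximal independent set; the size guarantee quoted in the text is a probabilistic statement about random inputs and is not checked by this sketch.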

Another classical model is the “planted independent set” model, in which one starts from a $G_{n,p}$ random graph (from now on we will mostly discuss the $p = 1/2$ case, although other values of $p$ are also interesting), picks a random set of $k$ vertices, and removes all existing edges among those vertices, turning them into an independent set. If $k \gg \log n$, the selected set of vertices (which we will call the “planted” independent set of the graph) is, with high probability, the unique maximum independent set of the graph. In this case, the problem of finding the largest independent set in the graph coincides with the “recovery” problem of identifying the selected set of vertices.

When the size of the planted independent set is $k = \Omega(\sqrt{n \log n})$, choosing the $k$ vertices of lowest degree is sufficient to find the hidden independent set [14]; Alon, Krivelevich and Sudakov [1] give a spectral algorithm to find the planted independent set with high probability when $k = \Omega(\sqrt{n})$. It is an open problem whether there is a polynomial time algorithm that finds the planted independent set with high probability in the regime $\log n \ll k \ll \sqrt{n}$. Recently Barak, Hopkins, Kelner, Kothari, Moitra, and Potechin established that this is impossible for sum of squares algorithms [2].
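As an illustration of the lowest-degree heuristic in the purely random planted model, here is a minimal Python sketch. The function names and the graph representation are assumptions made for this example; the parameters used in any run are the reader's choice, and the heuristic is only expected to succeed when the planted set is large enough for the degree gap to dominate random fluctuations.

```python
import random


def planted_independent_set_graph(n, k, p, rng):
    """Sample G(n, p), then remove every edge inside a random k-subset."""
    planted = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if u in planted and v in planted:
                continue  # no edges inside the planted set
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, planted


def lowest_degree_vertices(adj, k):
    """Return the k vertices of smallest degree (ties broken by index)."""
    order = sorted(range(len(adj)), key=lambda v: (len(adj[v]), v))
    return set(order[:k])
```

Planted vertices lose their edges to the rest of the planted set, so their expected degree drops by roughly $pk$; the heuristic works when this drop exceeds the typical degree deviation. It relies entirely on that statistical signature, which an adversary can erase.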

When studying simple generative models for graphs, such as $G_{n,p}$ or planted independent set models, there is a risk of coming up with algorithms that perform well in the model, but that are an overfit for it. For example, picking the $k$ vertices of lowest degree is a good way to find a size-$k$ independent set in the planted model (if $k = \Omega(\sqrt{n \log n})$) but it would usually not perform well in practice.

In order to validate the robustness of average-case analyses of algorithms, there has been interest in the study of semi-random generative models, in which a graph is generated via a combination of random choices and adversarial choices. Even though no simple probabilistic model can capture all the subtle properties of realistic graph distributions, realistic distributions can be captured by semi-random models if the way in which the realistic distribution differs from the simple probabilistic model is interpreted as the action of the adversary.

Moreover, by studying semi-random models we gain insight into what part of a problem governs its hardness. If an algorithm in a random graph solves a problem in polynomial time with high probability, then we can ask how adversarial we can make our graph while still solving it in polynomial time with high probability. For example, Feige and Kilian believed the planted independent set should be recoverable without regard for the edges that do not touch the vertices of the independent set, so sought algorithms that could find the maximum independent set when these edges were made adversarial [8]. In order to gain insight on what instances of unique games can be difficult, Kolla, Makarychev, and Makarychev created algorithms that solved unique games with high probability in a model where out of 4 given steps of creating a satisfiable instance, only 1 is randomized [12].

Semi-random generative models for graphs were first introduced by Blum and Spencer [3], and then further studied by Feige and Kilian [8].

In the Feige-Kilian model, one generates a graph with a planted size-$k$ independent set as follows: a set $S$ of $k$ vertices is chosen at random; then, edges from $S$ to $V \setminus S$ are selected as in a $G_{n,p}$ model; finally, an adversary is allowed to choose arbitrarily the edges within vertices in $V \setminus S$, and is also allowed to add edges from $S$ to $V \setminus S$. Note that, when $k < n/2$, the planted set need not be a largest independent set in the graph since, for example, the adversary could choose to create an independent set of size $n - k$ among the vertices in $V \setminus S$.

Feige and Kilian studied the complexity of finding an independent set of size at least $k$ in the graph arising from their model. They prove that, for $k = \alpha n$ with $\alpha$ constant, they can solve the problem in polynomial time if $p \geq (1+\epsilon)\ln n/(\alpha n)$ for constant $\epsilon > 0$, and, if $p \leq (1-\epsilon)\ln n/(\alpha n)$, the problem is not solvable in polynomial time, unless $\mathrm{NP} \subseteq \mathrm{BPP}$. Since then, progress has been made on weaker monotone semirandom versions of the problem [5, 6, 9]. Moreover, Coja-Oghlan generalized Feige and Kilian’s algorithm to sparse subgraphs [7] as opposed to independent sets. However, prior to this paper, there had been no algorithm that improved on the size of the independent set in the Feige-Kilian model.

Steinhardt [18] studied the recovery problem (that is, the problem of finding the vertices of the planted independent set) in a slight restriction of the Feige-Kilian model with $p = 1/2$, in which the adversary can choose edges arbitrarily within $V \setminus S$ but cannot add edges between $S$ and $V \setminus S$. Although the problem of recovering $S$ seems to be information-theoretically impossible when , Steinhardt studies a “list-decoding” version of the problem in which the goal is to output a collection of sets one of which is the planted independent set (or to output the planted independent set given a random vertex sampled from $S$). Steinhardt shows that the problem has an information-theoretic threshold when $k$ is order $\sqrt{n}$, and, along with Charikar and Valiant, gives a polynomial time recovery algorithm when $k = \tilde{\Omega}(n^{2/3})$ [4].

In this paper, we provide a deterministic polynomial time algorithm for the recovery problem in the Feige-Kilian model (in which the adversary is allowed to add edges between $S$ and $V \setminus S$) that works with high probability for $k = \Omega(n^{2/3})$. This is the only known bound for a model this general since Feige and Kilian’s original randomized algorithm for an independent set of size $\alpha n$, where $\alpha$ is a constant. We also believe that our methods are more concise and easier to explain, and we are hopeful they will be used for future robust algorithms.

While Steinhardt relied on spectral techniques, we use semidefinite programming (SDP). The improved robustness comes from the robustness of the SDP technique, and the logarithmic gain comes from an analysis of the SDP via the Grothendieck inequality, which tends to give tighter information about the properties of random graphs than spectral bounds obtained from matrix Chernoff bounds or related techniques.

A natural way to apply SDP techniques in this setting would be to solve an SDP relaxation of the maximum independent set problem on the given graph. This, however, would not work, because the adversary could create a large set with few edges in $V \setminus S$, and the optimum of the relaxation could be related to this other set and carry no information about $S$.

Instead, we use a “crude SDP” (C-SDP), a technique used by Kolla, Makarychev and Makarychev in their work on semi-random Unique Games [12] and by Makarychev, Makarychev and Vijayaraghavan [16] in their later work on semi-random cut problems. Crude SDPs are not relaxations of the problem of interest and, in particular, there is no standard way of mapping an intended solution (in our case, the set $S$) to an associated canonical feasible solution of the SDP. Rather, the crude SDP is designed in such a way that the optimal solution reveals information about the planted solution.

Our crude SDP will associate a unit vector to each vertex, with the constraint (like in the theta function relaxation) that adjacent vertices are mapped to orthogonal vectors; the goal of the SDP is to minimize the sum of distances-squared among all pairs of vectors. The point of the analysis will be that, with high probability over the choice of the graph, and for every possible choice of the adversary, the optimal solution of the SDP will map the vertices in $S$ to vectors that are fairly close to one another, and then $S$ can be recovered by looking for sets of vertices whose associated vectors are clustered.

We prove this via an argument by contradiction: if a solution does not cluster the vectors corresponding to the vertices in $S$ close together, then we can construct a new feasible solution of lower cost, meaning that the original solution was not optimal. To bound the cost of the new solution we need to understand the sum of distances-squared, according to the original solution, between pairs of vertices in $S \times (V \setminus S)$. This is where we use the Grothendieck inequality: to reduce this question to a purely combinatorial question that can be easily solved using Chernoff bounds and union bounds.

We can interpret the crude SDP as the relaxation of a global problem that gives us the local property we are looking for. Call a partition of . The C-SDP for independent set is a relaxation of

The C-SDP for unique games of colors can be interpreted as a relaxation of

where corresponds to the number of satisfied constraints in coloring .

Finally, the C-SDP of the small set expansion for small set of size is equivalent to

In each of these scenarios, by using the C-SDP we are arguing that with high probability,

  1. The C-SDP gives a tight relaxation of the global property.

  2. The global property gives the local property we are looking for.

Our results are slightly easier to describe if we refer to the independent set problem instead of the clique problem; moreover this is the setting in the problem as originally stated by Feige and Kilian. The two settings are interchangeable by simply changing “edges” with “non-edges”.

2 Crude Semidefinite Programming

An independent set of a graph $G = (V, E)$ is a subset $S \subseteq V$ such that the subgraph induced by $S$ does not contain any edges. We form our semi-random graph as follows, using the same formulation as Feige and Kilian [8]. Here $|V| = n$.

  1. An adversary chooses a set $S \subseteq V$ such that $|S| = k$.

  2. Create a graph $G' = (V, E')$, where each pair of vertices $(u, v) \in S \times (V \setminus S)$ forms an edge independently with probability $1/2$.

  3. The adversary can add any edge $(u, v)$ arbitrarily as long as $\{u, v\} \not\subseteq S$. Our graph will be of the form $G = (V, E)$, where $S$ is an independent set in $G$ and $E' \subseteq E$.

This gives us a graph that is arbitrary on $V \setminus S$, has no edges within $S$, and is lower bounded by Bernoulli random variables on the boundary $S \times (V \setminus S)$.
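The three generation steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `semi_random_graph`, the representation of edges as `frozenset` pairs, and the way the adversary is modelled (an explicit list of extra edges supplied by the caller) are all assumptions made for the example.

```python
import random


def semi_random_graph(n, planted, rng, adversary_edges=(), p=0.5):
    """Build the edge set of a semi-random instance on vertices 0..n-1.

    planted         -- the set S chosen in step 1 (supplied by the caller)
    adversary_edges -- extra edges for step 3; none may lie inside S
    """
    planted = set(planted)
    edges = set()
    # Step 2: each boundary pair (u in S, v outside S) is an edge with probability p.
    for u in sorted(planted):
        for v in range(n):
            if v not in planted and rng.random() < p:
                edges.add(frozenset((u, v)))
    # Step 3: the adversary adds arbitrary edges, but never inside S.
    for u, v in adversary_edges:
        if u == v:
            raise ValueError("self-loops are not allowed")
        if u in planted and v in planted:
            raise ValueError("the adversary may not add edges inside the planted set")
        edges.add(frozenset((u, v)))
    return edges
```

Because step 3 only ever adds edges, the random boundary edges from step 2 always survive in the final graph, which is the monotonicity property the analysis relies on.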

Our goal is to find the set $S$. Our main result is the following.

Theorem 2.1.

There exists a constant $c$ such that, when $k \geq c\, n^{2/3}$, we can, with high probability, return at most $n$ candidate solutions such that one is the original independent set. In particular, we are able to find an independent set of size at least $k$.

When considering this problem, we compare it to other random models where the adversary has a large amount of power, such as [12] or [17], which focus on unique games and graph cuts, respectively.

One issue is that the amount of randomness we can use is much less than in these models. In the cut semi-random model, the adversary has control over only half of the edges. In the independent set model, of the edges are adversarial.

Another issue becomes evident when we consider the following relaxation of the maximum independent set problem, which is a formulation of the Lovász theta function. This function can retrieve the independent set in more restricted planted models [9].

The intended solution to this problem is to map all vectors in to the same vector of norm and all other vectors to 0. However the adversary can arrange the graph so that, in the optimum of the theta function, the vectors corresponding to , and to many vertices of , are mapped to zero, and only a small subset of, for example, vertices of are mapped to non-zero vectors. Then such a solution does not seem to give us any useful information to find .

Instead of taking a traditional SDP relaxation, our idea is to take an SDP for which we would expect to have the vectors corresponding to the vertices of $S$ clustered together away from the other vertices. The way we do this is by using a “crude SDP” (C-SDP), an idea used in [12] and [16].

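Our crude SDP is as follows.

$$
\begin{aligned}
\text{minimize} \quad & \sum_{u, v \in V} \|x_u - x_v\|^2 \\
\text{subject to} \quad & \langle x_u, x_v \rangle = 0 \quad \forall (u, v) \in E, \\
& \|x_u\|^2 = 1 \quad \forall u \in V.
\end{aligned}
$$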

This is somewhat similar to Lovász Theta Function but is not a relaxation of the maximum independent set. In fact, even

will be much larger than , where is the vector corresponding to in the optimal solution of the C-SDP. However, all vectors have norm , and it seems reasonable to imagine that the vectors corresponding to will be close.
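To make the constraints and objective of the crude SDP concrete, the following Python sketch checks feasibility of a candidate embedding and evaluates its cost. It does not solve the SDP. The helper names are hypothetical, the objective is summed over unordered pairs (the ordered-pair sum is exactly twice this), and `clustered_solution` is just one hand-built feasible point in which all of $S$ shares a single vector and every other vertex gets its own basis vector (it assumes $S$ is non-empty).

```python
from itertools import combinations


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def is_feasible(vectors, edges, tol=1e-9):
    """Unit norm for every vertex, orthogonality across every edge."""
    unit = all(abs(dot(x, x) - 1.0) <= tol for x in vectors)
    orth = all(abs(dot(vectors[u], vectors[v])) <= tol for u, v in edges)
    return unit and orth


def objective(vectors):
    """Sum of squared distances over unordered pairs of vertices."""
    return sum(sq_dist(x, y) for x, y in combinations(vectors, 2))


def basis_vector(i, dim):
    return [1.0 if j == i else 0.0 for j in range(dim)]


def clustered_solution(n, planted):
    """Map all of S to e_0 and each remaining vertex to its own basis vector."""
    planted = set(planted)
    vectors, next_axis = [], 1
    for v in range(n):
        if v in planted:
            vectors.append(basis_vector(0, n))
        else:
            vectors.append(basis_vector(next_axis, n))
            next_axis += 1
    return vectors
```

With $n$ vertices and $|S| = k$, the clustered point costs $2\left(\binom{n}{2} - \binom{k}{2}\right)$, compared with $2\binom{n}{2}$ when all vectors are mutually orthogonal, so collapsing an independent set onto one vector is feasible and strictly lowers the objective.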

The C-SDP will give us the following clustering.

Lemma 2.2.

With high probability,

We will prove this in the following section.

3 C-SDP Clustering

Lemma 3.1.

Call the vector corresponding to for the optimal solution to our C-SDP. Then

Proof.

Consider the feasible solution to the SDP obtained by taking the optimal solution, then setting all vectors corresponding to $S$ to a single unit vector orthogonal to all other vectors. This remains feasible because $S$ contains no edges. We keep all vectors corresponding to vertices in $V \setminus S$ the same as in the optimal solution. Call $x'_u$ the vector for our new adjusted solution corresponding to vertex $u$. We then have

Our next step is to show that the second sum in Lemma 3.1 is large. Towards this end we show the following.

Lemma 3.2.

With high probability, for the initial random choice of edges ,

and

Proof.

First, let’s break the terms down into inner products. Call

We obtain that

where is the entry of the adjacency matrix corresponding to the vertex pair and the edge set .

We evaluate each of these two parts separately. For the first part, note that is the sum of independent Bernoulli 0-1 random variables, each edge appearing independently with probability . Therefore by Chernoff bounds,

We now turn towards the second half of the bound on . We define a new matrix such that

We have

We then have that there exists a constant such that

by Grothendieck’s inequality ([10], see for example [11]). For a fixed set of ,

Each entry corresponding to is a Bernoulli random 0-1 variable. Therefore, once again by Chernoff bounds

There are possibilities for assignments of . We use a union bound to say that

Therefore

Set and . Then with high probability

Proof of Lemma 2.2.

By combining Lemma 3.1 and Lemma 3.2

Note that this argument works even when the adversary adds edges to the boundary, as the vertex pairs corresponding to will approximate half the overall sum with high probability regardless of whether the other vertex pairs correspond to edges or not. Our argument only requires that the vertex pairs corresponding to correspond to edges, meaning we cannot remove edges from the boundary, but we can add them.

4 Algorithm Analysis and Recovery

Proof of Theorem 2.1.

Our algorithm is as follows:

  • Solve the crude SDP.

  • For each vector, create a set of all vectors that are distance less than 1 from the original vector. Namely we take the ball of radius one around the vector and list all vectors inside the ball.

  • Greedily add to each set every vertex that has no edge to any vertex already in the set.

  • Return the largest such set.
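The rounding steps listed above can be sketched in Python as follows. The sketch assumes the SDP has already been solved and its solution is passed in as a list of vectors; no solver is called. All function names are hypothetical. One detail is an assumption of this sketch rather than something stated in the text: candidate sets that turn out not to be independent are discarded before the list is returned.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def is_independent(vertices, adj):
    """True if no vertex of the set has a neighbour inside the set."""
    return all(not (adj[v] & vertices) for v in vertices)


def extend_greedily(start, adj):
    """Add, in index order, every vertex with no neighbour in the current set."""
    current = set(start)
    for v in range(len(adj)):
        if v not in current and not (adj[v] & current):
            current.add(v)
    return current


def candidate_sets(vectors, adj, radius=1.0):
    """One candidate per vertex: ball of the given radius, then greedy extension."""
    n = len(vectors)
    candidates = set()
    for u in range(n):
        ball = {v for v in range(n)
                if sq_dist(vectors[u], vectors[v]) < radius ** 2}
        extended = frozenset(extend_greedily(ball, adj))
        # Assumption of this sketch: drop candidates that are not independent.
        if is_independent(extended, adj):
            candidates.add(extended)
    # Largest candidates first; the first entry is "the largest such set".
    return sorted(candidates, key=len, reverse=True)
```

Here `adj` is a list of adjacency sets indexed by vertex. Returning the whole sorted list, rather than only its first element, matches the list-of-candidates formulation of Theorem 2.1.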

Using Lemma 2.2, we have that with high probability there exists a constant such that

for the optimal solution of our SDP. Therefore there is some vertex such that

By Markov’s inequality we have

There exists a constant such that when , there will be some vertex where at least vertices in lie within a ball of radius 1 around . For any vertex , the probability shares no edges with these elements is upper bounded by the probability that it shares at most edges with . The probability this happens is at most , by Chernoff bounds. Taking a union bound over all such vertices,

We find that, with high probability, all vertices in $V \setminus S$ share at least one edge with the elements from $S$ in this ball. Hence, no vertices outside $S$ will be included in the set corresponding to $u$, as orthogonal unit vectors are at distance $\sqrt{2}$ from each other. Since only elements from $S$ will be present, the remaining vertices of $S$ will be added during the greedy step. Therefore, when the algorithm terminates, this set will contain the original planted independent set with high probability. ∎
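For reference, the distance between the vectors of two adjacent vertices follows directly from the two C-SDP constraints (unit norm, and orthogonality across edges). Writing $x_u$ and $x_v$ for the vectors of adjacent vertices $u$ and $v$ (notation assumed here):

```latex
\[
\|x_u - x_v\|^2
  = \|x_u\|^2 + \|x_v\|^2 - 2\langle x_u, x_v\rangle
  = 1 + 1 - 0
  = 2,
\qquad\text{so}\qquad
\|x_u - x_v\| = \sqrt{2} > 1 .
\]
```

In particular, a neighbour of $u$ can never lie in the ball of radius 1 around $x_u$.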

Corollary 4.1.

If edges in are added to with probability as opposed to , with high probability we can recover a set of size at least when .

Proof.

The proof is exactly the same as above, except we instead show that

The rest of the argument follows similarly. ∎

This means that our method is only of use when

If we are given a random vertex of $S$, as in the model of [18], then we can recover the original set exactly.

Theorem 4.2.

If we are given a random vertex of , then with high probability, we can recover the set when .

Proof.

We add the following steps to the algorithm:

  • Remove all independent sets of size less than from our list.

  • For sets on the list, if and then remove .

  • For sets on the list, if and , remove both and from the list.

  • If our random vertex is in exactly 1 set on our list, return this set. Otherwise, return FAIL.

First we will show that with high probability remains on the list by the end of the algorithm. If is on the list before the first removal step, it is necessarily maximal by the greedy step. Therefore if , then such that . For to be an independent set, there can be no edges from to , meaning with high probability , and is not removed in the second removal step.

For on the list immediately after the second removal step, we have so . If , then there is such that and .

For any such set , the probability that is an independent set is at most . The probability that such an independent set exists is

meaning that survives the second removal with high probability.

If represents the number of unique independent sets on the list at the end of the algorithm, then by the inclusion-exclusion principle,

as there are elements overall. From this we can see that we must have

for large enough . The number of elements of that will appear in other independent sets in our list is at most . Therefore the probability a random vertex of is in any of the other sets remaining is at most

. ∎

References

  • [1] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms, 13(3-4): pp. 457-466, 1998.
  • [2] B. Barak, S. B. Hopkins, J. Kelner, P. Kothari, A. Moitra, and A. Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. Foundations of Computer Science, 428-437, 2016.
  • [3] A. Blum and J. H. Spencer. Coloring random and semi-random k-colorable graphs. Journal of Algorithms, 19(2) pp. 204-234, 1995.
  • [4] M. Charikar, J. Steinhardt, and G. Valiant. Learning from untrusted data. Proceedings of the Forty-Ninth Annual ACM Symposium on Theory of Computing, 2017.
  • [5] Y. Chen, S. Sanghavi, and H. Xu. Improved graph clustering. IEEE Transactions on Information Theory, 60(10) pp. 6440-6455, 2014.
  • [6] A. Coja-Oghlan. Coloring semirandom graphs optimally. Automata, Languages and Programming, pp. 71-100, 2004.
  • [7] A. Coja-Oghlan. Solving NP-hard semirandom graph problems in polynomial expected time. Journal of Algorithms, 62(1) pp. 19-46, 2007.
  • [8] U. Feige and J. Kilian. Heuristics for semirandom graph problems. Journal of Computer and System Sciences 63(4) pp. 639-671, 2001.
  • [9] U. Feige and R. Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures and Algorithms 16(2) pp. 195-208, 2000.
  • [10] A. Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques. Bol. Soc. Mat. São Paulo, 8 pp. 1-79, 1953.
  • [11] S. Khot and A. Naor. Grothendieck-type inequalities in combinatorial optimization. Communications on Pure and Applied Mathematics 65(7) pp. 992-1035, 2012.
  • [12] A. Kolla, K. Makarychev, and Y. Makarychev. How to play unique games against a semi-random adversary. In Proceedings of the Fifty-Second Annual IEEE Symposium on Foundations of Computer Science, pp. 443-452, 2011.
  • [13] R. M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, R. E. Miller and J. W. Thatcher (eds.), Plenum Press, New York, pp. 85-103, 1972.
  • [14] L. Kučera. Expected complexity of graph partitioning problems, Discrete Applied Mathematics 57(2-3) pp. 193-212, 1995.
  • [15] D. Matula. The largest clique size in a random graph. Technical Report, Department of Computer Science, Southern Methodist University, 1976.
  • [16] K. Makarychev, Y. Makarychev, and A. Vijayaraghavan. Approximation algorithms for semi-random graph partitioning problems. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pp. 367-384, 2012.
  • [17] K. Makarychev, Y. Makarychev, and A. Vijayaraghavan. Constant factor approximation for balanced Cut in the PIE model. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pp. 41-49, 2014.
  • [18] J. Steinhardt. Does robustness imply tractability? A lower bound for planted clique in the semi-random model. arXiv:1704.05120, 2017.
  • [19] D. Zuckerman. Linear degree extractors and the inapproximability of max clique and chromatic number. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pp. 681-690, 2006.