Covering with Clubs: Complexity and Approximability

06/04/2018 ∙ by Riccardo Dondi, et al. ∙ Université Paris-Dauphine

Finding cohesive subgraphs in a network is a well-known problem in graph theory. Several alternative formulations of cohesive subgraph have been proposed, a notable example being the $s$-club, which is a subgraph where each vertex is at distance at most $s$ from the others. Here we consider the problem of covering a given graph with the minimum number of $s$-clubs. We study the computational and approximation complexity of this problem when $s$ is equal to 2 or 3. First, we show that deciding if there exists a cover of a graph with three 2-clubs is NP-complete, and that deciding if there exists a cover of a graph with two 3-clubs is NP-complete. Then, we consider the approximation complexity of covering a graph with the minimum number of 2-clubs and 3-clubs. We show that, given a graph $G=(V,E)$ to be covered, covering $G$ with the minimum number of 2-clubs is not approximable within factor $O(|V|^{1/2-\varepsilon})$, for any $\varepsilon>0$, and covering $G$ with the minimum number of 3-clubs is not approximable within factor $O(|V|^{1-\varepsilon})$, for any $\varepsilon>0$. On the positive side, we give an approximation algorithm of factor $2|V|^{1/2}\log^{3/2}|V|$ for covering a graph with the minimum number of 2-clubs.


1 Introduction

The quest for modules inside a network is a well-known and deeply studied problem in network analysis, with several applications in different fields, like computational biology or social network analysis. A highly investigated problem is that of finding cohesive subgroups inside a network, which in graph theory translates into finding highly connected subgraphs. A common approach is to look for cliques (i.e. complete graphs), and several combinatorial problems have been considered, notable examples being the Maximum Clique problem ([garey, GT19]), the Minimum Clique Cover problem ([garey, GT17]), and the Minimum Clique Partition problem ([garey, GT15]). The last of these is a classical problem in theoretical computer science, whose goal is to partition the vertices of a graph into the minimum number of cliques. The Minimum Clique Partition problem has been deeply studied since the seminal paper of Karp [DBLP:conf/coco/Karp72], and its complexity has been investigated in several graph classes [DBLP:journals/dam/CerioliFFMPR08, DBLP:journals/ita/CerioliFFP11, DBLP:journals/algorithmica/PirwaniS12, DBLP:journals/gc/DumitrescuP11].

In some cases, asking for a complete subgraph is too restrictive, as interesting highly connected graphs may have some missing edges, either due to noise in the data considered or because some pairs of vertices may not be directly connected by an edge in the subgraph of interest. To overcome this limitation of the clique approach, alternative definitions of highly connected graphs have been proposed, leading to the concept of relaxed clique [DBLP:journals/algorithms/Komusiewicz16]. A relaxed clique is a graph whose vertices satisfy a property which is a relaxation of the clique property. Indeed, a clique is a subgraph whose vertices are all at distance one from each other and have the same degree (the size of the clique minus one). Different definitions of relaxed clique are obtained by modifying one of the properties of a clique, thus leading to distance-based relaxed cliques, degree-based relaxed cliques, and so on (see for example [DBLP:journals/algorithms/Komusiewicz16]).

In this paper, we focus on a distance-based relaxation. In a clique all the vertices are required to be at distance at most one from each other. Here this constraint is relaxed, so that the vertices have to be at distance at most $s$, for an integer $s \geq 1$. A subgraph whose vertices are all at distance at most $s$ from each other is called an $s$-club (notice that, when $s=1$, an $s$-club is exactly a clique). The identification of $s$-clubs inside a network has been applied to social networks [Mokken79, SociometricClique, DBLP:journals/snam/LaanMM16, DBLP:journals/snam/MokkenHL16, DBLP:conf/biostec/ZoppisDSCSM18], and biological networks [DBLP:journals/jco/BalasundaramBT05]. Interesting recent studies have shown the relevance of finding $s$-clubs in a network [DBLP:journals/snam/LaanMM16, DBLP:journals/snam/MokkenHL16], in particular focusing on finding $s$-clubs in real networks like DBLP or a European corporate network.

Contributions to the study of $s$-clubs mainly focus on the Maximum s-Club problem, that is the problem of finding an $s$-club of maximum size. Maximum s-Club is known to be NP-hard, for each $s \geq 1$ [DBLP:journals/eor/BourjollyLP02]. Even deciding whether there exists an $s$-club larger than a given size in a graph of diameter $s+1$ is NP-complete, for each $s \geq 1$ [DBLP:journals/jco/BalasundaramBT05]. The Maximum s-Club problem has also been studied in the approximability and parameterized complexity frameworks. A polynomial-time approximation algorithm with factor $|V|^{1/2}$ for every $s \geq 2$ on an input graph $G=(V,E)$ has been designed [Asahiro2017]. This is optimal, since the problem is not approximable within factor $|V|^{1/2-\varepsilon}$, on an input graph $G=(V,E)$, for each $\varepsilon>0$ and $s \geq 2$ [Asahiro2017]. As for the parameterized complexity framework, the problem is known to be fixed-parameter tractable, when parameterized by the size of an $s$-club [DBLP:journals/ol/SchaferKMN12, DBLP:journals/dam/KomusiewiczS15, DBLP:journals/computing/ChangHLS13]. The Maximum s-Club problem has also been investigated for structural parameters and specific graph classes [DBLP:journals/jgaa/HartungKN15, DBLP:journals/dam/GolovachHKR14].

In this paper, we consider a different combinatorial problem, where we aim at covering the vertices of a network with a set of subgraphs. Similar to Minimum Clique Partition, we consider the problem of covering a graph with the minimum number of $s$-clubs such that each vertex belongs to an $s$-club. We denote this problem by Min $s$-Club Cover, and we focus in particular on the cases $s=2$ and $s=3$. We show some analogies and differences between Min $s$-Club Cover and Minimum Clique Partition. We start in Section 3 by considering the computational complexity of the problem of covering a graph with two or three $s$-clubs. This is motivated by the fact that Clique Partition is known to be in P when we ask whether there exists a partition of the graph consisting of two cliques, while it is NP-hard to decide whether there exists a partition of the graph consisting of three cliques [DBLP:journals/tcs/GareyJS76]. As for Clique Partition, we show that it is NP-complete to decide whether there exist three 2-clubs that cover a graph. On the other hand, we show that, unlike Clique Partition, it is NP-complete to decide whether there exist two 3-clubs that cover a graph. These two results also imply that Min 2-Club Cover and Min 3-Club Cover do not belong to the class XP for the parameter "number of clubs" in a cover.

Then, we consider the approximation complexity of Min 2-Club Cover and Min 3-Club Cover. We recall that, given an input graph $G=(V,E)$, Minimum Clique Partition is not approximable within factor $O(|V|^{1-\varepsilon})$, for any $\varepsilon>0$, unless P = NP [DBLP:journals/toc/Zuckerman07]. Here we show that Min 2-Club Cover has a slightly different behavior, while Min 3-Club Cover is similar to Minimum Clique Partition. Indeed, in Section 4 we prove that Min 2-Club Cover is not approximable within factor $O(|V|^{1/2-\varepsilon})$, for any $\varepsilon>0$, unless P = NP, while Min 3-Club Cover is not approximable within factor $O(|V|^{1-\varepsilon})$, for any $\varepsilon>0$, unless P = NP. In Section 5, we present a greedy approximation algorithm that has factor $2|V|^{1/2}\log^{3/2}|V|$ for Min 2-Club Cover, which almost matches the inapproximability result for the problem. We start the paper by giving in Section 2 some definitions and by formally defining the problem we are interested in.

2 Preliminaries

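Given a graph $G=(V,E)$ and a subset $V' \subseteq V$, we denote by $G[V']$ the subgraph of $G$ induced by $V'$. Given two vertices $u,v \in V$, the distance between $u$ and $v$ in $G$, denoted by $d_G(u,v)$, is the length of a shortest path from $u$ to $v$. The diameter of a graph $G=(V,E)$ is the maximum distance between two vertices of $V$. Given a graph $G=(V,E)$ and a vertex $v \in V$, we denote by $N_G(v)$ the set of neighbors of $v$, that is $N_G(v)=\{u : \{v,u\} \in E\}$. We denote by $N_G[v]$ the closed neighborhood of $v$, that is $N_G[v]=N_G(v) \cup \{v\}$. Define $N^l_G(v)=\{u : u \text{ has distance at most } l \text{ from } v\}$, with $1 \leq l \leq 2$. Given a set of vertices $X \subseteq V$ and $l$, with $1 \leq l \leq 2$, define $N^l_G(X)=\bigcup_{u \in X} N^l_G(u)$. We may omit the subscript $G$ when it is clear from the context. Now, we give the definition of $s$-club, which is fundamental for the paper.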
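The notions introduced in this paragraph (distance, diameter, induced subgraph, closed neighborhood) can be sketched with breadth-first search. This is only an illustrative sketch: the adjacency-dictionary representation and the names `bfs_distances`, `induced_subgraph`, `closed_neighborhood` and `diameter` are assumptions made here and do not come from the paper.

```python
from collections import deque

def bfs_distances(adj, source):
    """Map every vertex reachable from `source` to its distance from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def induced_subgraph(adj, subset):
    """Adjacency dictionary of the subgraph induced by `subset`."""
    keep = set(subset)
    return {v: adj[v] & keep for v in keep}

def closed_neighborhood(adj, v):
    """The neighbors of v together with v itself."""
    return adj[v] | {v}

def diameter(adj):
    """Maximum distance between two vertices (infinite if disconnected)."""
    best = 0
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) < len(adj):
            return float("inf")
        best = max(best, max(dist.values()))
    return best

# Hypothetical example: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(diameter(path))                                    # whole path
print(diameter(induced_subgraph(path, {"a", "b", "c"})))  # sub-path
print(diameter(induced_subgraph(path, {"a", "c"})))       # disconnected
```

Note that distances are always taken inside the graph passed to `diameter`, so the induced subgraph on `{"a", "c"}` is disconnected even though the two vertices are at distance 2 in the full path.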

Definition 1

Given a graph $G=(V,E)$, and a subset $V' \subseteq V$, $G[V']$ is an $s$-club if it has diameter at most $s$.

Notice that an $s$-club must be a connected graph. We present now the formal definition of the problem we are interested in.
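Definition 1 can be checked directly: a vertex subset induces a club exactly when a depth-bounded breadth-first search, restricted to the subset, reaches every member from every member. The sketch below is a minimal illustration under that reading; the function name `is_s_club` and the adjacency-dictionary input are assumptions, not notation from the paper.

```python
from collections import deque

def is_s_club(adj, subset, s):
    """True if the subgraph induced by `subset` has diameter at most s."""
    members = set(subset)
    for source in members:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == s:
                continue  # do not explore beyond distance s
            for w in adj[u]:
                # Only edges inside the induced subgraph are followed.
                if w in members and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(members):
            return False  # some member is farther than s, or unreachable
    return True

# Hypothetical example: a cycle on five vertices 0..4.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(is_s_club(cycle5, range(5), 2))   # the whole cycle
print(is_s_club(cycle5, {0, 1, 2}, 2))  # a sub-path
print(is_s_club(cycle5, {0, 2}, 2))     # two non-adjacent vertices
```

The last call illustrates why a club must be connected: vertices 0 and 2 are at distance 2 in the cycle, but the subgraph they induce has no edge.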

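Minimum $s$-Club Cover (Min $s$-Club Cover)
Input: a graph $G=(V,E)$ and an integer $s \geq 2$.
Output: a minimum cardinality collection $\mathcal{S}=\{V_1,\dots,V_h\}$ such that, for each $i$ with $1 \leq i \leq h$, $V_i \subseteq V$, $G[V_i]$ is an $s$-club, and, for each vertex $v \in V$, there exists a set $V_j$, with $1 \leq j \leq h$, such that $v \in V_j$.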

We denote by $s$-Club Cover($h$), with $1 \leq h \leq |V|$, the decision version of Min $s$-Club Cover that asks whether there exists a cover of $G$ consisting of at most $h$ $s$-clubs.
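Verifying a candidate solution of the decision version is straightforward, and can be sketched as follows. The code checks the three conditions of the problem statement: the collection has at most the allowed number of sets, each set induces a club, and every vertex is covered. The names `reachable_within`, `is_s_club` and `is_club_cover`, the parameter `h` for the bound on the number of clubs, and the example graph are assumptions made for illustration.

```python
def reachable_within(adj, members, source, s):
    """Members at distance at most s from `source` inside the induced subgraph."""
    reached = {source}
    frontier = {source}
    for _ in range(s):
        frontier = {w for u in frontier for w in adj[u] if w in members} - reached
        reached |= frontier
    return reached

def is_s_club(adj, members, s):
    """True if the subgraph induced by `members` has diameter at most s."""
    members = set(members)
    return all(reachable_within(adj, members, v, s) == members for v in members)

def is_club_cover(adj, collection, s, h):
    """True if `collection` holds at most h vertex sets, each an s-club, covering V."""
    if len(collection) > h:
        return False
    vertices = set(adj)
    covered = set()
    for subset in collection:
        subset = set(subset)
        if not subset <= vertices or not is_s_club(adj, subset, s):
            return False
        covered |= subset
    return covered == vertices

# Hypothetical example: the path 1 - 2 - 3 - 4 - 5.
path5 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(is_club_cover(path5, [{1, 2, 3}, {3, 4, 5}], s=2, h=2))  # overlapping cover
print(is_club_cover(path5, [{1, 2, 3, 4, 5}], s=2, h=1))       # diameter too large
```

Sets in the collection are allowed to overlap, as in the first call, since the problem asks for a cover and not for a partition.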

Notice that while in Minimum Clique Partition we can assume that the cliques that cover a graph $G=(V,E)$ partition $V$, hence the cliques are vertex disjoint, we cannot make this assumption for Min $s$-Club Cover. Indeed, in a solution of Min $s$-Club Cover, a vertex may be covered by more than one $s$-club, in order to have a cover consisting of the minimum number of $s$-clubs. Consider the example of Fig. 1. The two -clubs induced by and cover , and both these -clubs contain vertex . However, if we ask for a partition of , we need at least three -clubs. This difference between Minimum Clique Partition and Min $s$-Club Cover is due to the fact that, while being a clique is a hereditary property, this is not the case for being an $s$-club. If a graph $G$ is an $s$-club, then a subgraph of $G$ may not be an $s$-club (for example a star is a 2-club, but the subgraph obtained by removing its center is no longer a 2-club).

Figure 1: A graph and a cover consisting of two -clubs (induced by the vertices in the ovals). Notice that the -clubs of this cover must both contain vertex .
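The star example used to show that being a club is not hereditary can be reproduced in a few lines. The sketch below computes the diameter of an induced subgraph and compares a star with a triangle; the helper name `induced_diameter` and both example graphs are assumptions chosen for illustration.

```python
from collections import deque

def induced_diameter(adj, members):
    """Diameter of the subgraph induced by `members` (infinite if disconnected)."""
    members = set(members)
    worst = 0
    for source in members:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u] & members:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(members):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst

# A star with center "c" and three leaves, and a triangle (a clique).
star = {"c": {"l1", "l2", "l3"}, "l1": {"c"}, "l2": {"c"}, "l3": {"c"}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}

print(induced_diameter(star, star))                # the star is a 2-club
print(induced_diameter(star, {"l1", "l2", "l3"}))  # leaves alone: disconnected
print(induced_diameter(triangle, {1, 2}))          # a clique minus a vertex
```

Removing the center leaves the star's leaves pairwise disconnected, whereas removing any vertex from the triangle still leaves a clique.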

3 Computational Complexity

In this section we investigate the computational complexity of Min 2-Club Cover and Min 3-Club Cover and we show that 2-Club Cover(3), that is deciding whether there exists a cover of a graph with three 2-clubs, and 3-Club Cover(2), that is deciding whether there exists a cover of a graph with two 3-clubs, are NP-complete.

3.1 2-Club Cover(3) is NP-complete

In this section we show that 2-Club Cover(3) is NP-complete by giving a reduction from the 3-Clique Partition problem, that is the problem of deciding whether there exists a partition of a graph into three cliques. Given an instance of 3-Clique Partition, we construct an instance of 2-Club Cover(3) (see Fig. 2). The vertex set is defined as follows:

The set of edges is defined as follows:

Before giving the main results of this section, we prove a property of .

Lemma 1

Let be an instance of and let be the corresponding instance of . Then, given two vertices and the corresponding vertices :

  • if , then

  • if , then

Proof

Notice that . It follows that if and only if there exists a vertex (or ), which is adjacent to both and . But then, by construction, if and only if . ∎

We are now able to prove the main properties of the reduction.

Lemma 2

Let be a graph input of and let be the corresponding instance of . Then, given a solution of on , we can compute in polynomial time a solution of on .

Proof

Consider a solution of on , and let , , be the sets of vertices of that partition . We define a solution of on as follows. For each , with , define

We show that each , with , is a -club. Consider two vertices , with . Since they correspond to two vertices that belong to a clique of , it follows that and . Thus . Now, consider the vertices , with , and , with . If or , assume w.l.o.g. , then by construction . Assume that and (assume w.l.o.g. that ), since , it follows that . Since , it follows that . By construction, there exist edges , in , thus implying that . Finally, consider two vertices , with and . Then, by construction, and . But then, belongs to , and, by construction, and . It follows that .

We conclude the proof observing that, by construction, since partition , it holds that , thus , , covers . ∎

Figure 2: An example of a graph input of and the corresponding graph input of .

Based on Lemma 1, we can prove the following result.

Lemma 3

Let be a graph input of and let be the corresponding instance of . Then, given a solution of on , we can compute in polynomial time a solution of on .

Proof

Consider a solution of on consisting of three -clubs , , . Consider a -club , with . By Lemma 1, it follows that, for each , . As a consequence, we can define three cliques , , in as follows. For each , with , is defined as:

Next, we show that , with , is indeed a clique. By Lemma 1 if then it holds , thus by construction and is a clique in . Moreover, since , then . Notice that , , may not be disjoint, but, starting from , , , it is easy to compute in polynomial time a partition of in three cliques. ∎
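The last step of this proof, turning possibly overlapping cliques into a partition, relies on the fact that being a clique is hereditary: removing vertices from a clique leaves a clique. One simple way to do it, sketched here as an assumption rather than as the authors' procedure, is to assign each vertex to the first clique that contains it. The names `is_clique` and `cover_to_partition` and the example graph are hypothetical.

```python
def is_clique(adj, members):
    """True if every two distinct vertices of `members` are adjacent."""
    members = set(members)
    return all(members - {v} <= adj[v] for v in members)

def cover_to_partition(cliques):
    """Keep each vertex only in the first clique that contains it.

    Parts that become empty are dropped, so the result may contain
    fewer parts than the input cover.
    """
    assigned = set()
    parts = []
    for clique in cliques:
        part = set(clique) - assigned
        if part:
            parts.append(part)
        assigned |= part
    return parts

# Hypothetical example: a triangle 1-2-3 with a pendant path 3 - 4 - 5.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
cover = [{1, 2, 3}, {3, 4}, {4, 5}]
partition = cover_to_partition(cover)
print(partition)
print(all(is_clique(graph, part) for part in partition))
```

The same trick would not be sound for clubs, since a subset of a club need not be a club.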

Now, we can prove the main result of this section.

Theorem 3.1

2-Club Cover(3) is NP-complete.

Proof

By Lemma 2 and Lemma 3 and from the NP-hardness of 3-Clique Partition [DBLP:conf/coco/Karp72], it follows that 2-Club Cover(3) is NP-hard. The membership in NP follows easily from the fact that, given three vertex subsets of the input graph, it can be checked in polynomial time whether they induce 2-clubs and cover all vertices of the graph. ∎

3.2 3-Club Cover(2) is NP-complete

In this section we show that 3-Club Cover(2) is NP-complete by giving a reduction from a variant of Sat called 5-Double-Sat. Recall that a literal is positive if it is a non-negated variable, while it is negative if it is a negated variable.

Given a collection $\mathcal{C}$ of clauses over the set of variables $X$, where each clause $C_j$, with $1 \leq j \leq |\mathcal{C}|$, contains exactly five literals and does not contain both a variable and its negation, 5-Double-Sat asks for a truth assignment to the variables in $X$ such that each clause $C_j$, with $1 \leq j \leq |\mathcal{C}|$, is double-satisfied. A clause $C_j$ is double-satisfied by a truth assignment $f$ to the variables $X$ if there exist a positive literal and a negative literal in $C_j$ that are both satisfied by $f$. Notice that we assume that there exist at least one positive literal and at least one negative literal in each clause $C_j$, with $1 \leq j \leq |\mathcal{C}|$, otherwise $C_j$ cannot be double-satisfied. Moreover, we assume that each variable in an instance of 5-Double-Sat appears both as a positive literal and as a negative literal in the instance. Notice that if this is not the case, for example if a variable appears only as a positive literal, we can assign the value true to the variable, as setting it to false does not contribute to double-satisfying any clause. First, we show that 5-Double-Sat is NP-complete, which may be of independent interest.
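The notion of a double-satisfied clause can be made concrete with a small checker. In this sketch, literals are encoded as non-zero integers in the DIMACS style (a positive integer for a variable, its negative for the negated variable); this encoding and the names `is_double_satisfied` and `all_double_satisfied` are assumptions made for illustration only.

```python
def is_double_satisfied(clause, assignment):
    """True if the clause has a satisfied positive and a satisfied negative literal.

    `clause` is a list of non-zero integers: i stands for variable i,
    -i for its negation. `assignment` maps variable index to bool.
    """
    true_positive = any(lit > 0 and assignment[lit] for lit in clause)
    true_negative = any(lit < 0 and not assignment[-lit] for lit in clause)
    return true_positive and true_negative

def all_double_satisfied(clauses, assignment):
    """True if every clause is double-satisfied by the assignment."""
    return all(is_double_satisfied(clause, assignment) for clause in clauses)

# Hypothetical five-literal clause: (x1 or x2 or not x3 or not x4 or x5).
clause = [1, 2, -3, -4, 5]
mixed = {1: True, 2: False, 3: False, 4: True, 5: False}
all_true = {v: True for v in range(1, 6)}

print(is_double_satisfied(clause, mixed))     # x1 and "not x3" both hold
print(is_double_satisfied(clause, all_true))  # satisfied, but only by positives
```

The second call shows the difference from ordinary satisfiability: the all-true assignment satisfies the clause, yet no negative literal is satisfied, so the clause is not double-satisfied.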

Theorem 3.2

5-Double-Sat is NP-complete.

Proof

We reduce from 3-Sat, where given a set $X$ of variables and a set $\mathcal{C}$ of clauses, each of which is a disjunction of 3 literals (a variable or the negation of a variable), we want to find an assignment to the variables such that all clauses are satisfied. Moreover, we assume that each clause in $\mathcal{C}$ does not contain a positive variable $x$ and its negation $\overline{x}$, since such a clause is obviously satisfied by any assignment. The same property holds also for the instance of 5-Double-Sat we construct.

Consider an instance of , we construct an instance of as follows. Define , where and is defined as follows:

The set of clauses is defined as follows:

where , are defined as follows. Consider , where , with is a literal, that is a variable (a positive literal) or a negated variable (a negative literal), the two clauses and are defined as follows:

We claim that is satisfiable if and only if is double-satisfiable.

Assume that is satisfiable and let be an assignment to the variables on that satisfies . Consider a clause in , with . Since it is satisfied by , it follows that there exists a literal of , with , that is satisfied by . Define an assignment on that is identical to on and, if is positive, then assigns value false to both and , if is negative, then assigns value true to both and . It follows that both and are double-satisfied by .

Assume that is double-satisfied by an assignment . Consider two clauses and , with , that are double-satisfied by , we claim that there exists at least one literal of and not in which is satisfied. Assume this is not the case, then, if is double-satisfied, it follows that is true and is false, thus implying that is not double-satisfied. Then, an assignment that is identical to restricted to satisfies each clause in .

Now, since 3-Sat is NP-complete [DBLP:conf/coco/Karp72], it follows that 5-Double-Sat is NP-hard. The membership in NP follows from the observation that, given an assignment to the variables of an instance, we can check in polynomial time whether each clause of the instance is double-satisfied or not. ∎

Let us now give the construction of the reduction from 5-Double-Sat to 3-Club Cover(2). Consider an instance of 5-Double-Sat consisting of a set of clauses over a set of variables. We assume that it is not possible to double-satisfy all the clauses by setting at most two variables to true or to false (this can be easily checked in polynomial time).

Before giving the details, we present an overview of the reduction. Given an instance of , for each positive literal , with , we define vertices , and for each negative literal , with , we define a vertex . Moreover, for each clause , with , we define a vertex . We define other vertices to ensure that some vertices have distance not greater than three and to force the membership to one of the two -clubs of the solution (see Lemma 4). The construction implies that for each with , and belong to different -clubs (see Lemma 5); this corresponds to a truth assignment to the variables in . Then, we are able to show that each vertex belongs to the same -club of a vertex , with , and of a vertex , with , adjacent to (see Lemma 7); these vertices correspond to a positive literal and a negative literal , respectively, that are satisfied by a truth assignment, hence is double-satisfied.

Now, we give the details of the reduction. Let be an instance of , we construct an instance of as follows (see Fig. 3). The vertex set is defined as follows:

The edge set is defined as follows:

We start by proving some properties of the graph .

Lemma 4

Consider an instance of and let be the corresponding instance of . Then, (1) , (2) , (3) , for each with , and (4) , .

Proof

We start by proving (1). Notice that any path from to must pass through , or . Each of , or is adjacent to vertices , and , with (in addition to ), and none of these vertices is adjacent to , thus concluding that . Moreover, observe that for each vertex , with , there exists a vertex , with , or , with , that is adjacent to , with , thus , for each with . As a consequence of (1), it follows that (2) holds, that is . Since , for each with , it holds (3) .

Finally, we prove (4). Notice that and that none of the vertices in is adjacent to and , thus . ∎

Figure 3: Schematic construction for the reduction from to .

Consider two sets and , such that and are two -clubs of that cover . As a consequence of Lemma 4, it follows that and are in exactly one of , , w.l.o.g. , while , , and , for each with , belong to and not to .

Next, we show a crucial property of the graph built by the reduction.

Lemma 5

Given an instance of , let be the corresponding instance of . Then, for each with , .

Proof

Consider a path of minimum length that connects and , with . First, notice that, by construction, the path after must pass through one of these vertices: , , or , with .

We consider the first case, that is the path after passes through . Now, the next vertex in is either or , with . Since both and are not adjacent to , it follows that in this case the path has length greater than three.

We consider the second case, that is the path after passes through . Now, after , passes through either or , with . Since both and are not adjacent to , it follows that in this case the path has length greater than three.

We consider the third case, that is the path after passes through . Now, the next vertex of is either or or , with and . Since , and are not adjacent to , it follows that in this case the path has length greater than three.

We consider the last case, that is the path after passes through , with . We have assumed that and do not belong to the same clause, thus by construction is not incident in . It follows that after , the path must pass through either or , with , or , and . Once again, since , and are not adjacent to , it follows that also in this case the path has length greater than three, thus concluding the proof. ∎

Now, we are able to prove the main results of this section.

Lemma 6

Given an instance of , let