 # Faster parameterized algorithm for Cluster Vertex Deletion

In the Cluster Vertex Deletion problem the input is a graph $G$ and an integer $k$. The goal is to decide whether there is a set of vertices $S$ of size at most $k$ such that the deletion of the vertices of $S$ from $G$ results in a graph in which every connected component is a clique. We give an algorithm for Cluster Vertex Deletion whose running time is $O^*(1.811^k)$.


## 1 Introduction

A graph $G$ is called a cluster graph if every connected component of $G$ is a clique (i.e., a complete graph). A set of vertices $S$ in a graph $G$ is called a cluster deletion set of $G$ if deleting the vertices of $S$ from $G$ results in a cluster graph. In the Cluster Vertex Deletion problem the input is a graph $G$ and an integer $k$. The goal is to decide whether there is a cluster deletion set of size at most $k$.

Note that a graph $G$ is a cluster graph if and only if $G$ does not contain an induced path of size 3. As Cluster Vertex Deletion is equivalent to the problem of finding whether there is a set of vertices of size at most $k$ that hits every induced path of size 3 in $G$, the problem can be solved in -time . A faster -time algorithm for this problem was given by Gramm et al. . The next improvement on the parameterized complexity of the problem came from results on the more general 3-Hitting Set problem [4, 12]. The currently fastest parameterized algorithm for 3-Hitting Set runs in time , and therefore Cluster Vertex Deletion can be solved within this time. Later, Hüffner et al.  gave an -time algorithm for Cluster Vertex Deletion based on iterative compression. Finally, Boral et al.  gave an -time algorithm.
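The characterization by induced paths of size 3 can be made concrete with a short Python sketch. It is only the naive three-way branching (delete one of the three vertices of some induced path and recurse), not the algorithm of this paper. The adjacency-dictionary representation and the function names `find_induced_p3`, `delete_vertex`, and `has_cluster_deletion_set` are assumptions made for illustration.

```python
from itertools import combinations


def find_induced_p3(adj):
    """Return (u, c, w) with edges u-c and c-w but no edge u-w, or None."""
    for c, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):
            if w not in adj[u]:
                return (u, c, w)
    return None


def is_cluster_graph(adj):
    """A graph is a cluster graph iff it has no induced path of size 3."""
    return find_induced_p3(adj) is None


def delete_vertex(adj, x):
    """Return a copy of the graph with vertex x removed."""
    return {u: nbrs - {x} for u, nbrs in adj.items() if u != x}


def has_cluster_deletion_set(adj, k):
    """Naive branching: every induced P3 must lose at least one vertex."""
    if k < 0:
        return False
    p3 = find_induced_p3(adj)
    if p3 is None:
        return True
    return any(has_cluster_deletion_set(delete_vertex(adj, x), k - 1)
               for x in p3)


# A cycle on 5 vertices needs two deletions to become a cluster graph.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(has_cluster_deletion_set(c5, 1), has_cluster_deletion_set(c5, 2))
```

Each recursive call makes at most three further calls and decreases $k$ by one, which is why this baseline is exponential in $k$ with base 3.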

In a recent paper, Fomin et al.  showed a general approach for transforming a parameterized algorithm to an exponential-time algorithm for the non-parameterized problem. Using this method on the algorithm of Boral et al. gives an -time algorithm for Cluster Vertex Deletion. This improves over the previously fastest exponential-time algorithm for this problem . A related problem to Cluster Vertex Deletion is the 3-Path Vertex Cover problem. In this problem, the goal is to decide whether there is a set of vertices of size at most $k$ that hits every path of size 3 in $G$. Algorithms for 3-Path Vertex Cover were given in [11, 13, 9, 3, 14, 10]. The currently fastest algorithm for this problem has running time .

In this paper we give an algorithm for Cluster Vertex Deletion whose running time is . Using our algorithm with the method of  gives an -time algorithm for Cluster Vertex Deletion.

Our algorithm is based on the algorithm of Boral et al. . The algorithm of Boral et al. works as follows. The algorithm chooses a vertex $v$ and then constructs a family of sets, where each set hits all the induced paths in $G$ that contain $v$. Then, the algorithm branches on the constructed sets. In order to analyze the algorithm, Boral et al. used a Python script for automated analysis of the possible cases that can occur in the subgraph of $G$ induced by the vertices with distance at most 2 from $v$. For each case, the script generates a branching vector and computes the branching number. Our improvement is achieved by first making several simple but crucial modifications to the algorithm of Boral et al. Then, we modify the Python script by adding restrictions on the cases the algorithm can generate. Finally, we manually examine the four hardest cases and for each case we either show that the case cannot occur, or give a better branching vector for the case.

## 2 Preliminaries

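Let $G = (V, E)$ be a graph. For a vertex $v$ in $G$, $N(v)$ is the set of neighbors of $v$, $N[v] = N(v) \cup \{v\}$, and $N_i(v)$ is the set of all vertices with distance exactly $i$ from $v$. For a set of vertices $S$, $G[S]$ is the subgraph of $G$ induced by $S$ (namely, $G[S] = (S, E \cap (S \times S))$). We also define $G - S = G[V \setminus S]$. For a set $S$ that consists of a single vertex $v$, we write $G - v$ instead of $G - \{v\}$.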

An $s$-star is a graph with vertices $v, v_1, \ldots, v_s$ and edges $(v, v_1), \ldots, (v, v_s)$. The vertex $v$ is called the center of the star, and the vertices $v_1, \ldots, v_s$ are called leaves.

A vertex cover of a graph $G$ is a set of vertices $X$ such that every edge of $G$ is incident on at least one vertex of $X$.

## 3 The graph $H_v$

Let be a vertex of . We define a graph as follows. The vertices of are , where and . For and , there is an edge in if and only if is an edge in . Additionally, for , there is an edge in if and only if is not an edge in . Note that is an independent set in . We will omit the superscript when the graph is clear from the context.
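The construction of $H_v$ can be sketched in Python as follows. The symbols in the definition above were lost in extraction, so the sketch assumes that the two vertex classes are the neighbors of $v$ (called `n1` here) and the vertices at distance exactly 2 from $v$ (called `n2`), which matches the remark in the introduction that the analysis looks at vertices with distance at most 2 from $v$. The function name `build_hv` and the adjacency-dictionary representation are hypothetical.

```python
def build_hv(adj, v):
    """Build H_v from G (dict: vertex -> set of neighbors) and a vertex v.

    Assumed reading of the definition:
      * n1 = neighbors of v, n2 = vertices at distance exactly 2 from v;
      * an n1-n2 pair is an edge of H_v iff it is an edge of G;
      * an n1-n1 pair is an edge of H_v iff it is NOT an edge of G;
      * n2 is an independent set in H_v.
    """
    n1 = set(adj[v])
    n2 = {w for u in n1 for w in adj[u]} - n1 - {v}
    h = {x: set() for x in n1 | n2}
    for u in n1:
        # Edges between the two classes are copied from G.
        for w in adj[u] & n2:
            h[u].add(w)
            h[w].add(u)
        # Inside n1, the edges of H_v are the non-edges of G.
        for u2 in n1 - adj[u] - {u}:
            h[u].add(u2)
            h[u2].add(u)
    return h, n1, n2


# v = 0 has neighbors 1 and 2 (not adjacent to each other); 3 hangs off 1.
g = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}
h, n1, n2 = build_hv(g, 0)
print(h)
```

Under this reading, every edge of $H_v$ corresponds to an induced path of size 3 in $G$ that contains $v$: in the example, the edge between 1 and 2 comes from the path 1, 0, 2 and the edge between 1 and 3 comes from the path 0, 1, 3.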

Two vertex covers of are equivalent if and . We say that a vertex cover of dominates a vertex cover if , , and are not equivalent. An equivalence class of vertex covers is called dominating if for every vertex cover , there is no vertex cover that dominates , and there is no proper subset of which is a vertex cover. and for every nonempty , there is no vertex cover that is equivalent to . A dominating family of is a family of vertex covers of such that contains a vertex cover from each dominating equivalence class. The algorithm of Boral et al. is based on the following Lemma.

###### Lemma 1 (Boral et al. ).

Let be a dominating family of . There is a cluster deletion set of of minimum size such that either or there is such that .

A family of vertex covers of is called -dominating if there is a dominating family such that . From Lemma 1 we obtain the following simple branching algorithm for Cluster Vertex Deletion. Given an instance , choose a vertex and compute a -dominating family of . Then, recursively run the algorithm on the instance (corresponding to a cluster deletion set that contains ) and on the instances for every . In Section 5 we will give a more complex algorithm based on this idea.

A connected component of is called a seagull if is a 2-star whose center is in and its leaves are in . A subgraph of is called an -skein if contains seagulls, and the remaining connected components of are isolated vertices. If is an -skein we also say that is a skein.

## 4 Algorithm for finding a dominating family

This section describes an algorithm, denoted , for constructing a -dominating family of . The algorithm is a recursive branching algorithm, and it is based on the algorithm from . Let denote the input to the algorithm. When we say that the algorithm recurses on , the algorithm performs the following lines. .

Given an input , the algorithm applies the first applicable rule from the rules below.

VC.1  If , return an empty list.

VC.2  If does not have edges, return a list with a single element which is an empty set.

VC.3  If there is a vertex such that , recurse on , where be the unique neighbor of .

VC.4  If is a cycle in such that for all and , recurse on { : is odd}.

VC.5  If is an even cycle in such that for all , for odd , and for even , recurse on .

VC.6  If contains vertices of degree at least 3, choose a vertex as follows. Let be the maximum degree of a vertex in . If there is a vertex with degree in , let be such a vertex. Otherwise, is a vertex with degree in . Branch on and on .

Note that if Rules VC.1–VC.6 cannot be applied, every vertex in has degree 2 and every vertex in has degree 1 or 2. Additionally, every connected component in is an induced path.

VC.7  If is not an independent set, let be a connected component of with minimum size among the connected components of size at least 2. is a path , and let and be the unique neighbors of and in , respectively. Branch on and .

We note that the reason we choose a connected component with minimum size is to simplify the analysis. The algorithm does not depend on this choice.

If Rules VC.1–VC.7 cannot be applied, every connected component of is an induced path such that if is odd and if is even.

VC.8  Otherwise, let be a connected component of with maximum size. Branch on and .

Note that the branching vectors of the branching rules of the algorithm are at least $(1,2)$. The branching vector $(1,2)$ occurs only when is a skein. In this case, the algorithm applies Rule VC.8 on some seagull.

The differences between the algorithm in this section and the algorithm in  are as follows.

1. In Rule VC.6, the algorithm of  chooses an arbitrary vertex with degree at least 3.

2. Rule VC.4 is different than the corresponding rule in .

3. Rule VC.4 does not appear in .

## 5 The main algorithm

In this section we describe the algorithm for Cluster Vertex Deletion.

We say that vertices and are twins if . Note that is a twin of if and only if is an isolated vertex in . Let be a set containing and all its twins.
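As a small illustration, the twins of a vertex can be computed directly from closed neighborhoods. The condition in the definition above was lost in extraction; this sketch assumes that two vertices are twins when their closed neighborhoods coincide, which is the reading consistent with the note that twins of $v$ are exactly the isolated vertices of $H_v$. The function name `twins_of` is hypothetical.

```python
def twins_of(adj, v):
    """Return the twins of v, assuming twins means equal closed neighborhoods.

    adj maps each vertex to the set of its neighbors.
    """
    closed_v = adj[v] | {v}
    return {u for u in adj[v] if adj[u] | {u} == closed_v}


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(twins_of(g, 0), twins_of(g, 2))
```

In the example, vertices 0 and 1 have the same closed neighborhood and are twins, while vertex 2 has no twin because of its extra neighbor 3.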

###### Lemma 2.

If are twins then for every cluster deletion set of of minimum size, if and only if .

###### Proof.

Suppose conversely that there is a cluster deletion set of of minimum size such that, without loss of generality, and . Let . We claim that is a cluster deletion set. Suppose conversely that is not a cluster deletion set. Therefore, there is an induced path of size 3 in . Since is a cluster deletion set, must contain . Since and are twins, does not contain . Therefore, replacing with gives an induced path , and is also an induced path in , a contradiction to the assumption that is a cluster deletion set. Therefore, is a cluster deletion set. This is a contradiction to the assumption that is a cluster deletion set of minimum size. Therefore, the lemma is correct. ∎

The following lemma generalizes Lemma 9 in .

###### Lemma 3.

Let be a vertex cover of . There is a cluster deletion set of of minimum size such that either or .

###### Proof.

Let be a cluster deletion set of of minimum size and suppose that and otherwise we are done. By Lemma 2, . Let . We claim that is a cluster deletion set of . Suppose conversely that is not a cluster deletion set. Then, contains an induced path of size 3. contains exactly one vertex . Since and is a vertex cover of , we have that the connected component of in is a clique (by Lemma 6 in ) and this component contains . This is a contradiction, so is a cluster deletion set of . From the assumption we obtain that is a cluster deletion set of of minimum size. Since , the lemma is proved. ∎

Denote by $\mathrm{vc}(H_v)$ the minimum size of a vertex cover of $H_v$.

###### Corollary 4.

If then there is a cluster deletion set of of minimum size such that .

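The algorithm for Cluster Vertex Deletion is a branching algorithm. Let $(G,k)$ denote the input to the algorithm. We say that the algorithm branches on a family of sets $\mathcal{F}$ if for each $S \in \mathcal{F}$, the algorithm tries to find a cluster deletion set that contains $S$. More precisely, for each $S \in \mathcal{F}$ the algorithm calls itself recursively on the instance $(G - S, k - |S|)$ and returns ‘yes’ if the call returns ‘yes’. If no call returns ‘yes’, the algorithm returns ‘no’.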

Given an instance for Cluster Vertex Deletion, the algorithm first repeatedly applies the following reduction rules.

R1  If , return ‘no’.

R2  If is a cluster graph, return ‘yes’.

R3  If there is a connected component which is a clique, delete the vertices of .

R4  If there is a connected component such that there is a vertex for which is a cluster graph, delete the vertices of and decrease by 1.

R5  If there is a connected component such that the maximum degree of is 2, compute a cluster deletion set of of minimum size. Delete the vertices of and decrease by .

When the reduction rules cannot be applied, the algorithm chooses a vertex $v$ as follows. If the graph has vertices with degree 1, $v$ is a vertex with degree 1. Otherwise, $v$ is a vertex with maximum degree in $G$. The algorithm then constructs the graph $H_v$ and computes a -dominating family of $H_v$ using algorithm VCalg of Section 4. Additionally, the algorithm decides whether $\mathrm{vc}(H_v)$ is 1, 2, or at least 3 (this can be done in time). It then performs one of the following branching rules, depending on $\mathrm{vc}(H_v)$.

B1  If or ( and ), branch on every set in .

B2  If and , let be a vertex cover of size 2 of , and let be a vertex such that the connected component of in is not a clique. Construct a -dominating family of the graph using algorithm . Branch on every set in and on for every .

B3  If branch on and on every set in .

Note that if $v$ has degree 1, $\mathrm{vc}(H_v) = 1$ and therefore Rule B1 is applied.
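The branching rules above only need to know whether the minimum vertex cover of $H_v$ has size 1, 2, or at least 3. A minimal brute-force sketch of that test is given below. It tries all single vertices and all pairs, which is polynomial in the size of the graph. The function name `vc_class` and the convention of returning 3 for "at least 3" (and 0 for an edgeless graph) are assumptions of this sketch, not part of the paper.

```python
from itertools import combinations


def vc_class(h):
    """Classify the minimum vertex cover size of h (dict: vertex -> neighbors).

    Returns 0 if h has no edges, 1 or 2 if a cover of that size exists,
    and 3 to mean "at least 3".
    """
    edges = {frozenset((u, w)) for u in h for w in h[u]}
    if not edges:
        return 0
    for size in (1, 2):
        for cand in combinations(h, size):
            chosen = set(cand)
            # cand is a vertex cover iff it touches every edge.
            if all(e & chosen for e in edges):
                return size
    return 3


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
matching3 = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: {5}, 5: {4}}
print(vc_class(star), vc_class(path4), vc_class(matching3))
```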

The main differences between the algorithm in this section and the algorithm in  are as follows.

1. The algorithm of  chooses an arbitrary vertex .

2. The algorithm of  does not take advantage of twins. That is, in Rule B3, the algorithm of  branches on instead of .

3. Rule R4 does not appear in .

## 6 Analysis

In this section we analyze our algorithm.

Let be some parameterized algorithm on graphs. The run of the algorithm on an input can be represented by a recursion tree, denoted , as follows. The root of the tree corresponds to the call . If the algorithm terminates in this call, the root is a leaf. Otherwise, suppose that the algorithm is called recursively on the instances . In this case, the root has children. The -th child of is the root of the tree . The edge between and its -th child is labeled by . See Figure 1 for an example.

We define the weighted depth of a node to be the sum of the labels of the edges on the path from the root to . For an internal node in , define the branching vector of , denoted , to be a vector containing the labels of the edges between and its children. Define the branching number of a vector to be the largest root of . We define the branching number of a node in to be the branching number of . The running time of the algorithm can be bounded by bounding the number of leaves in . The number of leaves in is , where is the maximum branching number of a node in the tree.
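The branching number of a vector can be computed numerically. The polynomial in the definition above was lost in extraction, so the sketch below assumes the standard characteristic equation $\sum_i x^{-a_i} = 1$ for a vector $(a_1, \ldots, a_t)$ of positive integers and finds its root in $[1, \infty)$ by bisection. The function name `branching_number` and the tolerance parameter are hypothetical.

```python
def branching_number(vector, tol=1e-12):
    """Root x >= 1 of sum(x ** -a for a in vector) == 1, found by bisection.

    Assumes every entry of vector is a positive integer. The left-hand side
    is decreasing in x, is at least 1 at x = 1, and is at most 1 at
    x = max(2, len(vector)), so the root lies in that interval.
    """
    lo, hi = 1.0, float(max(2, len(vector)))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(mid ** -a for a in vector) > 1:
            lo = mid  # too many leaves at this base: the root is larger
        else:
            hi = mid
    return hi


print(round(branching_number([1, 2]), 3))
print(round(branching_number([1, 5, 5, 5, 5, 4]), 3))
```

For the vector $(1,2)$ the root is the golden ratio, which is consistent with the bound of 1.619 quoted later for secondary nodes. Increasing any entry of a vector can only decrease its branching number, which is the sense in which a larger branching vector is better.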

An approach for obtaining a better bound on the number of leaves in the recursion tree is to treat several steps of the algorithm as one step. This can be viewed as modifying the tree by contracting some edges. If is a node in and is a child of , contracting the edge means deleting the node and replacing every edge between and a child of with an edge . The label of is equal to the label of plus the label of . See Figure 1(c).

### 6.1 Analysis of the algorithm of Section 4

In order to analyze the algorithm of Section 5, we want to enumerate all possible recursion trees for the algorithm. However, since the number of recursion trees is unbounded, we will only consider a small part of the recursion tree, called top recursion tree. Suppose that we know that for some integer . Then, mark every node in with weighted depth less than . Additionally, if is a node with weighted depth whose branching vector is then mark all the descendants of with distance at most from . Now define the top recursion tree to be the subtree of induced by the marked vertices and their children. The labels of the edges of are modified as follows. If a node has a single child and the label of is for , change the label of the edge to . If has two children , let be the labels of the edges , respectively. If and , replace the label of with . If and , replace the labels of and with and , respectively. The reason for changing the labels of edges in the top recursion tree is that this reduces the number of possible top recursion trees.

We now show some properties of the tree when is a vertex with maximum degree in .

We define an ordering on the branching vectors of the nodes of a top recursion tree. Define . This order corresponds to the order of the reduction and branching rules that generate these vectors. That is, a node in a top recursion tree has branching vector if the rule that the algorithm applied in the corresponding recursive call is either VC.4, VC.4, or VC.4. If the branching vector is or then the algorithm applied Rule VC.4. If the branching vector is the algorithm applied Rule VC.4. If the branching vector is the algorithm applied Rule VC.4 or Rule VC.4. If the branching vector is the algorithm applied Rule VC.4 (note that in this case, the corresponding graph is a skein).

For the following lemmas, suppose that is a vertex with maximum degree in , and consider a top recursion tree for some and .

###### Lemma 5.

If are nodes in such that is a child of then .

###### Proof.

The lemma follows directly from the definition of algorithm and the definition of . ∎

For the next two properties of , we first give the following lemma.

###### Lemma 6.

Let be a vertex with maximum degree in . In the graph , a vertex with neighbors in has at least neighbors in .

###### Proof.

Let . Let be the number of neighbors of in (in the graph ). By the definition of , (in , has neighbors in and neighbors in ). Since , the lemma follows. ∎

###### Lemma 7.

If there is a node in with branching vector then the branching vector of the root of is either or .

###### Proof.

Suppose that the node corresponds to the recursive call . By definition, is a skein, so there is a vertex such that has two neighbors in in the graph . Since is a subgraph of , also has two neighbors in in the graph . By Lemma 6, , and the lemma follows from the definition of the algorithm and the definition of . ∎

###### Lemma 8.

If the branching vector of the root of is or then the branching vector of the left child of the root is not .

###### Proof.

Suppose conversely that the label of the left child of the root is . By definition, when algorithm is called on , it applies Rule VC.4, and let be the vertex of on which the rule is applied. From the assumption that the branching vector of the left child of the root is , is a skein. Denote by the vertices of the seagulls in . Every neighbor of (in ) which is not in has degree 1 (in ). Since Rule VC.4 was not applied on , every neighbor of (in ) which is not in is in . Therefore, consists of the centers of the seagulls of , and possibly and isolated vertices. Let be some center of a seagull in . In , has at most one neighbor in and at least two neighbors in , contradicting Lemma 6. We obtain that the label of the left child of the root is not . ∎

### 6.2 Analysis of the main algorithm

We now analyze the main algorithm. Our method is based on the analysis in  with some changes.

To simplify the analysis, suppose that Rule VC.4 returns a list containing an arbitrary vertex cover of (e.g. the set ). This change can only increase the time complexity of the algorithm. Thus, it suffices to bound the time complexity of the modified algorithm.

To analyze the algorithm, we define a tree that represents the recursive calls to both and . Consider a node in , corresponding to a recursive call . Suppose that in the recursive call , the algorithm applies Rule B3. Recall that in this case, the algorithm branches on and on every set in . Denote . In the tree , has children . The label of the edge is , and the label of an edge is . The tree also contains the nodes . In , has two children and . The labels of the edges and are and , respectively. The node is the root of the tree . The nodes are the leaves of . See Figure 2 for an example.

Figure 2: Example of the definition of TC(G,k). Figure (a) shows the node x and its children in TCVDalg(G,k). Suppose that the graph Hv in the corresponding call to CVDalg is the graph in Figure 1(a). In this case, Fv,k′ consists of four sets of size 5 and one set of size 6. Since vc(Hv)=5, the algorithm applies Rule B3 and branches on {v} and on every set in Fv,k′. Therefore, in the tree TCVDalg(G,k), x has 6 children y,x1,…,x5 and the labels of the edges between x and its children are 1,5,5,5,5,6. Figures (b) and (c) show the corresponding subtree in T(G,k) and TC(G,k), respectively. Note that the edges that were contracted are (x,x′) and all the edges of TVCalg(Hv,k′) except the edge between x5 and its parent. The branching vector of x in TC(G,k) is (1,5,5,5,5,5). Note that this branching vector is greater than (1,5,5,5,5,4), which is the branching vector obtained by adding an element 1 to (5,5,5,5,4), where the latter vector is the sequence of weighted depths of the leaves in the top recursion tree T4(Hv,k) in Figure 1(d).

Similarly, if in the recursive call that corresponds to the algorithm applies Rule B2, then the algorithm branches on every set in and on for every . Denote and . In the tree , has children . The label of an edge is , and the label of an edge is . In the tree , has two children and . The labels of the edges and are and , respectively. The node is the root of the tree , and are the leaves of this tree. The node is the root of the tree , and are the leaves of this tree.

Finally, suppose that in the recursive call that corresponds to , the algorithm applies Rule B1. In this case, the algorithm branches on every set in . In the tree , has children . In , the node is the root of the tree , and are the leaves of this tree.

Our goal is to analyze the number of leaves in . For this purpose, we perform edge contractions on to obtain a tree . Consider a node in that corresponds to a recursive call . Suppose that in the recursive call , the algorithm applies Rule B3. Using the same notations as in the paragraphs above, we contract the following edges: (1) The edge . (2) The edges of that are present in , where if and if . We note that the distinction between the cases and will be used later in the analysis. If in the recursive call the algorithm applies Rule B2, we contract the following edges. (1) The edges and . (2) The edges of that are present in . Note that we do not contract the edges of . If in the recursive call the algorithm applies Rule B1, we do not contract edges. Therefore, the branching vector of in this case is at least or at least .

The nodes in that correspond to nodes in are called primary nodes, and the remaining nodes are secondary nodes. Note that secondary nodes with two children have branching vectors that are at least $(1,2)$, and therefore their branching numbers are at most 1.619. The branching numbers of the primary nodes are estimated as follows. Let be a primary node and suppose that in the corresponding recursive call , the algorithm applies Rule B3. Using the same notations as in the paragraphs above, the branching vector of is at least , where are the weighted depths of the leaves of the top recursion tree . See Figure 2. If in the recursive call , the algorithm applies Rule B2, then the branching vector of is at least or at least , where are the weighted depths of the leaves of the top recursion tree (this follows from the fact that in the node has branching vector of at least or at least ).

From the discussion above, we can bound the branching numbers of the nodes of as follows. Generate all possible top recursion trees . For each tree, compute the branching number of , where are the weighted depths of the leaves of the tree. Additionally, generate all possible top recursion trees for . For each tree, compute the branching number of , where are the weighted depths of the leaves of the tree. The maximum branching number computed is an upper bound on the branching numbers of the nodes of . Since the number of possible top recursion trees is relatively large, we used a Python script to generate these trees. The script uses Lemmas 5, 7, and 8 to reduce the number of generated trees. The five branching vectors with largest branching numbers generated by the script are given in Table 1. For the rest of the section, we consider the first four cases in Table 1. For each case we either show that the case cannot occur, or give a better branching vector for the case. Therefore, the largest branching number of a node in is at most 1.811, and therefore the time complexity of the algorithm is $O^*(1.811^k)$.

Table 1: The five branching vectors with largest branching numbers generated by the Python script.

#### Case 1

In this case, the algorithm applies Rule B4, and when algorithm is called on , the algorithm applies Rule VC.4 on a path , where and . Since and does not have twins (in other words, does not have isolated vertices), the vertices of are . It follows that . Since is a vertex of maximum degree in , this contradicts the assumption that Rule R5 cannot be applied. Therefore, Case 1 cannot occur.

#### Case 2

In this case, the algorithm applies Rule B4, , and when algorithm is called on , the algorithm applies Rule VC.4 on a vertex with degree 3. In the branch , the algorithm applies Rule VC.4 on a path , where and .

If contains isolated vertices then the branching vector is at least and the branching number is at most 1.672. We now assume that does not contain isolated vertices.

Since Rule VC.4 was applied on , we obtain the following.

###### Observation 9.

and .

Let be a vertex cover of of size 3. In order to cover the edge , must contain either or . Suppose without loss of generality that . Since covers the edge , there is an index such that .

.

###### Proof.

From Lemma 10 we obtain that : Suppose conversely that . Therefore, all the neighbors of are in . However, , contradicting the fact that . Therefore, .

We have that has at least two neighbors in , otherwise has at most one neighbor in and at least two neighbors in , contradicting Lemma 6. Since , we obtain that is a clique. Additionally, every vertex in has exactly one neighbor in (note that these neighbors are not necessarily distinct).

Now, consider the application of Rule B4 on . In the branch , the vertices have degree 1. Therefore, the algorithm applies either a reduction rule on or Rule B4. Note that Rule R4 cannot be applied: Conversely, if there is a connected component in which is a clique then must contain a vertex . However, has a single neighbor , and (since Rule B4 was not applied on ). Therefore, is not a clique, a contradiction.

If the algorithm applies Rule B4, then the algorithm for computing applies either a reduction rule on or a branching rule with branching vector at least . Therefore, the branching vector for Case 2 is at least or at least . The branching number is at most 1.797.

#### Case 3

In this case, the algorithm applies Rule B4, and when algorithm is called on , the algorithm applies Rule VC.4 on a vertex with degree 3. In the branch , the algorithm applies Rule VC.4 on a vertex with degree 3.

Since and , we have that the unique vertex cover of of size 2 is . Therefore, the graph consists of a 3-star whose center is . Since Rule VC.4 was not applied on , the leaves of the star are in and therefore the center of the star is in . We now have that has at most one neighbor in and at least 3 neighbors in , contradicting Lemma 6. Therefore, case 3 cannot occur.

#### Case 4

In this case, the algorithm applies Rule B4, , and when algorithm is called on , the algorithm applies Rule VC.4 on a vertex with degree 3. In the branch , the algorithm applies Rule VC.4 on a vertex with degree 3, and in the branch the algorithm applies Rule VC.4 on a vertex with degree 3.

Note that are not adjacent otherwise , contradicting the choice of when Rule VC.4 was applied on . Using the same argument we have that is an independent set in .

###### Lemma 11.

is a vertex cover of .

###### Proof.

Let be a vertex cover of of size 3. Suppose that . Therefore, . Since