 # An Improved Fixed-Parameter Algorithm for 2-Club Cluster Edge Deletion

A 2-club is a graph of diameter at most two. In the decision version of the parameterized 2-Club Cluster Edge Deletion problem, an undirected graph G is given along with an integer k ≥ 0 as parameter, and the question is whether G can be transformed into a disjoint union of 2-clubs by deleting at most k edges. A simple fixed-parameter algorithm solves the problem in 𝒪^*(3^k), and a decade-old algorithm was claimed to have an improved running time of 𝒪^*(2.74^k) via a sophisticated case analysis. Unfortunately, this latter algorithm suffers from a flawed branching scenario. In this paper, an improved fixed-parameter algorithm is presented with a running time in 𝒪^*(2.695^k).

## 1 Introduction

A graph modification problem typically requires some minimal number of operations, referred to as graph editing, to transform a given graph into one that has a desired property, or structure. When restricted to edge editing operations, namely the addition or deletion of an edge, the practical objective is to make “corrections” to the graph by eliminating false positives (edge removal) and/or false negatives (edge addition). If edge deletion only is required, the objective can also be to partition the vertex set of a graph into subsets that satisfy the desired property.

A popular problem in this area is Cluster Editing, which is known as a model for correlation clustering. The problem seeks a transformation of an input graph into a disjoint union of cliques via a user-specified (or minimum) number of edge editing operations. Cluster Editing has received notable attention in the parameterized complexity literature [1, 2, 6, 7, 12, 13, 14, 15, 17, 22], and it has found application in various practical settings [3, 4, 5, 8, 9, 10, 16]. In many application scenarios, the requirement for clusters to be cliques is found to be too restrictive; hence, some relaxed clique models for dense subgraphs have been proposed as alternatives. Examples include quasi-clique, $s$-plex and $s$-club. In this paper we consider only the notion of a 2-club, being a natural extension of a clique, or 1-club, and also because in social networks nodes that are at distance two from each other are often expected to be closely related.

Many variants of editing a graph into a disjoint union of 2-clubs have been studied, such as 2-Club Cluster Vertex Deletion, 2-Club Cluster Edge Deletion, and 2-Club Cluster Editing. All these variants are NP-Complete. Moreover, it was shown in [11] that 2-Club Cluster Editing is $W[2]$-hard with respect to the number of modified edges, hence most likely not fixed-parameter tractable (FPT). In addition, the 2-Club Cluster Vertex Deletion version of the problem was shown to be FPT but not poly-kernelizable (unless $\mathrm{NP} \subseteq \mathrm{coNP/poly}$). Moreover, the problem was shown not to have a subexponential-time algorithm modulo the Exponential-Time Hypothesis.

In this paper we are mainly interested in the 2-Club Cluster Edge Deletion problem, which we believe is a natural extension of Cluster Editing and a possibly better model for correlation clustering. Liu et al. [20] presented a fixed-parameter algorithm for the problem, with a running time that was claimed to be in $\mathcal{O}^*(2.74^k)$. Unfortunately, the claimed asymptotic running time was based on a branching scenario that omitted a critical case. We shall provide a brief note about the flawed argument in the appendix.

## 2 Preliminaries

We consider simple undirected unweighted graphs, and we use common graph-theoretic terminology such as that found in [23]. Let $G = (V, E)$ be a simple undirected unweighted graph. The distance between two vertices $u$ and $v$ in $G$, denoted $d(u, v)$, is the length of a shortest path between them. The diameter of a connected graph is the maximum distance between any two vertices.

For a vertex $v$, the set of vertices at distance $i$ from $v$ is denoted by $N_i(v)$, and the set of all vertices that are at distance at most $i$ from $v$ is denoted by $N_i[v]$. In particular the open and closed neighborhoods of $v$ are, respectively, $N(v) = N_1(v)$ and $N[v] = N_1[v]$. Since we are dealing with simple graphs (with no multiple edges or self loops), the degree of a vertex $v$ is $|N(v)|$. A vertex of degree one is referred to as a pendant vertex.

A simple path $P$ in $G$ is an ordered sequence $(v_0, v_1, \dots, v_\ell)$ of pairwise distinct vertices such that $v_i v_{i+1} \in E$ for all $0 \leq i < \ell$. The path $P$ is an induced path if these are the only edges between its vertices. The length of $P$ is $\ell$ in this case and a path of length $\ell$ is denoted by $P_\ell$ (so we assume the number of vertices in $P_\ell$ is $\ell + 1$). A tail of length $\ell$, or $\ell$-tail, is an induced path $P_\ell$ with degree-two internal vertices and with one endpoint that is of degree one in $G$. A 3-tail is shown in Figure 1 (next section).

A clique in a graph is a set of pairwise adjacent vertices. An $s$-club is a set of vertices any two of which are at distance at most $s$ from each other. As such, a clique is nothing but a 1-club. As mentioned in the previous section, the main contribution of this paper is an improved fixed-parameter algorithm for the 2-Club Cluster Edge Deletion problem, which we formally define as follows.

2-Club Cluster Edge Deletion (2CCED)

Given: a graph $G = (V, E)$ and an integer $k \geq 0$

Question: can $G$ be transformed into a disjoint union of 2-clubs by deleting at most $k$ edges?

The 2CCED problem is known to be NP-Complete. However, the hardness proof does not work for bounded-degree graphs, which can be of special importance since any 2-club is of bounded size in this case. Observe that 2CCED is trivially solvable in polynomial time when the maximum degree is bounded above by two: if a connected component of the graph is a path $(v_0, v_1, \dots, v_\ell)$, we simply delete the edges $v_{3i-1}v_{3i}$ successively for $1 \leq i \leq \lfloor \ell/3 \rfloor$, which is optimum in this case. On the other hand, if a connected component is a cycle of length $\ell > 5$ then we delete an arbitrary edge and the resulting graph will be an isolated path that can be resolved as discussed.
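The degree-two case can be illustrated with a minimal sketch. It assumes that a component of a 2-clubs graph inside a path has at most three vertices, so a path is cut after every third vertex; the function names are hypothetical and not taken from the paper.

```python
def path_edges_to_delete(path):
    """Edges to delete so that a path (given as a vertex list) splits
    into pieces of at most three vertices, each of diameter <= 2."""
    # Cut between positions 2|3, 5|6, 8|9, ... of the path.
    return [(path[i - 1], path[i]) for i in range(3, len(path), 3)]


def deletions_for_path(num_vertices):
    """Number of deletions used on a path with num_vertices vertices."""
    return max(0, (num_vertices - 1) // 3)


def deletions_for_cycle(num_vertices):
    """Number of deletions used on a cycle with num_vertices >= 3 vertices."""
    if num_vertices <= 5:
        return 0  # cycles on 3, 4 or 5 vertices already have diameter <= 2
    # Delete one arbitrary edge, then treat the remaining path.
    return 1 + deletions_for_path(num_vertices)
```

For instance, a path on seven vertices is cut twice, and a cycle on six vertices needs two deletions in total under this scheme.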

A solution to the 2CCED problem yields a graph whose connected components are diameter-two subgraphs. We refer to the resulting graph as a 2-clubs graph. The presence of a path of length three whose endpoints are at distance exactly three from each other is the main “forbidden structure” that prevents a graph from being a 2-clubs graph. We shall refer to such a path as a conflict quadruple in this paper. During the search for a solution we look for a conflict quadruple and try to resolve it by deleting one of the three edges forming it. We shall mark some edges as permanent if we decide they are not to be part of a solution (hence not to be deleted).

## 3 An Improved 2CCED Algorithm

Our algorithm is based on resolving any conflict quadruple by deleting one of the three edges forming it. In each case (or branch) the parameter is decreased by one. This general approach gives a simple $\mathcal{O}^*(3^k)$ algorithm. However, there are cases where several conflicts intersect in a way that allows us to further reduce the parameter in some branches. Moreover, there are simpler cases where we know exactly which edge (or group of edges) to delete “without loss of optimality.” Such cases can be dealt with as part of a polynomial-time procedure that is based on reduction rules.
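The basic three-way branching can be sketched as follows. This is only an illustration of the simple search tree, without any reduction or refined branching rules; the graph representation (a dictionary mapping each vertex to a set of neighbors) and the names `find_conflict_quadruple` and `solve_2cced` are assumptions made for this example.

```python
from collections import deque


def find_conflict_quadruple(adj):
    """Return a shortest path (a, b, c, d) with d(a, d) = 3, or None."""
    for a in adj:
        dist, parent = {a: 0}, {a: None}
        queue = deque([a])
        while queue:
            u = queue.popleft()
            if dist[u] == 3:
                continue  # no need to explore beyond distance three
            for w in adj[u]:
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    queue.append(w)
        for d, dd in dist.items():
            if dd == 3:
                c = parent[d]
                b = parent[c]
                return (a, b, c, d)
    return None


def solve_2cced(adj, k):
    """Return a list of at most k edges whose deletion yields a
    2-clubs graph, or None if no such set exists."""
    if k < 0:
        return None
    quad = find_conflict_quadruple(adj)
    if quad is None:
        return []  # every component already has diameter <= 2
    if k == 0:
        return None
    a, b, c, d = quad
    for u, v in ((a, b), (b, c), (c, d)):
        adj[u].remove(v)
        adj[v].remove(u)
        rest = solve_2cced(adj, k - 1)
        adj[u].add(v)  # undo the deletion before trying the next branch
        adj[v].add(u)
        if rest is not None:
            return [(u, v)] + rest
    return None
```

The sketch relies on the fact that deleting edges never decreases distances, so the endpoints of a conflict quadruple must end up in different components and at least one of the three edges must go.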

### 3.1 A Reduction Procedure

A reduction procedure is assumed to be applied exhaustively before the search-tree backtracking algorithm and during the search process, prior to any choice, or decision, made by the search algorithm. The main reduction rules are given below. They are assumed to be applied successively, in such a way that a rule is not applied until all the previous rules have been applied exhaustively. We shall prove the soundness of non-obvious reduction rules only.

Reduction Rule 1. The algorithm terminates and reports a no instance whenever the parameter becomes negative.

Reduction Rule 2. The algorithm terminates and reports a yes instance if the graph becomes empty (assuming $k \geq 0$ due to the previous rule).

Reduction Rule 3. If $G$ contains a connected component $C$ that is a 2-club, then delete $C$.

Note that exhaustive application of Rule 3 results also in deleting all isolated vertices.
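A minimal sketch of Rule 3 is given below, assuming the graph is stored as a dictionary mapping each vertex to its set of neighbors. The helper names are hypothetical; the check simply verifies that every vertex of a component reaches all other vertices of that component within two steps.

```python
def connected_components(adj):
    """Return the connected components of the graph as a list of sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def is_2club(adj, comp):
    """True if the connected component comp has diameter at most two."""
    for u in comp:
        within_two = {u} | adj[u]
        for w in adj[u]:
            within_two |= adj[w]
        if within_two != comp:
            return False
    return True


def apply_rule_3(adj):
    """Return a copy of the graph without the components that are 2-clubs."""
    keep = set()
    for comp in connected_components(adj):
        if not is_2club(adj, comp):
            keep |= comp
    return {u: set(adj[u]) for u in adj if u in keep}
```

An isolated vertex forms a component of diameter zero, so it is removed by the same check.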

Reduction Rule 4. If two non-adjacent vertices and have more than common neighbours then delete the edges linking and to and respectively.

Soundness. Since and have more than common neighbors it would be impossible to cause the distance between them to increase beyond two, so they must belong to the same 2-club, which does not contain elements of .

Reduction Rule 5. If $G$ contains a connected component $C$ of maximum degree two, then $C$ can be transformed optimally into a 2-clubs (sub)graph. This results in decreasing the value of $k$ by the number of deleted edges.

Soundness. Any connected component of maximum degree two is either a cycle or a path, which can be resolved as described in Section 2 above.

Reduction Rule 6. If $G$ has a 3-tail $(a, b, c, d)$, as in Figure 1, where $d$ is the degree-one endpoint, then we simply delete the edge $ab$ and decrease $k$ by 1.

Soundness. Since the two vertices $a$ and $d$ must belong to two different 2-clubs, at least one of the three edges forming $(a, b, c, d)$ must be deleted. Deleting $ab$ results in an isolated 2-club (namely the path formed by $b$, $c$ and $d$) and cannot result in a sub-optimal solution.
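Detecting a 3-tail can be sketched as below. The labels are an assumption made for this example: `d` is the pendant endpoint, `c` and `b` are the degree-two internal vertices, and `a` is the other endpoint, so that the deleted edge is the one farthest from the pendant vertex. The function names are hypothetical.

```python
def find_3_tail(adj):
    """Return (a, b, c, d) where d is pendant and b, c have degree two,
    or None if the graph has no 3-tail."""
    for d, nbrs in adj.items():
        if len(nbrs) != 1:
            continue
        (c,) = nbrs
        if len(adj[c]) != 2:
            continue
        (b,) = adj[c] - {d}
        if len(adj[b]) != 2:
            continue
        (a,) = adj[b] - {c}
        return (a, b, c, d)
    return None


def apply_rule_6(adj, k):
    """Delete the edge ab of a 3-tail (if any) and return the new budget."""
    tail = find_3_tail(adj)
    if tail is None:
        return k
    a, b, _, _ = tail
    adj[a].remove(b)
    adj[b].remove(a)
    return k - 1
```

Because `c` has only the neighbors `b` and `d`, and `d` has only the neighbor `c`, the four vertices returned always form an induced path.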

### 3.2 Branching Rules

We now present our bounded search tree algorithm, which simply works in a recursive manner and can be viewed as a search-tree traversal. The running time is thus proportional (modulo a polynomial factor) to the number of recursive calls. This is why we use the $\mathcal{O}^*$ notation, which mainly displays the total number of recursive calls and hides any polynomial factor.

In what follows, we consider an instance $(G, k)$ of 2CCED that has been pre-processed by exhaustive application of the reduction rules. As mentioned earlier, the reduction rules are assumed to be applied exhaustively whenever they are applicable during the search process. As such, we either have a solution (when $G$ becomes empty) or every connected component of $G$ contains at least one vertex of degree at least three and at least two vertices that are at distance exactly three from each other. This order of events applies also to the branching rules, given by a list of cases below. Therefore, in each case, we assume none of the previously addressed conditions hold.

#### Case 1. Neighbors of endpoints of a P2.

If we have an induced path of length two, say , such that , then we branch by either (i) deleting or or all the vertices in . The worst-case recurrence is thus with a corresponding running time in .

Soundness. Each of the first two branches deletes one of the three edges of a conflict quadruple that contains as a sub-path. In the third case (or branch) the two edges and become permanent. Thus any neighbor of that is at distance three from must be deleted, and vice versa.

###### Remark 1.

The above branching scenario applies implicitly in two notable cases that we shall (therefore) exclude in the sequel.

• If we have a conflict quadruple $(a, b, c, d)$ with degree-two internal vertices ($b$ and $c$), then any neighbor of $a$ other than $b$ is at distance exactly three from $c$, and the same applies to $d$ and $b$. Thus the path $(a, b, c)$ satisfies the branching condition of Case 1, so from this point on this case is implicitly excluded.

• If we have a pair of vertices $u$ and $v$ that are at distance four from each other, then the three internal vertices on a shortest path between $u$ and $v$ also satisfy the condition of Case 1.

Based on the above remark, we can assume that from this point on every connected component of is a 3-club. Moreover, any such 3-club contains at least one vertex with a non-empty and every vertex in has at most one neighbor in (if an element of has two or more neighbors in then Case 1 would be applicable).

In the following cases and sub-cases we assume we have a conflict quadruple $(a, b, c, d)$, and we mainly seek to resolve it by deleting one of the three edges. In some cases, we might also consider other conflict quadruples, if found in the neighborhood of $(a, b, c, d)$.

#### Case 2. Conflict quadruple with pendant endpoints.

In the special case where every conflict quadruple $(a, b, c, d)$ satisfies $|N(a)| = |N(d)| = 1$ (that is, both endpoints are pendant), we know the internal vertices $b$ and $c$ do not have more than one degree-one neighbor, otherwise Case 1 applies. Therefore deleting the edge $bc$ can only resolve exactly one conflict and it could possibly yield more conflicts, while deleting $ab$ or $cd$ can resolve one or more conflicts without leading to more conflict quadruples. Therefore in this special case we simply branch by either deleting $ab$ or $cd$, with a corresponding running time in $\mathcal{O}^*(2^k)$.

From this point on, and without loss of generality, we shall assume $b$ has at least one neighbor other than $a$ and $c$. Such a neighbor is therefore at distance one or two from $d$. In fact, if its distance to $d$ were three, then Case 1 would apply.

#### Case 3. b has a neighbor at distance one from d.

Let $x$ be a common neighbor of $b$ and $d$ (other than $c$), as shown in Figure 2. We branch as follows:

• delete edge $ab$;

• delete edges $bc$ and $bx$;

• delete edges $cd$ and $xd$;

• delete edges $bx$ and $cd$;

• delete edges $bc$ and $xd$.

This gives the recurrence $T(k) = T(k-1) + 4T(k-2)$, with a corresponding running time in $\mathcal{O}^*(2.562^k)$.

Soundness. After the second branch, we know that $a$ and at least one vertex from the pair $\{c, x\}$ is in the same 2-club as $b$. If both $c$ and $x$ are in this 2-club, then we must delete $cd$ and $xd$ (since $d$ cannot be in the same club). This justifies the third branch. After the third branch, the 2-club of $b$ contains either $c$, so we delete $bx$ and $cd$, or it contains $x$ and this leads to deleting $bc$ and $xd$. ∎

###### Remark 2.

Observe that not all links are shown in the above figure, but the branching scenario can only be improved if other links exist without affecting the distance between $a$ and $d$. For example, adding an edge between $c$ and $x$ leads to a better recurrence since $cx$ would have to be deleted in each of the last two branches.
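The running time bounds attached to the branching rules come from the usual branching-number computation. As a hedged illustration, the sketch below finds the root $x > 1$ of $\sum_i x^{-d_i} = 1$ by bisection for a branching vector $(d_1, \dots, d_r)$; the function name and the tolerance are assumptions made for this example. Case 3 has one branch deleting a single edge and four branches deleting two edges, which corresponds to the vector $(1, 2, 2, 2, 2)$.

```python
def branching_number(vector, tol=1e-9):
    """Root x >= 1 of sum(x ** -d for d in vector) == 1, by bisection.

    `vector` lists, for each branch, by how much the parameter drops
    (positive integers).
    """
    def excess(x):
        return sum(x ** -d for d in vector) - 1.0

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:  # grow the upper bound until it overshoots
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


# Simple three-way branching on a conflict quadruple.
simple = branching_number([1, 1, 1])
# One single-edge branch and four two-edge branches.
case3 = branching_number([1, 2, 2, 2, 2])
```

For the vector $(1, 2, 2, 2, 2)$ the characteristic equation is $x^2 = x + 4$, whose positive root is $(1 + \sqrt{17})/2$.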

#### Case 4. b has a neighbor at distance two from d.

Let be an induced corresponding to this case, as shown in Figure 3.

The distance between $a$ and $y$ in the above figure leads to two possible sub-cases, namely $d(a, y) = 3$ and $d(a, y) = 2$.

#### Case 4.1. d(a,y)=3.

In this particular case we branch as follows:

• delete ;

• delete and ;

• delete and ;

• delete and ;

• delete and .

This again yields the recurrence $T(k) = T(k-1) + 4T(k-2)$ with a running time in $\mathcal{O}^*(2.562^k)$.

Soundness. After the second branch we are sure that is in the same club as or (or both). If and are in the same club, then we must delete edges and , which corresponds to (and justifies) the third branch. Otherwise, we have exactly two cases: either is deleted or is deleted. In the first case, we must also delete and in the second we must delete . ∎

#### Case 4.2. d(a,y)=2.

This is depicted in Figure 7 below. We further note that is either 3 or 2 (if then Case 3 would have been applied). If , then the path would satisfy the condition of Case 1. Therefore we restrict our attention to the case where , and let be the common neighbor of and . We further distinguish the two cases where and .

#### Case 4.2.1 w≠b

In this case we branch to resolve the conflict quadruple as follows (see Figure 5):

• delete edge and further branch to deleting

, and or (to disconnect from );

• delete and further branch to deleting:

, and or

, and or ;

• delete and further branch to deleting:

, and or

, and or (since after the deletion of )

, , and or (to disconnect from )

, , and or (same reason).

This gives the recurrence with a running time in .

Soundness. We prove the soundness of each branching action separately.

In the first branch we delete , being one of the edges of the conflict quadruple , and proceed into resolving the conflict quadruple . In this case, after the second (sub)branch we know and are permanent so we must delete or to make sure and are not in the same club (since we deleted of edge , which forces and to be in different 2-clubs).

In the second branch we proceed by deleting of , and we know is permanent. When we delete , the distance between and must become three. Otherwise, we would have a common neighbor between and other than and Case 3 would have been applied. Therefore we have another conflict quadruple to resolve, namely . So we branch by deleting either or (since is permanent in this branching case). The same applies to the sub-case (or sub-branch) where we delete (since becomes three again).

Finally, we note the importance of the order by which the quadruple is resolved in the third branch. First, the deletion of again leads to which is resolved by deleting or . Second, the deletion of increases the distance between and to three (same argument as in the case of and ). We thus have to resolve the conflict quadruple by deleting or (since is permanent in this branch). Finally, when deleting we introduce two conflict quadruples: and , which are resolved by deleting or and (in each case) deleting or . ∎

#### Case 4.2.2 w=b

In this case we also branch to resolve the conflict quadruple as follows (see Figure 6):

• delete edge and further branch to deleting

, and or (to disconnect from );

• delete and further branch to deleting:

, and or

, and or ;

• delete and further branch to deleting:

, and or

, and or

and (to make sure is disconnected from ).

This gives the recurrence with a running time in .

Soundness. The only difference between this case and the previous one is in the very last branch, when deleting and . In this case we must make sure and are in different clubs (since we deleted ), so we further delete since is permanent in this last case. ∎

The above branching scenarios cover all the possible cases where we can find two vertices at distance three from each other in a graph that is not a disjoint union of 2-clubs. Therefore we can now state our main result.

###### Theorem 1.

The 2-Club Cluster Edge Deletion problem is solvable in $\mathcal{O}^*(2.695^k)$ time.

## 4 Concluding Remarks

We presented an improved fixed-parameter algorithm for 2-Club Cluster Edge Deletion. The main approach is based on gradual elimination of favorable scenarios: components of maximum degree two, tails of length three, special paths of length two, paths of length four, etc. At each branching step, the absence of previous favorable scenarios makes it possible to improve the branching factor. Despite its practical importance, we believe the problem has not received enough attention thus far. In fact, the only known FPT algorithm that improves on the exhaustive (folklore) method is the decade-old algorithm of Liu et al. [20], which has a flawed branching case (as we show in the appendix).

The importance of the 2-Club Cluster Edge Deletion problem stems from its ability to provide a better model for correlation clustering than the well-studied Cluster Editing problem. From a technical standpoint, the number of edge modifications (the parameter $k$) can be much smaller since the number of edge additions needed to turn each resulting component into a clique can be very large. As such, correlation clustering via 2-Club Cluster Edge Deletion can be more practical and possibly more informative. It would be interesting to have a fixed-parameter algorithm for the 3-Club Cluster Edge Deletion problem using techniques similar to what we presented in this paper.

## References

• [1] F. N. Abu-Khzam. On the complexity of multi-parameterized cluster editing. J. Discrete Algorithms, 45:26–34, 2017.
• [2] F. N. Abu-Khzam, J. Egan, S. Gaspers, A. Shaw, and P. Shaw. Cluster editing with vertex splitting. In J. Lee, G. Rinaldi, and A. R. Mahjoub, editors, Combinatorial Optimization - 5th International Symposium, ISCO 2018, Marrakesh, Morocco, April 11-13, 2018, Revised Selected Papers, volume 10856 of Lecture Notes in Computer Science, pages 1–13. Springer, 2018.
• [3] J. R. Barr, P. Shaw, F. N. Abu-Khzam, and J. Chen. Combinatorial text classification: the effect of multi-parameterized correlation clustering. In 2019 First International Conference on Graph Computing (GC), pages 29–36, 2019.
• [4] J. R. Barr, P. Shaw, F. N. Abu-Khzam, T. Thatcher, and S. Yu. Vulnerability rating of source code with token embedding and combinatorial algorithms. International Journal of Semantic Computing, 14(04):501–516, 2020.
• [5] J. R. Barr, P. Shaw, F. N. Abu-Khzam, S. Yu, H. Yin, and T. Thatcher. Combinatorial code classification & vulnerability rating. In 2020 Second International Conference on Transdisciplinary AI (TransAI), pages 80–83. IEEE, 2020.
• [6] S. Böcker and J. Baumbach. Cluster editing. In P. Bonizzoni, V. Brattka, and B. Löwe, editors, The Nature of Computation. Logic, Algorithms, Applications - 9th Conference on Computability in Europe, CiE 2013, Milan, Italy, July 1-5, 2013. Proceedings, volume 7921 of Lecture Notes in Computer Science, pages 33–44. Springer, 2013.
• [7] S. Böcker, S. Briesemeister, and G. W. Klau. Exact algorithms for cluster editing: Evaluation and experiments. Algorithmica, 60(2):316–334, 2011.
• [8] M. D’Addario, D. Kopczynski, J. Baumbach, and S. Rahmann. A modular computational framework for automated peak extraction from ion mobility spectra. BMC Bioinformatics, 15(1), 2014.
• [9] F. Dehne, M. A. Langston, X. Luo, S. Pitre, P. Shaw, and Y. Zhang. The cluster editing problem: Implementations and experiments. In Parameterized and Exact Computation, pages 13–24. Springer Berlin Heidelberg, 2006.
• [10] A. Fadiel, M. A. Langston, X. Peng, A. D. Perkins, H. S. Taylor, O. Tuncalp, D. Vitello, P. H. Pevsner, and F. Naftolin. Computational analysis of mass spectrometry data using novel combinatorial methods. AICCSA, 6:8–11, 2006.
• [11] A. Figiel, A. Himmel, A. Nichterlein, and R. Niedermeier. On 2-clubs in graph-based data clustering: Theory and algorithm engineering. In T. Calamoneri and F. Corò, editors, Algorithms and Complexity - 12th International Conference, CIAC 2021, Virtual Event, May 10-12, 2021, Proceedings, volume 12701 of Lecture Notes in Computer Science, pages 216–230. Springer, 2021.
• [12] J. Gramm, J. Guo, F. Hüffner, and R. Niedermeier. Graph-modeled data clustering: Fixed-parameter algorithms for clique generation. In R. Petreschi, G. Persiano, and R. Silvestri, editors, Algorithms and Complexity, 5th Italian Conference, CIAC 2003, Rome, Italy, May 28-30, 2003, Proceedings, volume 2653 of Lecture Notes in Computer Science, pages 108–119. Springer, 2003.
• [13] J. Gramm, J. Guo, F. Hüffner, and R. Niedermeier. Automated generation of search tree algorithms for hard graph modification problems. Algorithmica, 39(4):321–347, 2004.
• [14] J. Guo. A more effective linear kernelization for cluster editing. Theoret. Comput. Sci., 410(8):718–726, 2009.
• [15] P. Heggernes, D. Lokshtanov, J. Nederlof, C. Paul, and J. A. Telle. Generalized graph clustering: Recognizing (p, q)-cluster graphs. In D. M. Thilikos, editor, Graph Theoretic Concepts in Computer Science - 36th International Workshop, WG 2010, Zarós, Crete, Greece, June 28-30, 2010 Revised Papers, volume 6410 of Lecture Notes in Computer Science, pages 171–183, 2010.
• [16] F. Hüffner, C. Komusiewicz, H. Moser, and R. Niedermeier. Fixed-parameter algorithms for cluster vertex deletion. Theory of Computing Systems, 47(1):196–217, 2010.
• [17] C. Komusiewicz and J. Uhlmann. Cluster editing with locally bounded modifications. Discrete Appl. Math., 160(15):2259–2270, 2012.
• [18] V. E. Lee, N. Ruan, R. Jin, and C. C. Aggarwal. A survey of algorithms for dense subgraph discovery. In C. C. Aggarwal and H. Wang, editors, Managing and Mining Graph Data, volume 40 of Advances in Database Systems, pages 303–336. Springer, 2010.
• [19] D. Liben-Nowell and J. M. Kleinberg. The link-prediction problem for social networks. J. Assoc. Inf. Sci. Technol., 58(7):1019–1031, 2007.
• [20] H. Liu, P. Zhang, and D. Zhu. On editing graphs into 2-club clusters. In J. Snoeyink, P. Lu, K. Su, and L. Wang, editors, Frontiers in Algorithmics and Algorithmic Aspects in Information and Management, pages 235–246, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
• [21] N. Misra, F. Panolan, and S. Saurabh. Subexponential algorithm for d-cluster edge deletion: Exception or rule? J. Comput. Syst. Sci., 113:150–162, 2020.
• [22] R. Shamir, R. Sharan, and D. Tsur. Cluster graph modification problems. Discrete Applied Mathematics, 144(1):173–182, 2004. Discrete Mathematics and Data Mining.
• [23] D. B. West. Introduction to Graph Theory. Prentice Hall, 2nd edition, September 2000.

## Appendix: The algorithm of Liu et al.

The 2CCED algorithm of Liu et al. [20] is claimed to have a worst-case running time in $\mathcal{O}^*(2.74^k)$. Unfortunately, one of its branching rules is incorrect due to an omitted case. The rule corresponds to the figure below (labeled Case 2.2.4 in the same paper), which is redrawn for a clear illustration in a manner that matches our case analysis.

In [20], the authors presented the following branching scenario (Page 245, Table 1, row 4).

• delete edges 1, 5 and 7;

• delete edges 1, 5 and 8;

• delete edges 1, 6 and 7;

• delete edges 1, 6 and 8;

• delete edges 2 and 4;

• delete edges 2, 5 and 7;

• delete edges 2, 5 and 8;

• delete edges 2, 6 and 7;

• delete edges 2, 6 and 8;

• delete edges 3 and 7;

• delete edges 3 and 8;

• delete edges 3, 4 and 5;

• delete edges 3, 4 and 6.

The corresponding worst-case recurrence is with a running time in . To understand the above branching, observe that it tries to resolve the conflict quadruple by first deleting edge 1 () and then simultaneously resolve the two conflict quadruples and . The latter conflict results from the deletion of edge 1.

The first four branches are not enough to cover the case of deleting edge 1 () since there is a case where both edges 1 and 4 are deleted. This becomes obvious from branches 5-9 where the authors do notice the need to delete edges 2 and 4 to cover the case where edge 2 is deleted. The branching rule can be fixed by adding a branch/case for the deletion of edges 1 and 4 at the beginning. The running time would go up to if this is fixed, provided there are no other errors or missed cases. Finally, had this branching rule been correct as described in , we would have used it to cover Case 2.4 in our algorithm and we would have improved the running time to .
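As a hedged cross-check, the branch sizes can be counted directly from the list above: three branches delete two edges and ten branches delete three edges, and adding the missing branch (edges 1 and 4) raises the number of two-edge branches to four. The symbols $T$ and $x$ below are used only for this illustration, and the numerical roots are our own approximations, not values quoted from the original paper.

```latex
\begin{align*}
T(k) &= 3\,T(k-2) + 10\,T(k-3)
  &&\Longrightarrow\quad x^3 - 3x - 10 = 0, \quad x \approx 2.613,\\
T(k) &= 4\,T(k-2) + 10\,T(k-3)
  &&\Longrightarrow\quad x^3 - 4x - 10 = 0, \quad x \approx 2.761.
\end{align*}
```

Under this count, the corrected rule has a branching number above 2.74, which is consistent with the observation that the running time goes up once the omitted case is added.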