A Parameterized View on Multi-Layer Cluster Editing

In classical Cluster Editing we seek to transform a given graph into a disjoint union of cliques, called a cluster graph, using the fewest edge modifications (deletions or additions). Motivated by recent applications, we propose and study Cluster Editing in multi-layer graphs. A multi-layer graph consists of a set of simple graphs, called layers, that all have the same vertex set. In Multi-Layer Cluster Editing we aim to transform all layers into cluster graphs that differ only slightly. More specifically, we allow marking at most d vertices and transforming each layer of the multi-layer graph into a cluster graph with at most k edge modifications per layer such that, if we remove the marked vertices, we obtain the same cluster graph in all layers. Multi-Layer Cluster Editing is NP-hard and we analyze its parameterized complexity. We focus on the natural parameters "max. number d of marked vertices", "max. number k of edge modifications per layer", "number n of vertices", and "number l of layers". We fully explore the parameterized computational complexity landscape for those parameters and their combinations. Our main results are that Multi-Layer Cluster Editing is FPT with respect to the parameter combination (d, k) and that it is para-NP-hard for all smaller or incomparable parameter combinations. Furthermore, we give a polynomial kernel with respect to the parameter combination (d, k, l) and show that for all smaller or incomparable parameter combinations, the problem does not admit a polynomial kernel unless NP ⊆ coNP/poly.


1 Introduction

The NP-hard Cluster Editing problem, also known as Correlation Clustering, models the following clustering task. Given a set of objects and their binary similarity relations, partition the objects into parts, minimizing the similarities between objects in different parts and the non-similarities between objects in the same part. In graph-theoretic terms, given a graph $G$ and an integer $k$, we want to edit, that is, add or delete, at most $k$ edges in $G$ such that we obtain a disjoint union of cliques, also called a cluster graph. Cluster Editing was introduced by Ben-Dor, Shamir, and Yakhini [3] and Bansal, Blum, and Chawla [1] in biological and machine-learning contexts and has been studied extensively by the corresponding communities. The edited edges can be thought of as noise that obfuscates the inherent cluster structure, where the noise may stem from measurement errors, for example.

Cluster Editing has since also become one of the best-studied parameterized problems, see Böcker and Baumbach [6] for a survey. Notably, parameterized approaches to Cluster Editing were also successful in practice [7].

In recent years, the multi-layered nature of data in many applications has become more and more relevant [5, 9, 14, 16]. For example, useful information about individuals may be represented in their social interactions, geographic closeness, common interests, or activities [14]. An example from biology is the neural network of Caenorhabditis elegans, in which neurons can be connected by either chemical links or ionic channels [5]. To represent and analyze such data, researchers commonly use multi-layer graphs², which are collections of ordinary graphs on the same vertex set, called layers. Multi-layer graphs enable us to take into account the different aspects of the data modeled in each layer. The field of clustering on multi-layer graphs, while still in its infancy [16], has already been the focus of much research (see surveys [14, 16]). Indeed, crucial information may be lost if we instead aggregate all layers into one graph [2, 12, 19].

² Multi-layer graphs are known by a multitude of other names including multi-dimensional networks, multiplex networks, and edge-colored multigraphs [16].

Here we introduce a natural discrete model that lifts the established Cluster Editing problem to multi-layer graphs. The challenge in multi-layer clustering is to recover the cluster structure inherent in each layer, while also determining the overlap of communities between layers [4]. A natural approach is thus as follows. To ensure that the cluster structure that we recover by editing edges is sufficiently reflected in each layer, we specify a maximum budget $k$ and allow editing at most $k$ edges in each layer. To determine the overlap of communities between layers, we specify an upper bound $d$ on the number of entities that may switch communities between layers; the (recovered) communities represented by the remaining entities have to be the same across all layers. Formally, the computational problem that we study is as follows.

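Multi-Layer Cluster Editing
Input: $\ell$ graphs $G_1 = (V, E_1), \ldots, G_\ell = (V, E_\ell)$ and two integers $k, d \in \mathbb{N}$.
Question: Is there a vertex subset $D \subseteq V$ with $|D| \le d$ and $\ell$ edge modification sets $S_1, \ldots, S_\ell \subseteq \binom{V}{2}$ such that
(1) for each $i \in [\ell]$ we have that $|S_i| \le k$,
(2) for each $i \in [\ell]$ the graph $G_i \triangle S_i$ is a cluster graph, and
(3) for all $i, j \in [\ell]$ we have that $(G_i \triangle S_i) - D = (G_j \triangle S_j) - D$?

Herein, $\triangle$ denotes the symmetric difference, that is, $G_i \triangle S_i := (V, (E_i \setminus S_i) \cup (S_i \setminus E_i))$, and $G - D$ denotes $G[V \setminus D]$ for $D \subseteq V$.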
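To make Conditions (1) to (3) concrete, the following Python sketch checks whether a candidate tuple $(S_1, \ldots, S_\ell, D)$ is a solution. It assumes that each layer and each modification set is a Python set of edges, each edge a `frozenset` of two vertices; the function names are our own illustrative choices. It uses the standard characterization that a graph is a cluster graph if and only if no vertex has two non-adjacent neighbors (no induced path on three vertices). Every check runs in polynomial time, in line with the NP membership argument of Section 2.

```python
from itertools import combinations


def is_cluster_graph(vertices, edges):
    """True iff no vertex has two non-adjacent neighbors (no induced P3)."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    for v in vertices:
        for u, w in combinations(adj[v], 2):
            if w not in adj[u]:  # u - v - w is an induced P3
                return False
    return True


def without(edges, marked):
    """Edge set of G - D: drop every edge with a marked endpoint."""
    return {e for e in edges if not (e & marked)}


def is_solution(vertices, layers, k, d, marked, mods):
    """Check Conditions (1)-(3) for a candidate (S_1, ..., S_l, D)."""
    if len(mods) != len(layers) or len(marked) > d or not marked <= vertices:
        return False
    reduced = []
    for edges, s in zip(layers, mods):
        if len(s) > k:                              # Condition (1)
            return False
        edited = edges ^ s                          # E_i symmetric difference S_i
        if not is_cluster_graph(vertices, edited):  # Condition (2)
            return False
        reduced.append(without(edited, marked))
    # Condition (3): all edited layers coincide after deleting D
    return all(r == reduced[0] for r in reduced)


# Two layers on {1, 2, 3}; deleting edge {2, 3} in layer 2 makes them agree.
F = frozenset
L1, L2 = {F({1, 2})}, {F({1, 2}), F({2, 3})}
print(is_solution({1, 2, 3}, [L1, L2], 1, 0, set(), [set(), {F({2, 3})}]))  # True
```

Note that marking a vertex does not exempt a layer from Condition (2): with $D = \{3\}$ and no edits, the second layer above is still an induced path and the candidate is rejected.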

We study Multi-Layer Cluster Editing from a parameterized algorithms point of view. Our motivation is threefold. First, we think that it is a natural model for (correlation) clustering in multi-layer graphs. Indeed, many works on multi-layer clustering strive to find a consensus clustering among all layers [12, 18, 19]. If $d = 0$ in Multi-Layer Cluster Editing, then we check whether the input multi-layer graph conforms to a consensus clustering, up to noise. However, it is intuitive that such a consensus clustering does not always exist. For example, following the data analyzed by Kim et al. [15], a researcher may be part of different communities with respect to the work, lunch, facebook, friend, and coauthor relationships (corresponding to different layers), while for others these communities are similar. This motivates us to partition the vertex set into a consensus cluster part $V \setminus D$ and a “fluctuating” part $D$ in Multi-Layer Cluster Editing.

Second, given the success of parameterized approaches for Cluster Editing [6, 7], we think that they are also suitable for attacking Multi-Layer Cluster Editing and that this problem is a natural candidate to extend the toolkit of parameterized algorithms and apply it to the new challenges arising in the emerging field of multi-layer graphs.

Third, our techniques apply to temporal graphs. These are multi-layer graphs in which the layers are equipped with a linear order, modeling time-stamped communication in a social network, for example. Berger-Wolf and Tantipathananandh [20] studied a cluster editing problem for temporal graphs that is closely related to Multi-Layer Cluster Editing. The main difference is that, instead of marking vertices in $D$ over all layers, they mark vertices for each layer individually, allowing them to move between clusters between the layer in which they are marked and the successor layer.³ We believe that studying Multi-Layer Cluster Editing constitutes the natural first step towards the multivariate analysis of the somewhat more complex problem of Berger-Wolf and Tantipathananandh.

³ In addition, Berger-Wolf and Tantipathananandh’s [20] objective function translates to minimizing . This translates to a modified Condition (3) which applies only to consecutive layers.

Figure 1: Our results in a Hasse diagram of the upper-boundedness relation between the parameters “max. number $d$ of marked vertices”, “max. number $k$ of edge modifications per layer”, “number $n$ of vertices”, and “number $\ell$ of layers”, and all of their combinations. Multi-Layer Cluster Editing is para-NP-hard for all parameter combinations colored in red. It is FPT for all parameter combinations that are colored yellow or green and admits a polynomial kernel for all parameter combinations colored green. It does not admit a polynomial kernel for all parameter combinations that are colored yellow unless NP ⊆ coNP/poly. (Diagram labels: “FPT, Theorem 1”; “para-NP-hard, Proposition 1”; “para-NP-hard, Observation 1”; “No poly kernel, Theorem 3”; “Poly kernel, Theorem 2”; one combination is labeled “instance size”.)

Our Results.

We completely classify Multi-Layer Cluster Editing in terms of fixed-parameter tractability and existence of polynomial-size problem kernels with respect to the parameters “max. number $d$ of marked vertices”, “max. number $k$ of edge modifications per layer”, “number $n$ of vertices”, and “number $\ell$ of layers”, and all of their combinations. Note that, within these parameters, we have the following hierarchical relations: we can assume that $d \le n$ and $k \le \binom{n}{2}$. We show an overview of our results in Figure 1. The main results are that Multi-Layer Cluster Editing is FPT with respect to the number $d$ of marked vertices combined with the maximum number $k$ of edge modifications per layer. It is para-NP-hard for all smaller or incomparable parameter combinations, however. Furthermore, we give a polynomial kernel with respect to the parameter combination $(d, k, \ell)$ and show that for all smaller or incomparable parameter combinations, the problem does not admit a polynomial kernel unless NP ⊆ coNP/poly.

Notation.

We use standard notation from parameterized complexity [10] and graph theory [11]. We use $n$ to denote the number of vertices in the given vertex set, i.e., unless stated otherwise we have that $n = |V|$. Similarly, $\ell$ is the number of layers in the input multi-layer graph. We call $G_i$ the $i$-th layer or layer $i$. We call the vertices in $D$ marked and we call $k$ the budget of a layer. We say that a tuple $(S_1, \ldots, S_\ell, D)$ of edge modification sets and marked vertices is a solution if it satisfies Conditions (1) to (3) of Multi-Layer Cluster Editing.

2 Hardness of Multi-Layer Cluster Editing

Our problem is contained in NP since we can verify in polynomial time whether a given subset of vertices and some edge sets fulfill the three requirements given in the definition of our problem. Thus, in all proofs of NP-completeness, we omit the proof of NP containment and only show the hardness part.

Since Cluster Editing is NP-complete [1], we immediately get NP-hardness for Multi-Layer Cluster Editing.

Observation 1.

Multi-Layer Cluster Editing is NP-complete for $\ell = 1$ and $d = 0$.

By a polynomial-time reduction from Vertex Cover (given a simple graph $H$ and an integer $h$, decide whether there is a size-at-most-$h$ vertex cover, i.e., a subset of at most $h$ vertices which are jointly incident to all edges), which is NP-complete on graphs with maximum vertex degree three [13], we obtain that Multi-Layer Cluster Editing is NP-hard for a constant number of layers even if no edge modifications are allowed. This means that the marking of vertices alone is also a computationally hard task.

Proposition 1.

Multi-Layer Cluster Editing is NP-complete for $\ell = 4$ and $k = 0$.

Proof.

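We reduce from Vertex Cover on graphs with maximum (vertex) degree three. Let $(H, h)$ with $H = (V, E)$ be an instance of Vertex Cover, where $H$ has maximum degree three. By Vizing’s Theorem [21] we know that $H$ is 4-edge-colorable and we can compute a proper 4-edge-coloring in polynomial time [17]. Let $E_i$ be the set of edges colored with color $i \in [4]$ for an arbitrary but fixed 4-edge-coloring of $H$. Define $I := (G_1 = (V, E_1), \ldots, G_4 = (V, E_4), k = 0, d = h)$ to be an instance of Multi-Layer Cluster Editing. As already argued, $I$ can be constructed from $(H, h)$ in polynomial time. We show that $(H, h)$ is a yes-instance of Vertex Cover if and only if $I$ is a yes-instance of Multi-Layer Cluster Editing.

(⇒) Let $C$ be a vertex cover of $H$ with $|C| \le h$. We mark all vertices of $C$, that is, $D := C$, and set $S_i := \emptyset$ for all $i \in [4]$. Note that the graph in each layer consists of isolated edges and isolated vertices, since two adjacent edges cannot have the same color in a proper edge-coloring. Hence for every $i \in [4]$, graph $G_i$ is a cluster graph. Since $C$ is a vertex cover, $G_i - D$ is edgeless for every $i \in [4]$. Therefore $G_i - D = G_j - D$ for all $i, j \in [4]$, implying that $I$ is a yes-instance of Multi-Layer Cluster Editing.

(⇐) Let $(S_1, \ldots, S_4, D)$ be a solution for $I$; in particular, $|D| \le h$. Note that $E_i \cap E_j = \emptyset$ for all $i, j \in [4]$ with $i \ne j$, since every edge in $H$ is colored with exactly one color. Since $k = 0$, we are not allowed to make any edge modification, that is, $S_i = \emptyset$ for all $i \in [4]$. Hence, for each edge $\{u, v\} \in E_i$, at least one of the endpoints has to be marked: otherwise $\{u, v\}$ would be an edge of $G_i - D$ but not of $G_j - D$ for any $j \ne i$, violating Condition (3). It follows that $D$ is a vertex cover for $H$ of size at most $h$. ∎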
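The construction in the proof of Proposition 1 can be sketched in Python as follows. The sketch assumes a proper edge coloring with colors 0 to 3 is already given (computing one in polynomial time [17] is not shown); the function names are hypothetical. It builds one layer per color class with $k = 0$ and $d = h$, and checks on a small example that deleting a vertex cover makes all layers agree, while deleting a non-cover does not. Each layer is a matching, so Condition (2) holds without edits.

```python
from itertools import combinations


def is_proper_edge_coloring(coloring):
    """No two edges that share an endpoint receive the same color."""
    for e, f in combinations(coloring, 2):
        if e & f and coloring[e] == coloring[f]:
            return False
    return True


def vc_to_mlce(vertices, coloring, h, num_colors=4):
    """Instance of Proposition 1: one layer per color class, k = 0, d = h.

    `coloring` maps each edge (a frozenset {u, v}) to a color in
    range(num_colors) and is assumed to be a proper edge coloring.
    """
    layers = [set() for _ in range(num_colors)]
    for edge, color in coloring.items():
        layers[color].add(edge)
    return vertices, layers, 0, h


def layers_agree_without(layers, marked):
    """Condition (3) for k = 0: all layers coincide after deleting `marked`."""
    rest = [{e for e in layer if not (e & marked)} for layer in layers]
    return all(r == rest[0] for r in rest)


# Example: triangle 1-2-3 plus pendant edge 3-4 (maximum degree three).
F = frozenset
coloring = {F({1, 2}): 0, F({2, 3}): 1, F({1, 3}): 2, F({3, 4}): 0}
V, layers, k, d = vc_to_mlce({1, 2, 3, 4}, coloring, h=2)
print(layers_agree_without(layers, {1, 3}))  # True: {1, 3} is a vertex cover
print(layers_agree_without(layers, {2}))     # False: edge {3, 4} survives in one layer
```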

Modifying the reduction in the proof of Proposition 1 so that it works on graphs of arbitrary maximum degree, with one layer for each color of a proper edge coloring, we obtain a parameterized reduction from Independent Set, which is W[1]-hard with respect to the solution size. This gives the following corollary.

Corollary 1.

Multi-Layer Cluster Editing is W[1]-hard with respect to the parameter $n - d$ for $k = 0$.

Proof.

We reduce from Independent Set parameterized by the solution size. Note that this is equivalent to Vertex Cover parameterized by the dual of the solution size. We can use a reduction that is very similar to the reduction used in the proof of Proposition 1. The main difference is that we cannot restrict the input graphs to graphs with maximum degree 3 since Independent Set parameterized by the solution size is FPT on graphs with bounded maximum degree. However, by Vizing’s Theorem [21], we can still edge color the graph with one more color than its maximum degree in polynomial time. As we again introduce one layer for each color, this increases the number of layers in the reduction to one plus the maximum degree of the input graph. The rest of the proof is analogous. ∎

3 An FPT Algorithm for Multi-Layer Cluster Editing

In this section, we present an FPT algorithm for Multi-Layer Cluster Editing with respect to the combined parameter $(d, k)$.

Theorem 1.

Multi-Layer Cluster Editing is FPT with respect to the number $k$ of edge modifications per layer and the number $d$ of marked vertices combined. It can be solved in time .

We describe a recursive search-tree algorithm (see Algorithm 1) which takes the following data as input:

  • An instance of Multi-Layer Cluster Editing consisting of a multi-layer graph $(G_1, \ldots, G_\ell)$ and two integers $k$ and $d$.

  • A constraint, consisting of a set of marked vertices, one edge modification set per layer, and a set of permanent vertex pairs.

The algorithm follows the greedy localization approach, in which we make some decisions greedily and possibly revert them later on through branching. The greedy decisions give us some structure that we can exploit to keep the search-tree size small. The edge modification sets of a constraint represent both the greedy decisions and those that we made through branching. The set of permanent vertex pairs contains only those made by branching.

Throughout the algorithm, we try to maintain a good constraint which intuitively means that the constraint can be turned into a solution (if one exists).

Definition 1 (Good Constraint).

Let be an instance of Multi-Layer Cluster Editing. A constraint is good for if there is a solution such that (i) , (ii) there is no such that , and (iii) for all we have . We also say that witnesses that is good.

The following is easy to see.

Observation 2.

For any yes-instance of Multi-Layer Cluster Editing, the constraint with no marked vertices, empty edge modification sets, and no permanent vertex pairs is a good constraint.

We also call the above constraint trivial. The initial call of our algorithm is with the input instance of Multi-Layer Cluster Editing together with the trivial constraint.

Our algorithm uses a number of different branching rules to search for a solution to our Multi-Layer Cluster Editing input instance:

Definition 2 (Branching Rule).

A branching rule takes as input an instance of Multi-Layer Cluster Editing and a constraint and returns a set of constraints .

When a branching rule is applied, the algorithm invokes a recursive call for each constraint returned by the branching rule and returns true if at least one of the recursive calls returns true; otherwise, it returns false. For that to be correct, whenever a branching rule is invoked with a good constraint, at least one of the constraints returned by the branching rule has to be a good constraint as well. We say that a branching rule is safe if it has this property.

In the following, we introduce the branching rules used by the algorithm and prove that each of them is safe. This together with Observation 2 will allow us to prove by induction that the algorithm eventually finds a solution for the input instance of Multi-Layer Cluster Editing if it is a yes-instance. To make the description of the branching rules more readable, we introduce four types of non-marked vertex pairs. Say that a vertex pair is

  • settled if for all or for all (edge always present or never present),

  • frequent if (edge almost always present),

  • scarce if (edge almost never present), and

  • unsettled otherwise, that is, (edge sometimes present).

Note that, by definition, if a vertex pair falls in one of the above categories, both of the vertices in that pair are not marked.
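All four categories depend on how often a vertex pair is present across the layers. The sketch below counts this, assuming (our reading) that presence is measured in the current edited layers $G_i \triangle S_i$, with edges and modification sets stored as Python sets of `frozenset` pairs; the function names are hypothetical. The thresholds that separate frequent and scarce pairs from unsettled ones are not modeled, so the sketch only decides whether a pair is settled.

```python
def presence_count(pair, layers, mods):
    """Number of layers i in which `pair` is an edge of E_i (sym. diff.) S_i."""
    # A pair is present iff it lies in exactly one of E_i and S_i.
    return sum(1 for edges, s in zip(layers, mods) if (pair in edges) != (pair in s))


def is_settled(pair, layers, mods, marked):
    """Settled: present in every edited layer or in none of them."""
    if pair & marked:
        raise ValueError("categories are defined for non-marked pairs only")
    count = presence_count(pair, layers, mods)
    return count == 0 or count == len(layers)


p = frozenset({1, 2})
print(is_settled(p, [{p}, {p}, set()], [set(), set(), set()], set()))  # False
```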

Our aim with the first two rules is to settle all pairs of non-marked vertices. In order to achieve our desired running time bound, we can only afford to exhaustively search through all unsettled vertex pairs:

Branching Rule 1.

If there is an unsettled vertex pair $\{u, v\}$, then output the following up to four constraints:

  1. For all , put , , and . (Add the edge corresponding to the vertex pair in all layers where it is not present and mark it as permanent.)

  2. For all , put , , and . (Remove the edge corresponding to the vertex pair from all layers where it is present and mark it as permanent.)

  3. If there is no such that , then , the rest stays the same. (Mark the first vertex in the vertex pair.)

  4. If there is no such that , then , the rest stays the same. (Mark the second vertex in the vertex pair.)

Lemma 1.

Branching Rule 1 is safe.

Proof.

Let the input constraint be good and let be a solution to the input instance that witnesses that is good. If , then by Condition (3) on solutions either for all or for all . Hence, one of the first two constraints is good. Otherwise, if , then one of the last two constraints is good. ∎

The following Greedy Rule deals with all frequent and scarce vertex pairs. It only produces one constraint and hence no branching occurs in that sense. For formal reasons it is nevertheless useful to treat the Greedy Rule as a special case of a branching rule. Note that the algorithm also invokes a recursive call with the output constraint of this rule. The rule greedily adds the edge corresponding to a frequent vertex pair in all layers where it is not present and removes the edge corresponding to a scarce vertex pair from all layers where it is present. Intuitively, the Greedy Rule is safe because all of its decisions can be reverted later on.

Greedy Rule.

If there is a frequent or a scarce vertex pair , then return one of the following two constraints:

  • If is frequent, then for all put , the rest stays the same. (Add the edge corresponding to the vertex pair in all layers where it is not present.)

  • If is scarce, then for all put , the rest stays the same. (Remove the edge corresponding to the vertex pair from all layers where it is present.)
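Both cases of the Greedy Rule (and, apart from making the pair permanent, the first two cases of Branching Rule 1) force a vertex pair to be present in all edited layers or absent from all of them. Assuming that edge modification sets act via symmetric difference, this amounts to toggling the pair in exactly those sets $S_i$ whose edited layer disagrees with the target. A minimal sketch with a hypothetical function name, using the same set-of-`frozenset` representation as before:

```python
def force_pair(pair, present, layers, mods):
    """Return copies of the modification sets in which `pair` is an edge of
    every edited layer (present=True) or of none of them (present=False)."""
    new_mods = []
    for edges, s in zip(layers, mods):
        s = set(s)  # do not mutate the caller's sets
        currently_present = (pair in edges) != (pair in s)
        if currently_present != present:
            s ^= {pair}  # toggling the pair in S_i flips its presence
        new_mods.append(s)
    return new_mods


p = frozenset({1, 2})
print(force_pair(p, True, [{p}, set(), {p}], [set(), set(), set()]))
```

Toggling (rather than only adding) matters when an earlier greedy decision already put the pair into some $S_i$: the decision is then undone in that layer instead of being applied twice.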

Lemma 2.

The Greedy Rule is safe.

Proof.

Let the input constraint be good and let be a solution for the input instance that witnesses that is good and denote for all . Note that neither nor is changed in the constraint output by the Greedy Rule. Hence, trivially, Conditions (i) and (ii) of being good are satisfied. For Condition (iii), we claim that no set changes, implying the condition. For a contradiction, assume . Since witnesses being good, we have that for all and we have that there is no such that . This implies that is settled, since otherwise there is a such that which would imply that , which is impossible because is a solution. Thus, the Greedy Rule is not applicable to , a contradiction. ∎

After the above two rules have been applied exhaustively, all pairs of non-marked vertices are settled. With the following rule we edit the subgraphs induced by the non-marked vertices into cluster graphs. This branching rule represents a well-known rule from classical Cluster Editing, with the addition that we also branch on marking vertices.

Branching Rule 2.

If there is an induced in for some , where , then return the following up to six constraints:

  1. If , then for all put , , and . (Remove the edge corresponding to the vertex pair from all layers and mark it as permanent.)

  2. If , then for all put , , and . (Remove the edge corresponding to the vertex pair from all layers and mark it as permanent.)

  3. If , then for all put , , and . (Add the edge corresponding to the vertex pair in all layers and mark it as permanent.)

  4. For each vertex $v$ of the $P_3$: If there is no permanent vertex pair that contains $v$, then return a constraint in which $v$ is additionally marked, the rest stays the same. (Mark vertices of the $P_3$ that are not part of permanent vertex pairs. This gives up to three constraints.)

If none of the above possibilities apply, then reject the current branch.⁴

⁴ This technically does not fit the definition of a branching rule, but we can achieve the same effect by returning trivially unsatisfiable constraints.

Lemma 3.

Branching Rule 2 is safe.

Proof.

Let the input constraint be good and let be a solution for the input instance witnessing that is good. By Condition (2) of solutions it holds that, for all , graph  does not contain any as an induced subgraph, where . Hence, if there is some and three vertices  that induce a in , where , then there are two cases.

In the first case, one of is also in , say . Note that, then, cannot be part of any permanent vertex pair, by the definition of good constraints. Thus, the constraint that puts output in the fourth part of Branching Rule 2 is good.

The second case is that . Then, since is a cluster graph, at least one of the vertex pairs formable from is modified by , that is, in . Say . By Condition (3) of solutions, is settled. Note that cannot be permanent since otherwise we already have that by the definition of a good constraint. Thus the constraint which adds to and makes it permanent is good. Hence, the rule is safe. ∎
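Branching Rule 2 first needs an induced $P_3$ among the non-marked vertices of some edited layer. One straightforward way to find one, sketched below with an illustrative function name and the same edge representation as before, is to look for a non-marked vertex with two non-adjacent non-marked neighbors; if none exists, the layer restricted to the non-marked vertices is a cluster graph.

```python
from itertools import combinations


def find_induced_p3(vertices, edges, marked):
    """Return (u, v, w) such that uv and vw are edges and uw is not, all
    three vertices non-marked; return None if G - D is a cluster graph."""
    adj = {v: set() for v in vertices if v not in marked}
    for e in edges:
        if e & marked:
            continue  # edges at marked vertices are ignored here
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    for v, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):
            if w not in adj[u]:
                return (u, v, w)
    return None
```

The three pairs $\{u, v\}$, $\{v, w\}$, $\{u, w\}$ and the three vertices of the returned triple are exactly what the six branches of the rule act on.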

The next rule keeps the edge modification sets free of marked vertices. Vertex pairs in the edge modification sets can come to contain marked vertices if vertices of vertex pairs processed by the Greedy Rule are marked by other branching rules further down the search tree. Like the Greedy Rule, it only produces one constraint and hence no branching occurs, so it is also a degenerate branching rule and we treat it as such. Note that the algorithm also invokes a recursive call with the output constraint of this rule.

Clean-up Rule.

If some edge modification set contains a vertex pair with a marked vertex, then return a constraint in which this vertex pair is removed from that edge modification set, the rest stays the same.
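Applied exhaustively, the Clean-up Rule discards every modified vertex pair that touches a marked vertex. A one-function sketch, assuming modification sets are Python sets of `frozenset` pairs (the name `clean_up` is ours):

```python
def clean_up(mods, marked):
    """Exhaustive Clean-up Rule: drop pairs with a marked endpoint from each S_i."""
    return [{p for p in s if not (p & marked)} for s in mods]


F = frozenset
print(clean_up([{F({1, 2}), F({3, 4})}, {F({2, 3})}], {3}))
```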

Lemma 4.

The Clean-up Rule is safe.

Proof.

Let the input constraint be good and let be a solution for the input instance witnessing that is good. Note that permanent vertex pairs cannot contain marked vertices by the definition of constraints. It follows that the Clean-up Rule does not add or remove permanent vertex pairs from any set ; this implies Condition (iii) of being good. Furthermore, it does not change the sets and , implying Conditions (i) and (ii). Hence, the Clean-up Rule is safe. ∎

The next rule tries to repair any budget violations that might occur. Since we make decisions greedily with the Greedy Rule and do not exhaustively search through the whole search space, we expect that some of the choices were not correct. This rule then reverts these choices. Also, to have a correct estimate of the sizes of the current edge modification sets, this rule requires that the Clean-up Rule is not applicable. For technical reasons, it also requires Branching Rule 1 and the Greedy Rule not to be applicable.

Branching Rule 3.

If there is an for some with , then if , let , otherwise, take any with and return the following constraints:

  1. For each return a constraint in which for all we put , , and .

  2. For each :

    • If there is no such that , then return a constraint with , , and : .

    • If there is no such that , then return a constraint with , and : .

If , then reject the current branch.

Lemma 5.

If Branching Rule 1, the Greedy Rule, and the Clean-up Rule are not applicable, then Branching Rule 3 is safe.

Proof.

Let the input constraint be good and let be a solution for the input instance witnessing that is good.

Since the Greedy Rule and Branching Rule 1 are not applicable, we have that all pairs of non-marked vertices are settled. Since the Clean-up Rule is not applicable, no edge modification set contains marked vertices. Now if there is an with , then there clearly is a with . Since is a good constraint, we also have that .

If , then it is easy to see that for each with there is at least one vertex pair such that , otherwise we would have that . This holds in particular for the chosen by the branching rule. The branching rule creates constraints for each possible vertex pair in that could be removed from , there is particularly one output constraint where is removed from . If , then by Condition (3) on solutions either for all or for all . However, since all is settled, we also have that for all or for all and furthermore, if and only if . Since we have that if and only if , one of the constraints in the first case is good.

Otherwise, at least one of its endpoints is marked in the solution, and one of the constraints in the second case is good. ∎

The last rule, Branching Rule 4 requires that all other rules are not applicable. In this case the non-marked vertices induce the same cluster graph in every layer. Branching Rule 4 checks whether in every layer it is possible to turn the whole layer (including the marked vertices) into a cluster graph such that the cluster graph induced by the non-marked vertices stays the same and the edge modification budget is not violated in any layer.

Branching Rule 4.

For all we use to denote the set of all possible edge modifications where each edge is incident to at least one marked vertex, that turn into a cluster graph. More specifically, we have

If there is an such that then let and return the following constraints:

  1. For each return a constraint in which for all we put , , and .

  2. For each :

    • If there is no such that , then return a constraint with , , and : .

    • If there is no such that , then return a constraint with , and : .

If , then reject the current branch.

Lemma 6.

If the Greedy Rule, the Clean-up Rule, and Branching Rules 1, 2, and 3 are not applicable, then Branching Rule 4 is safe.

Proof.

Let the input constraint be good and let be a solution for the input instance witnessing that is good. Since the Greedy Rule and Branching Rule 1 are not applicable, we have that all vertex pairs in are settled. Since Branching Rule 2 is not applicable, we have that for all we have that and is a cluster graph, where . Furthermore, we have that for all , otherwise Branching Rule 3 would be applicable, and that each does not contain vertex pairs with marked vertices, otherwise the Clean-up Rule would be applicable.

For each layer , Branching Rule 4 checks the minimum number of edge modifications involving at least one marked vertex to turn into a cluster graph. Since is already a cluster graph, this number always exists. Since does not contain vertex pairs with marked vertices, if the number of edge modifications needed is too large, i.e. larger than , there is at least one non-permanent, settled vertex pair in that is not in . It follows from an argument analogous to the one in the proof of Lemma 5 that the rule is safe. ∎
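For intuition about the quantity that Branching Rule 4 inspects, here is a brute-force sketch. For one edited layer whose non-marked part is already a cluster graph, it computes the minimum number of edits on pairs with at least one marked vertex that turn the whole layer into a cluster graph, keeping the non-marked part unchanged: each marked vertex either joins an existing cluster of the non-marked part or one of up to $|D|$ new clusters. This is our illustration, exponential in $|D|$ and only meant for tiny inputs, not the procedure of the paper; the function name is hypothetical.

```python
from itertools import combinations, product


def min_marked_edits(vertices, edges, marked):
    """Minimum number of edits on pairs with a marked endpoint that turn the
    graph into a cluster graph while keeping G - D unchanged.

    Assumes G - D is already a cluster graph; exponential in |D|.
    """
    marked = set(marked)
    rest = [v for v in vertices if v not in marked]
    adj = {v: set() for v in rest}
    for e in edges:
        if not (e & marked):
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
    # Label the clusters (= connected components) of G - D.
    label, count = {}, 0
    for s in rest:
        if s in label:
            continue
        label[s] = count
        stack = [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in label:
                    label[y] = count
                    stack.append(y)
        count += 1
    order = list(marked)
    touching = [frozenset(p) for p in combinations(vertices, 2) if set(p) & marked]
    best = None
    # Each marked vertex joins an existing cluster or one of |D| new ones.
    for choice in product(range(count + len(order)), repeat=len(order)):
        target = dict(label)
        target.update(zip(order, choice))
        cost = sum(
            1 for p in touching
            if (p in edges) != (len({target[x] for x in p}) == 1)
        )
        if best is None or cost < best:
            best = cost
    return best


F = frozenset
print(min_marked_edits({1, 2, 3, 4}, {F({1, 2}), F({1, 3}), F({3, 4})}, {3}))  # 1
```

In the example, marked vertex 3 is cheapest to place with vertex 4, which costs only the deletion of $\{1, 3\}$.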

Input:
  • A set of graphs $G_1, \ldots, G_\ell$.

  • Two integers $k$ and $d$.

  • A set of marked vertices.

  • Edge modification sets, one per layer.

  • A set of permanent vertex pairs.

if  or there is an  such that  then return false
Apply the first applicable rule in the following ordered list: Branching Rule 1, Greedy Rule, Branching Rule 2, Clean-up Rule, Branching Rule 3, and Branching Rule 4; recurse on every constraint it returns and return true if and only if at least one recursive call returns true.
If no rule is applicable, return true.
Algorithm 1 Multi-Layer Cluster Editing
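The control flow of Algorithm 1 follows a generic search-tree pattern: reject infeasible constraints, otherwise apply the first applicable rule and recurse on every constraint it returns, and accept if no rule applies. The skeleton below illustrates only this pattern. Rules are modeled as hypothetical callables that return `None` when not applicable and a (possibly empty) list of child constraints otherwise; an empty list corresponds to rejecting the current branch. The toy usage is unrelated to cluster editing and only exercises the skeleton.

```python
from typing import Callable, List, Optional, TypeVar

C = TypeVar("C")  # whatever a constraint is; hypothetical here


def search(constraint: C,
           rules: List[Callable[[C], Optional[List[C]]]],
           infeasible: Callable[[C], bool]) -> bool:
    """Generic search tree in the style of Algorithm 1."""
    if infeasible(constraint):
        return False
    for rule in rules:                  # rules are tried in a fixed order
        children = rule(constraint)
        if children is not None:        # the first applicable rule decides
            return any(search(c, rules, infeasible) for c in children)
    return True                         # no rule applies: accept


# Toy usage: can n be written as a sum of 2s and 3s?
toy_rules = [lambda n: [n - 2, n - 3] if n > 0 else None]
print(search(7, toy_rules, lambda n: n < 0))  # True
print(search(1, toy_rules, lambda n: n < 0))  # False
```

In Algorithm 1, termination is guaranteed by the quality measure of Lemma 8 rather than by a decreasing integer as in the toy.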

To prove correctness of the algorithm, we first argue that, whenever the algorithm outputs true, the input instance of Multi-Layer Cluster Editing was indeed a yes-instance. This follows in a straightforward manner from the fact that if the algorithm outputs true, then none of the branching rules are applicable.

Lemma 7.

Given an instance of Multi-Layer Cluster Editing, if Algorithm 1 outputs true on input and the trivial partial solution , then is a yes-instance.

Proof.

Let be the input instance of Multi-Layer Cluster Editing. If the algorithm outputs true, then there is a constraint such that for all we have that and , and none of the branching rules are applicable. In the following we show that then there is a solution for such that . Let for all .

First of all, for all we have that , otherwise Branching Rule 3 would apply. Also, for all we have that , otherwise either Branching Rule 1 or the Greedy Rule would apply. Furthermore, we have that for all we have that is a cluster graph, otherwise Branching Rule 2 would apply.

It remains to show that there are for all such that is a cluster graph and . Since Branching Rule 4 is not applicable, we know that these sets exist: take for the definition of of Branching Rule 4. Then we have that and hence . ∎

It remains to show that, whenever the input instance of the algorithm is a yes-instance, then the algorithm outputs true. To this end, we define the quality of a good constraint and show that the algorithm increases the quality until it eventually finds a solution.

Definition 3 (Quality of a constraint).

Let be an instance of Multi-Layer Cluster Editing. The quality  of a constraint for is

Lemma 8.

Let be a good constraint for a yes-instance of Multi-Layer Cluster Editing. If applicable, each of the Greedy Rule and Branching Rules 1, 2, 3, and 4 returns a good constraint with strictly increased quality in comparison to .

Proof.

We show the claim individually for each of the rules. We consider each of the possible returned constraints  and show that, assuming that is good, then the quality of is strictly larger than .

It is easy to see that the Greedy Rule decreases the number of frequent or scarce vertex pairs by one.

Next, we consider Branching Rule 1. Observe that if is a yes-instance of Multi-Layer Cluster Editing and a good constraint for , then all vertex pairs in are settled. Otherwise there would be a contradiction to the fact that in a good constraint there is no such that needs to be marked and that, for all , the edge modifications in can all be kept in a solution. Hence, in the first two cases, the branching rule increases . In the other two cases, the branching rule increases .

Next, we consider Branching Rule 2. In the first three cases, the branching rule increases . In the remaining cases, the branching rule increases . Lastly, we consider Branching Rules 3 and 4. In each case, the branching rules increase . ∎

Next we show that the notion of quality of a good constraint is indeed a measure that allows us to argue that the algorithm eventually produces a solution (if it exists).

Lemma 9.

Let be a yes-instance of Multi-Layer Cluster Editing, then there is a constant such that for every good constraint we have that and there is at least one good constraint with . Furthermore, for any good constraint with , we have that Algorithm 1 outputs true on input and .

Proof.

We first show the first part of the statement. Let be a yes-instance of Multi-Layer Cluster Editing. By Definition 1, for any good constraint there has to be a solution for with , there is no such that , and for all we have that .

Fix a solution , and let be the set of all good constraints with , there is no such that , and for all . It is easy to see that for good constraints in the quality is maximized when and . Note that this also implies that for all and that there are no frequent or scarce vertex pairs.

Let be the set of all solutions for a given yes-instance of Multi-Layer Cluster Editing. Then we have that

and this already yields that this maximum is reached by at least one good constraint.

The second part of the statement follows from Lemma 8 and the safeness of the branching rules. Note that the order in which rules are applied (see Algorithm 1) ensures safeness for all branching rules (Lemmata 1, 2, 3, 4, 5, and 6). Let be a good constraint with . By Lemma 8, each branching rule increases the quality of the good constraint. Since has maximum quality, there is no good constraint with a higher quality. Hence, no branching rule is applicable, since otherwise we would contradict the safeness of that rule. It follows that the algorithm outputs true. ∎

Now we have all the tools to show the correctness of Algorithm 1. Lemma 7 ensures that we only output true if the input is actually a yes-instance, and Lemmata 8 and 9, together with the safeness of all branching rules, ensure that the algorithm outputs true if the input is a yes-instance.

Corollary 2 (Correctness of Algorithm 1).

Given an instance of Multi-Layer Cluster Editing, Algorithm 1 outputs true on input and the trivial good constraint if and only if is a yes-instance.

Proof.

By Lemma 7, if Algorithm 1 outputs true on input and the trivial good constraint , then is a yes-instance. It remains to show the other direction.

Let be a yes-instance of Multi-Layer Cluster Editing. By Observation 2, is a good constraint. Note that the order in which rules are applied (see Algorithm 1) ensures safeness for all branching rules (Lemmata 1, 2, 3, 4, 5, and 6). Furthermore, by Lemma 8, all branching rules except the Clean-up Rule strictly increase the quality of a good constraint. It is easy to see that the Clean-up Rule does not decrease the quality of a good constraint and that it can be applied at most times before either one of the other rules applies or the algorithm terminates. The quality of is at least , hence the algorithm eventually reaches a good constraint with quality (or outputs true earlier). By Lemma 9, the algorithm then outputs true. ∎

It remains to show that Algorithm 1 has the claimed running-time upper bound. One can check that each branching rule creates at most recursive calls. The distinction between unsettled, frequent, and scarce vertex pairs ensures that the edge modification sets grow in sufficiently many layers for the search tree to have depth at most . The time needed to apply a branching rule is dominated by Branching Rule 4, where we essentially have to solve classical Cluster Editing in every layer.
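Since Branching Rule 4 essentially solves classical Cluster Editing in every layer, it may help to recall how a single-layer instance can be decided. The following is a minimal sketch of the textbook search tree that branches on an induced P3 (at most 3^k leaves); it is not the faster algorithm of [6] used in the analysis, and the function names and the representation of a layer (vertices 0..n-1, edges as frozensets) are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Optional, Set, Tuple

Edge = FrozenSet[int]  # an undirected edge {u, v}


def find_induced_p3(n: int, edges: Set[Edge]) -> Optional[Tuple[int, int, int]]:
    """Return (u, v, w) with uv and vw edges but uw a non-edge, or None."""
    for v in range(n):
        nbrs = [u for u in range(n) if frozenset((u, v)) in edges]
        for u, w in combinations(nbrs, 2):
            if frozenset((u, w)) not in edges:
                return u, v, w
    return None


def cluster_editing(n: int, edges: Set[Edge], k: int) -> bool:
    """Decide whether at most k edge edits turn the graph into a cluster graph."""
    p3 = find_induced_p3(n, edges)
    if p3 is None:
        return True  # no induced P3: already a disjoint union of cliques
    if k == 0:
        return False  # an induced P3 remains but the budget is exhausted
    u, v, w = p3
    # Every solution modifies at least one of the three pairs of this P3,
    # so branching on the three pairs is exhaustive (at most 3^k leaves).
    for pair in (frozenset((u, v)), frozenset((v, w)), frozenset((u, w))):
        toggled = edges ^ {pair}  # delete the pair if it is an edge, else add it
        if cluster_editing(n, toggled, k - 1):
            return True
    return False
```

For example, a star with centre 0 and leaves 1, 2, 3 needs two deletions, so `cluster_editing(4, star, 1)` is false while `cluster_editing(4, star, 2)` is true.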

Lemma 10.

The running time of Algorithm 1 is in .

Proof.

We use the following straightforward approach to bound the running time of Algorithm 1. First, we bound the size of the search tree, and then the computation spent in each node of the search tree.

The search tree is spanned by the non-degenerate branching rules. To bound the depth of the search tree, we show that each branching rule either increases by exactly one or increases . If or , then the algorithm terminates (Line 1). In the first two cases, Branching Rule 1 increases by at least since the modified vertex pair is unsettled. In the case of Branching Rule 2, it is important to note that if it is applicable, then the Greedy Rule was not applicable, since the Greedy Rule appears earlier in Algorithm 1. Hence, in the first three cases, increases by if the modified vertex pair was originally settled, and by if the vertex pair was originally frequent or scarce, since in that case a modification decreases for at most different layers and increases for at least different layers . By a similar argument, Branching Rules 3 and 4 also increase by at least . Hence, we can upper-bound the depth of the search tree by . It is not difficult to check that the number of children of each node in the search tree is asymptotically upper-bounded by . It follows that the size of the whole search tree is in .
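The last step above is the standard counting argument for bounded search trees: a rooted tree in which every node has at most b children and every root-to-leaf path has at most h edges has at most 1 + b + ... + b^h nodes, which is at most 2b^h for b ≥ 2. Because the concrete bounds on the depth and the number of children are not legible here, the minimal sketch below takes them as generic parameters `b` and `h` (hypothetical names).

```python
def search_tree_size_bound(b: int, h: int) -> int:
    """Maximum number of nodes of a rooted tree in which every node has at most
    b children and every root-to-leaf path has at most h edges."""
    if b < 0 or h < 0:
        raise ValueError("b and h must be non-negative")
    # Level i of the tree contains at most b**i nodes.
    return sum(b ** i for i in range(h + 1))
```

For instance, `search_tree_size_bound(3, 2)` is 1 + 3 + 9 = 13, below 2 · 3² = 18; this is why the size of the search tree is governed by (number of children)^(depth).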

The Greedy Rule and the Clean-up Rule play a special role. Note that if they are applicable, they do not create branches. Also, in both cases it is easy to check that an application of the rule cannot make an earlier rule applicable in the recursive call. Hence, these rules are essentially applied in a loop until they are not applicable any more. The Greedy Rule can be applied at most times consecutively and the Clean-up Rule can be applied at most times consecutively.

Lastly, we analyze for each rule how much time is needed to check whether the rule is applicable and, if so, to compute the constraints it outputs. It is not difficult to check that the algorithm needs time to check whether Branching Rule 1 is applicable and to output the constraints; the same holds for the Greedy Rule. To check the applicability of Branching Rule 2, the algorithm needs to check whether there is a layer containing an induced . This can be done in time, where is the maximum number of edges in a layer.⁵ Hence, overall we need time to check whether Branching Rule 2 is applicable, and in this time we can also compute the output constraints. For the Clean-up Rule, we need time to check whether it is applicable and to output the new constraint. In the case of Branching Rule 3, we need time to check whether it is applicable and to output the constraints. For the last rule, Branching Rule 4, we essentially need to solve Cluster Editing on each layer to check whether the rule is applicable. This can be done in time [6]. In the same time, we can also compute the constraints. Hence, overall, the algorithm has running time . ∎

⁵ The proof is folklore and proceeds roughly as follows. Find the connected components of the input graph. Next, determine whether there are two nonadjacent vertices  in a connected component. If so, then find an induced along a shortest path between and . Otherwise, there is no induced . Nonadjacent vertices in a connected component can be checked for in time summed over each vertex  in that component.
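The folklore argument for detecting an induced P3 can be sketched as follows. This is an illustrative variant, assuming a layer is given as a symmetric adjacency map without loops (the function name and representation are hypothetical): a connected component is a clique exactly when every vertex in it has degree one less than the component size, and a vertex with a non-neighbour in its component has some vertex at distance exactly two, whose shortest path is an induced P3.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def induced_p3_in_components(adj: Dict[int, Set[int]]) -> Optional[Tuple[int, int, int]]:
    """Return (a, b, c) with ab and bc edges and ac a non-edge, or None if the
    graph is a cluster graph. `adj` is a symmetric adjacency map without loops."""
    seen: Set[int] = set()
    for start in adj:
        if start in seen:
            continue
        # Step 1: collect the connected component of `start` with a BFS.
        component: List[int] = [start]
        seen.add(start)
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    component.append(y)
                    queue.append(y)
        # Step 2: the component is a clique iff every vertex in it has
        # exactly len(component) - 1 neighbours.
        for a in component:
            if len(adj[a]) < len(component) - 1:
                # Step 3: a has a non-neighbour in its component, so some vertex c
                # is at distance exactly 2; the shortest path a - b - c is induced.
                for b in adj[a]:
                    for c in adj[b]:
                        if c != a and c not in adj[a]:
                            return a, b, c
    return None
```

The component search touches every vertex and edge once, and the distance-two search in Step 3 runs at most once before returning, so the sketch stays linear in the size of the layer.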

Remark. It is not difficult to see that Multi-Layer Cluster Editing can also be solved in time, which is incomparable to the running time of Algorithm 1 since  might be as large as . First, guess the marked vertices. Then guess how many clusters (i.e., disjoint cliques) there are in the modified graph induced by the non-marked vertices, and for every non-marked vertex, guess to which cluster it belongs. Now, for every layer, independently guess how many additional clusters there are consisting only of marked vertices, and for every marked vertex, guess to which cluster it belongs. Finally, check whether such a solution can be obtained with at most modifications per layer.
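The final check in this remark can be sketched as follows, assuming each layer is given as a set of edges over vertices 0..n-1 and each guessed cluster graph as a list `cluster_of` of cluster labels (both representations and all function names are hypothetical). Once the clusters are fixed, the target cluster graph is determined, so the number of modifications in a layer is simply the number of vertex pairs whose adjacency disagrees with "same cluster".

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set

Edge = FrozenSet[int]


def modifications_needed(n: int, edges: Set[Edge], cluster_of: Sequence[int]) -> int:
    """Edge edits needed to turn one layer into the cluster graph whose
    clusters are given by the labels in cluster_of."""
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_edge = frozenset((u, v)) in edges
        if same_cluster != is_edge:
            cost += 1  # missing intra-cluster edge or superfluous inter-cluster edge
    return cost


def agrees_outside_marked(n: int, partitions: List[Sequence[int]], marked: Set[int]) -> bool:
    """Check that all layers induce the same clustering on the unmarked vertices."""
    unmarked = [v for v in range(n) if v not in marked]

    def together(cluster_of: Sequence[int]) -> Set[Edge]:
        # Compare which pairs share a cluster, so labels may differ per layer.
        return {frozenset((u, v)) for u, v in combinations(unmarked, 2)
                if cluster_of[u] == cluster_of[v]}

    patterns = [together(p) for p in partitions]
    return all(p == patterns[0] for p in patterns)


def is_valid_guess(n: int, layers: List[Set[Edge]], partitions: List[Sequence[int]],
                   marked: Set[int], k: int) -> bool:
    """Hypothetical final check of the brute-force approach in the remark."""
    return (len(layers) == len(partitions)
            and agrees_outside_marked(n, partitions, marked)
            and all(modifications_needed(n, e, p) <= k
                    for e, p in zip(layers, partitions)))
```

Comparing "which unmarked pairs share a cluster" rather than raw labels lets every layer number its clusters independently, which matches guessing the additional all-marked clusters per layer.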

4 Kernelization of Multi-Layer Cluster Editing

In this section, we investigate the kernelizability of Multi-Layer Cluster Editing for different combinations of the four parameters introduced in Section 1. More specifically, we identify the parameter combinations for which Multi-Layer Cluster Editing admits a polynomial kernel, and then the parameter combinations for which no polynomial kernel exists unless NP ⊆ coNP/poly.

4.1 A Polynomial Kernel for Multi-Layer Cluster Editing

We start by presenting a polynomial kernel for the parameter combination (d, k, l). Formally, we prove the following theorem.

Theorem 2.

Multi-Layer Cluster Editing admits a polynomial kernel with respect to the parameter combination (d, k, l). In particular, the problem admits a kernel of size that can be computed in time.

We provide several reduction rules that successively modify the instance, and we assume that whenever a particular rule is applied, the instance is already reduced with respect to all previous rules, that is, all previous rules have been applied exhaustively. For each rule, we immediately prove its correctness, that is, that the produced instance is a yes-instance if and only if the original instance is. However, we defer the analysis of the running time needed to test whether a particular reduction rule applies, and to apply it, until all rules have been presented.

To keep track of the budget in the individual layers we introduce the following intermediate problem.

Multi-Layer Cluster Editing with Separate Budgets
Input: graphs and integers .
Question: Is there a vertex subset with