 # Parameterized Algorithms for Diverse Multistage Problems

The world is rarely static – many problems need not only be solved once but repeatedly, under changing conditions. This setting is addressed by the "multistage" view on computational problems. We study the "diverse multistage" variant, where consecutive solutions of large variety are preferable to similar ones, e.g. for reasons of fairness or wear minimization. While some aspects of this model have been tackled before, we introduce a framework allowing us to prove that a number of diverse multistage problems are fixed-parameter tractable by diversity, namely Perfect Matching, s-t Path, Matroid Independent Set, and Plurality Voting. This is achieved by first solving special, colored variants of these problems, which might also be of independent interest.


## 1 Introduction

In the multistage setting, given a sequence of instances of some problem, one asks whether there is a corresponding sequence of solutions such that consecutive solutions relate in some way to each other. Often the aim is to find consecutive solutions that are very similar [26, 19, 21, 6, 5, 20]. This is reasonable when changing between distinct solutions incurs some form of cost. In other settings, the opposite goal is more reasonable, that is, consecutive solutions should be very different. This is a natural goal when wear minimization, load distribution, or resilience against failures or attacks are of interest. This “diverse multistage” setting is what we want to focus on in this paper. Here, given a sequence of instances of some decision problem, the task is to find a sequence of solutions such that the diversity, i.e., the size of the symmetric difference of any two consecutive solutions is at least .

This problem has already received some attention in the literature: Fluschnik et al. studied the problem of finding diverse s-t paths, and Bredereck et al. considered series of committee elections. In a similar setting, but aiming for large symmetric difference between every two (i.e., not just consecutive) solutions, Baste et al. provide a framework for parameterization by treewidth, while Fomin et al. [24, 25] focus on the case that all problems are defined on the same graph and study matching, independent set, and matroids.

We briefly give a formal definition. Assume  to be some decision problem which asks whether the family of solutions of an instance  of  is non-empty, where is some base set encompassing all possible solutions. For example, for an instance  of Vertex Cover, the set is the set of all vertices and  is the set of all vertex covers within the size bound. The problem Diverse Multistage is now the following.

Diverse Multistage Input: A sequence of instances of and an integer . Question: Is there a sequence of solutions  such that for all ?
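To make the diversity requirement concrete, the following minimal Python sketch checks whether a given sequence of candidate solutions satisfies it. The function name `is_diverse_sequence` and the parameter name `ell` are hypothetical (the symbol for the diversity bound is not legible in the statement above), and the sketch only tests the symmetric-difference condition between consecutive sets; it does not verify that each set is a solution of its instance.

```python
from typing import Hashable, Sequence, Set


def is_diverse_sequence(solutions: Sequence[Set[Hashable]], ell: int) -> bool:
    """Return True if all consecutive sets differ in at least `ell` elements.

    The distance between two sets is the size of their symmetric difference.
    A sequence with fewer than two sets is trivially diverse.
    """
    return all(len(a ^ b) >= ell for a, b in zip(solutions, solutions[1:]))


# Illustrative data only: three stages over the base set {1, 2, 3, 4}.
example = [{1, 2}, {3, 4}, {1, 2}]
print(is_diverse_sequence(example, 4))  # consecutive sets are disjoint pairs
print(is_diverse_sequence(example, 5))  # no two 2-element sets differ in 5 elements
```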

### Our contributions.

We present a general framework which allows us to prove fixed-parameter tractability of Diverse Multistage parameterized by the diversity  for several problems . This includes finding diverse matchings, but also diverse committees (answering an open question by Bredereck et al.), diverse s-t paths, and diverse independent sets in matroids such as spanning forests. Finally, we show that similar results cannot be expected for finding diverse vertex covers.

Generally, our framework can be applied to Diverse Multistage whenever one can solve a 4-colored variant of efficiently. Formally, this variant is defined as follows.

4-Colored Exact Input: An instance of , a coloring , and . Output: A solution such that for all  or “no” if no solution exists.
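As an illustration of the colored variant, here is a brute-force Python sketch over an explicitly listed solution family. It assumes that the condition lost from the statement above asks for a solution containing exactly a prescribed number of elements of each of the four colors; the names `colored_exact_bruteforce`, `coloring`, and `counts` are hypothetical. The framework needs an efficient problem-specific algorithm instead, so this enumeration only serves to pin down the interface.

```python
from typing import Dict, Hashable, Iterable, Optional, Sequence, Set


def colored_exact_bruteforce(
    solutions: Iterable[Set[Hashable]],
    coloring: Dict[Hashable, int],
    counts: Sequence[int],
) -> Optional[Set[Hashable]]:
    """Return a solution with exactly counts[k-1] elements of color k, or None.

    `coloring` maps every element of the base set to a color in {1, 2, 3, 4}.
    `counts` holds the four required color counts (an assumption about the
    elided condition in the problem statement).
    """
    target = list(counts)
    for solution in solutions:
        tally = [0, 0, 0, 0]
        for element in solution:
            tally[coloring[element] - 1] += 1
        if tally == target:
            return set(solution)
    return None


# Illustrative data only: three candidate solutions over {"a", "b", "c"}.
family = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
colors = {"a": 1, "b": 2, "c": 2}
print(colored_exact_bruteforce(family, colors, (0, 2, 0, 0)))
print(colored_exact_bruteforce(family, colors, (2, 0, 0, 0)))
```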

Our main result reads as follows. If an instance of 4-Colored Exact can be solved in time, then an instance of Diverse Multistage of size can be solved in time, where is the maximum of parameter over all instances of in . (For example, if the input is a sequence of graphs and is the treewidth, then is the maximum treewidth over all graphs in the input.) We prove this result in Section 3 in a more general form which also allows solving 4-Colored Exact by a Monte Carlo algorithm. We then apply our framework to the following problems:

Committee Election (Section 4). In Diverse Multistage Plurality Voting, we are given a set of agents, a set of candidates, and many voting profiles . The goal is to find a sequence of committees  such that each committee  is of size at most  and gets at least  votes in the voting profile  (i.e., ), and for all . We show that there is a -time algorithm to solve a Diverse Multistage Plurality Voting instance . This answers an open question of Bredereck et al. Later, in Section 7, we generalize the algorithm used to solve 4-Colored Exact Plurality Voting to matroids.

Perfect Matching (Section 5). In the multistage setting, Perfect Matching is among the problems most intensively studied [26, 3, 4, 13, 39]. Given a sequence of graphs and an integer , Diverse Multistage Perfect Matching asks whether there is a sequence  such that each  is a perfect matching in , and for all . We show that there is a randomized -time algorithm to solve a Diverse Multistage Perfect Matching instance with constant error probability. This stands in remarkable contrast to the W[1]-hardness of the (non-diverse) Multistage Perfect Matching, when parameterized by  . To apply our framework, we establish an algebraic algorithm using the Pfaffian of a specific variant of the Tutte matrix to solve 4-Colored Exact Perfect Matching on an -vertex graph in time with low error probability.

s-t Path (Section 6). Studying s-t Path in the multistage setting was already suggested in the seminal work of Gupta et al. In Diverse Multistage s-t Path one is given a sequence of graphs , two distinct vertices s and t, and an integer , and asks whether there is a sequence such that each  is an s-t path in , and for all . Fluschnik et al. provided a comprehensive study of finding s-t paths of bounded length in the multistage setting from the viewpoint of parameterized complexity. Among other results, they showed that Diverse Multistage s-t Path is NP-hard but fixed-parameter tractable when parameterized by the maximum length of an s-t path in the solution. We show that Diverse Multistage s-t Path parameterized by  is fixed-parameter tractable. At first glance, using our framework seems unpromising since 4-Colored Exact s-t Path can presumably not be solved in polynomial time (it is NP-hard by a straightforward reduction from Hamiltonian Path). However, we develop a win/win strategy around a generalization of the Erdős-Pósa theorem for long cycles due to Mousset et al. so that we have to solve 4-Colored Exact s-t Path only on graphs whose treewidth is upper-bounded by a function of the parameter .

In Section 8, we complement our fixed-parameter tractability results with a W[1]-hardness result for Diverse Multistage Vertex Cover when parameterized by .

## 2 Preliminaries

We denote by  and  the natural numbers excluding and including zero, respectively. For , let . For two sets  and , we denote by  the symmetric difference of  and , and by  the disjoint union of  and . For a function , let  and , where . We also use the notations  and  as shorthands for  and , respectively.

A Monte Carlo algorithm, or an algorithm with error probability , is a randomized algorithm that returns a correct answer with probability .

Let  be a finite alphabet. A parameterized problem  is a subset . An instance  is a yes-instance of  if and only if  (otherwise, it is a no-instance). A parameterized problem  is fixed-parameter tractable (in FPT) if for every input  one can decide in  time whether , where  is some computable function only depending on . A W[1]-hard parameterized problem is not fixed-parameter tractable unless FPT=W[1]. We refer to Downey and Fellows and Cygan et al. for more material on parameterized complexity. We use standard notation from graph theory. Throughout this paper, we assume graphs to be simple and undirected.

## 3 The General Framework

In this section, we introduce a general framework to show (for some decision problem ) fixed-parameter tractability of Diverse Multistage parameterized by . Recall that, for every instance of decision problem , we denote the family of solutions by and the input size of is at least . For the remainder of this section we assume that  for all instances of . The framework is applicable to Diverse Multistage if there is an efficient algorithm for 4-Colored Exact . Formally, we use the following prerequisite, which is slightly more general than in Section 1.

###### Assumption .

There are computable functions such that for every for which is defined, there is a Monte-Carlo algorithm with error probability  and running time , that solves an instance  of 4-Colored Exact , where is some parameter of and is monotone non-increasing.

We allow an error probability in Section 3 because for one of our applications (in Section 5), no other polynomial-time algorithm is known. The goal is to prove the following. Let Section 3 be true. Then any size- instance  of Diverse Multistage can be solved in  time by a Monte-Carlo algorithm with error probability , where is the maximum of parameter over all instances of in , and is an arbitrary probability for which the above expression is defined. (For example, if we only have an algorithm with non-zero error probability, then is excluded.) The proof of Section 3 is deferred to the end of this section. Note that, if we have a non-randomized algorithm in Section 3 (that is, is defined and maps always to one), then Section 1 follows directly from Section 3.

The underlying strategy of the algorithm for a Diverse Multistage -instance behind Section 3 is to compute for each instance of in a solution family such that the Cartesian product of these families contains a solution for if and only if is a yes-instance. Once these families are obtained, we can check whether is a yes-instance by dynamic programming. To this end, we compute a small subset of satisfying the following definition.

Let be a set family. A subfamily of is called an -diverse representative of if, for any and sets  with , there is an such that .

First of all, we note that -diverse representatives can be rather small. Let be a set family and . If for all distinct , then is an -diverse representative of .

###### Proof.

Assume for contradiction that there exist sets and with for all . Without loss of generality, assume that . Then for we have by the triangle inequality. Therefore, . Again, by the triangle inequality , i.e., — a contradiction. ∎
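The statement just proved can be sanity-checked by brute force on a tiny base set. The sketch below rests on two assumptions about symbols that are not legible above: a subfamily is treated as a diverse representative if, whenever two sets A and B are both at distance at least `ell` from some solution, the subfamily contains a solution at distance at least `ell` from both; and the three solutions are taken to be pairwise at distance at least `2 * ell`. All names are hypothetical, and the enumeration is exponential in the size of the base set.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, List


def subsets(universe: Iterable[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of the (small) universe."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_diverse_representative(
    family: List[FrozenSet[int]],
    candidate: List[FrozenSet[int]],
    universe: Iterable[int],
    ell: int,
) -> bool:
    """Check the (assumed) representative property by full enumeration."""
    all_sets = list(subsets(universe))
    for s in family:
        far_from_s = [x for x in all_sets if len(x ^ s) >= ell]
        for a in far_from_s:
            for b in far_from_s:
                # Some member of `candidate` must be far from both a and b.
                if not any(
                    len(a ^ t) >= ell and len(b ^ t) >= ell for t in candidate
                ):
                    return False
    return True


# Illustrative data only: three solutions that are pairwise at distance 4.
s1, s2, s3 = frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})
family = [s1, s2, s3, frozenset({0, 2})]
print(is_diverse_representative(family, [s1, s2, s3], range(6), 2))
print(is_diverse_representative(family, [s1], range(6), 2))
```

In this toy run the three pairwise distant solutions pass the check, while the single solution `s1` fails it, for instance with A = B = `s1` and the solution `s2`.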

In the following, we measure the distance of two solutions by the size of the symmetric difference. In a nutshell, we compute an -diverse representative of the family of solutions by first trying to compute three solutions which are far apart from each other (that is, size of symmetric difference at least ). If this succeeds, then by Section 3 we are done. Otherwise, we distinguish between three cases.

No solution.

If there is no solution at all, then trivially  is an -diverse representative of the family of solutions.

One solution.

If we only find one solution to the instance of , then each other solution is close to . Hence, for any two sets , if one of them is far away from , then by the triangle inequality it is also far away from every other solution and can be safely ignored. For those sets which are close to , we can exploit the upper bound on the symmetric difference by using color-coding  and then applying Section 3 to compute an -diverse representative of the family of solutions. This case is handled in Section 3.

Two solutions.

If we find two diverse solutions and such that no other solution is far away from both, then and partition the solution space into two parts: the solutions close to and those close to . Again, given two sets , if either of them is far away from  and , then we may ignore it. By including  and  in our family, we may further assume that is similar to  and is similar to . We distinguish two subcases. If the distance between  and  is very large, then  is far away from all solutions in the second part and  is far away from all solutions in the first part. We can thus ignore one of them (say ) and exploit the fact that , , and all solutions of interest are close to each other to use color-coding and then apply Section 3. In the other subcase where the distance between  and  is bounded, we can utilize that fact similarly. This case is handled in Section 3.
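The search that drives this case distinction can be sketched as follows, under loud assumptions: the solution family is given explicitly, plain enumeration stands in for the colored search algorithm, and the distance bound is passed as a hypothetical parameter `threshold` because the exact bound is not legible in the text above. The number of returned solutions indicates which case applies.

```python
from typing import FrozenSet, Hashable, Iterable, List, Set


def find_far_apart(
    solutions: Iterable[Set[Hashable]], threshold: int
) -> List[FrozenSet[Hashable]]:
    """Greedily pick up to three solutions that are pairwise far apart.

    Returns [] if there is no solution, one solution if nothing is far from
    it, two solutions if nothing is far from both, and three otherwise.
    """
    pool = [frozenset(s) for s in solutions]
    if not pool:
        return []
    first = pool[0]
    second = next((s for s in pool if len(s ^ first) >= threshold), None)
    if second is None:
        return [first]
    third = next(
        (
            s
            for s in pool
            if len(s ^ first) >= threshold and len(s ^ second) >= threshold
        ),
        None,
    )
    if third is None:
        return [first, second]
    return [first, second, third]


# Illustrative data only.
pool = [{0, 1}, {0, 2}, {2, 3}, {4, 5}]
print(len(find_far_apart(pool, 4)))  # three pairwise distant solutions exist
print(len(find_far_apart(pool, 5)))  # nothing is that far from {0, 1}
```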

Hereafter, the details. Before we dive into the case distinction outlined above, we need to prove two technical lemmata, telling us how to build a diverse representative set that works for all sets obeying some given coloring of the elements of . These will later work as building blocks in the construction of proper diverse representatives. In the first lemma, only two colors are used, and we are only concerned with one arbitrary set  instead of two. Let Section 3 be true. Given an instance of of size , a coloring , and a solution , one can compute in  time and with error probability at most  a family  of size at most  such that for any and any  with and , there is with and .

###### Proof.

Let , , , and .

Start with . Then, for each and each partition , use algorithm  to search in  time and with error probability at most  for a set such that for all . If this succeeds, then we add to . Since there are possibilities for , the probability of an error occurring is upper-bounded by . Moreover, the size of is upper-bounded by  and hence the time required is bounded by .

It remains to be proven that has the desired properties. Let be arbitrary and set for all . By construction,  contains a set  such that . We then have .

Let be a set with and . Since we have

$$|A \cap S \cap c^{-1}(1)| = |A \cap c^{-1}(1)| \ge |A \cap \hat{S} \cap c^{-1}(1)| \tag{1}$$

and since , we have that

$$|A \cap S \cap c^{-1}(2)| = |S \cap c^{-1}(2)| = m_2 + m_4 = |\hat{S} \cap c^{-1}(2)| \ge |A \cap \hat{S} \cap c^{-1}(2)|. \tag{2}$$

By adding (1) and (2) we obtain which in turn implies since . ∎

The next lemma extends the approach of Section 3 to the case where we have four colors and two arbitrary sets . Let Section 3 be true. Given an instance of of size , a coloring , one can compute in  time and with error probability at most  a family  of size at most  such that for any and all sets with there is with for all .

###### Proof.

Begin with . Then, for each and each partition , use algorithm to search in  time and with error probability at most  for an such that for all . If this succeeds, then add to . Since there are possibilities for , the probability of an error occurring is upper-bounded by . Moreover, the size of is at most  and thus the overall running time is .

Now let be arbitrary. Set , for all . By construction there is such that for all . It remains to be proven that  has the desired properties. To this end, let be two sets as stated in the lemma. By symmetry, it suffices to show that .

Since we have

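$$|S \cap A \cap c^{-1}(\{1,3\})| = |S \cap c^{-1}(\{1,3\})| = m_1 + m_3 = |\hat{S} \cap c^{-1}(\{1,3\})| \ge |\hat{S} \cap A \cap c^{-1}(\{1,3\})| \tag{3}$$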

and since , we have

$$|S \cap A \cap c^{-1}(\{2,4\})| = |A \cap c^{-1}(\{2,4\})| \ge |\hat{S} \cap A \cap c^{-1}(\{2,4\})|. \tag{4}$$

By adding (3) and (4), we obtain and thus since . ∎

We now describe how we generate the colorings required for using Sections 3 and 3. Color-coding  is well-established in the toolbox of parameterized algorithms. While color-coding was initially described as a randomized technique, we use universal sets  to derandomize this technique as shown in the next lemma. Interestingly, without this derandomization the error probability of the color-coding step would later propagate through the dynamic program and consequently also depend on the number of instances of in the input instance of Diverse Multistage . The derandomization works as follows.

For any set  of size  and any one can compute in time a family of functions such that for any with there is a  such that , for all .

###### Proof.

Let . By a result of Naor et al. , one can compute in time a so-called -universal set which is a family  such that for every with  the family contains all subsets of . Let . We then define , , by

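$$c_j(a_i) := \begin{cases} 1, & \text{if } i,\, i+n \in U_j, \\ 2, & \text{if } i \in U_j \text{ and } i+n \notin U_j, \\ 3, & \text{if } i \notin U_j \text{ and } i+n \in U_j, \text{ and} \\ 4, & \text{if } i,\, i+n \notin U_j. \end{cases}$$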

Now let be an arbitrary -partition of a subset of of size at most . Consider . We assume that is of size , otherwise we add arbitrary elements from . Since there is an  such that . Hence, , for all . ∎
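The construction of a 4-coloring from a subset of indices can be illustrated with a short Python sketch. It encodes element number i by the two indices i and i+n, exactly as in the case distinction of the proof. As a loudly labeled stand-in for the universal sets of Naor et al., the sketch simply enumerates all subsets of {1, ..., 2n}, which is exponential and only suitable for toy sizes; all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Sequence, Set


def coloring_from_subset(n: int, u: FrozenSet[int]) -> Dict[int, int]:
    """Map element index i in 1..n to a color in {1, 2, 3, 4}.

    The color depends on whether i and i + n belong to the index set u.
    """
    coloring = {}
    for i in range(1, n + 1):
        first, second = i in u, (i + n) in u
        if first and second:
            coloring[i] = 1
        elif first:
            coloring[i] = 2
        elif second:
            coloring[i] = 3
        else:
            coloring[i] = 4
    return coloring


def all_subsets(m: int) -> Iterator[FrozenSet[int]]:
    """Yield all subsets of {1, ..., m} (stand-in for a universal set)."""
    for size in range(m + 1):
        for combo in combinations(range(1, m + 1), size):
            yield frozenset(combo)


def colorings(n: int) -> List[Dict[int, int]]:
    """Build one coloring per subset of {1, ..., 2n}."""
    return [coloring_from_subset(n, u) for u in all_subsets(2 * n)]


def realizes(family: List[Dict[int, int]], parts: Sequence[Set[int]]) -> bool:
    """Check that some coloring gives every index in parts[k-1] the color k."""
    return any(
        all(c[i] == k for k, part in enumerate(parts, start=1) for i in part)
        for c in family
    )


family = colorings(3)
print(len(family))  # one coloring per subset of a 6-element index set
print(realizes(family, [{1}, {2}, {3}, set()]))
```

A genuine universal set is much smaller than the full power set when only partitions of small subsets have to be realized, which is what keeps the number of colorings bounded in the lemma.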

We now show how to generate an -diverse representative of the family of solutions if there is one solution  from which no other solution differs by more than . Let Section 3 be true. Given an instance of of size , and a solution such that each satisfies , one can compute in time and with error probability  an -diverse representative of of size at most .

###### Proof.

For simplicity, let . Apply Section 3 with to compute in  time a family of colorings . By Section 3 this family has size . For each , apply Section 3 to and to compute a family  with error probability . Observe that the probability of an error occurring at any of the  steps is bounded by . Choose . According to Section 3 the size of is upper-bounded by and the time required is bounded by .

We now show that is an -diverse representative of . To this end, let and let be two arbitrary sets such that and . Since , we may assume by symmetry that, say, , otherwise we are done. Note that and that . We say that some coloring  is good for if the conditions of Section 3 are satisfied, i.e. if

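$$A \setminus (B \cup S) \subseteq c^{-1}(1), \quad B \setminus (A \cup S) \subseteq c^{-1}(2), \quad (A \cap B) \setminus S \subseteq c^{-1}(3), \quad \text{and} \quad S \setminus (A \cap B) \subseteq c^{-1}(4).$$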

We distinguish between two cases.

Case 1: .

Then . According to Section 3 there is an such that coloring is good for , since . By Section 3 and construction of , there is an such that and .

Case 2: .

Set . According to Section 3 there is an such that coloring is good for , since . Thus, by Section 3 and by the construction of there is an such that . Finally, we observe that by the triangle inequality.

This completes the proof. ∎

Next, we show how to generate an -diverse representative of the family of solutions if there are two solutions such that no other solution differs from both by more than . Let Section 3 be true. Let be an -instance of size , and such that and each has . Then one can compute, in time and with error probability , an -diverse representative of of size .

###### Proof.

For simplicity, let . Apply Section 3 with to compute in  time a family of colorings . By Section 3 this family has size .

For each , apply Section 3 to and to compute a family  of size at most with error probability . Observe that the probability of an error occurring at any of the  steps is upper-bounded by and the computation of all  takes  time.

Next, define another family of colorings by setting . Then, for each , apply Section 3, to , and to compute a family , with the same error probability and time bound as before. Repeat with instead of  to obtain .

Set . Then has size at most . Computing  takes  time. The probability of an error occurring at any step while computing  is upper-bounded by .

We now show that is an -diverse representative of . To this end, let and be two arbitrary sets such that and . We may assume for each that or , otherwise we are done. By symmetry, we may assume . Then by the triangle inequality and thus we must have . By assumption, , so let without loss of generality . Note that . We distinguish the following two cases.

Case 1: .

Then, . We say that some coloring  is good for if the conditions of Section 3 are satisfied, i.e. if According to Section 3 there is an such that coloring is good for , since . By Section 3, there is such that and .

Case 2: .

Since , there is such that By Section 3 there is such that and . Finally, observe that by the triangle inequality .

This completes the proof. ∎

With Sections 3, 3 and 3 at hand we can formalize the case distinction outlined in the beginning of the section. This gives us a way to efficiently compute an -diverse representative in general. Let Section 3 be true. Let  be an instance of of size . One can compute an -diverse representative of of size in time with error probability at most .

###### Proof.

Our procedure to compute an -diverse representative of works in four steps.

Step 1.

We use with a monochrome coloring and error probability  to search for some in by guessing the size of . Observe that the probability of an error occurring in any of the searches is upper-bounded by If we do not succeed, then output the empty set and we are done. Otherwise, we proceed with the next step.

Step 2.

For each pair with and , try to compute with and in  time and with error probability  using with a -coloring where elements in are assigned one color and elements in are assigned the second color. If no such  is found for any pair , then for every the symmetric difference . In that case we may apply Section 3 with error probability and are done. Observe that the probability of an error occurring at any step until here is upper-bounded by and the overall running time is . If we found such an , then we proceed with the next step.

Step 3.

We have with . Define the coloring  by

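$$c(v) := \begin{cases} i, & \text{if } v \in M_i \setminus M_j \text{ for } \{i,j\} = \{1,2\}, \\ 3, & \text{if } v \in M_1 \cap M_2, \text{ and} \\ 4, & \text{otherwise.} \end{cases}$$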

For all with and and , search for a solution with , for all , using with and error probability . For all these combined, we thus have error probability  and need  time. If no such  is found for any choice of , then any must have . In that case we may apply Section 3 with error probability and are done. Observe that the probability of an error occurring at any step until here is upper-bounded by and the overall running time is . In case that we found such an , we proceed with the next step.

Step 4.

We have such that for all distinct . Hence, by Section 3, we can output . This completes the proof. ∎

Finally, Section 3 allows us to formulate a dynamic program for Diverse Multistage and prove Section 3.
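A dynamic program of this kind can be sketched in Python as follows. The sketch assumes that one explicit family of candidate solutions per stage is already available (in the framework these would be the diverse representatives) and that the diversity bound is passed as a hypothetical parameter `ell`. A solution of stage i is marked reachable if some reachable solution of stage i-1 lies at symmetric-difference distance at least `ell`; stored predecessors allow one witness sequence to be reconstructed.

```python
from typing import Dict, FrozenSet, Hashable, List, Optional, Sequence, Set

Solution = FrozenSet[Hashable]


def diverse_multistage_dp(
    families: Sequence[Sequence[Set[Hashable]]], ell: int
) -> Optional[List[Solution]]:
    """Return one diverse sequence of solutions, or None if none exists."""
    if not families:
        return None
    stages = [[frozenset(s) for s in family] for family in families]

    # layers[i] maps each reachable solution of stage i to a predecessor
    # in stage i - 1 (None for the first stage).
    layers: List[Dict[Solution, Optional[Solution]]] = [
        {s: None for s in stages[0]}
    ]
    for family in stages[1:]:
        previous = layers[-1]
        layer: Dict[Solution, Optional[Solution]] = {}
        for s in family:
            for p in previous:
                if len(s ^ p) >= ell:
                    layer[s] = p
                    break
        layers.append(layer)

    if not layers[-1]:
        return None

    # Walk the predecessor pointers back from any reachable final solution.
    current: Optional[Solution] = next(iter(layers[-1]))
    sequence: List[Solution] = []
    for layer in reversed(layers):
        sequence.append(current)
        current = layer[current]
    sequence.reverse()
    return sequence


# Illustrative data only: three stages with explicit candidate families.
families = [[{1, 2}], [{1, 2}, {3, 4}], [{3, 4}, {1, 2}]]
print(diverse_multistage_dp(families, 4))
print(diverse_multistage_dp(families, 5))
```

The table has one entry per stage and candidate solution, so the running time of this sketch is quadratic in the family size per stage, which is why small representative families matter.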

###### Proof of Section 3.

Let be an instance of Diverse Multistage , where . For each we apply Section 3 to obtain an -diverse representative  of that has size at most in  time with error probability . Observe that the probability of an error occurring at any step is upper-bounded by . Now we use the following dynamic program to check whether is a yes-instance.

 ∀i∈{2,3,…,τ},S∈Fi:D[i,S]:={⊤if ∃ˆS∈Fi−1:D[i−1,ˆS]=⊤ and |SΔˆ