In the multistage setting, given a sequence of instances of some problem, one asks whether there is a corresponding sequence of solutions such that consecutive solutions relate in some way to each other. Often the aim is to find consecutive solutions that are very similar [26, 19, 21, 6, 5, 20]. This is reasonable when changing between distinct solutions incurs some form of cost. In other settings, the opposite goal is more reasonable, that is, consecutive solutions should be very different. This is a natural goal when wear minimization, load distribution, or resilience against failures or attacks are of interest. This “diverse multistage” setting is the focus of this paper. Here, given a sequence of instances of some decision problem, the task is to find a sequence of solutions such that the diversity, i.e., the size of the symmetric difference of any two consecutive solutions, is at least $\ell$.
This problem has already received some attention in the literature: Fluschnik et al. studied the problem of finding diverse $s$-$t$ paths and Bredereck et al. considered series of committee elections. In a similar setting, but aiming for large symmetric difference between every two (i.e., not just consecutive) solutions, Baste et al. provide a framework for parameterization by treewidth, while Fomin et al. [24, 25] focus on the case that all problems are defined on the same graph and study matching, independent set, and matroids.
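We briefly give a formal definition. Let $\Pi$ be some decision problem which asks whether the family of solutions $\mathcal{R}(I) \subseteq 2^{\mathcal{B}(I)}$ of an instance $I$ of $\Pi$ is non-empty, where $\mathcal{B}(I)$ is some base set encompassing all possible solutions. For example, for an instance $I$ of Vertex Cover, the set $\mathcal{B}(I)$ is the set of all vertices and $\mathcal{R}(I)$ is the set of all vertex covers within the size bound. The problem Diverse Multistage $\Pi$ is now the following.
Diverse Multistage $\Pi$
Input: A sequence $(I_1, \dots, I_\tau)$ of instances of $\Pi$ and an integer $\ell \in \mathbb{N}_0$.
Question: Is there a sequence $(S_1, \dots, S_\tau)$ of solutions $S_i \in \mathcal{R}(I_i)$ such that $|S_i \Delta S_{i+1}| \geq \ell$ for all $i \in [\tau - 1]$?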
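To make the question concrete, the following minimal sketch checks whether a given candidate sequence of solutions meets the diversity requirement. It assumes solutions are represented as Python frozensets over the base set; the function name `is_diverse_sequence` and the parameter name `ell` (for the diversity bound) are illustrative and do not come from the paper. Feasibility of each solution for its instance is not checked here.

```python
from typing import FrozenSet, Hashable, Sequence


def is_diverse_sequence(solutions: Sequence[FrozenSet[Hashable]], ell: int) -> bool:
    """Return True if every two consecutive solutions differ in at least `ell` elements.

    The distance between two solutions is the size of their symmetric
    difference, computed with the `^` operator on frozensets.
    """
    return all(len(a ^ b) >= ell for a, b in zip(solutions, solutions[1:]))


if __name__ == "__main__":
    seq = [frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2})]
    print(is_diverse_sequence(seq, 4))
    print(is_diverse_sequence(seq, 5))
```

A sequence with a single solution has no consecutive pairs and is therefore accepted for every value of `ell`.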
We present a general framework which allows us to prove fixed-parameter tractability of Diverse Multistage $\Pi$ parameterized by the diversity $\ell$ for several problems $\Pi$. This includes finding diverse matchings, but also diverse committees (answering an open question by Bredereck et al.), diverse $s$-$t$ paths, and diverse independent sets in matroids such as spanning forests. Finally, we show that similar results cannot be expected for finding diverse vertex covers.
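Generally, our framework can be applied to Diverse Multistage $\Pi$ whenever one can solve a 4-colored variant of $\Pi$ efficiently. Formally, this variant is defined as follows.
4-Colored Exact $\Pi$
Input: An instance $I$ of $\Pi$, a coloring $c \colon \mathcal{B}(I) \to [4]$, and $n_1, n_2, n_3, n_4 \in \mathbb{N}_0$.
Output: A solution $S \in \mathcal{R}(I)$ such that $|S \cap c^{-1}(i)| = n_i$ for all $i \in [4]$ or “no” if no solution exists.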
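As a rough illustration of what the colored variant asks for, the sketch below solves it by plain enumeration over an explicitly given solution family. This is only meant to pin down the input/output behavior: the framework needs an efficient problem-specific algorithm instead, and the explicit list `solutions`, the dictionary `coloring`, and the function name are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Optional, Sequence


def four_colored_exact(
    solutions: Iterable[FrozenSet[Hashable]],
    coloring: Dict[Hashable, int],
    targets: Sequence[int],
) -> Optional[FrozenSet[Hashable]]:
    """Return a solution containing exactly targets[i-1] elements of color i.

    `coloring` maps every base-set element to a color in {1, 2, 3, 4}.
    Returns None (the "no" answer) if no listed solution fits.
    """
    wanted = list(targets)
    for solution in solutions:
        counts = [0, 0, 0, 0]
        for element in solution:
            counts[coloring[element] - 1] += 1
        if counts == wanted:
            return solution
    return None


if __name__ == "__main__":
    family = [frozenset({1, 2}), frozenset({2, 3})]
    colors = {1: 1, 2: 2, 3: 3}
    print(four_colored_exact(family, colors, (0, 1, 1, 0)))
```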
Our main result reads as follows. If an instance of 4-Colored Exact $\Pi$ can be solved in time, then an instance of Diverse Multistage $\Pi$ of size can be solved in time, where is the maximum of parameter over all instances of $\Pi$ in . (For example, if the input is a sequence of graphs and is the treewidth, then is the maximum treewidth over all graphs in the input.) We prove this result in Section 3 in a more general form which also allows solving 4-Colored Exact $\Pi$ by a Monte Carlo algorithm. We then apply our framework to the following problems:
Committee Election (Section 4). In Diverse Multistage Plurality Voting, we are given a set of agents, a set of candidates, and many voting profiles . The goal is to find a sequence of committees such that each committee is of size at most and gets at least votes in the voting profile (i.e., ), and for all . We show that there is a -time algorithm to solve a Diverse Multistage Plurality Voting instance . This answers an open question of Bredereck et al. . Later, in Section 7, we generalize the algorithm used to solve 4-Colored Exact Plurality Voting to matroids.
Perfect Matching (Section 5). In the multistage setting, Perfect Matching is among the problems most intensively studied [26, 3, 4, 13, 39]. Given a sequence of graphs and an integer , Diverse Multistage Perfect Matching asks whether there is a sequence such that each is a perfect matching in , and for all . We show that there is a randomized -time algorithm to solve a Diverse Multistage Perfect Matching instance with constant error probability. This stands in remarkable contrast to the W[1]-hardness of the (non-diverse) Multistage Perfect Matching, when parameterized by . To apply our framework, we establish an algebraic algorithm using the Pfaffian of a specific variant of the Tutte matrix to solve 4-Colored Exact Perfect Matching on an $n$-vertex graph in time with low error probability.
$s$-$t$ Path (Section 6). Studying $s$-$t$ Path in the multistage setting was already suggested in the seminal work of Gupta et al. In Diverse Multistage $s$-$t$ Path one is given a sequence of graphs , two distinct vertices $s$ and $t$, and an integer , and asks whether there is a sequence such that each is an $s$-$t$ path in , and for all . Fluschnik et al. provided a comprehensive study of finding $s$-$t$ paths of bounded length in the multistage setting from the viewpoint of parameterized complexity. Among other results, they showed that Diverse Multistage $s$-$t$ Path is NP-hard but fixed-parameter tractable when parameterized by the maximum length of an $s$-$t$ path in the solution. We show that Diverse Multistage $s$-$t$ Path parameterized by $\ell$ is fixed-parameter tractable. At first glance, using our framework seems unpromising since 4-Colored Exact $s$-$t$ Path can presumably not be solved in polynomial time (it is NP-hard by a straightforward reduction from Hamiltonian Path). However, we develop a win/win strategy around a generalization of the Erdős-Pósa theorem for long cycles due to Mousset et al. so that we have to solve 4-Colored Exact $s$-$t$ Path only on graphs whose treewidth is upper-bounded by a function of the parameter $\ell$.
In Section 8, we complement our fixed-parameter tractability results with a W[1]-hardness result for Diverse Multistage Vertex Cover when parameterized by $\ell$.
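We denote by $\mathbb{N}$ and $\mathbb{N}_0$ the natural numbers excluding and including zero, respectively. For $n \in \mathbb{N}$, let $[n] := \{1, 2, \dots, n\}$. For two sets $A$ and $B$, we denote by $A \Delta B := (A \setminus B) \cup (B \setminus A)$ the symmetric difference of $A$ and $B$, and by $A \uplus B$ the disjoint union of $A$ and $B$. For a function , let and , where . We also use the notations and as shorthands for and , respectively.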
A Monte Carlo algorithm, or an algorithm with error probability $p$, is a randomized algorithm that returns a correct answer with probability $1 - p$.
Let $\Sigma$ be a finite alphabet. A parameterized problem $L$ is a subset $L \subseteq \Sigma^* \times \mathbb{N}_0$. An instance $(x, k)$ is a yes-instance of $L$ if and only if $(x, k) \in L$ (otherwise, it is a no-instance). A parameterized problem $L$ is fixed-parameter tractable (in FPT) if for every input $(x, k)$ one can decide in $f(k) \cdot |x|^{O(1)}$ time whether $(x, k) \in L$, where $f$ is some computable function only depending on $k$. A W[1]-hard parameterized problem is not fixed-parameter tractable unless FPT=W[1]. We refer to Downey and Fellows and Cygan et al. for more material on parameterized complexity.
We use standard notation from graph theory. Throughout this paper, we assume graphs to be simple and undirected.
3 The General Framework
In this section, we introduce a general framework to show (for some decision problem ) fixed-parameter tractability of Diverse Multistage parameterized by . Recall that, for every instance of decision problem , we denote the family of solutions by and the input size of is at least . For the remainder of this section we assume that for all instances of . The framework is applicable to Diverse Multistage if there is an efficient algorithm for 4-Colored Exact . Formally, we use the following prerequisite, which is slightly more general than in Section 1.
There are computable functions such that for every for which is defined, there is a Monte-Carlo algorithm with error probability and running time , that solves an instance of 4-Colored Exact , where is some parameter of and is monotone non-increasing.
We allow an error probability in the prerequisite above because for one of our applications (in Section 5), no other polynomial-time algorithm is known. The goal is to prove the following. Let the prerequisite above be true. Then any size- instance of Diverse Multistage can be solved in time by a Monte-Carlo algorithm with error probability , where is the maximum of parameter over all instances of in , and is an arbitrary probability for which the above expression is defined. (For example, if we only have an algorithm with non-zero error probability, then is excluded.) The proof of this theorem is deferred to the end of this section. Note that, if we have a non-randomized algorithm in the prerequisite (that is, is defined and maps always to one), then the main result stated in Section 1 follows directly from this theorem.
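The underlying strategy of the algorithm for a Diverse Multistage $\Pi$-instance $J$ behind the theorem above is to compute for each instance $I$ of $\Pi$ in $J$ a solution family such that the Cartesian product of these families contains a solution for $J$ if and only if $J$ is a yes-instance. Once these families are obtained, we can check whether $J$ is a yes-instance by dynamic programming. To this end, we compute a small subset of $\mathcal{R}(I)$ satisfying the following definition.
Let $\mathcal{F}$ be a set family. A subfamily $\mathcal{F}' \subseteq \mathcal{F}$ is called an $\ell$-diverse representative of $\mathcal{F}$ if, for any $S \in \mathcal{F}$ and sets $A, B$ with $|A \Delta S| \geq \ell$ and $|S \Delta B| \geq \ell$, there is an $S' \in \mathcal{F}'$ such that $|A \Delta S'| \geq \ell$ and $|S' \Delta B| \geq \ell$.
First of all, we note that $\ell$-diverse representatives can be rather small. Let $\mathcal{F}$ be a set family and $S_1, S_2, S_3 \in \mathcal{F}$. If $|S_i \Delta S_j| \geq 2\ell$ for all distinct $i, j \in \{1, 2, 3\}$, then $\{S_1, S_2, S_3\}$ is an $\ell$-diverse representative of $\mathcal{F}$.
Assume for contradiction that there exist sets $A$ and $B$ with $\min\{|A \Delta S_i|, |B \Delta S_i|\} < \ell$ for all $i \in \{1, 2, 3\}$. Without loss of generality, assume that $|A \Delta S_1| < \ell$. Then for $i \in \{2, 3\}$ we have $|A \Delta S_i| \geq |S_1 \Delta S_i| - |A \Delta S_1| > \ell$ by the triangle inequality. Therefore, $|B \Delta S_2| < \ell$ and $|B \Delta S_3| < \ell$. Again, by the triangle inequality $|S_2 \Delta S_3| \leq |S_2 \Delta B| + |B \Delta S_3| < 2\ell$, which is a contradiction. ∎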
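The triangle-inequality argument can be sanity-checked by exhaustive enumeration on a tiny base set. The sketch below assumes the threshold for "far apart" is twice the diversity bound (written `2 * ell`), which is how the argument reads; all function names are illustrative. It tests the slightly stronger statement that for every pair of subsets A, B of the base set, some representative is at distance at least `ell` from both.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Iterable, Iterator, Sequence


def subsets(universe: Iterable[Hashable]) -> Iterator[FrozenSet[Hashable]]:
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def far_from_both(s: FrozenSet, a: FrozenSet, b: FrozenSet, ell: int) -> bool:
    """True if s has symmetric difference of size >= ell with both a and b."""
    return len(a ^ s) >= ell and len(b ^ s) >= ell


def covers_all_pairs(reps: Sequence[FrozenSet], universe: Iterable[Hashable], ell: int) -> bool:
    """Check that for every pair (A, B) some representative is far from both."""
    all_sets = list(subsets(universe))
    return all(
        any(far_from_both(s, a, b, ell) for s in reps)
        for a in all_sets
        for b in all_sets
    )


if __name__ == "__main__":
    ell = 2
    # Three sets with pairwise symmetric difference exactly 4 = 2 * ell.
    reps = [frozenset(), frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5})]
    print(covers_all_pairs(reps, range(6), ell))
```

With only one representative the property can fail, for example when A coincides with that representative.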
In the following, we measure the distance of two solutions by the size of the symmetric difference. In a nutshell, we compute an $\ell$-diverse representative of the family of solutions by first trying to compute three solutions which are far apart from each other (that is, size of symmetric difference at least $2\ell$). If this succeeds, then by the lemma above we are done. Otherwise, we distinguish between three cases.
- No solution.
If there is no solution at all, then trivially $\emptyset$ is an $\ell$-diverse representative of the family of solutions.
- One solution.
If we only find one solution to the instance of , then each other solution is close to . Hence, for any two sets , if one of them is far away from , then by the triangle inequality it is also far away from every other solution and can be safely ignored. For those sets which are close to , we can exploit the upper bound on the symmetric difference by using color-coding  and then applying Section 3 to compute an -diverse representative of the family of solutions. This case is handled in Section 3.
- Two solutions.
If we find two diverse solutions and such that no other solution is far away from both, then and partition the solution space into two parts: the solutions close to and those close to . Again, given two sets , if either of them is far away from and , then we may ignore it. By including and in our family, we may further assume that is similar to and is similar to . We distinguish two subcases. If the distance between and is very large, then is far away from all solutions in the second part and is far away from all solutions in the first part. We can thus ignore one of them (say ) and exploit the fact that , , and all solutions of interest are close to each other to use color-coding and then apply Section 3. In the other subcase where the distance between and is bounded, we can utilize that fact similarly. This case is handled in Section 3.
We now turn to the details. Before we dive into the case distinction outlined above, we need to prove two technical lemmata, telling us how to build a diverse representative set that works for all sets obeying some given coloring of the elements of . These will later work as building blocks in the construction of proper diverse representatives. In the first lemma, only two colors are used, and we are only concerned with one arbitrary set instead of two. Let Section 3 be true. Given an instance of of size , a coloring , and a solution , one can compute in time and with error probability at most a family of size at most such that for any and any with and , there is with and .
Let , , , and .
Start with . Then, for each and each partition , use algorithm to search in time and with error probability at most for a set such that for all . If this succeeds, then we add to . Since there are possibilities for , the probability of an error occurring is upper-bounded by . Moreover, the size of is upper-bounded by and hence the time required is bounded by .
It remains to be proven that has the desired properties. Let be arbitrary and set for all . By construction, contains a set such that . We then have .
The next lemma extends the approach of Section 3 to the case where we have four colors and two arbitrary sets . Let Section 3 be true. Given an instance of of size , a coloring , one can compute in time and with error probability at most a family of size at most such that for any and all sets with there is with for all .
Begin with . Then, for each and each partition , use algorithm to search in time and with error probability at most for an such that for all . If this succeeds, then add to . Since there are possibilities for , the probability of an error occurring is upper-bounded by . Moreover, the size of is at most and thus the overall running time is .
Now let be arbitrary. Set , for all . By construction there is such that for all . It remains to be proven that has the desired properties. To this end, let be two sets as stated in the lemma. By symmetry, it suffices to show that .
We now describe how we generate the colorings required for using the two preceding lemmas. Color-coding is well-established in the toolbox of parameterized algorithms. While color-coding was initially described as a randomized technique, we use universal sets to derandomize this technique as shown in the next lemma. Interestingly, without this derandomization the error probability of the color-coding step would later propagate through the dynamic program and consequently also depend on the number of instances of $\Pi$ in the input instance of Diverse Multistage $\Pi$. The derandomization works as follows.
For any set of size and any one can compute in time a family of functions such that for any with there is a such that , for all .
Let . By a result of Naor et al. , one can compute in time a so-called -universal set which is a family such that for every with the family contains all subsets of . Let . We then define , , by
Now let be an arbitrary -partition of a subset of of size at most . Consider . We assume that is of size , otherwise we add arbitrary elements from . Since there is an such that . Hence, , for all . ∎
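For contrast with the derandomized construction, here is a minimal sketch of the plain randomized variant of color-coding, which is not what the lemma uses: each element receives one of four colors uniformly at random, and sampling is repeated until a coloring agrees with a fixed 4-partition of a small subset. A single random coloring agrees with a partition of k elements with probability 4^(-k). The function names, the dictionary representation of colorings, and the trial budget are assumptions for illustration only.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Sequence, Set


def agrees(coloring: Dict[Hashable, int], parts: Sequence[Set[Hashable]]) -> bool:
    """True if every element of the i-th part (1-based) has color i."""
    return all(
        coloring[x] == color
        for color, part in enumerate(parts, start=1)
        for x in part
    )


def find_agreeing_coloring(
    universe: Iterable[Hashable],
    parts: Sequence[Set[Hashable]],
    trials: int,
    seed: int = 0,
) -> Optional[Dict[Hashable, int]]:
    """Sample random 4-colorings until one agrees with `parts` or trials run out."""
    rng = random.Random(seed)
    elements = list(universe)
    for _ in range(trials):
        coloring = {x: rng.randint(1, 4) for x in elements}
        if agrees(coloring, parts):
            return coloring
    return None


if __name__ == "__main__":
    parts = [{0}, {1}, set(), {2}]
    print(find_agreeing_coloring(range(6), parts, trials=5000))
```

The randomized version can fail to find an agreeing coloring, which is exactly the error source that the universal-set construction avoids.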
We now show how to generate an -diverse representative of the family of solutions if there is one solution from which no other solution differs by more than . Let Section 3 be true. Given an instance of of size , and a solution such that each satisfies , one can compute in time and with error probability an -diverse representative of of size at most .
For simplicity, let . Apply Section 3 with to compute in time a family of colorings . By Section 3 this family has size . For each , apply Section 3 to and to compute a family with error probability . Observe that the probability of an error occurring at any of the steps is bounded by . Choose . According to Section 3 the size of is upper-bounded by and the time required is bounded by .
We now show that is an -diverse representative of . To this end, let and let be two arbitrary sets such that and . Since , we may assume by symmetry that, say, , otherwise we are done. Note that and that . We say that some coloring is good for if the conditions of Section 3 are satisfied, i.e. if
We distinguish between two cases.
- Case 1: .
- Case 2: .
This completes the proof. ∎
Next, we show how to generate an -diverse representative of the family of solutions if there are two solutions such that no other solution differs from both by more than . Let Section 3 be true. Let be an -instance of size , and such that and each has . Then one can compute, in time and with error probability , an -diverse representative of of size .
For each , apply Section 3 to and to compute a family of size at most with error probability . Observe that the probability of an error occurring at any of the steps is upper-bounded by and the computation of all takes time.
Next, define another family of colorings by setting . Then, for each , apply Section 3, to , and to compute a family , with the same error probability and time bound as before. Repeat with instead of to obtain .
Set . Then has size at most . Computing takes time. The probability of an error occurring at any step while computing is upper-bounded by .
We now show that is an -diverse representative of . To this end, let and be two arbitrary sets such that and . We may assume for each that or , otherwise we are done. By symmetry, we may assume . Then by the triangle inequality and thus we must have . By assumption, , so let without loss of generality . Note that . We distinguish the following two cases.
- Case 1: .
- Case 2: .
Since , there is such that By Section 3 there is such that and . Finally, observe that by the triangle inequality .
This completes the proof. ∎
With Sections 3, 3 and 3 at hand we can formalize the case distinction outlined in the beginning of the section. This gives us a way to efficiently compute an -diverse representative in general. Let Section 3 be true. Let be an instance of of size . One can compute an -diverse representative of of size in time with error probability at most .
Our procedure to compute an -diverse representative of works in four steps.
- Step 1.
We use with a monochrome coloring and error probability to search for some in by guessing the size of . Observe that the probability of an error occurring in any of the searches is upper-bounded by If we do not succeed, then output the empty set and we are done. Otherwise, we proceed with the next step.
- Step 2.
For each pair with and , try to compute with and in time and with error probability using with a -coloring where elements in are assigned one color and elements in are assigned the second color. If no such is found for any pair , then for every the symmetric difference . In that case we may apply Section 3 with error probability and are done. Observe that the probability of an error occurring at any step until here is upper-bounded by and the overall running time is . If we found such an , then we proceed with the next step.
- Step 3.
We have with . Define the coloring by
For all with and and , search for a solution with , for all , using with and error probability . For all these combined, we thus have error probability and need time. If no such is found for any choice of , then any must have . In that case we may apply Section 3 with error probability and are done. Observe that the probability of an error occurring at any step until here is upper-bounded by and the overall running time is . In case that we found such an , we proceed with the next step.
- Step 4.
We have such that for all distinct . Hence, by Section 3, we can output . This completes the proof. ∎
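The control flow of the four steps can be sketched on an explicitly listed solution family. This is a simplification under stated assumptions: the procedure above finds each solution through the algorithm for 4-Colored Exact rather than by enumeration, and the threshold `2 * ell` for "far apart" as well as the function name are assumptions of this example. The number of solutions returned indicates which case applies.

```python
from typing import FrozenSet, Hashable, Iterable, List


def pick_far_solutions(family: Iterable[FrozenSet[Hashable]], ell: int) -> List[FrozenSet[Hashable]]:
    """Greedily pick up to three solutions with pairwise distance >= 2 * ell.

    Result length 0: no solution exists (the empty family is output).
    Result length 1: every solution is close to the one found.
    Result length 2: no solution is far from both solutions found.
    Result length 3: the three solutions themselves can be output.
    """
    solutions = list(family)
    if not solutions:
        return []
    s1 = solutions[0]
    s2 = next((s for s in solutions if len(s ^ s1) >= 2 * ell), None)
    if s2 is None:
        return [s1]
    s3 = next(
        (s for s in solutions if len(s ^ s1) >= 2 * ell and len(s ^ s2) >= 2 * ell),
        None,
    )
    if s3 is None:
        return [s1, s2]
    return [s1, s2, s3]


if __name__ == "__main__":
    family = [frozenset(), frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5})]
    print(len(pick_far_solutions(family, 2)))
```

In the one- and two-solution outcomes, the remaining work is done by the respective lemmas, which this sketch does not implement.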
Proof of Section 3.
Let be an instance of Diverse Multistage , where . For each we apply Section 3 to obtain an -diverse representative of that has size at most in time with error probability . Observe that the probability of an error occurring at any step is upper-bounded by . Now we use the following dynamic program to check whether is a yes-instance.
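A minimal sketch of one plausible dynamic program of this kind follows, assuming the representative families are available as explicit lists of frozensets, one list per stage. The table layout and all names are illustrative and not necessarily the formulation used in the proof. For each stage it keeps exactly those representatives that can end a valid prefix of the sequence.

```python
from typing import FrozenSet, Hashable, Iterable, Sequence, Set


def diverse_multistage_dp(
    families: Sequence[Iterable[FrozenSet[Hashable]]],
    ell: int,
) -> bool:
    """Decide whether one solution per stage can be chosen from `families`
    such that consecutive choices have symmetric difference of size >= ell."""
    if not families:
        return True
    # Every solution of the first stage ends a valid prefix of length one.
    reachable: Set[FrozenSet[Hashable]] = set(families[0])
    for family in families[1:]:
        # Keep a solution if it is far enough from some reachable predecessor.
        reachable = {
            s for s in family
            if any(len(s ^ prev) >= ell for prev in reachable)
        }
        if not reachable:
            return False
    return bool(reachable)


if __name__ == "__main__":
    stages = [[frozenset({1})], [frozenset({2, 3})], [frozenset({1})]]
    print(diverse_multistage_dp(stages, 3))
```

Each stage compares every candidate against every reachable predecessor, so the work per stage is quadratic in the family size, which is why small representative families matter.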