How to influence an election? One answer to this is gerrymandering [3, 7, 12]. Gerrymandering is the systematic manipulation of the boundaries of electoral districts in favor of a particular party. It has been studied in the political sciences for decades. In recent years, various models of gerrymandering were investigated from an algorithmic and computational perspective. For instance, Lewenberg et al. and Eiben et al. studied the (parameterized) computational complexity of gerrymandering assuming that the voters are points in a two-dimensional space and the task is to place polling stations where each voter is assigned to the polling station closest to her. Cohen-Zemach et al. introduced a version of gerrymandering over graphs (which may be seen as models of social networks) where the question is whether a given candidate can win at least districts. This leads to the question whether there is a partition of the graph into connected subgraphs such that at least of these are won by a designated candidate; herein, and are part of the input of the computational problem. Cohen-Zemach et al. showed that this version is NP-complete even when restricted to planar graphs. Following up on the pioneering work of Cohen-Zemach et al., Ito et al. performed a refined complexity analysis, particularly taking into account the special graph structures of cliques, paths, and trees. Indeed, their formal model is slightly different from the one of Cohen-Zemach et al. and their work will be our main point of reference. Notably, both studies focus on the perhaps simplest voting rule, Plurality.
We mention in passing that earlier work also studied the special case of gerrymandering on grid graphs. More specifically, Apollonio et al. analyzed gerrymandering in grid graphs where each district in the solution has to be of (roughly) the same size and they analyzed, focusing on two candidates (equivalently, two parties), the maximum possible win margin if the two candidates had the same amount of support. Later, Borodin et al. also considered gerrymandering on grid graphs with two parties (expressed by colors red and blue), but here each vertex represents a polling station and thus is partially “red” and partially “blue” colored. They provided a worst-case analysis for a two-party situation in terms of the total fraction of votes the party responsible for the gerrymandering process gets. They also confirmed their findings with experiments.
To formally define our central computational problem, we continue with a few definitions. For a vertex-colored graph and for each color , let be the set of -colored vertices. A vertex-weighted graph is -colored if for each color it holds that . A vertex-weighted graph is uniquely -colored if for each color . Thus, we arrive at the central problem of this work, going back to Ito et al. .
Gerrymandering over Graphs
Input: An undirected, connected graph , a weight function , a set of colors, a target color , a coloring function , and an integer .
Question: Is there a partition of into exactly subsets such that every , , induces a connected subgraph in and the number of uniquely -colored induced subgraphs exceeds the number of -colored induced subgraphs for each ?
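To make the problem statement concrete, the following minimal Python sketch checks whether a given partition is a solution. It assumes the plurality reading of the definitions above: a part is colored by every color of maximum total weight in it, and uniquely colored if that maximum is attained by one color only. The function names and the dictionary-based instance encoding are illustrative assumptions, not part of the original formulation.

```python
from collections import defaultdict


def top_colors(part, weight, color):
    """Colors of maximum total weight in `part` (assumed plurality rule)."""
    totals = defaultdict(int)
    for v in part:
        totals[color[v]] += weight[v]
    best = max(totals.values())
    return {c for c, t in totals.items() if t == best}


def is_connected(part, adj):
    """Depth-first search restricted to the vertices of `part`."""
    start = next(iter(part))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for x in adj[u]:
            if x in part and x not in seen:
                seen.add(x)
                stack.append(x)
    return seen == part


def is_solution(parts, edges, weight, color, target, k):
    """Check a candidate partition against the problem definition."""
    parts = [set(p) for p in parts]
    # Exactly k non-empty parts that together cover every vertex once.
    if len(parts) != k or any(not p for p in parts):
        return False
    flat = [v for p in parts for v in p]
    if len(flat) != len(weight) or set(flat) != set(weight):
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not all(is_connected(p, adj) for p in parts):
        return False
    unique_target = 0            # parts won outright by the target color
    won = defaultdict(int)       # parts (co-)won by each other color
    for p in parts:
        top = top_colors(p, weight, color)
        if top == {target}:
            unique_target += 1
        for c in top - {target}:
            won[c] += 1
    return all(unique_target > count for count in won.values())
```

For instance, on a unit-weight path with five vertices colored r, b, b, r, r and target color r, the partition {0}, {1, 2}, {3, 4} is accepted, whereas {0, 1}, {2}, {3, 4} is rejected because the tied first part also counts as b-colored.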
Figure 1 presents a simple example of Gerrymandering over Graphs. We remark that all our results except for Theorem 1 (that is, the NP-hardness on paths) also transfer to the slightly different model of Cohen-Zemach et al. (In fact, we conjecture that the gerrymandering problem of Cohen-Zemach et al. is polynomial-time solvable on paths.)
We also use an equivalent interpretation of solution partitions for Gerrymandering over Graphs. Since each part has to induce a connected subgraph, in the spirit of edge deletion problems from algorithmic graph theory, we also represent solutions by a set of edges such that removing these yields the disjoint union of subgraphs induced by each part . In Figure 1, removing the edges and yields a solution.
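The edge-deletion view can be illustrated with a short sketch: given the removed edges, the parts are the connected components of what remains. The union-find helper and its name are assumptions made for illustration only.

```python
def parts_after_removal(vertices, edges, removed):
    """Connected components left after deleting the edges in `removed`."""
    parent = {v: v for v in vertices}

    def find(v):
        # Find the representative of v, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    removed_set = {frozenset(e) for e in removed}
    for u, v in edges:
        if frozenset((u, v)) not in removed_set:
            parent[find(u)] = find(v)

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    # Sort parts by their smallest vertex for a deterministic result.
    return sorted(groups.values(), key=min)
```

On a tree (in particular on a path), removing a set of distinct edges yields one part more than the number of removed edges, so a partition into a prescribed number of parts corresponds to removing one edge fewer than that number.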
Finally, regarding notation, for a color we use if is of color and if has another color. Further, we use and .
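Table 1: Known and new results.
|Graph class|Restriction|Complexity|Reference|
|---|---|---|---|
|Paths|constant number of colors|polynomial time|[8, Theorem 4.4]|
|Trees|constant number of colors|pseudo-polynomial time|[8, Theorem 4.5]|
|Trees|two colors|polynomial time|Proposition 1|
|Trees|three colors|weakly NP-hard|Theorem 2|
|Trees|diameter two|polynomial time|[8, Theorem 4.2]|
|Trees|diameter three|polynomial time|Proposition 2|
|Trees|diameter four|NP-hard|[8, Theorem 3.3]|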
As mentioned before, we essentially build our studies on the work of Ito et al., in particular studying exactly the same computational problem. We only focus on the case of path and tree graphs as input, whereas they additionally studied cliques. For cliques, they showed NP-hardness already for and two colors. On the positive side, for cliques they provided a pseudo-polynomial-time algorithm for and and a polynomial-time algorithm for each fixed . Moving to paths and trees, besides some positive algorithmic and hardness results, Ito et al. left three open problems in particular:
Existence of a polynomial-time algorithm for paths when is part of the input.
Existence of a polynomial-time algorithm for trees when is a constant.
Existence of a polynomial-time algorithm for trees of diameter exactly three.
Indeed, they called the first two questions the “main open problems” of their paper. We settle all three questions, the first two in the negative by showing NP-hardness. See Table 1 for an overview of some known and our new results. Notably, our new results (partially together with the previous results of Ito et al.) reveal two sharp complexity dichotomies for trees. For up to two colors, the problem is polynomial-time solvable, whereas it becomes NP-hard with three or more colors; moreover, it is polynomial-time solvable for trees with diameter at most three but NP-hard for trees with diameter at least four. In the remainder of this work, we first present our results for paths, and then for trees.
2 NP-hardness on paths
Ito et al. showed that Gerrymandering over Graphs on paths can be solved in polynomial time for fixed , and left open the question of polynomial-time solvability on paths when is unbounded. Negatively answering their question, we show that Gerrymandering over Graphs remains NP-hard on paths even if every vertex has unit weight.
Theorem 1. Gerrymandering over Graphs restricted to paths is NP-hard even if all vertices have unit weight.
Proof. We reduce from Clique on regular graphs, which is NP-hard. Let be an instance of Clique, where is -regular for some integer , and is the sought solution size. The main idea is to first construct an equivalent instance of Gerrymandering over Graphs where the graph consists of disjoint paths. Afterwards, we slightly modify the reduction to obtain one connected path.
All vertices in the following constructions have weight one. Let and be the number of vertices and edges in , respectively, and let . We introduce a path on vertices for each vertex and a path on four vertices for each edge . Moreover, we introduce an independent set of vertices. We denote by the disjoint union of all for , all for , and . Note that has connected components.
We introduce colors , and a unique color for each , where is the target color. We color vertices of with color and vertices of with color . For each vertex , we color the vertices in as follows.
The first vertices receive color ,
for each , the -th vertex receives color , and
each remaining vertex receives a new color (which is distinct for each vertex).
An illustration of the path is shown in Figure 2. For each edge , we color the two inner vertices of with color and the endpoints with colors and , respectively. Finally, we set .
First, we show that if contains a clique of size , then the constructed instance of Gerrymandering over Graphs is a yes-instance. We will specify the set of exactly edges such that the connected components of correspond to a solution. Note that each removal of an edge increases the number of connected components by exactly one.
For each vertex , the edge set contains all edges in that are not between two -colored vertices. There are such edges.
For each vertex and each edge , the edge set contains the edge incident to the -colored vertex in . There are such edges as each vertex in the input graph has neighbors.
For each edge where both endpoints are contained in , the edge set contains the edge between the two inner (-colored) vertices in . There are such edges.
Thus, contains edges in total, leaving connected components in the graph .
Now we examine the color of each connected component of . First, note that there are connected components that are uniquely -colored. We now show that for each color other than there are at most connected components which are -colored.
For color , observe that there are isolated vertices of color in and for each vertex there is exactly one -colored connected component contained in and for every vertex there is no -colored connected component in . Hence, there are connected components that are -colored.
For color , note that there are vertices which are -colored. Thus, there are fewer than connected components that are -colored.
For each color with , there are connected components in that are -colored. All other -colored vertices are contained in for some and those belong to a -colored component by construction. Hence, there are connected components that are -colored.
For each color with , the whole path remains one connected component which is -colored. All other -colored vertices are contained in for some and since , there are at most connected components that are -colored.
Thus, if contains a clique of size , then the constructed instance is a yes-instance.
Conversely, we show that if the constructed instance of Gerrymandering over Graphs has a solution , then there is a clique of size in . Let be a set of exactly edges in such that the connected components of correspond to . Let be the set of vertices such that contains an edge of and let . For each vertex , let and be the number of connected components of which are -colored and -colored, respectively. Our goal is to show that forms a clique of size in . To this end, we derive an upper bound on the size of in terms of , , and :
For each vertex , there are at most edges in whose endpoints are -colored. Since there are isolated -colored vertices and isolated -colored vertices in , it follows that . Thus, contains at most edges in both of whose endpoints are -colored.
For each vertex , the edge set contains at most edges in where at least one endpoint is not -colored.
For each vertex , the edge set contains at most edges incident to a -colored endpoint in a for some edge .
For each vertex , there are exactly edges incident to a -colored endpoint that are contained in a for some edge . Thus, contains at most such edges.
Finally, we consider edges between inner vertices of for . Observe that if such an edge is in , then has one -colored component and one -colored component. Thus, contains at most
Summing over these edges yields that contains at most
edges. Here, the inequality is due to the fact that . Thus, , where
Next we show that . Recall that has isolated -colored vertices and isolated -colored vertices. Since the path contains at least one -colored part for every vertex , we obtain .
Notice that is monotonically increasing for and that from this follows that . Note that by the definition of . Consequently, we have and hence . Finally, note that for any solution where , we cannot remove any edges between two -colored vertices (as this would result in at least connected components that are -colored). Hence, for each and thus summing up all edges in without the edges between two -colored vertices yields
For to contain edges, it has to also hold that for each vertex . Hence, there are exactly edges in between two -colored vertices in for edges . Note that for each such edge it has to hold that both endpoints of are in as otherwise there are connected components in of color (where is an endpoint of ). Thus, there are vertices in that share edges between them, that is, induces a clique of size .
We next show how to connect the different paths of the construction to obtain a single connected path. For , we simply add a path of vertices between each connected component in the previous reduction (that results in multiple disconnected paths) where each vertex has a unique color. Note that there are exactly such paths and thus in total new edges. Finally, we set . The correctness of this adaptation is straightforward: If there is a solution for the instance consisting of multiple paths, then removing the newly introduced edges clearly gives a solution for the new instance consisting of a single path. If there is no solution for the instance consisting of multiple paths, then note that since is larger than the number of edges in the original construction and , at least one edge from each newly introduced path is removed. Hence, vertices that are in different connected components in the original construction are also in different connected components in any solution. Moreover, since all newly introduced vertices have unique colors and all vertices have the same weight, any connected component in a solution for the instance consisting of multiple paths has the same color in the newly constructed instance. ∎
In the above reduction, we use an unbounded number of colors. This appears to be inevitable since Gerrymandering over Graphs is polynomial-time solvable for any constant . We wonder whether there are other graph classes for which Gerrymandering over Graphs can be solved in polynomial time when is constant. Caterpillars form a possible candidate.
3 Complexity on trees
In this section, we first address the special case of three colors (NP-hard), then two colors (polynomial-time solvable), and finally we discuss the polynomial-time solvability for diameter-three trees.
Ito et al. developed a pseudo-polynomial-time algorithm for Gerrymandering over Graphs on trees for constant , which led them to ask whether it is also polynomial-time solvable for fixed . We show that Gerrymandering over Graphs on trees is weakly NP-hard even if , answering their question in the negative. In the following subsection, we will then show the polynomial-time solvability for . This yields a tight classification.
Theorem 2. Gerrymandering over Graphs restricted to trees is weakly NP-hard even if .
Proof. We reduce from Partition, which is known to be NP-hard. Given a multi-set of non-negative integers, the task is to find a subset of exactly integers whose sum is , where . We can assume that is a multiple of (otherwise we multiply each element of by ). Let and let be some natural number greater than . For the construction, we use a set of three colors, where is the target color. We start with a star with a center vertex and a set of leaves. We color every vertex in the star with color . We assign the weights to the center and for each leaf . For each , we do the following.
We introduce two vertices and of color and two vertices and of color . Let , , and .
We add four edges , , , and .
We define the weights for each vertex in as
Observe that the weights are integral since is divisible by . In addition, observe that is -colored and that is -colored.
The constructed graph is depicted in Figure 3.
Clearly, the constructed graph is a tree. To conclude the construction of the Gerrymandering over Graphs instance, we set .
We next show that the construction is correct. Suppose that there is a subset of size exactly such that . Then, the partition
where is a solution for the constructed instance of Gerrymandering over Graphs: First, observe that is -colored as and . We also observe that the singleton is -colored for each leaf , and hence has subsets which are -colored. Since is -colored and is -colored for each , exactly subsets of are -colored and exactly subsets of are -colored. Thus, is indeed a solution.
Conversely, suppose that there is a solution . We show that the Partition instance is a yes-instance. Note that there are at least parts in which are uniquely -colored. Since there are exactly vertices of color , each vertex of color is contained in a distinct part in . In particular, this means that for each leaf .
Let denote the subset containing the center , and let and denote the number of vertices of color and in , respectively. As each vertex of color or has weight at least , we have and . Since is uniquely -colored, we have
Here, the last inequality follows since . Thus, contains at most vertices.
Let be the collection of subsets of not containing any -colored vertices. Notice that and that . Now, consider some . We have or for some by construction. Since for all , we have and thus . Moreover, since there are vertices in , we have . Hence, and thus, for each part , it holds that yielding or for some . Let and . Since all are -colored and all are -colored, we have and . Then, since , we obtain .
The total weights of vertices of color and in are
respectively. Now, assume for the sake of contradiction that . Then, there exists an index . If , then each element in is smaller than , and hence
Since and , we obtain
and thus . Consequently, is a solution to the original instance of Partition. ∎
We continue with a complexity analysis for the case . Note that Gerrymandering over Graphs on trees is pseudo-polynomial-time solvable for any constant (and thereby for ). To complement this result and also Theorem 2, we next show that for there is a polynomial-time algorithm for trees, adapting a pseudo-polynomial-time algorithm of Ito et al. [8, Theorem 4.5]. The key difference is that we only store the maximum winning margin of the target color over the other color. We thus obtain a dichotomy with respect to .
Proposition 1. For , Gerrymandering over Graphs restricted to trees can be solved in time.
Proof. We assume that , where is the target color. We provide a polynomial-time algorithm for rooted trees. Note that any unrooted tree can be regarded as rooted by choosing an arbitrary vertex as its root. Let be the root of the input graph . For each vertex , let be the subtree of rooted at .
Our algorithm is based on dynamic programming. We iteratively find partial solutions (which will be defined shortly), starting from the leaves until reaching the root. Let be some vertex of and let be the children of . Let be a rooted tree on a single vertex , and for each let be the rooted subtree of induced by and the vertices of . For each vertex , each , and each (where denotes the number of vertices in ), we define such that is the maximum number of -colored parts among all partitions of the vertices in . Therein, we require that is connected for each and that . Moreover, we say that the color of is still undecided as is the only part that is still connected to the rest of the graph (through the parent of ) and therefore we neglect when computing . Further, for each vertex and each let
be the maximum winning margin of over in over all -partitions of the vertices in maximizing . Observe that a given instance is a yes-instance if and only if , where is the number of children of the root and where equals one if the predicate is true and zero otherwise.
We next show how to compute the values of and . We first initialize the values of and for as follows:
where is the set of vertices of . Note that implies as only contains a single vertex. Thus, it only remains to compute the values of and for and . For a partition of the vertices of that maximizes and , we have two cases: or . If , then the edge is removed and the maximum number of uniquely -colored subsets is the maximum sum of -colored subsets in and , that is,
Observe that since , we now count the part that contains and therefore include the last summand. Otherwise (that is, ), then the maximum number of uniquely -colored subsets is
For the computation of and , we have
Here, and are the indices maximizing the terms in Definitions (5) and (4), respectively. Regarding the running time, observe that we compute table entries for each vertex . Since a tree has edges, by the handshaking lemma we compute table entries in total, and computing each table entry requires summing up at most values (weights of vertices or precomputed table entries). Thus, the total running time is . ∎
Finally, we bridge the gap for trees of fixed diameter by generalizing the known polynomial-time algorithm for trees of diameter two to trees of diameter three. It is also known that Gerrymandering over Graphs on trees of diameter four remains NP-hard.
The key observation is that a tree of diameter three can be obtained from two stars by adding an edge between their centers. Our algorithm then adapts a polynomial-time algorithm for stars.
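The structural observation can be sketched as follows: in a tree of diameter three, exactly two vertices are not leaves, and these are the two adjacent star centers. The helper below is a hypothetical illustration that assumes its input is the edge list of a tree; it does not verify this.

```python
from collections import defaultdict


def split_into_two_stars(edges):
    """Return ((center1, leaves1), (center2, leaves2)) for a diameter-three tree.

    Assumes `edges` describes a tree; raises ValueError if the tree does not
    have exactly two adjacent non-leaf vertices.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Non-leaf vertices are exactly the star centers.
    centers = sorted(v for v in adj if len(adj[v]) >= 2)
    if len(centers) != 2 or centers[1] not in adj[centers[0]]:
        raise ValueError("not a tree of diameter three")
    c1, c2 = centers
    return (c1, adj[c1] - {c2}), (c2, adj[c2] - {c1})
```

Every leaf is attached to one of the two centers, which is why any part of a solution that contains neither center must be a single leaf.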
Proposition 2. For trees of diameter three, Gerrymandering over Graphs is solvable in time.
Proof. First, observe that a tree of diameter three is the same as two stars whose centers are connected by an edge . Let and be the two centers of the stars (). Our algorithm distinguishes between two cases: (i) and belong to the same part in a solution , and (ii) they belong to two different parts. (Technically, our algorithm checks for each of the two cases whether a solution exists; it reports a solution if it finds one in either case and rejects the input otherwise.) The subalgorithm for case (i) is completely analogous to the algorithm for Gerrymandering over Graphs on stars (trees of diameter two) by Ito et al. We will present the whole algorithm for the sake of completeness. It will also be helpful in understanding the subalgorithm for case (ii) (which is an adaptation of the first subalgorithm). Both subalgorithms are based on the observation that each part of not containing or only consists of a single vertex.
We start with presenting the subalgorithm for case (i). The algorithm guesses a color such that the part with is -colored (uniquely -colored if ). (Whenever we “guess” something, we iterate over all possible cases and test whether this iteration yields a solution. If any iteration yields a solution, then we refer to this iteration in the proof.) Moreover, the algorithm guesses the numbers and of -colored and -colored leaves that are not contained in (those leaves form their own parts in ). Let be the number of uniquely -colored parts in (note that if and otherwise). Note that for each color we have to guarantee that there are at most parts that are -colored in . Let be the number of -colored leaves. As proven by Ito et al. [8, Lemma 4.3], one can assume that any -colored leaf in is at least as heavy as the ones not in . So we can assume that is the sum of the heaviest -colored leaves plus . Similarly (also shown by Ito et al. [8, Lemma 4.3]), we can also assume that any -colored leaf (for ) not in is at least as heavy as the ones in . For each color , let be the smallest number of -colored leaves that cannot be included in . By definition, is the minimum number such that the sum of weights of all but the heaviest -colored leaves is at most (strictly less if ). Finally, we verify the following:
For each it holds that (and if removing the heaviest leaves results in being -colored).
The values of for all colors plus