# The Complexity of Gerrymandering Over Graphs: Paths and Trees

Roughly speaking, gerrymandering is the systematic manipulation of the boundaries of electoral districts to make a specific (political) party win as many districts as possible. While typically studied from a geographical point of view, addressing social network structures, the investigation of gerrymandering over graphs was recently initiated by Cohen-Zemach et al. [AAMAS 2018]. Settling three open questions of Ito et al. [AAMAS 2019], we classify the computational complexity of the NP-hard problem Gerrymandering over Graphs when restricted to paths and trees. Our results, which are mostly of negative nature (that is, worst-case hardness), in particular yield two complexity dichotomies for trees. For instance, the problem is polynomial-time solvable for two parties but becomes weakly NP-hard for three. Moreover, we show that the problem remains NP-hard even when the input graph is a path.


## 1 Introduction

How to influence an election? One answer to this is gerrymandering [3, 7, 12]. Gerrymandering is the systematic manipulation of the boundaries of electoral districts in favor of a particular party. It has been studied in the political sciences for decades [11]. In recent years, various models of gerrymandering were investigated from an algorithmic and computational perspective. For instance, Lewenberg et al. [9] and Eiben et al. [5] studied the (parameterized) computational complexity of gerrymandering assuming that the voters are points in a two-dimensional space and the task is to place $k$ polling stations where each voter is assigned to the polling station closest to her. Cohen-Zemach et al. [4] introduced a version of gerrymandering over graphs (which may be seen as models of social networks) where the question is whether a given candidate can win at least $\ell$ districts. This leads to the question whether there is a partition of the graph into $k$ connected subgraphs such that at least $\ell$ of these are won by a designated candidate; herein, $k$ and $\ell$ are part of the input of the computational problem. Cohen-Zemach et al. showed that this version is NP-complete even when restricted to planar graphs. Following up on the pioneering work of Cohen-Zemach et al. [4], Ito et al. [8] performed a refined complexity analysis, particularly taking into account the special graph structures of cliques, paths, and trees. Their formal model is slightly different from the one of Cohen-Zemach et al. [4], and their work will be our main point of reference. Notably, both studies focus on perhaps the simplest voting rule, Plurality.

We mention in passing that earlier work also studied the special case of gerrymandering on grid graphs. More specifically, Apollonio et al. [1] analyzed gerrymandering in grid graphs where each district in the solution has to be of (roughly) the same size and they analyzed, focusing on two candidates (equivalently, two parties), the maximum possible win margin if the two candidates had the same amount of support. Later, Borodin et al. [2] also considered gerrymandering on grid graphs with two parties (expressed by colors red and blue), but here each vertex represents a polling station and thus is partially “red” and partially “blue” colored. They provided a worst-case analysis for a two-party situation in terms of the total fraction of votes the party responsible for the gerrymandering process gets. They also confirmed their findings with experiments.

To formally define our central computational problem, we continue with a few definitions. For a vertex-colored graph $G = (V, E)$ and for each color $c$, let $V_c \subseteq V$ be the set of $c$-colored vertices. A vertex-weighted graph is $c$-colored if for each color $c'$ it holds that $w(V_c) \ge w(V_{c'})$. A vertex-weighted graph is uniquely $c$-colored if $w(V_c) > w(V_{c'})$ for each color $c' \neq c$. Thus, we arrive at the central problem of this work, going back to Ito et al. [8].

**Gerrymandering over Graphs**

*Input:* An undirected, connected graph $G = (V, E)$, a weight function $w \colon V \to \mathbb{N}$, a set $C$ of colors, a target color $p \in C$, a coloring function $\chi \colon V \to C$, and an integer $k$.

*Question:* Is there a partition $\{V_1, \ldots, V_k\}$ of $V$ into exactly $k$ subsets such that every $V_i$, $i \in [k]$, induces a connected subgraph in $G$ and the number of uniquely $p$-colored induced subgraphs exceeds the number of $c$-colored induced subgraphs for each $c \in C \setminus \{p\}$?
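To make the definition concrete, the following Python sketch checks whether a candidate partition is a solution. It is an illustration under our own assumptions, not code from the paper: the graph is an adjacency dictionary, colors are plain strings, and weights and colors are stored in the dictionaries `weight` and `color`. All function names are hypothetical. Note that a part with a tie between the heaviest colors counts as $c$-colored for each of them, but as uniquely colored for none.

```python
from collections import defaultdict


def color_totals(part, weight, color):
    """w_c(part) for every color c occurring in the part."""
    totals = defaultdict(int)
    for v in part:
        totals[color[v]] += weight[v]
    return totals


def is_connected(part, adj):
    """True iff the vertex set `part` induces a connected subgraph (DFS inside part)."""
    start = next(iter(part))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in part and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == part


def is_solution(parts, adj, weight, color, colors, target, k):
    """Check whether `parts` (a list of vertex sets) solves the instance."""
    covered = [v for part in parts for v in part]
    if len(parts) != k or any(not part for part in parts):
        return False
    if len(covered) != len(set(covered)) or set(covered) != set(adj):
        return False  # not a partition of V
    if not all(is_connected(set(part), adj) for part in parts):
        return False
    uniquely_target = 0
    c_colored = defaultdict(int)  # number of c-colored parts, per color c
    for part in parts:
        totals = color_totals(part, weight, color)
        top = max(totals.get(c, 0) for c in colors)
        for c in colors:
            if totals.get(c, 0) == top:  # ties count for every heaviest color
                c_colored[c] += 1
        if all(totals.get(c, 0) < totals.get(target, 0)
               for c in colors if c != target):
            uniquely_target += 1
    return all(uniquely_target > c_colored[c] for c in colors if c != target)


# Example: the path p - q - p with unit weights.
adj = {0: [1], 1: [0, 2], 2: [1]}
weight = {0: 1, 1: 1, 2: 1}
color = {0: "p", 1: "q", 2: "p"}
print(is_solution([{0}, {1}, {2}], adj, weight, color, {"p", "q"}, "p", 3))  # True
```

With $k = 2$, every partition of this path contains a part with one $p$- and one $q$-colored vertex. Such a part is $q$-colored but not uniquely $p$-colored, so no partition into two parts is a solution.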

Figure 1 presents a simple example of Gerrymandering over Graphs. We remark that all our results except for Theorem 1 (that is, the NP-hardness on paths) also transfer to the slightly different model of Cohen-Zemach et al. [4]. (In fact, we conjecture that the gerrymandering problem of Cohen-Zemach et al. [4] is polynomial-time solvable on paths.)

We also use an equivalent interpretation of solution partitions $\{V_1, \ldots, V_k\}$ for Gerrymandering over Graphs. Since each part $V_i$ has to induce a connected subgraph, in the spirit of edge deletion problems from algorithmic graph theory, we also represent solutions by a set of edges such that removing these yields the disjoint union of the subgraphs induced by the parts $V_i$. In Figure 1, removing the edges  and  yields a solution.
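On trees, the edge-deletion view makes exhaustive search straightforward. Removing any $k-1$ edges of a tree leaves exactly $k$ connected parts, so enumerating all $(k-1)$-subsets of edges enumerates all feasible partitions. The following sketch takes exponential time and is meant only as a reference for tiny instances. It assumes that edges are given as vertex pairs and that colors are strings; all names are hypothetical.

```python
from itertools import combinations


def components(vertices, edges):
    """Connected components of the graph (vertices, edges), via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def target_wins(parts, weight, color, colors, target):
    """True iff uniquely target-colored parts outnumber c-colored parts for each other c."""
    uniquely = 0
    colored = {c: 0 for c in colors}
    for part in parts:
        totals = {c: 0 for c in colors}
        for v in part:
            totals[color[v]] += weight[v]
        top = max(totals.values())
        for c in colors:
            if totals[c] == top:
                colored[c] += 1
        if all(totals[c] < totals[target] for c in colors if c != target):
            uniquely += 1
    return all(uniquely > colored[c] for c in colors if c != target)


def brute_force_tree(vertices, edges, weight, color, colors, target, k):
    """Return a solution partition of a tree, or None (tries all (k-1)-edge sets)."""
    if k < 1:
        return None
    for removed in combinations(edges, k - 1):
        cut = set(removed)
        parts = components(vertices, [e for e in edges if e not in cut])
        if target_wins(parts, weight, color, colors, target):
            return parts
    return None


vertices = [0, 1, 2]
edges = [(0, 1), (1, 2)]
weight = {0: 1, 1: 1, 2: 1}
color = {0: "p", 1: "q", 2: "p"}
print(brute_force_tree(vertices, edges, weight, color, {"p", "q"}, "p", 2))  # None
```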

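Finally, regarding notation, for a color $c$ we use $w_c(v) := w(v)$ if $v$ is of color $c$ and $w_c(v) := 0$ if $v$ has another color. Further, we use $w(V') := \sum_{v \in V'} w(v)$ and $w_c(V') := \sum_{v \in V'} w_c(v)$.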

### Known and new results.

As mentioned before, we essentially build our studies on the work of Ito et al. [8], in particular studying exactly the same computational problem. We only focus on the case of path and tree graphs as input, whereas they additionally studied cliques. For cliques, they showed NP-hardness already for and two colors. On the positive side, for cliques they provided a pseudo-polynomial-time algorithm for  and and a polynomial-time algorithm for each fixed . Moving to paths and trees, besides some positive algorithmic and hardness results Ito et al. [8] particularly left three open problems:

1. Existence of a polynomial-time algorithm for paths when $|C|$ is part of the input.

2. Existence of a polynomial-time algorithm for trees when $|C|$ is a constant.

3. Existence of a polynomial-time algorithm for trees of diameter exactly three.

Indeed, they called the first two questions the “main open problems” of their paper. We settle all three questions, the first two in the negative by showing NP-hardness. See Table 1 for an overview of some old and our new results. Notably, our new results (partially together with the previous results of Ito et al. [8]) reveal two sharp complexity dichotomies for trees. For up to two colors, the problem is polynomial-time solvable, whereas it becomes NP-hard with three or more colors; moreover, it is polynomial-time solvable for trees with diameter at most three but NP-hard for trees with diameter at least four. In the remainder of this work, we first present our results for paths, and then for trees.

## 2 NP-hardness on paths

Ito et al. [8] showed that Gerrymandering over Graphs on paths can be solved in polynomial time for fixed $|C|$, and left open the question of polynomial-time solvability on paths when $|C|$ is unbounded. Negatively answering their question, we show that Gerrymandering over Graphs remains NP-hard on paths even if every vertex has unit weight.

###### Theorem 1.

Gerrymandering over Graphs restricted to paths is NP-hard even if all vertices have unit weight.

###### Proof.

We reduce from Clique on regular graphs, which is NP-hard [10]. Let $(G, \ell)$ be an instance of Clique, where $G$ is $d$-regular for some integer $d$, and $\ell$ is the sought solution size. The main idea is to first construct an equivalent instance of Gerrymandering over Graphs where the graph consists of disjoint paths. Afterwards, we slightly modify the reduction to obtain one connected path.

All vertices in the following constructions have weight one. Let  and be the number of vertices and edges in , respectively, and let . We introduce a path on vertices for each vertex  and a path  on four vertices for each edge . Moreover, we introduce an independent set of vertices. We denote by  the disjoint union of all  for , all  for , and . Note that  has  connected components.

We introduce colors , and a unique color for each , where  is the target color. We color  vertices of with color  and vertices of with color . For each vertex , we color the vertices in as follows.

• The first vertices receive color ,

• for each , the -th vertex receives color , and

• each remaining vertex receives a new color (which is distinct for each vertex).

An illustration of the path is shown in Figure 2. For each edge , we color the two inner vertices of with color  and the endpoints with colors  and , respectively. Finally, we set .

First, we show that if contains a clique of size , then the constructed instance of Gerrymandering over Graphs is a yes-instance. We will specify the set  of exactly  edges such that the connected components of  correspond to a solution. Note that each removal of an edge increases the number of connected components by exactly one.

• For each vertex , the edge set  contains all edges in that are not between two -colored vertices. There are such edges.

• For each vertex  and each edge , the edge set  contains the edge incident to the -colored vertex in . There are  such edges as each vertex in the input graph has  neighbors.

• For each edge  where both endpoints are contained in , the edge set  contains the edge between the two inner (-colored) vertices in . There are  such edges.

Thus, contains edges in total, leaving  connected components in the graph .

Now we examine the color of each connected component of . First, note that there are  connected components that are uniquely -colored. We now show that for each color  other than  there are at most  connected components which are -colored.

• For color , observe that there are  isolated vertices of color  in  and for each vertex  there is exactly one -colored connected component contained in and for every vertex  there is no -colored connected component in . Hence, there are  connected components that are -colored.

• For color , note that there are vertices which are -colored. Thus, there are fewer than  connected components that are -colored.

• For each color  with , there are  connected components in  that are -colored. All other -colored vertices are contained in for some  and those belong to -colored component by construction. Hence, there are  connected components that are -colored.

• For each color with , the whole path  remains one connected component which is -colored. All other -colored vertices are contained in  for some and since , there are at most  connected components that are -colored.

Thus, if  contains a clique of size , then the constructed instance is a yes-instance.

Conversely, we show that if the constructed instance of Gerrymandering over Graphs has a solution , then there is a clique of size  in . Let  be a set of exactly  edges in  such that the connected components of  correspond to . Let be the set of vertices  such that contains an edge of  and let . For each vertex , let and be the number of connected components of which are -colored and -colored, respectively. Our goal is to show that forms a clique of size  in . To this end, we derive an upper bound on the size of in terms of , and :

1. For each vertex , there are at most  edges in  whose endpoints are -colored. Since there are  isolated -colored vertices and  isolated -colored vertices in , it follows that . Thus, contains at most edges in both of whose endpoints are -colored.

2. For each vertex , the edge set  contains at most  edges in  where at least one endpoint is not -colored.

3. For each vertex , the edge set  contains at most  edges incident to a -colored endpoint in a  for some edge .

4. For each vertex , there are exactly  edges incident to a -colored endpoint that are contained in a  for some edge . Thus, contains at most  such edges.

5. Finally, we consider edges between inner vertices of for . Observe that if such an edge is in , then has one -colored component and one -colored component. Thus, contains at most

$$\binom{|K|}{2} + \sum_{v \in J} (N - n_c^v) = \binom{n - |J|}{2} + \sum_{v \in J} (N - n_c^v)$$

such edges.

Summing over these edges yields that contains at most

$$(n-\ell-|J|) + \sum_{v \in J} 3n_c^v + \sum_{v \in J} (N - n_c^v) + d\cdot(n-|J|) + \binom{n-|J|}{2} + \sum_{v \in J} (N - n_c^v) \le (n-\ell-|J|) + 3N\cdot|J| + d\cdot(n-|J|) + \binom{n-|J|}{2}$$

edges. Here, the inequality is due to the fact that . Thus, , where

$$f(x) := (n-\ell-x) + 3N\cdot x + d\cdot(n-x) + \binom{n-x}{2}.$$

Next we show that . Recall that has isolated -colored vertices and isolated -colored vertices. Since the path  contains at least one -colored part for every vertex , we obtain .

Notice that is monotonically increasing for and that from this follows that . Note that  by the definition of . Consequently, we have  and hence . Finally, note that for any solution where , we cannot remove any edges between two -colored vertices (as this would result in at least  connected components that are -colored). Hence,  for each  and thus summing up all edges in  without the edges between two -colored vertices yields

$$|E''| \le \sum_{v \in J} (2N + n_c^v) + d\ell + \binom{\ell}{2}.$$

For  to contain  edges, it has to also hold that  for each vertex . Hence, there are exactly  edges in  between two -colored vertices in  for edges . Note that for each such edge  it has to hold that both endpoints of  are in  as otherwise there are  connected components in  of color  (where  is an endpoint of ). Thus, there are  vertices in  that share  edges between them, that is,  induces a clique of size .

We next show how to connect the different paths of the construction to obtain a single connected path. For , we simply add a path of  vertices between each connected component in the previous reduction (that results in multiple disconnected paths) where each vertex has a unique color. Note that there are exactly  such paths and thus in total  new edges. Finally, we set . The correctness of this adaptation is straightforward: If there is a solution for the instance consisting of multiple paths, then removing the newly introduced edges clearly gives a solution for the new instance consisting of a single path. If there is no solution for the instance consisting of multiple paths, then note that since  is larger than the number of edges in the original construction and , at least one edge from each newly introduced path is removed. Hence, vertices that are in different connected components in the original construction are also in different connected components in any solution. Moreover, since all newly introduced vertices have unique colors and all vertices have the same weight, any color of a connected component in a solution for the instance consisting of multiple paths also has the same color in the newly constructed instance. ∎

In the above reduction, we use an unbounded number of colors. This appears to be inevitable since Gerrymandering over Graphs on paths is polynomial-time solvable for any constant $|C|$ [8]. We wonder whether there are other graph classes for which Gerrymandering over Graphs can be solved in polynomial time when $|C|$ is constant. Caterpillars form a possible candidate.

## 3 Complexity on trees

In this section, we first address the special case of three colors (NP-hard), then two colors (polynomial-time solvable), and finally we discuss the polynomial-time solvability for diameter-three trees.

Ito et al. [8] developed a pseudo-polynomial-time algorithm for Gerrymandering over Graphs on trees for constant $|C|$, which led them to ask whether it is also polynomial-time solvable for fixed $|C|$. We show that Gerrymandering over Graphs on trees is weakly NP-hard even if $|C| = 3$, answering their question in the negative. Afterwards, we show polynomial-time solvability for $|C| = 2$, which yields a tight classification with respect to $|C|$.

###### Theorem 2.

Gerrymandering over Graphs restricted to trees is weakly NP-hard even if $|C| = 3$.

###### Proof.

We reduce from Partition, which is known to be NP-hard [6]. Given a multi-set  of  non-negative integers, the task is to find a subset  of exactly integers whose sum is , where . We can assume that  is a multiple of (otherwise we multiply each element of by ). Let and let be some natural number greater than . For the construction, we use a set  of three colors, where  is the target color. We start with a star with a center vertex  and a set  of  leaves. We color every vertex in the star with color . We assign the weights  to the center and  for each leaf . For each , we do the following.

• We introduce two vertices  and  of color  and two vertices  and  of color . Let , and .

• We add four edges , , , and .

• We define the weights for each vertex in as

$$w(x_i^q) := M + N\cdot 2^i + a_i, \quad w(x_i^r) := M - N\cdot 2^i, \quad w(y_i^r) := M + N\cdot 2^i - a_i + \frac{2s}{n}, \quad\text{and}\quad w(y_i^q) := M - N\cdot 2^i.$$

Observe that the weights are integral since  is divisible by . In addition, observe that  is -colored and that  is -colored.
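Equations (1) and (2) below can be checked numerically with a short script. The sketch computes the gadget weights defined above, and then $w_q(V_z)$ and $w_r(V_z)$ when $V_z$ contains $X_i$ for $i \in I_x$ and $Y_i$ for $i \in I_y$. The concrete values of $N$ and $M$ are hypothetical placeholders: they are not the values fixed in the reduction and are chosen only so that $M - N\cdot 2^i$ stays positive. The function names are ours.

```python
def gadget_weights(a, M, N):
    """Weights of x_i^q, x_i^r, y_i^r, y_i^q (keys like ("xq", i)) for i = 1..n."""
    n, s = len(a), sum(a)
    assert s % n == 0, "the reduction assumes that s is a multiple of n"
    w = {}
    for i, ai in enumerate(a, start=1):
        w["xq", i] = M + N * 2**i + ai
        w["xr", i] = M - N * 2**i
        w["yr", i] = M + N * 2**i - ai + 2 * s // n
        w["yq", i] = M - N * 2**i
    return w


def center_part_weights(a, Ix, Iy, M, N):
    """(w_q(V_z), w_r(V_z)) when V_z contains X_i for i in Ix and Y_i for i in Iy.

    The star vertices are p-colored, so only gadget vertices contribute.
    """
    w = gadget_weights(a, M, N)
    wq = sum(w["xq", i] for i in Ix) + sum(w["yq", i] for i in Iy)
    wr = sum(w["xr", i] for i in Ix) + sum(w["yr", i] for i in Iy)
    return wq, wr


def closed_forms(a, Ix, Iy, M, N):
    """Right-hand sides of Equations (1) and (2); assume |Ix| + |Iy| = n and |Iy| = n/2."""
    n, s = len(a), sum(a)
    pow_x = sum(2**i for i in Ix)
    pow_y = sum(2**i for i in Iy)
    wq = M * n + sum(a[i - 1] for i in Ix) + N * (pow_x - pow_y)
    wr = M * n + s - sum(a[i - 1] for i in Iy) + N * (pow_y - pow_x)
    return wq, wr


a = [4, 8, 12, 16]          # s = 40 is a multiple of n = 4
N = 1000                    # hypothetical value for N
M = N * 2 ** (len(a) + 1)   # hypothetical M, large enough to keep M - N*2^i > 0
print(center_part_weights(a, {1, 4}, {1, 4}, M, N))  # (128020, 128020)
```

With $I_x = I_y = \{1, 4\}$, we have $a_1 + a_4 = 20 = s/2$, so both values equal $Mn + s/2$. For $I_x \neq I_y$, the powers of two no longer cancel, and one of the two totals grows by a multiple of $N$.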

An illustration of the constructed graph is depicted in Figure 3.

Clearly, the constructed graph is a tree. To conclude the construction of the Gerrymandering over Graphs instance, we set .

We next show that the construction is correct. Suppose that there is a subset  of size exactly such that . Then, the partition

$$\mathcal{V} = \{V'\} \cup \{\{\ell\} \mid \ell \in L\} \cup \{X_i \mid i \in [n] \setminus I\} \cup \{Y_i \mid i \in [n] \setminus I\},$$

where is a solution for the constructed instance of Gerrymandering over Graphs: First, observe that is -colored as  and . We also observe that the singleton is -colored for each leaf , and hence  has subsets which are -colored. Since is -colored and is -colored for each , exactly  subsets of  are -colored and exactly  subsets of  are -colored. Thus, is indeed a solution.

Conversely, suppose that there is a solution . We show that the Partition instance is a yes-instance. Note that there are at least parts in  which are uniquely -colored. Since there are exactly vertices of color , each vertex of color  is contained in a distinct part in . In particular, this means that for each leaf .

Let denote the subset containing the center , and let  and denote the number of vertices of color  and  in , respectively. As each vertex of color  or  has weight at least , we have  and . Since  is uniquely -colored, we have

 max{wq(Vz),wr(Vz)}

Here, the last inequality follows since . Thus,  contains at most vertices.

Let be the collection of subsets of  not containing any -colored vertices. Notice that  and that . Now, consider some . We have or for some  by construction. Since  for all , we have and thus . Moreover, since there are  vertices in , we have . Hence,  and thus, for each part , it holds that  yielding  or  for some . Let  and . Since all  are -colored and all  are -colored, we have  and . Then, since , we obtain .

The total weights of vertices of color $q$ and $r$ in $V_z$ are

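$$w_q(V_z) = \sum_{i \in I_x} w(x_i^q) + \sum_{i \in I_y} w(y_i^q) = Mn + \sum_{i \in I_x} a_i + N\Big(\sum_{i \in I_x} 2^i - \sum_{i \in I_y} 2^i\Big) \quad\text{and} \tag{1}$$
$$w_r(V_z) = \sum_{i \in I_x} w(x_i^r) + \sum_{i \in I_y} w(y_i^r) = Mn + s - \sum_{i \in I_y} a_i + N\Big(\sum_{i \in I_y} 2^i - \sum_{i \in I_x} 2^i\Big), \tag{2}$$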

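respectively. Now, assume for the sake of contradiction that $I_x \neq I_y$. Then, there exists an index $i_{\max} := \max\big((I_x \setminus I_y) \cup (I_y \setminus I_x)\big)$. If $i_{\max} \in I_x$, then each element in $I_y \setminus I_x$ is smaller than $i_{\max}$, and hence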

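$$\sum_{i \in I_x} 2^i - \sum_{i \in I_y} 2^i = \sum_{i \in I_x \setminus I_y} 2^i - \sum_{i \in I_y \setminus I_x} 2^i \ge 2^{i_{\max}} - \sum_{i \in [i_{\max}-1]} 2^i = 2. \tag{3}$$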

Combining Equations 3 and 1 yields

$$w_q(V_z) \ge Mn + \sum_{i \in I_x} a_i + 2N \ge Mn + 2N > w_p(V_z),$$

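which is a contradiction to $V_z$ being uniquely $p$-colored. We analogously obtain a contradiction for $i_{\max} \in I_y$, and thus it holds that $I_x = I_y$. Observe that for $I_x = I_y$, Equation 1 implies $w_q(V_z) = Mn + \sum_{i \in I_x} a_i$ and Equation 2 implies $w_r(V_z) = Mn + s - \sum_{i \in I_x} a_i$.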

Since  and , we obtain

$$w_q(V_z) = w_r(V_z) = Mn + s/2$$

and thus . Consequently,  is a solution to the original instance of Partition. ∎
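The key step in Equation (3) is that $2^{i_{\max}}$ exceeds the sum of all smaller powers $2^1, \ldots, 2^{i_{\max}-1}$ by exactly 2. A brute-force check for small $n$ confirms that the bound in (3) is tight. This is a sanity check, not part of the proof, and the function names are hypothetical. Index sets are encoded as bitmasks, with bit $i-1$ standing for index $i$.

```python
from itertools import product


def power_sum(mask, n):
    """Sum of 2^i over the index set {i in [n] : bit i-1 of mask is set}."""
    return sum(2 ** i for i in range(1, n + 1) if (mask >> (i - 1)) & 1)


def min_power_gap(n):
    """Minimum of power_sum(I_x) - power_sum(I_y) over all pairs I_x != I_y of
    subsets of [n] whose largest differing index lies in I_x (None if none)."""
    best = None
    for mask_x, mask_y in product(range(2 ** n), repeat=2):
        diff = mask_x ^ mask_y
        if diff == 0:
            continue                       # I_x = I_y: no differing index
        top = diff.bit_length() - 1        # bit position of i_max (i_max = top + 1)
        if not (mask_x >> top) & 1:
            continue                       # i_max lies in I_y, the symmetric case
        gap = power_sum(mask_x, n) - power_sum(mask_y, n)
        best = gap if best is None else min(best, gap)
    return best


print([min_power_gap(n) for n in range(1, 6)])  # [2, 2, 2, 2, 2]
```

The minimum is attained, for example, by $I_x = \{i_{\max}\}$ and $I_y = [i_{\max}-1]$. This is why a factor of $N$ larger than any achievable difference of the $a_i$-terms suffices to force $I_x = I_y$.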

We continue with a complexity analysis for the case $|C| = 2$. Note that Gerrymandering over Graphs on trees is pseudo-polynomial-time solvable for any constant $|C|$ (and thereby for $|C| = 2$) [8]. To complement this result and also Theorem 2, we next show that for $|C| = 2$ there is a polynomial-time algorithm for trees, adapting a pseudo-polynomial-time algorithm of Ito et al. [8, Theorem 4.5]. We thus obtain a dichotomy with respect to $|C|$. The key difference is that we only store the maximum winning margin of the target color over the other color.

###### Proposition 1.

For $|C| = 2$, Gerrymandering over Graphs restricted to trees can be solved in  time.

###### Proof.

We assume that $C = \{p, q\}$, where $p$ is the target color. We provide a polynomial-time algorithm for rooted trees. Note that any unrooted tree can be regarded as rooted by choosing an arbitrary vertex as its root. Let be the root of the input graph . For each vertex , let  be the subtree of  rooted at .

Our algorithm is based on dynamic programming. We iteratively find partial solutions (which will be defined shortly), starting from the leaves until reaching the root. Let be some vertex of  and let be the children of . Let be a rooted tree on a single vertex , and for each  let be the rooted subtree of induced by and the vertices of . For each vertex , each , and each  (where denotes the number of vertices in ), we define such that  is the maximum number of -colored parts among all partitions  of the vertices in . Therein, we require that  is connected for each  and that . Moreover, we say that the color of  is still undecided as  is the only part that is still connected to the rest of the graph (through the parent of ) and therefore we neglect  when computing . Further, for each vertex  and each  let

$$W^i_u(k') := \max\big(w_p(V_{k'}) - w_q(V_{k'})\big)$$

be the maximum winning margin of  over  in  over all -partitions  of the vertices in maximizing . Observe that a given instance is a yes-instance if and only if , where  is the number of children of the root  and where  equals one if the predicate  is true and zero otherwise.

We next show how to compute the values of  and . We first initialize the values of and  for as follows:

$$L^i_u(1) := 0, \qquad W^i_u(1) := w_p(V^i_u) - w_q(V^i_u),$$

where  is the set of vertices of . Note that  implies  as  only contains a single vertex. Thus, it only remains to compute the values of  and  for and . For a partition of the vertices of  that maximizes and , we have two cases: or . If , then the edge  is removed and the maximum number of uniquely -colored subsets is the maximum sum of -colored subsets in  and , that is,

$$N^i_u(k') := \max_{j \in [k'-1]} L^{i-1}_u(j) + L_{v_i}(k'-j) + \mathbb{1}[W_{v_i}(k'-j) > 0]. \tag{4}$$

Observe that since , we now count the part that contains  and therefore include the last summand. Otherwise (that is, ), the maximum number of uniquely -colored subsets is

$$M^i_u(k') := \max_{j \in [k']} L^{i-1}_u(j) + L_{v_i}(k'-j+1). \tag{5}$$

For the computation of  and , we have

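$$L^i_u(k') := \max\big(N^i_u(k'), M^i_u(k')\big), \quad\text{and}\quad W^i_u(k') := \begin{cases} W^{i-1}_u(j) & \text{if } N^i_u(k') > M^i_u(k'),\\ W^{i-1}_u(j') + W_{v_i}(k'-j'+1) & \text{if } N^i_u(k') < M^i_u(k'),\\ \max\{W^{i-1}_u(j),\ W^{i-1}_u(j') + W_{v_i}(k'-j'+1)\} & \text{if } N^i_u(k') = M^i_u(k'). \end{cases}$$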

Here, $j'$ and $j$ are the indices maximizing the terms in Definitions (5) and (4), respectively. Regarding the running time, observe that we compute  table entries for each vertex . Since a tree has  edges, by the handshaking lemma we compute  table entries in total, and computing each table entry requires summing up at most  values (weights of vertices or precomputed table entries). Thus, the total running time is . ∎
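The recurrences (4) and (5) can be sketched in Python as follows. This is our own illustrative implementation under stated assumptions, not the authors' code. The tree is an adjacency dictionary, and the two colors are the strings `"p"` (target) and `"q"`. Each table entry stores the pair $(L, W)$ and is compared lexicographically. Among all partitions maximizing $L$, the one with the largest margin $W$ is therefore kept, which also covers ties between the two cases. Entry $k'$ refers to partitions into exactly $k'$ parts, and `None` marks an infeasible entry. Function names are hypothetical.

```python
def max_target_parts(adj, root, weight, color, k):
    """Maximum number of uniquely p-colored parts over all partitions of the
    tree into exactly k connected parts, or None if k is infeasible."""
    # Iterative DFS preorder; processing it in reverse handles children first.
    parent, order, stack = {root: None}, [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    tables = {}
    for u in reversed(order):
        # table[k'] = (L, W): L counts closed uniquely-p parts, W is the margin
        # w_p - w_q of the still-open part containing u.  Start with T^0_u.
        table = [None, (0, weight[u] if color[u] == "p" else -weight[u])]
        for v in adj[u]:
            if v == parent[u]:
                continue
            child = tables.pop(v)
            merged = [None] * (len(table) + len(child) - 1)
            for j, left in enumerate(table):
                if left is None:
                    continue
                for c, right in enumerate(child):
                    if right is None:
                        continue
                    (lu, wu), (lv, wv) = left, right
                    # Equation (4): edge uv removed, the child's open part is closed.
                    cand_cut = (lu + lv + (wv > 0), wu)
                    # Equation (5): edge uv kept, the two open parts merge.
                    cand_keep = (lu + lv, wu + wv)
                    for idx, cand in ((j + c, cand_cut), (j + c - 1, cand_keep)):
                        if merged[idx] is None or cand > merged[idx]:
                            merged[idx] = cand
            table = merged
        tables[u] = table

    table = tables[root]
    if not 1 <= k < len(table) or table[k] is None:
        return None
    closed, margin = table[k]
    return closed + (margin > 0)  # the root's open part is finally closed


def is_yes_instance(adj, root, weight, color, k):
    """With two colors, every part that is not uniquely p-colored is q-colored."""
    best = max_target_parts(adj, root, weight, color, k)
    return best is not None and 2 * best > k


# Example: the path p - q - p with unit weights, rooted at vertex 0.
adj = {0: [1], 1: [0, 2], 2: [1]}
weight = {0: 1, 1: 1, 2: 1}
color = {0: "p", 1: "q", 2: "p"}
print([is_yes_instance(adj, 0, weight, color, k) for k in range(1, 5)])
# [True, False, True, False]
```

Each child is merged into its parent's table one at a time, and each merge is bounded by the sizes of the two subtrees. This is the usual tree-knapsack pattern, and it keeps the sketch polynomial in the number of vertices.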

Finally, we bridge the gap for trees of fixed diameter by generalizing the known polynomial-time algorithm for trees of diameter two [8] to trees of diameter three. It is also known that Gerrymandering over Graphs on trees of diameter four remains NP-hard [8].

The key observation is that a tree of diameter three can be obtained from two stars by adding an edge between their centers. Our algorithm then adapts a polynomial-time algorithm for stars [8].

###### Proposition 2.

For trees of diameter three, Gerrymandering over Graphs is solvable in  time.

###### Proof.

First, observe that a tree of diameter three is the same as two stars whose centers are connected by an edge . Let  and  be the two centers of the stars (). Our algorithm distinguishes between two cases: (i)  and  belong to the same part in a solution , and (ii) they belong to two different parts. (Technically, our algorithm checks for each of the two cases whether a solution exists; it reports a solution if it finds one in either case and rejects the input otherwise.) The subalgorithm for case (i) is completely analogous to the algorithm for Gerrymandering over Graphs on stars (trees of diameter two) by Ito et al. [8]. We will present the whole algorithm for the sake of completeness. It will also be helpful in understanding the subalgorithm for case (ii) (which is an adaptation of the first subalgorithm). Both subalgorithms are based on the observation that each part of  not containing  or  only consists of a single vertex.
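The decomposition into two stars can be computed directly. In a tree of diameter three, exactly two vertices have degree at least two, and they are adjacent: on a longest path $a, b, c, d$, every further vertex must be adjacent to $b$ or $c$. The helper below is hypothetical and takes an adjacency dictionary as input. It extracts both centers and their leaves, which is the starting point for cases (i) and (ii). Every part avoiding both centers is then a single leaf.

```python
def split_into_two_stars(adj):
    """Split a tree of diameter three into its two stars.

    Returns ((c1, leaves_of_c1), (c2, leaves_of_c2)). The input is assumed to
    be a tree; a ValueError is raised if it does not have diameter three.
    """
    centers = [v for v, nbrs in adj.items() if len(nbrs) >= 2]
    if len(centers) != 2 or centers[1] not in adj[centers[0]]:
        raise ValueError("input is not a tree of diameter three")
    c1, c2 = centers
    leaves1 = [v for v in adj[c1] if v != c2]
    leaves2 = [v for v in adj[c2] if v != c1]
    return (c1, leaves1), (c2, leaves2)


# Example: centers 0 and 1, with leaf 2 attached to 0 and leaves 3, 4 attached to 1.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
print(split_into_two_stars(adj))  # ((0, [2]), (1, [3, 4]))
```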

We start by presenting the subalgorithm for case (i). The algorithm guesses a color  such that the part  with  is -colored (uniquely -colored if ). (Whenever we “guess” something, we iterate over all possible cases and test whether this iteration yields a solution. If any iteration yields a solution, then we refer to this iteration in the proof.) Moreover, the algorithm guesses the numbers  and  of -colored and -colored leaves that are not contained in  (those leaves form their own parts in ). Let be the number of uniquely -colored parts in  (note that if and otherwise). Note that for each color  we have to guarantee that there are at most  parts that are -colored in . Let  be the number of -colored leaves. As proven by Ito et al. [8, Lemma 4.3], one can assume that any -colored leaf in is at least as heavy as the ones not in . So we can assume that is the sum of the heaviest  -colored leaves plus . Similarly (also shown by Ito et al. [8, Lemma 4.3]), we can also assume that any -colored leaf (for ) not in is at least as heavy as the ones in . For each color , let  be the smallest number of -colored leaves that cannot be included in . By definition, is the minimum number such that the sum of weights of all but the  heaviest -colored leaves is at most  (strictly less if ). Finally, we verify the following:

• For each it holds that  (and  if removing the  heaviest leaves results in  being -colored).

• The values of for all colors plus  and  sum up to at most .

• There are enough leaves that are not -colored or -colored that can be removed after removing  vertices of each color  to achieve  parts.

If so, then we remove the remaining vertices arbitrarily to obtain a solution.

We continue with the subalgorithm for the case (ii). Although the algorithm is somewhat similar to the previous case, we compute the values (namely, and