Package for calculating compactness measures and quantifying gerrymandering.
The shape of an electoral district may suggest whether it was drawn with political motivations, or gerrymandered. For this reason, quantifying the shape of districts, in particular their compactness, is a key task in politics and civil rights. A growing body of literature suggests and analyzes compactness measures mathematically, but little consideration has been given to how these scores should be calculated in practice. Here, we consider the effects of a number of decisions that must be made in interpreting and implementing a set of popular compactness scores. We show that the choices made in quantifying compactness may themselves become political tools, with seemingly innocuous decisions leading to disparate scores. We show that when the full range of implementation flexibility is used, it can be abused to make clearly gerrymandered districts appear quantitatively reasonable. This complicates using compactness as a legislative or judicial standard to counteract unfair redistricting practices. This paper accompanies the release of packages in C++, Python, and R which correctly, efficiently, and reproducibly calculate a variety of compactness scores.
We have identified a number of choices that must be made to compute a compactness score. In addition to the choice of (1) compactness definition, it is also important to consider how to handle (2) non-contiguous districts, (3) districts with holes, (4) political superunit boundaries, (5) map projections, (6) topography, (7) data resolution, (8) floating-point uncertainty, and (9) whether alternative choices were possible in drawing a district’s boundaries. These are considered independently below.
In combination, these choices provide potentially undesirable implementation flexibility. This flexibility can be abused: Different implementation choices applied to what is nominally the same data lead to very different conclusions about fairness of a districting plan.
To demonstrate this effect, we have selected ten U.S. Congressional Districts widely considered to be gerrymandered. Using an optimizer, we apply the full flexibility detailed in this paper and are able to find sets of implementation decisions for which these districts’ compactness scores are outliers when compared against the full distribution of districts’ scores. We are also able to find sets of decisions which make these districts appear reasonable. That is, we can exploit implementation flexibility to build a seemingly reasonable argument that these districts are both gerrymandered and not.
Figure 1 shows the effects of this adversarial choice of parameters. In the case of NC01, IL04, and PA07, it was possible to move the districts from obvious outliers to middle-of-the-pack status. In other cases, such as NC12, NC04, and TX35, this was not possible, but the districts can still be moved considerably closer to the mean, countering arguments that they are outliers.
The optimizer does not need to use extreme settings to produce the desired results. For example, TX33 appears most gerrymandered using the CvxHullPTB score (scores are defined below) at a 500 m simplification tolerance in a locally-optimized Lambert conformal conic projection with all districts included in the distribution; it appears least gerrymandered using the ReockPT score with a 500 m tolerance in a Gall projection with districts comprising an entire state excluded.
Of the many compactness scores discussed in the literature, some are better able to cope with the complexities discussed here than others. Many of the more robust metrics, however, are also difficult or impossible to calculate using commonly-available software. For instance, QGIS (6) includes the area of multipolygons as a built-in display field and convex hulls as a function three menu levels deep, but has no functionality to calculate the minimum bounding circles needed for Reock scores. Other scores, such as bizarreness (4), have mathematical descriptions of a complex calculation, but no associated source code.
To address this situation, we have released a family of open source packages which share a common library designed to efficiently, reproducibly, and correctly calculate a variety of compactness scores. The basis of this ecosystem is compactnesslib (https://github.com/gerrymandr/compactnesslib), a C++ library and associated command-line interface which ingests bulk or single data in a variety of formats and calculates compactness scores. The python-mander Python package (https://github.com/gerrymandr/python-mander; available via pip at https://pypi.python.org/pypi/mander) and the mandeR R package (https://github.com/gerrymandr/mandeR) provide high-level interfaces to this library. In addition, a QGIS plugin (https://github.com/gerrymandr/qgis-compactness) provides GIS users an easy means of calculating scores (7, 8, 9, 10). This stack was utilized to produce the calculations in this paper: The complete source code for generating all the diagrams presented here is available at github.com/r-barnes/Barnes2018-compactness-implementation.
The measurement of compactness can be used as a tool to help detect and quantify gerrymandering. Numerous engineering and implementation decisions, however, must be made to calculate a score. Whether used unintentionally or maliciously, this flexibility has strong bearing on the quality of compactness measurements and can be leveraged to shape conclusions about the quality of a districting plan.
Beyond providing “best practices” for implementations of compactness standards, we intend the open source software accompanying this paper as a first step toward fair and accurate measurement of compactness, allowing scientists, politicians, and the public to evaluate aspects of their democracy using reproducible, mathematically well-founded, and computationally stable tools.
Finally, we remind the reader that the goal of all of this is to help governments represent their people. Compactness, while attractive as a quantitative metric, is a tool, not the end-game.
The foregoing highlights the importance of being clear about how a score is calculated. In general, a mathematical definition alone is not sufficient: Attention must be paid to data and algorithmic quality. Here, we suggest best practices for the calculation of compactness scores:
Scores. Be explicit about what each variable in a compactness score means. Does area include holes? Is it constrained by political superunits? How should non-contiguous districts be handled? Score names should be distinct and informative. Appending a clarifying suffix to the name of a score (e.g. PTSHp) informs readers as to what is being done. See our Methods for examples.
Projections. Scale distortion should be limited to less than 1.25% throughout the region of interest. Reasonable choices of national or local projections usually suffice.
Resolution. Use the best available resolution from a trusted source. Simplified or down-scaled data give altered results. Alternatively, choose a score which is robust to changes in resolution: hull-based scores seem to do well in this regard. The U.S. Census Bureau produces reasonable data designed such that all borders that are at the same resolution align. Ideally, districting data should be drawn from a common, trusted, non-partisan source, regardless of who is performing an analysis.
Scores which do not explicitly account for constraints imposed by superunit boundaries leave out valuable information about what was possible in drawing a district. That is, they may unfairly penalize a district for having an odd shape when no other shape was possible. Use a score that accounts for superunit borders. Be sure that borders are cropped to features such as major coastlines.
Choice. Before doing statistics on a set of district plans, eliminate those districts which encompass an entire political superunit, as no other choices of shape were possible.
Topography. We have not found including topography in the calculation of area to be a significant source of error, assuming the use of acceptable map projections.
Border coalignment. Coalignment of borders is a concern, though the effect was small in our data. To avoid problems, datasets used in an analysis should always be at the same resolution and carefully coaligned during their creation. In the U.S., Census data satisfies these requirements.
Floating-point considerations. We have not found the choice of single- or double-precision floating-point representations to be important in our calculations.
Transparency. A compactness score should not be accepted and cannot be interpreted without knowing the steps that went into creating it. From a scientific standpoint this relates strongly to reproducibility: We cannot trust what we cannot reproduce. Therefore, documentation is needed down to the equation level, and the release of source code is critical (11, 12, 13).
While the U.S. court system has declared that egregious gerrymandering is unconstitutional (14, 15, 16), the courts have thus far declined to adopt a quantitative standard by which gerrymandering can be judged; however, they have left open the possibility that a “workable standard” exists. (17) This paper demonstrates that any standard must be specified precisely and carefully, since differences in interpretation can have large effects on scores. Furthermore, this paper demonstrates that even a well-specified standard may judge unreasonable districts as being reasonable (see Figure 1). Therefore, any legally-mandated standard of compactness should leave open the possibility of challenges. Additionally, given the implementation flexibility discussed here and its potential for abuse, courts should not accept quantitative arguments unless the code used to build those arguments is made publicly accessible.
There are over 24 different measures of compactness in the literature, and no doubt many others exist. The measures break down into roughly five categories: (1) length vs. width, (2) area ratios, (3) perimeter-to-area ratios, (4) other geometric measures (moment of inertia, interior angles, &c.), and (5) measures incorporating population or other such information (2, 3).
In this paper, we consider three widely-used compactness scores and their variants (Figure 2 provides a depiction): the Polsby-Popper, Reock, and convex hull scores.
All of the above scores are in the range $[0, 1]$, with higher values indicating greater compactness. Districts with relatively low values might be suspected of having been gerrymandered.
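As a concrete illustration, the following Python sketch computes two of these scores for a single polygon without holes. It assumes the usual normalizations, which are not restated in this text: Polsby-Popper as $4\pi A/P^2$ and the convex hull score as district area divided by hull area. The function names are illustrative and are not the API of compactnesslib or python-mander. Reock is omitted because it needs a minimum bounding circle routine.

```python
import math

def shoelace_area(ring):
    """Unsigned area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def perimeter(ring):
    """Length of the closed boundary through the vertices of `ring`."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polsby_popper(ring):
    """Assumed normalization: 4*pi*A / P^2, which is 1 for a circle."""
    return 4.0 * math.pi * shoelace_area(ring) / perimeter(ring) ** 2

def convex_hull_score(ring):
    """District area divided by the area of its convex hull."""
    return shoelace_area(ring) / shoelace_area(convex_hull(ring))

# Toy shapes, not real districts.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]

for name, ring in (("square", square), ("L-shape", l_shape)):
    print(name, round(polsby_popper(ring), 4), round(convex_hull_score(ring), 4))
```

Both functions treat the input as one ring, which is exactly the under-definition discussed in the text: multiple polygons and holes need an explicit convention.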
Note that these scores are purely geometric; it may be that scores incorporating population densities or other demographic data provide a better means of measuring gerrymandering, but we do not pursue this direction in our discussion. It is likely that incorporating such additional data would exacerbate the issues we discuss.
All of these measures are under-defined: They assume that an electoral district is described by a single polygon without any holes. In reality, districts, such as those with islands (see Figure 2), are often composed of many polygons. While holes in districts are rarer, they do occur. To resolve these difficulties, we suggest methods be defined with specific reference to multiple polygons and holes.
Here, whether or not contiguity is accounted for in a score will be indicated by the suffixes PT (polygons together) and PS (polygons separate). Whether or not holes are accounted for will be indicated by the suffixes AH (add holes) and SH (subtract holes). If there is ambiguity regarding whether area, perimeter, or some other quantity is being treated in this way, then terms such as PTaSHp (treat the area of the polygons together, subtract the perimeter of holes) may be used. The suffix B indicates that a score accounts for constraints imposed by the boundaries of political superunits.
There is no federal requirement that districts be contiguous, nor do many states require it. Indeed, the presence of islands (e.g. Hawaii) can make contiguity an impossibility. Non-contiguity may arise in other ways. Civil rights considerations have given Louisiana 01, depicted in Figure 2, two large portions separated by Louisiana 02; Louisiana 02 was drawn as a majority-minority district following the passage of the Voting Rights Act of 1965. Wisconsin’s 61st Assembly District (Figure 5) exemplifies a different situation. The city of Racine, WI, had a non-contiguous boundary as a by-product of annexation, yet Wisconsin required that its districts be composed entirely of wards. As a result, the district itself is non-contiguous and could not legally be drawn in any other way (21). For the 114th Congress 1:500,000 resolution data, 84 of 441 districts are non-contiguous. Of the non-contiguous districts the largest number of subdivisions was 580 (in Alaska) and the median was 5.
The question then is whether a district should be treated as a single unit or several independent units. Treating the district as a single unit by, e.g., enclosing it in a single hull, will tend to result in lower compactness scores indicative of gerrymandering. Treating the district as separate units and summing the areas of the units’ enclosing hulls will result in higher compactness scores indicating less gerrymandering.
Mathematically speaking, although Polsby-Popper is usually calculated as being proportional to $A/P^2$, where $A$ is the area and $P$ the perimeter, there are at least two possibilities for extending this formula to non-contiguous districts, in particular $\sum_i A_i/P_i^2$ and $\left(\sum_i A_i\right)/\left(\sum_i P_i\right)^2$, where $i$ enumerates the non-contiguous subregions of the district. Note that although the original Polsby-Popper score is bound to the range $[0, 1]$, this is not true of the first of these alternatives. Here, we use the latter alternative.
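The difference between the two extensions can be seen on a toy district made of two disjoint unit squares. This is a minimal sketch: the $4\pi$ normalization is an assumption (the text only says "proportional to"), and the function names are hypothetical.

```python
import math

def area(ring):
    """Unsigned shoelace area of a ring of (x, y) vertices."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))) / 2.0

def perimeter(ring):
    """Length of the closed boundary of a ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))

def pp_sum_of_ratios(parts):
    """First alternative: sum over subregions of 4*pi*A_i / P_i^2."""
    return sum(4.0 * math.pi * area(r) / perimeter(r) ** 2 for r in parts)

def pp_ratio_of_sums(parts):
    """Second alternative: 4*pi * (sum of A_i) / (sum of P_i)^2."""
    total_area = sum(area(r) for r in parts)
    total_perimeter = sum(perimeter(r) for r in parts)
    return 4.0 * math.pi * total_area / total_perimeter ** 2

def unit_square(x0, y0):
    """A 1 x 1 square with lower-left corner at (x0, y0)."""
    return [(x0, y0), (x0 + 1, y0), (x0 + 1, y0 + 1), (x0, y0 + 1)]

# A toy non-contiguous district: two unit squares far apart.
parts = [unit_square(0, 0), unit_square(5, 0)]

print(pp_sum_of_ratios(parts))  # exceeds 1: the score leaves [0, 1]
print(pp_ratio_of_sums(parts))  # stays within [0, 1]
```

With the sum of ratios, every added island can only raise the score, which is why that form is not bounded above by 1.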
Ultimately, special attention should be given to non-contiguous districts to determine whether they result from natural features, legitimate legal requirements, or electoral engineering. Figure 7 shows the effect the foregoing interpretations can have on compactness scores. The wide gap between different interpretations of what is nominally the same score supports the need for exactitude in both language and implementation.
Holes are relatively rare in districting, but many of the same considerations apply. Wisconsin 61, discussed previously, has a legally-mandated hole (Figure 5). Texas 18 very nearly surrounds the urban core of Houston and could, in a low-resolution dataset, be assigned a hole. Holes also appear as artifacts of the digitization process (Figure 8). For the 114th Congress 1:500,000 resolution data, four of 441 districts have holes as artifacts.
Districts are constrained by borders imposed by higher geopolitical units as well as by nature. Compactness scores that do not account for such constraints may assign inappropriately low scores to a district. The panhandles of Florida and Oklahoma, as well as Kentucky’s border with the Ohio River (see Figure 4), contain electoral districts whose shape, at least in part, cannot be dictated by politics. The same is true of almost any coastal district, since islands and peninsulas must be included but lengthen the district’s perimeter. Louisiana (see Figure 4) exemplifies this.
Some scores can be modified to account for this issue. They can be marked with the suffix B (borders accounted for). For example, in the case of the convex hull and Reock scores, if the hull or minimum bounding circle is intersected with a state polygon, the result is a better representation of what was possible and, therefore, a better indicator of whether gerrymandering took place. Taking this into account can have a considerable impact on compactness scores (Figure 9). Those scores, such as the Polsby-Popper, which cannot be modified to account for borders, are calculated as described elsewhere without consideration of borders.
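To make the border-aware hull score concrete, the sketch below intersects a district's convex hull with its state before taking the area ratio. It is an illustration under stated assumptions, not the compactnesslib implementation: the clipping uses the Sutherland-Hodgman algorithm (chosen here for brevity; it is valid because the hull is convex), the state and district are toy shapes, and all names are hypothetical.

```python
def area(ring):
    """Unsigned shoelace area of a ring of (x, y) vertices."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))) / 2.0

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when b lies left of ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def clip_to_convex(subject, clip):
    """Sutherland-Hodgman: the part of `subject` inside the convex CCW ring `clip`."""
    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        previous, output = output, []
        for p, q in zip(previous, previous[1:] + previous[:1]):
            fp, fq = cross(a, b, p), cross(a, b, q)
            if (fp >= 0) != (fq >= 0):
                # Edge p->q crosses the clip line through a and b.
                t = fp / (fp - fq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if fq >= 0:
                output.append(q)
    return output

def hull_score(district):
    """Plain convex hull score: district area over hull area."""
    return area(district) / area(convex_hull(district))

def hull_score_b(district, state):
    """Border-aware variant: hull is first intersected with the state."""
    available = clip_to_convex(state, convex_hull(district))
    return area(district) / area(available)

# Toy L-shaped state with an L-shaped district tucked into its corner.
state = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
district = [(0, 0), (3, 0), (3, 1), (1, 1), (1, 3), (0, 3)]

print(hull_score(district))           # penalized for the state's own shape
print(hull_score_b(district, state))  # district fills all the land available to it
```

For a concave state the clipped ring may contain repeated or collinear vertices; these contribute zero to the shoelace sum, so the area is still usable, but a production implementation would more likely rely on a robust polygon-clipping library.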
The boundaries of electoral districts, states, and countries may include large maritime regions, as shown in Figure 3. Insofar as these regions generally cannot be populated, save for areas immediately adjacent to the shore, their inclusion in compactness calculations may serve to hide the effects of gerrymandering. Input data should be cropped to major coastlines to account for this, though doing so is not a panacea: coastlines tend to be fractal (see Figure 14).
As Figure 6 shows, border data, especially when drawn from disparate sources, may not always co-align. We attempted to quantify this effect by overlaying high-resolution district data with medium-resolution state data and found that the impact was usually small (see Figure 10 for details). Problems can be avoided entirely by using data which is co-aligned, such as is available from the U.S. Census.
If only one possible plan exists for a district, that district cannot be gerrymandered and should be excluded from analysis. In the Census Bureau data (5) used here, 13 political superunits, including Alaska, Delaware, and Vermont, had only one congressional district. No matter how oddly shaped these districts are, they are not gerrymandered.
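A minimal sketch of this exclusion step follows. The record layout (superunit code, district label) and the function name are assumptions made for illustration; the list is a toy subset, not the Census data.

```python
from collections import Counter

# Hypothetical records: (superunit, district label). Toy subset only.
districts = [
    ("AK", "AL"), ("DE", "AL"), ("VT", "AL"),
    ("NC", "01"), ("NC", "04"), ("NC", "12"),
    ("TX", "33"), ("TX", "35"),
]

def drop_single_district_superunits(records):
    """Keep only districts whose superunit contains more than one district."""
    per_superunit = Counter(superunit for superunit, _ in records)
    return [rec for rec in records if per_superunit[rec[0]] > 1]

eligible = drop_single_district_superunits(districts)
print(eligible)
```

Filtering before computing the distribution matters because the excluded districts would otherwise shift the mean and spread against which the remaining districts are judged.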
Although scores are often defined as though districts exist on a plane, in reality districts are wrapped around the curvature of the Earth and local topographical features. Several interpretations of scores are possible: districts could be mapped to the plane using a projection designed to minimize distortion across an entire country, a subdivision of a country such as a state, or even the district itself. Alternatively, variables could be calculated on the sphere or WGS84 ellipsoid. As Figure 11 shows, despite all the possibilities, compactness measures appear to be stable to reasonable choices among localized (country-scale) map projections used in practice. Alaska demonstrates what happens when an unreasonable choice is made: its score in a projection suitable for the conterminous United States differs from that in an Alaska-specific projection by up to 20%.
Clearly, using a global projection such as the standard Mercator induces too much distortion. This implies that Web Mercator (EPSG:3857) should never be used for compactness calculations, despite its ubiquitous use on the internet. Across all districts, scores, and projections, the absolute score difference between a district as measured in a locally-optimal projection versus a conterminous projection was less than 0.009 in 99% of cases. The other 1% of cases comprise districts such as Alaska and American Samoa, which are outside the region of interest for the conterminous projections. Given this, nation-sized projections, excluding outlying states and territories, are likely reasonable choices. Quantitatively, the conterminous Albers Equal Area (EPSG:102003) projection has a maximum scale distortion of 1.25% (22): this value hence can be taken as an upper limit on what is acceptable for any projection and is our recommended choice for districts in the conterminous United States.
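To give a sense of scale for the Mercator claim, the sketch below evaluates the linear scale factor of the spherical Mercator projection, $k = 1/\cos\varphi$, at latitudes spanning roughly the conterminous United States (about 25° to 49° N, an approximation). It assumes a spherical Earth and ignores ellipsoidal corrections; the 1.25% threshold is the one quoted above.

```python
import math

MAX_ACCEPTABLE_DISTORTION = 0.0125  # the 1.25% limit quoted in the text

def mercator_scale_factor(lat_deg):
    """Linear scale factor of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

for lat in (25, 37, 49):
    k = mercator_scale_factor(lat)
    verdict = "exceeds" if k - 1.0 > MAX_ACCEPTABLE_DISTORTION else "is within"
    print(f"lat {lat:2d} N: scale distortion {100 * (k - 1.0):5.1f}% {verdict} the limit")
```

Because Mercator is conformal, small shapes are preserved locally; the concern for compactness is that the scale factor changes with latitude, so a district with a large north-south extent is stretched unevenly.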
A different effect of mapping electoral districts to a plane is that topography, such as mountains, is left out of quantities such as area and perimeter. As a result, the true land area and overland distance between points is under-estimated. Using the 30 m USGS National Elevation Dataset (23), we calculated the surface area of districts using RichDEM’s implementation (24) of an algorithm by Jenness (25) and modeled perimeter as the summed length of all the raster elevation cells at the edge of a district. The difference in Polsby-Popper scores between the topographic and non-topographic data was less than 0.03 for all districts, with 75% of districts having deviations less than 0.005. This should be expected given that Kansas (and every other state) is provably flatter than a pancake (26).
[Figure panels: Kentucky 03 and Louisiana 01]
Resolution can be thought of as the density of points describing a boundary. Figure 4 shows the same district at several resolutions; lower resolutions lead to simpler shapes usually, but not always, by reducing the length of the perimeter. The U.S. Census Bureau releases boundary data of Congressional Districts in four resolutions: full, 1:500k, 1:5M, and 1:20M (5). The full-resolution data is available as “TIGER/Line” data whereas the other resolutions are available as “Cartographic Boundary Shapefiles.” At these resolutions the perimeters of the districts of the 114th Congress are defined by an average of 8914, 1531, 322, and 70 points, respectively.
As services move online and onto mobile devices with constrained processing, it will be tempting for practitioners to introduce lower-resolution or simplified data into compactness measurements. Even in the high-performance environments used for automated redistricting efforts (27), low-resolution data is tempting as it may yield substantial savings on compute time. Ultimately, we find that the choice of resolution has a substantial impact on compactness scores (Figures 13 and 15), with the Polsby-Popper score especially affected. This adds to a growing list of criticisms of the Polsby-Popper score (1, 4).
Since data may be supplied to users by outside sources, adversarial inputs are possible: A high-frequency wave applied to the boundary of a district may be visually imperceptible while introducing substantial alterations to a district’s score. The Koch snowflake is an extreme example of this: It has an arbitrarily-long perimeter surrounding a finite area (Figure 14). More practically, data may contain digitization or simplification artifacts that only become apparent under significant magnification, as shown in Figure 8.
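The Koch snowflake example can be reproduced in a few lines. The sketch below refines an equilateral triangle and reports perimeter, area, and a Polsby-Popper score at each step; the $4\pi A/P^2$ normalization is assumed, and the helper names are illustrative.

```python
import math

def area(ring):
    """Unsigned shoelace area of a ring of (x, y) vertices."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))) / 2.0

def perimeter(ring):
    """Length of the closed boundary of a ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))

def polsby_popper(ring):
    """Assumed normalization: 4*pi*A / P^2."""
    return 4.0 * math.pi * area(ring) / perimeter(ring) ** 2

def koch_step(ring):
    """Replace each edge of a counter-clockwise ring with four edges and an outward bump."""
    h = math.sqrt(3.0) / 2.0
    out = []
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        dx, dy = (x2 - x1) / 3.0, (y2 - y1) / 3.0
        a = (x1 + dx, y1 + dy)
        b = (x1 + 2.0 * dx, y1 + 2.0 * dy)
        # Rotate the middle third by -60 degrees so the bump points outward.
        peak = (a[0] + 0.5 * dx + h * dy, a[1] - h * dx + 0.5 * dy)
        out += [(x1, y1), a, peak, b]
    return out

# Counter-clockwise equilateral triangle with unit sides.
ring = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
scores = []
for step in range(6):
    scores.append(polsby_popper(ring))
    print(step, len(ring), round(perimeter(ring), 4), round(area(ring), 4),
          round(scores[-1], 4))
    ring = koch_step(ring)
```

The area converges while the perimeter grows by a factor of 4/3 per step, so the perimeter-based score keeps falling even though the shape looks almost unchanged at map scale.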
The foregoing considerations change not only what the values of the calculated scores are, but also the relative ordering of the scores (Figure 16). If this is quantified using Spearman’s rank correlation coefficient (Figure 17), it is apparent that different scores give markedly different rankings. Thus, any ranking of districts by compactness is thoroughly tied to and arises from choices made in developing the scores. Figure 1 explores this issue further.
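For readers unfamiliar with the statistic, here is a small self-contained sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. The two score lists are made-up values for four hypothetical districts, not results from this paper.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

# Made-up scores for the same four districts under two different measures.
polsby_popper_scores = [0.10, 0.25, 0.40, 0.55]
hull_scores = [0.60, 0.50, 0.80, 0.70]

print(spearman(polsby_popper_scores, hull_scores))
```

A value of 1 would mean the two measures rank the districts identically; lower values mean the choice of score changes which districts look worst.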
Computers generally store fractional values based on the IEEE754 specification using either the 32-bit single-precision type, which gives about 7 decimal places of precision, or the 64-bit double-precision type, which gives about 15 decimal places of precision. In terms of decimal degrees, the former provides approximately centimeter accuracy while the latter provides nanometer accuracy; thus, single-precision is sufficient for storing geographic coordinates. However, performing mathematics on fractional numbers, especially 32-bit types, is known to give potentially erroneous results (28).
We tested for this by computing all of the scores mentioned here using both 32-bit and 64-bit IEEE754-compliant types, with the latter taken as the “true” value. No score had a percent difference between the two of more than 0.027%.
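A comparison of this kind can be emulated in pure Python by rounding every intermediate result to single precision. The sketch below does so for a Polsby-Popper calculation on a toy polygon; it is not the test harness used for the paper, the $4\pi A/P^2$ normalization is assumed, and the outcome depends on coordinate magnitudes (large projected coordinates lose more precision to cancellation).

```python
import math
import struct

def f32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def polsby_popper(ring, rnd):
    """Polsby-Popper with every intermediate result passed through `rnd`."""
    pts = [(rnd(x), rnd(y)) for x, y in ring]
    twice_area = 0.0
    perim = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area = rnd(twice_area + rnd(rnd(x1 * y2) - rnd(x2 * y1)))
        dx, dy = rnd(x2 - x1), rnd(y2 - y1)
        perim = rnd(perim + rnd(math.sqrt(rnd(rnd(dx * dx) + rnd(dy * dy)))))
    area = rnd(abs(twice_area) / 2.0)
    return rnd(rnd(rnd(4.0 * math.pi) * area) / rnd(perim * perim))

# Toy polygon with modest coordinate magnitudes.
ring = [(10.1, 20.2), (50.7, 22.3), (55.4, 60.9), (30.3, 75.5), (8.8, 55.6)]

pp64 = polsby_popper(ring, lambda v: v)  # double precision throughout
pp32 = polsby_popper(ring, f32)          # emulated single precision
percent_diff = 100.0 * abs(pp32 - pp64) / pp64
print(pp64, pp32, percent_diff)
```

Rounding after each basic operation reproduces single-precision arithmetic for addition, subtraction, multiplication, division, and square root, which is sufficient for this calculation.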
The open source software described here had its genesis in the Geometry of Redistricting workshop held at Tufts University August 7–11, 02017. John Connors helped develop the mandeR package. Max Gardner, Aaron Dennis, Daniel McGlone, and Ariel M’ndange-Pfupfu helped develop the python-mander package. Ariel M’ndange-Pfupfu and Vanessa Archambault helped develop the QGIS plugin. Computation and data utilized XSEDE’s Comet supercomputer (29). Travel funding for RB and research support for JS was provided by a Prof. Amar G. Bose Research Grant and an Amazon Research Award. In-kind support was provided by Isaac B., Hannah J., Kelly K., Vivian L., and Jerry W.
Goldberg D (1991) What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys (CSUR) 23(1):5–48.
[Table columns: District, Score Value, Diff from Mean, Score Name, Tolerance, Projection, Choice]