The story of the Flint Water Crisis is long and has many facets, involving government failures, public health challenges, and social and economic justice. As Flint struggled financially after the 2008 housing crisis, the state of Michigan installed emergency managers to implement several cost-saving measures. One of these actions was to switch Flint’s drinking water source from the Detroit system to the local Flint River in April 2014. The new water had different chemical characteristics, which were overlooked by water officials. Of course, many water systems have lead pipes, but these pipes are typically coated with layers of deposits, and the water is treated appropriately in order to prevent corrosion and the leaching of heavy metals.
City officials failed to follow such necessary procedures, the pipes began to corrode, Flint’s drinking water started to give off a different color and smell (Fonger, 2015b), and Flint residents were exposed to elevated levels of lead for nearly two years before the problems received proper attention. In August 2015 environmental engineers raised alarm bells about contaminated water (Torrice, 2016), not long after a pediatrician observed a jump in the number of Flint children with high blood lead levels (Hanna-Attisha et al., 2016), and by January 2016 the Flint Water Crisis was international news. (Prior work by the authors involved estimation of water lead contamination (Abernethy et al., 2016); for further analysis of blood lead levels, see (Potash et al., 2015).)
As attention to the problem was growing, government officials at all levels got involved in managing the damage and pushing recovery efforts. In looking for the primary source of lead in Flint’s water distribution, attention turned to Flint’s water service lines, the pipes that connect homes to the city water system. These service lines are hypothesized to be the prime contributor to lead water contamination across the United States (Sandvig et al., 2008). Service lines, therefore, became a top priority for the City of Flint in February 2016. The Michigan state legislature eventually appropriated $27M towards the expensive process of replacing these lines at large scale; later the U.S. Congress allocated another nearly $100M towards the recovery effort. The group directed to execute the replacement program was called Flint Fast Action and Sustainability program (FAST Start), and their task was to remove as many hazardous service lines as possible up to funding levels.
The primary obstacle that the FAST Start team has faced throughout their work is uncertainty about the locations of lead or galvanized pipes. Although the U.S. Environmental Protection Agency requires cities to maintain an active inventory of lead service line locations, Flint failed to do so. Service line materials are in theory documented during original construction or renovation, but in practice these records are often incomplete or lost. Most importantly, because the information is buried underground, it is costly to determine the material composition of even a single pipe. Digging up an entire water service line pipe under a resident’s yard costs thousands of dollars. City officials were uncertain about the total number of hazardous service lines in the city, with estimates ranging from a few thousand to tens of thousands. Uncertainty about the service line material for individual homes has dramatic cost implications, as construction crews will end up excavating pipes that do not need to be replaced. These questions—how many pipes need to be replaced and which home’s pipes need remediation—are at the core of the work in this paper.
Starting in 2016, our team collaborated directly with Flint city officials, analyzing the available data to provide statistical and algorithmic support to guide decision making and data collection, focusing primarily on the work of the FAST Start pipe replacement efforts. By assembling a rich suite of datasets, including thousands of water samples, information on pipe materials, and city records, we have been able to accurately estimate the locations of homes needing service line replacement, as well as those with safe pipes, in order to target recovery resources more effectively. Specifically, we have combined statistical models with active learning methods that sequentially seek out homes with hazardous water infrastructure. Along the way we have developed web-based and mobile applications for coordination among government offices, contractors, and residents. Over time, the number of homes whose service lines have been inspected and replaced has increased, as seen in Figure 1.
In the present paper, we detail the challenges faced by decision-makers in Flint, and describe our nearly two years of work to support their efforts. With the understanding that many municipalities across the US and the world will need to undertake similar steps, we propose a generic framework, which we call ActiveRemediation, that lays out a data-driven approach to efficiently replace hazardous water infrastructure at large scale. We describe our implementation of ActiveRemediation in Flint, along with its empirical performance and potential for cost savings. To our knowledge, this is the first attempt to predict the pipe materials house-by-house throughout a water system using incomplete data, and also the first to propose a statistical method for adaptively selecting homes for inspection to replace hazardous materials in the most cost-effective manner. This work illustrates a holistic, data-driven approach that can be replicated in other cities to improve their water infrastructure renovation efforts.
Key Results. Among our main results, we emphasize that our predictive model is empirically accurate for estimating whether a Flint home’s pipes are safe/unsafe, with an AUROC score of nearly 0.92, and a true positive rate of 97%. Since our approach involves a sequential protocol that manages the selection of homes for inspection and replacement based on our statistical model, we are also able to compare the model’s total remediation cost to that of the existing protocol of officials. ActiveRemediation reduces the costly error rate (fraction of unnecessary replacements) to 2%, lowering the effective cost of each replacement by 10% and yielding about $10M in potential savings.
Methodology. Let us now give a bird’s-eye view of our methodological template. ActiveRemediation manages the inspection and replacement of water service lines across a city, with the long-term objective of replacing the largest number of hazardous pipes in a city under a limited budget. The formal in-depth exposition of this framework will be given in Section 3.
Since the process of identifying and replacing these lines around a city is naturally sequential, the decisions and observations made earlier in the process ought to guide decisions made at future stages. With this in mind, our framework continuously maintains three subroutines that are updated as data arrives. Following the outline in Algorithm 1, the first of these is a StatisticalModel that generates probabilistic estimates of the material type of both the public and private portion of each home’s service lines. The input of this model is property data, water test results, historical records, and observed service line materials. The second subroutine is InspectionDecisionRule, the decision procedure that generates a (randomized) set of homes for inspection. This should be viewed as an active learning protocol, with the goal of “focused exploration.” The third routine, ReplacementDecisionRule, makes decisions as to which homes should receive line replacements; for reasons we discuss below, we typically assume that ReplacementDecisionRule is a greedy algorithm.
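The interplay of the three subroutines can be sketched as a simple loop. The following is only an illustrative toy, not the procedure of Algorithm 1: the function names, the base-rate model, the random inspection rule, and the batch sizes are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random

def base_rate_model(known, unknown):
    """Stand-in StatisticalModel: one smoothed hazard rate for every unknown home."""
    rate = (sum(known.values()) + 1) / (len(known) + 2)
    return {home: rate for home in unknown}

def random_inspections(probs, k, rng):
    """Stand-in InspectionDecisionRule: a random sample of k unknown homes."""
    return rng.sample(sorted(probs), min(k, len(probs)))

def greedy_replacements(probs, k):
    """Stand-in ReplacementDecisionRule: the k homes with the highest estimates."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

def active_remediation(truth, rounds, k_inspect, k_replace, seed=0):
    """truth maps home id -> 1 (hazardous) or 0 (safe); returns (known, replaced)."""
    rng = random.Random(seed)
    known, replaced = {}, []
    for _ in range(rounds):
        unknown = [h for h in truth if h not in known]
        if not unknown:
            break
        probs = base_rate_model(known, unknown)            # re-estimate with all data so far
        inspected = random_inspections(probs, k_inspect, rng)
        remaining = {h: p for h, p in probs.items() if h not in inspected}
        visited = inspected + greedy_replacements(remaining, k_replace)
        for home in visited:
            known[home] = truth[home]      # digging reveals the true material
            if truth[home] == 1:
                replaced.append(home)      # hazardous lines are replaced once found
    return known, replaced
```

In a realistic setting the stand-in model would be replaced by a fitted classifier, and the inspection rule by an active learning strategy; the loop structure, in which every observation feeds the next round of estimates, is the point of the sketch.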
Roadmap. This paper is structured as follows. We begin in Section 2 by laying out the datasets available to us, with the story given chronologically to describe the shifting narrative as information emerged. We then explain the ActiveRemediation framework in greater detail in Section 3, and sketch out the statistical model mixed with the prediction, inspection, and decision-making framework. In Section 4 we employ ActiveRemediation on the data available in Flint, to show the empirical performance of our proposed methods in an actual environment, as well as in a simulated environment leveraged from Flint’s data. We finish by detailing the potential for significant cost savings using our approach.
2. Emerging Data Story of Flint’s Pipes
We now describe the various sources of data and the timeline during which these became available. This is summarized in Table 1, and a more precise chronology is given throughout this section. More details will be available in the full version of this work.
2.1. Pre-crisis Information – Through mid-2015
In this section, we explain the relevant datasets that had been collected and maintained prior to the water crisis. This information, as we discovered later, was limited in both depth and quality.
|Date||Event|
|2016 Feb.||Attributes for all 55k parcels provided by the City of Flint|
|2016 Feb.||SL records digitized by M. Kaufman at UM Flint GIS|
|2016 March||Pilot Program, 36 homes visited, 33 SLs replaced|
|2016 June||Michigan DEQ provides SL private-portion inspections dataset|
|2016 Sept.||Phase One begins, contractors use our mobile data collection app|
|2016 Oct.||Fast Start begins hydrovac inspections to verify some home SLs|
|2016 Oct.||Congress appropriated $100M in WIIN Act.|
|2016 Dec.||Fast Start & authors release report: 20-30k replacements needed|
|2017 March||Federal court orders 18k homes to receive SL replacement by 2019|
|2017 Sept.||Fast Start replaced 4,419 hazardous service lines so far, identifying composition of a total of 6,506 homes.|
2.1.1. Parcel Data
The city of Flint generously provided us with a dataset describing each of the 55,893 parcels in the city. These data include a unique identifier for each parcel and a set of columns describing City-recorded attributes of each home, such as the property owner, address, value, and building characteristics. A complete list of the parcel features is discussed in our previous work (Chojnacki et al., 2017). The distributions of the age of homes and their estimated values (Figure 2) tell an important story about the kinds of properties in Flint.
2.1.2. City Records of Service Lines
Initially, Flint struggled to produce any record of the materials in the city’s service lines. Eventually, officials discovered a set of over 100,000 index cards in the basement of the water department (see http://www.npr.org/2016/02/01/465150617/flint-begins-the-long-process-of-fixing-its-water-problem and the top of Figure 3). As part of a pro bono collaboration, the handwritten records have been digitized by Captricity.com and provided to the City of Flint. (We would like to thank Captricity, especially their machine learning team, Michael Zamora, David Shewfelt, and Kayla Pak, for making the data accessible.) Around the same time, a set of hand-annotated maps was discovered that contained markings for each parcel that specified a record of each home’s service line (bottom of Figure 3). The map data was digitized by a group of students from the GIS Center at the University of Michigan-Flint led by the director, Prof. Martin Kaufman (Fonger, 2015a). Many of the entries in the city’s records list two materials for a given record, such as “Copper/Lead,” but they do not specify the precise meaning of the multiple labels. However, our latest evidence suggests that, at least in the typical case, the double records were intended to specify that the second label (“Lead” in “Copper/Lead”) indicates the public service line material (water main to curb stop), and the first label describes the private service line (curb stop to home), while an entry that is simply given as “Copper” may refer to both sections or only one. Lastly, there are a number of entries in the records that say “Copper/?” for the service line material, indicating missing information for the service line on the original handwritten records. Many other records are simply blank, recorded as “Unknown/Other.”
2.2. Peak of Crisis & Replacement Pilot
In the wake of the crisis the State of Michigan began to discuss plans for lead abatement in Flint. It had become clear to lawmakers in Michigan that they would need to invest in a large-scale removal of lead pipes from the city. To begin, FAST Start initiated a pilot phase, with the goal of replacing the service lines of a small set of residences. Flint’s Mayor and the FAST Start team awarded a contract to Rowe Engineering to replace pipes at 36 homes around the city. They selected these homes based on risk factors including the presence of high water lead levels, pregnant women, and children younger than 6 years old. Nearly all of the homes, 33 of 36, had some hazardous material (lead or galvanized) in one or both portions of the service lines, while only 3 were safe. Therefore, the number of homes with physical verifications of both service line portions through September 2016 was only 36 out of over 55,000 homes. A map showing the progress of replacement in Flint can be found in Figure 1.
|Verified SL Materials (Public-Private)|
Meanwhile, in order to gather reliable information about the private part of the service lines, the Michigan Department of Environmental Quality (DEQ) directed a team of officials and volunteers from the local plumbers union to personally inspect a sample of the homes of Flint residents. The public portion of the service line runs entirely under the street and sidewalk, while the private portion runs directly into the basement of the resident’s home. Thus, the private portion can be inspected without any digging. The DEQ inspectors submitted their inspection results. As of June 2016, the department had collected data from over 3,000 home inspections. We consider this data to be reliable, since it was curated by DEQ officials who provided it to our team. This dataset allowed us to partially evaluate the reliability of the city records discussed in Section 2.1. It is important to note that the comparison is not “apples to apples,” as the DEQ inspections were private-portion only whereas the labels in the city records did not specify which portion of the line was indicated. We report the confusion matrix between DEQ inspection data and city records in Table 2. The comparison suggests that, while the records were correlated with ground truth, the discrepancies were substantial.
2.3. Large-Scale Replacement, Mid-2016 to Now
Our group at the University of Michigan began engaging with the FAST Start team in the summer of 2016. One of the critical decisions the team needed to make was the selection of homes that would be recommended for service line replacement. According to the FAST Start payment agreements, contractors receive roughly half ($2,500) of the cost of a full replacement ($5,000) for excavated homes with copper on both public and private portions, to cover removing concrete, refilling concrete, machine use, and labor. The choice of homes was deemed critically important, as the excavation of a home’s service line that discovers a “safe” (e.g., copper) pipe is effectively wasted money, aside from the benefit of learning the pipe’s true material. Our work has focused on minimizing such unnecessary excavations, using the tools we describe below.
2.3.1. Early Replacement Activity and Findings (Fall 2016)
By summer 2016, FAST Start had selected a set of 200 homes for replacement, scheduled to begin in August. This selection is called Phase One. As in the Pilot Phase, the criteria included the presence of high water lead levels, pregnant women, and children under six years old, as well as veterans and the elderly. In the present section, we describe how we helped facilitate data collection for Phase One, and how the results forced us to rethink our objectives and adjust our models.
By late September 2016, the early data from the service line replacement program began to arrive, and the rate of lead and other hazardous pipes discovered was alarming: 96% (165/171) of excavations revealed lead in the public portion of the line. These findings differed significantly from the city records, which had previously indicated that among those homes only 40% would contain lead in either portion. As data from Phase One arrived, it was becoming increasingly clear that likely over 20,000 homes had unsafe pipes serving their water, dramatically higher than earlier estimates. Critically, as these discoveries were being made, a debate was taking place in the U.S. Congress discussing the possibility of more than $100M in funding for Flint’s recovery efforts.
With the debate in Congress ongoing, our team decided to put out an informal report to raise the alarm about the extent of the lead issue, and several news outlets reported on our findings (e.g. Carmody and Brush, 2016; Dolan, 2016). This effort led to a formal report in November of 2016 that provided a more precise estimate of the number of lead replacements likely to be needed (Moore, 2016), which was provided to the city’s mayor, the DEQ, and the U.S. Environmental Protection Agency. Our report, based on comparing the city records and the data gathered from contractors, suggested that the number of needed replacements would be between 20,600 and 37,100. The large range accounts for the inherent uncertainty in data collection and model assumptions, as well as the question of occupancy. One challenge that is specific to Flint is the fact that around one third of the city’s homes are not occupied, a rate that is the highest in the country (see https://www.reuters.com/article/us-flint-vacancies-idUSKCN0VK08L).
2.3.2. Contractor Data Collection Application
With thousands of homes scheduled to have their water service lines excavated by multiple contractors, the collection and management of the data generated by this large-scale effort would prove to be a logistical challenge. While initially there was a plan in place to collect data via paper forms that would later get transferred to a spreadsheet, it was increasingly clear that digitally recording information, and storing it centrally, would be a more effective strategy and less prone to error.
Our team volunteered to facilitate the data collection efforts. In the fall of 2016, we developed a web and mobile application with various access levels. The latest version of this app is a custom-built web application written in Python using the Flask web framework. The users, on-site contractors as well as DEQ and FAST Start officials, are asked to select homes and to fill in essential information about service line work accomplished at each site. This information includes the excavated pipe materials, lengths, dates, and data on the home’s residents. The output of the form appears in real time in a live database with mapping capabilities. We adopted a tiered permissions structure with password-protected information to maintain the privacy of the data. The app continues to be used as of this writing for tracking progress for the public and for paying contractors for completed work.
2.3.3. Hydrovac Digging: Inspection without Replacement
The foremost challenge of a large-scale service line replacement program is the uncertainty about which homes possess safe service lines and which homes have lines made of hazardous materials. As of the summer of 2016, the only concrete verified data on pipe materials across the city consisted of the 36 data points provided by Rowe Engineering. By the end of Phase One, this number increased to about 250 homes. At this point, the excavation of pipes at a single home would cost anywhere from $2,500 to $5,000, a prohibitively high cost for data collection. At the same time, the available replacement data consisted of cherry-picked homes: houses were selected for line replacement if they were presumed to have an overwhelming likelihood of lead. These addresses were highly concentrated in only three neighborhoods (see Figure 1) and provided nothing close to a representative sample of the broader city. We therefore realized, and emphasized to members of FAST Start, that the effort required a cheaper, quicker, and more statistically sound method to gather data.
After a lengthy discussion with water infrastructure experts and contractors, a new alternative emerged: hydrovac inspections. A hydro-vacuum truck, or simply a hydrovac (see Figure 5), has two main components: a high-pressure jet of water used to loosen soil and a powerful vacuum hose that sucks the loosened material into a holding tank. The hydrovac technique allows workers to dig a small hole quickly and then inspect whatever is observed underground. It is ideal for determining service line materials, as it can dig at the location of the home’s curb box (connects the home’s service line at the property line to the water main), and observe the pipe materials for both the public and private portions of the service line. The cost can be as low as $250 per inspection and often does not require prior approval from residents, as the digging site is mostly confined to city property. One limitation is that the hydrovac can only dig through the soil, and not through driveway or sidewalk pavement. This limitation led to unsuccessful excavations 20%-25% of the time, according to the hydrovac engineers.
The selection of homes for hydrovac inspection was one of the primary contributions of our team to FAST Start’s efforts, and we were given wide discretion for “sampling” homes. This reflects the political and logistical challenges of service line replacement, as full excavation of service lines required a much longer process with oversight by the city council. We would emphasize that, in the following section where we describe our sequential decision protocols, our primary focus was on the model and inspection subroutines, and we assume the replacements are made using a simple greedy strategy.
3. Prediction & Decision Framework
In this section, we formally define the sequential decision-making problem for a city, in our case the city of Flint, seeking to remove all of the lead service lines from its homes under the following conditions: (i) for almost all homes, the service line materials of homes are unknown; (ii) there is a method of inspection to collect information; (iii) it is costly to excavate service lines that do not need to be replaced; and (iv) there is a fixed budget for replacement and inspection.
There are $N$ total homes in the city, and it is unknown which homes need new service lines. We let the unknown label for home $i$ be $y_i$, taking on the value 1 if the home needs a replacement and 0 otherwise. Note that a home needs replacement if either the public or private portion of the service line is hazardous. We also have information about each home, denoted by a vector $x_i$, with $d$ features, that describe it (see Section 2). We want to learn the label $y_i$ given $x_i$, for each $i$. We divide the procedure to find out these labels into two steps: first, a statistical model for prediction (StatisticalModel); and second, an algorithm that decides which homes to observe next (InspectionDecisionRule).
There is another decision rule, ReplacementDecisionRule, that determines which pipes to replace next. ReplacementDecisionRule is a greedy algorithm. That is, this algorithm recommends that the replacement crew should go to the homes with the highest probabilities of having hazardous pipes. Given that, our InspectionDecisionRule is focused on learning, and ReplacementDecisionRule uses that learning to reduce costs.
In this section, we describe StatisticalModel, which assigns a probability that a service line contains hazardous materials. StatisticalModel is a novel combination of predictive modeling using machine learning and Bayesian data analysis. First, a machine learning prediction model gives a prediction for the public and private portion of each home’s service line using known features. These predictions then become the parameters of prior distributions in a hierarchical Bayesian model designed to correct some of the limitations of the machine learning model.
3.1.1. Machine Learning Layer
The machine learning layer of StatisticalModel outputs a probability of having a hazardous service line material for each home for which the material is unknown. Specifically, this layer gives a prediction, $\hat{p}_{ij} = f(x_i)$, the probability that service line portion $j$ for home $i$ is hazardous, where $x_i$ is a vector of features, described in Section 2.1. After examining several models empirically (see Section 4.1) we chose the machine learning layer, $f$, to be XGBoost, a boosted ensemble of classification trees (Chen and Guestrin, 2016).
3.1.2. Hierarchical Bayesian Spatial Model Layer
One limitation of classification algorithms is how they handle unobserved variables, which may be correlated with the outcome. We address this limitation with a hierarchical Bayesian spatial model. This accounts for unobserved heterogeneity related to geographic location and similarity of homes, following hierarchical spatial models with conditional autoregressive structure (Gelman et al., 2014; Gelfand and Vounatsou, 2003; Lee, 2011, 2013). Empirically, each geographic region across the city (e.g., voting precincts) has a different number of observed service lines. While a city-level (pooled) model ignores precinct differences and a separate (unpooled) model for each precinct is limited by small sample sizes or even no observations, our full hierarchical (partially pooled) model strikes a balance with shrinkage. Precincts with little information will have their parameters pulled towards the city-wide distribution. Details of the Bayesian model, and how these are combined with the machine learning layer, are explained further in the full version of the paper.
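The shrinkage behavior of partial pooling can be illustrated with a much simpler estimator than the hierarchical spatial model described above. The sketch below is an assumption-laden simplification: it uses a beta-binomial style pseudo-count estimate with a hypothetical `prior_strength` parameter and no spatial structure, so it is not the authors' model, only a demonstration of how precincts with few observations are pulled towards the city-wide rate.

```python
def partial_pool(counts, prior_strength=10.0):
    """Shrink each precinct's hazard rate towards the city-wide rate.

    counts maps precinct -> (hazardous_found, homes_inspected).
    prior_strength is a hypothetical pseudo-count: the number of "virtual"
    city-average homes added to every precinct.
    """
    total_hazardous = sum(h for h, _ in counts.values())
    total_inspected = sum(n for _, n in counts.values())
    city_rate = total_hazardous / total_inspected   # fully pooled estimate
    return {
        precinct: (h + prior_strength * city_rate) / (n + prior_strength)
        for precinct, (h, n) in counts.items()
    }

# Precinct A has many observations, B has one, C has none.
rates = partial_pool({"A": (90, 100), "B": (1, 1), "C": (0, 0)})
```

Here precinct A stays close to its own observed rate of 0.90, precinct B is pulled from its unpooled estimate of 1.0 down towards the city-wide rate, and precinct C, with no observations, simply inherits the city-wide rate.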
Now we describe InspectionDecisionRule, which utilizes active learning (Balcan and Feldman, 2013; Balcan et al., 2010; Liu et al., 2008) to efficiently allocate scarce resources to find and replace hazardous service lines. In general, a decision-maker may choose any active learning algorithm for inspection. In this work, we implement a version of Importance Weighted Active Learning (IWAL).
|observable feature space for each parcel/home|
|observable features for home , label for home|
|indicates “home inspected/replaced at ?”|
|indicates “learned ’s’ label via inspect./replace?”|
|indicates “learned ’s label at ?”|
|indicates “already know ’s label at ?”|
|cost of inspect., successful SLR, & failed SLR|
|set of labeled/unlabeled data at|
3.2.1. Active Learning Setup: Inspection and Replacement
We begin by describing the problem of efficiently locating and replacing hazardous pipes in a pool-based active learning framework (see Algorithm 2). Consider a budget of $B$ total queries and a pool of $N$ unlabeled homes. Then at each time period $t$ the algorithm will produce a probability vector $p_t$ that gives the probability that any home $i$ is chosen at $t$.
Contractors can determine the material of a service line via either hydrovac inspection or service line replacement. When home $i$ is chosen for hydrovac inspection at time $t$, we denote $h_{i,t} = 1$. When the service line for home $i$ is replaced at time $t$, we denote $r_{i,t} = 1$. Once inspected or replaced, $y_i$ is known for all subsequent rounds and becomes 1 or 0, and we define $k_{i,t} = 1$ if home $i$ has been observed through round $t$. $H$ and $R$ are the number of hydrovac and replacement visits, respectively. The number of successful replacements is denoted as $R^{+}$ (true positives) and the number of unnecessary replacements as $R^{-}$ (false positives).
We initially set , and let be the set of homes whose service line material is unknown at time , and be the set of homes with known service line materials. Finally, the budget also allows for a fixed number of inspections for each period. The problem is how to select these homes with unknown labels at each period to maximize information gained.
3.2.2. Simple Active Learning Heuristics: Uniform and Greedy
We first propose several benchmark strategies for selecting homes for inspection. This family of algorithms randomly alternates between random exploration of the unobserved data and greedy inspection of the highest-predicted hazardous homes. As we see in Table 4, these decision rules differ in the costs they incur.
HVI uniform (egreedy(1.0)): Select homes uniformly at random from the pool of those with unknown service lines.
HVI greedy (egreedy(0.0)): Select the homes most likely to have hazardous service lines, based on current model estimates.
HVI $\varepsilon$-greedy (egreedy($\varepsilon$)): For a fraction $1-\varepsilon$ of the inspections, select greedily, that is, select homes for HVI based on the highest predicted likelihood of danger. For the remaining fraction $\varepsilon$, select homes uniformly at random for HVI. We experiment with several values of $\varepsilon$. Also, we note that HVI uniform and HVI greedy are special cases, with $\varepsilon$ set to $1.0$ and $0.0$, respectively.
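The three heuristics above can be expressed as one selection routine. The following is a minimal sketch under stated assumptions: the function name `egreedy_select`, the per-draw coin flip (rather than a fixed split of the batch), and the dictionary of predicted probabilities are all illustrative choices, not details taken from the paper.

```python
import random

def egreedy_select(probs, k, epsilon, rng=None):
    """Pick k homes for hydrovac inspection.

    probs maps home id -> predicted probability of a hazardous line.
    With probability epsilon a draw is uniform over the remaining pool;
    otherwise it takes the remaining home with the highest prediction.
    epsilon=1.0 gives HVI uniform and epsilon=0.0 gives HVI greedy.
    """
    if rng is None:
        rng = random.Random(0)
    # Pool is kept sorted from most to least likely hazardous.
    pool = sorted(probs, key=probs.get, reverse=True)
    chosen = []
    for _ in range(min(k, len(pool))):
        if rng.random() < epsilon:
            pick = rng.choice(pool)   # explore
        else:
            pick = pool[0]            # exploit the current model
        pool.remove(pick)
        chosen.append(pick)
    return chosen

predictions = {"h1": 0.95, "h2": 0.10, "h3": 0.80, "h4": 0.55}
greedy_batch = egreedy_select(predictions, k=2, epsilon=0.0)
uniform_batch = egreedy_select(predictions, k=3, epsilon=1.0)
```

Flipping a coin per draw means the realized fraction of random picks varies from batch to batch; a fixed split of each batch would be an equally reasonable reading of the rule.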
3.2.3. Importance Weighted Active Learning
We propose an algorithm that takes in the current beliefs about whether each home has hazardous pipe material, and outputs a decision of which homes should be inspected in the next period. This proposal is a variant of the Importance Weighted Active Learning (IWAL) algorithm (Beygelzimer et al., 2009). The key idea behind IWAL is to sample unlabeled data from a biased distribution, with more weight placed on examples with greater uncertainty, and then, after obtaining the desired labels, to incorporate the new data in the next iteration of model training. Our implementation of this approach serves as the InspectionDecisionRule at the core of Algorithm 2. A full explanation of our IWAL implementation will be available in the full version of the paper.
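To make the sampling idea concrete, here is a rough sketch of uncertainty-biased querying with importance weights. It is not the authors' IWAL variant, whose details are deferred to the full paper: the uncertainty score, the floor `p_min`, and the function name are assumptions made for illustration only.

```python
import random

def iwal_style_sample(probs, p_min=0.1, rng=None):
    """Query homes with probability increasing in predictive uncertainty.

    probs maps home id -> predicted probability of a hazardous line.
    Returns {home: importance_weight} for the homes selected for inspection.
    The weight 1/q compensates for the biased sampling when the newly
    labeled homes are added to the training set.
    """
    if rng is None:
        rng = random.Random(0)
    queried = {}
    for home, p in probs.items():
        uncertainty = 1.0 - abs(2.0 * p - 1.0)      # 1 at p=0.5, 0 at p in {0, 1}
        q = p_min + (1.0 - p_min) * uncertainty     # query probability, never below p_min
        if rng.random() < q:
            queried[home] = 1.0 / q
    return queried

weights = iwal_style_sample({"h1": 0.5, "h2": 0.99, "h3": 0.02, "h4": 0.6})
```

Keeping the query probability bounded below by `p_min` caps the importance weights at `1 / p_min`, which limits the variance that very confident (rarely sampled) homes can introduce when they do get inspected.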
3.2.4. Analyzing Costs
There are two categories of costs incurred in Algorithm 2: hydrovac inspections and replacement visits. Hydrovac inspections always cost the same amount, denoted $c_h$. Service line replacement costs, however, depend on what is actually in the ground. If contractors excavate a service line that does not need to be replaced, we still incur a cost $c_r^{-}$ for labor and equipment, even though no replacement occurred. On the other hand, if contractors uncover a line that needs to be replaced, then the direct cost of replacement is $c_r^{+}$.
But the effective cost per successful replacement is greater than its direct cost, and we formally define it as $\bar{c} = C / R^{+}$, where
$$C = c_h H + c_r^{+} R^{+} + c_r^{-} R^{-}$$
is the total cost of $H$ hydrovac inspections, $R^{+}$ successful replacements, and $R^{-}$ unnecessary replacement visits (see Algorithm 2). In Flint, hydrovac inspection costs are summarized in Table 4. We note that the effective cost of a successful replacement is driven by two factors: the model accuracy and the ratio of the costs of inspection and replacement. Since unnecessary replacement visits can be avoided by prior inspection with a hydrovac, these two metrics, which naturally vary by city, will be critical guides to applying this approach to other cities.
|Homes visited by rule|
|Hydrovac Inspection||Replacement Visit||(Cost)||Uniform||None||10%|
|Finds Safety||not needed||($250)||230||0||23|
|Finds Danger||replaces Danger||($5,250)||770||0||77|
|Effective Cost per Successful Replacement:||$5,325||$5,747||$5,705|
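The effective costs in the last row of the table can be reproduced with a few lines of arithmetic. The sketch below assumes 1,000 homes of which 77% are hazardous (inferred from the 230/770 split in the table), a $250 hydrovac inspection, a $5,000 replacement, and a $2,500 charge for an unnecessary excavation (the figures quoted in Section 2.3). The split of the 900 uninspected homes in the 10% scenario into 693 hazardous and 207 safe is likewise an inference from the 77% rate, not a number stated in the text.

```python
def effective_cost(n_hydrovac, n_success, n_failed,
                   c_hydrovac=250.0, c_success=5000.0, c_failed=2500.0):
    """Total spending divided by the number of successful replacements."""
    total = (c_hydrovac * n_hydrovac
             + c_success * n_success
             + c_failed * n_failed)
    return total / n_success

# Every home inspected first: no unnecessary excavations.
uniform = effective_cost(n_hydrovac=1000, n_success=770, n_failed=0)
# No inspections: every safe home triggers an unnecessary excavation.
no_inspection = effective_cost(n_hydrovac=0, n_success=770, n_failed=230)
# 10% inspected: 23 safe homes are caught; 207 safe homes are still excavated.
ten_percent = effective_cost(n_hydrovac=100, n_success=770, n_failed=207)

print(round(uniform), round(no_inspection), round(ten_percent))  # 5325 5747 5705
```

Under these assumptions, each inspection that finds a safe pipe saves the difference between a $2,500 wasted excavation and a $250 inspection, which is why the fully inspected scenario has the lowest effective cost despite paying for 1,000 hydrovac visits.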
4. An Empirical Analysis in Flint
In our empirical analysis, we use the data of the confirmed service line material from the 6,505 homes identified and replaced by Flint FAST Start, as of September 30, 2017 collected via our data collection app. This data is combined with our supplementary datasets describing homes (Section 2) and we train a suite of classification models to predict the presence of hazardous service line materials for a given home, and the predictive power of each model is measured on hold-out sets of homes (Section 4.1). After selecting a strong empirical model, we utilize the model predictions in our decision-making algorithms, which recommend those homes which will be most informative for inspection, and also those most likely containing hazardous service line materials for replacement (Section 4.2).
We emphasize that our methods and models were utilized by FAST Start officials for the management of the hydrovac process, and during the early days of the efforts we were given discretion over which homes would receive inspections. We used this freedom to select statistically representative samples, as well as targeted inspections on homes of interest. In practice, our modeling efforts had less impact on the choice of replacement homes, as these decisions carried greater political and logistical challenges.
4.1. Classification Algorithm Performance
Selecting a robust, precise, unbiased, and properly calibrated classification algorithm is key for our proposed active learning framework. Ultimately, the selected decision-making algorithm requires both accurate and well-calibrated probability estimates when selecting the next round of homes to investigate. To select such a classification model, we employ several machine learning models and compare them across various performance metrics. These metrics include the Area Under the Receiver Operating Characteristic curve (AUROC), learning curves, and confusion matrices (including accuracy and precision). Using these scores, we find that tree-based methods are the most successful and robust category of models for this data. In particular, the gradient boosted trees model implemented in the package XGBoost exhibits the strongest performance with the fewest data points.
4.1.1. ROC and Learning Curves
The overall accuracy of the best performing XGBoost model, based on a holdout set of 1,606 homes (25% of available data), is 91.6%, with a false-positive rate of 3% and a false-negative rate of 27%. The homes falling in the top 81% of predicted probabilities are classified as having hazardous service lines. The ROC curves and AUROC scores show XGBoost’s superior performance, with an AUROC score of 0.939 on average in a range of [0.925, 0.951] (Figures 6 and 7). While the ROC curves show a single run of each model, the AUROC scores are shown as distributions over 100 bootstrapped samples obtained using a stratified cross-validation strategy with 75%/25% of the data randomly selected for training/validation. We further examine AUROC scores using learning curves (Figure 8), which use random subsets of the data to illustrate the diminishing returns of additional data on model performance. We also introduce temporal learning curves. These reflect the exact order of data collection in 2016-17 and show the AUROC as we re-estimate the model every two weeks to predict the danger for all remaining not-yet-visited homes. Finally, we ensure that the model’s predicted probabilities, which we use to quantify our prediction uncertainty, are indeed well calibrated.
While not shown here, we also considered ExtraTrees, AdaBoost (with decision tree classifier), and Ridge Regression (regularized with L2 loss), but performance was lower than the three presented. Full details on hyperparameter optimization will be available in the full version.
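To make the evaluation protocol concrete, the following sketch computes AUROC over 100 stratified 75%/25% splits. It is a simplified stand-in rather than our actual pipeline: the data are synthetic, the "model" is a hypothetical one-feature scorer whose direction is fit on the training split, and the helper names (`auroc`, `stratified_split`, `fit_direction`) are illustrative.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels, valid_frac, rng):
    """Return (train_idx, valid_idx), holding out valid_frac of each class."""
    train, valid = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(round(valid_frac * len(idx)))
        valid += idx[:cut]
        train += idx[cut:]
    return train, valid


def fit_direction(x, y, train_idx):
    """Toy model: decide whether larger x means higher risk."""
    pos = [x[i] for i in train_idx if y[i] == 1]
    neg = [x[i] for i in train_idx if y[i] == 0]
    return 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0


# Synthetic data: roughly 75% hazardous homes, one informative feature.
rng = random.Random(0)
y = [1 if rng.random() < 0.75 else 0 for _ in range(400)]
x = [rng.gauss(1.5 if label else 0.0, 1.0) for label in y]

scores = []
for _ in range(100):
    train_idx, valid_idx = stratified_split(y, 0.25, rng)
    sign = fit_direction(x, y, train_idx)
    scores.append(auroc([y[i] for i in valid_idx],
                        [sign * x[i] for i in valid_idx]))

print(min(scores), sum(scores) / len(scores), max(scores))
```

Reporting the minimum, mean, and maximum over the splits mirrors the "average in a range" summary used above. Stratifying keeps the share of hazardous homes the same in every training and validation split.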
4.1.2. Risk factors
Now that we have a robust predictive model, we can look at which features of a home and its surrounding neighborhood are most predictive of hazardous service lines. We are cautious, however, not to make any causal claims from this analysis. We obtain the feature importance values produced by each model by training on 20 bootstrapped samples of the data and report the average feature importance values. (We calculate feature importance by weight, which is the normalized frequency with which a feature appears in a tree amongst the ensemble.) The most informative home features relate to its age, value, and location, suggesting that the context (place and time) in which the home was built, as expected, is strongly correlated with service line material. For instance, homes built during and before World War II and those that are lower in value are more likely to contain lead in their public service line. Two additional features were the city records and the DEQ private SL inspection reports. Each was shown to be a noisy but useful predictor, as indicated earlier in Table 2.
4.2. ActiveRemediation: Evaluation
We now discuss our implementation of the ActiveRemediation framework applied to the particular case of Flint’s large-scale pipe replacement program. With over $100M in investment, Flint is a perfect testbed to compare the performance of our proposed methods (developed in Section 3.2) with the actual empirical performance of the work of FAST Start thus far. Our goal is to show a high potential for savings by minimizing the number of unnecessary replacement visits, thus replacing more hazardous lines under the same budget.
4.2.1. Experimental testbed, and potential biases.
Any experimental framework needs a quality dataset, with known labels for a large sample on which we can evaluate our procedure. Fortunately, for the City of Flint, where contractors have been working for over 18 months, we have a total of 6,506 observations of service line materials. A natural choice for an experimental environment, which we call ActualFlint, is to use the set of observed homes in Flint as a template for the overall city, i.e. a municipality with precisely 6,506 homes whose service line material we can query as needed.
A major challenge of relying solely on observed data is that the actual home selection process is biased, in both the hydrovac inspections and the line replacements. While a certain fraction of the home selection was random, it was often somewhat arbitrary due to political and logistical constraints. For instance, many of the homes selected for service line replacement were chosen to maximize lead discovery. To assess the effect of sample bias, we developed an experimental environment, SimulatedFlint, in which we suppose Flint contains only those properties not in the observed dataset. For this dataset, labels are assigned based on the labeled hold-out data. With observed data as training, we used a K-Nearest-Neighbors (KNN) classifier to estimate a probability for each unknown home, and then sampled a Bernoulli random variable ("safe"/"unsafe") to assign labels. This randomized dataset has lower potential selection bias concerns. In the reported results below, we focus on ActualFlint, but we note that results from SimulatedFlint were nearly equivalent.
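The label-generation step for SimulatedFlint can be sketched as follows. This is a minimal illustration under stated assumptions, not our production code: homes are represented by hypothetical numeric feature tuples (for example, coordinates), distance is Euclidean, and the value of k, the seed, and the function names are all illustrative choices.

```python
import math
import random


def knn_probability(query, train_points, train_labels, k=5):
    """Fraction of the k nearest observed homes labeled unsafe (1)."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(query, train_points[i]))
    nearest = order[:k]
    return sum(train_labels[i] for i in nearest) / len(nearest)


def simulate_labels(unknown_points, train_points, train_labels, k=5, seed=0):
    """Draw one Bernoulli label (1 = unsafe, 0 = safe) per unobserved home,
    using the KNN estimate as the success probability."""
    rng = random.Random(seed)
    labels = []
    for q in unknown_points:
        p = knn_probability(q, train_points, train_labels, k)
        labels.append(1 if rng.random() < p else 0)
    return labels


# Tiny example: two clusters of observed homes, one unsafe and one safe.
observed = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
materials = [1, 1, 1, 0, 0, 0]
print(simulate_labels([(0.2, 0.2), (10.5, 10.5), (5, 5)],
                      observed, materials, k=3))
```

Sampling a label, rather than thresholding the estimated probability, preserves uncertainty in the simulated city: a home with estimated risk 0.6 is unsafe in about 60% of simulated draws instead of always.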
4.2.2. Backtesting Simulation on ActualFlint
We quantify the cost savings from implementing our algorithm by comparing the sequential selection of homes from the proposed decision rules to what the Flint FAST Start initiative actually did in 2016-17. The goal is to stretch the allocated funds to remove hazardous pipes from as many homes as possible. One source of inefficiency in spending is unnecessary service line replacement (SLR) visits (the false-positive error rate). Therefore, our key performance metric is the SLR hit rate, i.e. the percentage of homes visited for replacement that required replacement.
The proposed approach greatly improves the hit rate. In the simulation, the rate of costly unnecessary replacement visits falls from 18.8% (actual) to 2.0% (proposed). Figure 9 compares the hit rates of our proposed approach, IWAL(0.7), in the ActualFlint simulation with those of Flint FAST Start.
Second, the cost savings are substantial. The proposed algorithm, with a higher hit rate, increases the number of homes that receive service line replacements for the same number of visits. This, in turn, reduces the effective cost of a successful service line replacement. The effective cost includes both the direct cost of a successful replacement visit and the average costs incurred by exploring homes through hydrovac inspections or unnecessary replacement visits. Having access to the exact same set of 6,505 homes actually observed, we find that the algorithm on average saves an additional 10.7% in funds per successful replacement (see Table 5). Across 18,000 total planned service line replacements, this would extend to an expected savings of about $11M out of current spending. In terms of the overall removal of lead pipes, this is approximately equivalent to 2,100 additional homes in the city that would receive safe water lines. These estimates are made using the current costs in Flint, where hydrovac inspection costs , unnecessary replacement costs , and successful replacement costs .
| ||Actual||Proposed Algorithm|| |
|For every 1 successful replacement:|
|Effective cost||$5,818||$5,196||($5,186 to $5,208)|
|Predicted savings ($)||–||$621.7||($610.4 to $632.4)|
|Predicted savings (%)||–||10.7%||(10.5% to 10.9%)|
|For every 1,000 successful replacements, the savings generate:|
|Extra inspections||–||94||(92 to 96)|
|Extra replacements||–||120||(117 to 122)|
|For 18,000 successful replacements:|
|Predicted savings ($ in millions)||–||$11.18m||($10.99m to $11.39m)|
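The headline figures in Table 5 follow from the two effective costs by simple arithmetic. The sketch below recomputes them from the rounded values $5,818 and $5,196, so results differ slightly from the table, which uses unrounded inputs. Treating "extra replacements" as the savings reinvested at the proposed effective cost is an interpretation made for this illustration, and the function name `savings_summary` is hypothetical.

```python
def savings_summary(actual_cost, proposed_cost, n_planned):
    """Savings implied by two effective costs per successful replacement."""
    per_replacement = actual_cost - proposed_cost
    return {
        "savings_per_replacement": per_replacement,
        "savings_pct": 100.0 * per_replacement / actual_cost,
        "total_savings_millions": n_planned * per_replacement / 1e6,
        # Savings from 1,000 replacements, spent at the proposed effective cost.
        "extra_replacements_per_1000": 1000.0 * per_replacement / proposed_cost,
    }


summary = savings_summary(5818.0, 5196.0, 18000)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

With rounded inputs this gives savings of $622 per replacement (10.7%), about $11.2M across 18,000 replacements, and roughly 120 extra replacements per 1,000, in line with the reported $621.7, $11.18m, and 120.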
The proposed approach outperforms a competitive set of natural benchmark strategies. Instead of only comparing our proposed method to what actually occurred, we also consider a range of alternative methods. In particular, greedy (egreedy with 0% exploration) achieves the highest rate of hazardous homes among those inspected (HVI hit rate 91%), and uniform (egreedy with 100% exploration) the lowest (63%). But IWAL does better with a more principled approach, selecting homes that are likely to be most informative, with risk probabilities near 70%. Figure 10 shows how IWAL and two greedy heuristics differ. A higher HVI hit rate is not better; instead, it is the choice of which homes to explore with inspection that matters. The uncertainty in the performance of each algorithm comes from sampling variation across 25 independent simulated experiments. We prefer IWAL to the alternatives because it has greater savings and is less sensitive to tuning parameters.
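The difference between these inspection rules can be illustrated with a small sketch. It is deliberately simplified and should not be read as our implementation: `egreedy_select` is a generic epsilon-greedy rule, and `near_target_select` is a deterministic stand-in that ranks homes by distance from a 0.7 risk probability, whereas IWAL (Beygelzimer et al., 2009) uses importance-weighted randomized sampling. Home identifiers, probabilities, and function names are hypothetical.

```python
import random


def egreedy_select(probs, n, epsilon, rng):
    """Pick n homes for inspection. Each pick explores uniformly at random
    with probability epsilon; otherwise it takes the riskiest home left.
    epsilon=0 is the greedy rule, epsilon=1 is the uniform rule."""
    remaining = sorted(probs, key=probs.get, reverse=True)
    chosen = []
    for _ in range(min(n, len(remaining))):
        if rng.random() < epsilon:
            pick = remaining.pop(rng.randrange(len(remaining)))
        else:
            pick = remaining.pop(0)
        chosen.append(pick)
    return chosen


def near_target_select(probs, n, target=0.7):
    """Pick the n homes whose predicted risk is closest to target,
    a crude proxy for 'most informative' homes."""
    return sorted(probs, key=lambda home: abs(probs[home] - target))[:n]


# Hypothetical predicted probabilities of a hazardous service line.
predicted = {"a": 0.95, "b": 0.72, "c": 0.65, "d": 0.10, "e": 0.40}
print(egreedy_select(predicted, 2, 0.0, random.Random(0)))  # greedy picks
print(near_target_select(predicted, 2))                     # near 70% risk
```

Greedy spends inspections on homes the model is already confident about, which raises the HVI hit rate but teaches the model little. Selecting near the target probability spends them where the inspection outcome is genuinely in doubt.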
We acknowledge some assumptions in our simulations. First, we only consider the cost of each job and not the time required for crews to move between homes, where there may be logistical issues with redirecting teams around the city. Second, in this analysis we have treated ActualFlint as having only 6,506 homes, all of which are visited. This creates an arbitrary finite end point, as the algorithm runs out of homes with unsafe service lines. To avoid this effect, the above calculations, figures, and tables are based on the first 4,500 replacement visits and 2,250 hydrovac inspections. Of course, to validate this, we would need access to a larger set, and thus we turn to our larger simulation using a full-size Flint. Finally, the results are robust to the resource allocation schedule and batch size. We recognize that we used a schedule of SLR and HVI activities different from that of Flint FAST Start. To disentangle the confound between our choice of algorithms and the schedule, we ran an additional version of the ActualFlint backtest, with the schedule aligned as closely as possible with Flint FAST Start in 2016-17. Across the alternative scenarios tested, the results differed only slightly.
4.2.3. Results from SimulatedFlint
In our second simulation, we demonstrate the potential value of deploying the algorithm at scale and characterize the long-term performance of the algorithms. Via SimulatedFlint we find that the proposed algorithms, with the aim of replacing hazardous lines from 18,000 homes out of a simulated city of 48,000 homes, can achieve 11.8% savings relative to the current rate of spending. The best algorithm using IWAL yields an average effective cost of $5,133 per successful replacement, compared with the $5,818 observed in Flint (Table 5). As a final note, the proposed algorithms’ SLR hit rates are all above 98.0%.
The authors would like to thank the FAST Start team for their phenomenal work and openness to collaboration. This includes Brigadier General (Ret.) Michael McDaniel, Ryan Doyle, Major Nicholas Anderson, and Kyle Baisden. Professors Lutgarde Raskin and Terese Olson, environmental engineering faculty at U-M, provided invaluable scientific support throughout. We are incredibly grateful for the work of, and communication with, Professor Martin Kaufman and Troy Rosencrantz at U-M Flint’s GIS Center. We would like to thank Captricity, especially their machine learning team, Michael Zamora, David Shewfelt, and Kayla Pak for making the data accessible, and Kuang Chen for the generous support. We had major support from Mark Allison and his team of U-M Flint students. Rebecca Pettengill was enormously generous with her time and ability to help in the Flint community. We thank U-M Professors Marc Zimmerman and Rebecca Cunningham for their encouraging and helpful discussions. Among the many students involved in this work, we would like to recognize the roles of Jonathan Stroud and Chengyu Dai. And this work would not have happened without the expertise and enthusiasm of the students in the Michigan Data Science Team (MDST, http://midas.umich.edu/mdst/, (Farahi and Stroud, 2018)). The authors appreciate the many seminar and conference participants at U-M and elsewhere for their feedback on the academic work. The authors gratefully acknowledge the financial support of the Michigan Institute for Data Science (MIDAS), U-M’s Ross School of Business, Google.org, and National Science Foundation CAREER grant IIS 1453304.
- Abernethy et al. (2016) Jacob Abernethy, Cyrus Anderson, Chengyu Dai, Arya Farahi, Linh Nguyen, Adam Rauh, Eric Schwartz, Wenbo Shen, Guangsha Shi, Jonathan Stroud, et al. 2016. Flint Water Crisis: Data-Driven Risk Assessment Via Residential Water Testing. arXiv preprint arXiv:1610.00580 (2016).
- Balcan et al. (2010) Maria-Florina Balcan, Steve Hanneke, and Jennifer Wortman Vaughan. 2010. The true sample complexity of active learning. Machine learning 80, 2 (2010), 111–139.
- Balcan and Feldman (2013) Maria-Florina F Balcan and Vitaly Feldman. 2013. Statistical active learning algorithms. In Advances in Neural Information Processing Systems. 1295–1303.
- Beygelzimer et al. (2009) Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. 2009. Importance weighted active learning. In Proceedings of the 26th annual international conference on machine learning. ACM, 49–56.
- Carmody and Brush (2016) Steve Carmody and Mike Brush. 2016. Flint might have a bigger problem with lead pipes than previously thought. http://michiganradio.org/post/flint-might-have-bigger-problem-lead-pipes-previously-thought. (2016). (Accessed Feb, 16, 2017).
- Chen and Guestrin (2016) Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785–794.
- Chojnacki et al. (2017) Alex Chojnacki, Chengyu Dai, Arya Farahi, Guangsha Shi, Jared Webb, Daniel T. Zhang, Jacob Abernethy, and Eric Schwartz. 2017. A Data Science Approach to Understanding Residential Water Contamination in Flint. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17). ACM, New York, NY, USA, 1407–1416. https://doi.org/10.1145/3097983.3098078
- Dolan (2016) Matthew Dolan. 2016. Far more Flint homes have lead lines than expected, report shows. (2016). http://www.freep.com/story/news/local/michigan/flint-water-crisis/2016/09/28/more-than-half-flint-homes-could-have-lead-lines-report-shows/91225284/ (Accessed Feb, 16, 2017).
- Farahi and Stroud (2018) Arya Farahi and Jonathan Stroud. 2018. The Michigan Data Science Team: A Data Science Education Program with Significant Social Impact.
- Fonger (2015a) Ron Fonger. 2015a. Flint data on lead water lines stored on 45,000 index cards. (2015). http://www.mlive.com/news/flint/index.ssf/2015/10/flint_official_says_data_on_lo.html (Accessed Feb, 16, 2017).
- Fonger (2015b) Ron Fonger. 2015b. Here’s how that toxic lead gets into Flint water. http://www.mlive.com/news/flint/index.ssf/2015/10/see_step_by_step_how_lead_is_g.html. (2015). (Accessed Feb, 16, 2017).
- Gelfand and Vounatsou (2003) Alan E Gelfand and Penelope Vounatsou. 2003. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics 4, 1 (2003), 11–15.
- Gelman et al. (2014) Andrew Gelman, John B Carlin, Hal S Stern, and Donald B Rubin. 2014. Bayesian data analysis. Vol. 2. Chapman & Hall/CRC Boca Raton, FL, USA.
- Hanna-Attisha et al. (2016) Mona Hanna-Attisha, Jenny LaChance, Richard Casey Sadler, and Allison Champney Schnepp. 2016. Elevated blood lead levels in children associated with the Flint drinking water crisis: a spatial analysis of risk and public health response. American journal of public health 106, 2 (2016), 283–290.
- Lee (2011) Duncan Lee. 2011. A comparison of conditional autoregressive models used in Bayesian disease mapping. Spatial and Spatio-temporal Epidemiology 2, 2 (2011), 79–89.
- Lee (2013) Duncan Lee. 2013. CARBayes: an R package for Bayesian spatial modeling with conditional autoregressive priors. Journal of Statistical Software 55, 13 (2013), 1–24.
- Liu et al. (2008) Alexander Liu, Goo Jun, and Joydeep Ghosh. 2008. Active learning with spatially sensitive labeling costs. In NIPS Workshop on Cost-sensitive Learning.
- Moore (2016) Kristin Moore. 2016. Number of Service Lines that Need Replacing in Flint Rises to 29,100, According to Study. https://www.cityofflint.com/2016/12/01/number-of-service-lines-that-need-replacing-in-flint-rises-to-29100-according-to-study/. (2016). (Accessed Feb, 16, 2017).
- Potash et al. (2015) Eric Potash, Joe Brew, Alexander Loewi, Subhabrata Majumdar, Andrew Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, and Rayid Ghani. 2015. Predictive modeling for public health: Preventing childhood lead poisoning. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2039–2047.
- Sandvig et al. (2008) Anne Sandvig, P Kwan, G Kirmeyer, B Maynard, D Mast, R Rhodes Trussell, S Trussell, A Cantor, and A Prescott. 2008. Contribution of service line and plumbing fixtures to lead and copper rule compliance issues. Environmental Protection Agency, Water Environment Research Foundation.
- Torrice (2016) Michael Torrice. 2016. How Lead Ended Up in Flint’s Tap Water. Chem. Eng. News 94, 7 (2016), 26–29.