Predicting and Interpreting Smell Data Obtained from Smell Pittsburgh
Urban air pollution has been linked to various human health considerations, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently concentrated. All smell report data are publicly accessible online. These reports are also sent to the local health department and visualized on a map along with air quality data from monitoring stations. This visualization provides a comprehensive overview of the local pollution landscape. Additionally, with these reports and air quality data, we developed a model to predict upcoming smell events and send push notifications to inform communities. Our evaluation of this system demonstrates that engaging residents in documenting their experiences with pollution odors can help identify local air pollution patterns, and can empower communities to advocate for better air quality.
Air pollution has been associated with adverse impacts on human health, including respiratory and cardiovascular diseases (Kampa and Castanas, 2008; Pope III and Dockery, 2006; Dockery et al., 1993; Prüss-Üstün and Neira, 2016; WHO, 2016). Addressing air pollution often involves negotiations between corporations and regulators, who hold power to improve air quality. However, the communities, who are directly affected by the pollution, are rarely influential in policy-making. Their voices typically fail to persuade decision-makers because collecting and presenting reliable evidence to support their arguments is resource-intensive. Forming such evidence requires collecting and analyzing multiple sources of data over a large geographic area and an extended period. This task is challenging due to the requirements of financial resources, organizational networks, and access to technology. Due to the power imbalance and resource inequality, affected residents usually rely on experts in governmental agencies, academic institutions, or non-governmental organizations to analyze and track pollution sources.
A straightforward solution is to empower the affected communities directly. In this research, we demonstrate how citizen science can be used for communities to pool resources and efforts to gather evidence for advocacy. Data-driven evidence, especially when integrated with narratives, is essential for communities to make sense of local environmental issues and take action (Ottinger, 2017b). However, citizen-contributed data is often held in low regard because the information can be unreliable or include errors during data entry. Also, sufficient citizen participation and data transparency are required for the evidence to be influential. For instance, the city involved in this study, Pittsburgh, is one of the ten most polluted cities in the United States (American Lung Association, 2017). Currently, Pittsburgh citizens report air quality problems to the local health department via its phone line or website.
Nevertheless, the quality of the gathered data is doubtful. Citizens may not remember the exact time and location that pollution odors occurred. Asking citizens to submit complaints retrospectively makes it hard to capture accurate details and is prone to errors. Such errors can result in missing or incomplete data that can affect the outcome of statistical analysis to identify pollution sources (Devillers and Jeansoulin, 2006). Furthermore, the reporting process is not transparent and does not encourage citizens to contribute data. There is no real-time feedback or way of sharing experiences to forge a sense of community. Without data that adequately represents the community, it is difficult to know if an air pollution problem is at a neighborhood or city-wide scale. This approach is inadequate for data collection and hinders participation in bringing air quality issues to the attention of regulators and advocating for policy changes.
Because of these challenges, resident-reported smell data have not gained much attention as a critical tool for monitoring air pollution. However, the literature has shown that the human olfactory system can distinguish more than one trillion odors (Bushdid et al., 2014) and outperform sensitive measuring equipment in odor detection tasks (Shepherd, 2004). Although there have been discussions about the potential of using smell to indicate pollution events and support decision making (Ottinger, 2010; Obrist et al., 2014), no prior work has collected long-term smell data at a city-wide scale and studied whether these data are useful for air pollution monitoring and community advocacy.
We propose a system, Smell Pittsburgh (sme, 2018b), for citizens to report pollution odors to the local health department with accurate time and GPS location data via smartphones. The system visualizes odor complaints in real-time, which enables residents to confirm their experiences by viewing if others also share similar experiences. Additionally, we present a dataset of smell reports and air quality measurements from nearby monitoring stations over 21 months (sme, 2018a). We use the dataset to develop a model that predicts upcoming pollution odors and send push notifications to users. We also apply machine learning to identify relationships between smell reports and air quality measurements. Finally, we describe qualitative and quantitative studies for understanding changes in user engagement and motivation. To the best of our knowledge, Smell Pittsburgh is the first system of its kind that demonstrates the potential of collecting and using smell data to form evidence about air quality issues at a city-wide scale. Although stakeholders typically view odor experiences as subjective and noisy, our work shows that smell data is beneficial in identifying urban air pollution patterns and empowering communities to pursue a sustainable environment.
This research is rooted in citizen science, which empowers amateurs and professionals to form partnerships and produce scientific knowledge (Science Communication Unit, 2013; Bonney et al., 2014; Bonney et al., 2016; McKinley et al., 2015; Eitzel et al., 2017). Historically, there exist both research and community-oriented strategies (Cooper and Lewenstein, 2016). Research-oriented citizen science aims to address large-scale research questions which are infeasible for scientists to tackle alone (Bonney et al., 2009a; Silvertown, 2009; Cohn, 2008; Dickinson et al., 2012; Dickinson and Bonney, 2012; Miller-Rushing et al., 2012; Bonney et al., 2009b; Cooper et al., 2007). Research questions under this strategy are often driven by professional scientists. Researchers applying this strategy study how scientists can encourage the public to participate in scientific research. In contrast, community-oriented citizen science aims to democratize science by equipping citizens with tools to directly target community concerns for advocacy (Irwin, 1995; Greaves and Lishman, 1980; Wilsdon et al., 2005; Stilgoe, 2009; Irwin, 2001; Paulos et al., 2008; Irwin, 2006; Stilgoe et al., 2014; Ottinger, 2016; Chari et al., 2017; Hsu, 2018; Corburn, 2005). Research questions under this strategy are often driven by community members, exploring how scientists can engage in social and ethical issues that are raised by citizens or communities. Our research focuses on the community-oriented approach. This approach is highly related to sustainable Human-Computer Interaction (DiSalvo et al., 2009; DiSalvo et al., 2010; Brynjarsdottir et al., 2012; Blevis, 2007; Mankoff et al., 2007; Dourish, 2010), which studies the intervention of information technology for increasing the awareness of sustainability, changing user behaviors, and influencing attitudes of affected communities. 
We seek to generate scientific knowledge from community data to support citizen-driven exploration, understanding, and dissemination of local air quality concerns.
Modern technology allows communities to collect data that can contextualize and express their concerns. There are typically two types of community data, which are generated from either sensors or proactive human reports. Each type of data provides a small fragment of evidence. When it comes to resolving and revealing community concerns, human-reported data can show how experiences of residents are affected by local issues, but it is typically noisy, ambiguous, and hard to quantify at a consistent scale. Sensing data can complement human-reported data by providing temporally dense and reliable measurements of environmental phenomena but fails to explain how these phenomena affect communities. Without integrating both types of data, it is difficult to understand the context of local concerns and produce convincing evidence.
Human-reported data includes observations contributed by users. Modern computational tools can collect volunteered geographic information (Haklay, 2013) and aggregate them to produce scientific knowledge. However, most prior works focused on collecting general information of particular interest, rather than data of a particular type of human sense, such as odor. Ushahidi gathers crisis information via text messages or its website to provide timely transparent information to a broader audience (Okolloh, 2009). Creek Watch is a monitoring system which enabled citizens to report water flow and trash data in creeks (Kim et al., 2011). Sensr is a tool for creating environmental data collection and management applications on mobile devices without programming skills (Kim et al., 2013a, 2015). Encyclopedia of Life is a platform for curating species information contributed by professionals and non-expert volunteers (Rotman et al., 2012). eBird is a crowdsourcing platform to engage birdwatchers, scientists, and policy-makers to collect and analyze bird data collaboratively (Sullivan et al., 2014, 2009). Tiramisu was a transit information system for collecting GPS location data and problem reports from bus commuters (Zimmerman et al., 2011). One of the few examples focusing on information of a specific modality is NoiseTube, a mobile application that empowered citizens to report noise via their mobile phones and mapped urban noise pollution on a geographical heatmap (Maisonneuve et al., 2009; D’Hondt et al., 2013). The tool could be utilized for not only understanding the context of urban noise pollution but also measuring short-term or long-term personal exposure.
Sensing data involves environmental measurements quantified with sensing devices or systems, which enable citizens to monitor their surroundings with minimal to no assistance from experts. However, while many prior works used sensors to monitor air pollution, none of them complemented the sensing data with human-reported data. MyPart is a low-cost and calibrated wearable sensor for measuring and visualizing airborne particles (Tian et al., 2016). Speck is an indoor air quality sensor for measuring and visualizing fine particulate matter (Taylor and Nourbakhsh, 2015; Taylor, 2016). Kim et al. implemented an indoor air quality monitoring system to gather air quality data from commercial sensors (Kim et al., 2013b). Kuznetsov et al. developed multiple air pollution monitoring systems which involved low-cost air quality sensors and a map-based visualization (Kuznetsov et al., 2011; Kuznetsov et al., 2013). Insights from these works showed that sensing data, especially accompanied by visualizations, could provide context and evidence that might raise awareness and engage local communities to participate in political activism. However, none of these works asked users to report odors, and thus they cannot directly capture how air pollution affects the quality of life of community members.
Citizen science data are typically high-dimensional, noisy, potentially correlated, and spatially or temporally sparse. The collected data may also suffer from many types of bias and error that sometimes can even be unavoidable (Budde et al., 2017; Bird et al., 2014). Making sense of such noisy data has been a significant concern in citizen science (Ottinger, 2017a; Newman et al., 2012), especially for untrained contributors (Cohn, 2008; Ottinger, 2010; Bonney et al., 2014; Den Broeder et al., 2016). To assist community members in identifying evidence from large datasets efficiently, prior projects used machine learning algorithms to predict future events or interpret collected data (Bishop, 2006; Mitchell, 1997; Hastie et al., 2009; James et al., 2013; Jordan and Mitchell, 2015; Bird et al., 2014; Bellinger et al., 2017).
Prediction techniques aim to forecast the future accurately based on previous observations. Zheng et al. developed a framework to predict air quality readings of a monitoring station over the next 48 hours based on meteorological data, weather forecasts, and sensor readings from other nearby monitoring stations (Zheng et al., 2015). Azid et al. also studied air pollution prediction (Azid et al., 2014). Donnelly et al. combined kernel regression and multiple linear regression to forecast the concentrations of nitrogen dioxide over the next 24 and 48 hours (Donnelly et al., 2015). Hsieh et al. utilized a graphical model to predict the air quality of a given location grid based on data from sparse monitoring stations (Hsieh et al., 2015). These studies applied prediction techniques to help citizens plan daily activities and also inform regulators in controlling air pollution sources. Most of these studies focus on forecasting or interpolating sensing data. To the best of our knowledge, none of them considered human-reported data in their predictive models.
Interpretation techniques aim to extract knowledge from the collected data. This knowledge can help to discover potential interrelationships between predictors and responses, which is known to be essential in analyzing the impacts of environmental issues in the long-term (Brown, 1992; Den Broeder et al., 2016). Gass et al. investigated the joint effects of outdoor air pollutants on emergency department visits for pediatric asthma by applying Decision Tree learning (Gass et al., 2014). The authors suggested using Decision Tree learning to hypothesize about potential joint effects of predictors for further investigation. Stingone et al. trained decision trees to identify possible interaction patterns between air pollutants and math test scores of kindergarten children (Stingone et al., 2017). Hochachka et al. fused traditional statistical techniques with boosted regression trees to extract species distribution patterns from the data collected via the eBird platform (Hochachka et al., 2012). These previous studies utilized domain knowledge to fit machine learning models with high explanatory powers on filtered citizen science data. In this paper, we also used Decision Tree to explore hidden interrelationships in the data. This extracted knowledge can reveal local concerns and serve as convincing evidence for communities in taking action.
Our goals are (i) to develop a system that can lower the barriers to contribute smell data and (ii) to make sure the data is useful in studying the impact of urban air pollution and advocating for better air quality. Each goal yields a set of design challenges.
Outside the scope of citizen science, a few works have collected human-reported smell data in various ways. However, these approaches are not suitable for our project. For example, prior works have applied a smell-walking approach to record and map the landscape of smell experiences by recruiting participants to travel in cities (Henshaw, 2013; Quercia et al., 2015, 2016). This process is labor-intensive and ill-suited to long-term air quality monitoring. Hsu et al. have also demonstrated that resident-reported smell reports, collected via Google Forms, can form evidence about air pollution when combined with data from cameras and air quality sensors (Hsu et al., 2017; Hsu et al., 2016). While Google Forms is usable for a small-scale study, it would not be effective in collecting smell reports on a city-wide scale with more than 300,000 affected residents over several years. Therefore, we developed a mobile system to record GPS locations and timestamps automatically. The system is specialized for gathering smell data at a broad temporal and geographical scale.
There is a lack of research in understanding the potential of using smell as an indicator of urban air pollution. Moreover, we recognized that there are various methods of collecting, presenting, and using the data. It is not feasible to explore and evaluate all possible methods without deploying the system in the real-world context. These challenges form a wicked problem (Conklin, 2005; Rittel and Webber, 1973), which refers to problems that have no precise definition, cannot be fully observed at the beginning, are unique and depend on context, have no opportunities for trial and error, and have no optimal or “right” solutions. In response to this challenge, our design principle is inspired by how architects and urban designers address wicked problems. When approaching a community or city-scale problem, architects and urban planners first explore problem attributes (as defined in (Pena and Parshall, 2012)) and then design specific solutions based on prior empirical experiences. We made use of an existing network of community advocacy groups, including ACCAN (ACCAN, 2018), GASP (GASP, 2018), Clean Air Council (CAC, 2018), PennFuture (PennFuture, 2018), and PennEnvironment (PennEnvironment, 2015). These groups were pivotal in shaping the design of Smell Pittsburgh and providing insights into how to engage the broader Pittsburgh community.
Moreover, to sustain participation, we visualized smell report data on a map and also engaged residents through push notifications. To add more weight to citizen-contributed pollution odor reports, we engineered the application to send smell reports directly to the Allegheny County Health Department (ACHD). This strategy ensured that the local health department could access high-resolution citizen-generated pollution data to better ascertain and address potential pollution sources in our region. We met and worked with staff in ACHD to determine how they hoped to utilize smell report data and adjusted elements of the application to better suit their needs, such as sending data directly to their database and using these data as evidence of air pollution. Based on their feedback, the system submitted all smell reports to the health department, regardless of the smell rating. This approach provided ACHD with a more comprehensive picture of the local pollution landscape.
In summary, when developing Smell Pittsburgh, we considered the system as an ongoing infrastructure to sustain communities over time (as mentioned in (Dantec and DiSalvo, 2013)), rather than a software product which solves a single well-defined problem. The system is designed to influence citizen participation and reveal community concerns simultaneously, which is different from observational studies that use existing data, such as correlating air quality keywords from social media contents with environmental sensor measurements (Ford et al., 2017).
Smell Pittsburgh is a system, distributed through iOS and Android devices, to collect smell reports and track urban pollution odors. We now describe two system features: (1) a mobile interface for submitting and visualizing odor complaints and (2) push notifications for predicting the potential presence of odor events.
Users could report odor complaints via Smell Pittsburgh from their mobile devices via the submission console (Figure 1, left). To submit a report, users first selected a smell rating from 1 to 5, with one being “just fine” and five being “about as bad as it gets.” These ratings, their color, and the corresponding descriptions were designed by affected local community members to mimic the US EPA Air Quality Index (EPA, 2014). Also, users could fill out optional text fields where they could describe the smell (e.g., industrial, rotten egg), their symptoms related to the odor (e.g., headache, irritation), and their personal experiences. Once a user submitted a smell report, the system sent it to the local health department and anonymously archived it on our backend database. Users could decide if they were willing to provide their contact information to the health department through the system setting panel. Regardless of the setting, our database did not record the personal information.
The system visualized smell reports on a map that also depicted fine particulate matter and wind data from government-operated air quality monitoring stations (Figure 1, right). All smell reports were anonymous, and their geographical locations were skewed to preserve privacy. When clicking or tapping on the playback button, the application animated 24 hours of data for the currently selected day, which served as convincing evidence of air quality concerns. Triangular icons indicated smell reports with colors that correspond to smell ratings. Users could click on a triangle to view details of the associated report. Circular icons showed government-operated air quality sensor readings with colors based on the Air Quality Index (EPA, 2014) to indicate the severity of particulate pollution. Blue arrows showed wind directions measured from nearby monitoring stations. The timeline on the bottom of the map represented the concentration of smell reports per day with grayscale squares. Users could view data for a date by selecting the corresponding square.
Smell Pittsburgh sent post hoc and predictive event notifications to encourage participation. First, when a sufficient number of poor odor reports arrived during the previous hour, the system sent a post hoc event notification: “Many residents are reporting poor odors in Pittsburgh. Were you affected by this smell event? Be sure to submit a smell report!” The intention of sending this notification was to encourage users to check and report if they had similar odor experiences. Second, to predict the occurrence of abnormal odors in the future, we applied machine learning to model the relationships between aggregated smell reports and air quality measurements from the past. We defined temporally and geographically aggregated reports as smell events, which indicated that there would be many high-rating smell reports within the next 8 hours. Each day, whenever the model predicted a smell event, the system sent a predictive notification: “Local weather and pollution data indicates there may be a Pittsburgh smell event in the next few hours. Keep a nose out and report smells you notice.” The goal of making the prediction was to support users in planning daily activities and encourage community members to pay attention to the air quality. To keep the prediction system updated, we computed a new machine learning model every Sunday night based on the data collected previously.
The evaluation shows that using smell experiences is practical for revealing urban air quality concerns and empowering communities to advocate for a sustainable environment. Our goal is to evaluate the impact of deploying interactive systems on communities rather than the usability (e.g., the time of completing tasks). We believe that it is more beneficial to ask “Is the system influential?” instead of “Is the system useful?” We now discuss three studies: (i) system usage information of smell reports and interaction events, (ii) a dataset for predicting and interpreting smell event patterns, and (iii) a survey of attitude changes and motivation factors.
In this study, we show the usage patterns on mobile devices by parsing server logs and Google Analytics events. From our initial testing with the community in September 2016 to the end of September 2018, we had about 5,790 and 1,070 installations (rounded to the nearest 10) of Smell Pittsburgh on iOS and Android devices respectively in the United States. We excluded data generated during the system stability testing phase in September and October 2016. From our soft launch in November 2016 to the end of September 2018, a period of 23 months, there were 3,917 unique anonymous users (estimated by Google Analytics) in Pittsburgh. Our users contributed 17,280 smell reports, 582,108 alphanumeric characters in the submitted text fields, and 163,609 events of interacting with the visualization (e.g., clicking on icons on the map). Among all reports, 76% had ratings greater than two.
To investigate the distribution of smell reports and interaction events among our users, we divided all users into four types: enthusiasts, explorers, contributors, and observers (Table 1). Contributors submitted reports but did not interact with the visualization. Observers interacted with the visualization but did not submit reports. Enthusiasts submitted more than 6 reports and interacted with the visualization more than 31 times. Thresholds 6 and 31 were the median of the number of submitted reports and interaction events for all users respectively, plus their interquartile ranges. Explorers submitted 1 to 6 reports or interacted with the visualization 1 to 31 times. We were interested in four variables with different distributions among user groups, which represented their characteristics (Table 2). First, for each user, we computed the number of submitted reports and interaction events. Then, for each smell report, we calculated the number of alphanumeric characters in the submitted text fields. Finally, for interaction events that involved viewing previous data, we computed the time difference between hit timestamps and data timestamps. These two timestamps represented when users interacted with the system and when the data were archived respectively. Distributions of all variables differed from normal distributions (normality test p<.001) and were skewed toward zero.
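The grouping rule above can be illustrated with a minimal Python sketch. The function names, the quartile method, and the handling of users with no activity are assumptions made for illustration; only the thresholds 6 and 31 and the group definitions come from the text.

```python
from statistics import median, quantiles

def upper_threshold(counts):
    """Median plus interquartile range of per-user counts.
    The 'inclusive' quartile method is an assumption; the text does not name one."""
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return median(counts) + (q3 - q1)

def user_type(n_reports, n_events, report_cut=6, event_cut=31):
    """Assign one of the four user types from report and interaction counts."""
    if n_reports == 0 and n_events == 0:
        raise ValueError("a user needs at least one report or interaction event")
    if n_events == 0:
        return "contributor"   # reported, never used the visualization
    if n_reports == 0:
        return "observer"      # used the visualization, never reported
    if n_reports > report_cut and n_events > event_cut:
        return "enthusiast"
    return "explorer"          # everyone else with both kinds of activity

example_types = [user_type(7, 32), user_type(3, 0), user_type(0, 5), user_type(7, 10)]
```

Under this reading, a user with many reports but few interaction events falls into the explorer group, because the enthusiast rule requires both thresholds to be exceeded.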
The user group study showed highly skewed user contributions. About 32% of the users submitted only one report. About 48% and 81% of the users submitted fewer than three and ten reports respectively, which aligned with the typical pattern in citizen science projects that many volunteers participate only a few times (Sauermann and Franzoni, 2015). Moreover, these user groups differed regarding the type and amount of data they contributed. Table 1 shows that enthusiasts, corresponding to less than 10% of the users, contributed about half of the data overall. Table 2 indicates the characteristics of these groups. Enthusiasts tended to contribute more smell reports, more alphanumeric characters per report, and more interaction events. Observers tended to browse data that were far away from the interaction time. Further investigation of the enthusiast group revealed a moderate positive association (Pearson correlation coefficient r=.50, n=375, p<.001) between the number of submitted reports and the number of user interaction events.
To identify critical topics in citizen-contributed smell reports, we analyzed the frequency of words (unigrams) and phrases (bigrams) in the text fields. We used the Python NLTK package (Bird et al., 2009) to remove stop words and group similar words with different forms (lemmatization). Figure 2 shows that high-frequency words and phrases mostly described industrial pollution odors and symptoms of air pollution exposure, especially hydrogen sulfide, which has a rotten-egg smell and can cause headache, dizziness, eye irritation, sore throat, cough, nausea, and shortness of breath (Lindenmann et al., 2010; Reiffenstein et al., 1992; Guidotti, 2010; Council et al., 2009). This finding inspired us to examine how hydrogen sulfide affected urban odors in the next study.
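A minimal sketch of this kind of unigram and bigram counting is given below. It deliberately uses only the Python standard library rather than NLTK, so the stop-word list is a tiny illustrative stand-in and lemmatization is omitted; the sample reports are invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); NLTK ships a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "like", "it", "is", "in", "to"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def ngram_counts(reports, n):
    """Count n-grams (as tuples of tokens) over a list of free-text reports."""
    counts = Counter()
    for text in reports:
        tokens = tokenize(text)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Invented sample reports, for illustration only.
reports = [
    "Rotten egg smell",
    "smells like rotten egg and sulfur",
    "industrial smell, headache",
]
unigrams = ngram_counts(reports, 1)
bigrams = ngram_counts(reports, 2)
```

Because lemmatization is skipped here, "smell" and "smells" are counted separately; a lemmatizer would merge them.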
In this study, we show that human-reported smell data, despite being noisy, can still enable prediction and contribute scientific knowledge of interpretable air pollution patterns. We first constructed and introduced a dataset with air quality sensor readings and smell reports from October 31, 2016 to September 27, 2018 (sme, 2018a). The sensor data were recorded hourly by twelve government-operated monitoring stations at different locations in Pittsburgh, and included timestamps, particulate matter, sulfur dioxide, carbon monoxide, nitrogen oxides, ozone, hydrogen sulfide, and wind information (direction, speed, and standard deviation of direction). The smell report data contained timestamps, zip-codes, smell ratings, descriptions of sources, symptoms, and comments. For privacy preservation, we dropped the GPS location (latitude and longitude) of the smell reports and used zip-codes instead.
We framed the smell event prediction as a supervised learning task to approximate the function $F$ that maps a predictor matrix $X$ to a response vector $y$. The predictor matrix and the response vector represented air quality data and smell events respectively. To build $X$, we re-sampled air quality data over the previous hour at the beginning of each hour. For example, at 3 pm, we took the mean value of sensor readings between 2 pm and 3 pm to construct a new sample. Wind directions were further decomposed into cosine and sine components. To equalize the effect of predictors, we normalized each column of matrix $X$ to zero mean and unit variance. Missing values were replaced with the corresponding mean values.
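The preprocessing steps in this paragraph can be sketched as follows, using only the Python standard library. The function names are hypothetical, and the use of the population standard deviation and the handling of constant columns are assumptions; the text specifies only the hourly mean, the cosine and sine decomposition, the normalization, and the mean imputation.

```python
import math
from statistics import mean, pstdev

def hourly_mean(readings):
    """Mean of the sensor readings from the previous hour, ignoring missing ones."""
    valid = [r for r in readings if r is not None]
    return mean(valid) if valid else None

def wind_components(direction_deg):
    """Decompose a wind direction in degrees into (cosine, sine) components."""
    rad = math.radians(direction_deg)
    return math.cos(rad), math.sin(rad)

def standardize_column(column):
    """Scale a column to zero mean and unit variance, then fill missing values
    with the column mean, which is 0 after centering."""
    valid = [v for v in column if v is not None]
    mu, sigma = mean(valid), pstdev(valid)
    scale = sigma if sigma > 0 else 1.0  # assumption: leave constant columns at 0
    return [0.0 if v is None else (v - mu) / scale for v in column]

sample = hourly_mean([2.0, None, 4.0])        # mean of readings in the past hour
cos_c, sin_c = wind_components(90.0)          # direction of 90 degrees
scaled = standardize_column([1.0, None, 3.0])
```

Splitting the direction into two components avoids the discontinuity between 359 and 0 degrees that a raw angle would introduce.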
To build the response vector $y$, which represents smell events, we aggregated high-rating smell reports over the future 8 hours at the beginning of each hour. We specifically chose the geographic regions that had a sufficient amount of data during aggregation (Figure 3). For instance, at 3 pm, we took the sum of smell ratings with values higher than two between 3 pm and 11 pm to obtain a confidence score, which represented the level of agreement on how likely it was that a smell event occurred. The scores were further divided into positive and negative classes (with or without smell events) by using threshold 40. In this way, we simplified the task to a binary classification problem, with 64 predictor variables (columns of the predictor matrix $X$) and 16,766 samples (rows of $X$ and $y$). The distribution of classes was highly imbalanced (only 8% positive). Besides classification, we also applied a regression approach to predict the confidence scores directly without thresholding initially. Then the predicted scores were thresholded post hoc with value 40 to produce positive and negative classes, which enabled us to compare the performance of these two approaches.
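As a minimal sketch of this labeling step, the following Python code computes a confidence score and a class label from a list of reports. The `(hour_index, rating)` representation, the half-open time window, and treating a score equal to the threshold as positive are assumptions; the text does not state how these boundaries are handled.

```python
def confidence_score(reports, start_hour, window=8, min_rating=3):
    """Sum the ratings higher than two (>= 3) of reports submitted in
    [start_hour, start_hour + window). Each report is (hour_index, rating)."""
    return sum(
        rating
        for hour, rating in reports
        if start_hour <= hour < start_hour + window and rating >= min_rating
    )

def smell_event_label(score, threshold=40):
    """1 for a smell event, 0 otherwise (assumes score >= threshold is positive)."""
    return 1 if score >= threshold else 0

# Invented reports: six rated 5 at hour 15, three rated 4 at hour 16,
# ten rated 2 at hour 17 (ignored), and four rated 5 at hour 23.
reports = [(15, 5)] * 6 + [(16, 4)] * 3 + [(17, 2)] * 10 + [(23, 5)] * 4
score_3pm = confidence_score(reports, start_hour=15)   # window 15..22
score_4pm = confidence_score(reports, start_hour=16)   # window 16..23
labels = [smell_event_label(score_3pm), smell_event_label(score_4pm)]
```

The same scores can serve as regression targets, with the threshold applied to the predicted values afterwards.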
When performing classification and regression, we added 3-hour lagged predictor variables, days of the week, hours of the day, and days of the month to the original predictor variables, which expanded the number of predictors from 64 to 195. The lagged duration was chosen during model selection. We implemented two models, Random Forest (Breiman, 2001) and Extremely Randomized Trees (Geurts et al., 2006), by using the Python scikit-learn package (Pedregosa et al., 2011). These algorithms build a collection of decision trees using the CART algorithm (Leo Breiman, 1984), where the leaves represent the responses and the branches represent the logical conjunction of predictors in the predictor matrix. There were three tunable model parameters: the number of trees in the model, the number of features to select randomly for splitting a tree node, and the minimum number of samples required to split a tree node. For simplicity, we fixed the number of trees (1,000 for classification and 200 for regression) and chose other parameters during model selection.
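The growth from 64 to 195 predictors is consistent with keeping the current hour, appending the two previous hours, and adding three time features (64 × 3 + 3 = 195). The sketch below illustrates that reading; it is an assumption inferred from the counts, and the function name and input layout are hypothetical.

```python
from datetime import datetime, timedelta

def add_lags_and_time(rows, timestamps, n_lags=2):
    """Concatenate each hourly row with its n_lags predecessors and append
    day of week, hour of day, and day of month. Rows must be consecutive hours."""
    expanded = []
    for t in range(n_lags, len(rows)):
        features = []
        for lag in range(n_lags + 1):          # lag 0 is the current hour
            features.extend(rows[t - lag])
        ts = timestamps[t]
        features.extend([ts.weekday(), ts.hour, ts.day])
        expanded.append(features)
    return expanded

# Five consecutive hourly samples with 64 placeholder predictors each.
start = datetime(2017, 1, 1, 0)
rows = [[float(h)] * 64 for h in range(5)]
stamps = [start + timedelta(hours=h) for h in range(5)]
expanded = add_lags_and_time(rows, stamps)
```

The first `n_lags` samples are dropped because they have no complete history.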
To evaluate model performance, we defined and computed true positives (TP), false positives (FP), and false negatives (FN) to obtain precision, recall, and F-score (Powers, 2011) (Figure 5). We first merged consecutive positive samples to compute the starting and ending time of smell events. Then, if a predicted event overlapped with a ground truth event, we counted this event as a TP. Otherwise, we counted a non-overlapped predicted event as an FP. For ground truth events that had no overlapping predicted events, we counted them as FN. When computing these metrics, we considered only daytime events because residents rarely submitted reports during nighttime (Figure 4). We defined daytime as 5 am to 7 pm. Since the model predicted if a smell event would occur in the next 8 hours, we only evaluated the predictions generated from 5 am to 11 am.
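The event-based counting described above can be sketched in Python as follows. The function names are hypothetical, events are represented as inclusive index ranges over hourly samples, and the daytime filter is omitted to keep the example short.

```python
def to_events(labels):
    """Merge consecutive positive samples into (start, end) index pairs (inclusive)."""
    events, start = [], None
    for i, value in enumerate(labels):
        if value and start is None:
            start = i
        elif not value and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events

def overlaps(a, b):
    """True if two inclusive intervals share at least one sample."""
    return a[0] <= b[1] and b[0] <= a[1]

def event_metrics(y_true, y_pred):
    """Return (TP, FP, FN, precision, recall, F-score) counted per event."""
    true_events, pred_events = to_events(y_true), to_events(y_pred)
    tp = sum(any(overlaps(p, t) for t in true_events) for p in pred_events)
    fp = len(pred_events) - tp
    fn = sum(not any(overlaps(t, p) for p in pred_events) for t in true_events)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return tp, fp, fn, precision, recall, f_score

y_true = [0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
y_pred = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
metrics = event_metrics(y_true, y_pred)
```

In this toy example one predicted event overlaps a true event, one predicted event overlaps nothing, and one true event is missed, so precision, recall, and F-score are all 0.5.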
We chose model parameters by using time-series cross-validation (Kohavi, 1995; Arlot and Celisse, 2010), where the entire dataset was partitioned and rolled into several pairs of training and testing subsets for evaluation. Because our predictors and responses were all time-dependent, we used previous samples to train the models and evaluated them on future data. We first divided all samples into folds, with each fold approximately representing a week. Then, starting from fold 49, we took the previous 48 folds as training data (about 8,000 samples) and the current fold as testing data (about 168 samples). This procedure was iterated for the rest of the folds, which reflected the setting of the deployed system, where a new model was trained every Sunday night by using data from the previous 48 weeks. Table 3 reports the evaluation metrics after cross-validating the models 100 times with various random seeds.
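The rolling scheme above can be written as a small generator of index ranges. This is a sketch under two assumptions that the text does not state: each fold holds exactly 168 hourly samples, and any trailing partial week is dropped. The function name is hypothetical.

```python
def rolling_week_splits(n_samples, fold_size=168, n_train_folds=48):
    """Yield (train, test) index ranges for rolling time-series validation.

    Each test fold is one week; the training set is the 48 weeks before it.
    """
    n_folds = n_samples // fold_size  # assumption: drop the incomplete last week
    for k in range(n_train_folds, n_folds):
        train = range((k - n_train_folds) * fold_size, k * fold_size)
        test = range(k * fold_size, (k + 1) * fold_size)
        yield train, test


splits = list(rolling_week_splits(16766))
first_train, first_test = splits[0]
```

With 16,766 samples this yields 99 full folds, so the first test fold is fold 49 and its training window covers 48 × 168 = 8,064 samples, in line with the "about 8,000" stated above. Training indices always precede test indices, so no future data leak into training.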
While these models enabled us to predict future smell events, they are typically considered black box models and are not suitable for interpreting patterns. Although these models provided feature importances, interpreting these weights could be problematic because several predictors in the dataset were highly correlated, and correlated predictors might appear less significant than their uncorrelated counterparts. Inspired by several previous works related to extracting knowledge from data (Shaikhina et al., 2017; Gass et al., 2014; Caruana et al., 2006), we utilized a white box model, Decision Tree, to explain a representative subset of predictors and samples, which were selected by applying feature selection (Guyon and Elisseeff, 2003) and cluster analysis. One can view this process as performing model compression to distill the knowledge in a large black box model into a compact model that is explainable to humans (Bucilua et al., 2006; Hinton et al., 2015).
During data interpretation, we only considered the classification approach because it performed better. First, we used domain knowledge to manually select features. As there were many highly correlated features, selecting a subset of them arbitrarily for extracting patterns was impractical. The knowledge obtained from informal community meetings and the result discovered in the text analysis (Figure 2) suggested that hydrogen sulfide might be the primary source of smell events. This finding inspired us to choose hydrogen sulfide, wind direction, wind speed, and standard deviation of wind direction from all of the available predictors. The current and up to 2-hour lagged predictor variables were all included. We also added interaction terms of all predictors, such as hydrogen sulfide multiplied by the sine component of wind direction. This manual feature selection procedure produced 781 features.
Then, we used DBSCAN (Ester et al., 1996) to cluster positive samples and to choose a representative subset. The distance matrix for clustering was derived from a Random Forest fitted on the manually selected features. For each sample pair, we counted the number of trees in the model in which the pair appeared in the same leaf. These counts were treated as the similarity of sample pairs and scaled to the range between 0 and 1. We then converted the similarity into a distance. This procedure identified a cluster with about 25% of positive samples from 50% of the smell events.
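The forest-derived distance and the clustering step can be sketched with scikit-learn as follows. The data are synthetic, the `eps` and `min_samples` values are arbitrary, and the function name is hypothetical. The conversion from similarity to distance is assumed here to be one minus the similarity, which is a common choice but may differ from the formula the authors used.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import RandomForestClassifier


def forest_distance(forest, X):
    """Pairwise distance from how often two samples share a leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    similarity = same_leaf.mean(axis=2)  # fraction of trees, already in [0, 1]
    return 1.0 - similarity  # assumption: distance = 1 - similarity


# Synthetic placeholder data with a minority positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Cluster only the positive samples, using the precomputed distance matrix.
X_pos = X[y == 1]
D = forest_distance(forest, X_pos)
cluster_ids = DBSCAN(eps=0.8, min_samples=5, metric="precomputed").fit_predict(D)
```

DBSCAN marks samples that belong to no cluster with the label -1, so a representative subset can be taken as the samples that share the most frequent non-negative label.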
Finally, we used recursive feature elimination (Guyon et al., 2002) to iteratively remove the 50 features with the smallest weights, which resulted in the 30 most important features. These feature importance weights were computed by fitting a Random Forest. We trained a Decision Tree using the CART algorithm (Leo Breiman, 1984) to interpret the cluster and the selected 30 features. Parameters for data interpretation (DBSCAN, Random Forest, and Decision Tree) were selected by using cross-validation. Table 3 reports the evaluation metrics after cross-validating the model 100 times with random seeds. The result showed that the model was capable of explaining the underlying pattern of about 50% of the smell events, which was a joint effect of wind information and hydrogen sulfide readings (Figure 6).
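A minimal sketch of this last step, using scikit-learn's `RFE` and `DecisionTreeClassifier` (which the scikit-learn documentation describes as an optimized version of CART). The data are synthetic, the feature names `f0`, `f1`, ... are placeholders, and the tree depth and forest size are illustrative choices rather than values from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic placeholder data with 781 candidate features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 781))
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0.5).astype(int)
feature_names = [f"f{i}" for i in range(X.shape[1])]

# Refit a Random Forest each round and drop the 50 least important features
# until 30 remain (the last round drops fewer than 50 to land exactly on 30).
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=20, random_state=0),
    n_features_to_select=30,
    step=50,
).fit(X, y)
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]

# Fit a shallow, human-readable tree on the selected features only.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X[:, selector.support_], y)
rules = export_text(tree, feature_names=selected)
```

Printing `rules` gives nested if/else thresholds on the selected features, which is the kind of compact description that Figure 6 conveys for the real data.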
In this study, we show that the system can motivate active community members to contribute data and increase their self-efficacy, beliefs about how well an individual can achieve desired effects through actions (Bandura, 1977). We recruited adult participants via snowball sampling (Biernacki and Waldorf, 1981). We administered and delivered an anonymous online survey via email to community advocacy groups and asked them to distribute the survey to others. Paper surveys were also provided. All responses were kept confidential, and there was no compensation. We received 29 responses in total over one month from March 20th to April 20th, 2018. Four responses were excluded due to incomplete questions or no experiences in interacting with the system, which gave 25 valid survey responses. There were 8 males, 16 females, and 1 person with undisclosed gender information. All but one participant had a Bachelor’s degree at minimum. The demographics of the sample population (Table 4) were not typical for the region. The survey had three sections: (1) Self-Efficacy Changes, (2) Motivation Factors, (3) System Usage Information.
For Self-Efficacy Changes, we measured changes in users' confidence in mitigating air quality problems. This section was framed as a retrospective pre-post self-assessment. The items were divided between pre-assessment, “BEFORE you knew about or used Smell Pittsburgh,” and post-assessment, “AFTER you knew about or used Smell Pittsburgh.” For both assessments, we used a scale developed by the Cornell Lab of Ornithology (Porticella et al., 2017b; DEVISE, 2010). The scale was customized for air quality to suit our purpose. The scale consisted of eight Likert-type items (from 1 “Strongly Disagree” to 5 “Strongly Agree”).
Table 5. Responses to “How often do you use Smell Pittsburgh?”
|Frequency|Count|Percentage|
|---|---|---|
|Other (the open-response text field)|9|36%|
|At least once per month|7|28%|
|At least once per week|4|16%|
|At least once per day|3|12%|
|At least once per year|2|8%|

Table 6. Responses to “How did you use Smell Pittsburgh?” (multiple responses allowed)
|Activity|Count|Percentage|
|---|---|---|
|I submitted smell reports.|22|88%|
|I checked other people’s smell reports on the map visualization.|22|88%|
|I opened Smell Pittsburgh when I noticed unusual smell.|22|88%|
|I discussed Smell Pittsburgh with other people.|21|84%|
|I provided my contact information when submitting smell reports.|14|56%|
|I paid attention to smell event alert notifications provided by Smell Pittsburgh.|13|52%|
|I shared Smell Pittsburgh publicly online (e.g. email, social media, news blog).|13|52%|
|I clicked on the playback button to view the animation of smell reports.|9|36%|
|I took screenshots of Smell Pittsburgh.|9|36%|
|I mentioned or presented Smell Pittsburgh to regulators.|6|24%|
|I downloaded smell reports data from the Smell Pittsburgh website.|4|16%|
The Motivation Factors section was based on a scale developed by the Cornell Lab of Ornithology (Porticella et al., 2017a; DEVISE, 2010) with 14 Likert-type items (from 1 “Strongly Disagree” to 5 “Strongly Agree”). The scale was customized for air quality and measured both internal (7 items) and external motivations (7 items). Examples of internal motivations included enjoyment during participation and the desire to improve air quality. Examples of external motivations included the attempt to gain rewards and to avoid negative consequences if not taking actions. A text field with question “Are there other reasons that you use Smell Pittsburgh?” was provided for open responses.
In the System Usage Information section, we collected individual experiences with Smell Pittsburgh. We documented participation level through a multiple-choice and multiple-response question, “How did you use Smell Pittsburgh?” as shown in Figure 7 (right). This question allowed participants to select from a list of 11 activities, which include submitting reports, interacting with the system, sharing experiences, and disseminating data (Table 6). We identified the frequency of system usage through a multiple-choice question, “How often do you use Smell Pittsburgh?” as shown in Table 5. Text fields were provided for both of the above two questions.
At the end of the survey, we asked an open-response question “Do you have any other comments, questions, or concerns?” Our analysis is presented below along with each related question and selected quotes. Bold emphases in the quotes were added by researchers to highlight key user sentiments.
For Self-Efficacy, we averaged the scale items to produce a total self-efficacy pre score (Mdn=3.50) and post score (Mdn=4.13) for each participant (Figure 7, left). A two-tailed Wilcoxon Signed-Ranks test (a nonparametric version of a paired t-test) indicated a statistically significant difference (W=13.5, Z=-3.79, p<.001). This finding indicated that there were increases in self-efficacy during participation.
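To make the W statistic concrete, the following sketch computes it for paired scores in plain Python. The scores are made up for illustration and are not survey data. The sketch assumes the common convention in which zero differences are discarded, tied absolute differences receive average ranks, and W is the smaller of the positive and negative rank sums; the paper does not state which convention was used.

```python
def wilcoxon_w(pre, post):
    """Return (W, W_plus, W_minus) for paired samples."""
    # Discard pairs with no change.
    diffs = [b - a for a, b in zip(pre, post) if b != a]

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), w_plus, w_minus


# Made-up paired scores for five hypothetical participants.
pre_scores = [3, 3, 4, 2, 5]
post_scores = [4, 5, 4, 1, 7]
w, w_plus, w_minus = wilcoxon_w(pre_scores, post_scores)
```

Here three of the four nonzero differences are positive, giving rank sums of 8.5 and 1.5, so W is 1.5. In practice, a library routine such as `scipy.stats.wilcoxon` would also provide the p-value.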
For Motivation Factors, we averaged the internal (Mdn=4.29) and external (Mdn=3.14) motivation scores for each participant (Figure 7, center). A two-tailed Wilcoxon Signed-Ranks test indicated a statistically significant difference (W=0, Z=-4.29, p<.001). This result suggested that internal factors were primary motivations for our participants rather than external factors. Open-ended answers showed that nine participants (36%) mentioned that the system enabled them to contribute data-driven evidence efficiently and intuitively.
“I used to try to use the phone to call in complaints, but that was highly unsatisfactory. I never knew if my complaints were even registered. With Smell Pittsburgh, I feel that I’m contributing to taking data, as well as to complaining when it’s awful. […]”
“It’s seems to be the most effective way to report wood burning that can fill my neighborhood with the smoke and emissions from wood burning.”
“The Smell app quantifies observations in real time. Researchers can use this qualitative information along quantitative data in real time. Added benefit is to have [the health department] receive this information in real time without having to make a phone call or send separate email. I have confidence that the recording of Smell app data is quantified more accurately than [the health department]’s.”
“It is an evidence based way for a citizen to register what is going on with the air where I live and work.”
“I believe in science and data and think this can help build a case. […]”
Also, four participants (16%) indicated the benefit of validating personal experiences based on the data provided by others.
“I used to (and sometimes still do) call reports in to [the health department]. I love how the map displays after I post a smell report. Wow! I’m not alone!”
“It validates my pollution experiences because others are reporting similar experiences.”
“I like using it for a similar reason that I like checking the weather. It helps me understand my environment and confirms my sense of what I’m seeing.”
We also found that altruism, the concern about the welfare of others, was another motivation. Six participants (24%) mentioned the desire to address climate change, activate regulators, raise awareness of others, expand air quality knowledge, influence policy-making, and build a sense of community.
“Because climate change is one of our largest challenges, […] Also, the ACHD isn’t as active as they should be, and needs a nudge.”
“I use [Smell Pittsburgh] to demonstrate to others how they can raise their own awareness. I’ve also pointed out to others that many who have grown up in this area of Western PA have grown up with so much pollution, to them air pollution has become normalized and many do not even smell the pollution any more. This is extremely dangerous and disturbing.”
“I want to help expand the knowledge and education of air quality in Pittsburgh and believe the visuals Smell Pittsburgh provides is the best way to do that.”
“I believe in the power of crowd-sourced data to influence policy decisions. I also believe that the air quality activism community will find more willing participants if there is a very easy way for non-activists to help support clean air, and the app provides that mechanism. It is basically a very easy onramp for potential new activists. The app also acts as a way for non-activists to see that they are not alone in their concerns about stinky air, which I believe was a major problem for building momentum in the air quality community prior to the app’s existence.”
For System Usage Information, we reported the counts for the system usage frequency question (Table 5). The result showed that our users varied widely in how often they used the system. Open responses indicated that instead of using the system regularly, eight participants (32%) only submitted reports whenever they experienced poor odors. To quantify participation levels, we counted the number of selected choices for each participant, as shown in Figure 7 (right). We found that our participation levels were normally distributed. In the open-response text field for this question, two participants (8%) mentioned using personal resources to help promote the system.
“I ran a Google Adwords campaign to get people to install Smell Pittsburgh. It turns out that about $6 of ad spending will induce someone to install the app.”
“I take and share so many screenshots! Those are awesome. […] I also made two large posters of the app screen– one on a very bad day, and one on a very good day. I bring them around to public meetings and try to get county officials to look at them.”
In the open-ended question for freely providing comments and concerns, two participants (8%) were frustrated about the lack of responses from regulators and the unclear value of using the data to take action.
“After using this app for over a year, and making many dozens of reports, I haven’t once heard from the [health department]. That is disappointing, and makes me wonder, why bother? […] Collecting this data is clever, but towards what end? I sometimes don’t see the point in continuing to report.”
“It wasn’t clear when using the app that my submission was counted […]. I want to be able to see directly that my smell reports are going somewhere and being used for something. […]”
Also, five (20%) participants suggested augmenting the current system with new features and offering this mobile computing tool to more cities. Such features involved reporting smell retrospectively and viewing personal submission records.
“I get around mostly by bike, so it is difficult to report smells the same moment I smell them. I wish I could report smells in a different location than where I am so that I could report the smell once I reach my destination.”
“It would be nice to be able to add a retroactive report. We often get the strong sulfur smells in Forest Hills in the middle of the night […] but I strongly prefer to not have to log in to my phone at 3 am to log the report as it makes it harder to get back to sleep.”
“This app should let me see/download all of my data: how many times I reported smells, what my symptoms and comments were and how many times the [health department] didn’t respond […]”
We have shown that the smell data gathered through the system is practical in identifying local air pollution patterns. We released the system in September 2016. During 2017, the system collected 8,720 smell reports, which is 10-fold more than the 796 complaints collected by the health department regulators in 2016. All smell reports in our system had location data, while the location information was missing from 45% of the regulator-collected complaints. Although there is a significant increase in data quantity (as described in the system usage study), researchers may criticize the reliability of the citizen-contributed data, since lay experiences may be subjective, inconsistent, and prone to noise. Despite these doubts, in the smell dataset study, we have applied machine learning to demonstrate the predictive and explainable power of these data. It is viable to forecast upcoming smell events based on previous observations. We also extracted connections between predictors and responses that reveal a dominant local air pollution pattern, which is a joint effect of hydrogen sulfide and wind information. This pattern could serve as hypotheses for future epidemiological studies. Since users tended not to report when there is no pollution odor, we recommend aggregating smell ratings by time to produce samples with no smell events. According to the experiments of different models, we suggest using the classification approach by thresholding smell ratings instead of the regression approach. In reality, the effect of 10 and 100 smell reports with high ratings may be the same for a local geographical region. It is highly likely that the regression function tried to fit the noise after a certain threshold that indicates the presence of a smell event.
We have also shown that the transparency of smell data empowered communities to advocate for better air quality. The findings in the survey study suggested that the system lowered the barrier for communities to contribute and communicate data-driven evidence. Although the small sample size limited the survey study, the result showed increases in self-efficacy after using the system. Several participants were even willing to use their resources to encourage others to install the system and engage in odor reporting. Moreover, in July 2018, activists attended the Board of Health meeting with the ACHD (Allegheny County Health Department) and presented a printed 230-foot-long scroll of more than 11,000 complaints submitted through the system. These reports allowed community members to ground their personal experiences with concrete and convincing data. The presented smell scroll demonstrated strong evidence about the impact of air pollution on the living quality of citizens, which forced regulators to respond to the air quality problem publicly. The deputy director of environmental health mentioned that ACHD would enact rigorous rules for coke plants: “Every aspect of the activity and operation of these coke plants will have a more stringent standard applied (Clift, 2018; Hopey, 2018).” In this case, Smell Pittsburgh rebalanced the power relationships between communities and regulators.
We have explained the design, deployment, and evaluation of a mobile smell reporting system for Pittsburgh communities to collect and visualize odor events. However, our survey study only targeted community activists, which led to a relatively small sample size. The commitment of these users may be driven by internal motivations, such as altruism, instead of the system. Also, due to system limitations in tracking user behaviors and the sparsity of air pollution events, we leave the analysis of whether push notifications encourage user engagement to future work. Additionally, our community members might be unique in their characteristics, such as the awareness of the air quality problem, the tenacity of advocacy, and the power relationships with other stakeholders. Involving citizens to address urban air pollution collaboratively is a wicked problem by its nature, so there is no guarantee that our success and effectiveness can be replicated in a different context. It is possible that interactive systems like Smell Pittsburgh can only be practical for communities with specific characteristics, such as high awareness. Future research is needed to study the impact of deploying this system in other cities that have similar or distinct community characteristics compared to Pittsburgh. It can also be beneficial to explore ways to connect citizens and regulators, such as visualizing smell reports by voting districts, providing more background information with demographics and industry data, and sending push notifications regarding health agency public meetings.
Furthermore, from a machine-learning standpoint, community-powered projects such as Smell Pittsburgh often face two challenges that compromise model performance: data sparsity and label unreliability. Recent research has shown that deep neural networks can predict events effectively when equipped with a significant amount of training data (LeCun et al., 2015). However, the number of collected smell reports is far from that level due to the limited size of the community and the number of active users. The 3,917 participating users out of the 300,000 residents (1.3%) are not sufficient to cover the entire Pittsburgh area. Additionally, in our case, air pollution incidents can only be captured at the moment they occur because our communities lack the resources to deploy reliable air quality monitoring sensors. It is impractical to annotate these incidents off-line, as is done in Galaxy Zoo (Raddick et al., 2013). While adopting transfer learning could take advantage of existing large-scale datasets from different domains to boost our performance (Pan and Yang, 2010), data sparsity is a nearly inevitable issue that must be taken into consideration for many community-powered projects. Another issue is label unreliability. There is no real “ground truth” air pollution data in our case. The smell events defined in this research were based on the consensus of a group of users. As a consequence, the quality of the labels for the prediction and interpretation tasks could be influenced by confirmation bias, where people tend to search for information that confirms their prior beliefs. This type of systematic error may be difficult to avoid, especially for community-powered projects. Future work to address these two challenges involves adding more predictors (e.g., weather forecasts, air quality index) and using generalizable data interpretation techniques that can explain any predictive model to identify more patterns (Ribeiro et al., 2016).
This paper explores the design and impact of a mobile smell reporting system, Smell Pittsburgh, to empower communities in advocating for better air quality. The system enables citizens to submit and visualize odor experiences without assistance from professionals. The visualization presents the context of air quality concerns from multiple perspectives as evidence. In our evaluation, we studied the distribution of smell reports and interaction events among different types of users. We also constructed a smell event dataset to study the value of these citizen-contributed data. By adopting machine learning, we developed a model to predict smell events and send push notifications accordingly. We also trained an explainable model to reveal connections between air quality sensor readings and smell events. Using a survey, we studied motivation factors for submitting smell reports and measured changes in user attitudes after using the system. Finally, we discussed limitations and future directions: deploying the system in multiple cities and using advanced techniques for pattern recognition. We envision that this research can inspire engineers, designers, and researchers to develop systems that support community advocacy and empowerment.
Gene selection for cancer classification using support vector machines. Machine Learning 46, 1-3 (2002), 389–422.
inAir: a longitudinal study of indoor air quality measurements and visualizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2745–2754.
Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2 (IJCAI’95). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1137–1143. http://dl.acm.org/citation.cfm?id=1643031.1643047
Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.