
Smell Pittsburgh: Community-Empowered Mobile Smell Reporting System

by   Yen-Chia Hsu, et al.
Penn State University
Carnegie Mellon University

Urban air pollution has been linked to various human health considerations, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently concentrated. All smell report data are publicly accessible online. These reports are also sent to the local health department and visualized on a map along with air quality data from monitoring stations. This visualization provides a comprehensive overview of the local pollution landscape. Additionally, with these reports and air quality data, we developed a model to predict upcoming smell events and send push notifications to inform communities. Our evaluation of this system demonstrates that engaging residents in documenting their experiences with pollution odors can help identify local air pollution patterns, and can empower communities to advocate for better air quality.



1. Introduction

Figure 1. The user interface of Smell Pittsburgh. The left image shows the submission console for describing smell characteristics, explaining symptoms, and providing notes for the local health department. The right image shows the visualization of smell reports, sensors, and wind directions.

Air pollution has been associated with adverse impacts on human health, including respiratory and cardiovascular diseases (Kampa and Castanas, 2008; Pope III and Dockery, 2006; Dockery et al., 1993; Prüss-Üstün and Neira, 2016; WHO, 2016). Addressing air pollution often involves negotiations between corporations and regulators, who hold power to improve air quality. However, the communities, who are directly affected by the pollution, are rarely influential in policy-making. Their voices typically fail to persuade decision-makers because collecting and presenting reliable evidence to support their arguments is resource-intensive. Forming such evidence requires collecting and analyzing multiple sources of data over a large geographic area and an extended period. This task is challenging due to the requirements of financial resources, organizational networks, and access to technology. Due to the power imbalance and resource inequality, affected residents usually rely on experts in governmental agencies, academic institutions, or non-governmental organizations to analyze and track pollution sources.

A straightforward solution is to empower the affected communities directly. In this research, we demonstrate how citizen science can be used for communities to pool resources and efforts to gather evidence for advocacy. Data-driven evidence, especially when integrated with narratives, is essential for communities to make sense of local environmental issues and take action (Ottinger, 2017b). However, citizen-contributed data is often held in low regard because the information can be unreliable or include errors during data entry. Also, sufficient citizen participation and data transparency are required for the evidence to be influential. For instance, the city involved in this study, Pittsburgh, is one of the ten most polluted cities in the United States (American Lung Association, 2017). Currently, Pittsburgh citizens report air quality problems to the local health department via its phone line or website.

Nevertheless, the quality of the gathered data is doubtful. Citizens may not remember the exact time and location at which pollution odors occurred. Asking citizens to submit complaints retrospectively makes it hard to capture accurate details and is prone to errors. Such errors can result in missing or incomplete data that can affect the outcome of statistical analysis to identify pollution sources (Devillers and Jeansoulin, 2006). Furthermore, the reporting process is not transparent and does not encourage citizens to contribute data. There is no real-time feedback or way of sharing experiences to forge a sense of community. Without data that adequately represents the community, it is difficult to know if an air pollution problem is at a neighborhood or city-wide scale. This approach is inadequate for data collection and hinders participation in bringing air quality issues to the attention of regulators and advocating for policy changes.

Because of these challenges, resident-reported smell data has not gained much attention as a critical tool for monitoring air pollution. However, literature has shown that the human olfactory system can distinguish more than one trillion odors (Bushdid et al., 2014) and outperform sensitive measuring equipment in odor detection tasks (Shepherd, 2004). Although there have been discussions about the potential of using smell to indicate pollution events and support decision making (Ottinger, 2010; Obrist et al., 2014), no prior work has collected long-term smell data at a city-wide scale and studied whether these data are useful for air pollution monitoring and community advocacy.

We propose a system, Smell Pittsburgh (sme, 2018b), for citizens to report pollution odors to the local health department with accurate time and GPS location data via smartphones. The system visualizes odor complaints in real-time, which enables residents to confirm their experiences by viewing if others also share similar experiences. Additionally, we present a dataset of smell reports and air quality measurements from nearby monitoring stations over 21 months (sme, 2018a). We use the dataset to develop a model that predicts upcoming pollution odors and send push notifications to users. We also apply machine learning to identify relationships between smell reports and air quality measurements. Finally, we describe qualitative and quantitative studies for understanding changes in user engagement and motivation. To the best of our knowledge, Smell Pittsburgh is the first system of its kind that demonstrates the potential of collecting and using smell data to form evidence about air quality issues at a city-wide scale. Although stakeholders typically view odor experiences as subjective and noisy, our work shows that smell data is beneficial in identifying urban air pollution patterns and empowering communities to pursue a sustainable environment.

2. Related Work

This research is rooted in citizen science, which empowers amateurs and professionals to form partnerships and produce scientific knowledge (Science Communication Unit, 2013; Bonney et al., 2014; Bonney et al., 2016; McKinley et al., 2015; Eitzel et al., 2017). Historically, there exist both research and community-oriented strategies (Cooper and Lewenstein, 2016). Research-oriented citizen science aims to address large-scale research questions which are infeasible for scientists to tackle alone (Bonney et al., 2009a; Silvertown, 2009; Cohn, 2008; Dickinson et al., 2012; Dickinson and Bonney, 2012; Miller-Rushing et al., 2012; Bonney et al., 2009b; Cooper et al., 2007). Research questions under this strategy are often driven by professional scientists. Researchers applying this strategy study how scientists can encourage the public to participate in scientific research. In contrast, community-oriented citizen science aims to democratize science by equipping citizens with tools to directly target community concerns for advocacy (Irwin, 1995; Greaves and Lishman, 1980; Wilsdon et al., 2005; Stilgoe, 2009; Irwin, 2001; Paulos et al., 2008; Irwin, 2006; Stilgoe et al., 2014; Ottinger, 2016; Chari et al., 2017; Hsu, 2018; Corburn, 2005). Research questions under this strategy are often driven by community members, exploring how scientists can engage in social and ethical issues that are raised by citizens or communities. Our research focuses on the community-oriented approach. This approach is highly related to sustainable Human-Computer Interaction (DiSalvo et al., 2009; DiSalvo et al., 2010; Brynjarsdottir et al., 2012; Blevis, 2007; Mankoff et al., 2007; Dourish, 2010), which studies the intervention of information technology for increasing the awareness of sustainability, changing user behaviors, and influencing attitudes of affected communities. 
We seek to generate scientific knowledge from community data to support citizen-driven exploration, understanding, and dissemination of local air quality concerns.

2.1. Community Data in Citizen Science

Modern technology allows communities to collect data that can contextualize and express their concerns. There are typically two types of community data, which are generated from either sensors or proactive human reports. Each type of data provides a small fragment of evidence. When it comes to resolving and revealing community concerns, human-reported data can show how experiences of residents are affected by local issues, but it is typically noisy, ambiguous, and hard to quantify at a consistent scale. Sensing data can complement human-reported data by providing temporally dense and reliable measurements of environmental phenomena but fails to explain how these phenomena affect communities. Without integrating both types of data, it is difficult to understand the context of local concerns and produce convincing evidence.

2.1.1. Human-Reported Data

Human-reported data includes observations contributed by users. Modern computational tools can collect volunteered geographic information (Haklay, 2013) and aggregate them to produce scientific knowledge. However, most prior works focused on collecting general information of particular interest, rather than data of a particular type of human sense, such as odor. Ushahidi gathers crisis information via text messages or its website to provide timely transparent information to a broader audience (Okolloh, 2009). Creek Watch is a monitoring system which enabled citizens to report water flow and trash data in creeks (Kim et al., 2011). Sensr is a tool for creating environmental data collection and management applications on mobile devices without programming skills (Kim et al., 2013a, 2015). Encyclopedia of Life is a platform for curating species information contributed by professionals and non-expert volunteers (Rotman et al., 2012). eBird is a crowdsourcing platform to engage birdwatchers, scientists, and policy-makers to collect and analyze bird data collaboratively (Sullivan et al., 2014, 2009). Tiramisu was a transit information system for collecting GPS location data and problem reports from bus commuters (Zimmerman et al., 2011). One of the few examples focusing on information of a specific modality is NoiseTube, a mobile application that empowered citizens to report noise via their mobile phones and mapped urban noise pollution on a geographical heatmap (Maisonneuve et al., 2009; D’Hondt et al., 2013). The tool could be utilized for not only understanding the context of urban noise pollution but also measuring short-term or long-term personal exposure.

2.1.2. Sensing Data

Sensing data involves environmental measurements quantified with sensing devices or systems, which enable citizens to monitor their surroundings with minimal to no assistance from experts. However, while many prior works used sensors to monitor air pollution, none of them complemented the sensing data with human-reported data. MyPart is a low-cost and calibrated wearable sensor for measuring and visualizing airborne particles (Tian et al., 2016). Speck is an indoor air quality sensor for measuring and visualizing fine particulate matter (Taylor and Nourbakhsh, 2015; Taylor, 2016). Kim et al. implemented an indoor air quality monitoring system to gather air quality data from commercial sensors (Kim et al., 2013b). Kuznetsov et al. developed multiple air pollution monitoring systems which involved low-cost air quality sensors and a map-based visualization (Kuznetsov et al., 2011; Kuznetsov et al., 2013). Insights from these works showed that sensing data, especially accompanied by visualizations, could provide context and evidence that might raise awareness and engage local communities to participate in political activism. But none of these works asked users to report odors, and thus they cannot directly capture how air pollution affects the living quality of community members.

2.2. Machine Learning for Citizen Science

Citizen science data are typically high-dimensional, noisy, potentially correlated, and spatially or temporally sparse. The collected data may also suffer from many types of bias and error that sometimes can even be unavoidable (Budde et al., 2017; Bird et al., 2014). Making sense of such noisy data has been a significant concern in citizen science (Ottinger, 2017a; Newman et al., 2012), especially for untrained contributors (Cohn, 2008; Ottinger, 2010; Bonney et al., 2014; Den Broeder et al., 2016). To assist community members in identifying evidence from large datasets efficiently, prior projects used machine learning algorithms to predict future events or interpret collected data (Bishop, 2006; Mitchell, 1997; Hastie et al., 2009; James et al., 2013; Jordan and Mitchell, 2015; Bird et al., 2014; Bellinger et al., 2017).

2.2.1. Prediction

Prediction techniques aim to forecast the future accurately based on previous observations. Zheng et al. developed a framework to predict air quality readings of a monitoring station over the next 48 hours based on meteorological data, weather forecasts, and sensor readings from other nearby monitoring stations (Zheng et al., 2015). Azid et al. used principal component analysis and an artificial neural network to identify pollution sources and predict air pollution (Azid et al., 2014). Donnelly et al. combined kernel regression and multiple linear regression to forecast the concentrations of nitrogen dioxide over the next 24 and 48 hours (Donnelly et al., 2015). Hsieh et al. utilized a graphical model to predict the air quality of a given location grid based on data from sparse monitoring stations (Hsieh et al., 2015). These studies applied prediction techniques to help citizens plan daily activities and also inform regulators in controlling air pollution sources. Most of these studies focus on forecasting or interpolating sensing data. To the best of our knowledge, none of them considered human-reported data in their predictive models.

2.2.2. Interpretation

Interpretation techniques aim to extract knowledge from the collected data. This knowledge can help to discover potential interrelationships between predictors and responses, which is known to be essential in analyzing the impacts of environmental issues in the long-term (Brown, 1992; Den Broeder et al., 2016). Gass et al. investigated the joint effects of outdoor air pollutants on emergency department visits for pediatric asthma by applying Decision Tree learning (Gass et al., 2014). The authors suggested using Decision Tree learning to hypothesize about potential joint effects of predictors for further investigation. Stingone et al. trained decision trees to identify possible interaction patterns between air pollutants and math test scores of kindergarten children (Stingone et al., 2017). Hochachka et al. fused traditional statistical techniques with boosted regression trees to extract species distribution patterns from the data collected via the eBird platform (Hochachka et al., 2012). These previous studies utilized domain knowledge to fit machine learning models with high explanatory powers on filtered citizen science data. In this paper, we also used Decision Tree to explore hidden interrelationships in the data. This extracted knowledge can reveal local concerns and serve as convincing evidence for communities in taking action.

3. Design Principles and Challenges

Our goals are (i) to develop a system that can lower the barriers to contribute smell data and (ii) to make sure the data is useful in studying the impact of urban air pollution and advocating for better air quality. Each goal yields a set of design challenges.

3.1. Collecting Smell Data at Scale with Ease

Outside the scope of citizen science, a few works have collected human-reported smell data in various manners. However, these manners are not suitable for our project. For example, prior works have applied a smell-walking approach to record and map the landscape of smell experiences by recruiting participants to travel in cities (Henshaw, 2013; Quercia et al., 2015, 2016). This process is labor intensive and hard to sustain for long-term air quality monitoring. Hsu et al. have also demonstrated that resident-reported smell reports, collected via Google Forms, can form evidence about air pollution when combined with data from cameras and air quality sensors (Hsu et al., 2017; Hsu et al., 2016). While Google Forms is usable for a small study, it would not be effective in collecting smell reports on a city-wide scale with more than 300,000 affected residents over several years. Therefore, we developed a mobile system that records GPS locations and timestamps automatically. The system is specialized for gathering smell data at a broad temporal and geographical scale.

3.2. What is Useful Data? A Wicked Problem

There is a lack of research in understanding the potential of using smell as an indicator of urban air pollution. Moreover, we recognized that there are various methods of collecting, presenting, and using the data. It is not feasible to explore and evaluate all possible methods without deploying the system in the real-world context. These challenges form a wicked problem (Conklin, 2005; Rittel and Webber, 1973), which refers to problems that have no precise definition, cannot be fully observed at the beginning, are unique and depend on context, have no opportunities for trial and error, and have no optimal or “right” solutions. In response to this challenge, our design principle is inspired by how architects and urban designers address wicked problems. When approaching a community or city-scale problem, architects and urban planners first explore problem attributes (as defined in (Pena and Parshall, 2012)) and then design specific solutions based on prior empirical experiences. We made use of an existing network of community advocacy groups, including ACCAN (ACCAN, 2018), GASP (GASP, 2018), Clean Air Council (CAC, 2018), PennFuture (PennFuture, 2018), and PennEnvironment (PennEnvironment, 2015). These groups were pivotal in shaping the design of Smell Pittsburgh and providing insights into how to engage the broader Pittsburgh community.

Moreover, to sustain participation, we visualized smell report data on a map and also engaged residents through push notifications. To add more weight to citizen-contributed pollution odor reports, we engineered the application to send smell reports directly to the Allegheny County Health Department (ACHD). This strategy ensured that the local health department could access high-resolution citizen-generated pollution data to better ascertain and address potential pollution sources in our region. We met and worked with staff in ACHD to determine how they hoped to utilize smell report data and adjusted elements of the application to better suit their needs, such as sending data directly to their database and using these data as evidence of air pollution. Based on their feedback, the system submitted all smell reports to the health department, regardless of the smell rating. This approach provided ACHD with a more comprehensive picture of the local pollution landscape.

In summary, when developing Smell Pittsburgh, we considered the system as an ongoing infrastructure to sustain communities over time (as mentioned in (Dantec and DiSalvo, 2013)), rather than a software product which solves a single well-defined problem. The system is designed to influence citizen participation and reveal community concerns simultaneously, which is different from observational studies that use existing data, such as correlating air quality keywords from social media contents with environmental sensor measurements (Ford et al., 2017).

4. System

Smell Pittsburgh is a system, distributed through iOS and Android devices, to collect smell reports and track urban pollution odors. We now describe two system features: (1) a mobile interface for submitting and visualizing odor complaints and (2) push notifications for predicting the potential presence of odor events.

4.1. Submitting and Visualizing Smell Reports

Users could report odor complaints via Smell Pittsburgh from their mobile devices via the submission console (Figure 1, left). To submit a report, users first selected a smell rating from 1 to 5, with one being “just fine” and five being “about as bad as it gets.” These ratings, their color, and the corresponding descriptions were designed by affected local community members to mimic the US EPA Air Quality Index (EPA, 2014). Also, users could fill out optional text fields where they could describe the smell (e.g., industrial, rotten egg), their symptoms related to the odor (e.g., headache, irritation), and their personal experiences. Once a user submitted a smell report, the system sent it to the local health department and anonymously archived it on our backend database. Users could decide if they were willing to provide their contact information to the health department through the system setting panel. Regardless of the setting, our database did not record the personal information.

The system visualized smell reports on a map that also depicted fine particulate matter and wind data from government-operated air quality monitoring stations (Figure 1, right). All smell reports were anonymous, and their geographical locations were skewed to preserve privacy. When clicking or tapping on the playback button, the application animated 24 hours of data for the currently selected day, which served as convincing evidence of air quality concerns. Triangular icons indicated smell reports with colors that correspond to smell ratings. Users could click on a triangle to view details of the associated report. Circular icons showed government-operated air quality sensor readings with colors based on the Air Quality Index (EPA, 2014) to indicate the severity of particulate pollution. Blue arrows showed wind directions measured from nearby monitoring stations. The timeline on the bottom of the map represented the concentration of smell reports per day with grayscale squares. Users could view data for a date by selecting the corresponding square.

4.2. Sending Push Notifications

Smell Pittsburgh sent post hoc and predictive event notifications to encourage participation. When there were a sufficient number of poor odor reports during the previous hour, the system sent a post hoc event notification: “Many residents are reporting poor odors in Pittsburgh. Were you affected by this smell event? Be sure to submit a smell report!” The intention of sending this notification was to encourage users to check and report if they had similar odor experiences. Second, to predict the occurrence of abnormal odors in the future, we applied machine learning to model the relationships between aggregated smell reports and air quality measurements from the past. We defined the timely and geographically aggregated reports as smell events, which indicated that there would be many high-rating smell reports within the next 8 hours. Each day, whenever the model predicted a smell event, the system sent a predictive notification: “Local weather and pollution data indicates there may be a Pittsburgh smell event in the next few hours. Keep a nose out and report smells you notice.” The goal of making the prediction was to support users in planning daily activities and encourage community members to pay attention to the air quality. To keep the prediction system updated, we computed a new machine learning model every Sunday night based on the data collected previously.
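The post hoc trigger described above can be sketched as a simple count over the previous hour. The text does not state what counts as a "sufficient number" of reports or as a "poor" rating, so `min_reports` and `min_rating` below are hypothetical placeholders, and the function name and data layout are assumptions made for illustration rather than the deployed implementation.

```python
from datetime import datetime, timedelta


def should_notify_post_hoc(reports, now, min_reports=5, min_rating=3):
    """Return True when enough poor-odor reports arrived in the last hour.

    reports: iterable of (timestamp, rating) pairs.
    min_reports and min_rating are hypothetical values; the source only
    says "a sufficient number of poor odor reports".
    """
    window_start = now - timedelta(hours=1)
    recent_poor = [
        rating
        for timestamp, rating in reports
        if window_start <= timestamp < now and rating >= min_rating
    ]
    return len(recent_poor) >= min_reports


now = datetime(2018, 1, 1, 15, 0)
reports = [(datetime(2018, 1, 1, 14, 10 + i), 4) for i in range(5)]
send_notification = should_notify_post_hoc(reports, now)
```

A predictive notification would replace the count with the output of the trained model, which the text says was recomputed every Sunday night.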

5. Evaluation

The evaluation shows that using smell experiences is practical for revealing urban air quality concerns and empowering communities to advocate for a sustainable environment. Our goal is to evaluate the impact of deploying interactive systems on communities rather than the usability (e.g., the time of completing tasks). We believe that it is more beneficial to ask “Is the system influential?” instead of “Is the system useful?” We now discuss three studies: (i) system usage information of smell reports and interaction events, (ii) a dataset for predicting and interpreting smell event patterns, and (iii) a survey of attitude changes and motivation factors.

5.1. System Usage Study

In this study, we show the usage patterns on mobile devices by parsing server logs and Google Analytics events. From our initial testing with the community in September 2016 to the end of September 2018, we had about 5,790 and 1,070 installations (rounded to the nearest 10) of Smell Pittsburgh on iOS and Android devices respectively in the United States. We excluded data generated during the system stability testing phase in September and October 2016. From our soft launch in November 2016 to the end of September 2018, a period of 23 months, there were 3,917 unique anonymous users (estimated by Google Analytics) in Pittsburgh. Our users contributed 17,280 smell reports, 582,108 alphanumeric characters in the submitted text fields, and 163,609 events of interacting with the visualization (e.g., clicking on icons on the map). Among all reports, 76% had ratings greater than two.

Group        | Users | Smell reports | Characters | GA events
Enthusiasts  | 9.6%  | 52.6%         | 64.4%      | 47.7%
Explorers    | 40.4% | 38.9%         | 29.3%      | 30.5%
Contributors | 11.8% | 8.5%          | 6.2%       | -
Observers    | 38.2% | -             | -          | 21.8%
Size (N)     | 3,917 | 17,280        | 582,108    | 163,609
Table 1. Percentage (rounded to the first decimal place) and the total size of user groups. Abbreviation “GA” means Google Analytics. Characters mean the number of characters that users entered in the text fields of reports.

Group        | Reports ∀ user | Characters ∀ report | GA events ∀ user | Hours difference ∀ event
Enthusiasts  | 16±8           | 18±20               | 117±87           | 10±18
Explorers    | 2±2            | 10±13               | 15±14            | 13±34
Contributors | 1±1            | 10±14               | -                | -
Observers    | -              | -                   | 9±9              | 21±52
All          | 3±3            | 14±19               | 14±17            | 12±30
Table 2. Statistics of user groups (median ± interquartile range), rounded to the nearest integer. Symbol ∀ means “for each.” Abbreviation “GA” means Google Analytics. Characters mean the number of characters that users entered in the text fields of reports. Hours difference means the number of hours between the hit and data timestamps.

To investigate the distribution of smell reports and interaction events among our users, we divided all users into four types: enthusiasts, explorers, contributors, and observers (Table 1). Contributors submitted reports but did not interact with the visualization. Observers interacted with the visualization but did not submit reports. Enthusiasts submitted more than 6 reports and interacted with the visualization more than 31 times. Thresholds 6 and 31 were the median of the number of submitted reports and interaction events for all users respectively, plus their interquartile ranges. Explorers submitted 1 to 6 reports or interacted with the visualization 1 to 31 times. We were interested in four variables with different distributions among user groups, which represented their characteristics (Table 2). First, for each user, we computed the number of submitted reports and interaction events. Then, for each smell report, we calculated the number of alphanumeric characters in the submitted text fields. Finally, for interaction events that involved viewing previous data, we computed the time difference between hit timestamps and data timestamps. These two timestamps represented when users interacted with the system and when the data were archived respectively. Distributions of all variables differed from normal distributions (normality test p<.001) and were skewed toward zero.
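The grouping rule above can be written as a small function. This is a minimal sketch: the thresholds 6 and 31 come from the text, while the function name, the string labels, and the treatment of users who did both activities but exceeded only one threshold (counted here as explorers) are assumptions.

```python
# Thresholds from the text: median plus interquartile range over all users.
REPORT_THRESHOLD = 6
EVENT_THRESHOLD = 31


def classify_user(num_reports: int, num_events: int) -> str:
    """Assign a user to one of the four groups described in the text."""
    if num_reports == 0 and num_events == 0:
        # Hypothetical label: users with no activity are outside the study.
        return "inactive"
    if num_events == 0:
        return "contributor"  # submitted reports, never used the visualization
    if num_reports == 0:
        return "observer"  # used the visualization, never submitted a report
    if num_reports > REPORT_THRESHOLD and num_events > EVENT_THRESHOLD:
        return "enthusiast"
    # Assumption: everyone else who did both activities is an explorer.
    return "explorer"


groups = [classify_user(r, e) for r, e in [(16, 117), (2, 15), (1, 0), (0, 9)]]
```

The four example users take the median values of the corresponding rows of Table 2 and fall into the enthusiast, explorer, contributor, and observer groups respectively.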

The user group study showed highly skewed user contributions. About 32% of the users submitted only one report. About 48% and 81% of the users submitted fewer than three and ten reports respectively, which aligned with the typical pattern in citizen science projects that many volunteers participate only a few times (Sauermann and Franzoni, 2015). Moreover, the user groups differed regarding the type and amount of data they contributed. Table 1 shows that enthusiasts, corresponding to less than 10% of the users, contributed about half of the data overall. Table 2 indicates the characteristics of these groups. Enthusiasts tended to contribute more smell reports, more alphanumeric characters per report, and more interaction events. Observers tended to browse data that were far away from the interaction time. Further investigation of the enthusiast group revealed a moderate positive association (Pearson correlation coefficient r=.50, n=375, p<.001) between the number of submitted reports and the number of user interaction events.

Figure 2. Text analysis of high frequency words (unigram) and phrases (bigram) in the text fields of all smell reports. Most of them describe industrial pollution odors and symptoms of air pollution exposure, especially hydrogen sulfide (rotten egg smell).

To identify critical topics in citizen-contributed smell reports, we analyzed the frequency of words (unigram) and phrases (bigram) in the text fields. We used the Python NLTK package (Bird et al., 2009) to remove stop words and group similar words with different forms (lemmatization). Figure 2 shows that high-frequency words and phrases mostly described industrial pollution odors and symptoms of air pollution exposure, especially hydrogen sulfide, which has a rotten egg smell and can cause headache, dizziness, eye irritation, sore throat, cough, nausea, and shortness of breath (Lindenmann et al., 2010; Reiffenstein et al., 1992; Guidotti, 2010; Council et al., 2009). This finding inspired us to examine how hydrogen sulfide affected urban odors in the next study.
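The unigram and bigram counting can be illustrated with the standard library alone. Note the substitution: this sketch uses a tiny hand-written stop-word list and skips lemmatization, whereas the study used NLTK for both steps. The sample report texts and all names are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; it stands in for NLTK's full list.
STOP_WORDS = {"a", "an", "and", "the", "of", "like", "is", "it", "in", "my", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def ngram_counts(reports, n):
    """Count n-grams per report, so no n-gram spans two reports."""
    counts = Counter()
    for text in reports:
        tokens = tokenize(text)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up example reports, not taken from the dataset.
reports = [
    "Rotten egg smell and a headache",
    "Smells like rotten egg in the morning",
    "Industrial smell, sore throat",
]
unigrams = ngram_counts(reports, 1)
bigrams = ngram_counts(reports, 2)
top_bigram = bigrams.most_common(1)[0]
```

Without lemmatization, "smell" and "smells" are counted separately, which is exactly the kind of split that the lemmatization step in the study avoids.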

5.2. Smell Dataset Study

Figure 3. The distribution of smell reports on selected zip code regions. The integers on each zip code region indicate the number of reports.
Figure 4. The average smell values aggregated by hour of day and day of week. Our users rarely submit smell reports at nighttime.

In this study, we show that human-reported smell data, despite being noisy, can still enable prediction and contribute scientific knowledge of interpretable air pollution patterns. We first constructed and introduced a dataset with air quality sensor readings and smell reports from October 31, 2016 to September 27, 2018 (sme, 2018a). The sensor data were recorded hourly by twelve government-operated monitoring stations at different locations in Pittsburgh, which included timestamps, particulate matter, sulfur dioxide, carbon monoxide, nitrogen oxides, ozone, hydrogen sulfide, and wind information (direction, speed, and standard deviation of direction). The smell report data contained timestamps, zip-codes, smell ratings, descriptions of sources, symptoms, and comments. For privacy preservation, we dropped the GPS location (latitude and longitude) of the smell reports and used zip-codes instead.

We framed the smell event prediction as a supervised learning task to approximate the function $F$ that maps a predictor matrix $X$ to a response vector $y$. The predictor matrix and the response vector represented air quality data and smell events respectively. To build $X$, we re-sampled air quality data over the previous hour at the beginning of each hour. For example, at 3 pm, we took the mean value of sensor readings between 2 pm and 3 pm to construct a new sample. Wind directions were further decomposed into cosine and sine components. To equalize the effect of predictors, we normalized each column of matrix $X$ to zero mean and unit variance. Missing values were replaced with the corresponding mean values.
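The preprocessing steps in this paragraph can be sketched with the standard library alone. The function names, the `(time, value)` reading format, and the choice to compute the mean and standard deviation from observed values before filling gaps are assumptions made for illustration; the text does not specify these details.

```python
import math
from statistics import mean, pstdev


def wind_components(direction_deg):
    """Decompose a wind direction in degrees into (cosine, sine)."""
    rad = math.radians(direction_deg)
    return math.cos(rad), math.sin(rad)


def hourly_mean(readings, hour):
    """Mean of (time_in_hours, value) readings in [hour - 1, hour)."""
    window = [v for t, v in readings if hour - 1 <= t < hour]
    return mean(window) if window else None  # None marks a missing value


def standardize(column):
    """Scale observed values to zero mean and unit variance.

    Missing entries (None) are replaced with the column mean, which is
    0.0 after scaling. Using observed-only statistics is an assumption.
    """
    observed = [v for v in column if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed)
    if sigma == 0:
        return [0.0 for _ in column]
    return [0.0 if v is None else (v - mu) / sigma for v in column]


# Hypothetical hydrogen sulfide readings around 3 pm (hour 15).
h2s = [(14.1, 2.0), (14.5, 4.0), (15.2, 9.0)]
print(hourly_mean(h2s, 15))            # mean of the 2 pm to 3 pm readings
print(standardize([1.0, None, 3.0]))   # the missing value becomes 0.0
```

Decomposing direction into cosine and sine avoids the artificial jump between 359 and 0 degrees that a raw angle would introduce.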

To build $y$, which represents smell events, we aggregated high-rating smell reports over the future 8 hours at the beginning of each hour. We specifically chose the geographic regions that had a sufficient amount of data during aggregation (Figure 3). For instance, at 3 pm, we took the sum of smell ratings with values higher than two between 3 pm and 11 pm to obtain a confidence score, which represented agreement on how likely it was that a smell event occurred. The scores were further divided into positive and negative classes (with or without smell events) by using a threshold of 40. In this way, we simplified the task to a binary classification problem, with 64 predictor variables (columns of $X$) and 16,766 samples (rows of $X$ and $y$). The distribution of classes was highly imbalanced (only 8% positive). Besides classification, we also applied a regression approach to predict the confidence scores directly, without thresholding them first. The predicted scores were then thresholded post hoc at 40 to produce positive and negative classes, which enabled us to compare the performance of these two approaches.
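As a worked illustration of the labeling rule, the sketch below sums ratings above two over the next 8 hours and applies the threshold of 40. The report tuples are made up, and treating a score equal to 40 as positive is an assumption, since the text does not say whether the comparison is strict.

```python
def confidence_score(reports, hour, horizon=8, min_rating=3):
    """Sum ratings higher than two submitted in [hour, hour + horizon)."""
    return sum(
        rating
        for t, rating in reports
        if hour <= t < hour + horizon and rating >= min_rating
    )


def label(score, threshold=40):
    """1 for a smell event, 0 otherwise (>= is an assumed convention)."""
    return 1 if score >= threshold else 0


# Hypothetical (time_in_hours, rating) reports for one afternoon.
reports = [(15.5, 5)] * 6 + [(18.0, 4)] * 3 + [(16.0, 2), (23.5, 5)]

score_3pm = confidence_score(reports, 15)  # window 3 pm to 11 pm
score_4pm = confidence_score(reports, 16)  # window 4 pm to midnight
print(score_3pm, label(score_3pm))  # 42 1
print(score_4pm, label(score_4pm))  # 17 0
```

The regression variant described above would predict `score` directly and apply `label` to the prediction afterwards.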

When performing classification and regression, we added 3-hour lagged predictor variables, days of the week, hours of the day, and days of the month to the original predictor variables, which expanded their number from 64 to 195. The lagged duration was chosen during model selection. We implemented two models, Random Forest (Breiman, 2001) and Extremely Randomized Trees (Geurts et al., 2006), by using the Python scikit-learn package (Pedregosa et al., 2011). These algorithms build a collection of decision trees using the CART algorithm (Leo Breiman, 1984), where the leaves represent the responses and the branches represent the logical conjunction of predictors in $X$. There were three tunable model parameters: the number of trees in the model, the number of features to select randomly for splitting a tree node, and the minimum number of samples required to split a tree node. For simplicity, we fixed the number of trees (1,000 for classification and 200 for regression) and chose the other parameters during model selection.
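The stated growth from 64 to 195 predictors is consistent with one reading of the text: the 64 predictors for the current hour and the two preceding hours, plus three time features, give 64 × 3 + 3 = 195. The sketch below follows that reading, which is an assumption; the helper name and placeholder values are hypothetical.

```python
from datetime import datetime, timedelta

N_BASE = 64   # predictors per hour, as stated in the text
N_HOURS = 3   # assumed: current hour plus the two preceding hours


def expand(rows, timestamps, i):
    """Feature vector for sample i: lagged predictors plus time features."""
    features = []
    for lag in range(N_HOURS):
        features.extend(rows[i - lag])  # lag 0 is the current hour
    ts = timestamps[i]
    features.extend([ts.weekday(), ts.hour, ts.day])
    return features


start = datetime(2016, 10, 31, 0)
timestamps = [start + timedelta(hours=h) for h in range(5)]
rows = [[float(h)] * N_BASE for h in range(5)]  # placeholder sensor values
vector = expand(rows, timestamps, 4)
print(len(vector))  # 195
```

Samples at the very start of the series lack a full lag window and would need to be dropped or padded before training.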

Figure 5. This figure shows true positives (TP), false positives (FP), and false negatives (FN). The x-axis represents time. The blue and red boxes indicate ground truth and predicted smell events respectively.

To evaluate model performance, we defined and computed true positives (TP), false positives (FP), and false negatives (FN) to obtain precision, recall, and F-score (Powers, 2011) (Figure 5). We first merged consecutive positive samples to compute the starting and ending time of smell events. Then, if a predicted event overlapped with a ground truth event, we counted this event as a TP. Otherwise, we counted a non-overlapped predicted event as an FP. For ground truth events that had no overlapping predicted events, we counted them as FN. When computing these metrics, we considered only daytime events because residents rarely submitted reports during nighttime (Figure 4). We defined daytime from 5 am to 7 pm. Since the model predicted if a smell event would occur in the next 8 hours, we only evaluated the prediction generated from 5 am to 11 am.
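The event-level counting rule can be sketched as follows. The label sequences are made up, the daytime filter is omitted for brevity, and counting TP once per overlapping predicted event is one reading of the description above.

```python
def merge_events(labels):
    """Merge consecutive positive samples into (start, end) index pairs."""
    events, start = [], None
    for i, flag in enumerate(labels):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def overlaps(a, b):
    """True if closed intervals a and b share at least one index."""
    return a[0] <= b[1] and b[0] <= a[1]


def evaluate(truth, predicted):
    """Return (TP, FP, FN, precision, recall, F-score) at the event level."""
    true_events = merge_events(truth)
    pred_events = merge_events(predicted)
    tp = sum(any(overlaps(p, t) for t in true_events) for p in pred_events)
    fp = len(pred_events) - tp
    fn = sum(not any(overlaps(t, p) for p in pred_events) for t in true_events)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return tp, fp, fn, precision, recall, f_score


# Hypothetical hourly labels: 1 means a smell event at that hour.
truth     = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
print(evaluate(truth, predicted))  # (1, 1, 1, 0.5, 0.5, 0.5)
```

Scoring whole events rather than individual hours avoids penalizing a prediction that starts an hour late but still flags the same episode.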

Model                 Precision     Recall        F-score
For prediction:
Classification ET     0.87 ± 0.01   0.59 ± 0.01   0.70 ± 0.01
Classification RF     0.80 ± 0.02   0.57 ± 0.01   0.66 ± 0.01
Regression ET         0.57 ± 0.01   0.76 ± 0.01   0.65 ± 0.01
Regression RF         0.54 ± 0.01   0.75 ± 0.01   0.63 ± 0.01
For interpretation:
Decision Tree         0.73 ± 0.04   0.81 ± 0.05   0.77 ± 0.04
Table 3. Cross-validation of model performances (mean ± standard deviation). We ran this experiment 100 times with random seeds. Abbreviations “ET” and “RF” indicate Extremely Randomized Trees and Random Forest respectively, which are used for predicting upcoming smell events. The Decision Tree, different from the others, is for interpreting air pollution patterns on a subset of the entire dataset.

We chose model parameters by using time-series cross-validation (Kohavi, 1995; Arlot and Celisse, 2010), where the entire dataset was partitioned and rolled into several pairs of training and testing subsets for evaluation. Because our predictors and responses were all time-dependent, we used previous samples to train the models and evaluated them on future data. We first divided all samples into folds, with each fold approximately representing a week. Then, starting from fold 49, we took the previous 48 folds as training data (about 8,000 samples) and the current fold as testing data (about 168 samples). This procedure was iterated for the rest of the folds, which reflected the setting of the deployed system where a new model was trained every Sunday night by using data from the previous 48 weeks. Table 3 reports the evaluation metrics after cross-validating the models 100 times with various random seeds.
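The rolling scheme can be sketched as an index generator. The fold size of 168 hourly samples and the decision to drop the trailing partial week are assumptions chosen to match the approximate figures in the text.

```python
def rolling_splits(n_samples, fold_size=168, train_folds=48):
    """Yield (train, test) index ranges for rolling time-series evaluation.

    Each test fold is one week; training uses the 48 weeks before it.
    Samples left over after the last full week are ignored (assumption).
    """
    n_folds = n_samples // fold_size
    for k in range(train_folds, n_folds):
        train = range((k - train_folds) * fold_size, k * fold_size)
        test = range(k * fold_size, (k + 1) * fold_size)
        yield train, test


splits = list(rolling_splits(16766))
first_train, first_test = splits[0]
print(len(splits))                         # 51
print(len(first_train), len(first_test))   # 8064 168
```

Because every training range ends where its test range begins, no future sample leaks into model fitting.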

While these models enabled us to predict future smell events, they are typically considered black box models and are not suitable for interpreting patterns. Although these models provided feature importances, interpreting these weights could be problematic because several predictors in the dataset were highly correlated, and correlated predictors might appear less significant than their uncorrelated counterparts. Inspired by several previous works related to extracting knowledge from data (Shaikhina et al., 2017; Gass et al., 2014; Caruana et al., 2006), we utilized a white box model, Decision Tree, to explain a representative subset of predictors and samples, which were selected by applying feature selection (Guyon and Elisseeff, 2003) and cluster analysis. One can view this process as performing model compression to distill the knowledge in a large black box model into a compact model that is explainable to humans (Bucilua et al., 2006; Hinton et al., 2015).

Figure 6. The right map shows smell reports and sensor readings with important predictors at 10:30 am on December 3, 2017. The left graph shows the first five depth levels of the Decision Tree with F-score 0.81, which explains the pattern of about 50% of smell events. The first line of a tree node indicates the ratio of the number of positive (with smell event) and negative samples (no smell event). The second and third lines of the node show the feature and its threshold for splitting. The most important predictor is the interaction between the north-south wind directions at Parkway and the previous 1-hour hydrogen sulfide readings at Liberty ($r_{pb}$=.58, $n$=16,766, $p$<.001), where $r_{pb}$ means the point-biserial correlation of the predictor and smell events.

During data interpretation, we only considered the classification approach due to its better performance. First, we used domain knowledge to manually select features. As there were many highly correlated features, selecting a subset of them arbitrarily for extracting patterns was impractical. The knowledge obtained from informal community meetings and the result discovered in the text analysis (Figure 2) suggested that hydrogen sulfide might be the primary source of smell events. This finding inspired us to choose hydrogen sulfide, wind direction, wind speed, and standard deviation of wind direction from all of the other available predictors. The current and up to 2-hour lagged predictor variables were all included. Also, we added interaction terms of all predictors, such as hydrogen sulfide multiplied by the sine component of wind direction. This manual feature selection procedure produced 781 features.
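To make the interaction terms concrete, the sketch below appends pairwise products to a small named sample. The feature names and values are hypothetical, and the sketch does not attempt to reproduce the 781-feature count, which depends on the stations and lags used in the study.

```python
from itertools import combinations


def add_interactions(sample):
    """Append the product of every pair of predictors to a named sample."""
    out = dict(sample)
    for a, b in combinations(sorted(sample), 2):
        out[f"{a}*{b}"] = sample[a] * sample[b]
    return out


# Hypothetical predictors for one hour at one station.
sample = {"h2s": 4.0, "wind_sin": 0.5, "wind_cos": -0.5, "wind_speed": 2.0}
expanded = add_interactions(sample)
print(len(expanded))             # 4 originals + 6 pairwise products = 10
print(expanded["h2s*wind_sin"])  # 2.0
```

With n base predictors this construction yields n + n(n-1)/2 features, so the feature count grows quadratically as lags and stations are added.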

Then, we used DBSCAN (Ester et al., 1996) to cluster positive samples and to choose a representative subset. The distance matrix for clustering was derived from a Random Forest fitted on the manually selected features. For each sample pair, we counted the number of times that the pair appeared in the same leaf for all trees in the model. The numbers were treated as the similarity of sample pairs and scaled to the range between 0 and 1. We converted the similarity into distance by using . This procedure identified a cluster with about 25% of positive samples from 50% of the smell events.
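The leaf co-occurrence similarity can be sketched without any library, given the leaf index of each sample in each tree (scikit-learn forests expose these through their `apply` method). Two points here are assumptions because the text does not spell them out: scaling by dividing the count by the number of trees, and converting with distance = 1 − similarity. The leaf indices are made up.

```python
def leaf_similarity(leaves):
    """Fraction of trees in which each pair of samples shares a leaf.

    `leaves[i][t]` is the leaf index of sample i in tree t.
    """
    n, n_trees = len(leaves), len(leaves[0])
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            sim[i][j] = shared / n_trees  # assumed scaling to [0, 1]
    return sim


def to_distance(sim):
    """Assumed conversion: distance = 1 - similarity."""
    return [[1.0 - s for s in row] for row in sim]


# Hypothetical leaf indices for 3 samples in a 4-tree forest.
leaves = [[3, 7, 2, 5], [3, 7, 2, 1], [4, 8, 9, 1]]
sim = leaf_similarity(leaves)
dist = to_distance(sim)
print(sim[0][1], dist[0][1])  # 0.75 0.25
```

A matrix like `dist` could then be passed to scikit-learn's `DBSCAN` with `metric="precomputed"`, so the clustering reflects the forest's notion of which samples behave alike.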

Finally, we used recursive feature elimination (Guyon et al., 2002) to iteratively remove the 50 features with the smallest weights, which resulted in the 30 most important features. These feature importance weights were computed by fitting a Random Forest. We trained a Decision Tree using the CART algorithm (Leo Breiman, 1984) to interpret the cluster and the selected 30 features. Parameters for data interpretation (DBSCAN, Random Forest, and Decision Tree) were selected by using cross-validation. Table 3 reports the evaluation metrics after cross-validating the model 100 times with random seeds. The result showed that the model was capable of explaining the underlying pattern of about 50% of the smell events, which was a joint effect of wind information and hydrogen sulfide readings (Figure 6).

5.3. Survey Study

In this study, we show that the system can motivate active community members to contribute data and increase their self-efficacy, beliefs about how well an individual can achieve desired effects through actions (Bandura, 1977). We recruited adult participants via snowball sampling (Biernacki and Waldorf, 1981). We administered and delivered an anonymous online survey via email to community advocacy groups and asked them to distribute the survey to others. Paper surveys were also provided. All responses were kept confidential, and there was no compensation. We received 29 responses in total over one month from March 20th to April 20th, 2018. Four responses were excluded due to incomplete questions or no experiences in interacting with the system, which gave 25 valid survey responses. There were 8 males, 16 females, and 1 person with undisclosed gender information. All but one participant had a Bachelor’s degree at minimum. The demographics of the sample population (Table 4) were not typical for the region. The survey had three sections: (1) Self-Efficacy Changes, (2) Motivation Factors, (3) System Usage Information.

Education   18-24   25-34   35-44   45-54   55-64   65-74   Sum
Associate   0       0       1       0       0       0       1
Bachelor    2       2       2       0       1       1       8
Master      0       2       2       0       0       4       8
Doctoral    0       1       1       1       5       0       8
Sum         2       5       6       1       6       5       25
Table 4. Demographics of participants. Columns and rows represent ages and education levels.

For Self-Efficacy Changes, we measured changes in users' confidence in mitigating air quality problems. This section was framed as a retrospective pre-post self-assessment. The items were divided between pre-assessment, “BEFORE you knew about or used Smell Pittsburgh,” and post-assessment, “AFTER you knew about or used Smell Pittsburgh.” For both assessments, we used a scale developed by the Cornell Lab of Ornithology (Porticella et al., 2017b; DEVISE, 2010). The scale was customized for air quality to suit our purpose. The scale consisted of eight Likert-type items (from 1 “Strongly Disagree” to 5 “Strongly Agree”).

Frequency Count Percentage
Other (the open-response text field) 9 36%
At least once per month 7 28%
At least once per week 4 16%
At least once per day 3 12%
At least once per year 2 8%
Table 5. Frequency of system usage (sorted by percentage).

Activity Count Percentage
I submitted smell reports. 22 88%
I checked other people’s smell reports on the map visualization. 22 88%
I opened Smell Pittsburgh when I noticed unusual smell. 22 88%
I discussed Smell Pittsburgh with other people. 21 84%
I provided my contact information when submitting smell reports. 14 56%
I paid attention to smell event alert notifications provided by Smell Pittsburgh. 13 52%
I shared Smell Pittsburgh publicly online (e.g. email, social media, news blog). 13 52%
I clicked on the playback button to view the animation of smell reports. 9 36%
I took screenshots of Smell Pittsburgh. 9 36%
I mentioned or presented Smell Pittsburgh to regulators. 6 24%
I downloaded smell reports data from the Smell Pittsburgh website. 4 16%
Table 6. The multiple-choice question for measuring participation level (sorted by percentage).

The Motivation Factors section was based on a scale developed by the Cornell Lab of Ornithology (Porticella et al., 2017a; DEVISE, 2010) with 14 Likert-type items (from 1 “Strongly Disagree” to 5 “Strongly Agree”). The scale was customized for air quality and measured both internal (7 items) and external motivations (7 items). Examples of internal motivations included enjoyment during participation and the desire to improve air quality. Examples of external motivations included the attempt to gain rewards and to avoid negative consequences if not taking actions. A text field with question “Are there other reasons that you use Smell Pittsburgh?” was provided for open responses.

In the System Usage Information section, we collected individual experiences with Smell Pittsburgh. We documented participation level through a multiple-choice and multiple-response question, “How did you use Smell Pittsburgh?” as shown in Figure 7 (right). This question allowed participants to select from a list of 11 activities, which included submitting reports, interacting with the system, sharing experiences, and disseminating data (Table 6). We identified the frequency of system usage through a multiple-choice question, “How often do you use Smell Pittsburgh?” as shown in Table 5. Text fields were provided for both questions.

Figure 7. The distributions of self-efficacy changes, motivations, and participation level for our survey responses. The red lines in the middle of the box indicate the median. The red-filled diamonds represent the mean. The top and bottom edges of a box indicate 75% ($Q_3$) and 25% ($Q_1$) quantiles respectively. The boxes show inter-quantile ranges $IQR = Q_3 - Q_1$. The top and bottom whiskers show $Q_3 + 1.5 \times IQR$ and $Q_1 - 1.5 \times IQR$ respectively. Black hollow circles show outliers.

At the end of the survey, we asked an open-response question “Do you have any other comments, questions, or concerns?” Our analysis is presented below along with each related question and selected quotes. Bold emphases in the quotes were added by researchers to highlight key user sentiments.

For Self-Efficacy, we averaged the scale items to produce total self-efficacy pre score (Mdn=3.50) and post score (Mdn=4.13) for each participant (Figure 7, left). A two-tailed Wilcoxon Signed-Ranks test (a nonparametric version of a paired t-test) indicated a statistically significant difference (W=13.5, Z=-3.79, p<.001). This finding indicated that there were increases in self-efficacy during participation.
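For readers unfamiliar with the test, the sketch below computes the Wilcoxon signed-rank statistic W for paired scores, dropping zero differences and giving tied absolute differences their average rank. The scores are invented and do not come from the survey; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used, and it also supplies the p-value.

```python
def wilcoxon_w(pre, post):
    """Return the Wilcoxon signed-rank statistic W = min(W+, W-)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zeros
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


# Hypothetical pre and post self-efficacy scores for six participants.
pre = [3.0, 3.5, 4.0, 2.5, 3.0, 3.5]
post = [4.0, 4.0, 3.75, 3.5, 3.0, 4.5]
print(wilcoxon_w(pre, post))  # 1.0
```

A small W relative to the number of pairs means that nearly all of the ranked differences point in the same direction.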

For Motivation Factors, we averaged the internal (Mdn=4.29) and external (Mdn=3.14) motivation scores for each participant (Figure 7, center). A two-tailed Wilcoxon Signed-Ranks test indicated a statistically significant difference (W=0, Z=-4.29, p<.001). This result suggested that internal factors were primary motivations for our participants rather than external factors. Open-ended answers showed that nine participants (36%) mentioned that the system enabled them to contribute data-driven evidence efficiently and intuitively.

“I used to try to use the phone to call in complaints, but that was highly unsatisfactory. I never knew if my complaints were even registered. With Smell Pittsburgh, I feel that I’m contributing to taking data, as well as to complaining when it’s awful. […]”

“It’s seems to be the most effective way to report wood burning that can fill my neighborhood with the smoke and emissions from wood burning.”

“The Smell app quantifies observations in real time. Researchers can use this qualitative information along quantitative data in real time. Added benefit is to have [the health department] receive this information in real time without having to make a phone call or send separate email. I have confidence that the recording of Smell app data is quantified more accurately than [the health department]’s.”

“It is an evidence based way for a citizen to register what is going on with the air where I live and work.”

“I believe in science and data and think this can help build a case. […]”

Also, four participants (16%) indicated the benefit of validating personal experiences based on the data provided by others.

“I used to (and sometimes still do) call reports in to [the health department]. I love how the map displays after I post a smell report. Wow! I’m not alone!”

“It validates my pollution experiences because others are reporting similar experiences.”

“I like using it for a similar reason that I like checking the weather. It helps me understand my environment and confirms my sense of what I’m seeing.”

We also found that altruism, the concern about the welfare of others, was another motivation. Six participants (24%) mentioned the desire to address climate change, activate regulators, raise awareness of others, expand air quality knowledge, influence policy-making, and build a sense of community.

“Because climate change is one of our largest challenges, […] Also, the ACHD isn’t as active as they should be, and needs a nudge.”

“I use [Smell Pittsburgh] to demonstrate to others how they can raise their own awareness. I’ve also pointed out to others that many who have grown up in this area of Western PA have grown up with so much pollution, to them air pollution has become normalized and many do not even smell the pollution any more. This is extremely dangerous and disturbing.”

“I want to help expand the knowledge and education of air quality in Pittsburgh and believe the visuals Smell Pittsburgh provides is the best way to do that.”

“I believe in the power of crowd-sourced data to influence policy decisions. I also believe that the air quality activism community will find more willing participants if there is a very easy way for non-activists to help support clean air, and the app provides that mechanism. It is basically a very easy onramp for potential new activists. The app also acts as a way for non-activists to see that they are not alone in their concerns about stinky air, which I believe was a major problem for building momentum in the air quality community prior to the app’s existence.”

For System Usage Information, we reported the counts for the system usage frequency question (Table 5). The result showed that usage frequency varied widely among our users. Open responses indicated that instead of using the system regularly, eight participants (32%) only submitted reports whenever they experienced bad odors. To quantify participation levels, we counted the number of selected choices for each participant, as shown in Figure 7 (right). We found that our participation levels were normally distributed. In the open-response text field for this question, two participants (8%) mentioned using personal resources to help promote the system.

“I ran a Google Adwords campaign to get people to install Smell Pittsburgh. It turns out that about $6 of ad spending will induce someone to install the app.”

“I take and share so many screenshots! Those are awesome. […] I also made two large posters of the app screen– one on a very bad day, and one on a very good day. I bring them around to public meetings and try to get county officials to look at them.”

In the open-ended question for comments and concerns, two participants (8%) were frustrated about the lack of responses from regulators and the unclear value of using the data to take action.

“After using this app for over a year, and making many dozens of reports, I haven’t once heard from the [health department]. That is disappointing, and makes me wonder, why bother? […] Collecting this data is clever, but towards what end? I sometimes don’t see the point in continuing to report.”

“It wasn’t clear when using the app that my submission was counted […]. I want to be able to see directly that my smell reports are going somewhere and being used for something. […]”

Also, five (20%) participants suggested augmenting the current system with new features and offering this mobile computing tool to more cities. Such features involved reporting smell retrospectively and viewing personal submission records.

“I get around mostly by bike, so it is difficult to report smells the same moment I smell them. I wish I could report smells in a different location than where I am so that I could report the smell once I reach my destination.”

“It would be nice to be able to add a retroactive report. We often get the strong sulfur smells in Forest Hills in the middle of the night […] but I strongly prefer to not have to log in to my phone at 3 am to log the report as it makes it harder to get back to sleep.”

“This app should let me see/download all of my data: how many times I reported smells, what my symptoms and comments were and how many times the [health department] didn’t respond […]”

6. Discussion

We have shown that the smell data gathered through the system is practical for identifying local air pollution patterns. We released the system in September 2016. During 2017, the system collected 8,720 smell reports, which is about 10-fold more than the 796 complaints collected by the health department regulators in 2016. All smell reports in our system had location data, while the location information was missing from 45% of the regulator-collected complaints. Although there is a significant increase in data quantity (as described in the system usage study), researchers may criticize the reliability of the citizen-contributed data, since lay experiences may be subjective, inconsistent, and prone to noise. Despite these doubts, in the smell dataset study, we have applied machine learning to demonstrate the predictive and explainable power of these data. It is viable to forecast upcoming smell events based on previous observations. We also extracted connections between predictors and responses that reveal a dominant local air pollution pattern, which is a joint effect of hydrogen sulfide and wind information. This pattern could serve as a hypothesis for future epidemiological studies. Since users tended not to report when there was no pollution odor, we recommend aggregating smell ratings by time to produce samples with no smell events. Based on our experiments with different models, we suggest using the classification approach, which thresholds smell ratings, instead of the regression approach. In reality, the effect of 10 and 100 smell reports with high ratings may be the same for a local geographical region. It is highly likely that the regression function tried to fit the noise above the threshold that indicates the presence of a smell event.

We have also shown that the transparency of smell data empowered communities to advocate for better air quality. The findings in the survey study suggested that the system lowered the barrier for communities to contribute and communicate data-driven evidence. Although the small sample size limited the survey study, the result showed increases in self-efficacy after using the system. Several participants were even willing to use their own resources to encourage others to install the system and engage in odor reporting. Moreover, in July 2018, activists attended the Board of Health meeting with the ACHD (Allegheny County Health Department) and presented a printed 230-foot-long scroll of more than 11,000 complaints submitted through the system. These reports allowed community members to ground their personal experiences in concrete and convincing data. The presented smell scroll demonstrated strong evidence about the impact of air pollution on the living quality of citizens, which forced regulators to respond to the air quality problem publicly. The deputy director of environmental health mentioned that ACHD would enact rigorous rules for coke plants: “Every aspect of the activity and operation of these coke plants will have a more stringent standard applied” (Clift, 2018; Hopey, 2018). In this case, Smell Pittsburgh rebalanced the power relationships between communities and regulators.

6.1. Limitation

We have explained the design, deployment, and evaluation of a mobile smell reporting system for Pittsburgh communities to collect and visualize odor events. However, our survey study only targeted community activists, which led to a relatively small sample size. The commitment of these users may be driven by internal motivations, such as altruism, instead of the system. Also, due to system limitations in tracking user behaviors and the sparsity of air pollution events, we leave the analysis of whether push notifications encourage user engagement to future work. Additionally, our community members might be unique in their characteristics, such as the awareness of the air quality problem, the tenacity of advocacy, and the power relationships with other stakeholders. Involving citizens to address urban air pollution collaboratively is a wicked problem by its nature, so there is no guarantee that our success and effectiveness can be replicated in a different context. It is possible that interactive systems like Smell Pittsburgh can only be practical for communities with specific characteristics, such as high awareness. Future research is needed to study the impact of deploying this system in other cities that have similar or distinct community characteristics compared to Pittsburgh. It can also be beneficial to explore ways to connect citizens and regulators, such as visualizing smell reports by voting districts, providing more background information with demographics and industry data, and sending push notifications regarding health agency public meetings.

Furthermore, from a machine-learning standpoint, community-powered projects such as Smell Pittsburgh often face two challenges that compromise model performance: data sparsity and label unreliability. Recent research has shown that deep neural networks can predict events effectively when equipped with a significant amount of training data (LeCun et al., 2015). However, the number of collected smell reports is far below that level due to the limited size of the community and the number of active users. The 3,917 participating users out of the 300,000 residents (1.3%) are not sufficient to cover the entire Pittsburgh area. Additionally, in our case, air pollution incidents can only be captured at the moment they occur because our communities lack resources to deploy reliable air quality monitoring sensors. It is impractical to annotate these incidents off-line, as is done in Galaxy Zoo (Raddick et al., 2013). While adopting transfer learning could take advantage of existing large-scale datasets from different domains to boost our performance (Pan and Yang, 2010), data sparsity is a nearly inevitable issue that must be taken into consideration for many community-powered projects. Another issue is label unreliability. There is no real “ground truth” air pollution data in our case. The smell events defined in this research were based on the consensus from a group of users. As a consequence, the quality of the labels for the prediction and interpretation tasks could be influenced by confirmation bias, where people tend to search for information that confirms their prior beliefs. This type of systematic error may be difficult to avoid, especially for community-powered projects. Future work to address these two challenges involves adding more predictors (e.g., weather forecasts, air quality index) and using generalizable data interpretation techniques that can explain any predictive model to identify more patterns (Ribeiro et al., 2016).

7. Conclusion

This paper explores the design and impact of a mobile smell reporting system, Smell Pittsburgh, to empower communities in advocating for better air quality. The system enables citizens to submit and visualize odor experiences without assistance from professionals. The visualization presents the context of air quality concerns from multiple perspectives as evidence. In our evaluation, we studied the distribution of smell reports and interaction events among different types of users. We also constructed a smell event dataset to study the value of these citizen-contributed data. By adopting machine learning, we developed a model to predict smell events and send push notifications accordingly. We also trained an explainable model to reveal connections between air quality sensor readings and smell events. Using a survey, we studied motivation factors for submitting smell reports and measured user attitude changes after using the system. Finally, we discussed limitations and future directions: deploying the system in multiple cities and using advanced techniques for pattern recognition. We envision that this research can inspire engineers, designers, and researchers to develop systems that support community advocacy and empowerment.

We thank the Heinz Endowments, the CREATE Lab (Jessica Pachuta, Ana Tsuhlares), Allegheny County Clean Air Now (ACCAN), PennEnvironment, Group Against Smog and Pollution (GASP), Sierra Club, Reducing Outdoor Contamination in Indoor Spaces (ROCIS), Blue Lens, PennFuture, Clean Water Action, Clean Air Council, the Global Communication Center of Carnegie Mellon University (Ryan Roderick), the Allegheny County Health Department, and all other participants.


  • sme (2018a) 2018a. A tool for predicting and interpreting smell data obtained from Smell Pittsburgh.
  • sme (2018b) 2018b. Smell Pittsburgh.
  • ACCAN (2018) ACCAN. 2018. Allegheny County Clean Air Now.
  • American Lung Association (2017) American Lung Association. 2017. State of The Air.
  • Arlot and Celisse (2010) Sylvain Arlot and Alain Celisse. 2010. A survey of cross-validation procedures for model selection. Statist. Surv. 4 (2010), 40–79.
  • Azid et al. (2014) Azman Azid, Hafizan Juahir, Mohd Ekhwan Toriman, Mohd Khairul Amri Kamarudin, Ahmad Shakir Mohd Saudi, Che Noraini Che Hasnam, Nor Azlina Abdul Aziz, Fazureen Azaman, Mohd Talib Latif, Syahrir Farihan Mohamed Zainuddin, Mohamad Romizan Osman, and Mohammad Yamin. 2014. Prediction of the Level of Air Pollution Using Principal Component Analysis and Artificial Neural Network Techniques: a Case Study in Malaysia. Water, Air, & Soil Pollution 225, 8 (21 Jul 2014), 2063.
  • Bandura (1977) Albert Bandura. 1977. Self-efficacy: toward a unifying theory of behavioral change. Psychological review 84, 2 (1977), 191.
  • Bellinger et al. (2017) Colin Bellinger, Mohomed Shazan Mohomed Jabbar, Osmar Zaïane, and Alvaro Osornio-Vargas. 2017. A systematic review of data mining and machine learning for air pollution epidemiology. BMC public health 17, 1 (2017), 907.
  • Biernacki and Waldorf (1981) Patrick Biernacki and Dan Waldorf. 1981. Snowball sampling: Problems and techniques of chain referral sampling. Sociological methods & research 10, 2 (1981), 141–163.
  • Bird et al. (2009) Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.
  • Bird et al. (2014) Tomas J Bird, Amanda E Bates, Jonathan S Lefcheck, Nicole A Hill, Russell J Thomson, Graham J Edgar, Rick D Stuart-Smith, Simon Wotherspoon, Martin Krkosek, Jemina F Stuart-Smith, et al. 2014. Statistical solutions for error and bias in global citizen science datasets. Biological Conservation 173 (2014), 144–154.
  • Bishop (2006) Christopher Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag New York.
  • Blevis (2007) Eli Blevis. 2007. Sustainable Interaction Design: Invention & Disposal, Renewal & Reuse. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’07). ACM, New York, NY, USA, 503–512.
  • Bonney et al. (2009a) Rick Bonney, Heidi Ballard, Rebecca Jordan, Ellen McCallie, Tina Phillips, Jennifer Shirk, and Candie C Wilderman. 2009a. Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report. Online Submission (2009).
  • Bonney et al. (2009b) Rick Bonney, Caren B. Cooper, Janis Dickinson, Steve Kelling, Tina Phillips, Kenneth V. Rosenberg, and Jennifer Shirk. 2009b. Citizen Science: A Developing Tool for Expanding Science Knowledge and Scientific Literacy. BioScience 59, 11 (2009), 977–984.
  • Bonney et al. (2016) Rick Bonney, Tina B Phillips, Heidi L Ballard, and Jody W Enck. 2016. Can citizen science enhance public understanding of science? Public Understanding of Science 25, 1 (2016), 2–16.
  • Bonney et al. (2014) Rick Bonney, Jennifer L. Shirk, Tina B. Phillips, Andrea Wiggins, Heidi L. Ballard, Abraham J. Miller-Rushing, and Julia K. Parrish. 2014. Next Steps for Citizen Science. Science 343, 6178 (2014), 1436–1437.
  • Breiman (2001) Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (01 Oct 2001), 5–32.
  • Brown (1992) Phil Brown. 1992. Popular epidemiology and toxic waste contamination: lay and professional ways of knowing. Journal of health and social behavior (1992), 267–281.
  • Brynjarsdottir et al. (2012) Hronn Brynjarsdottir, Maria Håkansson, James Pierce, Eric Baumer, Carl DiSalvo, and Phoebe Sengers. 2012. Sustainably Unpersuaded: How Persuasion Narrows Our Vision of Sustainability. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). ACM, New York, NY, USA, 947–956.
  • Bucilua et al. (2006) Cristian Bucilua, Rich Caruana, and Alexandru Niculescu-Mizil. 2006. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 535–541.
  • Budde et al. (2017) Matthias Budde, Andrea Schankin, Julien Hoffmann, Marcel Danz, Till Riedel, and Michael Beigl. 2017. Participatory Sensing or Participatory Nonsense?: Mitigating the Effect of Human Error on Data Quality in Citizen Science. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3 (2017), 39.
  • Bushdid et al. (2014) Caroline Bushdid, Marcelo O Magnasco, Leslie B Vosshall, and Andreas Keller. 2014. Humans can discriminate more than 1 trillion olfactory stimuli. Science 343, 6177 (2014), 1370–1372.
  • CAC (2018) CAC. 2018. Clean Air Council.
  • Caruana et al. (2006) Rich Caruana, Mohamed Elhawary, Art Munson, Mirek Riedewald, Daria Sorokina, Daniel Fink, Wesley M Hochachka, and Steve Kelling. 2006. Mining citizen science data to predict prevalence of wild bird species. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 909–915.
  • Chari et al. (2017) Ramya Chari, Luke J Matthews, Marjory S Blumenthal, Amanda F Edelman, and Therese Jones. 2017. The Promise of Community Citizen Science. (2017).
  • Clift (2018) Theresa Clift. 2018. Allegheny County Health Department defends air quality efforts, plans stricter coke plant rules.
  • Cohn (2008) Jeffrey P. Cohn. 2008. Citizen Science: Can Volunteers Do Real Research? BioScience 58, 3 (2008), 192–197.
  • Conklin (2005) Jeff Conklin. 2005. Dialogue mapping: Building shared understanding of wicked problems. John Wiley & Sons, Inc.
  • Cooper et al. (2007) Caren B Cooper, Janis Dickinson, Tina Phillips, and Rick Bonney. 2007. Citizen science as a tool for conservation in residential ecosystems. Ecology and Society 12, 2 (2007), 11.
  • Cooper and Lewenstein (2016) Caren B. Cooper and Bruce V. Lewenstein. 2016. Two Meanings of Citizen Science. In The Rightful Place of Science: Citizen Science, Darlene Cavalier and Eric B. Kennedy (Eds.). Consortium for Science, Policy & Outcomes, Arizona State University.
  • Corburn (2005) Jason Corburn. 2005. Street Science: Community Knowledge and Environmental Health Justice (Urban and Industrial Environments). The MIT Press.
  • Council et al. (2009) National Research Council, Committee on Acute Exposure Guideline Levels, et al. 2009. Acute exposure guideline levels for selected airborne chemicals. Vol. 9. National Academies Press.
  • Dantec and DiSalvo (2013) Christopher A Le Dantec and Carl DiSalvo. 2013. Infrastructuring and the formation of publics in participatory design. Social Studies of Science 43, 2 (2013), 241–264.
  • Den Broeder et al. (2016) Lea Den Broeder, Jeroen Devilee, Hans Van Oers, A Jantine Schuit, and Annemarie Wagemakers. 2016. Citizen Science for public health. Health promotion international (2016), daw086.
  • Devillers and Jeansoulin (2006) Rodolphe Devillers and Robert Jeansoulin. 2006. Fundamentals of Spatial Data Quality (Geographical Information Systems Series). ISTE.
  • DEVISE (2010) DEVISE. 2010. Developing, Validating, and Implementing Situated Evaluation Instruments.
  • D’Hondt et al. (2013) Ellie D’Hondt, Matthias Stevens, and An Jacobs. 2013. Participatory noise mapping works! An evaluation of participatory sensing as an alternative to standard techniques for environmental monitoring. Pervasive and Mobile Computing 9, 5 (2013), 681–694.
  • Dickinson and Bonney (2012) Janis L. Dickinson and Rick Bonney. 2012. Citizen Science: Public Participation in Environmental Research (1 ed.). Cornell University Press.
  • Dickinson et al. (2012) Janis L Dickinson, Jennifer Shirk, David Bonter, Rick Bonney, Rhiannon L Crain, Jason Martin, Tina Phillips, and Karen Purcell. 2012. The current state of citizen science as a tool for ecological research and public engagement. Frontiers in Ecology and the Environment 10, 6 (2012), 291–297.
  • DiSalvo et al. (2009) Carl DiSalvo, Kirsten Boehner, Nicholas A. Knouf, and Phoebe Sengers. 2009. Nourishing the Ground for Sustainable HCI: Considerations from Ecologically Engaged Art. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’09). ACM, New York, NY, USA, 385–394.
  • DiSalvo et al. (2010) Carl DiSalvo, Phoebe Sengers, and Hrönn Brynjarsdóttir. 2010. Mapping the Landscape of Sustainable HCI. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’10). ACM, New York, NY, USA, 1975–1984.
  • Dockery et al. (1993) Douglas W. Dockery, C. Arden Pope, Xiping Xu, John D. Spengler, James H. Ware, Martha E. Fay, Benjamin G. Jr. Ferris, and Frank E. Speizer. 1993. An Association between Air Pollution and Mortality in Six U.S. Cities. New England Journal of Medicine 329, 24 (1993), 1753–1759. PMID: 8179653.
  • Donnelly et al. (2015) Aoife Donnelly, Bruce Misstear, and Brian Broderick. 2015. Real time air quality forecasting using integrated parametric and non-parametric regression techniques. Atmospheric Environment 103 (2015), 53–65.
  • Dourish (2010) Paul Dourish. 2010. HCI and Environmental Sustainability: The Politics of Design and the Design of Politics. In Proceedings of the 8th ACM Conference on Designing Interactive Systems (DIS ’10). ACM, New York, NY, USA, 1–10.
  • Eitzel et al. (2017) MV Eitzel, Jessica L Cappadonna, Chris Santos-Lang, Ruth Ellen Duerr, Arika Virapongse, Sarah Elizabeth West, Christopher Conrad Maximillian Kyba, Anne Bowser, Caren Beth Cooper, Andrea Sforzi, et al. 2017. Citizen science terminology matters: Exploring key terms. Citizen Science: Theory and Practice 2, 1 (2017).
  • EPA (2014) EPA. 2014. A Guide to Air Quality and Your Health.
  • Ester et al. (1996) Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, Vol. 96. 226–231.
  • Ford et al. (2017) Bonne Ford, Moira Burke, William Lassman, Gabriele Pfister, and Jeffrey R Pierce. 2017. Status update: is smoke on your mind? Using social media to assess smoke exposure. Atmospheric Chemistry and Physics 17, 12 (2017), 7541–7554.
  • GASP (2018) GASP. 2018. Group Against Smog and Pollution.
  • Gass et al. (2014) Katherine Gass, Mitch Klein, Howard H Chang, W Dana Flanders, and Matthew J Strickland. 2014. Classification and regression trees for epidemiologic research: an air pollution example. Environmental Health 13, 1 (2014), 17.
  • Geurts et al. (2006) Pierre Geurts, Damien Ernst, and Louis Wehenkel. 2006. Extremely randomized trees. Machine Learning 63, 1 (01 Apr 2006), 3–42.
  • Greaves and Lishman (1980) Bernard Greaves and Gordon Lishman. 1980. The Theory and Practice of Community Politics. A.L.C. Campaign Booklet No. 12.
  • Guidotti (2010) Tee L Guidotti. 2010. Hydrogen sulfide: advances in understanding human toxicity. International journal of toxicology 29, 6 (2010), 569–581.
  • Guyon and Elisseeff (2003) Isabelle Guyon and André Elisseeff. 2003. An introduction to variable and feature selection. Journal of machine learning research 3, Mar (2003), 1157–1182.
  • Guyon et al. (2002) Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. 2002. Gene selection for cancer classification using support vector machines. Machine learning 46, 1-3 (2002), 389–422.
  • Haklay (2013) Muki Haklay. 2013. Citizen science and volunteered geographic information: Overview and typology of participation. In Crowdsourcing geographic knowledge. Springer, 105–122.
  • Hastie et al. (2009) Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag New York.
  • Henshaw (2013) Victoria Henshaw. 2013. Urban smellscapes: Understanding and designing city smell environments. Routledge.
  • Hinton et al. (2015) Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  • Hochachka et al. (2012) Wesley M. Hochachka, Daniel Fink, Rebecca A. Hutchinson, Daniel Sheldon, Weng-Keen Wong, and Steve Kelling. 2012. Data-intensive science applied to broad-scale citizen science. Trends in Ecology & Evolution 27, 2 (2012), 130–137. Ecological and evolutionary informatics.
  • Hopey (2018) Don Hopey. 2018. Air advocates read scroll of smells at health board meeting.
  • Hsieh et al. (2015) Hsun-Ping Hsieh, Shou-De Lin, and Yu Zheng. 2015. Inferring Air Quality for Station Location Recommendation Based on Urban Big Data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, New York, NY, USA, 437–446.
  • Hsu (2018) Yen-Chia Hsu. 2018. Designing Interactive Systems for Community Citizen Science. Ph.D. Dissertation. Carnegie Mellon University, Pittsburgh, PA.
  • Hsu et al. (2017) Yen-Chia Hsu, Paul Dille, Jennifer Cross, Beatrice Dias, Randy Sargent, and Illah Nourbakhsh. 2017. Community-Empowered Air Quality Monitoring System. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 1607–1619.
  • Hsu et al. (2016) Yen-Chia Hsu, Paul Dille, Randy Sargent, and Illah Nourbakhsh. 2016. Industrial Smoke Detection and Visualization. Technical Report.
  • Irwin (1995) Alan Irwin. 1995. Citizen science: A study of people, expertise and sustainable development. Psychology Press.
  • Irwin (2001) Alan Irwin. 2001. Constructing the scientific citizen: Science and democracy in the biosciences. Public Understanding of Science 10, 1 (2001), 1–18.
  • Irwin (2006) Alan Irwin. 2006. The Politics of Talk: Coming to Terms with the New Scientific Governance. Social Studies of Science 36, 2 (2006), 299–320.
  • James et al. (2013) Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An introduction to statistical learning. Vol. 112. Springer.
  • Jordan and Mitchell (2015) M. I. Jordan and T. M. Mitchell. 2015. Machine learning: Trends, perspectives, and prospects. Science 349, 6245 (2015), 255–260.
  • Kampa and Castanas (2008) Marilena Kampa and Elias Castanas. 2008. Human health effects of air pollution. Environmental Pollution 151, 2 (2008), 362–367. Proceedings of the 4th International Workshop on Biomonitoring of Atmospheric Pollution (With Emphasis on Trace Elements).
  • Kim et al. (2013a) Sunyoung Kim, Jennifer Mankoff, and Eric Paulos. 2013a. Sensr: evaluating a flexible framework for authoring mobile data-collection tools for citizen science. In Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 1453–1462.
  • Kim et al. (2015) Sunyoung Kim, Jennifer Mankoff, and Eric Paulos. 2015. Exploring Barriers to the Adoption of Mobile Technologies for Volunteer Data Collection Campaigns. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 3117–3126.
  • Kim et al. (2013b) Sunyoung Kim, Eric Paulos, and Jennifer Mankoff. 2013b. inAir: a longitudinal study of indoor air quality measurements and visualizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2745–2754.
  • Kim et al. (2011) Sunyoung Kim, Christine Robson, Thomas Zimmerman, Jeffrey Pierce, and Eben M. Haber. 2011. Creek Watch: Pairing Usefulness and Usability for Successful Citizen Science. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). ACM, New York, NY, USA, 2125–2134.
  • Kohavi (1995) Ron Kohavi. 1995. A Study of Cross-validation and Bootstrap for Accuracy Estimation and Model Selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2 (IJCAI’95). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1137–1143.
  • Kuznetsov et al. (2011) Stacey Kuznetsov, George Davis, Jian Cheung, and Eric Paulos. 2011. Ceci n’est pas une pipe bombe: authoring urban landscapes with air quality sensors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2375–2384.
  • Kuznetsov et al. (2013) Stacey Kuznetsov, Scott E. Hudson, and Eric Paulos. 2013. A Low-tech Sensing System for Particulate Pollution. In Proceedings of the 8th International Conference on Tangible, Embedded and Embodied Interaction (TEI ’14). ACM, New York, NY, USA, 259–266.
  • LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.
  • Leo Breiman (1984) Leo Breiman, Jerome Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and regression trees. Chapman & Hall/CRC.
  • Lindenmann et al. (2010) Joerg Lindenmann, Veronika Matzi, Nicole Neuboeck, Beatrice Ratzenhofer-Komenda, Alfred Maier, and Freyja-Maria Smolle-Juettner. 2010. Severe hydrogen sulphide poisoning treated with 4-dimethylaminophenol and hyperbaric oxygen. (2010).
  • Maisonneuve et al. (2009) Nicolas Maisonneuve, Matthias Stevens, Maria E Niessen, and Luc Steels. 2009. NoiseTube: Measuring and mapping noise pollution with mobile phones. In Information technologies in environmental engineering. Springer, 215–228.
  • Mankoff et al. (2007) Jennifer C. Mankoff, Eli Blevis, Alan Borning, Batya Friedman, Susan R. Fussell, Jay Hasbrouck, Allison Woodruff, and Phoebe Sengers. 2007. Environmental Sustainability and Interaction. In CHI ’07 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’07). ACM, New York, NY, USA, 2121–2124.
  • McKinley et al. (2015) Duncan C McKinley, Abraham J Miller-Rushing, Heidi L Ballard, Rick Bonney, Hutch Brown, Daniel M Evans, Rebecca A French, Julia K Parrish, Tina B Phillips, Sean F Ryan, et al. 2015. Investing in citizen science can improve natural resource management and environmental protection. Issues in Ecology 19 (2015).
  • Miller-Rushing et al. (2012) Abraham Miller-Rushing, Richard Primack, and Rick Bonney. 2012. The history of public participation in ecological research. Frontiers in Ecology and the Environment 10, 6 (2012), 285–290.
  • Mitchell (1997) Tom Mitchell. 1997. Machine Learning. McGraw Hill.
  • Newman et al. (2012) Greg Newman, Andrea Wiggins, Alycia Crall, Eric Graham, Sarah Newman, and Kevin Crowston. 2012. The future of citizen science: emerging technologies and shifting paradigms. Frontiers in Ecology and the Environment 10, 6 (2012), 298–304.
  • Obrist et al. (2014) Marianna Obrist, Alexandre N Tuch, and Kasper Hornbaek. 2014. Opportunities for odor: experiences with smell and implications for technology. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2843–2852.
  • Okolloh (2009) Ory Okolloh. 2009. Ushahidi, or testimony: Web 2.0 tools for crowdsourcing crisis information. Participatory learning and action 59, 1 (2009), 65–70.
  • Ottinger (2010) Gwen Ottinger. 2010. Buckets of resistance: Standards and the effectiveness of citizen science. Science, Technology, & Human Values 35, 2 (2010), 244–270.
  • Ottinger (2016) Gwen Ottinger. 2016. Social Movement-Based Citizen Science. In The Rightful Place of Science: Citizen Science, Darlene Cavalier and Eric B. Kennedy (Eds.). Consortium for Science, Policy & Outcomes, Arizona State University.
  • Ottinger (2017a) Gwen Ottinger. 2017a. Crowdsourcing Undone Science. Engaging Science, Technology, and Society 3 (2017), 560–574.
  • Ottinger (2017b) Gwen Ottinger. 2017b. Making sense of citizen science: stories as a hermeneutic resource. Energy Research & Social Science 31 (2017), 41–49.
  • Pan and Yang (2010) Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (Oct 2010), 1345–1359.
  • Paulos et al. (2008) Eric Paulos, RJ Honicky, and Ben Hooker. 2008. Citizen science: Enabling participatory urbanism. Urban Informatics: Community Integration and Implementation (2008).
  • Pedregosa et al. (2011) Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of machine learning research 12, Oct (2011), 2825–2830.
  • Pena and Parshall (2012) William M Pena and Steven A Parshall. 2012. Problem seeking: An architectural programming primer. John Wiley & Sons.
  • PennEnvironment (2015) PennEnvironment. 2015. PennEnvironment.
  • PennFuture (2018) PennFuture. 2018. PennFuture.
  • Pope III and Dockery (2006) C Arden Pope III and Douglas W Dockery. 2006. Health effects of fine particulate air pollution: lines that connect. Journal of the air & waste management association 56, 6 (2006), 709–742.
  • Porticella et al. (2017a) N. Porticella, T. Phillips, and R. Bonney. 2017a. Motivation for Environmental Action (Generic). Technical brief series (2017).
  • Porticella et al. (2017b) N. Porticella, T. Phillips, and R. Bonney. 2017b. Self-Efficacy for Environmental Action (SEEA, Generic). Technical brief series (2017).
  • Powers (2011) David Martin Powers. 2011. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. (2011).
  • Prüss-Üstün and Neira (2016) Annette Prüss-Üstün and Maria Neira. 2016. Preventing disease through healthy environments: a global assessment of the burden of disease from environmental risks. World Health Organization.
  • Quercia et al. (2016) Daniele Quercia, Luca Maria Aiello, Rossano Schifanella, et al. 2016. The Emotional and Chromatic Layers of Urban Smells. In ICWSM. 309–318.
  • Quercia et al. (2015) Daniele Quercia, Rossano Schifanella, Luca Maria Aiello, and Kate McLean. 2015. Smelly maps: the digital life of urban smellscapes. arXiv preprint arXiv:1505.06851 (2015).
  • Raddick et al. (2013) M Jordan Raddick, Georgia Bracey, Pamela L Gay, Chris J Lintott, Carie Cardamone, Phil Murray, Kevin Schawinski, Alexander S Szalay, and Jan Vandenberg. 2013. Galaxy Zoo: Motivations of citizen scientists. arXiv preprint arXiv:1303.6886 (2013).
  • Reiffenstein et al. (1992) RJ Reiffenstein, William C Hulbert, and Sheldon H Roth. 1992. Toxicology of hydrogen sulfide. Annual review of pharmacology and toxicology 32, 1 (1992), 109–134.
  • Ribeiro et al. (2016) Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.
  • Rittel and Webber (1973) Horst WJ Rittel and Melvin M Webber. 1973. Dilemmas in a general theory of planning. Policy sciences 4, 2 (1973), 155–169.
  • Rotman et al. (2012) Dana Rotman, Kezia Procita, Derek Hansen, Cynthia Sims Parr, and Jennifer Preece. 2012. Supporting content curation communities: The case of the Encyclopedia of Life. Journal of the American Society for Information Science and Technology 63, 6 (2012), 1092–1107.
  • Sauermann and Franzoni (2015) Henry Sauermann and Chiara Franzoni. 2015. Crowd science user contribution patterns and their implications. Proceedings of the National Academy of Sciences 112, 3 (2015), 679–684.
  • Science Communication Unit (2013) Bristol Science Communication Unit, University of the West of England. 2013. Science for Environment Policy Indepth Report: Environmental Citizen Science. Report produced for the European Commission DG Environment (December 2013).
  • Shaikhina et al. (2017) Torgyn Shaikhina, Dave Lowe, Sunil Daga, David Briggs, Robert Higgins, and Natasha Khovanova. 2017. Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation. Biomedical Signal Processing and Control (2017).
  • Shepherd (2004) Gordon M Shepherd. 2004. The human sense of smell: are we better than we think? PLoS biology 2, 5 (2004), e146.
  • Silvertown (2009) Jonathan Silvertown. 2009. A new dawn for citizen science. Trends in Ecology & Evolution 24, 9 (2009), 467–471.
  • Stilgoe (2009) Jack Stilgoe. 2009. Citizen Scientists: reconnecting science with civil society. Demos London.
  • Stilgoe et al. (2014) Jack Stilgoe, Simon J. Lock, and James Wilsdon. 2014. Why should we promote public engagement with science? Public Understanding of Science 23, 1 (2014), 4–15.
  • Stingone et al. (2017) Jeanette A Stingone, Om P Pandey, Luz Claudio, and Gaurav Pandey. 2017. Using machine learning to identify air pollution exposure profiles associated with early cognitive skills among US children. Environmental Pollution 230 (2017), 730–740.
  • Sullivan et al. (2014) Brian L. Sullivan, Jocelyn L. Aycrigg, Jessie H. Barry, Rick E. Bonney, Nicholas Bruns, Caren B. Cooper, Theo Damoulas, André A. Dhondt, Tom Dietterich, Andrew Farnsworth, Daniel Fink, John W. Fitzpatrick, Thomas Fredericks, Jeff Gerbracht, Carla Gomes, Wesley M. Hochachka, Marshall J. Iliff, Carl Lagoze, Frank A. La Sorte, Matthew Merrifield, Will Morris, Tina B. Phillips, Mark Reynolds, Amanda D. Rodewald, Kenneth V. Rosenberg, Nancy M. Trautmann, Andrea Wiggins, David W. Winkler, Weng-Keen Wong, Christopher L. Wood, Jun Yu, and Steve Kelling. 2014. The eBird enterprise: An integrated approach to development and application of citizen science. Biological Conservation 169 (2014), 31–40.
  • Sullivan et al. (2009) Brian L. Sullivan, Christopher L. Wood, Marshall J. Iliff, Rick E. Bonney, Daniel Fink, and Steve Kelling. 2009. eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation 142, 10 (2009), 2282–2292.
  • Taylor and Nourbakhsh (2015) MD Taylor and IR Nourbakhsh. 2015. A low-cost particle counter and signal processing method for indoor air pollution. Air Pollution XXIII 198 (2015), 337.
  • Taylor (2016) Michael D. Taylor. 2016. Calibration and Characterization of Low-Cost Fine Particulate Monitors and their Effect on Individual Empowerment. Dissertation. Carnegie Mellon University.
  • Tian et al. (2016) Rundong Tian, Christine Dierk, Christopher Myers, and Eric Paulos. 2016. MyPart: Personal, Portable, Accurate, Airborne Particle Counting. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 1338–1348.
  • WHO (2016) WHO. 2016. Ambient (outdoor) air quality and health.
  • Wilsdon et al. (2005) James Wilsdon, Jack Stilgoe, and Brian Wynne. 2005. The public value of science: or how to ensure that science really matters. Demos London.
  • Zheng et al. (2015) Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, and Tianrui Li. 2015. Forecasting Fine-Grained Air Quality Based on Big Data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, New York, NY, USA, 2267–2276.
  • Zimmerman et al. (2011) John Zimmerman, Anthony Tomasic, Charles Garrod, Daisy Yoo, Chaya Hiruncharoenvate, Rafae Aziz, Nikhil Ravi Thiruvengadam, Yun Huang, and Aaron Steinfeld. 2011. Field trial of tiramisu: crowd-sourcing bus arrival times to spur co-design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1677–1686.