Understanding the influence of environmental conditions on human perception is complex. Various environmental features, e.g., sound level, temperature, and illuminance, affect our senses. Therefore, we adopted enhanced measurement and analysis techniques to define and measure what influences citizens in dynamic urban environments. The environmental features measured in this research include sound level, dust, temperature, humidity, illuminance, and the field-of-view, since they influence a person’s senses. In this research, a person’s perception was represented by their physiological state, which was measured through electro-dermal activity (EDA). With advances in wearable technology, researchers can explore the utility of sensor-based physiological data in real-world scenarios. Thus, researchers now have a means to explore how environmental features affect individuals’ physiological response-based perceptual quality and overall experience. How to capture and define such a perceptual quality is an ongoing research topic in Cognitive Science and Behavioral Science [21, 36].
This research presents a controlled study, conducted in Zürich, Switzerland, to acquire data on human physiological responses and environmental conditions. In the study, 30 participants were asked to walk through an urban environment while equipped with wearable sensor devices. The study was designed to address the following research questions:
Can we predict the physiological responses of participants based on particular environmental conditions?
Can we infer the relationship between the physiological responses and the environmental conditions?
What are the most significant environmental features affecting the participants’ physiological responses?
What are the patterns in the environmental conditions, for which the participants exhibit aroused and normal physiological responses?
The features of the data were recorded through devices and sensors at varying frequencies, and they had both temporal and spatial properties. The features had a temporal property because they were recorded continuously, and they had spatial characteristics because each recording was associated with a change in location obtained from the global positioning system (GPS). Hence, in this research, we proposed a framework that performs signal preprocessing, signal filtering, signal quantification, data fusion, and data labeling to answer the defined research questions.
Machine learning-based techniques have been successfully applied for knowledge mining and pattern recognition in various real-world situations [32, 39] since they are useful in identifying the underlying patterns within data [1, 25]. Thus, we formulated the processed data such that four state-of-the-art machine learning techniques (classification, fuzzy rule-based inference, feature selection, and clustering) could be applied to discover patterns in the participants’ physiological responses related to the urban environmental conditions.
The first step in this research was to assess the predictability of participants’ perception (physiological responses) of the urban environment. Thus, a ten-fold cross-validation was performed on a reduced error-pruning tree (REP-Tree) classification model. Following the classification approach, a fuzzy rule-based inferential model was built, using the fuzzy unordered rule induction algorithm (FURIA), to investigate the relationship between the urban environmental features and the physiological response measures. Subsequently, the importance of the various urban environmental features was analyzed by applying a backward linear feature elimination filter (BFE). Furthermore, a self-organizing map (SOM) was applied to visualize the impact of urban environmental features on participants’ physiological responses. In the final step, a method for referencing GPS location (geo-location) to compute the mean physiological response across all participants was developed. Since various methods were involved in data processing, additional graphics and multimedia can be found on the project website.
In summary, the following are the three essential contributions of this research:
a field study design for understanding human perception of the urban environment;
a framework design comprising signal processing, signal quantification, and data fusion methods that involves a novel approach to physiological data quantification;
a comprehensive analysis using four machine learning methods to discover the patterns which are crucial to our understanding of human perception in urban settings.
We organized this paper into seven sections. Section 2 places this research in the context of the literature and describes the experimental procedure. Section 3 describes signal preprocessing, multi-sensor information fusion, and machine learning techniques in detail. Section 4 explains the obtained results, followed by a comprehensive discussion in Section 5. The challenges and opportunities of the research are presented in Section 6, and Section 7 concludes the findings of this research.
2 Human perception of the urban environment
2.1 Literature review
The process of measuring physiological data as an indicator of human perception is complex, particularly in real-world applications, since perception can be influenced by various factors. However, physiological pattern recognition can derive significant evidence about human perception. Similar to our research, Picard et al. focused on physiological sensor data, specifically skin conductance, and related high and low arousals to positive and negative biological reactions. Picard et al. also focused on the collection and filtering of the physiological data to construct good-quality data free of failed and corrupt signals. They formulated physiological data so that a k-nearest-neighbor classifier could predict a human’s physiological arousal-based perception. Krause et al. [19, 20], on the other hand, used wearable device data, including physiology-based sensor data (galvanic skin response), to identify a user’s state in terms of physiological and activity context using SOM-based clustering. Specifically, they performed unsupervised learning to classify sensor data to determine the context from which the signals were generated.
In Wang et al., pattern recognition and classification of physiological sensor signals were performed by first decomposing the signals into their constituent features and then applying a support vector machine to classify negative and positive emotion labels. Here, the labels associated with the signals were predefined during the experiment by exposing the participant to negative and positive environments during the recording of signals. Rani et al. performed an empirical study of four machine learning techniques (k-nearest neighbor, regression tree, Bayesian network, and support vector machine) for the recognition of the emotional state from physiological response data. They performed signal processing to evaluate features from the physiological data and labeled them with the emotional state reported by the participants.
Since we investigate “cause and effect” between the environmental conditions and human perception, unlike Wang et al. and Rani et al., we performed signal processing on the physiological data to evaluate skin conductance response (SCR) arousals. Subsequently, we assigned labels to signal fragments based on the degree of arousal within a specified time. While doing this, we considered the physiological data as the output of the classification model and the signals from the environment as the inputs. In contrast, Wang et al. and Rani et al. considered features of the processed data as the inputs and the reported environment as the output. We adopted the approach of first determining the arousal level because of the complexities of the urban environment and because we cannot accurately consider an urban environment to be positive or negative towards the perceptual quality of a participant. Thus, we labeled environmental conditions as positive or negative by considering the physiological data as the target in the classifier’s training.
Ragot et al. found that the physiological response signals from the Empatica E4 wearable device were closely comparable to those of laboratory-based measurement devices. They also found that the data from such wearable devices could be used to train a support-vector-machine classifier to recognize the participants’ emotional state. Similarly, Poh et al. confirmed that EDA data from wearable devices are comparable to data from laboratory devices and are a valid physiological measure. Hence, our approach in this study was to employ the Empatica E4 for the physiological measurements.
2.2 Study design and measurements
We designed a study to understand the general pattern(s) of human perception related to events which occur in a dynamic urban environment. An event indicates a change in the environmental condition and is also a sample of the measured environmental data. As a case study, we selected a neighborhood in Zürich, Switzerland (Fig. 1), and invited participants to take a leisurely walk on a predetermined path (Fig. 1). The participants were equipped with a “sensor backpack” and an Empatica E4 wearable device. The walking path was carefully selected to cover diverse urban scenarios, e.g., spacious and narrow streets, green and urban areas, and loud and quieter locations.
Our sensor kit measured the changes in sound level (decibel, dB), the amount of dust, temperature (°C), relative humidity (%), and illuminance. We also calculated the field-of-view based on the GPS information and the spatial configuration of the neighborhood. The field-of-view is formally described by the Isovist descriptor, which refers to the open space a person can view from a single vantage point. Since participants were walking in a forward direction, we considered the field-of-view up to a limited distance. Subsequently, the Isovist descriptor for each participant’s walk was measured by drawing a polygon around the participant’s field-of-view at their specific GPS locations. From this, the following measures of the Isovist polygons were calculated: Area, the polygon’s surface area; Perimeter, the polygon’s perimeter length; Compactness, the ratio of area to perimeter (relative to an ideal circle); and Occlusivity, the length of occluding edges.
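As an illustration of the polygon measures listed above, the following minimal sketch computes area, perimeter, and compactness for a simple polygon. It is not the authors' implementation: the function name `isovist_measures`, the metre-based vertex list, and the compactness formula 4πA/P² (one common way to express "relative to an ideal circle") are assumptions. Occlusivity is omitted because it requires the surrounding building geometry.

```python
import math

def isovist_measures(polygon):
    """Return area, perimeter and compactness of a simple polygon.

    `polygon` is an ordered list of (x, y) vertices in metres (assumed units).
    """
    n = len(polygon)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]          # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1        # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    # Assumed definition: equals 1.0 for a circle, smaller for elongated shapes.
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return {"area": area, "perimeter": perimeter, "compactness": compactness}

# Hypothetical 10 m x 10 m square field-of-view polygon.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
measures = isovist_measures(square)
```

For the square, the compactness is π/4, reflecting that a square encloses less area than a circle with the same perimeter.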
EDA measures an individual’s physiological state, and it was recorded using the Empatica E4 wearable device, similar to the studies in [11, 12, 13]. We placed the wearable device on the participant’s non-dominant hand and let it adjust for 10 minutes according to the Empatica guidelines. The data were recorded on the Empatica website and corrected for motion artifacts. The EDA measure (physiological response) was a time-series signal with temporal dependencies. The sensor backpack, on the other hand, was designed to capture the contextual events that occur in an urban environment. In the context of this study, an event is non-temporal since it depends only on the instance of its observation. Therefore, the continuous signals recorded for environmental features and the continuous signals recorded for participants’ physiological responses were quantified in two different manners (Section 3.2). Moreover, since the recorded signals were associated with the geographical location, they also had spatial properties. The primary infrastructure of the urban environment and the season (April 2016) were uniform. However, inherent diversity arose from different experiment days, times of day, and participants’ demographic backgrounds. The data for both the environmental measures and the corresponding participants’ physiological response measures are summarized in Table 1.
|Data||Type||Features (sensors)||Frequency (Hz)|
|Urban environment (changes in the urban environmental condition during a participant’s walk)||Spatial (in the context of this study)||GPS position (latitude and longitude)||1.0|
|Urban environment||Spatial||Sound level (dB)||0.4|
|Urban environment||Spatial||Dust in air||0.4|
|Urban environment||Spatial||Environmental temperature (°C)||1.0|
|Urban environment||Spatial||Relative humidity (%)||1.0|
|Urban environment||Spatial||Participant’s field-of-view (computed based on GPS position): Area, Perimeter, Compactness, Occlusivity||- -|
|Human perception (participants’ physiological response)||Spatial-temporal||Electro-dermal activity (EDA)||4.0|
A comprehensive signal processing and data-preprocessing framework was proposed in order to apply the selected machine learning methods. Fig. 2 illustrates the framework and describes how it was used for information fusion and knowledge mining. In Fig. 2, each quantified event is a sample in the quantified environmental data, and each quantified response is a sample in the quantified physiological response data; the total number of samples is specific to each participant. The information, therefore, was fused in three stages:
Each participant’s event-based data were collected from five sensors, re-sampled to a common frequency, and aligned with one another by time (Fig. 2, mark “A”).
The environment and response data from each participant were independently cleaned, filtered, and quantified. Each participant’s quantified event and response data were then fused (paired) by assigning a quantified response to each quantified event (Fig. 2, mark “B”).
The paired participants’ data were then stacked (Fig. 2, mark “C”).
The three-stage information fusion approach produced the compiled dataset, which was fed to select machine learning techniques. For each machine learning technique, the compiled dataset (Fig. 2, mark “C”) was arranged and configured as per the techniques’ requirements and objectives.
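The pairing and stacking stages (marks “B” and “C”) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary-based row layout, the participant IDs, and all numeric values are hypothetical and do not come from the study.

```python
def pair_events_with_responses(events, responses):
    """Stage B: attach the i-th quantified response to the i-th quantified event."""
    if len(events) != len(responses):
        raise ValueError("each quantified event needs exactly one response")
    return [dict(event, response=resp) for event, resp in zip(events, responses)]

def stack_participants(per_participant):
    """Stage C: stack the paired rows of all participants into one dataset."""
    dataset = []
    for participant_id, rows in per_participant.items():
        for row in rows:
            dataset.append(dict(row, participant=participant_id))
    return dataset

# Hypothetical quantified events (already averaged per time-window) and nSCR responses.
p1 = pair_events_with_responses(
    [{"sound": 61.0, "temperature": 19.5}, {"sound": 70.2, "temperature": 19.6}],
    [0, 2],
)
p2 = pair_events_with_responses([{"sound": 58.3, "temperature": 21.0}], [1])
compiled = stack_participants({"P01": p1, "P02": p2})
```

Keeping the participant identifier on every stacked row makes it possible to trace a sample back to its walk in later analyses.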
3.1 Signal processing
3.1.1 Frequency unification
The environmental features sound and dust were collected at a frequency of 0.4 Hz, while GPS position, temperature, humidity, and illuminance were collected at 1 Hz (Table 1). Therefore, up-sampling with linear interpolation was applied to the sound and dust data to unify the frequencies of the gathered data. All features were then aligned to the same timestamps, which was crucial to ensure that all sensor values belonged to the same event during the study.
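A minimal sketch of linear-interpolation up-sampling from 0.4 Hz (one sample every 2.5 s) to a 1 Hz grid is given below. The function name `upsample_linear`, the clamping at the edges, and the sample values are assumptions made for illustration only.

```python
def upsample_linear(times, values, target_times):
    """Linearly interpolate (times, values) at each target time.

    Both `times` and `target_times` must be sorted ascending; targets outside
    the recorded range are clamped to the first or last value (an assumption).
    """
    result = []
    j = 0
    for t in target_times:
        if t <= times[0]:
            result.append(values[0])
            continue
        if t >= times[-1]:
            result.append(values[-1])
            continue
        while times[j + 1] < t:                # advance to the bracketing interval
            j += 1
        t0, t1 = times[j], times[j + 1]
        weight = (t - t0) / (t1 - t0)
        result.append(values[j] + weight * (values[j + 1] - values[j]))
    return result

# Hypothetical sound readings at 0.4 Hz, re-sampled onto a 1 Hz grid.
sound_times = [0.0, 2.5, 5.0]
sound_values = [60.0, 65.0, 62.5]
grid_1hz = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
sound_1hz = upsample_linear(sound_times, sound_values, grid_1hz)
```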
3.1.2 Signal filtering and smoothing
The physiological response data (EDA signals) were kept at their original 4 Hz frequency to maintain the information required for arousal detection. On close inspection, we found that some participants’ EDA signals were unusable, and these were discarded. The remaining (accepted) EDA signals were first smoothed and then filtered to remove artifacts, as recommended in the EDA literature [6, 8].
Physiological data selection
The EDA signals from 30 participants were analyzed by comparing their various “profiles.” The EDA signals belonging to the four types of uncorrupted EDA profiles shown in Fig. 3 were considered for the data analysis. The EDA signals belonging to the two erroneous EDA profile types, also illustrated in Fig. 3, were discarded. In total, 10 EDA signals were discarded. The erroneous EDA signal types were classified as:
Type-1 error, when EDA signal values only fluctuate between two values, i.e., the EDA signal behaved like a step function, and the signal may also contain a significant amount of sensor loss (no sensor response record).
Type-2 error, when the majority of the sample values were zero (significant sensor response loss), despite the otherwise normal fluctuations (correct sensor response) in EDA signal.
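The two error types above were identified by inspecting the signal profiles. As a rough illustration only, they could be approximated by a heuristic such as the one below. The function name, the treatment of zero samples as sensor loss, the rounding to three decimals, and the 50% zero-fraction limit are all assumptions, not part of the study.

```python
def classify_eda_profile(signal, zero_fraction_limit=0.5):
    """Return 'type-1', 'type-2' or 'ok' for a list of EDA samples.

    Assumptions: a zero sample means "no sensor response", and values are
    compared after rounding to three decimals.
    """
    nonzero_levels = {round(v, 3) for v in signal if v != 0.0}
    zero_fraction = sum(1 for v in signal if v == 0.0) / len(signal)
    if len(nonzero_levels) <= 2:
        return "type-1"    # step-like: the signal only flips between two values
    if zero_fraction > zero_fraction_limit:
        return "type-2"    # mostly sensor loss despite otherwise normal fluctuation
    return "ok"

# Hypothetical short signals illustrating each case.
step_like = classify_eda_profile([0.2, 0.3, 0.2, 0.3, 0.0, 0.3])
mostly_lost = classify_eda_profile([0.0, 0.0, 0.0, 0.0, 0.21, 0.25, 0.31])
usable = classify_eda_profile([0.21, 0.25, 0.31, 0.28])
```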
Stationary Wavelet Transformation based smoothing
After the EDA signals were selected, they were smoothed by applying a Stationary Wavelet Transformation (SWT) followed by a reverse SWT. Prior work suggested an adaptive method for SWT-based smoothing of EDA signals recorded over long periods (30 hours). In our study, EDA signals were recorded for 25–29 minutes. Therefore, we applied a one-level SWT and reverse SWT for smoothing. Each EDA signal was transformed using “Haar” as the mother wavelet in the SWT. A one-level SWT was performed on each signal, and a threshold was applied to the obtained wavelet coefficients to eliminate larger fluctuations in the signal. That is, wavelet coefficient values above the upper threshold value and below the lower threshold value were cut off (Fig. 4). Finally, a reverse SWT was applied to the transformed signal to produce a smoothed signal (Fig. 4).
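The smoothing step can be sketched with a hand-written one-level, undecimated Haar transform. This is an illustrative assumption rather than the authors' code: it uses periodic boundary handling, interprets "cut off" as clipping the detail coefficients to a symmetric range, and uses a hypothetical threshold of 0.05 because the value used in the study is not given here.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_swt_level1(signal):
    """One-level undecimated Haar transform with periodic wrap-around."""
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / SQRT2 for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / SQRT2 for i in range(n)]
    return approx, detail

def haar_iswt_level1(approx, detail):
    """Inverse transform: average the two reconstructions of each sample."""
    restored = []
    for i in range(len(approx)):
        from_right = (approx[i] + detail[i]) / SQRT2          # pair (i, i+1)
        from_left = (approx[i - 1] - detail[i - 1]) / SQRT2   # pair (i-1, i)
        restored.append((from_right + from_left) / 2.0)
    return restored

def smooth_eda(signal, threshold):
    """Clip detail coefficients to [-threshold, threshold], then invert."""
    approx, detail = haar_swt_level1(signal)
    clipped = [max(-threshold, min(threshold, d)) for d in detail]
    return haar_iswt_level1(approx, clipped)

# Hypothetical EDA fragment with one sharp spike at index 3.
raw = [0.30, 0.31, 0.30, 1.50, 0.32, 0.31, 0.30, 0.31]
smoothed = smooth_eda(raw, threshold=0.05)
unchanged = smooth_eda(raw, threshold=float("inf"))
```

With an infinite threshold no coefficient is clipped and the inverse transform returns the input, which is a quick check that the transform pair is consistent.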
Truncation of the unwanted signal fragments
SWT-based treatment of the EDA signals eliminated the large fluctuations from the signal. However, some sharp drops in the signal (corrupt fragments) caused by artifacts were not filtered out completely. Thus, the corrupt fragments and the fragments corresponding to participants’ waiting time were truncated from both the original (raw) and the smoothed EDA signals. Fig. 4 is an example of such truncation. This process produced two EDA signals: original (original signal with filtering only) and smooth (original signal with both smoothing and filtering).
3.2 Signal quantification and labeling
Signal quantification involved three steps: time-window marking, arousal detection, and data labeling. In fact, these are the critical steps in the fusion of the environmental data and the physiological response data. As shown in Fig. 2, at first, physiological data were quantified, and then, the timestamp information was passed to the environmental data for its quantification.
3.2.1 Time window marking
Each EDA signal’s timestamp information was compared with the timestamps recorded at various stages during a participant’s walk. Based on the signal filtering shown in Fig. 4 and the available timestamp information, the signal fragment belonging to the walking duration (indicated by Start and End in Fig. 5) was marked at regular intervals of a fixed time-window size, measured in seconds. Such time-window marking was crucial to our data analysis because it let us observe participants’ physiological states in relation to their experience of the events occurring at these regular intervals (Fig. 5).
Therefore, for each time-window, the event experienced by a participant is a vector of the environmental features, computed by averaging the values of the signal fragment (environmental measurement) within the corresponding time-window. The participant’s physiological response upon experiencing that event, on the other hand, was computed by the arousal detection method described in Section 3.2.2. Additionally, the participant’s field-of-view (Isovist descriptors: area, perimeter, occlusivity, and compactness) was computed at the start of each time-window. Thus, each participant’s quantified data had, for each time-window, an independent vector of environmental conditions (event) and a corresponding physiological state (response).
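The per-window averaging of an environmental signal can be sketched as below. The function name, the argument layout, and the sound values are hypothetical; the 5-second window matches one of the quantification rates used later in the paper.

```python
def average_per_window(timestamps, values, start, end, window_size):
    """Mean of `values` in each consecutive window between `start` and `end`.

    Windows are `window_size` seconds long; an empty window yields None.
    """
    n_windows = int((end - start) // window_size)
    sums = [0.0] * n_windows
    counts = [0] * n_windows
    for t, v in zip(timestamps, values):
        index = int((t - start) // window_size)
        if 0 <= index < n_windows:             # ignore samples outside the walk
            sums[index] += v
            counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical 1 Hz sound samples covering a 10-second fragment of a walk.
times = list(range(10))
sound = [60, 62, 61, 63, 64, 70, 72, 71, 69, 68]
events = average_per_window(times, sound, start=0, end=10, window_size=5)
```

Repeating this for every environmental feature and collecting the results per window gives one event vector per time-window.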
3.2.2 Arousal detection (EDA)
The level of arousal in an EDA signal is determined by identifying a specific signature (pattern) called the skin conductance response (SCR) or arousal [3, 6, 9, 33, 35]. The state of arousal in an EDA signal is typically defined as a peak having a specific signature. We processed the EDA signals using the skin conductance processing tool Ledalab. Ledalab offers a continuous decomposition analysis (CDA) method for analyzing an EDA signal. In CDA, an EDA signal is decomposed into the tonic skin conductance level (SCL) and the phasic SCR driver.
We performed CDA on each participant’s EDA signal by using the recommended settings in Ledalab. That is, the signal’s optimization procedure was performed twice, which automatically determined the optimization parameters for evaluating the number of significant SCRs (nSCR) above a defined threshold of 0.01 µS within a time-window. We used nSCR because we could not, in a theory-driven manner, define what stimulus (event) caused a change in a participant’s “physiological arousal state.” Thus, we relied on a data-driven approach by analyzing the phasic SCR, a non-specific, fast-changing EDA measure; i.e., nSCR is the number of peaks in the phasic skin conductance response to any kind of event within the given time-window. Therefore, the nSCR gave us the response measures shown in Fig. 5.
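To make the idea of counting significant peaks within a time-window concrete, the sketch below counts local maxima whose rise above the preceding trough exceeds a threshold. This is explicitly not Ledalab's continuous decomposition analysis: it is a simplified stand-in that assumes an already extracted phasic signal, and the function name and sample values are hypothetical.

```python
def count_scr_peaks(phasic, threshold=0.01):
    """Count local maxima whose rise from the preceding trough exceeds `threshold`.

    `phasic` is a list of phasic conductance samples for one time-window
    (assumed to be in microsiemens).
    """
    count = 0
    trough = phasic[0]
    for i in range(1, len(phasic) - 1):
        if phasic[i] < trough:
            trough = phasic[i]                 # track the lowest point so far
        is_peak = phasic[i - 1] < phasic[i] >= phasic[i + 1]
        if is_peak and phasic[i] - trough > threshold:
            count += 1
            trough = phasic[i]                 # restart trough tracking after a peak
    return count

# Hypothetical window: two clear peaks and one ripple below the threshold.
window = [0.00, 0.02, 0.05, 0.03, 0.01, 0.012, 0.011, 0.04, 0.02]
nscr = count_scr_peaks(window)
```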
3.2.3 Data labeling
When aggregating all participants’ data (Fig. 2, mark “C”), we observed that the nSCR value for a time-window varies from 0 to 12. An nSCR value of 0 indicates that, in a time-window, a participant had a normal physiological condition. On the other hand, an nSCR value greater than 0 indicates that a participant experienced a state of arousal at least once in that time-window. Thus, for labeling each time-window of each participant’s data, a binary-class label indicating a binary state of phasic nSCR can be used, where
class 0 is “normal” physiological response (“N”), i.e., an nSCR value equal to 0; and
class 1 is “aroused” physiological response (“A”), i.e., an nSCR value greater than 0.
A multi-class classification was also used, in which case the aroused physiological response “A” has two categories: class “LA,” indicating a low arousal response, and class “HA,” indicating a high arousal response, with the boundary between the two at an nSCR value of 6 (see the note to Table 2). A total of 6,057 samples and 9 input features were available in the compiled dataset for a time-window size (quantification rate) of 5 seconds. In the compiled data, 3,491 samples belonged to the category “N” and 2,566 samples belonged to the category “A,” i.e., approximately 58% and 42% of the samples, respectively. Furthermore, in the multiclass classification, 2,079 samples were labeled “LA” and 487 samples were labeled “HA.”
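The labeling rules can be written as two small functions. The sketch is illustrative: the function names are hypothetical, and whether an nSCR value exactly equal to the boundary of 6 counts as “LA” or “HA” is an assumption (here it is treated as “LA”), since the inequality is not recoverable from the text.

```python
def label_binary(nscr):
    """'N' for no significant SCR in the window, 'A' otherwise."""
    return "A" if nscr > 0 else "N"

def label_multiclass(nscr, high_boundary=6):
    """'N', 'LA' (low arousal) or 'HA' (high arousal).

    Assumption: a count equal to `high_boundary` is still labeled 'LA'.
    """
    if nscr == 0:
        return "N"
    return "HA" if nscr > high_boundary else "LA"

# Hypothetical nSCR counts for six time-windows.
counts = [0, 1, 6, 7, 12, 0]
binary = [label_binary(c) for c in counts]
multi = [label_multiclass(c) for c in counts]
```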
3.3 Machine learning methods
3.3.1 Non-inferential modeling
We built a predictive model with the environmental features as the inputs and the binary (and multiclass) quantified arousal level as the output using REP-Tree, which is a decision tree learner. In a decision tree, a tree-like predictive model is built, where the leaves represent the target (e.g., the class labels “N” or “A”) and the branches represent an observation for a feature (e.g., sound level) at a node. REP-Tree applies reduced-error pruning to reduce the size of a decision tree: it keeps pruning subtrees by replacing them with a leaf (a class label) as long as the error does not increase (i.e., the accuracy of the model does not decrease).
We chose REP-Tree to build a predictive model because the algorithm constructs a decision tree, where each node makes a decision for a feature, and its specific value produces a particular class label. While making a predictive model, REP-Tree chooses the most significant features based on their contribution to the model’s accuracy, which is advantageous for this problem since it is uncertain which environmental features influence physiological responses. For the validation of the model’s predictive performance, we chose ten-fold cross-validation (10-fold CV). Section 4 describes the test accuracies of 10-fold CV based REP-Tree training.
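The validation procedure, k-fold cross-validation, can be sketched independently of the learner. In the example below, a one-feature threshold "stump" stands in for REP-Tree purely as a placeholder; the function names, the fixed random seed, and the toy sound-level data are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(rows, labels, train_fn, k=10, seed=0):
    """Mean held-out accuracy over k folds (needs at least k samples)."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = train_fn([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(model(rows[i]) == labels[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / len(accuracies)

def train_stump(rows, labels):
    """Placeholder learner: split the first feature midway between class means."""
    n_vals = [r[0] for r, y in zip(rows, labels) if y == "N"]
    a_vals = [r[0] for r, y in zip(rows, labels) if y == "A"]
    if not n_vals or not a_vals:               # only one class in the training data
        only = "A" if a_vals else "N"
        return lambda row: only
    n_mean = sum(n_vals) / len(n_vals)
    a_mean = sum(a_vals) / len(a_vals)
    cut = (n_mean + a_mean) / 2.0
    above, below = ("A", "N") if a_mean > n_mean else ("N", "A")
    return lambda row: above if row[0] > cut else below

# Toy, well-separated data: quiet windows labeled "N", loud windows labeled "A".
rows = [[50 + 0.5 * i] for i in range(20)] + [[80 + 0.5 * i] for i in range(20)]
labels = ["N"] * 20 + ["A"] * 20
accuracy = cross_validate(rows, labels, train_stump)
```

Any classifier that exposes the same train-then-predict shape could be passed as `train_fn` in place of the stump.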
3.3.2 Inferential modeling
Contrary to non-inferential modeling, inferential modeling explains the relationships between the input features and the output feature. A fuzzy rule-based inference system is capable of describing how independent environmental features are related to the dependent physiological response (phasic nSCR) feature. For this, we applied FURIA, which is a fuzzy rule-based classifier.
Unlike conventional rule-based classifiers, FURIA produces fuzzy rules. The comparison operators in these rules define clear conditions for a feature’s association with a class label (e.g., “N” or “A”). FURIA also provides a range indicating fuzziness in a feature’s condition, which may be considered a soft boundary when associating a feature with a class label. This ability was particularly useful in this study since we wanted to observe the specific value ranges of the environmental features that corresponded to a participant’s state of arousal. For instance, we needed to determine the particular sound level range for which a participant experienced a state of arousal. Since FURIA fulfills this requirement, it was selected as the technique for inferential analysis. Interpretation of the obtained rules is described in Section 4.
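The notion of a soft boundary can be illustrated with a trapezoidal membership function, the kind of fuzzy interval commonly used in fuzzy rule antecedents. The sketch below is an assumption-laden illustration: the function name, the rule "sound level is high", and the boundary values 65 and 70 are hypothetical and are not rules obtained in this study.

```python
def trapezoid_membership(x, a, b, c, d):
    """Degree (0..1) to which x belongs to a fuzzy interval.

    The interval is fully satisfied on [b, c] and fades linearly to zero
    towards a (on the left) and d (on the right).
    """
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

INF = float("inf")

def high_sound(level):
    """Hypothetical antecedent: fades in from 65, fully true from 70 upward."""
    return trapezoid_membership(level, 65.0, 70.0, INF, INF)

degrees = [high_sound(60.0), high_sound(67.5), high_sound(72.0)]
```

A crisp rule would jump from 0 to 1 at a single cut point, whereas the fuzzy version assigns intermediate degrees in the transition range.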
3.3.3 Feature selection
Feature selection is a process to determine the ability of each input feature to predict the output. It involves building a model using a subset of features and testing its predictive accuracy. We applied the backward feature elimination (BFE) method in this research for its ability to systematically examine feature subsets of decreasing size. BFE starts with all features in a set (in this case, 9 features) to build and test the model. Subsequently, BFE iteratively eliminates features one by one while propagating high-accuracy feature subsets to the next iteration. Finally, BFE gives a list of subsets with their corresponding accuracies, from which a subset can be selected depending on the accuracy or the number of features required. In addition to REP-Tree, a multilayer perceptron (MLP) and a support vector machine (SVM) were used for a more comprehensive analysis in BFE. Therefore, the feature selection result was an assessment of three different predictors. During the feature selection, at each iteration, BFE used a randomly selected 60% of the samples for training and the remaining 40% to test the model.
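The elimination loop can be sketched as a greedy procedure that, at each iteration, drops the feature whose removal leaves the best-scoring subset. The scoring function here is a toy stand-in; in the study the score would come from training and testing a predictor on a 60/40 split. The function names and the per-feature gains are hypothetical.

```python
def backward_elimination(features, score_fn):
    """Greedy backward elimination.

    Returns a list of (subset, score) pairs, from the full set down to one feature.
    """
    current = list(features)
    history = [(tuple(current), score_fn(current))]
    while len(current) > 1:
        candidates = []
        for removed in current:
            subset = [f for f in current if f != removed]
            candidates.append((score_fn(subset), subset))
        best_score, best_subset = max(candidates, key=lambda c: c[0])
        current = best_subset                  # propagate the best subset onward
        history.append((tuple(current), best_score))
    return history

# Toy scorer: pretend each feature adds a fixed amount of accuracy (made-up values).
TOY_GAIN = {"temperature": 0.10, "humidity": 0.08, "illuminance": 0.07, "sound": 0.02}

def toy_score(subset):
    return 0.5 + sum(TOY_GAIN[f] for f in subset)

ranking = backward_elimination(list(TOY_GAIN), toy_score)
```

The returned history is the kind of list from which a subset can be picked by trading accuracy against the number of features.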
3.3.4 Pattern discovery
In general, the primary aim of a self-organizing map (SOM) is to map high-dimensional data onto a 2-dimensional (2D) plane. The 2D plane of a SOM consists of a network of neurons (nodes). The network’s nodes acquire the underlying properties of the input data samples (e.g., events in the environmental data). Moreover, a SOM projects similar data samples to a cluster center (a node in the SOM) according to the similarity (Euclidean distance) of the data sample to the node [18, 37].
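The core of SOM training, finding the node closest to a sample by Euclidean distance and moving it towards the sample, can be sketched as follows. This is a deliberately reduced illustration: a full SOM also updates neighbouring nodes with a shrinking neighbourhood and a decaying learning rate, which is omitted here. The function names and values are hypothetical.

```python
import math

def best_matching_unit(nodes, sample):
    """Index of the node whose weight vector is closest (Euclidean) to `sample`."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))

def update_node(weights, sample, learning_rate):
    """Move one node's weights a fraction of the way towards `sample`."""
    return [w + learning_rate * (s - w) for w, s in zip(weights, sample)]

# Three hypothetical nodes in a 2-feature space and one scaled sample.
nodes = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sample = [0.9, 0.8]
winner = best_matching_unit(nodes, sample)
nodes[winner] = update_node(nodes[winner], sample, learning_rate=0.5)
```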
SOM is an appropriate choice for this problem since it is tedious to define the number of clusters in advance, especially when problems have complex relations between the features. SOM produced clusters automatically (see Section 4.4). Additionally, to analyze patterns related to the geo-locations, the geo-location-referenced mean physiological response across all participants was computed by matching the GPS location information (latitude and longitude) and aggregating the samples. The geo-location-referenced mean physiological responses were computed to visually understand patterns in participants’ physiological responses in relation to the actual map of the neighborhood, as described in Section 4.4.
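One simple way to aggregate responses by location is to group samples whose coordinates agree after rounding and then average the responses in each group. The sketch below assumes this rounding-based matching, which may differ from the matching used in the study; the function name and the coordinates are hypothetical.

```python
from collections import defaultdict

def mean_response_by_location(samples, decimals=4):
    """Average nSCR per location cell.

    `samples` is an iterable of (latitude, longitude, nscr); samples fall into
    the same cell when their coordinates agree after rounding (an assumption).
    """
    groups = defaultdict(list)
    for lat, lon, nscr in samples:
        groups[(round(lat, decimals), round(lon, decimals))].append(nscr)
    return {cell: sum(v) / len(v) for cell, v in groups.items()}

# Hypothetical samples from different participants near the same spots.
samples = [
    (47.37691, 8.54172, 2),
    (47.37693, 8.54168, 0),
    (47.37820, 8.54010, 1),
]
mean_by_cell = mean_response_by_location(samples)
```

Four decimal places of a degree correspond to roughly ten metres, so the choice of `decimals` controls the spatial resolution of the resulting map.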
4.1 Sensitivity analysis (non-inferential modeling)
First, a classifier (REP-Tree, described in Section 3.3.1) was trained and tested on the five “time-resolved” datasets corresponding to the five quantification rates of 25, 20, 15, 10, and 5 seconds, whose outputs were labeled with the binary class: normal physiological response “N” and aroused physiological response “A.” The parameter settings used to train the REP-Tree models are given in Table A.1. The performances of the trained REP-Tree models are shown on a receiver operating characteristic (ROC) curve plot in Fig. 6.
The model’s performance improved as the quantification rate decreased (Fig. 6). The model’s high predictability for smaller quantification rates is an indicator of the participants’ strong sensitivity towards changes in the urban environment. The model’s performance for the smoothed EDA data (red square) was better than its performance for the raw EDA signal (circles). Thus, the smoothed EDA data more accurately capture the association between a change in environmental features and participants’ physiological states of arousal.
The results of the 10-fold CV training of the REP-Tree classifier for both binary and multiclass classification, for the dataset where the smoothed EDA data were quantified at a 5-second time-window, are shown in Table 2. The classifier’s predictive accuracy was found to be 87% for the binary-class classification and 80% for the multiclass classification.
|Binary class model||N||3105||405||2162||396||0.89||0.88||0.89||0.84||87%|
|Note: For binary class, normal physiological response “N” indicates nSCR = 0 and aroused physiological response “A” indicates nSCR > 0. For multiclass, “N” indicates nSCR = 0; “LA” indicates a low arousal response and “HA” indicates a high arousal response, with the boundary between the two at nSCR = 6. The variables TP, FP, TN, and FN indicate true positive, false positive, true negative, and false negative, respectively.|
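For reference, standard performance measures can be derived from the four confusion counts as sketched below. The sketch assumes that the four integers in the binary-model row are TP, FP, TN, and FN in the order listed in the note; which measure each of the row's decimal columns represents is not stated in the text, so no mapping to those columns is claimed. The function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures derived from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                    # also called the true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),         # true negative rate
        "f1": 2 * precision * recall / (precision + recall),
    }

# Assumed reading of the binary-model row: TP, FP, TN, FN in that order.
binary_n = classification_metrics(tp=3105, fp=405, tn=2162, fn=396)
rounded = {name: round(value, 2) for name, value in binary_n.items()}
```

Under this assumed column order, the rounded accuracy of 0.87 agrees with the 87% reported for the binary-class model.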
4.2 Sensitivity range analysis (inferential modeling)
The non-inferential model indicates that the participants’ physiological responses are sensitive to environmental changes. Therefore, we built an inferential model to understand how environmental features influence participants’ physiological responses. A fuzzy rule-based inferential model was built using FURIA, whose parameter settings are given in Table A.1. We adopted a binary-class classification of nSCR, where nSCRs were categorized into two classes: normal physiological response “N” and aroused physiological response “A.” The FURIA algorithm offered an average test accuracy of 70.23% after 10-fold CV training. We consider this accuracy notable given the complexity of understanding human perception of urban environmental conditions.
We analyzed the set of fuzzy rules generated by FURIA by segregating the rules between the participants’ “N” and “A.” Fig. 7 is a visual interpretation of the obtained fuzzy rules for both classes “N” and “A.” We interpreted and represented the FURIA rules in Fig. 7 to find the values (range of values) of the environmental features that
were linked to class “A,” which indicates participants’ aroused physiological state;
did not significantly influence the participants’ aroused physiological state.
To validate the knowledge obtained from the visual interpretation of the fuzzy rules, the distributions of the environmental features were examined through the histograms in Fig. 7. The visual interpretation and summarization of the rules for sound level and its corresponding distribution (Fig. 7) indicate that the participants’ normal physiological responses match a particular sound level distribution; that is, a specific band of sound levels corresponds to the normal physiological state. Furthermore, the participants tended to exhibit an aroused physiological state when they experienced sound levels above a certain level. This result indicates that loud sound levels correspond to increased participant arousal.
The result was similar for temperature, where temperatures greater than 21–22 °C were associated with an aroused physiological state (Fig. 7). However, the dataset contained fewer samples for temperatures above 22 °C than for temperatures below 22 °C (Fig. 7), which we take as an indication that heat alone did not cause the participants’ physiological arousal. For illuminance (Fig. 7), the participants exhibited physiological arousal in darker locations, i.e., at low illuminance levels.
4.3 Simultaneous impact of environmental features
Inferential modeling provided the values of the environmental features that were associated with normal and aroused physiological states. However, it is also essential to discover which environmental feature(s) have the strongest influence on the participants’ physiological responses. Hence, we constructed a backward feature elimination (BFE) based feature selection framework and analyzed the obtained results to build a significance hierarchy of feature subsets (Fig. 8). A feature subset’s significance was estimated by its ability to predict the “N” and “A” classes with high accuracy.
Fig. 8 is a significance hierarchy triangle of the feature subsets, where a subset’s predictability reduces as the number of features in the subset decreases. Three predictors provided three feature selection result sets, and Fig. 8 compiles the result sets from all three. The MLP, REP-Tree, and SVM agreed on the feature subset of temperature, humidity, illuminance, and Isovist area, for which the REP-Tree had the highest accuracy, followed by SVM and MLP. Therefore, temperature, humidity, illuminance, and Isovist area were noted as the most significant feature set, although the choice of subset remains a trade-off between accuracy and the number of features, as indicated in the hierarchy triangle (Fig. 8).
4.4 Patterns of perceptual variations
The predictive modeling confirmed the sensitivity of participants’ physiological responses towards dynamic environmental conditions. The fuzzy rule-based analysis described the relationship between the environmental features and the physiological response. Feature selection indicated the most significant environmental features. However, pattern discovery explains:
which participants experienced similar environmental conditions and what their responses were;
whether the participants’ physiological responses to certain environmental conditions were similar;
the patterns of the environmental features that influence the participants’ physiological arousal.
The compiled data (see Fig. 2) were analyzed using a SOM. Fig. 9 is the result of automatic clustering from a trained SOM, where the 9-dimensional input data were mapped onto a 2D plane of hexagonal nodes (map dimension 20x20, Appendix A). Each node in the map acquired the properties of a set of samples. Fig. 9 shows the maps of the environmental features as feature matrices (F-matrices). On the F-matrix of an environmental feature (e.g., sound level), the feature values assigned to the F-matrix nodes correspond to the nodes on the SOM’s unified distance matrix (U-matrix) and label matrix (L-matrix) in Fig. 9. Hence, the positions and values of the nodes in all the maps (matrices) in Fig. 9 are comparable to each other. More specifically, the U-matrix is the result of the F-matrices of the environmental features, and the L-matrix is the corresponding dominant label associated with the nodes. Therefore, to make sense of the patterns, we need to compare all matrices with one another.
Fig. 9: Trained SOM results. Node values in the maps are indicated by color: the lowest value is shown in dark blue and the highest value in bright yellow. (a) U-matrix: SOM clustering map. (b) F-matrix: maps for the environmental features, which were linearly scaled to a variance of 1.0 so that they have equal importance in clustering. (c) L-matrix: map of participant IDs and participants’ physiological response state labels (“N” and “A”).
The U-matrix in Fig. 9 shows the clusters of similar data points. The nodes with small differences (in terms of Euclidean distance) are shown in dark blue, and the nodes with high differences are shown in bright yellow. In addition, patches of nodes with similar colors, separated by lighter colors, indicate clusters of data samples. Moreover, the data samples corresponding to a cluster in the U-matrix share a commonality, and dissimilar data samples are further apart. It is therefore implied that the participants whose ID labels belong to a cluster experienced similar environmental conditions.
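The idea behind the U-matrix can be sketched in a few lines: for every node, average the Euclidean distance between its weight vector and those of its grid neighbours. The sketch below assumes a rectangular grid with 4-neighbourhoods for brevity, whereas the map described here uses hexagonal nodes; the toy weights and the function name are invented for illustration.

```python
import math


def u_matrix(weights):
    """Mean Euclidean distance from each node's weight vector to its grid neighbours.

    `weights[r][c]` is the weight vector of the node at row r, column c.
    Assumption: rectangular grid, 4-neighbourhood (a simplification of a
    hexagonal SOM lattice).
    """
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result


# Toy 2x3 map (invented): the two left columns hold identical weights,
# the right column holds a clearly different weight vector.
weights = [
    [(0.0, 0.0), (0.0, 0.0), (10.0, 0.0)],
    [(0.0, 0.0), (0.0, 0.0), (10.0, 0.0)],
]
u = u_matrix(weights)
```

Nodes inside a homogeneous region get values near zero (the dark-blue end of the color scale), while nodes next to a dissimilar region get larger values, which is what makes cluster borders visible.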
Fig. 9 also shows an L-matrix in which each node is labeled with a participant ID and the state of the physiological response. White nodes indicate a normal physiological response and blue nodes indicate an aroused physiological response. By comparing these matrices, one can discover relevant patterns in the organization of the dataset. This could, with care, be interpreted as the “cause” (Fig. 9) and “effect” (consult Fig. 9 and Fig. 9) of the dynamic and simultaneous environmental features on the participants’ physiological responses.
On the U-matrix (Fig. 9), a bright yellow patch separates itself from all the other node clusters. This distinct yellow spot results from a high concentration of similar input samples, which in this case is due to a concentration of high illuminance values, as evident from the F-matrix for illuminance (Fig. 9). Fig. 9 shows that at the exact same spot, participants had an aroused physiological state (most of the nodes are colored blue), and the nodes were labeled with participant IDs 8, 13, 23, and 29, indicating that all the participants exposed to extremely high illuminance also experienced a similarly aroused physiological state.
Additionally, three other dark blue clusters exist on the U-matrix in Fig. 9: one at the bottom-left, one at the top-left, and one at the top-right above the yellow patch. Investigating the F-matrices in Fig. 9, we find that the clusters at the bottom-left and the top-left are the result of high values of sound and temperature and extremely low values of illuminance. These clusters, when compared with the L-matrix in Fig. 9, indicate that the majority of participants responded with an aroused physiological state. Similarly, the cluster at the top-right is due to a combination of low values of dust and temperature, and the corresponding L-matrix in Fig. 9 has a majority of nodes indicating a normal physiological state. Further, the F-matrix for Isovist area in Fig. 9 shows that a high value of Isovist area resulted in an aroused physiological state, which is also evident from the L-matrix in Fig. 9. The L-matrix also indicates that participant IDs 16, 23, 24, 29, 32, and 35 experienced such a high Isovist area and responded with a similar physiological state.
In the pattern analysis, the mean physiological response across all participants was mapped onto the geographic locations along the path. The geo-location referenced mean physiological response was computed and normalized between 0 and 1. The geo-location referenced physiological responses highlighted specific locations on the neighborhood’s map where participants experienced an aroused physiological state (Fig. 10). Locations where, on average, participants exhibited a high physiological arousal response are indicated in red, while low physiological arousal is indicated in yellow. The size of the dots on the map in Fig. 10 is proportional to the degree of participants’ physiological arousal.
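A minimal sketch of this geo-referenced averaging is given below. It assumes that every quantified event has already been assigned to a discrete location identifier (for example, a snapped GPS point) and that the normalization is a min-max scaling; both are assumptions, since the text only states that the mean was normalized between 0 and 1. The identifiers and values are invented.

```python
from collections import defaultdict


def normalized_mean_by_location(events):
    """Average the arousal values per location and min-max scale the means to [0, 1].

    `events` is an iterable of (location_id, arousal_value) pairs pooled over
    all participants. `location_id` is a hypothetical key for a place on the path.
    """
    grouped = defaultdict(list)
    for location_id, value in events:
        grouped[location_id].append(value)
    means = {loc: sum(vals) / len(vals) for loc, vals in grouped.items()}
    lowest, highest = min(means.values()), max(means.values())
    span = highest - lowest
    # If all locations share the same mean, map everything to 0.0.
    return {loc: (m - lowest) / span if span else 0.0 for loc, m in means.items()}


# Toy events (invented): three locations visited by different numbers of participants.
events = [("p1", 0), ("p1", 2), ("p2", 4), ("p2", 6), ("p3", 3)]
scaled = normalized_mean_by_location(events)
```

Averaging per location before scaling means that a location visited by more events does not dominate the map, which matters when participants contribute slightly different numbers of events.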
Through this research, we extracted patterns from the data gathered during a controlled study, in which we asked participants to walk through an urban environment (Section 2.2). Our data analysis methods had the following dimensions: signal processing, multi-sensor information fusion, and knowledge mining using machine learning techniques. The sensor frequency unification and quantification led to the preparation of independent and identically distributed data samples of events and the corresponding physiological responses. During the data processing phase, we categorized physiological response data (EDA signals) into clean and erroneous signals (Section 3.1). EDA signal recording is susceptible to artifacts, and the suggested definition identifies an erroneous EDA signal. Finally, the quantification method segmented the continuous temporal data into regular time intervals (the time-window size), and the quantification rate of 5 seconds was the most efficient (Section 4.1).
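The segmentation into regular time intervals can be sketched as follows. This is an illustration under stated assumptions rather than the study's quantification method: samples are (timestamp in seconds, value) pairs, each window is summarized by its mean, and the function name is hypothetical.

```python
def quantify(samples, window_s=5.0):
    """Group timestamped samples into fixed windows and average each window.

    `samples` is a list of (timestamp_seconds, value) pairs. Returns a list of
    (window_start_seconds, mean_value) tuples ordered by time. Using the mean
    as the per-window summary is an assumption.
    """
    buckets = {}
    for t, value in samples:
        start = (t // window_s) * window_s  # start of the window containing t
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Toy signal (invented): five readings spread over three 5-second windows.
samples = [(0.0, 1.0), (2.5, 3.0), (5.0, 10.0), (9.9, 20.0), (10.0, 7.0)]
events = quantify(samples, window_s=5.0)
```

Because every sensor stream is reduced to one value per window, streams recorded at different frequencies end up on a common time base, and changing `window_s` to 10, 15, 20, or 25 gives the other quantification rates compared in the study.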
We applied both supervised and unsupervised machine learning techniques. This included testing the REP-Tree’s predictive accuracy to determine the model’s sensitivity to five different quantification rates: 5, 10, 15, 20, and 25 seconds. The predictive model at 5 seconds had the highest accuracy (Fig. 6). The high accuracy of the REP-Tree model indicates its ability to predict the participants’ normal and aroused physiological response states for a given set of environmental conditions consisting of sound level, dust, temperature, humidity, illuminance, and Isovist descriptors.
In addition, the inference modeling produced exact values of the environmental features and their influence on the participants’ physiological response state (Section 4.2). The environmental features with the largest range of values in the dataset (highest distribution) were directly linked to normal physiological responses (Fig. 7). In other words, the participants showed a “habitual effect”: they tended to respond differently when the environmental condition changed from the previous one (Section 4.2). Such a generalization of fuzzy rules across all participants is limited by the small number of data samples and the variations in cities’ architectures. However, it is necessary to mention that the participants took part in the study on different days and at different times of day, and we observed a high accuracy in the model’s predictions despite its application to such a complex and diverse dataset. For example, a fuzzy rule (Fig. 7) indicates that a participant’s arousal level corresponds to extremely low illuminance, high temperature, or a large Isovist area. Specifically, a change in physiological arousal was observed when moving from a small to a large Isovist area, i.e., when entering a crossroad or passing from a narrow to a wider street (Fig. 10).
It was difficult to identify the features with the highest influence on the physiological response from the inference modeling alone. Therefore, a backward feature elimination method with three predictors (Section 3.3.3) was used to determine the most significant environmental feature(s), and the result is presented in a significance hierarchy triangle (Fig. 8). The feature selection process, however, had its trade-off: reducing the number of features in the feature set also decreased the accuracy of the predictors. After a thorough inspection of Fig. 8, the predictors suggest temperature, humidity, illuminance, and Isovist area as the most significant feature set compared with any other combination of features (Section 4.3).
A SOM was employed for automatic clustering to discover patterns in the dataset (Section 4.4). Participants who experienced similar environmental conditions were expected to have a similar perception (physiological arousal state) and to fall into the same cluster or node on the map. For example, one cluster formed due to extremely high illuminance and another due to low illuminance conditions (Fig. 9). This indicates that a particular environmental condition influences most of the participants equally and that the majority of participants responded with a similar physiological response state when experiencing similar conditions. Furthermore, because the participants walked at different speeds, the number of quantified events corresponding to each participant varied slightly. Therefore, the geo-location referenced normalized mean of the events was the best method to show the geolocation of the participants’ average physiological responses on the map (Fig. 10). This map can be used to visually inspect urban features, such as street width, street type, traffic, and type of area (residential and industrial), and their potential impact on the participants’ physiological response.
6 Challenges and opportunities
The methods developed for this investigation help reveal patterns from complex human-environment interactions. The analysis predominantly focused on improved quantification methods for physiological arousal level detection and a means to correlate arousal level with environmental stimuli. This approach allows us to observe an increase in physiological arousal in response to specific environmental conditions (Section 5). The primary challenge of this study was selecting the appropriate tuning parameters to quantify and evaluate the arousal label. For example, the accuracy of the methods (Fig. 6) varied depending on the quantification rate. Similarly, the accuracy of the method depends on the procedure and threshold adopted for the nSCRs level detection. Moreover, we captured 9 features of a dynamic real-world situation; an increased number of features may therefore further improve the predictive model’s accuracy.
Future studies can utilize the presented experimental design and quantification methodology. For instance, it can be extended to capture citizens’ public transport commuting experience (physiological response while walking, waiting, and riding). For traffic safety, the method can potentially be applied to understand the physiological arousal patterns of vehicle riders while they ride through cities [10, 34]. Moreover, the developed predictive model can be used to extrapolate citizens’ potential arousal levels to a larger geographic area when combined with the isovist values and measured environmental data beyond the selected path.
In this research, we identified factors influencing human perception. To meet the challenges referred to above, our findings suggest that employing a virtual reality set-up could help reduce noise that may be induced by unknown factors. Additionally, our findings suggest that subjective thresholding of skin conductance can also be employed to mitigate these challenges.
Moreover, in the field of urban studies, it is crucial to understand how the built environment influences human behavior and perception. This question has long been central to practice and research and poses a fundamental methodological problem, since it is especially difficult to a) objectively measure perception and b) deal with the multitude of dynamic environmental factors that prevent the effect of pure urban form on human perception from being identified. As an answer to this problem, this research contributes by presenting and empirically testing a novel research framework for predicting and inferring the effects of planning decisions on human perception. In essence, the framework provides insights into how and why architecture and urban design influence human perception, which is particularly helpful for evaluating planning proposals and guiding design decisions. For this purpose, we adopted state-of-the-art mobile sensing technologies as well as machine learning methods that were specifically chosen and adapted to the needs of architecture and urban design research.
This research presented a specific methodology to evaluate a complex dataset from an experiment in which the physiological responses of 30 participants were linked to environmental conditions. The measurements in the dataset came from seven sensors with differing frequencies and four additional geometric features. The proposed data quantification and multi-sensor information fusion methods linked participants’ physiological state of arousal to environmental conditions. Four categories of machine learning techniques (non-inferential modeling, inferential modeling, feature selection, and clustering) revealed patterns in the dataset. The high accuracy of the non-inferential predictive model was evidence that the participants’ physiological state is sensitive to changes in environmental conditions. The fuzzy rule-based inferential modeling results indicate that the occurrence of “normal” and “aroused” physiological conditions corresponds to specific values (and ranges of values) for each environmental feature. This suggests that the changes in the participants’ physiological arousal state primarily occurred due to fluctuations in the environmental conditions. Feature selection showed that some environmental features, such as temperature, humidity, illuminance, and the field-of-view, were more dominant in their influence on participants’ physiological response than sound level and dust. Pattern analysis with the self-organizing map indicated that, primarily, the participants who experienced similar environmental conditions responded with a similar physiological arousal state. Finally, the geo-location referencing of the average physiological response across all participants produced a means to visually inspect how participants respond during the actual walk in relation to permanent urban features. The proposed data analysis framework revealed patterns from the complex spatial-temporal environmental and physiological data that impact our understanding of urban settings.
This research was funded by the Swiss National Science Foundation (SNF) project no. 100013L 149552 titled under the German Research Foundation (DFG) Research Grant no. DO551/21-01. The authors are thankful to all the participants who took part in the study.
-  Alpaydin, E. (2014). Introduction to Machine Learning. Cambridge MA, MIT Press.
-  Bell, P. A., Greene, T. C., Fisher, J. D., & Baum, A. (2001). Environmental Psychology. 5th eds. Fort Worth, TX, Harcourt College Publishers.
-  Benedek, M., & Kaernbach, C. (2010). A continuous measure of phasic electrodermal activity. Journal of Neuroscience Methods, 190(1), 80–91.
-  Benedikt, M. L. (1979). To take hold of space: isovists and isovist fields. Environment and Planning B: Urban Analytics and City Science, 6(1), 47–65.
-  Blu, T., Thévenaz, P., & Unser, M. (2004). Linear interpolation revitalized. IEEE Transactions on Image Processing, 13(5), 710–719.
-  Braithwaite, J. J., Watson, D. G., Jones, R., & Rowe, M. (2013). A guide for analysing electrodermal activity (EDA) & skin conductance responses (SCRs) for psychological experiments. Psychophysiology, 49, 1017–1034.
-  Chang, C. C., & Lin, C. J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 27:1–27:27.
-  Chen, W., Jaques, N., Taylor, S., Sano, A., Fedor, S., & Picard, R. W. (2015). Wavelet-based motion artifact removal for electrodermal activity. In 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015 (pp. 6223–6226). IEEE.
-  Choi, J., Ahmed, B., & Gutierrez-Osuna, R. (2012). Development and evaluation of an ambulatory stress monitor based on wearable sensors. IEEE Transactions on Information Technology in Biomedicine, 16(2), 279–286.
-  Collet, C., Petit, C., Champely, S., & Dittmar, A. (2003). Assessing workload through physiological measurements in bus drivers using an automated system during docking. Human Factors, 45(4), 539–548.
-  Empatica Inc., (2018), https://www.empatica.com (Accessed on: 21.01.2018).
-  ESUM, (2018, January), Human perception of the urban environment, https://www.esum.arch.ethz.ch.
-  Garbarino, M., Lai, M., Bender, D., Picard, R. W., & Tognetti, S. (2014). Empatica E3—A wearable wireless multi-sensor device for real-time computerized biofeedback and data acquisition. In EAI 4th International Conference on Wireless Mobile Communication and Healthcare, (pp. 39–42). IEEE.
-  Griego, D., Kuliga, S., Bielik, M., Standfest, M., Ojha, V.K., Schneider, S., König, R., Donath, D., Schmitt, G., Miltiadis, C. & Forino, A., (2017). ESUM Urban Sensing Handbook: Component, Assembly and Operational Guide: Sensor backpack and Videos. Zürich, ETH Zürich.
-  Griego, D., Buff, V., Hayoz, E., Moise, I., & Pournaras, E. (2017). Sensing and mining urban qualities in smart cities. In IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), (pp. 1004–1011) IEEE.
-  Hagan, M. T., & Menhaj, M. B. (1994). Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5(6), 989–993.
-  Hühn, J., & Hüllermeier, E. (2009). FURIA: an algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery, 19(3), 293–319.
-  Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464–1480.
-  Krause, A., Siewiorek, D. P., Smailagic, A., & Farringdon, J. (2003, October). Unsupervised, dynamic identification of physiological and activity context in wearable computing. In Seventh IEEE International Symposium on Wearable Computers, 2003. Proceedings, (pp. 88–97).
-  Krause, A., Smailagic, A., & Siewiorek, D. P. (2006). Context-aware mobile computing: learning context-dependent personal preferences from a wearable sensor array. IEEE Transactions on Mobile Computing, 5(2), 113–127.
-  Kuliga, S.F., Standfest, M., Bielik, M., Schneider, S., Koenig, R., Donath, D., & Schmitt, G. (2017). From real to virtual and back – A multi-method approach for investigating the impact of urban morphology on human spatial experiences. In The Virtual and the Real in Planning and Urban Design: Perspectives, Practices and Applications, (pp 151–169). Routledge, Taylor & Francis.
-  Maldonado, S., Weber, R., & Famili, F. (2014). Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines. Information Sciences, 286, 228-246.
-  Mavros, P., Austwick, M. Z., & Smith, A. H. (2016). Geo-EEG: towards the use of EEG in the study of urban behaviour. Applied Spatial Analysis and Policy, 9(2), 191–212.
-  Molavi, B., & Dumont, G. A. (2012). Wavelet-based motion artifact removal for functional near-infrared spectroscopy. Physiological Measurement, 33(2), 259–270.
-  Ojha, V. K., Abraham, A., & Snášel, V. (2017). Metaheuristic design of feedforward neural networks: a review of two decades of research. Engineering Applications of Artificial Intelligence, 60, 97–116.
-  Pearce, J., & Ferrier, S. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling, 133(3), 225–245.
-  Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward machine emotional intelligence: analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), 1175–1191.
-  Poh, M. Z., Swenson, N. C., & Picard, R. W. (2010). A wearable sensor for unobtrusive, long-term assessment of electrodermal activity. IEEE Transactions on Biomedical Engineering, 57(5), 1243–1252.
-  Quinlan, J. R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27(3), 221–234.
-  Ragot, M., Martin, N., Em, S., Pallamin, N., & Diverrez, J. M. (2017). Emotion recognition using physiological signals: laboratory vs. wearable sensors. In International Conference on Applied Human Factors and Ergonomics, (pp. 15–22). Springer, Cham.
-  Rani, P., Liu, C., Sarkar, N., & Vanman, E. (2006). An empirical study of machine learning techniques for affect recognition in human–robot interaction. Pattern Analysis and Applications, 9(1), 58–69.
-  Rodríguez-Fdez, I., Mucientes, M., & Bugarín, A. (2016). FRULER: fuzzy rule learning through evolution for regression. Information Sciences, 354, 1–18.
-  Roth, W. T., Dawson, M. E., & Filion, D. L. (2012). Publication recommendations for electrodermal measurements. Psychophysiology, 49(8), 1017–1034.
-  Shuyun, W., Jinxi, Z., Xiang, L., & Weilin, C. (2009). Traffic safety evaluation of asphalt pavement anti-sliding property based on time series analysis of drivers’ electrodermal activity. In 2nd International Conference on Intelligent Computation Technology and Automation, ICICTA’09, 2009 (Vol. 3, pp. 859–863). IEEE.
-  Taylor, S., Jaques, N., Chen, W., Fedor, S., Sano, A., & Picard, R. (2015). Automatic identification of artifacts in electrodermal activity data. In 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, (pp. 1934–1937). IEEE.
-  Varela, F. J., Thompson, E., & Rosch, E. (2017). The Embodied Mind: Cognitive Science and Human Experience. Cambridge MA, MIT Press.
-  Vesanto, J., Himberg, J., Alhoniemi, E., & Parhankangas, J. (1999). Self-organizing map in MATLAB: the SOM Toolbox. In Proceedings of the MATLAB DSP Conference (pp. 16–17).
-  Wang, X. W., Nie, D., & Lu, B. L. (2014). Emotional state classification from EEG data using machine learning approach. Neurocomputing, 129, 94–106.
-  Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann.
-  Xia, V., Jaques, N., Taylor, S., Fedor, S., & Picard, R. (2015). Active learning for electrodermal activity classification. In IEEE Signal Processing in Medicine and Biology Symposium (SPMB) (pp. 1–6). IEEE.
Appendix A Appendix
| Algorithm | Parameter | Description | Value |
|---|---|---|---|
| Ledalab | Analysis type | Type of method for decomposing a signal | Continuous decomposition |
| | Optimization time | Number of times a signal is optimized | 2 |
| | Window range | | 1–3.7 s |
| | Smoothing window | | 0.2 s |
| REP-Tree | #Leaf instances | Minimum children per node | 2 |
| | Depth | Maximum limit of tree depth/level | No limit |
| | Pruning | Pruning of tree nodes | True |
| FURIA | Function | Membership function for fuzzification | Trapezoidal |
| MLP | Learning rate | Convergence speed | 0.3 |
| | Momentum rate | Influence of previous iteration | 0.2 |
| | Hidden layer | Maximum hidden layer nodes | 10 |
| | Iterations | Maximum time for parameter optimization | 1000 |
| LibSVM | SVM kernel | Type of function at a hidden node | Radial basis function |
| | Epsilon | Termination criteria for algorithm | 0.001 |
| SOM | Map dimension | Dimension of the 2D plane | 20x20 |
| | Normalization | Method of data normalization for SOM training | Linear scaling |
| | | Number of samples in an epoch of training | |
| | Iteration | Number of training epochs | 25 |
| | Fine-tuning | Number of fine-tuning epochs | 20 |