Stress is a physiological response to the emotional, mental and physical challenges that people face in everyday life Schneiderman et al. (2005). Many types of stressors are part of modern life, such as exams or annual job evaluations. Even though the human body adapts to day-to-day stressors, long-term exposure to extreme stress can be destructive for mental as well as physical health Pickering (2001). It also increases the risk of cardiovascular diseases and (psycho)somatic complaints Holmes et al. (2006); Shi et al. (2007). Because of these health issues, measuring and managing stress is important. Timely detection of stress can help users take corrective and preventive measures in an informed way.
Physiological stress affects the two branches of the autonomic nervous system: the sympathetic nervous system and the parasympathetic nervous system. The immediate effect of their stimulation is a measurable change in physiological parameters, such as an increased heart rate (HR) and skin conductance level Dawson et al. (2007). Stress research has a wide range of applications, from increasing the resilience of military personnel to improving athletes’ performance. Many techniques have been proposed to detect stress in pilots Sem-Jacobsen (1961), car drivers Healey and Picard (2005); Hennessy and Wiesenthal (1999), computer users Zhai and Barreto (2006), and surgeons Sexton et al. (2000). In addition to speech and facial expressions, most of these approaches use numerous physiological signals Sharma and Gedeon (2012), such as respiration rate, electrocardiography (ECG), blood pressure, and electromyography (EMG). Collecting these data sources in naturalistic conditions is very difficult and not consumer friendly, which hinders the development of practical applications. In contrast, skin conductance and heart rate can be reliably acquired in a non-invasive way from wearable sensors placed on the wrist.
In this paper, we focus on stress detection (binary classification) during real-world and simulated driving tasks using skin conductance and heart rate data. Physiological signals vary between people and are influenced by age, gender, diet or sleep Picard et al. (2001). As a result, stress responses can differ from person to person. The global (or one-fits-all) models that are usually used Healey and Picard (2005) for stress recognition often do not generalize well to unseen test subjects and hence require extensive fine-tuning. Therefore, to account for interpersonal differences, we adopt a multi-task learning (MTL) approach with subjects-as-tasks. Specifically, the MTL model uses hard parameter sharing for a common representation, along with a specific layer for each subject (or task) to personalize the model.
The main contribution of this paper is the use of multi-modal physiological data from real-world and simulator driving to develop a multi-task neural network for personalized stress detection.
2 Dataset and Feature Extraction
2.1 MIT Driver Stress
The MIT Driver Stress dataset Healey and Picard (2002) consists of physiological signals recorded during a real-life experiment with subjects in the following conditions: 1) resting, 2) driving in a city, and 3) driving on a highway. The dataset consists of drives, where each driving session lasts for - hours. The recorded signals are EMG, ECG, galvanic skin response (GSR) from hand and foot, heart rate derived from ECG, and respiration rate. The GSR and respiration rate are sampled at Hz, ECG is recorded at Hz and EMG at Hz. The signals provided in the dataset are down-sampled to Hz. The dataset also contains a signal called ‘marker’, which indicates a change of activity (a button press), i.e. the start or end of a rest period, city driving or highway driving.
The marker signal is used to derive ground truth annotations for binary stress levels. Peaks are detected in the signal to capture the button push events, each indicating that a new trial of the experiment is commencing. The data points before the first and after the last marker (peak) are removed, as they correspond to the time when subjects were being equipped with sensors. Likewise, minutes of data after resting and before the beginning of the post-driving baseline are removed. These steps are taken to avoid feeding the model signals with ambiguous labels, as it is hard to determine whether users are stressed or have recovered. Artifacts are removed from the HR and GSR signals following Ollander (2015), because values fluctuated to unreasonably high and low levels. The EMG signal is discarded because the sensor was placed on the shoulder and might have recorded muscle movement instead of a psychological stress response Ollander (2015). Likewise, ECG, GSR from the foot and respiration rate are not used, as collecting these data in real-world settings is very problematic. Finally, the following drives, for which GSR from hand, HR and marker signals are available, are used for model training and evaluation: , , , , , , , , and .
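The marker-based trimming described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not say which peak detector was used, so a simple rising-edge threshold stands in for it, and the function names, the `threshold` value and the toy signals are all hypothetical.

```python
def find_marker_events(marker, threshold):
    """Return indices where the marker signal rises above `threshold`.

    Assumption: one button press appears as a rising edge, i.e. the previous
    sample is at or below the threshold and the current sample is above it.
    """
    events = []
    for i in range(1, len(marker)):
        if marker[i - 1] <= threshold < marker[i]:
            events.append(i)
    return events


def trim_to_experiment(signal, events):
    """Keep only the samples between the first and the last marker event."""
    if len(events) < 2:
        raise ValueError("need at least two marker events to delimit the experiment")
    return signal[events[0]:events[-1] + 1]


# Toy data (hypothetical): three button presses in a short marker trace.
marker = [0, 0, 5, 5, 0, 0, 0, 6, 0, 0, 7, 0]
gsr = [float(i) for i in range(len(marker))]

events = find_marker_events(marker, threshold=1.0)
trimmed = trim_to_experiment(gsr, events)
```

Consecutive entries of `events` then delimit the rest, city and highway segments, from which the binary labels can be assigned.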
2.2 Simulator Driving
We collected heart rate and skin conductance (SC) data from professional truck drivers using wrist-worn devices. The SC signal was recorded at a frequency of Hz, and HR was derived from photoplethysmogram sensor data with a frequency of Hz and upsampled to match the frequency of SC. The experiment was carried out with driving simulation software, and participants received standardized instructions from an audiotape. High stress was induced by means of a secondary arithmetic subtraction task. It is a component of the widely used Trier Social Stress Test Birkett (2011), in which a user performs serial subtraction aloud and, if a mistake is made, has to start over from the last correct answer.
The study consisted of three major steps: 1) baseline driving, 2) moderate stress activity, and 3) high-stress task (Saeed et al., 2017). The experimental trial began with normal driving for minutes. Afterwards, each subject was asked twice to count - as a moderate stress activity, with a very short interval between the two activities. After a one-minute period of normal driving, high stress was induced by asking the subject to count backward from a random number in steps of in approximately seconds. Subsequently, the subject was asked to count backward again from another random number. This process was repeated for approximately minutes. The length of the stress simulation task was minutes, including baseline. Since we were interested in recognizing baseline and high stress, data points from the moderate stress activity and the bad-quality signals of two subjects were dropped.
For model input, we used a sliding window approach to extract physiological features from each participant’s data. The same window length of seconds with a fixed step size of seconds (or % overlap) is used for both (real-world and simulator) datasets. Note that features were computed from pre-processed signals and were subsequently standardized with mean normalization by baseline to compensate for individuals having different resting heart rates.
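The windowing and baseline normalization steps can be sketched as below. The window and step sizes used here are placeholders, not the values used in the study, and "mean normalization by baseline" is read as subtracting the mean of the subject's baseline feature values, which is one plausible interpretation rather than a confirmed detail. The function names are hypothetical.

```python
from statistics import mean


def sliding_windows(samples, window_size, step_size):
    """Split `samples` into fixed-length windows taken every `step_size` samples."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    last_start = len(samples) - window_size
    return [samples[start:start + window_size]
            for start in range(0, last_start + 1, step_size)]


def normalize_by_baseline(feature_values, baseline_values):
    """Subtract the mean of the baseline feature values (assumed interpretation)."""
    baseline_mean = mean(baseline_values)
    return [value - baseline_mean for value in feature_values]


# Toy signal with placeholder sizes: window of 4 samples, step of 2 (50% overlap).
samples = [float(i) for i in range(10)]
windows = sliding_windows(samples, window_size=4, step_size=2)

# One feature per window (the mean), normalized by the first two "baseline" windows.
window_means = [mean(w) for w in windows]
normalized = normalize_by_baseline(window_means, window_means[:2])
```

Subtracting a per-subject baseline removes constant offsets such as different resting heart rates, so that features from different subjects become comparable.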
Heart rate measures the number of heartbeats per unit of time. It describes the heart activity as the autonomic nervous system attempts to cope with the human body’s demands depending on the received stimuli Healey and Picard (2005). We obtained the following seven features from heart rate: mean, standard deviation, min, max, range, root mean square of successive differences, and standard deviation of successive differences.
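The seven heart rate features listed above can be computed for a single window as in the following sketch. It assumes the successive differences are taken between consecutive heart rate samples within the window and that the population standard deviation is used; neither choice is stated in the text. The function name and the example values are hypothetical.

```python
import math
from statistics import mean, pstdev


def heart_rate_features(hr):
    """Compute seven summary features for one window of heart rate samples."""
    if len(hr) < 2:
        raise ValueError("need at least two samples to form successive differences")
    # Differences between consecutive samples in the window.
    diffs = [b - a for a, b in zip(hr, hr[1:])]
    return {
        "mean": mean(hr),
        "std": pstdev(hr),                               # population std (assumption)
        "min": min(hr),
        "max": max(hr),
        "range": max(hr) - min(hr),
        "rmssd": math.sqrt(mean([d * d for d in diffs])),  # root mean square of diffs
        "sdsd": pstdev(diffs),                           # std of successive differences
    }


features = heart_rate_features([60.0, 62.0, 61.0, 65.0])
```

For this toy window the successive differences are 2, -1 and 4, so the root mean square of successive differences is the square root of 7.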
Skin conductance (also known as galvanic skin response) describes the autonomic variations in the electrical properties of the skin or, equivalently, the number of active sweat glands. It is widely used as a sensitive index of emotional processing and sympathetic activity, and is a relevant indicator of a person’s stress level Labbé et al. (2007); Ferreira et al. (2008).
3 Stress Classification
3.1 Problem Formulation
The stress detection task can be formulated as a supervised sequence classification problem. In this task, the objective is to assign a single label to an input sequence. It can be conceived as follows, let be a dataset with sequences of fixed length. The high-level features ( features in our case) are computed from each raw input sequence and corresponding label (binary label in this case) is generally assigned to be the mode of context window labels.
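Assigning each window the mode of its per-sample labels can be sketched as follows. The function names and the window and step sizes are hypothetical; ties are resolved in favour of the label seen first in the window, which is a choice made for this sketch and is not specified in the text.

```python
from collections import Counter


def window_label(sample_labels):
    """Return the most frequent label in a window (first-seen label wins ties)."""
    if not sample_labels:
        raise ValueError("cannot label an empty window")
    return Counter(sample_labels).most_common(1)[0][0]


def label_windows(labels, window_size, step_size):
    """Slide over the per-sample labels and emit one label per window."""
    last_start = len(labels) - window_size
    return [window_label(labels[start:start + window_size])
            for start in range(0, last_start + 1, step_size)]


# Toy per-sample labels: 0 = no stress, 1 = stress.
sample_labels = [0, 0, 0, 1, 1, 1, 1, 1]
window_labels = label_windows(sample_labels, window_size=4, step_size=2)
```

Only windows that straddle a change of condition contain mixed labels; all other windows inherit the single label of their segment.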
3.2 Model Architecture
A neural network learns complex non-linear transformations of the input data through several hidden layers, each with its own number of interconnected neurons. In a single-task neural network (ST-NN), there is only one task to solve, by minimizing a single loss function with backpropagation. Conversely, multi-task learning involves finding a unified model that solves more than one task using a shared representation of the tasks. Consequently, a multi-task neural network model (MT-NN) consists of common layers shared across tasks as well as task-specific layers. Moreover, in the last layer, there is a separate unit and a loss function for each task. The loss functions are optimized simultaneously by alternating between the different tasks at random.
Multi-task learning is generally done through hard or soft parameter sharing Ruder (2017). Following Jaques et al. (2016), we employed hard parameter sharing, where the final layers are subject-specific, as shown in Figure 1. We used a shared fully-connected layer with neurons and if else activation. The subject-specific layers have neurons and to reflect non-linearity. Likewise, we applied l2-regularization on the task-specific layers and validation-based early stopping to avoid over-fitting. The binary cross-entropy is optimized as the objective function using ‘Adam’, a variant of stochastic gradient descent Kingma and Ba (2014). This model architecture can take interpersonal variations in physiological signals into account through person-specific layers while maintaining a common global representation.
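The hard parameter sharing structure can be illustrated with a small forward pass in plain Python. This is a sketch under stated assumptions and not the authors' model: the layer sizes are placeholders, ReLU and sigmoid are assumed activations because the activations are not legible in the text, and training (Adam, l2-regularization, early stopping) is omitted. All names are hypothetical.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(rng, n_in, n_out):
    """Random weights (one row per output neuron) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases, activation):
    """Fully-connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


class MultiTaskNet:
    """One shared layer for all subjects plus a private head per subject."""

    def __init__(self, n_features, n_shared, n_specific, subjects, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_features, n_shared)
        self.heads = {
            subject: (init_layer(rng, n_shared, n_specific),
                      init_layer(rng, n_specific, 1))
            for subject in subjects
        }

    def predict_proba(self, features, subject):
        hidden = dense(features, *self.shared, relu)       # shared representation
        specific, output = self.heads[subject]             # subject-specific layers
        hidden = dense(hidden, *specific, relu)
        return dense(hidden, *output, sigmoid)[0]           # stress probability


def binary_cross_entropy(y_true, p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


net = MultiTaskNet(n_features=3, n_shared=4, n_specific=2, subjects=["s1", "s2"])
p_s1 = net.predict_proba([0.1, -0.2, 0.3], "s1")
loss_s1 = binary_cross_entropy(1, p_s1)
```

During training, a batch from one subject would update the shared layer and only that subject's head, which is how alternating between tasks personalizes the final layers while the shared layer learns from all subjects.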
Our experiments were conducted using a) the MIT Driver Stress dataset Healey and Picard (2002) and b) the Simulator Driving data. We first tested two standard classifiers as baselines: logistic regression (LR) and support vector machine (SVM) with linear (L) and radial basis function (RBF) kernels. In addition, we trained a two-layer (subject-independent) neural network model for performance comparison. The data of each subject is divided randomly into train and test sets (/). The -fold cross validation is performed on the training set for hyper-parameter optimization, and evaluation metrics are averaged across participants on the test set. The stress recognition performance of these models is summarized in Tables 1 and 2 for real-world and simulator driving, respectively.
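The two evaluation metrics reported in the results, F-score and kappa, can be computed for binary labels as in the sketch below. It assumes the F-score is the F1 score of the stress (positive) class and that kappa is Cohen's kappa; the text does not state the exact variants. The function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the stress class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def cohen_kappa(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal label frequencies.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; this sketch returns 0.0
    return (observed - expected) / (1.0 - expected)


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
f1 = f1_score(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

In the toy example, six of eight predictions agree with the labels while chance agreement is 0.5, giving an F1 score of 0.75 and a kappa of 0.5. Per-subject scores computed this way can then be averaged across participants.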
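Table 1: Stress recognition results for real-world driving (MIT Driver Stress dataset), given as mean ± standard deviation.
|Model|F-score|Kappa|
|---|---|---|
|LR|0.894 ± 0.078|0.672 ± 0.191|
|SVM (L)|0.903 ± 0.076|0.706 ± 0.175|
|SVM (RBF)|0.950 ± 0.027|0.828 ± 0.100|
|ST-NN|0.954 ± 0.027|0.844 ± 0.095|
|MT-NN|0.965 ± 0.023|0.879 ± 0.080|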
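Table 2: Stress recognition results for simulator driving, given as mean ± standard deviation.
|Model|F-score|Kappa|
|---|---|---|
|LR|0.720 ± 0.342|0.663 ± 0.371|
|SVM (L)|0.726 ± 0.326|0.671 ± 0.349|
|SVM (RBF)|0.774 ± 0.300|0.710 ± 0.371|
|ST-NN|0.801 ± 0.243|0.736 ± 0.307|
|MT-NN|0.922 ± 0.137|0.891 ± 0.184|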
For the on-road dataset, nearly all models performed well, but there is considerable variance in the results of the standard classifiers. The ST-NN model reduced the spread, achieving an average F-score and kappa of 0.954 and 0.844, respectively. The MT-NN model improved the results further, reducing the standard deviation and reaching an F-score of 0.965 and a kappa of 0.879. This can be seen as an overall improvement across drives due to the subject-specific layers. However, caution is advised when interpreting the results on the MIT Driver Stress dataset, as no actual ground truth annotations or subjective self-reports are publicly available. The labels were acquired by means of the ‘marker’ signal, which represents the start of the next study trial (e.g. from resting to driving in a city), under the assumption that driving, in general, is a stressful task. For simulator driving, the commonly used classifiers and the ST-NN do not generalize well, as can be seen from the large standard deviations of the evaluation metrics. The MT-NN notably improved the recognition rate across subjects, achieving an F-score and kappa of 0.922 and 0.891, respectively. We think the larger variation in results across participants (compared to the on-road study) could be due to the short duration of the experiment, the number of users, and the use of different sensors for data collection. Nevertheless, these results show that multi-task learning with skin conductance and heart rate signals of reliable quality can be used to detect physiological stress during driving, as it generalizes well across various drivers and different environments (real-world and simulator).
In this work, a multi-task neural network is used to detect physiological stress during real-world and simulator driving tasks. Generally, a global (subject-independent) model is used for this purpose, which may perform poorly due to large interpersonal variations in physiological parameters (e.g. due to age and diet) Picard et al. (2001). Likewise, most studies (see Sharma and Gedeon (2012) for a review) used sensor data (such as EMG, respiration rate, facial expressions and pupil dilation) that are very hard to acquire in real-life situations, which makes them unsuitable for practical applications. Therefore, we used skin conductance and heart rate features in combination with multi-task learning (subjects-as-tasks) to build a personalized stress model. In our experiments, we found similar results on the MIT Driver Stress and Simulator Driving datasets with the same neural network architecture. Hence, if a wearable device provides signals of reliable quality, a real-time stress detection application can be developed to improve drivers’ safety and well-being.
In the future, we will explore transferring representations learned from one dataset to another and examine how well they generalize, as is commonly done for computer vision and natural language processing problems Sharif Razavian et al. (2014); Chen et al. (2012). Moreover, we want to apply neural networks with temporal convolutions and recurrent layers to raw physiological signals to automatically learn discriminative features. Most importantly, a future study may investigate the performance of these models in real-life situations, e.g. by comparing the output of the model against subjective self-reports.
- Birkett  Melissa A Birkett. The trier social stress test protocol for inducing psychological stress. Journal of visualized experiments: JoVE, (56), 2011.
- Chen et al.  Minmin Chen, Zhixiang Xu, Kilian Weinberger, and Fei Sha. Marginalized denoising autoencoders for domain adaptation. arXiv preprint arXiv:1206.4683, 2012.
- Dawson et al.  Michael E. Dawson, Anne M. Schell, and Diane L. Filion. The electrodermal system. Handbook of psychophysiology, 2:200–223, 2007.
- Ferreira et al.  Pedro Ferreira, Pedro Sanches, Kristina Höök, and Tove Jaensson. License to chill!: how to empower users to cope with stress. In Proceedings of the 5th Nordic conference on Human-computer interaction: building bridges, pages 123–132. ACM, 2008.
- Healey and Picard  Jennifer Healey and Rosalind W. Picard. Driver stress data. Retrieved June 26th from MIT Affective Computing Group: http://affect.media.mit.edu, 124, 2002.
- Healey and Picard  Jennifer A. Healey and Rosalind W. Picard. Detecting stress during real-world driving tasks using physiological sensors. IEEE Transactions on intelligent transportation systems, 6(2):156–166, 2005.
- Hennessy and Wiesenthal  Dwight A. Hennessy and David L. Wiesenthal. Traffic congestion, driver stress, and driver aggression. Aggressive behavior, 25(6):409–423, 1999.
- Holmes et al.  Sari D. Holmes, David S Krantz, Heather Rogers, John Gottdiener, and Richard J. Contrada. Mental stress and coronary artery disease: a multidisciplinary guide. Progress in cardiovascular diseases, 49(2):106–122, 2006.
- Jaques et al.  Natasha Jaques, Sara Taylor, Ehimwenma Nosakhare, Akane Sano, and Rosalind Picard. Multi-task learning for predicting health, stress, and happiness. NIPS Workshop on Machine Learning for Healthcare, 2016.
- Kingma and Ba  Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Labbé et al.  Elise Labbé, Nicholas Schmidt, Jonathan Babin, and Martha Pharr. Coping with stress: the effectiveness of different types of music. Applied psychophysiology and biofeedback, 32(3-4):163–168, 2007.
- Ollander  Simon Ollander. Wearable sensor data fusion for human stress estimation, 2015. Master Thesis, Technical University of Linköping University.
- Picard et al.  Rosalind W. Picard, Elias Vyzas, and Jennifer Healey. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE transactions on pattern analysis and machine intelligence, 23(10):1175–1191, 2001.
- Pickering  Thomas G. Pickering. Mental stress as a causal factor in the development of hypertension and cardiovascular disease. Current hypertension reports, 3(3):249–254, 2001.
- Ruder  Sebastian Ruder. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017.
- Saeed et al.  Aaqib Saeed, Stojan Trajanovski, Maurice van Keulen, and Jan van Erp. Deep physiological arousal detection in a driving simulator using wearable sensors. IEEE International Conference on Data Mining - workshop: Data Mining in Biomedical Informatics and Healthcare (DMBIH), 2017.
- Schneiderman et al.  Neil Schneiderman, Gail Ironson, and Scott D. Siegel. Stress and health: psychological, behavioral, and biological determinants. Annu. Rev. Clin. Psychol., 1:607–628, 2005.
- Sem-Jacobsen  Carl W. Sem-Jacobsen. Electroencephalographic study of pilot stresses in flight. Technical report, Gaustad Hospital Oslo (Norway) Eeg Research Lab, 1961.
- Sexton et al.  J. Bryan Sexton, Eric J. Thomas, and Robert L. Helmreich. Error, stress, and teamwork in medicine and aviation: cross sectional surveys. British Medical Journal, 320(7237):745–749, 2000.
- Sharif Razavian et al.  Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. CNN features off-the-shelf: an astounding baseline for recognition. Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 806–813, 2014.
- Sharma and Gedeon  Nandita Sharma and Tom Gedeon. Objective measures, sensors and computational techniques for stress recognition and classification: A survey. Computer methods and programs in biomedicine, 108(3):1287–1301, 2012.
- Shi et al.  Yu Shi, Natalie Ruiz, Ronnie Taib, Eric Choi, and Fang Chen. Galvanic skin response (gsr) as an index of cognitive load. In CHI’07 extended abstracts on Human factors in computing systems, pages 2651–2656. ACM, 2007.
- Zhai and Barreto  Jing Zhai and Armando Barreto. Stress detection in computer users based on digital signal processing of noninvasive physiological variables. In Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual International Conference of the IEEE, pages 1355–1358. IEEE, 2006.