Novel designs of wearable sensors demonstrate promising results for monitoring physiological status (e.g., stress level assessment) in humans. A traditional method to assess such activity is electroencephalography (EEG) [1]. However, EEG-based monitoring requires either surface (non-invasive) or implanted (invasive) electrodes and frequent calibration to account for sensor sensitivity to external factors, which increases system expense and decreases user comfort. Non-EEG physiological biosignals [2, 3, 4, 5, 6, 7], recorded from a wrist-worn platform, avoid these issues and offer a more effective, comfortable, and less expensive alternative.
One major challenge in identifying different physiological states is the undesired variability across subjects, or across recording sessions of a single subject. Given that most biosignal datasets are small in scale, transfer learning [8, 9, 10] aims to cope with the change in data distributions in order to process data from a wider range of users. Notably, promising results were demonstrated in transfer learning by censoring learned discriminative representations within an adversarial training scheme [11, 12, 13, 14, 15, 16]. Adversarial representation learning allows the representation to predict dependent variables while an adaptive measure controls the extent of its dependency on other variables during training.
In this study, we propose an adversarial inference approach for transfer learning to exploit disentangled nuisance-robust representations from physiological biosignal data in stress status level decoding. Unlike common deep learning frameworks, we exploit a trade-off between task-related features and person-discriminative information by using additional censoring network blocks that manipulate the learned latent representations through adversarial training. By jointly training the adversary, nuisance and classifier units, task-discriminative features are incorporated into the final prediction. At the same time, the biosignal characteristics of new users can be projected onto local features extracted from the existing subject pool for reference purposes, especially when new users demonstrate biosignal behaviors similar to those of the training set subjects. We perform empirical assessments on a publicly available dataset with extensive parameter explorations. Results demonstrate the advantage of our disentangled adversarial transfer learning framework with a proof of concept through cross-subject evaluations. Moreover, we highlight that the proposed adversarial transfer learning framework is applicable to other available deep learning network approaches, depending on the characteristics of the signal to be learned.
II-A Disentangled Adversarial Transfer Learning
Let $\{(X_i, y_i, s_i)\}_{i=1}^{n}$ denote the training data set, where $X_i \in \mathbb{R}^{C \times T}$ is the raw data matrix at trial $i$ recorded from $C$ dimensions for $T$ discretized time samples, $y_i \in \{1, \ldots, L\}$ is the label of the corresponding user stress level status or task among $L$ categories, and $s_i \in \{1, \ldots, S\}$ indicates the subject identification (ID) number from whom the data was recorded among $S$ individuals. Here we assume the task/status $y$ and subject ID $s$ are marginally independent, and the data is generated dependently on $y$ and $s$ jointly, i.e., $X \sim p(X \mid y, s)$. Our aim is to build a discriminative model which can classify/predict the category $y$ given observation $X$, where the model is generalized across subjects and invariant to the variability in subject IDs $s$, which is regarded as a nuisance variable involved in the data generation process.
In the proposed framework, a deterministic encoder $z = g(X; \theta)$ with parameters $\theta$ is trained to learn the latent representation $z$ from data $X$, where the latent $z$ consists of two sub-parts, $z_a$ and $z_n$, with $z_n$ taking a ratio $r$ of its dimensionality. The latent sub-part $z_a$ is used as the input to an adversary network with parameters $\psi$, while $z_n$ serves as the input to a nuisance network parameterized by $\phi$, as illustrated in Figure 1. The complete latent representation $z$ (i.e., the concatenation of $z_a$ and $z_n$) is further used as an input to the main classifier network with parameters $\gamma$.
In order to filter factors of variation caused by the subject ID $s$ out of $z_a$, the encoder is forced to minimize the adversary likelihood $q_\psi(s \mid z_a)$, while at the same time maximizing the nuisance network likelihood $q_\phi(s \mid z_n)$ to retain sufficient subject-discriminative information within $z_n$. The main classifier network is further conditioned on $z_n$ alongside latent $z_a$, and trained towards the main classification task to predict the category label $y$ by maximizing the likelihood $p_\gamma(y \mid z_a, z_n)$. Overall, we propose the following objective to train the encoder-classifier pair:

$$\hat{\theta}, \hat{\gamma} = \arg\max_{\theta, \gamma} \; \mathbb{E}\left[ \log p_\gamma(y \mid z_a, z_n) - \lambda_A \log q_\psi(s \mid z_a) + \lambda_N \log q_\phi(s \mid z_n) \right],$$

where $\theta$ and $\gamma$ are the encoder and classifier parameters, and $\lambda_A$ and $\lambda_N$ denote the weight parameters for the adversary and nuisance networks respectively, controlled to adjust the trade-off between invariance and identification performance. Setting $\lambda_A = \lambda_N = 0$ indicates training a regular discriminative neural network structure without disentangling the transfer learning units. Note that besides the overall objective, both the adversary and nuisance networks are also trained separately to predict the variable $s$ by maximizing the likelihoods $q_\psi(s \mid z_a)$ and $q_\phi(s \mid z_n)$, respectively. Neural network weights are optimized on every training data batch via stochastic gradient descent; for each training batch, the weights of the adversary network, the nuisance network and the classifier network are updated alternately according to their corresponding softmax cross-entropy losses.
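As a minimal sketch of how the weighted objective translates into a per-sample loss, the following Python snippet combines the three softmax cross-entropy terms. It assumes the encoder-classifier loss is the classifier cross-entropy, minus the weighted adversary cross-entropy, plus the weighted nuisance cross-entropy, which is what the description above implies once likelihood maximization is rewritten as loss minimization. The function names and the arguments `lam_a` and `lam_n` (adversary and nuisance weights) are illustrative and not taken from the original implementation.

```python
import math

def softmax_cross_entropy(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]

def encoder_classifier_loss(cls_logits, adv_logits, nui_logits,
                            y, s, lam_a, lam_n):
    """Loss minimized by the encoder-classifier pair (assumed form).

    Minimizing it maximizes the classifier likelihood of y, minimizes the
    adversary likelihood of s, and maximizes the nuisance likelihood of s.
    """
    ce_cls = softmax_cross_entropy(cls_logits, y)
    ce_adv = softmax_cross_entropy(adv_logits, s)
    ce_nui = softmax_cross_entropy(nui_logits, s)
    return ce_cls - lam_a * ce_adv + lam_n * ce_nui

# With both weights at zero the loss reduces to the plain classifier loss.
plain = encoder_classifier_loss([0.0] * 4, [0.0] * 20, [0.0] * 20,
                                y=1, s=3, lam_a=0.0, lam_n=0.0)
```

In an alternating scheme, the adversary and nuisance networks would each be updated with their own cross-entropy (`ce_adv`, `ce_nui`) on the same batch, while the combined loss above is used only for the encoder and classifier.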
Disentangling of $z$ into sub-parts $z_a$ and $z_n$ is proposed to systematically re-arrange the distribution of task- and subject-related features. While $z_a$ conceals subject information indicated through $s$, $z_n$ is trained to retain subject-related information within the learned sub-component. By dissociating the nuisance variable from task-related discriminative features more clearly, the model is extrapolated into a broader domain of subjects. For the input data of users unknown to the training subject set, task-related features would be incorporated into the final prediction, whereas the biosignal behaviors which are similar to known subjects could also be projected to $z_n$ to serve as a reference.
II-B Model Architecture
Deep neural networks were recently demonstrated to be powerful generic feature extractors in biomedical signal processing [14, 16, 17, 18]. In view of this progress, each block of the proposed model in Figure 1 is implemented as a neural network. It is worth noting that our proposed framework is applicable to any other discriminative representation learning network, depending on the characteristics of the signal of interest.
The encoder consists of two linear layers with 100 units per layer, since deeper networks did not improve the performance significantly, yet increased the number of parameters to be estimated and hence the risk of overfitting to the training data. In our preliminary analyses, we also did not observe significant improvements by altering the number of units at each layer. The representation $z$ with dimension $d$ is then generated and split into $z_a$ and $z_n$ with dimensions of $(1 - r)d$ and $rd$, respectively, where $r$ is the ratio of the nuisance representation. Attached to the encoder, the adversary network, nuisance network and the main classifier are each built as a single hidden layer multilayer perceptron (MLP) with ReLU nonlinearity. The learned representation sub-parts $z_a$ and $z_n$ are respectively used by the adversary and nuisance networks, each with an output dimensionality of $S$ (the number of subjects) for classification of subject IDs. Similarly, the complete representation $z$ is used as input to the main classifier network with an output dimensionality of $L$ (the number of task categories) for task label decoding.
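The block structure described above can be sketched as a forward pass in NumPy. This is only an illustration of the tensor shapes under stated assumptions: the input size, latent dimension, nuisance ratio and hidden-layer width (`N_IN`, `D`, `RATIO`, `HIDDEN`) are hypothetical values, the final projection from 100 units to the latent dimension is assumed, and placing the nuisance sub-part in the last latent units is an arbitrary choice. Weights are random, so no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    """Randomly initialised weights and zero bias of one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def encode(params, x):
    """Stack of linear layers (no nonlinearity, following the text)."""
    for w, b in params:
        x = x @ w + b
    return x

def mlp_head(params, x):
    """Single-hidden-layer MLP with ReLU, returning unnormalised logits."""
    (w1, b1), (w2, b2) = params
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def split_latent(z, ratio):
    """Split z into two parts; the last round(ratio * d) units are nuisance."""
    d_n = int(round(ratio * z.shape[-1]))
    d_a = z.shape[-1] - d_n
    return z[..., :d_a], z[..., d_a:]

# Hypothetical sizes, not taken from the paper.
N_IN, D, RATIO, HIDDEN = 7, 15, 0.2, 32
# Output sizes from the dataset description: 20 subjects, 4 stress classes.
N_SUBJECTS, N_CLASSES = 20, 4

encoder = [linear(N_IN, 100), linear(100, 100), linear(100, D)]
d_n = int(round(RATIO * D))
adversary = [linear(D - d_n, HIDDEN), linear(HIDDEN, N_SUBJECTS)]
nuisance = [linear(d_n, HIDDEN), linear(HIDDEN, N_SUBJECTS)]
classifier = [linear(D, HIDDEN), linear(HIDDEN, N_CLASSES)]

x = rng.normal(size=(5, N_IN))  # a batch of 5 dummy inputs
z = encode(encoder, x)
z_a, z_n = split_latent(z, RATIO)
adv_logits = mlp_head(adversary, z_a)  # subject ID logits from the censored part
nui_logits = mlp_head(nuisance, z_n)   # subject ID logits from the nuisance part
cls_logits = mlp_head(classifier, np.concatenate([z_a, z_n], axis=-1))
```

Only the classifier sees the full latent vector; the adversary and nuisance heads each see a disjoint slice, which is what allows the two subject-ID losses to push the slices in opposite directions.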
III Experimental Evaluation and Results
III-A Physiological Biosignal Dataset
We perform the experimental evaluations on a publicly available physiological biosignal dataset for the assessment of different stress status levels [2]. This database consists of physiological biosignals for inferring 4 different stress states ($L = 4$) from 20 healthy subjects ($S = 20$), including physical stress, cognitive stress, emotional stress and relaxation. The data was collected by non-invasive wrist-worn biosensors and contains electrodermal activity (EDA), temperature, acceleration, heart rate, and arterial oxygen level, where acceleration is composed of data from three channels. Thus the dataset consists of signals from 7 channels in total ($C = 7$), which we downsampled to 1 Hz to align all data sources. For each of the stress states, a corresponding task of 5 minutes (i.e., 300 time samples with $T = 300$) was assigned to subjects for inducing the stress levels. Each subject performed a total of 7 trials, where 4 out of the 7 trials were for the relaxation status. To account for the imbalanced number of trials across classes, we only used the first of the relaxation trials and ignored the rest, resulting in one trial for each of the four stress status levels.
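The class balancing step described above (keeping only the first relaxation trial per subject) can be sketched as follows. The ordering of trials in `protocol` is a hypothetical example used only for illustration; the actual recording order of the dataset is not stated here.

```python
def select_balanced_trials(trial_labels, repeated="relaxation"):
    """Return indices of the trials to keep for one subject.

    All trials are kept except repeats of the `repeated` class, of which
    only the first occurrence is retained.
    """
    kept, seen = [], False
    for idx, label in enumerate(trial_labels):
        if label == repeated:
            if seen:
                continue  # skip every later relaxation trial
            seen = True
        kept.append(idx)
    return kept

# Hypothetical trial order for one subject: 7 trials, 4 of them relaxation.
protocol = ["relaxation", "physical", "relaxation", "cognitive",
            "relaxation", "emotional", "relaxation"]
kept = select_balanced_trials(protocol)
```

With this example ordering, `kept` contains one index per stress status, giving four trials per subject.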
III-B Experiment Implementation
For the model described above, the dataset fixes the input and output dimensions (7 channels, 300 time samples, 4 stress status classes and 20 subjects). The parameters to be determined for the disentangled adversarial model were the regularization weights $\lambda_A$ and $\lambda_N$ of the adversary and nuisance networks, and the rate of nuisance representation $r$. An intuitive way to optimize them is a parameter sweep. To perform this, we trained our models with various parameter combinations, and favored combinations that decreased the adversary accuracy and increased the nuisance accuracy, while maintaining a relatively stable accuracy for the main classifier on the validation sets.
TABLE I: Accuracies of the main classifier, adversary network and nuisance network for each evaluated model setting.
To initially reduce the number of parameter combinations, we first optimized the model over the adversary weight $\lambda_A$ with the nuisance weight $\lambda_N = 0$ and the nuisance ratio $r = 0$, which is the case of an adversarial model with only the adversary network attached. Later, based on the assumption that the subject-related representation accounts for a relatively small proportion of the latent representation needed to solve the task-specific problem, we fixed the rate of nuisance representation $r$ to a small value. With fixed $\lambda_A$ and $r$, we further assessed the model with varying $\lambda_N$, which is the disentangled adversarial model with both adversary and nuisance networks attached. It is essential to note that these parameters could still be tuned further by cross-validating the model learning stage, since the parameter selections not covered in this implementation leave additional combinations to be explored. Still, the adversarial transfer learning framework could be applied with any other specifications. Evaluations were performed by cross-subject analyses using a leave-one-subject-out approach, where the left-out subject constituted the cross-subject test set, and the training and validation sets were composed of 90% and 10% random trial splits from the remaining subjects.
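The leave-one-subject-out protocol with a 90%/10% random train/validation split can be sketched in plain Python as below. This is an illustrative implementation under the assumption that trials are indexed in a flat list with one subject ID per trial; the function name, the `seed` argument and the rounding of the validation set size are choices made for this sketch.

```python
import random

def leave_one_subject_out(subject_ids, val_fraction=0.1, seed=0):
    """Yield (held_out, train_idx, val_idx, test_idx) for every subject.

    The held-out subject's trials form the test set; the remaining trials
    are shuffled and split into training and validation indices.
    """
    rng = random.Random(seed)
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        rest = [i for i, s in enumerate(subject_ids) if s != held_out]
        rng.shuffle(rest)
        n_val = max(1, round(val_fraction * len(rest)))
        yield held_out, rest[n_val:], rest[:n_val], test

# 20 subjects with 4 trials each, matching the dataset description.
subject_ids = [s for s in range(20) for _ in range(4)]
folds = list(leave_one_subject_out(subject_ids))
```

Each fold never shares trials of the held-out subject with the training or validation sets, so the test accuracy reflects transfer to an unseen user.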
III-C Results and Discussion
We performed cross-subject analyses to evaluate the trained models, which is an indicator of transfer learning performance. As shown in Table I, we first assessed the non-adversarial model with $\lambda_A = 0$, $\lambda_N = 0$ and $r = 0$. Later we evaluated the adversarial network at two values of the adversary weight $\lambda_A$, with $\lambda_N = 0$ and $r = 0$, to reduce the number of parameter combinations. Finally, we fixed $\lambda_A$ and $r$ in order to observe the representation learning capability of the complete disentangled adversarial transfer learning model with different choices of the nuisance weight $\lambda_N$. For each model we evaluated the accuracy of the main classifier (4-class decoding), as well as of the adversary and nuisance networks (20-class decoding). A higher accuracy of the main classifier indicates better discrimination of stress status levels, a lower accuracy of the adversary network demonstrates that more task-specific information is preserved in the learned representation $z_a$, and a higher accuracy of the nuisance network shows that more subject-dependent features exist in the representation $z_n$. Thus our aim is to keep the accuracy of the main classifier stable while decreasing the adversary accuracy and increasing the nuisance accuracy.
In Table I we observe that the non-adversarial model can indeed learn features which yield a status-classification accuracy of 79.88%, yet with a 71.13% adversary network accuracy and a 6.17% nuisance network accuracy. We further notice that with increasing adversary weight $\lambda_A$, the adversary network accuracy descends dramatically towards chance level and thus more task-discriminative features are exploited by $z_a$, while the main classifier accuracy slightly increases. Specifically, the larger of the two evaluated $\lambda_A$ values is preferable in this case. Moreover, under the particular setting of fixed $\lambda_A$ and $r$, we observe that a higher nuisance weight $\lambda_N$ censors the encoder with significantly increased nuisance network accuracies, and therefore enforces stronger extraction of subject information into $z_n$, with slightly higher but relatively stable main classifier accuracies. Figure 2 demonstrates the transfer learning results for the 20 held-out subjects under three specific model training conditions. With both adversary and nuisance network units attached to the encoder, the classifier improves the worst-case accuracies significantly and shows more stable performance across different left-out subjects, since the proposed transfer learning framework becomes more robust in decoding data of unknown subjects from a broader range.
This study proposes a framework for disentangled adversarial transfer learning to extract nuisance-robust representations from physiological biosignal data in stress status level decoding. Unlike common deep learning network architectures, our proposed model attaches additional adversary and nuisance networks to the output of the feature learning encoder to manipulate the latent representations. We exploit a novel objective towards which the adversary network, nuisance network and the encoder-classifier pair are jointly trained. We perform cross-subject transfer learning evaluations on a publicly available physiological biosignal dataset for stress status level monitoring. Results demonstrate the benefits of the proposed disentangled adversarial framework in transfer learning with input data from novel users, and thus better adaptability to a wider range of subjects. Our proposed adversarial transfer learning model is also applicable to any other deep feature learning approach, where the feature encoders could be adapted to different input signal characteristics.
-  P. C. Petrantonakis and L. J. Hadjileontiadis, “Emotion recognition from EEG using higher order crossings,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 186–197, 2009.
-  J. Birjandtalab, D. Cogan, M. B. Pouyan, and M. Nourani, “A non-EEG biosignals dataset for assessment and visualization of neurological status,” in IEEE International Workshop on Signal Processing Systems, 2016, pp. 110–114.
-  A. M. Amiri, M. Abtahi, A. Rabasco, M. Armey, and K. Mankodiya, “Emotional reactivity monitoring using electrodermal activity analysis in individuals with suicidal behaviors,” in 10th International Symposium on Medical Information and Communication Technology, 2016, pp. 1–5.
-  D. Cogan, M. B. Pouyan, M. Nourani, and J. Harvey, “A wrist-worn biosensor system for assessment of neurological status,” in 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2014, pp. 5748–5751.
-  D. Giakoumis, D. Tzovaras, and G. Hassapis, “Subject-dependent biosignal features for increased accuracy in psychological stress detection,” International Journal of Human-Computer Studies, vol. 71, no. 4, pp. 425–439, 2013.
-  G. Giannakakis, D. Grigoriadis, K. Giannakaki, O. Simantiraki, A. Roniotis, and M. Tsiknakis, “Review on psychological stress detection using biosignals,” IEEE Transactions on Affective Computing, 2019.
-  O. Ozdenizci et al., “Time-series prediction of proximal aggression onset in minimally-verbal youth with autism spectrum disorder using physiological biosignals,” in 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018, pp. 5745–5748.
-  H. Morioka, A. Kanemura, J.-i. Hirayama, M. Shikauchi, T. Ogawa, S. Ikeda, M. Kawanabe, and S. Ishii, “Learning a common dictionary for subject-transfer decoding with resting calibration,” NeuroImage, vol. 111, pp. 167–178, 2015.
-  W. Tu and S. Sun, “A subject transfer framework for EEG classification,” Neurocomputing, vol. 82, pp. 109–116, 2012.
-  S. Fazli, F. Popescu, M. Danóczy, B. Blankertz, K.-R. Müller, and C. Grozea, “Subject-independent mental state classification in single trials,” Neural Networks, vol. 22, no. 9, pp. 1305–1312, 2009.
-  H. Edwards and A. Storkey, “Censoring representations with an adversary,” arXiv preprint arXiv:1511.05897, 2015.
-  A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, “Adversarial autoencoders,” arXiv preprint arXiv:1511.05644, 2015.
-  M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun, “Disentangling factors of variation in deep representation using adversarial training,” in Advances in Neural Information Processing Systems, 2016, pp. 5040–5048.
-  O. Özdenizci, Y. Wang, T. Koike-Akino, and D. Erdoğmuş, “Learning invariant representations from EEG via adversarial inference,” IEEE Access, vol. 8, pp. 27 074–27 085, 2020.
-  G. Lample, N. Zeghidour, N. Usunier, A. Bordes, L. Denoyer, and M. Ranzato, “Fader networks: Manipulating images by sliding attributes,” in Advances in Neural Information Processing Systems, 2017, pp. 5967–5976.
-  O. Özdenizci, Y. Wang, T. Koike-Akino, and D. Erdoğmuş, “Transfer learning in brain-computer interfaces with adversarial variational autoencoders,” in 9th International IEEE/EMBS Conference on Neural Engineering, 2019, pp. 207–210.
-  M. Atzori, M. Cognolato, and H. Müller, “Deep learning with convolutional neural networks applied to electromyography data: A resource for the classification of movements for prosthetic hands,” Frontiers in Neurorobotics, vol. 10, p. 9, 2016.
-  O. Faust, Y. Hagiwara, T. J. Hong, O. S. Lih, and U. R. Acharya, “Deep learning for healthcare applications based on physiological signals: A review,” Computer Methods and Programs in Biomedicine, vol. 161, pp. 1–13, 2018.