Remarkable advances in sensor technology, and consequently in smart environments, alongside huge progress in machine learning techniques, have pervasively brought intelligent solutions into every aspect of human life. Recognizing what a human subject is doing is widely considered one of the most important tasks of an intelligent system, and constitutes an active field of research known as Human Activity Recognition (HAR).
Previous studies on HAR can be generally categorized, based on the sensor modalities and data utilized to detect activity details, into vision-based and sensor-based approaches. Vision-based sensors are exploited to capture images, videos or surveillance camera features to recognize activity nweke2018deep. Despite the successful performance of vision-based solutions, non-visual sensors are still required to address their existing limitations, such as laborious processing and privacy problems. Non-visual sensors can be installed on the human body (wearable sensors) or in the environment (ambient sensors); utilizing a network of heterogeneous sensors has also attracted widespread interest.
Diverse supervised and semi-supervised machine learning models have been proposed for activity recognition. These models deliver promising accuracy, conditioned on training with enough labeled data. The pitfall, however, is that their performance drops dramatically on data from new, unseen distributions. The difference may be rooted in the feature space or the label space distribution. Therefore, recognizing the activities of a new user remains challenging for a model trained on samples of other users' behavior. Moreover, collecting and labeling sufficient training data is not feasible for every new user, since it requires relatively long observation of the subject's behavior, which is time-consuming and sometimes impractical.
Transfer learning techniques aim to prevent that performance drop by adapting knowledge obtained from the source domain (training users) to the target domain (new users). In the machine learning literature, transfer learning is researched under a variety of names, such as life-long learning, knowledge transfer, learning to learn, inductive transfer, context-sensitive learning, and meta-learning cook2013transfer.
This study investigates how to address the aforementioned limitations and analyzes the results of our proposed solution. The remainder of this paper is organized as follows:
Section 2 examines previous research devoted to the study of HAR and transfer learning. Section 3 describes SA-GAN and its related training details. Evaluation, experimental results, and their analysis are presented in Section 4. Finally, Section 5 summarizes the results of this work, draws conclusions, and highlights issues for future research.
2 Related Works
2.1 Human Activity Recognition
Traditional machine learning approaches, including K-Nearest Neighbors (KNN), Hidden Markov Models (HMM), Support Vector Machines (SVM), Random Forests (RF) and Naive Bayes, have shown satisfactory results in recognizing human activities mannini2010machine; bedogni2012train. A major criticism of these models is that they mainly rely on handcrafted feature extraction or heuristic information. Besides the demand for a domain specialist, the extracted features are not abstract enough; therefore, the models are not suitable for generalization and the recognition of more abstract activities yang2009activity.
Deep learning approaches have been used with interest for feature extraction in applications such as HAR, which deal with high-dimensional data vepakomma2015wristocracy; ronao2016human. Data-driven cheng2017human; chen2015deep; yang2015deep and model-driven ha2015multi; jiang2015human approaches are the two primary ways deep models are applied to HAR problems. Increasing a network's depth improves the quality of the extracted features wang2018deep.
Stacked Auto-Encoders (SAE) are models capable of learning lower-dimensional representations in an unsupervised manner li2014unsupervised. Recurrent networks, and their combination with Restricted Boltzmann Machines, are of interest owing to the temporal nature of human activity data jiang2015human; guan2017ensembles; inoue2018deep. Nonetheless, their high resource consumption and low learning rate count as prohibitive drawbacks.
2.2 Knowledge Transfer
The literature on knowledge transfer can be generally categorized into four main approaches based on the type of knowledge they transfer pan2010survey:
Instance Transfer: Methods in this category mainly aim at weighting and transforming labeled instances into the target domain. Standard supervised machine learning models can afterward be applied to the transferred samples.
Feature Representation Transfer: The core idea of this category is to find a common representation of both the source and target domains that decreases the distance between them while keeping their classes discernible.
Parameter Transfer: The basic assumption is that the source and target domains share some parameters or prior distributions of the models' hyper-parameters. These methods focus on transferring prior knowledge and parameters between domains.
Relational Transfer: The knowledge to be transferred is the relationship among the data; a mapping of relational knowledge between the source and target domains is built. Both domains should be relational.
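As a minimal illustration of the instance-transfer idea, source samples can be re-weighted by their proximity to the (unlabeled) target data before training a standard classifier. The weighting heuristic below is a hypothetical sketch, not a method from the surveyed papers:

```python
import numpy as np

def instance_weights(source_x, target_x, bandwidth=1.0):
    """Weight each source sample by its similarity to the target centroid.

    A deliberately simple heuristic: weights decay exponentially with the
    squared distance to the mean of the unlabeled target samples, so
    source instances that resemble the target domain dominate training.
    """
    centroid = target_x.mean(axis=0)
    sq_dist = ((source_x - centroid) ** 2).sum(axis=1)
    w = np.exp(-sq_dist / (2 * bandwidth ** 2))
    return w / w.sum()  # normalize so the weights form a distribution

# Source samples: one near the target region, one far from it.
source = np.array([[0.1, 0.0], [5.0, 5.0]])
target = np.array([[0.0, 0.0], [0.2, -0.1]])
w = instance_weights(source, target)  # w[0] >> w[1]
```

The resulting weights could then be passed as per-sample weights to any supervised learner that supports them.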
The authors of hu2011cross proposed an instance-based transfer model in the HAR domain that interprets source-domain data as pseudo training data according to a similarity measure to the target-domain samples. These pseudo data are then fed into supervised learning algorithms to train the classifier.
Quite recently, another cross-domain activity recognition framework was proposed in wang2018stratified. It first obtains pseudo labels for the target domain using a majority voting technique, and then transforms both domains into common subspaces considering intra-class correlations. This model works in a semi-supervised manner, obtaining target-domain labels via a second annotation step.
Transfer Component Analysis (TCA) is a domain adaptation method introduced in pan2011domain. TCA learns transfer components across domains in a Reproducing Kernel Hilbert Space for establishing a representation transfer. With the new representation in the subspace spanned by these transfer components, standard machine learning methods are applicable to train classifiers or regression models in the source domain for use in the target domain.
Another representation transfer solution, known as GFK, is described in gong2012geodesic. It is a kernel-based method that models the domain shift by integrating an infinite number of subspaces characterizing changes in statistical and geometric attributes from the source to the target domain.
2.3 Generative Adversarial Networks
The idea of adversarial learning has attracted much attention from research teams since the introduction of the GAN framework. Several publications have appeared in recent years documenting domain adaptation using GAN models luo2017label. However, most previous work concentrates on utilizing GANs in the domain of vision and image processing.
The authors of zhu2017unpaired present an approach for learning to translate an image from a source domain to a target domain in order to overcome the lack of labeled images. Researchers in yi2017dualgan developed an innovative mechanism named DualGAN, which provides an image translator trained on sets of unlabeled images from both domains. A new unsupervised method presented in bousmalis2017unsupervised learns a transformation in pixel space from one domain to the other by adapting source-domain images to appear as if drawn from the target domain. Isola et al. isola2017image investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems.
However, research employing GAN models in the Human Activity Recognition domain is limited to approaches that generate high-quality artificial data imitating the output of wearable sensors saeedi2018personalized. Despite the compelling results of GANs on vision-based problems, they are not optimal for discriminative tasks and can handle only smaller domain shifts tzeng2017adversarial. To the authors' best knowledge, very few publications in the literature discuss knowledge transfer using GANs to improve classification performance in HAR problems.
3 Proposed Model: SA-GAN
Following our semi-supervised knowledge transfer setting, we have labeled data of the source domain and unlabeled data of the target domain. In our Cross-Subject Transfer Learning problem, the difference between the domains is rooted in the distribution of the feature space and the conditional distribution of the label space. It can be interpreted as the scenario in which a model is trained with limited samples from a source distribution $P_s$ and is required to be tested against samples drawn from a target distribution $P_t$. Given these assumptions, the goal is to transform samples from $P_s$ so as to obtain labeled data as if drawn from $P_t$. Concurrently, a classifier can be trained with those transferred instances, so that it will be able to classify data from $P_t$.
As shown in Fig. 1, our proposed model consists of three main components:
Generator (G): This component is in charge of generating artificial data which are similar to the data from the target domain.
Discriminator (D): This component’s task is to distinguish between artificial G’s output and real data from the target domain.
Classifier (C): This component aims to assign the correct label to its inputs. Its interaction in the training of the adversarial components prevents the model from mode collapse.
Similar to the classic GAN model, G and D play a minimax game with the following value function:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_t}[\log D(x)] + \mathbb{E}_{x \sim P_s}[\log(1 - D(G(x)))] \tag{1}$$
In this game, G implicitly defines a new distribution $P_g$, which is supposed to be as close as possible to $P_t$, such that D will not be able to discriminate data from $P_g$ and $P_t$. This minimax game has a global optimum at $P_g = P_t$; hence the optimal discriminator can be written in the form of goodfellow2014generative:

$$D^*(x) = \frac{P_t(x)}{P_t(x) + P_g(x)} \tag{2}$$
Since $D^*(x) \to 1/2$ as $P_g \to P_t$, the optimal discriminator ends up discriminating half of the samples incorrectly as G becomes more powerful at generating target-like data.
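The optimal-discriminator result can be checked numerically: for fixed density values $p_t(x)$ and $p_g(x)$ at a point, the integrand $p_t \log D + p_g \log(1 - D)$ is maximized at $D = p_t/(p_t + p_g)$. A small grid search over illustrative density values (not taken from our experiments) confirms this:

```python
import numpy as np

# Hypothetical density values of the target and generated
# distributions at a single point x.
p_t, p_g = 0.7, 0.3

def v(d):
    """Pointwise contribution to the value function V(D, G)."""
    return p_t * np.log(d) + p_g * np.log(1 - d)

grid = np.linspace(0.001, 0.999, 9999)
d_star = grid[np.argmax(v(grid))]

# The numeric optimum matches the analytic p_t / (p_t + p_g);
# at convergence p_g = p_t, so this value collapses to 1/2.
assert abs(d_star - p_t / (p_t + p_g)) < 1e-3
```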
Simultaneously, classifier C prevents generator G from collapsing, since G is updated based on the discernibility of its output as evaluated by C, as illustrated in Fig. 2. Mode collapse refers to a state in which G collapses too many values of its input ($x$) to the same output value ($G(x)$) goodfellow2014generative. The classifier's supervised optimization objective can be written as a cross-entropy loss over transferred source samples:

$$L_C = -\,\mathbb{E}_{(x^s, y^s)} \Big[ \textstyle\sum_{k} \mathbb{1}[y^s = k] \, \log C_k\big(G(x^s)\big) \Big] \tag{3}$$
Given the adversarial objective of Equation 1 ($L_{adv}$) and the classification objective of Equation 3 ($L_C$), the overall objective function of the framework is defined by Equation 4, where $\lambda_{adv}$ and $\lambda_{C}$ are the adversarial and classification task factors:

$$L = \lambda_{adv} L_{adv} + \lambda_{C} L_C \tag{4}$$
These two hyperparameters determine the impact of D's and C's outputs on the gradient update of G. Considering the quick convergence of the adversarial components, large values of $\lambda_{C}$ let the classifier keep improving using transferred and original target data.
3.1 Components’ Architecture
A summary of all components' parameters and their inputs/outputs is provided in Table 1. These parameters give the model the capability of complexity control. The model's sophistication should suit the distance between the source and target subjects, so that it is able to move the required mass between the two distributions. Complexity can be regulated based on the model's fitness as reflected in the trend of the loss values.
For discriminator D, we have implemented a model composed of convolutional layers, each followed by Leaky ReLU. For the first convolutional layer, we also added Batch Normalization. The last layer of this component is a 1D convolutional layer whose number of outputs matches the number of features. Through practical investigation, we decided to apply Tanh as the activation function for the last layer, to obtain continuous output instead of binary values. Besides, one-sided label smoothing has been applied for further improvement, as suggested in salimans2016improved.
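One-sided label smoothing, as suggested in salimans2016improved, softens only the discriminator's positive targets while leaving the fake targets at zero. A minimal sketch (0.9 is the commonly used default value, not necessarily the one used in our experiments):

```python
import numpy as np

def smooth_real_labels(labels, smooth=0.9):
    """One-sided label smoothing: targets for REAL samples become
    `smooth` instead of 1.0; targets for FAKE samples (0.0) are
    left untouched, hence 'one-sided'."""
    return labels * smooth

real_targets = smooth_real_labels(np.ones(4))    # all become 0.9
fake_targets = smooth_real_labels(np.zeros(4))   # unchanged: all zeros
```

This keeps the discriminator from becoming overconfident on real samples, which stabilizes the gradients it passes to the generator.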
Several experiments were carried out to verify the performance loss incurred without transfer learning techniques, and hence the necessity of their application. During these experiments, different architectures were examined. The admissible performance and generalization potential of convolutional models led us to opt for a convolutional architecture for classifier C and generator G. The generator and classifier components consist of residual blocks, differing in the number of filters in their convolutional layers.
Model training entails three main steps, implemented in mini-batch mode. In each step, only one component is updated, while the other two remain fixed during that training batch. Moreover, the training steps are treated as independent optimization problems, so that different optimizers, learning rates, and loss functions can be applied to each. Mini-batch training allows the model to take a batch of samples into account at each iteration; this wider horizon helps it generate more diverse samples, whereas in simple stochastic training the model processes each sample independently, which can make it blind to the diversity of its generated samples. The accompanying algorithm listing outlines the training procedure.
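The three alternating steps can be sketched as a training skeleton. The update functions below are stubs standing in for the actual gradient steps; component names and task factors follow the paper, everything else is illustrative:

```python
import numpy as np

def update_discriminator(d, x_t, x_fake):
    # Step 1: supervised real/fake training of D; G and C stay frozen.
    return d  # stub: a real implementation would take an SGD step on MSE

def update_classifier(c, x_fake, y_s):
    # Step 2: supervised classification on transferred samples G(x_s).
    return c  # stub

def update_generator(g, d, c, x_s, y_s, lam_adv=1.0, lam_cls=1.0):
    # Step 3: G is updated from the weighted mix of D's and C's feedback.
    return g  # stub

def train(g, d, c, source, labels, target, epochs=2, batch=4):
    for _ in range(epochs):
        for i in range(0, len(source), batch):
            x_s, y_s = source[i:i + batch], labels[i:i + batch]
            x_t = target[i:i + batch]
            x_fake = x_s  # stub standing in for G(x_s)
            d = update_discriminator(d, x_t, x_fake)  # only D updates
            c = update_classifier(c, x_fake, y_s)     # only C updates
            g = update_generator(g, d, c, x_s, y_s)   # only G updates
    return g, d, c

rng = np.random.default_rng(0)
g, d, c = train(None, None, None,
                rng.normal(size=(8, 2)), np.zeros(8),
                rng.normal(size=(8, 2)))
```

The point of the skeleton is the update schedule: exactly one component changes per step, and each step can carry its own optimizer and loss.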
In the first step, discriminator D is updated by maximizing Equation 1 using samples from both domains. This step is a form of supervised training: it takes as input real target-domain samples together with the source samples adapted by generator G, and yields a binary validity label that marks the real or fake nature of the data. In our implementation, Mean Squared Error (MSE) loss and a Stochastic Gradient Descent (SGD) optimizer are chosen for this step.
With the first step completed, we can proceed to classifier training based on Equation 3. The second step can be treated as a supervised classification problem that aims to assign the correct label both to real inputs and to the artificial data generated by G. Note that the higher objective is to transfer data from the source domain through the generator so that it gets close enough to the target distribution to be appropriate for classifier training.
Having D and C updated, the final step is to train the generator. As illustrated in Fig. 2, this component uses the combination of the discriminator's and classifier's outputs to compute its gradient. Each output participates in the training procedure in proportion to its task factor, as formulated in Equation 4.
4 Evaluation
Our experiments are broken down into two groups on the basis of their objectives. The first set of analyses was carried out to justify the necessity of applying a knowledge transfer technique, by investigating the performance drop in the presence of domain shift. The second group of experiments was conducted to measure the improvement achieved by the SA-GAN model.
The evaluation is performed on the Opportunity Challenge benchmark dataset, which contains the recorded output of wearable sensors worn by 4 human subjects while performing predefined activities chavarriaga2013opportunity. The dataset contains three types of activity, differing in their level of abstraction; the recognition task is more difficult for activities with a higher level of abstraction. The most abstract activities have been picked to evaluate the proposed model.
For each subject, the first three Activity of Daily Living (ADL) files were used as the training set, and the fourth and fifth ones were selected as the validation and test sets respectively. Fig. 3 depicts a simple overview of the steps required to obtain predictions on the target domain using SA-GAN.
4.1 Experimental Setup
Data preparation is an inevitable step in neural network training. Our preprocessing framework is composed of three major steps, as follows:
Data Preprocessing: In the initial step, missing values in the dataset were replaced by the mean value of their corresponding feature. Min-max normalization was then performed based on each sensor's output range.
Data Segmentation: A sliding window of approximately 3 seconds was applied for segmentation, with 70% overlap between successive windows.
Dimension Reduction: Given the sliding-window representation of the feature vectors, each sample would contain around 10,000 feature values, which is extremely difficult for a network to process. Hence, we used Principal Component Analysis (PCA) to reduce the window dimension to 88. The number of components to keep can be determined by a training-time versus accuracy trade-off.
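The three preprocessing steps above can be sketched end to end. The sampling rate, window length, and component count below are illustrative placeholders (the actual PCA target of 88 components is stated above), and PCA is implemented directly via SVD to keep the sketch self-contained:

```python
import numpy as np

def preprocess(x, n_components):
    # 1) Mean imputation per feature, then min-max normalization.
    col_mean = np.nanmean(x, axis=0)
    x = np.where(np.isnan(x), col_mean, x)
    mn, mx = x.min(axis=0), x.max(axis=0)
    x = (x - mn) / np.where(mx > mn, mx - mn, 1.0)

    # 2) Sliding-window segmentation with 70% overlap
    #    (step = 30% of the window length).
    win = 90  # ~3 s at an assumed 30 Hz sampling rate
    step = int(win * 0.3)
    windows = np.array([x[i:i + win].ravel()
                        for i in range(0, len(x) - win + 1, step)])

    # 3) PCA on the flattened windows, via SVD of the centered matrix.
    centered = windows - windows.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

x = np.random.default_rng(0).normal(size=(600, 5))
x[10, 2] = np.nan                    # a missing sensor reading
z = preprocess(x, n_components=8)    # shape: (n_windows, 8)
```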
4.2 Results and Analysis
The principal objective of transfer learning is to reduce the distance between the source and target domain distributions. Transformation between more distant domains is expected to be more laborious in terms of time and resource consumption. Distances were measured using the Wasserstein distance, defined as follows arjovsky2017wasserstein:
$$W(P_s, P_t) = \inf_{\gamma \in \Pi(P_s, P_t)} \mathbb{E}_{(x, y) \sim \gamma}\big[\, \lVert x - y \rVert \,\big]$$

where $\Pi(P_s, P_t)$ is the set of all joint distributions $\gamma(x, y)$ whose marginals are $P_s$ and $P_t$ respectively. Intuitively, $\gamma(x, y)$ denotes the "mass" that must be transported from domain $s$ to domain $t$ in order to transform the distribution $P_s$ into the distribution $P_t$.
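For one-dimensional empirical distributions with equally many samples, the Wasserstein-1 distance reduces to the mean absolute difference of the sorted samples, which gives a compact way to check the measure (an illustrative reduction; the paper's measurements are over the full multivariate feature space):

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 between two equal-size 1-D empirical samples: the optimal
    transport plan matches order statistics, so the cost is the mean
    absolute difference of the sorted values."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

a = np.array([0.0, 1.0, 3.0])
b = a + 5.0  # the same sample shifted by 5: all mass moves distance 5
# wasserstein_1d(a, b) == 5.0
```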
Table 2 presents the results obtained from our experiments. Each experiment is defined by a specific source and target subject, whose Wasserstein distance is stated in the third column. This scenario follows a notion of real-world application: a newcomer tests the pre-trained HAR model without having been among the training dataset's users. The best source for knowledge transfer can be found either by distance comparison or by validation-measure comparison.
Table 2 columns: Source Subject | Target Subject | Wasserstein Distance | No Transfer | KNN+PCA | GFK | STL | SA-GAN | Supervised Learning.
Table 3. Left: Confusion matrix of the proposed model transferring knowledge from Subject 1 to Subject 2. Center: Confusion matrix of a model trained in a supervised manner on Subject 2. Right: Confusion matrix of a model trained in a supervised manner on Subject 1 (no transfer). All models are tested against Subject 2.
To assess our proposed model, we compared its performance, in terms of weighted F1-measure, with two state-of-the-art transfer learning models, GFK gong2012geodesic and STL wang2018stratified, and a classic knowledge transfer baseline, KNN+PCA. The best performance in each experiment is shown in bold.
Combining the models' performance with the domain distances, we deduce that the domains with the largest distance lead to the least effective transfers, as expected. The reported W-F1 measures show that our proposed model improves on the No Transfer mode in all cases reported in Table 2. Moreover, it achieves the best results in more than 66% of the experiments, and the second-best classification performance in the remaining 25%.
Further analysis notably showed that in 3 experiments, the SA-GAN model reaches up to 90% of the accuracy of a model trained in a supervised manner with labeled data from the target domain. Supervised learning performance can be taken as a ceiling indicating how much improvement is feasible; the fourth and last columns of Table 2 therefore bound the performance drop and enhancement respectively. Our investigations show between 22% and 47% performance loss in No Transfer mode.
Table 3 extends our understanding of what transfer learning achieves. It examines the confusion matrices of three models, representing semi-supervised transfer learning (our proposed model), supervised learning, and No Transfer mode. The source domain is Subject 1 and the target is Subject 2. Apart from a slight sign of mode collapse on class 0 (Relaxing) and class 5 (Sandwich Time), the results show an appreciable improvement provided by SA-GAN.
From Table 2, it can be seen that performance falls to 0.45 when a model trained on samples of Subject 1 is assessed against samples of Subject 2, and rises to 0.75 given supervised training with Subject 2's labeled data; the proposed model achieves a W-F1 of 0.73, almost equivalent to the supervised learning performance.
As mentioned earlier, Cross-Subject Transfer Learning using different sources has great potential for practical applications. One typical real-world scenario is a pre-trained machine learning model facing a new user, when it is not possible to collect and label enough data for re-training. Assume the model is trained using samples of Subjects 1, 3, and 4, and the unseen samples belong to the activities of Subject 2. For each knowledge transfer model, three transformations were performed using the three different sources. These cases are represented in Fig. 4.
Accordingly, Subject 1 is of direct practical relevance and is a proper choice as the source of transfer. On the Subject 1 to Subject 2 transformation, our proposed model outperformed the other methods. Summing up the results, for each target subject the most appropriate source of transfer can be found by evaluating the obtained transferred model over a validation set.
5 Conclusions and Future Work
One of the most important limitations of HAR models lies in the lack of a sufficient amount of labeled data. Furthermore, the patterns discovered from the available labeled data might not generalize well to samples from unseen subjects, while data acquisition and labeling are not feasible for newcomers due to the limitations of interacting with human users. This paper has presented a novel solution for cross-subject knowledge transfer in the domain of Human Activity Recognition, based on the Generative Adversarial Network framework. SA-GAN performs a semi-supervised instance-based transfer in order to provide enough data to train a classifier on the target domain. Results so far have been very promising: in some cases we reached up to 90% of the supervised model's performance.
Future work will concentrate on utilizing more stable variants of GAN to prevent the mode collapse problem and further improve recognition results. We also intend to examine multiple-source transfer and the combination of transferred models from different source domains.