Achieving Single-Sensor Complex Activity Recognition from Multi-Sensor Training Data

02/26/2020, by Paula Lago, et al.

In this study, we propose a method for single sensor-based activity recognition that is trained with data from multiple sensors. There is no doubt that the performance of complex activity recognition systems increases when we use enough sensors of sufficient quality; however, using such rich sensor setups may not be feasible in real-life situations for various reasons, such as user comfort, privacy, battery preservation, and/or cost. In many cases, only one device such as a smartphone is available, and it is challenging to achieve high accuracy with a single sensor, more so for complex activities. Our method combines representation learning with feature mapping to leverage the multiple sensor information made available during training while using a single sensor during testing or in real usage. Our results show that the proposed approach can improve the F1-score of complex activity recognition by up to 17 percentage points, compared to training with the same single-sensor data, in a new-user scenario.






1 Introduction

Human Activity Recognition (HAR) is the process of automatically identifying what a user is doing from sensor observations. Cameras, wearable sensors, and object-attached sensors have been used for recognizing human activity in different scenarios [7]. Correctly estimating the activity of a user enables a wide range of applications, such as assisting elders with activities in their daily lives [13], detecting abnormal situations [1], or early disease diagnosis [10].

Traditionally, activity recognition relies on supervised machine learning [19]. That is, a set of activity examples, i.e. the training set, is used to learn a function to classify new activity examples, i.e. the test set. This model assumes that both the training set and the test set have the same dimensions; in other words, they share the same number and type of features used to represent sensor data. The easiest way to comply with this assumption is to use the same set of sensors for measuring activities in both the training and test examples. While it is easy to collect training data with multiple sensors, they are rarely used in real-life environments for the following reasons:

  • Users may not have multiple devices due to costs or comfort

  • Users may not want to use some sensors due to privacy concerns

  • Users may not want to enable multiple sensors due to increased battery consumption

Current approaches to improving activity recognition accuracy usually rely on placing more devices in different body positions or using more sensors on the same device [15]. Compared to using a single device, using sensors in multiple body locations can increase the accuracy by 35% and by as much as 52% [3]. Using multiple modalities on a single device can increase the accuracy by 25% compared to using only the accelerometer of a wearable device [20] (these improvements are observed for physical activities and user-dependent models). However, even if we can collect experimental data with multiple, highly accurate sensors, the results of such experiments are usually unrealistically high compared to those obtained in real life, because a typical user will not have all the sensors used in the laboratory. As a consequence, models trained with experimental data are hard to transfer to real-life situations. The difference in the number and precision of sensors between training and real-life settings makes models trained with laboratory data unusable in real life (Figure 1). Nevertheless, training models with only one sensor, which is normally the case in real settings, considerably reduces the performance of activity recognition, as it does not take full advantage of the data collected for training.

Figure 1: Multi-sensor activity recognition models are trained with data from multiple sensors. However, not all sensors may be available for the final user. If sensors are different during test time, the different input dimensionality will make the model unusable.

In this paper, we study the possibility of building single-sensor activity recognition with multi-sensor training data. We formulate the learning problem and propose a method for its solution (Section 3). Our approach uses representation learning and feature mapping to bring both the training and testing data into a shared representation space, which is used as the input for the learning task. This solution is inspired by various transfer learning approaches [32, 46, 29, 2, 37] which learn a shared representation space from the source-task data and then apply it to the target learning task. Previous approaches, however, do not assume different numbers of sensors between the training and testing phases; rather, they take advantage of having more samples (unlabeled data, data with different labels, or data from the same number of sensors but in different layouts). Our approach takes advantage of having additional sensors for learning. We further discuss the differences between our proposal and related work in Section 2. The main contributions of this paper can be summarized as follows:

  1. We formulate a problem in which multi-sensor data is available during the training phase of the activity recognition models, yet only a single sensor is available for testing. This problem setting represents the current gap between laboratory and real-life settings in complex activity recognition. We propose a method to solve this problem using representation learning and feature mapping, by which we can take advantage of multi-sensor data for training (Section 3). The new space represents actions that are easier to recognize than the initial complex activities.

  2. We extensively evaluate the proposed method using four publicly available datasets with both simple and complex activities to assess the hypothesis that the method is more effective when dealing with complex activities. Our results show that the proposed method improves the recognition performance (F1-Score) for complex activities by as much as 17% (Section 4).

  3. We use a boosting approach to further improve the performance which results in an improvement of 15% compared to that of a method without boosting (Section 4).

  4. We discuss why the proposed method improves performance for complex activities by analyzing the decomposition into actions. We also discuss further implications of this study including how to take advantage of settings with additional sensors for training (Section 5).

Our results show that we can achieve improvements of as much as 17 percentage points in the F1-Score in user-independent models (Section 4). The main advantage of the proposed approach is that it is suitable for any set of sensors and can work with any number of algorithms. To the best of our knowledge, this work is the first to attempt single sensor-based activity recognition while training with multiple (high-dimensional) sensors. Our solution combines three techniques studied before for different problems (feature learning, feature mapping and supervised learning) to leverage the strengths of multi-sensor data to improve single-sensor-based activity recognition. We discuss the implications of our results, practical application scenarios and limitations of this study in Section 5. Finally, we present our conclusions in Section 6.

2 Related Work

Our work builds upon current research efforts to reduce the need for labeled data in activity recognition, such as semi-supervised learning [11, 23, 41, 4, 25, 44, 40] and transfer learning [42, 5, 33, 14, 36, 9]. These approaches take advantage of additional data: semi-supervised learning leverages unlabeled data, while transfer learning usually leverages labeled data from a different domain (an input of the same dimension but with a different probability distribution) or data from a different task (data with the same dimension but with different labels).


Our method is inspired by techniques that learn a shared, low-dimensional representation for transfer learning [2, 32, 29, 37]. Like these works, we propose an approach in which we first learn a common representation for the learning task. Learned representations have been successful in activity recognition because they can be more robust to small variations in the input than hand-crafted features [4, 39, 27, 21, 31]. The key difference in our work is that we assume that the training data comes from a larger number of sensors than the testing data, and that those sensors might be of different quality. We intend to take advantage of experimental data in which more sensors are available than those used in real life. Research in transfer learning has shown that transfer is more successful when the tasks are related [30]. In our problem setting, the relation comes from the fact that the training and testing inputs represent the same activities, and thus the actions that compose them are the same.

In activity recognition, transfer learning has addressed problems where the source and target domains have different sets of activities, different sensor layouts, or different sets of sensors [9]. Our proposal assumes that the source (training data) and target (testing data) domains have different sets of sensors. In [36], a teacher-learner approach was used to label the instances of the new set of sensors; however, the only knowledge transferred was the label to be assigned to each instance. Transfer learning based on feature representation has used manually designed features [42] and learned features [38, 45]. Our approach is similar to those using a unified learned feature space (also called "latent variables"). However, these approaches assume the same number of features for training and testing, because both training and test inputs are images [45] or because the same set of sensors is used in different layouts [38]. This makes the mapping from the feature space to the shared representation space straightforward. In contrast, when a different set of sensors is used, the corresponding mapping must also be learned. A related work uses multi-task learning [37] to learn a shared representation space across datasets. In that work, all datasets were collected with a single device, hence not considering the use of multiple sensors at different positions or different input dimensions at test time.

A setting with a similar solution approach is zero-shot learning where a semantic representation is used to describe activities, and the sensor features must be mapped to this representation [8, 28]. In this setting, the mapping is learned by formulating it as a multi-label classification, or a multiple regression problem depending on whether the attributes are binary or continuous. Inspired by the usage of attribute mapping in zero-shot learning techniques [18], we also use feature mapping to change data representation from the single sensor feature space to a learned space that represents information from multiple sensors. However, the problem setting of zero-shot learning is different from the problem setting studied in this paper.

Our proposed method takes advantage of data from additional sensing sources that complement the training data. In this regard, our approach can be thought of as an instance of domain adaptation [46, 26], where the training data is different from the testing data. However, the case where this difference consists of additional information sources, as studied in this work, has attracted less attention. Although this problem setting of using higher-dimensional data for training is similar to the Learning Under Privileged Information paradigm [43], that paradigm uses privileged information to accelerate the learning rate for specific algorithms, whereas we propose a generic approach that can be used with any classification algorithm. Using our approach, the representation space can be made more robust and the labeling effort can be reduced.

In summary, the main difference between our problem setting and others is that we assume the training data differs from the test data, not only in the number of sensors (features), but also in the quality of the sensors used. This difference forces us to develop a new learning method, as current classification models assume that the dimensions of the input do not change at test time. We propose an approach based on feature learning and feature mapping from single-sensor features to features learned from multiple sensors. The differences between our proposed approach and other approaches are:

  1. Unlike attribute-based learning and feature transfer learning approaches, we do not assume any prior knowledge of the attributes that describe each activity; we use the high-dimensional data to learn them.

  2. Given the different input dimensions between training and testing data, mapping test data to learned features is not as straightforward as it is in other feature-learning approaches. We use a multiple-regression approach for this mapping.

  3. The proposed approach can use any set of sensors and can adapt to suit any combination of algorithms.

3 Single-Sensor Activity Recognition from Multi-Sensor Training Data

Figure 2: Proposed algorithm to learn a single-sensor activity recognition model using multi-sensor data. During the training phase (top) multiple sensors are used to learn a representation space (1). A mapping from single sensor to the representation step is learned (2), and the classification model is trained with the representation space as inputs (3). The final model is used for testing.

In this section, we describe our proposed approach. The method transfers knowledge obtained from multi-sensor data collected in controlled settings to settings where fewer, and possibly less precise, sensors are used. This condition results in inputs of different dimensionality for training and testing. We first present an overview of the method and its challenges (Section 3.1), then define the problem formally (Section 3.2) and describe the proposed algorithm (Section 3.3).

3.1 Overview

To learn using more sensors than those available in the final system, we must overcome the main challenge: finding a common representation for the training and testing data. This is necessary because classification algorithms assume that the dimensions of the input do not change at test time. We use representation learning and feature mapping to solve this challenge (Figure 2). Representation learning is used to learn features that encode information from the whole-body sensors (or multiple sensors). The information provided by multiple sensors helps us discern activities that may have similar movements in one limb but different movements in other limbs. Feature mapping is used to map the features from the single sensor to this learned representation.

To understand why combining multiple sensor information into a single representation is helpful, consider the examples in Figures 3 and 4, where windows of different activities are depicted. As shown in the figures, for the eat-cook pair, the arm sensor discerns better between the two activities, whereas for the prepare-eat pair, the leg sensor might discern better. However, wearing a sensor on the wrist is more practical, so when the user wears only the arm sensor for all activities there may be confusions. The learned representation, combining the information from both arm and leg, can help to discern activities that would otherwise appear the same. The second challenge is to correctly map the single-sensor features to the learned representation. This is achievable because we map independently to each of the features in the new space, which represent simpler classes than the complex activities. The new features can be thought of as actions.

Figure 3: Differentiating "eat" from "cook" is difficult with only the leg sensor (top), but the arm sensor (bottom) can provide hints of the differences
Figure 4: In contrast, to differentiate "prepare" from "eat", the leg sensor (top) might provide more information than the arm sensor (bottom)

After learning the representation space, the next challenge is to learn to classify activities (Step 3 in Figure 2). This is a traditional supervised learning problem but, given the problem setting, data for this model can come from either the multiple sensors or the single sensor. In the testing or usage phase, the mapping function and classification model are used to recognize activities.

3.2 Problem Formulation

The proposed problem setting is one where the training and testing data have different dimensionality. Training data has higher dimensionality than that of testing data because it is measured from multiple sensors whereas testing data is measured with a single sensor. Moreover, we can assume that this sensor may have lower accuracy and precision than the sensors used in the training data.

We assume that a feature space exists that represents the non-redundant and important information needed to classify the activities; the representation space. The first step in the algorithm is to learn this feature space from the multi-sensor training data. In this paper, we use clusters of simple sensor features as the representation space; however, other representation learning algorithms can be used. Our main goal in this paper is to evaluate the feasibility of this approach rather than evaluating feature learning approaches.

Let us now describe the problem setting as compared to traditional activity recognition. The activity recognition problem is typically defined as a classification problem. We train an activity recognition model using a set of pairs (x_i, y_i), where x_i is a feature vector and y_i is an activity label. The task of traditional activity recognition is to find a function f that minimizes the classification error of f(x).

In the proposed problem setting, we have a set of triples (x_i, z_i, y_i), where x_i is the multi-sensor feature vector, z_i is the single-sensor feature vector, and y_i is the activity label, with the dimension of z smaller than that of x. The task is to find a mapping function g, from the single-sensor feature space to the representation space R, and a classifier f, from R to the activity labels, so that the classification error of f(g(z)) is minimized. The representation space R is learned from the multi-sensor data x.

3.3 Learning Algorithm

Using the previous problem formulation, we now describe the proposed algorithm (Algorithm 1).

We use unsupervised learning for feature learning as it enables the use of a high volume of unlabeled data. Therefore, the input of the algorithm is three sets of data rather than a single set of triples as stated in the problem formulation: an unlabeled multi-sensor dataset, a paired multi-sensor/single-sensor dataset, and a labeled single-sensor dataset.

The algorithm first learns the representation space (step 1 in Algorithm 1) using multi-sensor data. Representation-learning algorithms include dimensionality reduction algorithms such as PCA, clustering, and approaches such as autoencoders. In this paper, we use feature clustering. The goal is to find a representation space of dimension k, with k smaller than the dimension of the multi-sensor input, so that it acts as a middle layer between the multi-sensor and single-sensor data spaces.
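As an illustration, the feature-clustering step can be sketched with scikit-learn's FeatureAgglomeration, which groups correlated features and averages each group into one learned feature. The data shapes and the choice of FeatureAgglomeration are assumptions for illustration; the paper's exact clustering setup may differ.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
# Hypothetical multi-sensor feature matrix: 500 windows x 20 features
# (e.g., 5 sensors x 4 statistical features, simplified).
X_multi = rng.normal(size=(500, 20))

# Cluster correlated features into k groups; each group is pooled
# into one learned feature, giving a k-dimensional representation.
k = 15
rep_model = FeatureAgglomeration(n_clusters=k)
rep_data = rep_model.fit_transform(X_multi)
print(rep_data.shape)  # (500, 15)
```

The transform can later be applied to any window of multi-sensor features, so the same model serves both the mapping-learning and classifier-training steps.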

Data: unlabeled multi-sensor data X_rep; paired multi-sensor and single-sensor data (X_map, Z_map); labeled single-sensor data (Z_train, Y)
Result: Mapping function and activity classification model
/* Step 1: learn representation */
repModel = learnRepresentation(X_rep)
mappings = []
repData = repModel(X_map)
/* Step 2: learn mapping, one regressor per learned feature */
for each column of repData do
      target = repData[:, column]
      mappings[column] = linReg(Z_map, target)
end for
/* Step 3: train classification model */
data = []
for each column of repData do
      data = data.append(mappings[column](Z_train))
end for
model = learn_classification(data, Y)
Algorithm 1: Learning algorithm using multi-sensor data for training and single-sensor data for testing

The next step is to learn a mapping that takes the single-sensor data to the middle-layer representation space. We use multiple regression for this (step 2 in Algorithm 1). In this paper, we evaluate both linear and logistic regression for this step.
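A minimal sketch of this mapping step, fitting one linear regressor per learned feature; the synthetic shapes and data are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Hypothetical paired data: single-sensor features and the learned
# multi-sensor representation computed on the same windows.
Z_map = rng.normal(size=(500, 4))      # single-sensor features
rep_data = rng.normal(size=(500, 15))  # learned representation

# One regressor per dimension of the representation space.
mappings = [LinearRegression().fit(Z_map, rep_data[:, j])
            for j in range(rep_data.shape[1])]

def map_to_representation(Z):
    """Project single-sensor features into the learned space."""
    return np.column_stack([m.predict(Z) for m in mappings])

mapped = map_to_representation(Z_map)
print(mapped.shape)  # (500, 15)
```

At test time, only `map_to_representation` and the downstream classifier are needed, so the multi-sensor data never has to be available to the final user.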

Notice that the dataset used for learning the representation can differ from that used to learn the mapping. Moreover, both datasets can be unlabeled, as only sensor data is needed for these first two steps.

Finally, the classifier is learned using features of the learned representation space as input and activity labels as output. This is equivalent to a traditional supervised machine learning problem for activity recognition. Notice that this is the only step where labeled data is needed.

The main advantage of having separate processes for learning the feature space, mapping and classifier is that we can use a small dataset to learn the final classifier while using a large unlabeled dataset for learning the representation and mapping. Since it is common to collect more unlabeled data than labeled data (free-style experiments can be conducted for obtaining unlabeled datasets), we can use data that covers a wide range of contexts for steps 1 and 2, making the representation space more robust.

This algorithm has been implemented in Python for the evaluation.

3.4 Leveraging Models for Performance

The second challenge in the proposed approach is minimizing the final classification error. Given the serial architecture of the proposed method, the final classification error depends on both the mapping and classification errors. In short, even a perfect classifier can have a high error for certain activities if the mapping to the common representation space is not perfect.

We propose to combine the traditional activity recognition model with the proposed model to achieve better performance. The basic intuition is that the proposed model helps to differentiate difficult activities from each other, whereas other activities are well recognized by the traditional approach. We use a multi-class AdaBoost algorithm [12] to combine the two learners: the proposed method and the traditional approach. For this, we first train the activity recognition model using the proposed method. Then, based on the training errors of this model, we train a classifier using the traditional approach, giving more weight to the samples with high error identified in the first step.
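The reweighting idea can be sketched with SAMME-style multi-class boosting weights; the sample count, class count, simulated error pattern, and the decision-tree second learner are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n, K = 300, 4                      # hypothetical samples and classes
y = rng.integers(0, K, size=n)

# Simulate the first model's training predictions with a 20% error rate.
pred_first = y.copy()
err_idx = rng.choice(n, size=60, replace=False)
pred_first[err_idx] = (pred_first[err_idx] + 1) % K

# SAMME-style reweighting: upweight samples the first model got wrong.
w = np.full(n, 1.0 / n)
miss = pred_first != y
err = w[miss].sum()
alpha = np.log((1 - err) / err) + np.log(K - 1)
w *= np.exp(alpha * miss)
w /= w.sum()

# The second (traditional) learner focuses on the hard samples.
Z = rng.normal(size=(n, 4))        # hypothetical single-sensor features
second = DecisionTreeClassifier(max_depth=3).fit(Z, y, sample_weight=w)
print(w[miss].min() > w[~miss].max())  # True: missed samples weigh more
```

The log(K - 1) term is what extends binary AdaBoost weighting to the multi-class case.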

3.5 Effective Domains of the Proposed Method

We hypothesize that, for our method to work, each feature in the learned representation space should be equivalent to an action or a low-level activity. Recognizing low-level activities from a single sensor is then easier. Given that physical activities can already be recognized well from a low-precision sensor, feature learning will not improve their accuracy compared with the traditional method. On the other hand, for complex activities, we can expect that the mapping from single-sensor features to the learned representation space will be effective, and that activity classification from the learned representation space will be better than classification from single-sensor features.

Given this rationale, we expect the proposed method to be mainly effective when dealing with complex activities, which can be decomposed into actions. For physical and low-level activities, the proposed method should keep performance equivalent to, or at least not worse than, traditional methods. We examine these assumptions in the following section.

4 Experimental Evaluation

In this section, we describe the experimental settings and results of the evaluation for the proposed approach. The evaluation compares the performance of the proposed method against a traditional pipeline using a single (and the same) sensor for both training and testing. Since our method uses clustering for learning the representation space, we first evaluate the performance of a classifier using these clusters as features. This becomes the highest expected performance of our method.

We expect the method to be more useful for complex activities as these activities have shown higher improvements when using multiple sensors. We also expect it to have a higher impact when the training sensors have higher quality than the testing sensors as they can learn a stronger representation.

Furthermore, we evaluate the sensitivity of the proposed approach to two parameters, namely the window size parameter and the number and type of sensors included for training.

In the following section, we describe the datasets used for the evaluation (Section 4.1), implementation and evaluation scenarios (Section 4.2) and summarize the results obtained in the performance evaluation (Section 4.3).

4.1 Datasets

To evaluate our method, we consider four publicly available datasets with several sensors in different placements. Some important aspects of the data are summarized in Table 1. We classify the activities as complex, gestures, and physical activities. Complex activities have longer durations and are composed of several different actions. Physical activities, considered simpler, involve the repetition of an action, possibly periodically. For example, walk involves the repetition of steps. Gestures are short and may or may not be repetitive; for example, take consists of a non-periodic movement, but cut may involve repetition of movement. Below, we briefly describe each dataset, summarize its key points, and describe the data pre-processing used.

Dataset (activity type) | № of sensors | № of subjects | № of classes | № of windows | Locations (sampling rate)
OPP HL [35] (complex activities) | 5 IMUs | 4 | 6 | | Upper and lower arms and back (30 Hz)
Cooking dataset [16] (complex activities) | 5 IMUs | 7 | 16 | 7105 | Lower legs, lower arms, and upper back (120 Hz)
PAMAP [34] (physical activities) | 3 IMUs | 9 | 12 | 28720 | Chest, dominant hand and ankle (100 Hz)
OPP Locomotion [35] (physical activities) | 5 IMUs | 4 | 5 | 21013 | Upper and lower arms and back (30 Hz)
Table 1: Datasets used for the evaluation

4.1.1 Cooking Task Dataset:

[16] This dataset recreates a meal-time routine consisting of the following major tasks: (i) prepare a soup, (ii) set the table, (iii) eat the meal, and (iv) clean up and put away utensils. The dataset is labeled with gestures, and we set the labels for these 4 activities by analyzing the activities performed and the script given in the paper. The dataset was collected in a laboratory setting, with some utensils replaced by physical props and some actions shortened. We use windows of 1 s with a 0.25 s step. In this evaluation, we used the accelerometer of the IMU sensors.

4.1.2 Opportunity Dataset

The Opportunity dataset [35] was recorded in a room simulating a studio and includes activities of a morning routine performed by 4 subjects. The activities were labeled at different levels: locomotion, gestures, and high-level activities. In our evaluation, we used the high-level activities (OPP HL), which include Relaxing, Coffee (prepare and drink), Sandwich (prepare and eat), Early-morning (check objects in the room), and Cleanup, and the locomotion activities (OPP Loc), which are Stand, Walk, Sit, Lie, and Unlabeled. In total, there are almost 6 h of recorded activity in this dataset plus an additional 2 h of unlabeled data (the 'Drill' run). We use this unlabeled data in the feature learning stage.

Although the dataset includes multiple on-body and object sensors, in this evaluation we used only the accelerometer of the IMU sensors. These sensors are placed only on the upper body; thus, the placements are not as diverse as in the Cooking dataset.

For the high-level activities, we used windows of 30 s with 15 s step. For the locomotion activities, we used 3-s windows with a 2-s step (1-s overlap).
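The window/step segmentation described above can be sketched as follows; the signal length and the 3 s/2 s locomotion setting at 30 Hz follow the text, while the zero-filled signal is a placeholder.

```python
import numpy as np

def sliding_windows(signal, win, step):
    """Split a (samples x axes) signal into fixed-size overlapping windows."""
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

# At 30 Hz, a 3 s window is 90 samples and a 2 s step is 60 samples.
signal = np.zeros((300, 3))            # 10 s of hypothetical 3-axis data
windows = sliding_windows(signal, win=90, step=60)
print(len(windows), windows[0].shape)  # 4 (90, 3)
```

Each window is then reduced to a feature vector before classification, so the step size controls how much training data overlapping windows contribute.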

4.1.3 PAMAP Dataset

The Physical Activity Monitoring Dataset [34] (PAMAP) is a benchmark dataset for physical activity monitoring. The activities are: lie, sit, stand, walk, run, cycle, Nordic walk, iron, vacuum clean, rope jump, and ascend and descend stairs. Although some subjects performed other activities during data collection, we did not include them because not all subjects performed them. This dataset contains approximately 8 h of recorded activities. We used windows of 5.12 s with a step of 1 s, as recommended in the original publication.

4.2 Implementation and Experimental Setup

We implemented the algorithm proposed in Section 3 in Python using the scikit-learn library. We used feature clustering to learn the feature space, with 3 clusters per sensor in all cases; that is, 15 clusters for datasets using 5 sensors and 9 clusters for PAMAP. We implemented the algorithm with and without boosting, using linear and logistic regression for the mapping function. In total, we evaluated 4 variants, depending on the mapping function and whether boosting was used: linear regression (LinR), logistic regression (LogR), linear with boosting (LinB) and logistic with boosting (LogB).

For each sensor axis, we extracted the following statistical features: mean, standard deviation, range (the difference between maximum and minimum), and the difference between the mean and the median. We used the same features for all datasets.
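This per-axis feature extraction can be sketched as follows; the example window contents are illustrative.

```python
import numpy as np

def window_features(window):
    """Per-axis statistics: mean, std, range, and mean-median difference."""
    w = np.asarray(window, dtype=float)
    return np.concatenate([
        w.mean(axis=0),
        w.std(axis=0),
        w.max(axis=0) - w.min(axis=0),          # range
        w.mean(axis=0) - np.median(w, axis=0),  # mean-median difference
    ])

window = np.array([[0.0, 1.0],
                   [2.0, 1.0],
                   [4.0, 4.0]])  # 3 samples, 2 axes
feats = window_features(window)
print(feats.shape)  # (8,): 2 axes x 4 features
```

For a 3-axis accelerometer this yields 12 features per sensor, which matches the low-dimensional single-sensor inputs the mapping step regresses from.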

As evaluation protocol, we used leave-one-subject-out cross-validation (user-independent models). This protocol evaluates the robustness of the classifier to new users. We chose it because our method is intended to transfer controlled experiments into real-life settings, where users will be unseen. In this protocol, we used data from all users except one for both training stages, i.e., feature learning and mapping learning.
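Leave-one-subject-out splitting can be expressed with scikit-learn's LeaveOneGroupOut; the data shapes and the 4-subject grouping here are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 8))            # hypothetical window features
y = rng.integers(0, 3, size=120)         # activity labels
subjects = np.repeat([0, 1, 2, 3], 30)   # 4 subjects, 30 windows each

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    # Every window of exactly one held-out subject forms the test fold.
    assert len(set(subjects[test_idx])) == 1
print(logo.get_n_splits(groups=subjects))  # 4
```

Grouping by subject rather than shuffling windows prevents windows from the same person leaking into both folds, which would overstate new-user performance.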

As evaluation metrics, we used the micro average F1-Score [47] to compare the performance across all activities of the proposed and traditional approach.

4.3 Results

We now present the results of the empirical evaluation. We first evaluate the hypothesis that states: using learned features from multiple sensors achieves better performance than traditional single-sensor features (Section 4.3.1). We then evaluate the effectiveness of the proposed method by assessing the performance when the single sensor is used as the input for feature mapping (Sections 4.3.2, 4.3.3, and 4.3.4). Our results show that the improvement is proportional to the difference between the multi-sensor performance and the traditional single-sensor approach. Furthermore, we go on to evaluate how the proposed method performs when using lower-quality sensors for testing (Section 4.3.5) and when including different types of sensors for learning the representation (Section 4.3.6). The results are positive and encouraging, showing that the proposed method can take advantage of these settings.

4.3.1 Measuring The Gap between Single-Sensor and Multi-Sensor Performance: Analysis of maximum expected improvement

We first evaluate the gap between the performance of activity recognition using a single sensor and the performance when learned features from all sensors are used (cluster features). The main objective of this evaluation is to measure the maximum possible improvement, as the expected performance of the proposed approach will be less than the performance when all sensors are also used in testing. As shown in Figure 5, there is a potential improvement of 34 percentage points in the F1-Score of activity recognition using the right leg sensor in the Cooking dataset (0.51 - 0.17), and a potential of only 0.09 when using the Left Lower Arm (LLA) sensor in the OPP Loc dataset (0.65 - 0.56). We also observe cases where the performance of the clustered features is lower than that of the individual sensors (PAMAP dataset).

Figure 5: Preliminary analysis of the maximum expected performance compared to traditional approach with single sensor. The performance obtained when multiple sensors are used for training and testing (cluster representation) is the upper limit for the performance expected with the proposed approach.

The previous evaluation confirms that in most cases, and especially for complex activities, a classifier using multiple sensors (with learned features) in both training and testing outperforms one using a single sensor in both phases. We now evaluate the performance of the proposed approach, which uses multiple sensors for training and a single sensor for testing. This performance depends on learning an effective mapping and, when the boosting approach is used, on leveraging both models.

4.3.2 Comparing the Proposed Approach against Traditional Single-Sensor on Complex Activities

As mentioned in Section 3.5, we expect the proposed approach will yield the largest improvements for complex activities because they can be decomposed into actions when the features are learned.

Figure 6 shows the performance of the proposed approach compared to the baseline of a traditional approach using a single sensor in the Cooking dataset. Note that we used the same features for the baseline and for learning the mapping in the proposed approach; the improvements therefore come from the better representation, learned using multiple sensors in the representation learning task. The observed F1-score improvements are 17 percentage points (pp) for the right leg sensor, 13 pp for the left leg, 9 pp for the right arm, 6 pp for the left arm, and 5 pp for the back sensor. These are significant improvements, showing that we can leverage multiple sensors when collecting training data for the models while still using a single sensor during actual use of an activity recognition application.

Figure 6: F1-Score of the proposed approach compared to the traditional approach using only one sensor as input in the Cooking dataset. Improvements of 17 and 13 percentage points are observed for the right and left legs, respectively; for the arms, improvements of 9 and 6 percentage points. The improvements are proportional to the large differences between traditional single- and multi-sensor activity recognition.

In the Opportunity-HL dataset (Figure 7), the proposed approach yields the largest benefit where the gap between single-sensor and multi-sensor performance in the traditional approach is largest (Figure 5). The RLA sensor shows an improvement of 6 percentage points (pp) in the F1-score over the traditional single-sensor approach. Other sensors show improvements of 3 pp (LLA and LUA), 2 pp (RUA), and 1 pp (BACK). This result is consistent with the smaller room for improvement observed in Figure 5.

Figure 7: F1-Score of the proposed approach compared to the traditional approach using only one sensor as input in the OPP-HL dataset. Improvements of 6 pp (RLA) and 3 pp (LLA and LUA) are observed. Again, these improvements are proportional to the differences between multi- and single-sensor performance in Figure 5.

4.3.3 Comparing the Proposed Approach against Traditional Single-Sensor on Physical Activities

Figure 8 shows the performance of the proposed approach compared to the baseline for datasets representing physical activities. In contrast to complex activities (Section 3.5), for physical activities (OPP-Loc and PAMAP datasets) the performance of multiple sensors is similar to or lower than that of the single sensor. Accordingly, the performance of the proposed approach is lower than that of the traditional approach.

Figure 8: F1-Score of the proposed approach compared to the traditional single-sensor approach for simple activities (OPP-Loc and PAMAP datasets). As expected, for simple activities there are little to no improvements, because such activities are not decomposed into smaller actions.

4.3.4 Performance of Boosting and no-Boosting Approaches

Figure 9 shows the results for all sensors using all four algorithms: linear regression (LinR), logistic regression (LogR), linear with boosting (LinB), and logistic with boosting (LogB). Although the proposed method with linear mapping and boosting has the best performance in most cases, in the OPP-Loc dataset the best performance is achieved without boosting (except for the RUA sensor). In the other cases, boosting improves the performance of the model by as much as 15 percentage points compared to no boosting (see the right leg sensor in the Cooking dataset).
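As a minimal illustration of the linear-mapping (LinR) variant, one can regress the learned representation from the single-sensor features and classify in the mapped space. The sketch below uses synthetic stand-in data and our own variable names, since the paper's actual pipeline and feature dimensions are not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_single = rng.normal(size=(200, 12))  # single-sensor features (training)
Z_multi = rng.normal(size=(200, 6))    # learned multi-sensor representation
y = rng.integers(0, 4, size=200)       # activity labels

# LinR: learn the map single-sensor features -> shared representation,
# then train the classifier in the mapped space.
mapper = LinearRegression().fit(X_single, Z_multi)
clf = SVC().fit(mapper.predict(X_single), y)

# At test time, only the single sensor is available.
X_test = rng.normal(size=(5, 12))
pred = clf.predict(mapper.predict(X_test))
print(pred.shape)  # (5,)
```

The logistic variant (LogR) would replace the regression step with a logistic mapping; the boosted variants further combine this model with the traditional single-sensor one.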

Figure 9: F1-Score for the four algorithms implemented. The best score is shown on a dark green background, showing that for complex and middle-level activities the proposed methods always perform best. Linear mapping generally performs better than logistic mapping. In most cases, boosting further improves the performance.

4.3.5 Performance when Testing Sensor has a Lower Quality than Training Sensors

The OPP dataset includes accelerometer sensors of two different quality levels. As specified in the dataset description, the accelerometer sensors can suffer more data loss than the IMU sensors, so the former are used in our evaluation as "lower-quality" sensors. This setting directly reflects the motivation of this work: the gap between laboratory and real-life settings. Sensors in the laboratory are commonly of higher quality and better controlled than those of end users.

For this setting, we used the IMU sensors for learning the representation, and the lower-quality acceleration sensors for learning the mapping and classification function. Figure 10 shows the performance of the proposed approach in this setting. Note that the performance of the traditional single-sensor approach is lower than when high-quality sensors are used (Figure 9). Nevertheless, the proposed approach yields visible improvements over the traditional approach: 12 percentage points for the BACK sensor, 13 for the HIP sensor, 8 for the RWR sensor, and 5 for the LWR sensor. The HIP sensor case is interesting because the dataset used to learn the representation contains no HIP sensor; this means that positive transfer can occur even at new sensor positions.

Figure 10: F1-Score when using lower-quality sensors for testing. Results in the OPP-HL dataset suggest that the method can be used with sensors of lower quality than those used in learning the representation. The improvements reach 13 and 12 percentage points for the HIP and BACK sensors, respectively.

This result shows that the proposed approach can be used to learn models that use different sensors than those used in the laboratory.

4.3.6 Using Different Types of Sensors for Feature Learning

We noticed that including different types of sensors during feature learning improves the performance of the proposed approach. To this end, we included not only the accelerometer measurements, as before, but also the gyroscope and magnetometer measurements of the Inertial Measurement Units during the feature learning step.

For this experiment, we used the OPP-HL dataset and the low-quality sensors for testing. We did not use the boosting approach, so as to isolate the impact of including additional information during training. The F1-scores for each accelerometer are shown in Figure 11. The HIP sensor shows a consistent improvement as more sensor types are included for training; the improvement over the traditional method is almost 10 percentage points. The BACK sensor improves when the gyroscopes are included, but when the magnetometer is also included, the performance equals that of training with only the accelerometers. For the LWR and RWR sensors, performance decreases when the gyroscope measurements are included but increases again when the magnetometer is added. Interestingly, for the RWR sensor, using all three sensor types lets the proposed method outperform the traditional method, which it had previously done only in the evaluation of Figure 10. In contrast, for the RUA sensor, performance decreases as more sensor types are used; the same phenomenon is observed for the LUA sensor.

Figure 11: F1-Score as more types of sensors are used for learning features. Including different types of sensors for learning the representation space can further increase the improvements, notably for the HIP (+10 pp) and RWR (+6 pp) sensors.

This result shows that the improvement depends not only on the type of information included for learning the representation space, but also on the testing sensor and its placement.

5 Discussion

This study was designed to determine whether we can use additional information sources to achieve activity recognition with a single sensor. Our results show that the proposed method can improve the performance of single-sensor activity recognition, notably for complex activities. In this section, we analyze these results and their implications.

5.1 Success Factors

Our results show that using multiple sensors for learning a common representation space can improve the performance of single-sensor activity classification models. The main question this research raises is: how does the additional information help the model? For example, if the end user wears only a watch, how does information from the leg help the classification?

As we showed in the method section, a single position cannot differentiate among multiple activities; rather, each sensor is better at discerning different pairs of activities. In the learned representation, the combined knowledge can help discern among more activities. Our intuition is that there is an underlying representation of the activity, similar to the shared representation of multiple related tasks [2]. Even if the dimensions of the training and testing data differ, both sets are related because they are generated by the same activities, and this relation can enable the positive transfer of knowledge [30]. The shared representation can be understood as the actions of which the activities are composed [17, 22, 6]. It has been shown that manually designed action primitives, such as cut or reach, help to recognize complex activities composed of many such primitives [24]. However, designing useful primitives for each domain and obtaining labeled data for learning them can be difficult and time-consuming. Therefore, our proposal first learns such primitives from the multi-sensor data.
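One simple way to realize such learned primitives via feature clustering, which the paper adopts, is to cluster multi-sensor feature windows and use the distances to the centroids as the shared representation. The data, cluster count, and names below are illustrative, not the paper's actual configuration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# One row per time window; columns concatenate features from all
# training sensors.
X_all_sensors = rng.normal(size=(300, 30))

# Cluster the windows; each cluster plays the role of a learned
# "primitive" (an action-like pattern). Distances to the k centroids
# then form a low-dimensional shared representation.
k = 8
km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X_all_sensors)
Z = km.transform(X_all_sensors)  # shape (300, k)
print(Z.shape)
```

Keeping k small matters: the mapping function from a single sensor into this space must remain learnable, as the paper notes later.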

The success of our proposed approach depends on two conditions: (1) finding an effective representation and (2) minimizing the mapping error. As in the case of the PAMAP dataset, when the representation is not effective, i.e., its performance is lower than that of a single sensor, the proposed approach is not useful. However, when the representation is effective, the improvements can reach 17 percentage points, as in the Cooking dataset (Figure 6). More interestingly, the OPP-HL case with lower-quality testing sensors (Figure 10) demonstrated that the proposed approach can improve the performance of activity recognition systems in realistic conditions by training with high-quality data.

The mapping error depends on the test sensor position and its relation to the learned representation, as in any regression problem. We hypothesized that learning such a representation is "easier" than learning to recognize complex activities, because each feature in the learned representation can be thought of as an "action" or a less complex activity. This is supported by our results: for the complex activities (the Cooking and OPP-HL datasets) we observe improvements, but for the simpler activities (the OPP-Loc and PAMAP datasets) we do not observe significant improvements. This shows that the proposed method is more useful for recognizing complex activities.

5.2 Combining with the Traditional Approach

As the mapping error is not zero, we proposed to use boosting to combine the proposed approach and the traditional activity recognition approach. Boosting obtained the best performance and improved over no boosting in most cases (Figure 9), indicating that boosting can reduce the effect of mapping errors. We also observed that boosting can improve the score even beyond that of classification using the learned features. For example, in the PAMAP dataset, the cluster-feature classification F1-score was 0.42, but the boosted approach for the chest sensor achieved 0.47. This is because boosting uses both the original sensor features and the shared representation space, and can thus recognize activities that were already well classified with the single sensor.
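A hedged sketch of this combination: boost over the concatenation of the raw sensor features and the mapped representation, so weak learners can draw on whichever view best separates a given class. The data is synthetic and the exact weak learners and weighting scheme in the paper (which uses multi-class AdaBoost [12]) may differ:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(2)
X_sensor = rng.normal(size=(200, 12))  # original single-sensor features
Z_mapped = rng.normal(size=(200, 6))   # features mapped into shared space
y = rng.integers(0, 4, size=200)       # activity labels

# Concatenate both views and boost over the combined feature matrix.
X_both = np.hstack([X_sensor, Z_mapped])
booster = AdaBoostClassifier(n_estimators=50, random_state=2).fit(X_both, y)
print(booster.score(X_both, y))
```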

5.3 Implications of the Study

The results we have presented raise the possibility that, given good additional information, the proposed method can leverage it to improve the performance of activity recognition. The additional information can come from additional sensors (Sections 4.3.2 and 4.3.4), from sensors of higher quality (Section 4.3.5), or from sensors of different types (Section 4.3.6). These findings raise the question of how to find and choose good additional information, an important issue for future research. Our results suggest that more additional sensors are better, especially when placed at different body positions (upper and lower body).

There are two main directions for future research on improving this method: (1) evaluating different classifiers and (2) evaluating different feature learning methods. In this study, we used only SVM as a classifier, to focus on the influence of the proposed approach. Based on our results, we expect that a classifier that performs better with the traditional approach will also perform better with the proposed approach. Since the proposed method is general and can be used with any classification algorithm, further research should investigate the effect of different classifiers. Similarly, we used feature clustering as the feature learning method but, as discussed in the related work, several other feature learning methods could be used. Since the improvements depend on the performance of these features, better feature learning approaches should yield larger improvements. It is important, however, to keep the dimensionality of the representation space low so that the mapping function can be learned. A further study focusing on feature learning is suggested.
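Because the mapped features form an ordinary feature matrix, swapping classifiers is mechanical. The sketch below compares two candidates on synthetic stand-in data; the classifier choices and scores are ours, not results from the paper:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 10))   # mapped features for one test sensor
y = rng.integers(0, 3, size=150) # activity labels

# The study fixes SVC; the method itself is classifier-agnostic, so any
# estimator with fit/predict can be dropped in and compared by macro F1.
results = {}
for clf in (SVC(), RandomForestClassifier(n_estimators=50, random_state=3)):
    scores = cross_val_score(clf, X, y, cv=3, scoring="f1_macro")
    results[type(clf).__name__] = scores.mean()
print(results)
```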

6 Conclusions

In this paper, we have proposed and evaluated a new activity recognition approach that takes advantage of additional information collected during experimentation or during short periods in a real-life setting. Our main motivation is the observation that, although it is possible to use multiple high-precision sensors during short time intervals, it is not always possible to use them continuously in real life. Because multi-dimensional information can better characterize an activity, our approach uses this knowledge to recognize different activities more easily.

Our proposed approach is based on feature learning using multi-dimensional, high-precision information. We hypothesize that this new representation space encapsulates knowledge about motions such as actions or other simple gestures. By mapping single-sensor features to such a representation space, we can effectively model activities. While our results show modest improvements, they imply that the performance (as measured by the F1-score) of an accelerometer-based activity recognition model can be improved through the use of additional information for training, notably for complex and middle-level activities.

For future work, we will focus on evaluating the proposed approach using laboratory data as training data for real-life settings, to study whether the knowledge obtained in experimental settings can be successfully transferred.

The proposed approach has practical applications in fields other than activity recognition. The condition of having access to specialized sensors only for a short time also occurs in other domains involving behavior studies, where cameras can be used to record a small sample.


  • [1] S. Abbate, M. Avvenuti, F. Bonatesta, G. Cola, P. Corsini, and A. Vecchio (2012) A smartphone-based fall detection system. Pervasive and Mobile Computing 8 (6), pp. 883 – 899. Note: Special Issue on Pervasive Healthcare External Links: ISSN 1574-1192, Document, Link Cited by: §1.
  • [2] A. Argyriou, T. Evgeniou, and M. Pontil (2006) Multi-task feature learning. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS’06, Cambridge, MA, USA, pp. 41–48. Cited by: §1, §2, §5.1.
  • [3] L. Bao and S. S. Intille (2004) Activity recognition from user-annotated acceleration data. In Pervasive Computing, A. Ferscha and F. Mattern (Eds.), Berlin, Heidelberg, pp. 1–17. External Links: ISBN 978-3-540-24646-6 Cited by: §1.
  • [4] S. Bhattacharya, P. Nurmi, N. Hammerla, and T. Plötz (2014) Using unlabeled data in a sparse-coding framework for human activity recognition. Pervasive and Mobile Computing 15, pp. 242–262. External Links: Document, ISBN 1574-1192, ISSN 15741192 Cited by: §2, §2.
  • [5] U. Blanke and B. Schiele (2010-10) Remember and transfer what you have learned - recognizing composite activities based on activity spotting. In International Symposium on Wearable Computers (ISWC) 2010, Seoul, South Korea, pp. 1–8. External Links: ISBN 978-1-4244-9046-2, ISSN 15504816 Cited by: §2.
  • [6] A. A. Chaaraoui, P. Climent-Pérez, and F. Flórez-Revuelta (2012) A review on vision techniques applied to Human Behaviour Analysis for Ambient-Assisted Living. Expert Systems with Applications 39 (12), pp. 10873–10888. External Links: ISSN 09574174 Cited by: §5.1.
  • [7] L. Chen, J. Hoey, C. D. Nugent, D. J. Cook, and Z. Yu (2012-11) Sensor-based activity recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42 (6), pp. 790–808. External Links: Document, ISSN 1094-6977 Cited by: §1.
  • [8] H. Cheng, M. Griss, P. Davis, J. Li, and D. You (2013) Towards zero-shot learning for human activity recognition using semantic attribute sequence model. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’13, New York, NY, USA, pp. 355–358. External Links: ISBN 9781450317702, Link, Document Cited by: §2.
  • [9] D. Cook, K. D. Feuz, and N. C. Krishnan (2013-09-01) Transfer learning for activity recognition: a survey. Knowledge and Information Systems 36 (3), pp. 537–556. External Links: ISSN 0219-3116, Document Cited by: §2, §2.
  • [10] J. Favela (2013-07) Behavior-aware computing: applications and challenges. IEEE Pervasive Computing 12 (3), pp. 14–17. External Links: Document, ISSN 1536-1268 Cited by: §1.
  • [11] D. Guan, W. Yuan, Y. Lee, A. Gavrilov, and S. Lee (2007-08) Activity recognition based on semi-supervised learning. In 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007), Daegu, pp. 469–475. External Links: Document, ISSN 2325-1301 Cited by: §2.
  • [12] T. Hastie, S. Rosset, J. Zhu, and H. Zou (2009) Multi-class adaboost. Statistics and its Interface 2 (3), pp. 349–360. Cited by: §3.4.
  • [13] J. Hoey, C. Boutilier, P. Poupart, P. Olivier, A. Monk, and A. Mihailidis (2012) People, sensors, decisions: Customizable and Adaptive Technologies for Assistance in Healthcare. ACM Transactions on Interactive Intelligent Systems 2 (4), pp. 1–36. External Links: Document, ISBN 21606455, ISSN 21606455 Cited by: §1.
  • [14] D. H. Hu, V. W. Zheng, and Q. Yang (2011) Cross-domain activity recognition via transfer learning. Pervasive and Mobile Computing 7 (3), pp. 344–358. External Links: ISSN 15741192 Cited by: §2.
  • [15] A. Jain and V. Kanhangad (2018) Human Activity Classification in Smartphones Using Accelerometer and Gyroscope Sensors. IEEE Sensors Journal 18 (3), pp. 1169–1177. External Links: ISBN 9781509000821, ISSN 1530437X Cited by: §1.
  • [16] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste (2014-11) Computational state space models for activity and intention recognition. a feasibility study. PLOS ONE 9 (11), pp. 1–24. External Links: Document Cited by: §4.1.1, Table 1.
  • [17] P. Lago, C. Jiménez-Guarín, and C. Roncancio (2017-04) Contextualized behavior patterns for change reasoning in Ambient Assisted Living: A formal model. Expert Systems 34 (2), pp. e12163. External Links: ISSN 02664720 Cited by: §5.1.
  • [18] C. H. Lampert, H. Nickisch, and S. Harmeling (2009-06) Learning to detect unseen object classes by between-class attribute transfer. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, pp. 951–958. External Links: ISSN 1063-6919 Cited by: §2.
  • [19] O. D. Lara and M. A. Labrador (2013) A Survey on Human Activity Recognition using Wearable Sensors. IEEE Communications Surveys & Tutorials 15 (3), pp. 1192–1209. External Links: ISBN 1553-877x, ISSN 1553-877X Cited by: §1.
  • [20] J. Lester, T. Choudhury, and G. Borriello (2006) A practical approach to recognizing physical activities. In Pervasive Computing, K. P. Fishkin, B. Schiele, P. Nixon, and A. Quigley (Eds.), Berlin, Heidelberg, pp. 1–16. External Links: ISBN 978-3-540-33895-6 Cited by: §1.
  • [21] J. Liu, B. Kuipers, and S. Savarese (2011) Recognizing human actions by attributes. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’11, Washington, DC, USA, pp. 3337–3344. External Links: ISBN 978-1-4577-0394-2, Link, Document Cited by: §2.
  • [22] Y. Liu, L. Nie, L. Liu, and D. S. Rosenblum (2016-03) From action to activity: Sensor-based activity recognition. Neurocomputing 181, pp. 108–115. External Links: ISBN 0925-2312, ISSN 09252312 Cited by: §5.1.
  • [23] M. Mahdaviani and T. Choudhury (2007) Fast and scalable training of semi-supervised crfs with application to activity recognition. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, USA, pp. 977–984. External Links: ISBN 978-1-60560-352-0, Link Cited by: §2.
  • [24] A. Manzoor, H. Truong, A. Calatroni, D. Roggen, M. Bouroche, S. Clarke, V. Cahill, G. Tröster, and S. Dustdar (2013-09) Analyzing the impact of different action primitives in designing high-level human activity recognition systems. J. Ambient Intell. Smart Environ. 5 (5), pp. 443–461. External Links: ISSN 1876-1364, Link Cited by: §5.1.
  • [25] R. Matsushige, K. Kakusho, and T. Okadome (2015-10) Semi-supervised learning based activity recognition from sensor data. In 2015 IEEE 4th Global Conference on Consumer Electronics (GCCE), Osaka, Japan, pp. 106–107. External Links: ISSN Cited by: §2.
  • [26] A. Natarajan, G. Angarita, E. Gaiser, R. Malison, D. Ganesan, and B. M. Marlin (2016) Domain adaptation methods for improving lab-to-field generalization of cocaine detection using wearable ecg. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’16, New York, NY, USA, pp. 875–885. External Links: ISBN 9781450344616, Link, Document Cited by: §2.
  • [27] L. T. Nguyen, M. Zeng, P. Tague, and J. Zhang (2015) I did not smoke 100 cigarettes today!: avoiding false positives in real-world activity recognition. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’15, New York, NY, USA, pp. 1053–1063. External Links: ISBN 978-1-4503-3574-4, Link, Document Cited by: §2.
  • [28] L. T. Nguyen, M. Zeng, P. Tague, and J. Zhang (2015) Recognizing new activities with limited training data. In Proceedings of the 2015 ACM International Symposium on Wearable Computers, ISWC ’15, New York, NY, USA, pp. 67–74. External Links: ISBN 978-1-4503-3578-2, Link, Document Cited by: §2.
  • [29] S. J. Pan, J. T. Kwok, and Q. Yang (2008) Transfer learning via dimensionality reduction. Proceedings of the National Conference on Artificial Intelligence 2, pp. 677–682. External Links: ISBN 9781577353683 Cited by: §1, §2.
  • [30] S. J. Pan and Q. Yang (2010) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345–1359. External Links: Document, ISSN 1041-4347 Cited by: §2, §2, §5.1.
  • [31] T. Plötz, N. Y. Hammerla, and P. Olivier (2011) Feature Learning for Activity Recognition in Ubiquitous Computing. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, Barcelona, Catalonia, Spain, pp. 1729–1734. External Links: Document, ISBN 978-1-57735-514-4, ISSN 10450823 Cited by: §2.
  • [32] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng (2007) Self-taught learning. In Proceedings of the 24th international conference on Machine learning - ICML ’07, New York, New York, USA, pp. 759–766. External Links: Document, ISBN 9781595937933, Link Cited by: §1, §2.
  • [33] P. Rashidi and D. J. Cook (2011-06) Activity knowledge transfer in smart environments. Pervasive and Mobile Computing 7 (3), pp. 331–343. External Links: ISSN 15741192 Cited by: §2.
  • [34] A. Reiss and D. Stricker (2012-06) Introducing a new benchmarked dataset for activity monitoring. In 2012 16th International Symposium on Wearable Computers, Newcastle, UK, pp. 108–109. External Links: ISSN 2376-8541 Cited by: §4.1.3, Table 1.
  • [35] D. Roggen, A. Calatroni, M. Rossi, T. Holleczek, K. Förster, G. Tröster, P. Lukowicz, D. Bannach, G. Pirkl, A. Ferscha, J. Doppler, C. Holzmann, M. Kurz, G. Holl, R. Chavarriaga, H. Sagha, H. Bayati, M. Creatura, and J. d. R. Millàn (2010-06) Collecting complex activity datasets in highly rich networked sensor environments. In 2010 Seventh International Conference on Networked Sensing Systems (INSS), Kassel, Germany, pp. 233–240. Cited by: §4.1.2, Table 1.
  • [36] D. Roggen, K. Förster, A. Calatroni, and G. Tröster (2013) The adARC pattern analysis architecture for adaptive human activity recognition systems. Journal of Ambient Intelligence and Humanized Computing 4 (2), pp. 169–186. External Links: ISBN 1868-5137 1868-5145, ISSN 18685137 Cited by: §2, §2.
  • [37] A. Saeed, T. Ozcelebi, and J. Lukkien (2019) Multi-task Self-Supervised Learning for Human Activity Detection. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3 (2), pp. 1–30. Cited by: §1, §2, §2.
  • [38] S. Samarah, M. G. A. Zamil, M. Rawashdeh, M. S. Hossain, G. Muhammad, and A. Alamri (2018-12) Transferring activity recognition models in FOG computing architecture. Journal of Parallel and Distributed Computing 122, pp. 122–130. External Links: ISSN 07437315 Cited by: §2.
  • [39] K. Shirahama, L. Köping, and M. Grzegorzek (2016) Codebook approach for sensor-based human activity recognition. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, UbiComp ’16, New York, NY, USA, pp. 197–200. External Links: ISBN 9781450344623, Link, Document Cited by: §2.
  • [40] M. Stikic, D. Larlus, S. Ebert, and B. Schiele (2011-12) Weakly supervised recognition of daily life activities with wearable sensors. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (12), pp. 2521–2537. External Links: Document, ISSN 0162-8828 Cited by: §2.
  • [41] M. Stikic, D. Larlus, and B. Schiele (2009-Sept) Multi-graph based semi-supervised learning for activity recognition. In 2009 International Symposium on Wearable Computers, Vol. , Linz, Austria, pp. 85–92. External Links: Document, ISSN 1550-4816 Cited by: §2.
  • [42] T. L. M. van Kasteren, G. Englebienne, and B. J. A. Kröse (2010) Transferring knowledge of activity recognition across sensor networks. In Proceedings of the 8th International Conference on Pervasive Computing, Pervasive’10, Berlin, Heidelberg, pp. 283–300. External Links: ISBN 3-642-12653-7, 978-3-642-12653-6 Cited by: §2, §2.
  • [43] V. Vapnik (2015) Learning Using Privileged Information : Similarity Control and Knowledge Transfer. Journal of Machine Learning Research 16, pp. 2023–2049. External Links: ISBN 1532-4435, ISSN 15337928 Cited by: §2.
  • [44] J. Wen and Z. Wang (2017) Learning general model for activity recognition with limited labelled data. Expert Systems with Applications 74, pp. 19 – 28. External Links: ISSN 0957-4174, Document, Link Cited by: §2.
  • [45] T. Xu, F. Zhu, E. K. Wong, and Y. Fang (2016) Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition. Image and Vision Computing 55, pp. 127–137. External Links: ISBN 0262-8856, ISSN 02628856 Cited by: §2.
  • [46] X. Xu, X. Zhou, R. Venkatesan, G. Swaminathan, and O. Majumder (2019-06) D-sne: domain adaptation using stochastic neighborhood embedding. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 2492–2501. External Links: Document, ISSN 1063-6919 Cited by: §1, §2.
  • [47] Y. Yang (1999) An evaluation of statistical approaches to text categorization. Information retrieval 1 (1-2), pp. 69–90. Cited by: §4.2.