Deep HMResNet Model for Human Activity-Aware Robotic Systems

09/20/2018 · by Hazem Abdelkawy, et al.

Endowing robotic systems with cognitive capabilities for recognizing the daily activities of humans is an important challenge that requires sophisticated and novel approaches. Most of the proposed approaches explore pattern recognition techniques based on hand-crafted or learned features. In this paper, a novel Hierarchical Multichannel Deep Residual Network (HMResNet) model is proposed for robotic systems to recognize daily human activities in ambient environments. The introduced model is comprised of multilevel fusion layers. The proposed Multichannel 1D Deep Residual Network model is, at the features level, combined with a bottleneck MLP neural network to automatically extract robust features regardless of the hardware configuration and, at the decision level, fully connected with an MLP neural network to recognize daily human activities. Empirical experiments on real-world datasets and an online demonstration are used to validate the proposed model. The results demonstrate that the proposed model outperforms the baseline models in daily human activity recognition.


Introduction

There is a growing consensus about the need to add cognitive capabilities to the connected things and robots produced today, in order to provide added-value assistive services for dependent people. These services aim to improve their quality of life and their physical and mental well-being, and to guarantee their safety [Sebbak et al.2013]. In the context of ambient assisted living (AAL), daily human activity recognition (DHAR) is one of the cognitive capabilities that must be present in any robotic system. In this context, several studies based on a new generation of data-driven approaches, namely deep learning models, have recently been proposed in the literature [Plötz, Hammerla, and Olivier2011, Yang et al.2015, Ronao and Cho2016].

The development of robotic systems capable of daily human activity recognition requires sophisticated and novel approaches. To enable efficient recognition of daily activities in a dynamic environment, exhaustive activity sensing combined with fusion techniques is required. Building efficient cognitive models for robotic systems also requires a suitable architecture allowing the integration of heterogeneous sensors, objects, and robots.

In this paper, a novel Hierarchical Multichannel Deep Residual Network (HMResNet) model is proposed for robotic systems to recognize daily human activities in ambient environments. The introduced model is comprised of multilevel fusion layers combined with residual shortcut connections. The proposed Multichannel 1D Deep Residual Network model is, at the features level, combined with a bottleneck MLP neural network to automatically extract features and, at the decision level, fully connected with an MLP neural network to recognize daily human activities. The hierarchical architecture of the proposed model extracts more complex features than traditional plain deep learning models and facilitates the training of sparser and deeper networks than those used in the literature.

The paper is organized as follows: First, we review related work on daily human activity recognition in the robotics field. Then, we describe the proposed Hierarchical Multichannel Deep Residual Network (HMResNet) model for daily human activity recognition. We evaluate the performance of the proposed approach through extensive experiments on real-world datasets, as well as a scenario of cognitive daily exercise coaching. The paper concludes with a short review of the proposed model and a summary of ongoing work.

Related Work

Within the objective of developing added-value services to assist dependent people in their daily activities, identifying these activities has received much attention in recent years [Kashiwabara et al.2012]. One of the most challenging tasks in human activity recognition is the extraction of salient features from raw inertial data. Most existing works are based on hand-crafted engineered features, also known as shallow features. The most commonly used features for human activity recognition are hand-engineered features [Anguita et al.2012]; time-domain features such as mean, median, variance, skewness, kurtosis, and range [Attal et al.2015]; and frequency-domain features such as the temporal fast Fourier transform (tFFT) [Sharma, Lee, and Chung2008], the Discrete Fourier Transform (DFT), and the Power Spectral Density (PSD) [Attal et al.2015]. To identify different daily human activities, classification approaches such as Hidden Markov Models [Lee and Cho2011], Artificial Neural Networks [Mantyjarvi, Himberg, and Seppanen2001], Support Vector Machines (SVM) [He and Jin2009], and naive Bayes classifiers [Yang, Dinh, and Chen2010] rely on these features.

Traditional machine learning approaches have made significant progress over the last decade. However, these methods rely on heuristic, hand-engineered features. Moreover, the larger the amount of input data, the less able these approaches are to derive relevant and consistent features. In addition, the majority of these approaches learn from static data, whereas recognizing human activities in real life relies on data streamed from heterogeneous sensors, which requires durable online and incremental learning.

In recent years, deep learning models have developed rapidly to compensate for the drawbacks of traditional data-driven approaches. One of the first attempts to recognize basic human activities with a deep learning model was proposed in [Plötz, Hammerla, and Olivier2011]: Restricted Boltzmann Machines (RBM) are used to extract features from accelerometer raw data. In [Duffner et al.2014], a convolutional neural network (ConvNet) model is proposed for recognizing basic gestures from accelerometer and gyroscope raw data; this ConvNet outperformed other state-of-the-art models in gesture recognition such as Dynamic Time Warping (DTW) and Hidden Markov Models (HMM). A hierarchical ConvNet model is proposed in [Yang et al.2015] and benchmarked against several daily-activity datasets to show its performance compared with other state-of-the-art baseline models. In [Ronao and Cho2016], a ConvNet model is proposed to recognize human activities using smartphone sensors.

Compared to traditional machine learning approaches, deep learning approaches rely on learned features that are extracted automatically from raw data. These features are more relevant and complex than the hand-engineered features used with traditional machine learning approaches. Moreover, the structure of deep learning models allows online and incremental learning. Despite these advantages, and in contrast to the tremendous development in computer vision and natural language processing, only a few attempts [Zheng et al.2016, Yang et al.2015, Lee, Yoon, and Cho2017] have tried to exploit hierarchical deep learning models to classify 1D time series, which is the cornerstone of HAR.

HMResNet Deep Learning Model

In this paper, to recognize daily human activities in a dynamic environment, a new deep learning architecture based on a Hierarchical Multichannel Deep Residual Network (HMResNet) is proposed. Compared to state-of-the-art deep learning models [Zheng et al.2016, Ronao and Cho2016, Yang et al.2015, Lee, Yoon, and Cho2017], the proposed model is based on multilevel fusion layers with residual shortcut connections and operates on multichannel raw data. At the feature-level fusion, a Multichannel 1D Deep Residual Network combined with a bottleneck MLP neural network is proposed, for each sensor channel, to automatically extract features from raw data. At the decision-level fusion, a multi-sensor fusion layer based on a deep 1D ResNet followed by a fully connected MLP neural network is exploited to recognize daily human activities. Indeed, the hierarchical architecture of the proposed model extracts more relevant and complex features than traditional plain deep learning models and facilitates the training of sparser and deeper networks than those previously used in the literature.

Raw Data Preprocessing

This step consists of data filtering, segmentation, and missing-value replacement. In this paper, Inertial Measurement Unit (IMU) sensors are used to measure linear and angular motion from accelerometer and gyroscope raw data. Noise filters are used to preprocess the sensors' raw data (gyroscope and accelerometer). A sliding-window algorithm is applied to segment the separate input signals into fixed-size windows. Finally, the preprocessed data are transferred to the feature extraction/fusion layers as a vector of sliding windows containing the preprocessed accelerometer and gyroscope components, as shown in Fig. 1.
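As a minimal sketch of this segmentation step (assuming NumPy arrays shaped time × channels; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def sliding_windows(signal, window_size, overlap):
    """Segment a (n_samples, n_channels) signal into fixed-size windows.

    window_size : samples per window (e.g. 128 for the smartphone dataset)
    overlap     : fraction of overlap between consecutive windows (e.g. 0.5)
    """
    step = max(1, int(window_size * (1.0 - overlap)))
    windows = [signal[start:start + window_size]
               for start in range(0, len(signal) - window_size + 1, step)]
    return np.stack(windows)  # shape: (n_windows, window_size, n_channels)

# Example: a 50 Hz tri-axial accelerometer stream, 128-sample windows, 50% overlap
acc = np.random.randn(1000, 3)
print(sliding_windows(acc, 128, 0.5).shape)  # (14, 128, 3)
```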

Feature Level Fusion

At the feature-level fusion, a 1D deep ResNet is exploited to extract features automatically from the preprocessed raw data, followed by a bottleneck MLP neural network that forms the sensor-level feature fusion layer, as shown in Fig. 1.

Deep Residual Network (ResNet)

The Deep Residual Network (ResNet), developed by Microsoft Research, was introduced in [He et al.2016] for image recognition. ResNet took first place in the five main tracks of the COCO and ImageNet competitions, covering object recognition, image classification, and semantic segmentation. Hence, many studies have started to evaluate ResNet in other fields, such as speech recognition [Xiong et al.2017] and question answering systems [De Vries et al.2017]. However, to our knowledge, only a single attempt [Wang, Yan, and Oates2017] has used ResNet for time series classification.

Figure 1: Daily human activity recognition based on the Hierarchical Multichannel Deep Residual Network model for robotic systems exploiting IMUs
Figure 2: Multilayer Convolution Feature Extractor Unit (MCFEU)
Figure 3: Deep Residual Network Based on Stacked MCFEU units

In the proposed model, the basic block of the ResNet is a 1D convolutional layer followed by Batch Normalization [Ioffe and Szegedy2015] and Rectified Linear Unit (ReLU) layers. The ReLU activation function is used to avoid the vanishing gradient problem, and Batch Normalization (BN) is applied to speed up model convergence and improve generalization. Inspired by the ResNet152 deep model [He et al.2016], a plain network of three basic blocks is developed, with different 1D kernel sizes, no strides, and 32, 64, and 64 feature maps, respectively, to create the Multilayer Convolution Feature Extractor Unit (MCFEU), as shown in Fig. 2. Both the kernel sizes and the number of feature maps were chosen based on empirical experiments conducted on different datasets [Ronao and Cho2016, Attal et al.2015]. The MCFEU is exploited to extract multilevel features from the preprocessed time-series raw data. The complete deep ResNet model is built by stacking multiple MCFEU units and adding residual shortcut connections between them, as shown in Fig. 3. The shortcut connections ensure that every MCFEU unit learns more meaningful features and alleviate the vanishing gradient problem in deep networks [He et al.2016]. Before feeding the extracted features to the decision-level fusion layer, a Global Average Pooling layer [Lin, Chen, and Yan2013] is used to mitigate overfitting by reducing the total number of learned parameters.
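A minimal Keras sketch of the MCFEU and the stacked residual network follows; the framework choice and the per-block kernel sizes are assumptions, since the paper specifies only the feature-map counts (32, 64, 64), stride-free convolutions, and the residual shortcuts between units:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size):
    # Basic block: 1D convolution -> Batch Normalization -> ReLU.
    x = layers.Conv1D(filters, kernel_size, strides=1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

def mcfeu(x, kernel_sizes=(8, 5, 3)):
    # Multilayer Convolution Feature Extractor Unit: three basic blocks
    # with 32, 64, and 64 feature maps. The kernel sizes here are
    # assumptions; the paper states only that they differ per block.
    x = conv_bn_relu(x, 32, kernel_sizes[0])
    x = conv_bn_relu(x, 64, kernel_sizes[1])
    return conv_bn_relu(x, 64, kernel_sizes[2])

def channel_resnet(input_shape, n_units=3):
    # Stack MCFEU units with identity residual shortcuts between them
    # (shapes match: stride 1, 'same' padding, 64 output maps), then
    # reduce the parameter count with Global Average Pooling.
    inp = layers.Input(shape=input_shape)  # e.g. (window_size, 1) per channel
    x = mcfeu(inp)
    for _ in range(n_units - 1):
        x = layers.Activation("relu")(layers.add([mcfeu(x), x]))
    out = layers.GlobalAveragePooling1D()(x)
    return tf.keras.Model(inp, out)
```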

MLP Neural Network for Sensor Level Fusion

In this paper, for every single IMU sensor, a bottleneck MLP neural network is exploited as a fusion layer for the sensor's channels, as shown in Fig. 1. The bottleneck MLP neural network acts as a nonlinear dimension reduction module, which extracts low-dimensional features from the integrated deep ResNet output features. Finally, the outputs of all bottleneck MLP neural networks are integrated into a single feature vector, which is fed to the decision-level fusion layer; see Algorithm 1 and the sketch that follows it. In this work, the bottleneck MLP neural network is comprised of:

  • Two fully connected hidden layers, each consisting of 1000 nodes.

    • Each layer is denoted by L^(l), where l is the layer index.

    • Each node is denoted by n_j^(l), where l is the layer index and j is the node index inside the layer.

  • The Dropout algorithm [Srivastava et al.2014] is applied to 30% of the nodes to prevent the network from overfitting the training features.

  • To break the symmetry of the neurons, the network weights are randomly initialized with small values close to zero, drawn from a normal distribution with mean μ = 0 and a small standard deviation σ.

    • Each weight is denoted by w_{ij}^(l), which refers to the weight from node n_i^(l-1) to node n_j^(l).

  • For every node, the Rectified Linear Unit (ReLU) is exploited as the activation function.

    • Each node activation is denoted by a_j^(l) = ReLU(z_j^(l)), where z_j^(l) = Σ_i w_{ij}^(l) a_i^(l-1) + b_j^(l).

Input: X, a list of feature vectors of length N
    Parameters to be learned: W, b
    Output: a^(2)

1:  Randomly initialize the network weights W
2:  a^(0) ← X
3:  for each node j in layer 1 do
4:     z_j^(1) ← Σ_i w_{ij}^(1) a_i^(0) + b_j^(1)
5:     a^(1).append(ReLU(z_j^(1)))
6:  end for
7:  a^(1) ← Dropout(a^(1), 0.3)
8:  for each node j in layer 2 do
9:     z_j^(2) ← Σ_i w_{ij}^(2) a_i^(1) + b_j^(2)
10:     a^(2).append(ReLU(z_j^(2)))
11:  end for
12:  a^(2) ← Dropout(a^(2), 0.3)
13:  Return a^(2)
Algorithm 1 Bottleneck MLP Neural Network Sensor Fusion Layer (BMLP)
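To make the sensor-level fusion concrete, here is a minimal Keras sketch of Algorithm 1; the framework choice and the 0.01 standard deviation are assumptions, as the paper specifies only zero-mean initialization with small values:

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_mlp(input_dim):
    # Sensor-level fusion layer mirroring Algorithm 1: two fully connected
    # hidden layers of 1000 ReLU nodes, each followed by 30% dropout.
    # stddev=0.01 is an assumption; the paper only says "small values
    # close to zero" from a normal distribution.
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)
    inp = layers.Input(shape=(input_dim,))
    x = inp
    for _ in range(2):
        x = layers.Dense(1000, activation="relu", kernel_initializer=init)(x)
        x = layers.Dropout(0.3)(x)
    return tf.keras.Model(inp, x)
```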

Decision Level Fusion

At the decision-level fusion, both the 1D deep ResNet and the bottleneck MLP neural networks are exploited to recognize daily human activities, as shown in Fig. 1. The bottleneck MLP neural network is comprised of three fully connected layers. Each hidden layer consists of 1000 nodes with the ReLU activation function. The output layer consists of a number of nodes equal to the total number of target activities and uses the softmax activation function; see Algorithm 2 and the sketch after it. During the training phase, the categorical cross-entropy cost function measures the difference between the target labels and the predicted labels. This difference is exploited by the backpropagation algorithm [Hagan and Menhaj1994] to update the learned parameters of both the feature-level and decision-level layers. Finally, the Adam algorithm [Kingma and Ba2014] is used to optimize the categorical cross-entropy cost function.

Input: X, a 1D feature vector of length N
    Parameters to be learned: W, b
    Output: ŷ

1:  a^(0) ← X
2:  for each node j in layer 1 do
3:     z_j^(1) ← Σ_i w_{ij}^(1) a_i^(0) + b_j^(1)
4:     a^(1).append(ReLU(z_j^(1)))
5:  end for
6:  a^(1) ← Dropout(a^(1), 0.3)
7:  for each node j in layer 2 do
8:     z_j^(2) ← Σ_i w_{ij}^(2) a_i^(1) + b_j^(2)
9:     a^(2).append(ReLU(z_j^(2)))
10:  end for
11:  a^(2) ← Dropout(a^(2), 0.3)
12:  a^(3) ← ∅
13:  for each node j in layer 3 do
14:     z_j^(3) ← Σ_i w_{ij}^(3) a_i^(2) + b_j^(3)
15:     a^(3).append(z_j^(3))
16:  end for
17:  ŷ ← softmax(a^(3)); Return ŷ
Algorithm 2 Decision Level Fusion Classifier
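Putting the pieces together, the following sketch mirrors Algorithm 2 and the training setup described above. Keras is an assumption, the channel and activity counts are illustrative (a single IMU with tri-axial accelerometer and gyroscope, i.e. 6 channels), the decision-level 1D ResNet stage is omitted for brevity, and channel_resnet and bottleneck_mlp refer to the earlier sketches:

```python
import tensorflow as tf
from tensorflow.keras import layers

def decision_head(x, n_activities):
    # Mirrors Algorithm 2: two 1000-node ReLU hidden layers with 30%
    # dropout, then a softmax output with one node per target activity.
    for _ in range(2):
        x = layers.Dense(1000, activation="relu")(x)
        x = layers.Dropout(0.3)(x)
    return layers.Dense(n_activities, activation="softmax")(x)

# Illustrative assembly: one channel_resnet branch per sensor channel,
# fused by a per-sensor bottleneck MLP, then the decision-level head.
window, n_channels, n_activities = 128, 6, 6
inputs = [layers.Input(shape=(window, 1)) for _ in range(n_channels)]
per_channel = [channel_resnet((window, 1))(inp) for inp in inputs]
sensor_feat = bottleneck_mlp(64 * n_channels)(layers.concatenate(per_channel))
model = tf.keras.Model(inputs, decision_head(sensor_feat, n_activities))

# Feature-level and decision-level layers are trained jointly with
# categorical cross-entropy and the Adam optimizer, as in the paper.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```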

Experiments and Evaluation

The proposed approach for daily human activity recognition is evaluated through empirical experiments on two real-world datasets: the Smartphones dataset [Anguita et al.2013] and the Wearable Sensors dataset [Attal et al.2015]. The proposed model is evaluated against the following baseline models: (i) k-NN with time-domain and frequency-domain features [Attal et al.2015], using the Wearable Sensors dataset; (ii) a ConvNet combined with an MLP neural network applied to raw data [Ronao and Cho2016], and a ConvNet with tFFT features [Ronao and Cho2016], using the Smartphones dataset. The architecture shown in Fig. 1 is used for both datasets; the only differences are the number of input sensors and the number of output classes, since the inputs and outputs of the datasets differ. The feature-level and decision-level deep networks are trained together to ensure the consistency of the learning process.

Datasets

Human activities can be divided into periodic, static, and sporadic activities. Periodic activities include biking and walking; static activities include standing, lying, and sitting; and sporadic activities are intention-oriented activities such as drinking from a cup or opening a door [Bulling, Blanke, and Schiele2014].

Smartphone Dataset

In this dataset, the data were collected using a smartphone with built-in tri-axial accelerometer and gyroscope sensors. The dataset consists of 6 different activities performed by 30 volunteer subjects while carrying a smartphone in a pocket tight around the waist. The activities are a mix of periodic and static activities: WALKING, WALKING UPSTAIRS, WALKING DOWNSTAIRS, SITTING, STANDING, and LAYING. The data were sampled at 50 Hz and divided into fixed-length windows of 128 samples with 50% overlap. A Butterworth low-pass filter is used to separate body acceleration and gravity in the accelerometer raw data. The data were split into a training set of 7352 windows from 21 randomly selected subjects and a test set of the remaining 2947 windows.
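As an illustration of the gravity/body separation step, a minimal SciPy sketch follows; the 0.3 Hz corner frequency is an assumption (a value commonly used with this dataset), not stated in the paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def separate_gravity(acc, fs=50.0, cutoff=0.3, order=3):
    """Split raw tri-axial acceleration into gravity and body components.

    A low-pass Butterworth filter isolates the slowly varying gravity
    component; the body component is the residual. The 0.3 Hz cutoff
    and filter order are assumptions, not values from the paper.
    """
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    gravity = filtfilt(b, a, acc, axis=0)  # zero-phase filtering per axis
    body = acc - gravity
    return body, gravity

acc = np.random.randn(1000, 3)             # placeholder 50 Hz recording
body, gravity = separate_gravity(acc)
```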

Method Accuracy (%)
Baseline Models [Ronao and Cho2016]
PCA+MLP 57.10
HCF+NB 74.32
HCF+J48 83.02
SDAE+MLP(DBN) 87.77
HCF+ANN 91.08
HCF+SVM 94.61
Deep Learning Models [Ronao and Cho2016]
ConvNet (inverted pyramid architecture)+MLP 94.79
tFFT+ConvNet 95.75
Proposed Model
Hierarchical Multichannel Deep ResNet 97.619
Table 1: Smartphone dataset accuracy evaluation
ConvNet Predicted classes Proposed Model Predicted classes
Actual class W WU WD Si St L W WU WD Si St L
Walking 491 3 2 0 0 0 496 0 0 0 0 0
W. upstairs 0 471 0 0 0 0 3 468 0 0 0 0
W. downstairs 0 0 420 0 0 0 2 0 417 0 1 0
Sitting 0 0 0 436 34 21 0 1 0 438 52 0
Standing 0 1 0 24 496 11 0 0 0 10 522 0
Laying 0 0 0 43 23 471 0 0 0 0 0 537
Table 2: HAR using the Smartphone dataset: confusion matrix evaluation

Wearable Sensors Dataset

In this dataset, the data were collected using three IMU sensors placed on the chest, the right thigh, and the left ankle of the subject. Each IMU sensor has a built-in tri-axial accelerometer, gyroscope, and magnetometer. The dataset consists of 12 different activities performed by volunteer subjects. The activities are a mix of periodic and static activities along with transitional activities: A1: WALKING DOWNSTAIRS, A2: STANDING, A3: SITTING DOWN, A4: SITTING, A5: FROM SITTING TO SITTING ON THE GROUND, A6: SITTING ON THE GROUND, A7: LYING DOWN, A8: LYING, A9: FROM LYING TO SITTING ON THE GROUND, A10: STANDING UP, A11: WALKING, and A12: WALKING UPSTAIRS. The data were sampled at a fixed rate, and no sliding window was applied to the raw data.

Baselines

With regard to the smartphone dataset, the baseline consists of two deep learning models applied to raw data and to tFFT features extracted from a single smartphone IMU sensor [Ronao and Cho2016]. The first baseline is a ConvNet combined with an MLP neural network that extracts features automatically from the IMU sensor raw data. The second is a ConvNet applied to tFFT features extracted from the same raw data. Both baselines aim to recognize 6 static and periodic activities and were evaluated over a test set of 9 randomly selected volunteers.

With regard to the wearable sensors dataset, the baseline consists of multiple traditional machine learning models applied to raw data and to time-domain and frequency-domain features extracted from 3 different IMU sensors [Attal et al.2015]. These baseline models aim to recognize a set of static, periodic, and transitional activities, and were evaluated with 10-fold cross-validation using sliding windows of 25 samples with 80% overlap.

In this paper, to ensure a fair comparison with the baseline models, the same number of cross-validation folds, samples per sliding window, and sliding-window overlap are applied as in the reference models.
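For reference, this wearable-sensors configuration maps directly onto the sliding_windows sketch from the preprocessing section (the stream shape below is illustrative):

```python
import numpy as np

# Placeholder 3-sensor IMU stream (9 channels), segmented with 25-sample
# windows and 80% overlap, i.e. a step of 25 * (1 - 0.8) = 5 samples,
# reusing the sliding_windows sketch defined earlier.
imu_stream = np.random.randn(500, 9)
windows = sliding_windows(imu_stream, window_size=25, overlap=0.8)
print(windows.shape)  # (96, 25, 9)
```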

Results

Figure 4: Confusion Matrix obtained by HMResNet using wearable sensors dataset

With regard to the smartphone dataset, the accuracy of the proposed model compared to eight baseline models is shown in Table 1. In addition, the confusion matrix of the best baseline model (ConvNet) compared to the proposed model (HMResNet) is shown in Table 2. The best results are highlighted in bold in both tables.

From the accuracy results, the proposed HMResNet model significantly outperforms the baseline models. Compared to the best baseline model (ConvNet), the confusion matrix shows a significant improvement for static activities such as SITTING, STANDING, and LAYING, which constituted a major impediment for the baseline model. For the LAYING activity, the proportion of correctly classified samples improved by 12.3%, with no misclassified samples. For the STANDING activity, the proportion of correctly classified samples improved by 4.8% and the proportion of misclassified samples decreased by 4.5%. For the SITTING activity, the proportion of correctly classified samples improved by 0.4% and the proportion of misclassified samples decreased by 0.6%. These results show that the hierarchical architecture with multilevel fusion layers and residual shortcut connections succeeds in extracting more relevant features for both static and periodic activities than both hand-engineered features and plain CNN learned features. Moreover, the proposed model is more accurate than both the traditional machine learning and deep learning baseline models at recognizing daily human activities from a single tri-axial accelerometer and gyroscope sensor.

To evaluate the extendability and robustness of the proposed model regardless of the hardware configuration, it was benchmarked against another set of baseline models using the wearable sensors dataset. In terms of average precision and recall, the results of the proposed model compared to eight baseline models are shown in Table 3.

Among the baseline models, the k-nearest neighbor (k-NN) algorithm applied to time-domain and frequency-domain features achieves the best average recall and precision, followed by Random Forest (RF), then k-NN without features; the Supervised Learning Gaussian Mixture Model (SLGMM) without features obtains the worst results. As shown in Table 3, the proposed model outperforms the baseline models evaluated both on raw data and on hand-crafted features: it improves precision and recall to 99.22% and 98.88%, respectively, whereas the baseline methods range from 69.88% to 98.85% in precision and from 69.99% to 98.85% in recall. The best results are highlighted in bold in the table. The proposed model obtained almost perfect results, as shown in the confusion matrix in Fig. 4.

Because of the small size of the dataset, only a slight difference appears between the proposed HMResNet model and the best baseline (k-NN with features). Despite this small difference, the baseline's feature extraction phase requires integrating additional models and algorithms, and it needs extra computation time, which is impractical for real-time applications.

From these empirical evaluations, the proposed HMResNet model succeeded in outperforming the baseline models, extracting more relevant features, and recognizing periodic, transitional, and static daily human activities from a single IMU sensor up to three IMU sensors, which demonstrates the robustness of the proposed model regardless of the hardware configuration.

Model Precision(%) Recall(%)
Without features [Attal et al.2015]
KNN 94.62 94.57
RF 83.46 82.28
SVM 90.33 90.98
SLGMM 69.88 69.99
With features [Attal et al.2015]
KNN 98.85 98.85
RF 98.25 98.24
SVM 92.90 93.15
SLGMM 73.61 74.44
Proposed Model
Deep Multichannel ResNet 99.22 98.88
Table 3: Wearable Sensors dataset Evaluation

Real-World Use Case: Cognitive Daily Exercises Coaching

Figure 5: Scene extracted from the smart home environment

To validate the proposed approach for real-time activity recognition, a use case of cognitive daily exercise coaching for a diabetic person is studied; see Fig. 5. This use case involves a robot, called Pepper, acting as a training coach for a diabetic person, called Alice. Pepper recognizes and guides the daily exercises prescribed by a doctor for Alice. This work is reported in a multimedia video available on LISSI's website (http://www.lissi.fr/videos/HMResNet.php).

By extending a previously proposed cognitive architecture [Ayari et al.2017a], Pepper can continuously detect and monitor Alice's activities based on the proposed HMResNet model. During this use case, a set of complex activities is recognized and analyzed by integrating the proposed HMResNet approach with previously proposed Narrative Knowledge Representation Language (NKRL) reasoning rules [Ayari et al.2017a, Ayari et al.2017b, Ayari et al.2016]; see Fig. 6.

At the low level, a communication service is implemented to enable the entities populating the ambient environment to connect and subscribe to cloud services and to exchange knowledge. The communication service is based on standard communication technologies (XMPP, REST, etc.). In addition to the communication service, the HMResNet activity recognition and multi-modal data sensing services are implemented at the low level.

At the high level, knowledge representation services are proposed to model dynamic knowledge. These services exploit the Narrative Knowledge Representation Language (NKRL), based on n-ary ontologies, to avoid the problems experienced by binary ontologies in dynamic knowledge representation [Tenorth and Beetz2015, Lemaignan et al.2016]. The representation services create NKRL predicate occurrences and save them in a shared knowledge base, from which the reasoning model queries them back when necessary.

Figure 6: Cognitive Architecture for Human Activities Aware Robotic Systems

During the experiment, the proposed model was evaluated by streaming 6974 sliding windows of 1.5 seconds each; the measured average processing time to recognize a single activity reasonably fits the constraints of real-time activity recognition.

This use case is part of the European Medolution project (https://itea3.org/project/medolution.html), which is funded by the Research, Development and Innovation (RDI) program.

Conclusion

In this paper, a new deep learning architecture based on a Hierarchical Multichannel Deep Residual Network (HMResNet) is proposed for robotic systems to recognize daily human activities in ambient environments. The introduced model consists of multilevel fusion layers. The proposed Multichannel 1D Deep Residual Network model is, at the features level, combined with a bottleneck MLP neural network to automatically extract relevant features and, at the decision level, fully connected with an MLP neural network to recognize daily human activities.

The performance of daily human activity recognition based on the HMResNet model is shown on two datasets. The proposed automatic feature extraction yields more relevant features than both hand-engineered features and plain CNN learned features. The model is able to recognize perfectly, in terms of precision, the static activities SITTING, STANDING, and LAYING. Overall, the results demonstrate that the proposed model outperforms the baseline methods on the same datasets.

To validate the proposed approach for real-time activity recognition, a use case of daily exercise coaching for a diabetic person was studied.

Ongoing work addresses the extension of the proposed approach to explain human behavior through recognized activities over time. Further study of hyper-parameter tuning, and of the features extracted by the HMResNet model evaluated against well-known baseline deep learning models, should also be addressed. Even though HMResNet can be a cornerstone technique for human activity recognition, further study of the method and evaluation on larger datasets should be conducted.

References

  • [Anguita et al.2012] Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; and Reyes-Ortiz, J. L. 2012. Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. In International workshop on ambient assisted living, 216–223. Springer.
  • [Anguita et al.2013] Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; and Reyes-Ortiz, J. L. 2013. A public domain dataset for human activity recognition using smartphones. In ESANN.
  • [Attal et al.2015] Attal, F.; Mohammed, S.; Dedabrishvili, M.; Chamroukhi, F.; Oukhellou, L.; and Amirat, Y. 2015. Physical human activity recognition using wearable sensors. Sensors 15(12):31314–31338.
  • [Ayari et al.2016] Ayari, N.; Chibani, A.; Amirat, Y.; and Matson, E. 2016. A semantic approach for enhancing assistive services in ubiquitous robotics. Robotics and Autonomous Systems 75:17–27.
  • [Ayari et al.2017a] Ayari, N.; Abdelkawy, H.; Chibani, A.; and Amirat, Y. 2017a. Towards semantic multimodal emotion recognition for enhancing assistive services in ubiquitous robotics. In 2017 AAAI Fall Symposium Series.
  • [Ayari et al.2017b] Ayari, N.; Chibani, A.; Amirat, Y.; and Fried, G. 2017b. Contextual knowledge representation and reasoning models for autonomous robots. In 2017 AAAI Fall Symposium Series.
  • [Bulling, Blanke, and Schiele2014] Bulling, A.; Blanke, U.; and Schiele, B. 2014. A tutorial on human activity recognition using body-worn inertial sensors. ACM Computing Surveys (CSUR) 46(3):33.
  • [De Vries et al.2017] De Vries, H.; Strub, F.; Mary, J.; Larochelle, H.; Pietquin, O.; and Courville, A. C. 2017. Modulating early visual processing by language. In Advances in Neural Information Processing Systems, 6597–6607.
  • [Duffner et al.2014] Duffner, S.; Berlemont, S.; Lefebvre, G.; and Garcia, C. 2014. 3d gesture classification with convolutional neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, 5432–5436. IEEE.
  • [Hagan and Menhaj1994] Hagan, M. T., and Menhaj, M. B. 1994. Training feedforward networks with the marquardt algorithm. IEEE transactions on Neural Networks 5(6):989–993.
  • [He and Jin2009] He, Z., and Jin, L. 2009. Activity recognition from acceleration data based on discrete consine transform and svm. In Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on, 5041–5044. IEEE.
  • [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
  • [Ioffe and Szegedy2015] Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR abs/1502.03167.
  • [Kashiwabara et al.2012] Kashiwabara, T.; Osawa, H.; Shinozawa, K.; and Imai, M. 2012. Teroos: a wearable avatar to enhance joint activities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2001–2004. ACM.
  • [Kingma and Ba2014] Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [Lee and Cho2011] Lee, Y.-S., and Cho, S.-B. 2011. Activity recognition using hierarchical hidden markov models on a smartphone with 3d accelerometer. In International Conference on Hybrid Artificial Intelligence Systems, 460–467. Springer.
  • [Lee, Yoon, and Cho2017] Lee, S.-M.; Yoon, S. M.; and Cho, H. 2017. Human activity recognition from accelerometer data using convolutional neural network. In Big Data and Smart Computing (BigComp), 2017 IEEE International Conference on, 131–134. IEEE.
  • [Lemaignan et al.2016] Lemaignan, S.; Warnier, M.; Sisbot, E. A.; Clodic, A.; and Alami, R. 2016. Artificial cognition for social human–robot interaction: An implementation. Artificial Intelligence.
  • [Lin, Chen, and Yan2013] Lin, M.; Chen, Q.; and Yan, S. 2013. Network in network. arXiv preprint arXiv:1312.4400.
  • [Mantyjarvi, Himberg, and Seppanen2001] Mantyjarvi, J.; Himberg, J.; and Seppanen, T. 2001. Recognizing human motion with multiple acceleration sensors. In Systems, Man, and Cybernetics, 2001 IEEE International Conference on, volume 2, 747–752. IEEE.
  • [Plötz, Hammerla, and Olivier2011] Plötz, T.; Hammerla, N. Y.; and Olivier, P. 2011. Feature learning for activity recognition in ubiquitous computing. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, 1729.
  • [Ronao and Cho2016] Ronao, C. A., and Cho, S.-B. 2016. Human activity recognition with smartphone sensors using deep learning neural networks. Expert Systems with Applications 59:235–244.
  • [Sebbak et al.2013] Sebbak, F.; Chibani, A.; Amirat, Y.; Mokhtari, A.; and Benhammadi, F. 2013. An evidential fusion approach for activity recognition in ambient intelligence environments. Robotics and Autonomous Systems 61(11):1235 – 1245. Ubiquitous Robotics.
  • [Sharma, Lee, and Chung2008] Sharma, A.; Lee, Y.-D.; and Chung, W.-Y. 2008. High accuracy human activity monitoring using neural network. In Convergence and Hybrid Information Technology, 2008. ICCIT’08. Third International Conference on, volume 1, 430–435. IEEE.
  • [Srivastava et al.2014] Srivastava, N.; Hinton, G. E.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of machine learning research 15(1):1929–1958.
  • [Tenorth and Beetz2015] Tenorth, M., and Beetz, M. 2015. Representations for robot knowledge in the knowrob framework. Artificial Intelligence.
  • [Wang, Yan, and Oates2017] Wang, Z.; Yan, W.; and Oates, T. 2017. Time series classification from scratch with deep neural networks: A strong baseline. In Neural Networks (IJCNN), 2017 International Joint Conference on, 1578–1585. IEEE.
  • [Xiong et al.2017] Xiong, W.; Droppo, J.; Huang, X.; Seide, F.; Seltzer, M.; Stolcke, A.; Yu, D.; and Zweig, G. 2017. The microsoft 2016 conversational speech recognition system. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, 5255–5259. IEEE.
  • [Yang et al.2015] Yang, J.; Nguyen, M. N.; San, P. P.; Li, X.; and Krishnaswamy, S. 2015. Deep convolutional neural networks on multichannel time series for human activity recognition. In Ijcai, volume 15, 3995–4001.
  • [Yang, Dinh, and Chen2010] Yang, X.; Dinh, A.; and Chen, L. 2010. Implementation of a wearable real-time system for physical activity recognition based on naive bayes classifier. In Bioinformatics and Biomedical Technology (ICBBT), 2010 International Conference on, 101–105. IEEE.
  • [Zheng et al.2016] Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; and Zhao, J. L. 2016. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Frontiers of Computer Science 10(1):96–112.