Entropy Decision Fusion for Smartphone Sensor based Human Activity Recognition

05/30/2020 ∙ by Olasimbo Ayodeji Arigbabu, et al. ∙ 0

Human activity recognition serves an important part in building continuous behavioral monitoring systems, which are deployable for visual surveillance, patient rehabilitation, gaming, and even personally inclined smart homes. This paper demonstrates our efforts to develop a collaborative decision fusion mechanism for integrating the predicted scores from multiple learning algorithms trained on smartphone sensor based human activity data. We present an approach for fusing convolutional neural network, recurrent convolutional network, and support vector machine by computing and fusing the relative weighted scores from each classifier based on Tsallis entropy to improve human activity recognition performance. To assess the suitability of this approach, experiments are conducted on two benchmark datasets, UCI-HAR and WISDM. The recognition results attained using the proposed approach are comparable to existing methods.




1 Introduction

The increasing pervasiveness and wide acceptability of smartphones, coupled with the immense number of embedded sensors, have opened up an avenue for adopting mobile phones as a means of data acquisition. Signals from mobile sensors such as the accelerometer and gyroscope can be captured for analyzing human motion based on acceleration of movement and angular rotational velocity. Aside from data acquisition, recent advancement in edge intelligence (EdgeAI) [1] has introduced another interesting perspective: developing self-contained artificial intelligence devices. EdgeAI offers on-demand prediction on smartphones using pre-trained models in real time with low latency, rather than depending on cloud deployment of trained models. Such advancement provides a compelling ecosystem to model human activity recognition (HAR) for fast and accurate personalization of the activity pattern of an individual over time. This can then be incorporated into the development pipeline of systems used for video surveillance, patient rehabilitation, entertainment, and smart homes.

Aside from smartphones, there are various other means of capturing HAR data, such as wearable sensors, ambient sensors, video sensors, and social networks [2, 4, 14, 3, 5]. Like mobile phones, wearable sensors are usually placed at specific locations on the body (e.g., sternum, lower back, and waist) to measure human activity information. Ambient sensors are usually installed in closed environments to monitor interactions between individuals and the environment. Video-based sensors record the daily activities of human subjects as they appear in video footage or surveillance cameras. Social network based activity recognition sifts through an abundance of users’ online information from multiple social network platforms to understand users’ intentions, behavior, and activities.

Human activities naturally possess an inherent hierarchical structure, as an activity class can be divided into smaller, more fine-grained actions [11], and activity data collected using sensors such as those in smartphones are quite complex to model. This is due to factors such as the position of sensors, the number of sensors, and the complexity of activities, since different individuals may have slightly different styles of expressing a particular activity. Immense research effort has been devoted to understanding and developing HAR by employing both handcrafted and deep learning based feature learning techniques to extract comprehensive information about different types of activities [7]. For HAR to be usable in real-life environments, recognition accuracy is critical. To improve classification performance, different levels of information combination have been introduced, including sensor fusion, feature fusion, and classifier fusion [8, 9].

Multiple classifier fusion can help mitigate and compensate for the weakness of a single classifier, especially when the source of information is prone to a number of limiting factors that could negatively affect recognition accuracy. However, an aspect of multiple classifier fusion that has not been well addressed is the generation of weights or significance factors assigned to different independent classifiers prior to score fusion. It is arguably an open area of research needing in-depth investigation to improve HAR performance and boost its applicability in real-life environments.

Pushing toward using multiple learning algorithms for human activity recognition, this paper introduces a fusion strategy which integrates the predicted probabilities of a convolutional neural network (CNN), a recurrent convolutional network (RCN), and a support vector machine (SVM). Each learning algorithm is initially trained independently, and the relative weighted scores from the learning methods are subsequently fused to produce a more robust and accurate activity recognition system. The relative weights in this case are based on computing self-information from each classifier using the Tsallis entropy [10], where the total Tsallis entropy from a classifier is influenced by the quality of information from each activity class. This ensures that the final fusion weight for a classifier captures the underlying prediction quality of the classifier over the number of class labels in the model.

The remainder of the paper is organized as follows: Section II provides a review of related literature. Section III describes the adopted learning techniques based on CNN, RCN, and SVM. In Section IV, we describe the fusion strategies and present the proposed method for fusing the three learning methods. Experimental results are presented in Section V, and we draw conclusions in Section VI.

2 Related Works

This section describes the reported approaches in the literature for performing HAR from the perspective of feature representation and classification. It also points out the main contributions of this paper.

2.1 Handcrafted features based methods

It is commonplace in the literature to divide the pipeline of HAR into preprocessing, feature extraction, and classification, which is also standard in various computer vision and machine learning applications [12, 13]. Upon acquiring the data for HAR, a number of preprocessing operations are usually performed to minimize noise in the data. These include denoising filters, data segmentation, and normalization techniques [16]. From the perspective of feature extraction, algorithms such as principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA) have popularly been used to extract statistical information from the raw data. Such is the approach presented by Fergani [17], where statistical feature extraction methods are used for training a weighted SVM, with LDA-WSVM showing the best result. An overview of different feature extraction techniques was presented in [15], which groups the techniques into structured and statistical. We have also witnessed direct application of machine learning classifiers to HAR by considering the raw input data as feature vectors. Some of the commonly used classifiers include decision tree, random forest, k-nearest neighbour (KNN), naive Bayes, and SVM. A performance comparison of a number of supervised and unsupervised classification techniques suggests that KNN and HMM are better suited for classifying human activities.


2.2 Deep learning based methods

Nowadays, the paradigm has shifted from deconstructing the learning process into sub-stages (preprocessing, feature extraction, classification) to using deep learning techniques, which have the capacity to perform these processes implicitly with very little human intervention in the learning pipeline. Deep neural networks (DNN) [19] have attained groundbreaking performance in other application areas such as image recognition, natural language processing (NLP), and object detection [20]. The earliest attempt on HAR with DNN was presented in [21] using restricted Boltzmann machines (RBM). Recently, many research studies have explored various methods of learning useful information from data using DNN [22, 23, 24]. For instance, different architectures of deep, convolutional, and recurrent neural networks (RNN) were assessed in [25]. A CNN with 1D filters and max pooling layers was applied in [11], yielding a recognition result of 94.79%, and a combination of CNN with temporal fast Fourier transform (tFFT) produced a result of 95.75%. Comprehensive experimentation involving handcrafted features, codebook learning, multi-layer perceptron (MLP), CNN, LSTM, autoencoder, and CNN-LSTM on two public datasets was conducted in [26]. It was discovered that automatically learned features offer better performance than handcrafted ones, especially using CNN-LSTM. To deploy DNN for HAR in real-life applications, issues of speed have to be well addressed. This has been investigated in [27], where a deep RNN was trained on raw accelerometer data with high throughput of 1.347ms, which is 8.19 times faster than other similar methods.

2.3 Information fusion

To improve the performance of HAR, researchers have opted to integrate multiple sources of information. This has been approached from the perspective of fusing information from multiple sensors, where readings from accelerometer, gyroscope, and magnetometer are combined to compensate for the weaknesses of individual sensors [30]. Such assessment has been conducted to understand the influence of positioning on-body sensors at different body parts on HAR performance [31]. There are also studies concentrating on fusion at feature level and classifier level [5]. For instance, CNN has been examined for implementing different early and late fusion strategies, such as integrating multiple CNNs trained independently on gyroscope, accelerometer, and pressure data at sensor, channel, and shared network parameter level [32]. A hierarchical decision fusion strategy was proposed in [33], which typically combines multiple decision-making classifiers by assessing the average accuracy of each classifier to obtain the weight for fusing the classifiers. Finally, majority voting is used to obtain the final decision for each activity class [33]. Using Shannon entropy based weight generation, an approach was proposed in [35] that obtains classifier weights from the accuracy rates of multiple classifiers applied to multiple sensors placed at different body positions. Handcrafted features were extracted from each sensor's data, a feature selection method based on LDA was adopted to minimize feature redundancy, and typical classifiers such as KNN, decision tree C4.5, naive Bayes, SVM, and HMM were used to make predictions.

2.4 Main contributions

The contributions of this work can be summarized as follows:

  • Explored deep learning and statistical learning based techniques to fuse their performance using the predicted class probabilities. Our method combines the usefulness of automatically learned features with handcrafted ones.

  • Proposed the use of generalized entropy based on Tsallis entropy to obtain classifier fusion weights which are directly influenced by relative self information of each classifier.

3 Learning Methods

This section describes the basic concepts behind the algorithms adopted in this work for training the activity data acquired using smartphone.

3.1 Convolutional Neural Network (CNN)

CNN is a deep learning algorithm which combines feature learning with a trainable classifier for learning from multi-array data such as color images, video streams, or 3D volumetric data [34, 36]. The architecture of this type of network involves multiple layers of convolution, non-linear transformation, pooling, and a fully connected network at the tail end for prediction, as depicted in Figure 2. The convolutional layers in CNN perform a convolution operation on the input data using a set of filter banks (kernels) with varying properties to generate feature maps, which are used as input to the subsequent convolutional layers. This process is repeated until all convolutional layers have been exhausted. The units of the feature maps associated with an image patch are connected across different layers of the network with a set of weights [19]. Moreover, the feature maps are passed through a mapping (activation) function such as sigmoid, hyperbolic tangent, or rectified linear unit (ReLU) to ensure non-linearity.

As an intermediate step between two convolutional layers, a pooling operation is usually performed to generate features with strong semantic affinity. This strategically removes weaker values in the feature map and reduces its size by replacing the values in a particular location with a statistical summary of the neighboring features. This has also proven to help eliminate the effect of variation caused by translation. Two main techniques have been suggested in the literature: max pooling, which replaces the values of each neighborhood in the feature map with their maximum, and average pooling, which computes their average [37]. The final layers of CNN consist of a fully connected network (FCN) and a loss layer. The FCN connects every neuron in one layer to every neuron in the next, while the loss layer is used for making predictions. Softmax is usually used in the loss layer, particularly in multi-class problems, as it outputs a probability for each class in the range $[0, 1]$, and the sum of all the probabilities is equal to one.
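The max and average pooling operations described above can be illustrated with a minimal NumPy sketch. The helper name `pool1d` and the choice of non-overlapping windows (stride equal to the window size) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def pool1d(x, size, mode="max"):
    """Non-overlapping 1D pooling over the last axis (stride == size)."""
    x = np.asarray(x, dtype=float)
    n = (x.shape[-1] // size) * size              # drop a ragged tail, if any
    windows = x[..., :n].reshape(*x.shape[:-1], -1, size)
    if mode == "max":
        return windows.max(axis=-1)               # keep the strongest response
    return windows.mean(axis=-1)                  # summarize by the average

feature_map = np.array([1.0, 3.0, 2.0, 8.0, 5.0, 4.0])
max_pooled = pool1d(feature_map, 2, "max")        # windows [1,3], [2,8], [5,4]
avg_pooled = pool1d(feature_map, 2, "avg")
```

Both variants halve the length of the feature map here; max pooling keeps only the largest value of each window, while average pooling smooths it.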

The data utilized in this paper is 1-dimensional on each axis, so 1D CNN is primarily used in this work. However, when we are interested in training on the data with a 2D CNN, we restructure the 1D data to 2D. Since a 2D matrix $X$ can be converted to a 1D vector via vectorization as $x = \mathrm{vec}(X)$, without loss of generality we can obtain the 2D matrix form of the vector by taking the inverse, $X = \mathrm{vec}^{-1}(x)$, as shown in Figure 1. Afterward, we compute the 2D fast Fourier transform (2D FFT) of the 2D matrix [38], which then serves as input to the 2D CNN. We understand that this may seem counter-intuitive, since coordinate directions should be taken into consideration when mapping from 1D to 2D. However, in our empirical evaluation, the recognition results attained using this approach are quite impressive.
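The restructuring step can be sketched as follows. This is only an illustration under stated assumptions: the 16 x 8 target shape, the row-major reshape order, and the use of the FFT magnitude as the network input are hypothetical choices, since the text above does not specify them.

```python
import numpy as np

def restructure_to_2d(signal, rows, cols):
    """Inverse vectorization: reshape a 1D window into a rows x cols matrix.

    Row-major order is an assumption; a column-major vec() would use order="F".
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size != rows * cols:
        raise ValueError("signal length must equal rows * cols")
    return signal.reshape(rows, cols)

def fft2_magnitude(matrix):
    """Magnitude of the 2D FFT (taking the magnitude is an assumption)."""
    return np.abs(np.fft.fft2(matrix))

rng = np.random.default_rng(0)
window = rng.standard_normal(128)            # one 128-value axis window
matrix = restructure_to_2d(window, 16, 8)    # hypothetical 16 x 8 layout
spectrum = fft2_magnitude(matrix)            # candidate input to a 2D CNN
recovered = matrix.reshape(-1)               # vectorization undoes the reshape
```

The reshape is lossless, so the original window can always be recovered from the matrix; only the choice of layout affects which samples become 2D neighbors.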

Figure 1: Illustration of restructuring 1D data to 2D matrix. 2D FFT is then computed on the 2D matrix.
Figure 2: Illustration of CNN with restructured data as explained in Section III and Figure 1. The convolutional layers of the network make use of 2D convolution, max pooling, and ReLU activation function, followed by a fully connected softmax layer.

3.2 Recurrent Convolutional Network (RCN)

RCN is the integration of a recurrent neural network (RNN) [39] and convolutional layers into a single learning framework. It has the advantages of both convolutional and recurrent networks. RNN in its most traditional form attempts to model temporal dynamics by mapping sequential input data to hidden states. The hidden states are then mapped to outputs, which can be expressed with equation (1), given a data sequence $x_1, \dots, x_T$:

$$h_t = g(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad z_t = g(W_{hz} h_t + b_z) \tag{1}$$

where $g$ is a non-linear activation function computed element-wise, $h_t$ is the hidden state, and $z_t$ is the output at time $t$. One of the major challenges of RNN is the inability to remember interactions in long-term sequences due to the problem of vanishing and exploding gradients [40]. As a result, long short-term memory (LSTM) [41] networks have been introduced as a variant of RNN which incorporates memory units into the network. This allows the network to determine when to forget previous hidden states and when to update hidden states as new data is fed into the network. In this work we construct an RCN with two convolutional layers of 1D filters and two layers of max pooling. The learned features are passed to an LSTM and finally a fully connected layer, as shown in Figure 3.
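As a rough illustration of the traditional RNN recurrence, the sketch below unrolls the standard update $h_t = g(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$, $z_t = g(W_{hz} h_t + b_z)$ in NumPy. The weight names, the sizes, and the choice of tanh for $g$ are assumptions for illustration; this is a plain RNN, not the LSTM-based RCN used in the experiments.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hz, b_h, b_z, g=np.tanh):
    """Unroll a vanilla RNN over a sequence xs of shape (T, D)."""
    h = np.zeros(W_hh.shape[0])                  # initial hidden state h_0
    hs, zs = [], []
    for x_t in xs:
        h = g(W_xh @ x_t + W_hh @ h + b_h)       # hidden state update
        z = g(W_hz @ h + b_z)                    # output at time t
        hs.append(h)
        zs.append(z)
    return np.stack(hs), np.stack(zs)

rng = np.random.default_rng(0)
T, D, H, O = 5, 9, 4, 6   # time steps, input channels, hidden units, outputs
xs = rng.standard_normal((T, D))
W_xh = rng.standard_normal((H, D)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
W_hz = rng.standard_normal((O, H)) * 0.1
hs, zs = rnn_forward(xs, W_xh, W_hh, W_hz, np.zeros(H), np.zeros(O))
```

Because the same $W_{hh}$ is applied at every step, gradients flowing back through many steps are repeatedly scaled by it, which is the source of the long-term memory difficulty that LSTM units are designed to ease.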

Figure 3: Illustration of RCN with input data comprising 9 channels (axes). The convolutional layers of the network make use of 1D convolution, max pooling, and ReLU activation function. The resulting feature maps are fed to the recurrent layer (LSTM), and the final layer is a fully connected softmax layer.

3.3 Support Vector Machine

SVM is a structural risk minimization algorithm based on statistical learning theory. The main concept is to find an optimal separating hyperplane that sufficiently separates the data. SVM has successfully been used in several machine learning classification problems such as image classification, face recognition, and object detection. We used the soft margin SVM, which uses non-linear mapping functions to transform the data to a high-dimensional feature space and tolerates data that is not perfectly separable through the introduction of slack variables.
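The non-linear mapping used by a kernel SVM is usually applied implicitly through a kernel function. As an illustration only, the sketch below computes the radial basis function (RBF) kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$ in NumPy; the function name and the value of `gamma` are hypothetical, and the paper does not report its kernel hyperparameters.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    """Pairwise RBF kernel matrix between rows of X and rows of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip rounding noise

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X, gamma=0.5)   # gamma=0.5 is an arbitrary example value
```

Each entry of `K` is a similarity in $(0, 1]$, equal to one only when the two samples coincide.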


4 Fusion Strategy

4.1 Score Fusion

To examine the impact of score fusion, three techniques are explored as follows.

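Sum Rule

$$S_{\mathrm{sum}} = \sum_{k=1}^{K} P_k \tag{2}$$

Weighted Sum Rule

$$S_{\mathrm{wsum}} = \sum_{k=1}^{K} w_k P_k \tag{3}$$

where $P_k$ is the predicted scores of classifier $k$ (out of $K$ classifiers) and $w_k$ is a weight value assigned to the classifier based on recognition performance.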
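The sum rule and weighted sum rule can be sketched in a few lines of NumPy. The score matrices and the weights `[0.9, 0.1]` below are made-up values for illustration; in practice the weights would be chosen from each classifier's recognition performance.

```python
import numpy as np

def sum_rule(scores):
    """Add the score matrices of all classifiers; scores has shape (K, N, C)."""
    return np.sum(np.asarray(scores, dtype=float), axis=0)

def weighted_sum_rule(scores, weights):
    """Weight each classifier's (N, C) score matrix, then add them."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, scores, axes=1)   # contracts the K axis

# Two hypothetical classifiers, two samples, three classes.
p_a = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
p_b = np.array([[0.2, 0.5, 0.3], [0.1, 0.2, 0.7]])

sum_labels = sum_rule([p_a, p_b]).argmax(axis=1)
weighted_labels = weighted_sum_rule([p_a, p_b], [0.9, 0.1]).argmax(axis=1)
```

In this toy example the second sample is assigned to different classes by the two rules, which shows how strongly the choice of weights can influence the fused decision.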

Entropy Weighted Score Fusion

We propose the concept of using self-informed classifier score significance for fusing multiple classifiers. Unlike weighted score fusion, which generically assigns a weight value to each classifier, we compute the weight value dynamically by considering the amount of self-information each individual class label can contribute to the decision making.

Assume that we have an $N \times C$ matrix $P$ of predicted class probabilities from a particular classifier (e.g., SVM), where $N$ is the number of samples on the rows and $C$ represents the columnwise predicted probabilities of each class label with respect to each individual sample. Since this is an obvious self-contained indication of the classifier performance, we compute the summation of entropies of the set of probabilities from the columns $j = 1, \dots, C$. The proposed method is based on the Tsallis entropy, which is a generalization of the Shannon entropy. As a measure of diversity of information, Shannon entropy can be expressed as [44]:

$$H(X_j) = -\sum_{i=1}^{N} p_{ij} \log p_{ij} \tag{4}$$

where $p_{ij}$ represents the probability of possible outcomes of random variable $X_j$ in column $j$.

With regard to the nature of the distribution, Shannon entropy makes an implicit assumption about the tradeoff between contributions originating from the tails and the main mass of the distribution [45]. It is however important to control such a tradeoff explicitly to differentiate weak values coinciding with much stronger ones [45]. As a result, we propose using Tsallis entropy to obtain the weights for classifier score fusion. Due to its dependence on powers of probabilities, Tsallis entropy allows us to control the contributions from the main mass and tail of the distribution with an entropic-index parameter $q$, which can be expressed as [10]:

$$T_q(X_j) = \frac{1}{q - 1}\left(1 - \sum_{i:\, p_{ij} > \epsilon} p_{ij}^{\,q}\right) \tag{5}$$

where $T_q(\cdot)$ is a function for obtaining the Tsallis entropy of values in column $j$. With the term $p_{ij}^{\,q}$, Tsallis entropy provides different levels of concentration of information. $q > 1$ will be more sensitive to values occurring more frequently, while $q < 1$ will be sensitive to values occurring less often [46]. The parameter $\epsilon$ we introduced in the equation is a small tunable parameter for selecting the predicted probabilities from the classifier which are above a certain value. The main justification for parameter $\epsilon$ is to ensure that entropy is computed for values greater than $0$. This is because a classifier can return values within the range of $[0, 1]$ and entropy requires computation of a logarithm, whereas the logarithm of $0$ is undefined. Hence the entropy for a classifier $k$ is obtained using equation 6:

$$E_k = \sum_{j=1}^{C} T_q(X_j) \tag{6}$$

$E_k$ is the summation of the entropies computed for each column in the classifier’s predicted probability matrix $P_k$. In order to obtain the weight of each classifier to perform score fusion, we use equation 7:

$$w_k = \frac{E_k}{\sum_{m=1}^{K} E_m} \tag{7}$$

where $w_k$ is the relative weight for a classifier (such as SVM, RCN, or CNN), computed in turn for each classifier. To obtain the final fused score of the 3 classifiers, we simply apply the relative weight values to the predicted scores from the classifiers, as expressed in equation 8:

$$S_{\mathrm{fused}} = \sum_{k=1}^{K} w_k P_k \tag{8}$$

where $k$ indexes a classifier such as SVM, RCN, or CNN.
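The following NumPy sketch shows one possible reading of the entropy weighted score fusion: a Tsallis entropy per column of each classifier's probability matrix, a per-classifier sum over columns, weights obtained by normalizing those sums across classifiers, and a weighted sum of the score matrices. The function names, the normalization used for the weights, the threshold `eps`, the value `q=0.5`, and the toy probability matrices are all assumptions for illustration and may differ from the authors' implementation.

```python
import numpy as np

def tsallis_entropy(p, q=2.0, eps=1e-12):
    """Tsallis entropy of the values in p that exceed eps (Shannon at q == 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > eps]                               # skip zeros: log(0) is undefined
    if np.isclose(q, 1.0):
        return float(-np.sum(p * np.log(p)))     # Shannon limit
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def classifier_entropy(P, q=2.0, eps=1e-12):
    """Sum of column-wise entropies of an (N, C) probability matrix."""
    P = np.asarray(P, dtype=float)
    return sum(tsallis_entropy(P[:, j], q, eps) for j in range(P.shape[1]))

def entropy_weighted_fusion(prob_matrices, q=2.0, eps=1e-12):
    """Return (fused scores, weights) for a list of (N, C) matrices."""
    E = np.array([classifier_entropy(P, q, eps) for P in prob_matrices])
    if np.isclose(E.sum(), 0.0):
        raise ValueError("total entropy is zero; weights are undefined")
    w = E / E.sum()                              # assumed normalization
    fused = sum(w_k * np.asarray(P, dtype=float)
                for w_k, P in zip(w, prob_matrices))
    return fused, w

# Toy predicted probabilities: 3 samples, 2 classes (made-up numbers).
svm = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
rcn = np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]])
cnn = np.array([[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]])
fused, weights = entropy_weighted_fusion([svm, rcn, cnn], q=0.5)
labels = fused.argmax(axis=1)
```

Because a column of a probability matrix is not itself a normalized distribution, the sign and scale of each entropy depend on $q$ and on the number of samples; that behavior follows from taking the description literally. Since the weights sum to one and each input row sums to one, every fused row also sums to one.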

5 Experimental Results

Experiments are conducted on two benchmark datasets, described as follows:

5.1 Dataset

UCI-HAR: the main dataset used in this paper is presented in [47], which was collected by requesting 30 different subjects to wear a smartphone (Samsung Galaxy S II) on their waists. Using the phone’s accelerometer and gyroscope, tri-axial data of six different activities (walking, walking-upstairs, walking-downstairs, sitting, standing, laying) were collected. The data were sampled at a rate of 50 Hz and separated into windows of 128 values, with 50% overlap. In total there are 9 channels (axes) of gyroscope and accelerometer data, where each axis has a vector of 128 real values depicting an activity. The 9 channels are:

  • body accelerometer x-axis, y-axis, z-axis: 128 x 3

  • total accelerometer x-axis, y-axis, z-axis: 128 x 3

  • body gyroscope x-axis, y-axis, z-axis: 128 x 3
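The windowing described above can be sketched as a simple sliding-window segmentation. The helper name `sliding_windows` and the synthetic 10-second recording are assumptions for illustration; the window width of 128 comes from the text, and the overlap fraction is exposed as a parameter with 0.5 as its default.

```python
import numpy as np

def sliding_windows(signal, width=128, overlap=0.5):
    """Split a (T, channels) recording into overlapping fixed-width windows."""
    signal = np.asarray(signal)
    step = int(width * (1.0 - overlap))
    if step < 1:
        raise ValueError("overlap too large for the given width")
    if len(signal) < width:
        raise ValueError("recording shorter than one window")
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])

rng = np.random.default_rng(0)
recording = rng.standard_normal((500, 9))   # 10 s at 50 Hz, 9 channels
windows = sliding_windows(recording)        # shape: (num_windows, 128, 9)
```

With a step of 64 samples, consecutive windows share their second and first halves respectively, so each sample (apart from the edges) appears in two windows.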

To conduct the experiments, we used the original split of the dataset, comprising 7352 samples for training and 2947 samples for testing. During training, we used a random subset of the training data as the validation set. Once training is completed, we test the model with the 2947 samples that were not used in training.

WISDM: the second dataset used in this work is WISDM [48], collected by recording 36 subjects performing 6 different activities: walking, jogging, sitting, standing, climbing upstairs, and climbing downstairs. The dataset contains a total of 1,098,207 samples from one triaxial accelerometer sampled at a rate of 20 Hz. In addition, the authors of the dataset have included 43 extracted features for each segment of 200 raw accelerometer readings, which are primarily based on the average, standard deviation, average absolute difference, and time between peaks for each axis.
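To make the per-segment statistics concrete, the sketch below computes three of the feature families mentioned above (average, standard deviation, and average absolute difference per axis) for one segment of 200 readings. The function name and the exact definitions (for example, the population standard deviation) are assumptions; this is not the dataset authors' feature extraction code, and the time-between-peaks features are omitted.

```python
import numpy as np

def segment_features(segment):
    """Per-axis mean, standard deviation, and average absolute difference."""
    segment = np.asarray(segment, dtype=float)       # shape (200, 3)
    mean = segment.mean(axis=0)
    std = segment.std(axis=0)                        # population std (ddof=0)
    aad = np.abs(segment - mean).mean(axis=0)        # mean |x - mean| per axis
    return np.concatenate([mean, std, aad])

# Synthetic segment alternating between two readings on each axis.
segment = np.tile(np.array([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]), (100, 1))
features = segment_features(segment)                 # 9 values: 3 per statistic
```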

To conduct the experiments on WISDM we used 70% of the dataset for training. Similarly, during training 10% of the dataset are used as validation set. Once the model has been fully trained, the remaining 30% of the dataset are used for testing.
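A split along these lines can be sketched with index permutations. The helper name, the fixed seed, and the sample-level random shuffling are assumptions for illustration; the text does not say whether the split was random per sample or grouped by subject.

```python
import numpy as np

def split_indices(n_samples, test_frac=0.3, val_frac=0.1, seed=0):
    """Shuffle indices and carve out test, validation, and training sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test = order[:n_test]                       # held out until the end
    val = order[n_test:n_test + n_val]          # taken from the training share
    train = order[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
```

Here the validation indices are taken out of the 70% training share, so the three sets remain disjoint.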

5.2 Settings

For UCI-HAR data, in addition to the data restructuring described in Section III, we used the original form of the first dataset [47] (which contains 9 channels of 128 accelerometer and 128 gyroscope values) and the preprocessed version of the data, which consists of 561 values. Thus, the data configurations for the 3 learning algorithms are as follows:

  • RCN: we used the raw 9 channels of accelerometer and gyroscope axes as input. The network is composed of 2 1D convolutional layers with rectified linear unit (ReLU) activation function, 2 layers of max pooling, and 1 layer of LSTM, with the parameters listed in Table 1.

    | Parameter | Value |
    |---|---|
    | Input data size | 128 |
    | Input channels | 9 |
    | Number of feature maps | 50-250 |
    | Filter size | - |
    | Pooling size | |
    | Activation function | rectified linear unit (ReLU) |
    | Learning rate | 0.01 |
    | Dropout | 0.2 |
    | Batch size | 200 |
    | Epochs | 200 |
    | LSTM cells size | 450 |

    Table 1: RCN training parameters

  • Scenario 2: involves using SVM for classification. The inputs to SVM are the preprocessed 561 features and the type of kernel used is radial basis function (RBF).

  • Scenario 3: involves training 2D CNN on the restructured data explained in Section III. The 128 raw values from each axis are restructured into a 2D matrix, and the restructured data for the 9 axes are used as input, with the parameters shown in Table 2.

| Parameter | Value | Value |
|---|---|---|
| Input data size | | |
| Input channels | 1 | 9 |
| Number of feature maps | 10-80 | 20-100 |
| Filter size | - | - |
| Pooling size | | |
| Activation function | ReLU | ReLU |
| Learning rate | 0.01 | 0.001 |
| Dropout | 0.2 | 0.2 |
| Batch size | 200 | 300 |
| Epochs | 200 | 100 |

Table 2: CNN training parameters

For WISDM, we used the same parameters for training RCN, with the only variation being the input data, since the number of features is 43 with only one channel. SVM is trained using the RBF kernel function, and 1D CNN is used as illustrated in Table 2.

5.3 Results

Examining the independent learning scenarios, we discovered that SVM with preprocessed data provided the best recognition performance of 96% on UCI-HAR, RCN produced a result of 93.7%, while 2D CNN with restructured data yielded the lowest performance of 91.9%. The 2D CNN result is nonetheless quite surprising, given that the raw tri-axial data typically possess features along 3 axes, which are not representative of coordinates in a 2D matrix. There seems to be some level of underlying structure in the 128 axial values which is somewhat transferable to a 2D matrix. RCN, on the other hand, took advantage of the 1D form of the data, given that its convolutional layers are 1-dimensional and LSTM itself is well suited to feature sequences.

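| Method | UCI-HAR Acc (%) | UCI-HAR F1-score | WISDM Acc (%) | WISDM F1-score |
|---|---|---|---|---|
| CNN | 91.9 | 91 | 81.7 | 81.5 |
| RCN | 93.8 | 93.7 | 94 | 92.3 |
| SVM | 96 | 96 | 82 | 81 |
| Score Fusion | 94 | 94 | 86 | 84 |
| Weighted Score Fusion | 94.7 | 94.8 | 88.7 | 86.7 |
| Proposed | 96.4 | 96.3 | 89.5 | 89.4 |
| Proposed (RCN + SVM) | 97.4 | 97.4 | 91.5 | 91 |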
  • in all experiments

Table 3: HAR results using different Learning Methods
Figure 4: HAR results using weighted score fusion.
Figure 5: HAR results using Entropy Weighted Score Fusion

In terms of activity classes, RCN attained 100% recognition on the walking-downstairs activity and 84.11% on the sitting activity, which is its worst performance. CNN and SVM also showed their worst performance on the sitting activity, with recognition rates of 81.47% and 89% respectively, while their best performance was attained on the laying activity, with recognition results of 94.97% and 100% respectively.

With regard to decision fusion, the performance leveled out using ordinary score level fusion with a recognition rate of 94%, weighted score fusion resulted in 94.8%, and the proposed entropy weighted score fusion attained 96.4%. The confusion matrices of weighted score fusion and the proposed method are depicted in Figures 4 and 5. Although it can be argued that SVM alone produces almost similar performance, when we fused only RCN and SVM we attained a recognition result of 97.4%.
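Per-class recognition rates, overall accuracy, and F1-scores of the kind reported here can all be derived from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and reports a macro-averaged F1; both conventions are assumptions, since the text does not state which averaging was used. The 2 x 2 matrix is made-up data, not a result from the paper.

```python
import numpy as np

def confusion_metrics(cm):
    """Per-class recall, macro F1, and accuracy from a confusion matrix.

    Assumes every class has at least one true and one predicted sample.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    recall = tp / cm.sum(axis=1)          # per-class recognition rate
    precision = tp / cm.sum(axis=0)
    f1 = 2.0 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return recall, float(f1.mean()), float(accuracy)

cm = np.array([[8, 2],
               [1, 9]])                   # hypothetical two-class example
recall, macro_f1, accuracy = confusion_metrics(cm)
```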

In the case of WISDM, RCN with a recognition result of 94% significantly outperformed SVM and CNN whose results are 82% and 81.7% respectively. In terms of decision fusion, we noticed an increase in performance using weighted score fusion with recognition result of 88.7%, while the proposed method attained a result of 91.5%.

The performance comparison of the proposed technique with state-of-the-art methods in the literature is presented in Tables 4 and 5.

| Technique | Recognition Result (%) |
|---|---|
| Random Forest [49] | 91 |
| SVM [47] | 96 |
| Stacked Autoencoder [50] | 92.16 |
| CNN [11] | 94.79 |
| tFFT + CNN [11] | 95.75 |
| Proposed Method | 97.4 |

Table 4: Performance comparison on dataset [47]

| Technique | Recognition Result (%) |
|---|---|
| J48 [48] | 85.1 |
| Handcrafted + Dropout [51] | 85.36 |
| Multilayer perceptron [48] | 91.7 |
| Proposed Method | 91.5 |

Table 5: Performance comparison on WISDM

6 Conclusion

This paper has presented a new approach for performing decision fusion of classifier predicted scores of activity classes. The method involves computing self-informed classifier score significance based on Tsallis entropy to obtain the weights for score fusion. We first examined the performance of independent learning algorithms on different structures of data captured using smartphone sensors from the UCI-HAR and WISDM datasets. For this, we utilized RCN, CNN, and SVM, with SVM producing the best recognition result of 96% on UCI-HAR and RCN 94% on WISDM. We then assessed the proposed decision fusion method on the aforementioned benchmark datasets. From the experiments on UCI-HAR, we attained a recognition result of 97.4%, which is an improvement over the independent classifiers and the other score fusion techniques. Moreover, the performance exceeded that of existing methods in the literature, while the performance on WISDM is quite competitive.


  • [1] Wang, X., Han, Y., Wang, C., Zhao, Q., Chen, X., and Chen, M. (2019). In-edge ai: Intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Network, 33(5), 156-165.
  • [2] Yeffet, L., and Wolf, L. (2009, September). Local trinary patterns for human action recognition. In 2009 IEEE 12th international conference on computer vision (pp. 492-497).
  • [3] Chen, C., Jafari, R., and Kehtarnavaz, N. (2015, September). UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In 2015 IEEE International conference on image processing (ICIP) (pp. 168-172).
  • [4] Ikizler-Cinbis, N., and Sclaroff, S. (2010, September). Object, scene and actions: Combining multiple features for human action recognition. In European conference on computer vision (pp. 494-507). Springer, Berlin, Heidelberg.
  • [5] Chen, K., Zhang, D., Yao, L., Guo, B., Yu, Z., and Liu, Y. (2020). Deep learning for sensor-based human activity recognition: overview, challenges and opportunities. arXiv preprint arXiv:2001.07416.
  • [6] Attal, F., Mohammed, S., Dedabrishvili, M., Chamroukhi, F., Oukhellou, L., and Amirat, Y. (2015). Physical human activity recognition using wearable sensors. Sensors, 15(12), 31314-31338.
  • [7] Palumbo, F., Gallicchio, C., Pucci, R., and Micheli, A. (2016). Human activity recognition using multisensor data fusion based on reservoir computing. Journal of Ambient Intelligence and Smart Environments, 8(2), 87-107.
  • [8] Ehatisham-Ul-Haq, M., Javed, A., Azam, M. A., Malik, H. M., Irtaza, A., Lee, I. H., and Mahmood, M. T. (2019). Robust human activity recognition using multimodal feature-level fusion. IEEE Access, 7, 60736-60751.
  • [9] Nweke, H. F., Teh, Y. W., Mujtaba, G., and Al-Garadi, M. A. (2019). Data fusion and multiple classifier systems for human activity detection and health monitoring: Review and open research directions. Information Fusion, 46, 147-170.
  • [10] Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of statistical physics, 52(1-2), 479-487.
  • [11] Ronao, C. A., and Cho, S. B. (2016). Human activity recognition with smartphone sensors using deep learning neural networks. Expert systems with applications, 59, 235-244.
  • [12] Arigbabu, O. A., Ahmad, S. M. S., Adnan, W. A. W., and Yussof, S. (2015). Integration of multiple soft biometrics for human identification. Pattern Recognition Letters, 68, 278-287.

  • [13] Almaslukh, B., AlMuhtadi, J., and Artoli, A. (2017). An effective deep autoencoder approach for online smartphone-based human activity recognition. Int. J. Comput. Sci. Netw. Secur, 17(4), 160-165.
  • [14] Attal, F., Mohammed, S., Dedabrishvili, M., Chamroukhi, F., Oukhellou, L., and Amirat, Y. (2015). Physical human activity recognition using wearable sensors. Sensors, 15(12), 31314-31338.
  • [15] Lara, O. D., and Labrador, M. A. (2012). A survey on human activity recognition using wearable sensors. IEEE communications surveys & tutorials, 15(3), 1192-1209.
  • [16] Wang, W., Liu, A. X., Shahzad, M., Ling, K., and Lu, S. (2015, September). Understanding and modeling of wifi signal based human activity recognition. In Proceedings of the 21st annual international conference on mobile computing and networking (pp. 65-76).
  • [17] Fergani, B. (2015). News schemes for activity recognition systems using PCA-WSVM, ICA-WSVM, and LDA-WSVM. Information, 6(3), 505-521.
  • [18] Ren, Z., Zhang, Q., Gao, X., Hao, P., and Cheng, J. (2020). Multi-modality learning for human action recognition. Multimedia Tools and Applications, 1-19.
  • [19] LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
  • [20] Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11-26.
  • [21] Plötz, T., Hammerla, N. Y., and Olivier, P. L. (2011, June). Feature learning for activity recognition in ubiquitous computing. In Twenty-second international joint conference on artificial intelligence.
  • [22] Ordóñez, F. J., and Roggen, D. (2016). Deep convolutional and lstm recurrent neural networks for multimodal wearable activity recognition. Sensors, 16(1), 115.
  • [23] Murad, A., and Pyun, J. Y. (2017). Deep recurrent neural networks for human activity recognition. Sensors, 17(11), 2556.
  • [24] Nweke, H. F., Teh, Y. W., Al-Garadi, M. A., and Alo, U. R. (2018). Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges. Expert Systems with Applications, 105, 233-261.
  • [25] Hammerla, N. Y., Halloran, S., and Plötz, T. (2016). Deep, convolutional, and recurrent models for human activity recognition using wearables. arXiv preprint arXiv:1604.08880.
  • [26] Li, F., Shirahama, K., Nisar, M. A., Köping, L., and Grzegorzek, M. (2018). Comparison of feature learning methods for human activity recognition using wearable sensors. Sensors, 18(2), 679.
  • [27] Inoue, M., Inoue, S., and Nishida, T. (2018). Deep recurrent neural network for mobile human activity recognition with high throughput. Artificial Life and Robotics, 23(2), 173-185.
  • [28] Haghighat, M., Abdel-Mottaleb, M., and Alhalabi, W. (2016). Discriminant correlation analysis: Real-time feature level fusion for multimodal biometric recognition. IEEE Transactions on Information Forensics and Security, 11(9), 1984-1996.
  • [29] Zhang, B., Yang, Y., Chen, C., Yang, L., Han, J., and Shao, L. (2017). Action recognition using 3D histograms of texture and a multi-class boosting classifier. IEEE Transactions on Image processing, 26(10), 4648-4660.
  • [30] Politi, O., Mporas, I., and Megalooikonomou, V. (2014, September). Human motion detection in daily activity tasks using wearable sensors. In 2014 22nd European signal processing conference (EUSIPCO) (pp. 2315-2319). IEEE.
  • [31] Shoaib, M., Bosch, S., Incel, O. D., Scholten, H., and Havinga, P. J. (2014). Fusion of smartphone motion sensors for physical activity recognition. Sensors, 14(6), 10146-10176.
  • [32] Münzner, S., Schmidt, P., Reiss, A., Hanselmann, M., Stiefelhagen, R., and Dürichen, R. (2017, September). CNN-based sensor fusion techniques for multimodal human activity recognition. In Proceedings of the 2017 ACM International Symposium on Wearable Computers (pp. 158-165).
  • [33] Banos, O., Damas, M., Pomares, H., Rojas, F., Delgado-Marquez, B., and Valenzuela, O. (2013). Human activity recognition based on a sensor weighting hierarchical classifier. Soft Computing, 17(2), 333-343.
  • [34] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
  • [35] Guo, M., Wang, Z., Yang, N., Li, Z., and An, T. (2018). A multisensor multiclassifier hierarchical fusion model based on entropy weight for human activity recognition using wearable inertial sensors. IEEE Transactions on Human-Machine Systems, 49(1), 105-111.
  • [36] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
  • [37] Maturana, D., and Scherer, S. (2015, September). Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 922-928). IEEE.
  • [38] Gertner, I. (1988). A new efficient algorithm to compute the two-dimensional discrete Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(7), 1036-1050.
  • [39] Elman, J. L. (1990). Finding structure in time. Cognitive science, 14(2), 179-211.
  • [40] Pascanu, R., Mikolov, T., and Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318).
  • [41] Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
  • [42] Bengio Y., Lamblin P., Popovici D., and Larochelle H. (2007). Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems 19. (pp. 153–160). MIT Press.
  • [43] Cortes, C., and Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.
  • [44] Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE mobile computing and communications review, 5(1), 3-55.
  • [45] Maszczyk, T., and Duch, W. (2008, June). Comparison of Shannon, Renyi and Tsallis entropy used in decision trees. In International Conference on Artificial Intelligence and Soft Computing (pp. 643-651). Springer, Berlin, Heidelberg.
  • [46] Wang, Y., Song, C., and Xia, S. T. (2016, July). Improving decision trees by Tsallis Entropy Information Metric method. In 2016 International Joint Conference on Neural Networks (IJCNN) (pp. 4729-4734). IEEE.
  • [47] Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes-Ortiz, J. L. (2013, April). A public domain dataset for human activity recognition using smartphones. In ESANN.
  • [48] Kwapisz, J. R., Weiss, G. M., and Moore, S. A. (2011). Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter, 12(2), 74-82.
  • [49] Sousa, W., Souto, E., Rodrigres, J., Sadarc, P., Jalali, R., and El-Khatib, K. (2017, October). A comparative analysis of the impact of features on human activity recognition with smartphone sensors. In Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web (pp. 397-404).
  • [50] Li, Y., Shi, D., Ding, B., and Liu, D. (2014). Unsupervised feature learning for human activity recognition using smartphone sensors. In Mining intelligence and knowledge exploration (pp. 99-107). Springer, Cham.
  • [51] Kolosnjaji, B., and Eckert, C. (2015, October). Neural network-based user-independent physical activity recognition for mobile devices. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 378-386). Springer, Cham.