Introduction
Existing sensor-based activity recognition systems [Chen et al. 2012] use shallow and conventional supervised machine learning algorithms such as multilayer perceptrons (MLPs), support vector machines, and decision trees. This reveals a gap between recent developments in deep learning algorithms and existing sensor-based activity recognition systems. Applying deep learning to sensor-based activity recognition brings many advantages in terms of system performance and flexibility. Firstly, deep learning provides an effective tool for extracting high-level feature hierarchies from high-dimensional data, which is useful in classification and regression tasks [Salakhutdinov 2015]. These automatically generated features eliminate the need for the handcrafted features of existing activity recognition systems. Secondly, deep generative models, such as deep belief networks [Hinton, Osindero, and Teh 2006], can utilize unlabeled activity samples for model fitting in an unsupervised pretraining phase, which is exceptionally important due to the scarcity of labeled activity datasets. On the contrary, unlabeled activity data is abundant and cheap to collect. Thirdly, deep generative models are more robust against the overfitting problem than discriminative models (e.g., MLPs) [Mohamed, Dahl, and Hinton 2012].

In this paper, we present a systematic approach to detecting human activities using deep learning and triaxial accelerometers. This paper is also motivated by the success of deep learning in acoustic modeling [Mohamed, Dahl, and Hinton 2012; Dahl et al. 2012], as we believe that speech and acceleration data have similar patterns of temporal fluctuation. Our approach is grounded in the ability of deep activity recognition models to automatically extract intrinsic features from acceleration data. Our extensive experiments are based on three public and community-based datasets. In summary, our main results on deep activity recognition models are as follows:

Deep versus shallow models. Our experimentation shows that using deep activity recognition models significantly enhances the recognition accuracy compared with conventional shallow models. Equally important, deep activity recognition models automatically learn meaningful features and eliminate the need for the hand-engineering of features, e.g., the statistical features used in state-of-the-art methods.

Semi-supervised learning. The scarce availability of labeled activity data motivates the exploration of semi-supervised learning techniques for a better fitting of activity classifiers. Our experiments show the importance of the generative (unsupervised) training of deep activity recognition models in weight tuning and optimization.

Spectrogram analysis. Accelerometers generate multi-frequency, aperiodic, and fluctuating signals, which complicates activity recognition from raw time series data. We show that using spectrogram signals instead of the raw acceleration data substantially helps deep activity recognition models capture variations in the input data.

Temporal modeling. This paper presents a hybrid approach of deep learning and hidden Markov models (DL-HMM) for better recognition accuracy on temporal sequences of activities, e.g., fitness movements and car maintenance checklists. This hybrid technique integrates the hierarchical representations of deep learning with the stochastic modeling of temporal sequences in HMMs. Experiments show that a DL-HMM outperforms HMM-based methods for temporal activity recognition. Specifically, the learned representation of deep activity recognition models is shown to be effective in estimating the posterior probabilities of HMMs. Unlike Gaussian mixture models, which provide an alternative method, deep neural networks do not impose strict assumptions on the input data distribution [Mohamed, Dahl, and Hinton 2012].
Related Work
In this section, we focus on classification and feature engineering methods for activity recognition using accelerometers. For a more comprehensive review of the field, we refer interested readers to recent survey papers [Lara and Labrador 2013; Chen et al. 2012].
Limitations of Shallow Classifiers
Machine learning algorithms have been used for a wide range of activity recognition applications [Parkka et al. 2006; Khan et al. 2010; Altun and Barshan 2010; Kwapisz, Weiss, and Moore 2011], allowing the mapping between feature sets and various human activities. The classification of accelerometer samples into static and dynamic activities using MLPs is presented in [Khan et al. 2010]. Conventional neural networks, including MLPs, often get stuck in local optima [Rumelhart, Hinton, and Williams 1986], which leads to poor performance of activity recognition systems. Moreover, training MLPs using backpropagation [Rumelhart, Hinton, and Williams 1986] hinders the addition of many hidden layers due to the vanishing gradient problem. The authors in [Parkka et al. 2006] used decision trees and MLPs to classify daily human activities. In [Berchtold et al. 2010], a fuzzy inference system is designed to detect human activities. [Kwapisz, Weiss, and Moore 2011] compared the recognition accuracy of decision trees (C4.5), logistic regression, and MLPs, where MLPs are found to outperform the other methods.
In this paper, we show significant recognition accuracy improvements on real-world datasets over state-of-the-art methods for human activity recognition using triaxial accelerometers. Additionally, even though some previous works have reported promising activity recognition accuracy, they still require a degree of handcrafted features, as discussed below.
Limitations of Handcrafted Features
Handcrafted features are widely utilized in existing activity recognition systems for generating distinctive features that are fed to classifiers. The authors in [Altun and Barshan 2010; Berchtold et al. 2010; Kwapisz, Weiss, and Moore 2011; Xu et al. 2012; Catal et al. 2015] utilized statistical features, e.g., mean, variance, kurtosis, and entropy, as distinctive representation features. On the negative side, statistical features are problem-specific, and they generalize poorly to other problem domains. In [Zappi et al. 2008], the signs of the raw signal (positive, negative, or null) are used as distinctive features. Despite their simple design, these sign features are plain and cannot represent complex underlying activities, which increases the number of required accelerometer nodes. The authors in [Bächlin et al. 2010] used energy and frequency bands to detect the freezing events of Parkinson’s disease patients. Generally speaking, any handcrafted-feature approach involves laborious human intervention for selecting the most effective features and decision thresholds from sensory data. Quite the contrary, data-driven approaches, e.g., using deep learning, can learn discriminative features from historical data, which is both systematic and automatic. Therefore, deep learning can play a key role in developing a self-configurable framework for human activity recognition. The authors in [Plötz, Hammerla, and Olivier 2011] discussed the utilization of a few feature learning methods, including deep learning, in activity recognition systems. Nonetheless, this prior work is elementary in its use of deep learning methods, and it does not provide any analysis of the deep network construction, e.g., the setup of layers and neurons. Moreover, our probabilistic framework supports temporal sequence modeling of activities by producing the activity membership probabilities as the emission matrix of an HMM. This is a considerable advantage for temporally modeling human actions that consist of a sequence of ordered activities, e.g., fitness movements and car maintenance checklists.
Problem Statement
This section gives a formal description of the activity recognition problem using accelerometer sensors.
Data Acquisition
Consider an accelerometer sensor that is attached to a human body and takes samples (at time index $t$) of the form

(1) $X_t = \tilde{X}_t + \epsilon_t$

where $X_t$ is a 3D accelerometer data point generated at time $t$ and composed of $a_x(t)$, $a_y(t)$, and $a_z(t)$, which are the x-acceleration, y-acceleration, and z-acceleration components, respectively. The proper acceleration in each axis channel is a floating-point value that is bounded by some known constant $\alpha$ such that $|a_x(t)| \leq \alpha$, $|a_y(t)| \leq \alpha$, and $|a_z(t)| \leq \alpha$. For example, an accelerometer with $\pm 2g$ units indicates that it can record proper acceleration up to twice the gravitational acceleration (recall that $g \approx 9.8\ \mathrm{m/s^2}$). Clearly, an accelerometer that is placed on a flat surface records a vertical acceleration value of $g$ upward. $\tilde{X}_t$ is a vector that contains the 3-axial noiseless acceleration readings. $\epsilon_t$ is a noise vector of independent, zero-mean Gaussian random variables with variance $\sigma^2$, such that $\epsilon_t \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. Examples of noise added during signal acquisition include the effect of temperature drifts and electromagnetic fields on electrical accelerometers [Fender et al. 2008]. Three channel frames $\mathbf{x}_t$, $\mathbf{y}_t$, and $\mathbf{z}_t$ are then formed to contain the x-acceleration, y-acceleration, and z-acceleration components, respectively. Particularly, these channel frames are created using a sliding window of size $T$ as follows:
(2) $\mathbf{x}_t = [a_x(t-T+1), a_x(t-T+2), \ldots, a_x(t)]$

(3) $\mathbf{y}_t = [a_y(t-T+1), a_y(t-T+2), \ldots, a_y(t)]$

(4) $\mathbf{z}_t = [a_z(t-T+1), a_z(t-T+2), \ldots, a_z(t)]$

The sequence size $T$ should be carefully selected so as to ensure adequate and efficient activity recognition. We assume that the system supports $N$ different activities. Specifically, let $\mathcal{A} = \{A_1, A_2, \ldots, A_N\}$ be a finite activity space. Based on the windowed excerpts $\mathbf{x}_t$, $\mathbf{y}_t$, and $\mathbf{z}_t$, the activity recognition method infers the occurrence of an activity $A_i \in \mathcal{A}$.
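The sliding-window framing of equations (2), (3), and (4) can be sketched as follows. This is a minimal illustration assuming NumPy arrays; the window length, hop size, sampling rate, and synthetic signal below are illustrative assumptions, not values from our experiments:

```python
import numpy as np

def sliding_windows(signal, win_len, step):
    """Split a 1-D acceleration channel into overlapping windows.

    signal : 1-D array of one axis (x, y, or z) readings.
    win_len: number of samples per window (the sequence size T).
    step   : hop between consecutive windows (step < win_len gives overlap).
    """
    n = (len(signal) - win_len) // step + 1
    return np.stack([signal[i * step : i * step + win_len] for i in range(n)])

# Example: 10 s of synthetic x-axis data at 50 Hz, 4 s windows, 50% overlap
fs = 50
x = np.sin(2 * np.pi * 1.5 * np.arange(10 * fs) / fs)
frames = sliding_windows(x, win_len=4 * fs, step=2 * fs)
print(frames.shape)  # (4, 200)
```

Each row of `frames` corresponds to one windowed excerpt on which an activity label is inferred.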
Data Preprocessing
A spectrogram of an accelerometer signal is a three-dimensional representation of the changes in the acceleration energy content of a signal as a function of frequency and time. Historically, spectrograms of speech waveforms have been widely used as distinguishable features in acoustic modeling, e.g., the mel-frequency cepstrum [Zheng, Zhang, and Song 2001]. In this paper, we use the spectrogram representation as the input of deep activity recognition models as it introduces the following advantages:

Classification accuracy. The spectrogram representation provides interpretable features that capture the intensity differences among neighboring acceleration data points. This enables the classification of activities based on the variations of spectral density, which reduces the classification complexity.

Computational complexity. After applying the spectrogram to $\mathbf{x}_t$, $\mathbf{y}_t$, and $\mathbf{z}_t$, the resulting spectral signal is significantly shorter than the time-domain signal of length $T$. This significantly reduces the computational burden of any classification method due to the lower data dimensionality.
Henceforth, the spectrogram signal of the triaxial accelerometer is denoted as $S_t$, which is the concatenation of the spectrogram signals from the triaxial input data.
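As an illustration, the per-axis spectrograms and their concatenation into $S_t$ can be computed with a short-time Fourier transform. The segment length, overlap, Hann window, and synthetic sinusoids below are illustrative assumptions rather than the paper's actual settings:

```python
import numpy as np

def spectrogram_features(frame, nperseg=64, noverlap=32):
    """Magnitude spectrogram of one windowed channel via a manual STFT."""
    step = nperseg - noverlap
    n = (len(frame) - nperseg) // step + 1
    window = np.hanning(nperseg)
    segs = np.stack([frame[i * step : i * step + nperseg] * window
                     for i in range(n)])
    return np.abs(np.fft.rfft(segs, axis=1))  # shape: (time bins, freq bins)

# Concatenate the per-axis spectrograms into one flat feature vector S_t
fs = 50
t = np.arange(4 * fs) / fs
fx, fy, fz = (np.sin(2 * np.pi * f * t) for f in (2.0, 5.0, 9.0))
features = np.concatenate([spectrogram_features(c).ravel()
                           for c in (fx, fy, fz)])
print(features.shape)  # (495,)
```

The flattened vector `features` plays the role of $S_t$ and is what the deep model would consume.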
Deep Learning for Activity Recognition: System and Model
Our deep model learns not only the classifier’s weights used to recognize different activities, but also the informative features for recognizing these activities from raw data. This provides a competitive advantage over traditional systems that are hand-engineered. The model fitting and training consist of two main stages: (i) an unsupervised, generative pretraining step, and (ii) a supervised, discriminative fine-tuning step. The pretraining step generates intrinsic features based on a layer-by-layer training approach using unlabeled acceleration samples only. Firstly, we use deep belief networks [Hinton, Osindero, and Teh 2006] to find the activity membership probabilities. Then, we show how to utilize the activity membership probabilities generated by deep models to model the temporal correlation of sequential activities.
Figure 1 shows the workflow of the proposed activity recognition system. We implement deep activity recognition models based on deep belief networks (DBNs). DBNs are generative models composed of multiple layers of hidden units. In [Hinton, Osindero, and Teh 2006], the hidden units are formed from restricted Boltzmann machines (RBMs), which are trained in a layer-by-layer fashion. Notably, an alternative approach is based on using stacked autoencoders [Bengio et al. 2007]. An RBM is a bipartite graph that is restricted in that no weight connections exist between hidden units. This restriction facilitates the model fitting, as the hidden units become conditionally independent given an input vector. After the unsupervised pretraining, the learned weights are fine-tuned in an up-down manner using the available data labels. A practical tutorial on the training of RBMs is presented in [Hinton 2012].
Deep Activity Recognition Models
DBNs [Hinton, Osindero, and Teh 2006] can be trained by greedy layer-wise training of RBMs, as shown in Figure 2. In our model, the acceleration spectrogram signals are continuous and are fed to a deep activity recognition model. As a result, the first layer of the deep model is selected as a Gaussian-binary RBM (GRBM), which can model the energy content in the continuous accelerometer data. Afterward, the subsequent layers are binary-binary RBMs (BRBMs). RBMs are energy-based probabilistic models that are trained using stochastic gradient descent on the negative log-likelihood of the training data. For the GRBM layer, the energy of an observed vector $\mathbf{v}$ and a hidden code $\mathbf{h}$ is defined as follows:

(5) $E(\mathbf{v}, \mathbf{h}) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{i,j} \frac{v_i}{\sigma_i} w_{ij} h_j - \sum_j c_j h_j$

where $\mathbf{W} = [w_{ij}]$ is the weight matrix connecting the input and hidden layers, and $\mathbf{b}$ and $\mathbf{c}$ are the visible and hidden unit biases, respectively. For a BRBM, the energy function is defined as follows:

(6) $E(\mathbf{v}, \mathbf{h}) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i w_{ij} h_j$
An RBM can be trained using the contrastive divergence approximation [Hinton 2002] as follows:

(7) $\Delta w_{ij} = \eta \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}} \right)$

where $\eta$ is a learning rate, $\langle v_i h_j \rangle_{\mathrm{data}}$ is the expectation of the reconstruction over the data, and $\langle v_i h_j \rangle_{\mathrm{model}}$ is the expectation of the reconstruction over the model using one step of the Gibbs sampler. Please refer to [Hinton, Osindero, and Teh 2006; Hinton 2012] for further details on the training of DBNs. For simplicity, we denote the weights and biases of a DBN model as $\Theta$, which can be used to find the posterior probability $P(A_i \mid S_t; \Theta)$ of each activity given the input.
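A minimal CD-1 update for a binary-binary RBM, following the contrastive divergence rule in (7), might look as follows; the layer sizes, learning rate, and random training batch are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary-binary RBM.

    v0: batch of visible vectors, shape (batch, n_visible).
    W : weights (n_visible, n_hidden); b, c: visible/hidden biases.
    """
    # Positive phase: hidden probabilities given the data
    h0 = sigmoid(v0 @ W + c)
    # One Gibbs step: sample hiddens, reconstruct visibles, re-infer hiddens
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T + b)
    h1 = sigmoid(v1 @ W + c)
    # Gradient: <v h>_data - <v h>_model, as in the update rule (7)
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (h0 - h1).mean(axis=0)
    return W, b, c

n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
data = (rng.random((32, n_vis)) < 0.5).astype(float)
for _ in range(10):
    W, b, c = cd1_step(data, W, b, c)
print(W.shape)
```

In a full DBN, this update would be applied layer by layer, with each trained layer's hidden activations serving as the visible data for the next RBM.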
To this end, the underlying activity can be predicted at time $t$ using softmax regression as follows:

(8) $\hat{A}_t = \underset{A_i \in \mathcal{A}}{\arg\max}\ P(A_i \mid S_t; \Theta)$
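This prediction amounts to a numerically stable softmax over the top-layer activations followed by an argmax; the toy activations and weight matrix here are hypothetical stand-ins for a trained DBN's top layer:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical top-layer activations h and softmax weights U for N = 5 activities
rng = np.random.default_rng(1)
h = rng.standard_normal(8)
U = rng.standard_normal((8, 5))
p = softmax(h @ U)          # activity membership probabilities
predicted = int(p.argmax())  # index of the most probable activity
print(p, predicted)
```

The vector `p` is exactly the set of activity membership probabilities that the temporal model in the next section consumes.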
Alternatively, the temporal patterns in a sequence of activities can be further analyzed using HMMs. The following section establishes the probabilistic connection between the input data and the activity prediction over a sequence of observations $S_1, S_2, \ldots, S_K$.
Temporal Activity Recognition Models (DL-HMM)
In some activity recognition applications, there is a temporal pattern in the executed human activities, e.g., the car checkpoint [Zappi et al. 2008]. Hidden Markov models (HMMs) [Rabiner and Juang 1986] are a type of graphical model that can simulate the temporal generation of a first-order Markov process. The temporal activity recognition problem consists of finding the most probable sequence of (hidden) activities that produces an (observed) sequence of input data. An HMM is represented as a 3-tuple $\lambda = (\pi, \mathbf{A}, \mathbf{B})$, where $\pi$ contains the prior probabilities of all activities in the first hidden state, $\mathbf{A}$ contains the transition probabilities, and $\mathbf{B}$ is the emission matrix for the observables from the hidden symbols. Given a sequence of observations, the emission probabilities are found using a deep model. In particular, the joint probability of a sequence of hidden activities $A_{1:K}$ and observations $S_{1:K}$ in an HMM is found as follows:

(9) $P(A_{1:K}, S_{1:K}) = P(A_1) P(S_1 \mid A_1) \prod_{k=2}^{K} P(A_k \mid A_{k-1}) P(S_k \mid A_k)$

(10) $P(A_k \mid S_{1:k}) \propto P(S_k \mid A_k) \sum_{A_{k-1}} P(A_k \mid A_{k-1}) P(A_{k-1} \mid S_{1:k-1})$

Herein, (10) shows that an HMM infers the posterior distribution as a recursive process. This decoding problem is solved for the most probable path of sequential activities.
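The decoding problem can be solved with the Viterbi algorithm in log space, using the deep model's per-frame posteriors as emission scores (in hybrid systems, posteriors divided by class priors approximate the likelihoods). The toy priors, transition matrix, and posteriors below are illustrative assumptions:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most probable hidden-activity path given per-frame emission scores.

    log_pi: (S,) log prior over activities.
    log_A : (S, S) log transition matrix.
    log_B : (K, S) log emission scores per frame, e.g. taken from the
            deep model's softmax output.
    """
    K, S = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((K, S), dtype=int)
    for k in range(1, K):
        scores = delta[:, None] + log_A          # (from, to)
        back[k] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[k]
    path = [int(delta.argmax())]
    for k in range(K - 1, 0, -1):
        path.append(int(back[k, path[-1]]))
    return path[::-1]

# Toy example: 2 activities, sticky transitions, noisy per-frame posteriors
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
post = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9]])
path = viterbi(log_pi, log_A, np.log(post))
print(path)  # [0, 0, 1, 1]
```

Note how the sticky transition matrix smooths the noisy third frame, which a frame-by-frame argmax would have flipped early.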
Computational Complexity
Our algorithm consists of three working phases: (a) data gathering, (b) offline learning, and (c) online activity recognition and inference. The computational burden of the offline learning is too heavy to run on a mobile device, as it is based on stochastic gradient descent optimization. Therefore, it is recommended to run the offline training of a deep activity recognition model on a capable server. Nonetheless, after the offline training is completed, only the model parameters $\Theta$ are disseminated to the wearable device, where the online activity recognition is lightweight with a linear time complexity $O(L)$, where $L$ is the number of layers in the deep activity recognition model. Here, the time complexity of the online activity recognition system represents the time needed to recognize the activity as a function of the accelerometer input length. The time complexity of finding the short-time Fourier transform (STFT) is $O(T \log T)$. Finally, the time complexity of the HMM decoding problem is $O(N^2 K)$ for a sequence of $K$ observations.
Baselines and Result Summary
Datasets
For empirical comparison with existing approaches, we use three public datasets that represent different application domains to verify the efficiency of our proposed solution. These three testbeds are described as follows:

WISDM Actitracker dataset [Kwapisz, Weiss, and Moore 2011]: This dataset contains samples of one triaxial accelerometer that is programmed to sample at a rate of 20 Hz. The data samples belong to 29 users and 6 distinctive human activities: walking, jogging, sitting, standing, and climbing stairs (up and down). The acceleration samples are collected using mobile phones running the Android operating system.

Daphnet freezing of gait dataset [Bächlin et al. 2010]: We use this dataset to demonstrate the healthcare applications of deep activity recognition models. The data samples are collected from 10 patients with Parkinson’s disease. Three triaxial accelerometers are fixed at the patient’s ankle, upper leg, and trunk with a sampling frequency of 64 Hz. The objective is to detect the freezing events of the patients. The samples are labeled with either the “freezing” or “no freezing” class.

Skoda checkpoint dataset [Zappi et al. 2008]: The distinctive activities of this dataset belong to a car maintenance scenario in typical quality control checkpoints. The sampling rate is 98 Hz. Even though the dataset contains many nodes of triaxial accelerometers, it would be inconvenient and costly to fix that many nodes to an employee’s hands, which can hinder the maintenance work. Therefore, we use one accelerometer node (ID # 16) for the experimental validation of deep models.
Performance Measures
For binary classification (experimentation on the Daphnet dataset), we use three performance metrics: the true positive rate $\mathrm{TPR} = \frac{TP}{TP + FN}$, the true negative rate $\mathrm{TNR} = \frac{TN}{TN + FP}$, and the accuracy $\frac{TP + TN}{TP + TN + FP + FN}$, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. For the multiclass classification of nonoverlapping activities, which is based on the experimentation on the WISDM Actitracker and Skoda checkpoint datasets, the average recognition accuracy (ACC) is found as $\mathrm{ACC} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{ACC}_i$, where $N$ is the number of supported activities and $\mathrm{ACC}_i$ is the recognition accuracy of the $i$-th activity.
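These metrics can be sketched directly from the confusion counts; the toy label vectors below are hypothetical:

```python
import numpy as np

def binary_rates(y_true, y_pred):
    """TPR (sensitivity) and TNR (specificity) for binary detection."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn), tn / (tn + fp)

def average_accuracy(y_true, y_pred, n_classes):
    """Mean of the per-class recognition accuracies over N activities."""
    accs = [np.mean(y_pred[y_true == k] == k) for k in range(n_classes)]
    return float(np.mean(accs))

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
tpr, tnr = binary_rates(y_true, y_pred)
acc = average_accuracy(y_true, y_pred, 2)
print(tpr, tnr, acc)  # all equal 2/3 on this toy example
```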
Baselines
Table 1 summarizes the main performance results of our proposed method and some previous solutions on the three datasets. Deep activity recognition models introduce significant accuracy improvements over conventional methods. For example, they improve accuracy by 6.53% over MLPs and 3.93% over ensemble learning on the WISDM Actitracker dataset. Similarly, significant improvements are also reported for the Daphnet freezing of gait and Skoda checkpoint datasets. This summarized result shows that deep models are both (a) effective in improving recognition accuracy over state-of-the-art methods, and (b) practical for avoiding the hand-engineering of features.
Dataset  Reference  Solution  Window size  Accuracy (%) 

WISDM  [Kwapisz, Weiss, and Moore2011]  C4.5  10 sec  85.1 
[Kwapisz, Weiss, and Moore2011]  Logistic regression  78.1  
[Kwapisz, Weiss, and Moore2011]  MLPs  91.7  
[Catal et al.2015]  Ensemble learning  94.3  
Our solution  Deep learning models  98.23  
Daphnet  [Bächlin et al.2010]  Energy threshold on power spectral density (0.5 sec)  4 sec  TPR: 73.1 and TNR: 81.6 
[Hammerla et al.2013]  C4.5 and kNNs with feature extraction methods    TPR and TNR 82  
Our solution  Deep learning models  4 sec  TPR and TNR 91.5  
Skoda  [Zappi et al.2008]  HMMs    Node 16 (86), nodes 20, 22 and 25 (84) 
Our solution  Deep learning models  4 sec  Node 16 (89.38) 
Experiments on Real Datasets
Spectrogram Analysis
Figure 3 shows the triaxial time series and spectrogram signals of 6 activities of the WISDM Actitracker dataset. Clearly, the high-frequency signals (a.k.a. AC components) belong to activities with active body motion, e.g., jogging and walking. On the other hand, the low-frequency signals (a.k.a. DC components) are collected during semi-static body motions, e.g., sitting and standing. Thus, these low-frequency activities are only distinguishable by the accelerometer’s measurement of the gravitational acceleration.
Performance Analysis
In our experiments, the data is first centered to the mean and scaled to unit variance. The deep activity recognition models are trained using stochastic gradient descent with mini-batches. The pretraining learning rate and the number of pretraining epochs are set separately for the first GRBM layer and for the subsequent BRBM layers, after which a fixed fine-tuning learning rate and number of fine-tuning epochs are used. For interested technical readers, Hinton [Hinton 2012] provides a tutorial on training RBMs with much practical advice on parameter setting and tuning.
Deep Model Structure
Figure 4 shows the recognition accuracy on different DBN structures (joint configurations of number of layers and number of neurons per layer). Two important results are summarized as follows:

Deep models outperform shallow ones. Clearly, the general trend in the recognition accuracy is that using more layers enhances the recognition accuracy. For example, a network with several hidden layers outperforms a shallower network with the same number of neurons per layer, which in turn outperforms a 1-layer network.

Overcomplete representations are advantageous. An overcomplete representation is achieved when the number of neurons at each layer is larger than the input length. An overcomplete representation is essential for learning deep models with many hidden layers. On the other hand, it is noted that a deep model becomes hard to optimize when using undercomplete representations. This harder optimization issue is distinguishable from the overfitting problem, as the training data accuracy also degrades when more layers are added (i.e., an overfitted model is diagnosed when the recognition accuracy on training data improves with more layers while the accuracy on testing data gets poorer). Therefore, we recommend 4x overcomplete deep activity recognition models (i.e., the number of neurons at each layer is four times the input size).
Pretraining Effects
Experiment  # of layers  Accuracy (%) 

Generative & discriminative training  1  96.87 
3  97.75  
5  97.85  
Discriminative training only  1  96.87 
3  96.46  
5  96.51 
Table 2 shows the recognition accuracy with and without the pretraining phase. These results confirm the importance of the generative pretraining phase of deep activity recognition models. Specifically, a generative pretraining of a deep model guides the discriminative training to better generalization solutions [Erhan et al.2010]. Clearly, the generative pretraining is almost ineffective for 1layer networks. However, using the generative pretraining becomes more essential for the recognition accuracy of deeper activity recognition models, e.g., 5 layers.
Temporal Modeling
We used a deep activity recognition model with 3 layers of 1000 neurons each. The recognition accuracy is 89.38% for the 10 activities of the Skoda checkpoint dataset (node ID 16), improving over the 86% of the HMM method presented in [Zappi et al. 2008]. Furthermore, the results can be significantly enhanced by exploiting the temporal correlation in the dataset: our hybrid DL-HMM achieves near-perfect recognition accuracy. In particular, Figure 5 shows the parameters of an HMM that is used to model the temporal sequences of the Skoda checkpoint dataset. Here, the checkpoint task must follow a specific activity sequence.
Conclusions and Future Work
We investigated the problem of activity recognition using triaxial accelerometers. By using deep activity recognition models, the proposed approach is superior to traditional methods that use shallow networks with handcrafted features. The deep activity recognition models significantly improve the recognition accuracy by extracting hierarchical features from triaxial acceleration data. Moreover, the recognition probabilities of the deep activity recognition models are utilized as the emission matrix of a hidden Markov model to temporally model a sequence of human activities.
References
 [Altun and Barshan2010] Altun, K., and Barshan, B. 2010. Human activity recognition using inertial/magnetic sensor units. In Human Behavior Understanding. Springer. 38–51.
 [Bächlin et al.2010] Bächlin, M.; Plotnik, M.; Roggen, D.; Maidan, I.; Hausdorff, J. M.; Giladi, N.; and Tröster, G. 2010. Wearable assistant for Parkinson’s disease patients with the freezing of gait symptom. IEEE Transactions on Information Technology in Biomedicine 14(2):436–446.
 [Bengio et al.2007] Bengio, Y.; Lamblin, P.; Popovici, D.; Larochelle, H.; et al. 2007. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems 19:153.
 [Berchtold et al.2010] Berchtold, M.; Budde, M.; Gordon, D.; Schmidtke, H. R.; and Beigl, M. 2010. ActiServ: Activity recognition service for mobile phones. In Proceedings of the International Symposium on Wearable Computers, 1–8. IEEE.
 [Catal et al.2015] Catal, C.; Tufekci, S.; Pirmit, E.; and Kocabag, G. 2015. On the use of ensemble of classifiers for accelerometer-based activity recognition. Applied Soft Computing.
 [Chen et al.2012] Chen, L.; Hoey, J.; Nugent, C. D.; Cook, D. J.; and Yu, Z. 2012. Sensor-based activity recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 42(6):790–808.
 [Dahl et al.2012] Dahl, G. E.; Yu, D.; Deng, L.; and Acero, A. 2012. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1):30–42.
 [Erhan et al.2010] Erhan, D.; Bengio, Y.; Courville, A.; Manzagol, P.A.; Vincent, P.; and Bengio, S. 2010. Why does unsupervised pretraining help deep learning? The Journal of Machine Learning Research 11:625–660.
 [Fender et al.2008] Fender, A.; MacPherson, W. N.; Maier, R.; Barton, J. S.; George, D. S.; Howden, R. I.; Smith, G. W.; Jones, B.; McCulloch, S.; Chen, X.; et al. 2008. Two-axis temperature-insensitive accelerometer based on multicore fiber Bragg gratings. IEEE Sensors Journal 7(8):1292–1298.
 [Hammerla et al.2013] Hammerla, N. Y.; Kirkham, R.; Andras, P.; and Ploetz, T. 2013. On preserving statistical characteristics of accelerometry data using their empirical cumulative distribution. In Proceedings of the International Symposium on Wearable Computers, 65–68. ACM.
 [Hinton, Osindero, and Teh2006] Hinton, G. E.; Osindero, S.; and Teh, Y.W. 2006. A fast learning algorithm for deep belief nets. Neural computation 18(7):1527–1554.
 [Hinton2002] Hinton, G. E. 2002. Training products of experts by minimizing contrastive divergence. Neural computation 14(8):1771–1800.
 [Hinton2012] Hinton, G. E. 2012. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade. Springer. 599–619.
 [Khan et al.2010] Khan, A. M.; Lee, Y.-K.; Lee, S. Y.; and Kim, T.-S. 2010. A triaxial accelerometer-based physical-activity recognition via augmented-signal features and a hierarchical recognizer. IEEE Transactions on Information Technology in Biomedicine 14(5):1166–1172.
 [Kwapisz, Weiss, and Moore2011] Kwapisz, J. R.; Weiss, G. M.; and Moore, S. A. 2011. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter 12(2):74–82.
 [Lara and Labrador2013] Lara, O. D., and Labrador, M. A. 2013. A survey on human activity recognition using wearable sensors. IEEE Communications Surveys & Tutorials 15(3):1192–1209.
 [Mohamed, Dahl, and Hinton2012] Mohamed, A.R.; Dahl, G. E.; and Hinton, G. 2012. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20(1):14–22.
 [Parkka et al.2006] Parkka, J.; Ermes, M.; Korpipaa, P.; Mantyjarvi, J.; Peltola, J.; and Korhonen, I. 2006. Activity classification using realistic data from wearable sensors. IEEE Transactions on Information Technology in Biomedicine 10(1):119–128.
 [Plötz, Hammerla, and Olivier2011] Plötz, T.; Hammerla, N. Y.; and Olivier, P. 2011. Feature learning for activity recognition in ubiquitous computing. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), volume 22, 1729.
 [Rabiner and Juang1986] Rabiner, L. R., and Juang, B.-H. 1986. An introduction to hidden Markov models. IEEE ASSP Magazine 3(1):4–16.
 [Rumelhart, Hinton, and Williams1986] Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986. Learning representations by back-propagating errors. Nature 323(6088):533–536.
 [Salakhutdinov2015] Salakhutdinov, R. 2015. Learning deep generative models. Annual Review of Statistics and Its Application 2(1):361–385.
 [Xu et al.2012] Xu, W.; Zhang, M.; Sawchuk, A. A.; and Sarrafzadeh, M. 2012. Robust human activity and sensor location corecognition via sparse signal representation. IEEE Transactions on Biomedical Engineering 59(11):3169–3176.
 [Zappi et al.2008] Zappi, P.; Lombriser, C.; Stiefmeier, T.; Farella, E.; Roggen, D.; Benini, L.; and Tröster, G. 2008. Activity recognition from on-body sensors: Accuracy-power trade-off by dynamic sensor selection. In Wireless Sensor Networks. Springer. 17–33.
 [Zheng, Zhang, and Song2001] Zheng, F.; Zhang, G.; and Song, Z. 2001. Comparison of different implementations of MFCC. Journal of Computer Science and Technology 16(6):582–589.