Multivariate Time Series Classification using Dilated Convolutional Neural Network

05/05/2019, by Omolbanin Yazdanbakhsh et al.

Multivariate time series classification is a high-value and well-known problem in the machine learning community. Feature extraction is a main step in classification tasks. Traditional approaches employ hand-crafted features for classification, while convolutional neural networks (CNN) are able to extract features automatically. In this paper, we use a dilated convolutional neural network for multivariate time series classification. To deploy the dilated CNN, a multivariate time series is transformed into an image-like format, and stacks of dilated and strided convolutions are applied to extract features within and between the variates of the time series simultaneously. We evaluate our model on two human activity recognition time series, finding that the features extracted automatically for the time series can be as effective as hand-crafted features.




1 Introduction

Time series are observations recorded sequentially over time. In a univariate time series, data are collected from one source, so each observation is a single scalar, while in a multivariate time series, observations are recorded from multiple sources simultaneously, so each data point is a multi-dimensional vector. Time series applications arise in a broad range of domains including health care (Jones et al., 2009), climate (Zhang et al., 2018), robotics (Pérez-D’Arpino & Shah, 2015) and stock markets (Maknickienė et al., 2011).

Various statistical and machine learning algorithms have been proposed for time series classification (Yazdanbakhsh & Dick, 2017; Che et al., 2018; Saad & Mansinghka, 2017; Dobigeon et al., 2007; Zhang, 2003). A general approach to time series classification is to split the time series into equal-size segments using a fixed-length sliding window and to extract hand-crafted features from the segments. The features are usually statistical measurements or features extracted from another domain, such as the Fourier and wavelet domains (Jiang & Yin, 2015; Ravi et al., 2017; Lin et al., 2003). In multivariate time series classification, information is commonly extracted separately from each variate, and the features are concatenated for the classification task (Zheng et al., 2016; Liu et al., 2019; Chambon et al., 2018). However, this may ignore the relations between the data items comprising each component of an observation.
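As an illustration of such hand-crafted features, the sketch below (a hypothetical example of ours, not the pipeline of any cited work) computes simple per-variate statistics over one fixed-length window of a 3-variate series:

```python
import numpy as np

def hand_features(segment):
    """Simple hand-crafted features for one window: per-variate
    mean, standard deviation, minimum and maximum, concatenated."""
    return np.concatenate([segment.mean(axis=1), segment.std(axis=1),
                           segment.min(axis=1), segment.max(axis=1)])

# toy window: 3 variates x 100 samples
seg = np.random.default_rng(0).normal(size=(3, 100))
print(hand_features(seg).shape)  # (12,): 4 statistics per variate
```

Note that each statistic is computed on one variate at a time, which is exactly how cross-variate relations can get lost.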

Deep learning approaches substitute hand-crafted features with features learned automatically from data. In a CNN, the features are obtained by learning filters convolved with small sub-regions of the data (Krizhevsky et al., 2012; LeCun et al., 1998). The performance of a wide range of applications has been improved by replacing traditional approaches with deep learning methods (Van Den Oord et al., 2016; Toshev & Szegedy, 2014; Young et al., 2018). Various deep learning methods have been employed for time series classification (Zheng et al., 2016; Liu et al., 2019; Gao et al., 2018). However, in most studies, hand-crafted features are still injected into the network to improve its performance (Ravi et al., 2017; Ignatov, 2018).

In this paper, we propose a convolution-based learning algorithm for multivariate time series classification. In this algorithm, a multivariate time series is treated as a one-channel image where each row of the image corresponds to one of the variates in the time series. The CNN employed in this work is inspired by the WaveNet architecture (Van Den Oord et al., 2016), which applied dilated convolution operations to audio synthesis. We show that this structure is able to learn the relations between variates in the time series and use their mutual information to reduce noise and improve performance. The method is applied to two human activity recognition time series (WISDM v1.1 (Kwapisz et al., 2011) and WISDM v.2 (Lockhart et al., 2011)), and its performance is compared against published results on these time series.

The remainder of the paper is organized as follows. In Section 2, we review related work on time series analysis using deep learning methods. Our method is presented in Section 3, and our experimental results in Section 4. We close with a summary in Section 5.

2 Related Works

Various approaches have been proposed for time series analysis. (Zhang, 2003) combined the autoregressive integrated moving average (ARIMA) model and neural network models for time series prediction. (Yazdanbakhsh & Dick, 2017) used neuro-complex fuzzy systems for multivariate time series prediction. (Che et al., 2018) applied an extension of recurrent neural networks (RNN) for multivariate time series classification with missing values. (Yeo, 2017) employed a long short-term memory (LSTM) model for chaotic time series forecasting. (Orozco et al., 2018) developed an ordinal regression deep neural network based on LSTM. (Saad & Mansinghka, 2017) proposed a Bayesian non-parametric method for multivariate time series forecasting.

Convolution-based approaches have been proposed for time series analysis as well. (Borovykh et al., 2017) used the WaveNet model for conditional multivariate time series prediction, where forecasting of each variate is conditioned on the other variates in the time series. (Zheng et al., 2016) designed a multi-channel CNN where each channel takes one variate of a multivariate time series as input and learns features individually; classification is done by applying a multi-layer perceptron to the combined features of all channels. (Lee et al., 2017) applied a CNN for fault classification and diagnosis in semiconductor manufacturing, where the relations between variates are explored only over time. (Liu et al., 2019) performed multivariate time series classification in a four-stage process; the time series is converted to a 3-D tensor and passed through a univariate convolution stage, a multivariate convolution stage and a fully connected stage. (Gao et al., 2018) studied multivariate time series by partitioning the data into groups based on the covariance structure of the time series. (Chambon et al., 2018) employed deep learning for sleep stage classification; features are extracted separately from each channel of polysomnography (PSG) signals and combined to give a classification label. (Gamboa, 2017) reviewed deep learning approaches for time series analysis. (Fawaz et al., 2019) benchmarked well-known deep learning algorithms for univariate and multivariate time series classification.

Moreover, the WISDM time series have been studied using deep learning approaches. (Ravi et al., 2017) applied deep learning algorithms to spectrogram features of the time series. (Ignatov, 2018) extracted features from each variate of the time series separately using a CNN; moreover, statistical features were added to the network as additional information.

3 Methods

Our proposed system is shown in Figure 1. It has three components: a) an image module, b) a CNN module, and c) a fully connected module. A detailed description of each module follows:

Figure 1: Block diagram of the proposed approach

3.1 Image Module

In this module, the multivariate time series is first split into equal-size segments. A sliding window is moved over the time series to give batches of the time series; the choice between overlapping and non-overlapping sliding windows is application-dependent. Each segment is a multi-dimensional array of size (d × w), where w is the length of the sliding window and d is the number of variates in the time series. Given the segments of the time series, we transform each segment into a one-channel image where each variate in the segment forms a row of the image. The outputs of this module are one-channel images of size (d × w).
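A minimal sketch of this module, assuming the series is stored as a (variates × time) array (the function name, toy data and sizes are ours for illustration):

```python
import numpy as np

def segment(series, window, step):
    """Split a multivariate time series (d variates x n samples) into
    one-channel 'images' of shape (d, window), one per window position."""
    d, n = series.shape
    starts = range(0, n - window + 1, step)
    return np.stack([series[:, s:s + window] for s in starts])

# toy 3-variate series with 20 samples
ts = np.arange(60).reshape(3, 20)
imgs = segment(ts, window=10, step=10)   # step == window: non-overlapping
print(imgs.shape)  # (2, 3, 10): 2 segments, each a 3x10 one-channel image
```

Choosing step < window yields overlapping segments, e.g. `segment(ts, 10, 5)` produces three windows from the same toy series.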

3.2 CNN Module

The CNN module comprises stacks of dilated convolution and strided convolution layers with ReLU activations. Dilated convolution layers extract features from the segments of the multivariate time series obtained from the image module, and strided convolution layers reduce the spatial dimension. The first layer in the CNN module is a dilated convolution layer; the filters in this layer slide over a one-channel image of size (d × w) with stride 1 in both the horizontal and vertical directions. Sliding the filters in both directions extracts features within and between variates simultaneously, so there is no need to study each variate individually at the beginning of the process. Moreover, employing a dilation rate in the convolution allows the model to learn relations between observations that are far apart.

The next layers are stacks of strided and dilated convolution layers that process the obtained feature maps. Filters in the strided convolution layers are applied to each row separately; thus, they only reduce the number of columns of the feature maps and preserve the number of rows.
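The effect of dilation and stride along one row can be sketched in plain numpy (a didactic toy, not the learned layers themselves, which also have biases, multiple filters and ReLU):

```python
import numpy as np

def conv1d(x, w, dilation=1, stride=1):
    """Valid 1-D cross-correlation with dilation and stride."""
    k = len(w)
    span = (k - 1) * dilation + 1   # receptive field of the dilated kernel
    out = [sum(w[j] * x[i + j * dilation] for j in range(k))
           for i in range(0, len(x) - span + 1, stride)]
    return np.array(out)

x = np.arange(16, dtype=float)
w = np.ones(3)
# dilation widens the receptive field without adding weights:
print(len(conv1d(x, w, dilation=2)))            # 12: the 3-tap kernel spans 5 samples
# stride downsamples, shrinking the feature map:
print(len(conv1d(x, w, dilation=1, stride=4)))  # 4
```

With dilation 2, neighboring kernel taps skip one sample, so distant observations interact; the strided pass keeps every fourth output, which is how the column count of a feature map is reduced.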

3.3 Fully Connected Module

In this module, fully connected layers are applied to the output of the CNN module; a Softmax layer then computes the probabilities of the predicted classes. The cross-entropy function is used as the loss function.
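The Softmax and cross-entropy computations at the head of the network can be written compactly (a generic numpy sketch; the toy logits are ours):

```python
import numpy as np

def softmax(z):
    """Turn a vector of logits into class probabilities."""
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -np.log(softmax(logits)[label])

logits = np.array([2.0, 0.5, -1.0, 0.0, 0.0, 0.0])  # toy scores, 6 classes
probs = softmax(logits)
print(probs.argmax())  # 0: the highest-scoring class is predicted
```

Minimizing the cross-entropy pushes the probability of the correct activity class toward 1.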

4 Experiments

To evaluate the performance of the proposed model, we apply the model to two public multivariate time series containing accelerometer data. The results are compared against published results on these datasets.

4.1 Time Series

WISDM v1.1 (Kwapisz et al., 2011) has accelerometer data (x-, y- and z-channels) collected from 36 users carrying an Android phone in their front pocket while performing a set of daily activities: walking, jogging, upstairs, downstairs, sitting, and standing. Table 1 shows details of this data set.

Activity      Number of Examples
Walking       424,400 (38.6%)
Jogging       342,177 (31.2%)
Upstairs      122,869 (11.2%)
Downstairs    100,427 (9.1%)
Sitting       59,939 (5.5%)
Standing      48,395 (4.4%)
Total number of examples = 1,098,207; Number of activities = 6; Sampling rate = 20 Hz

Table 1: Details of WISDM v.1.1

We create two data sets from this time series in order to compare our results with published ones. The first data set (v1-split) has non-overlapping segments of length 100 (5 sec), and the segments are split 80%/20% into training and testing sets, respectively, giving 8,347 training and 2,643 testing instances. The second data set (v1-individual) takes the data of 28 users as the training set and of 8 users as the testing set; the training and testing sets are segmented with a sliding window of length 200 (10 sec) and step size 20, giving 41,729 training and 13,162 testing instances.

WISDM v.2 (Lockhart et al., 2011) has the x-, y- and z-channels of accelerometer data collected from 536 users while walking, jogging, climbing stairs, sitting, standing and lying down. Table 2 shows details of this time series.

Activity      Number of Examples
Walking       1,255,923 (42.1%)
Jogging       438,871 (14.7%)
Stairs        57,425 (1.9%)
Sitting       663,706 (22.3%)
Standing      288,873 (9.7%)
Lying Down    275,967 (9.3%)
Total number of examples = 2,980,765; Number of activities = 6; Sampling rate = 20 Hz

Table 2: Details of WISDM v.2

Non-overlapping segments of length 200 (10 sec) are obtained from the time series; the segments are split 80%/20% into training and testing sets, respectively, giving 10,396 training and 4,456 testing segments.

4.2 Network Analysis

The network designed for WISDM v1-individual has six layers:

DL (filter size = (3,20), number of filters = 32, dilation rate = (1,2))
SL (filter size = (1,4), number of filters = 32, stride = (1,4))
DL (filter size = (3,3), number of filters = 32, dilation rate = (1,2))
SL (filter size = (1,4), number of filters = 32, stride = (1,4))
FL (units = 1024)
FL (units = 6)

where DL, SL and FL stand for dilated convolution layer, strided convolution layer and fully connected layer, respectively.
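Assuming 'valid' convolutions along the time axis (the padding scheme is not stated in the spec above, so this traces only the time dimension as an illustration), the feature-map width for the v1-individual network, whose input windows have length 200, evolves as follows:

```python
def out_len(n, k, dilation=1, stride=1):
    """Width after a 'valid' convolution along one axis."""
    span = (k - 1) * dilation + 1   # effective extent of the dilated kernel
    return (n - span) // stride + 1

w = 200                              # window length for v1-individual
w = out_len(w, 20, dilation=2)       # DL, time-axis kernel 20, dilation 2 -> 162
w = out_len(w, 4, stride=4)          # SL, kernel 4, stride 4              -> 40
w = out_len(w, 3, dilation=2)        # DL, kernel 3, dilation 2            -> 36
w = out_len(w, 4, stride=4)          # SL, kernel 4, stride 4              -> 9
print(w)  # 9 columns per row remain before the fully connected layers
```

Each strided layer shrinks the width by roughly its stride factor while the dilated layers trim only the border, matching the stated roles of the two layer types.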

The network designed for WISDM v1-split is as follows:

DL (filter size = (3,10), number of filters = 32, dilation rate = (1,2))
SL (filter size = (1,4), number of filters = 32, stride = (1,4))
DL (filter size = (3,3), number of filters = 32, dilation rate = (1,2))
SL (filter size = (1,2), number of filters = 32, stride = (1,2))
FL (units = 1024)
FL (units = 6)

To train the networks, we use a batch size of 256 on a single GPU, with the Adam optimizer (Kingma & Ba, 2014). We also apply regularization, with separate weights for v1-individual and v1-split.

The network designed for WISDM v.2 is trained in the same way as for WISDM v.1.1, with its own regularization weight; the model has 8 layers:

DL (filter size = (3,10), number of filters = 32, dilation rate = (1,2))
SL (filter size = (1,2), number of filters = 32, stride = (1,2))
DL (filter size = (3,3), number of filters = 32, dilation rate = (1,2))
SL (filter size = (1,2), number of filters = 32, stride = (1,2))
DL (filter size = (3,3), number of filters = 64, dilation rate = (1,1))
SL (filter size = (1,2), number of filters = 64, stride = (1,2))
FL (units = 512)
FL (units = 6)

Table 3 shows the performance of our network in terms of F1-score (Goutte & Gaussier, 2005) for WISDM v1-split. We compare our results against (Ravi et al., 2017), which passes spectrogram features of non-overlapping segments of the time series to a deep learning algorithm.
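For reference, the F1-score used in the tables below is the harmonic mean of precision and recall:

```python
def f1(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.5, 1.0), 3))  # 0.667: F1 is pulled toward the weaker measure
```

Because the harmonic mean penalizes imbalance, a class can only score well when both precision and recall are high.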

Activity     Our Method   (Ravi et al., 2017)
Walking      97.4%        99.3%
Jogging      98.3%        99.5%
Upstairs     86.4%        95.3%
Downstairs   80.5%        95.1%
Sitting      98.0%        98.2%
Standing     94.9%        97.6%
Table 3: Results for WISDM v1-split based on F1-score

Table 4 shows our results for WISDM v1-individual based on recall (Goutte & Gaussier, 2005) and compares them to (Ignatov, 2018), which creates the training and testing segments based on individuals; that network uses a CNN to extract features from each variate separately and combines them with statistical features for classification. Table 5 shows the performance of our method on WISDM v.2 based on F1-score and compares it with (Ravi et al., 2017).

Activity     Our Method   (Ignatov, 2018)
Walking      97.8%        97.8%
Jogging      95.5%        98.5%
Upstairs     78.5%        72.2%
Downstairs   75.1%        87.0%
Sitting      90.9%        82.6%
Standing     91.8%        93.3%
Table 4: Results for WISDM v1-individual based on recall

Activity     Our Method   (Ravi et al., 2017)
Walking      96.6%        97.2%
Jogging      96.9%        97.9%
Stairs       63.1%        79.3%
Sitting      91.2%        88.2%
Standing     87.2%        82.1%
Lying Down   90.7%        87.2%
Table 5: Results for WISDM v.2 based on F1-score

Tables 3 and 5 indicate that the features extracted automatically by our model can be as effective as spectrogram features. On WISDM v.2, our accuracy is about 1% lower for the walking and jogging activities, and our model outperforms (Ravi et al., 2017) on the static activities of sitting, standing and lying down (3.9% higher on average). However, the model cannot compete on the stairs activity, for which we have the fewest instances (1.9% of the data). On WISDM v1-split, (Ravi et al., 2017) outperforms our model in all activities; our accuracy is 8.9% and 14.6% lower for upstairs and downstairs, respectively, and 1.5% lower on average for the other activities. Table 4 indicates that our model outperforms the combination of statistical and CNN features (Ignatov, 2018) on the upstairs and sitting activities (7.3% higher on average) and has similar performance on walking. For jogging, downstairs and standing, our accuracy is 5.5% lower on average.

5 Conclusion

We have proposed a new algorithm for multivariate time series classification. After splitting a multivariate time series into equal-size segments, we convert the segments to one-channel images; the one-channel images are processed using stacks of dilated and strided convolutions. The features for the classification task are obtained by considering both the inter- and intra-variate relations. Our experiments show that the proposed model can be as effective as models working with hand-crafted features such as spectrogram and statistical features.


  • Borovykh et al. (2017) Borovykh, A., Bohte, S., and Oosterlee, C. W. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
  • Chambon et al. (2018) Chambon, S., Galtier, M. N., Arnal, P. J., Wainrib, G., and Gramfort, A. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(4):758–769, 2018.
  • Che et al. (2018) Che, Z., Purushotham, S., Cho, K., Sontag, D., and Liu, Y. Recurrent neural networks for multivariate time series with missing values. Scientific reports, 8(1):6085, 2018.
  • Dobigeon et al. (2007) Dobigeon, N., Tourneret, J.-Y., and Scargle, J. D. Joint segmentation of multivariate astronomical time series: Bayesian sampling with a hierarchical model. IEEE Transactions on Signal Processing, 55(2):414–423, 2007.
  • Fawaz et al. (2019) Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L., and Muller, P.-A. Deep learning for time series classification: a review. Data Mining and Knowledge Discovery, pp. 1–47, 2019.
  • Gamboa (2017) Gamboa, J. C. B. Deep learning for time-series analysis. arXiv preprint arXiv:1701.01887, 2017.
  • Gao et al. (2018) Gao, J., Murphey, Y. L., and Zhu, H. Multivariate time series prediction of lane changing behavior using deep neural network. Applied Intelligence, 48(10):3523–3537, 2018.
  • Goutte & Gaussier (2005) Goutte, C. and Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European Conference on Information Retrieval, pp. 345–359. Springer, 2005.
  • Ignatov (2018) Ignatov, A. Real-time human activity recognition from accelerometer data using convolutional neural networks. Applied Soft Computing, 62:915–922, 2018.
  • Jiang & Yin (2015) Jiang, W. and Yin, Z. Human activity recognition using wearable sensors by deep convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1307–1310. ACM, 2015.
  • Jones et al. (2009) Jones, S. S., Evans, R. S., Allen, T. L., Thomas, A., Haug, P. J., Welch, S. J., and Snow, G. L. A multivariate time series approach to modeling and forecasting demand in the emergency department. Journal of biomedical informatics, 42(1):123–139, 2009.
  • Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105, 2012.
  • Kwapisz et al. (2011) Kwapisz, J. R., Weiss, G. M., and Moore, S. A. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter, 12(2):74–82, 2011.
  • LeCun et al. (1998) LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • Lee et al. (2017) Lee, K. B., Cheon, S., and Kim, C. O. A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes. IEEE Transactions on Semiconductor Manufacturing, 30(2):135–142, 2017.
  • Lin et al. (2003) Lin, J., Keogh, E., Lonardi, S., and Chiu, B. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp. 2–11. ACM, 2003.
  • Liu et al. (2019) Liu, C.-L., Hsaio, W.-H., and Tu, Y.-C. Time series classification with multivariate convolutional neural network. IEEE Transactions on Industrial Electronics, 66(6):4788–4797, 2019.
  • Lockhart et al. (2011) Lockhart, J. W., Weiss, G. M., Xue, J. C., Gallagher, S. T., Grosner, A. B., and Pulickal, T. T. Design considerations for the wisdm smart phone-based sensor mining architecture. In Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data, pp. 25–33. ACM, 2011.
  • Maknickienė et al. (2011) Maknickienė, N., Rutkauskas, A. V., and Maknickas, A. Investigation of financial market prediction by recurrent neural network. Innovative Technologies for Science, Business and Education, 2(11):3–8, 2011.
  • Orozco et al. (2018) Orozco, B. P., Abbati, G., and Roberts, S. Mordred: Memory-based ordinal regression deep neural networks for time series forecasting. arXiv preprint arXiv:1803.09704, 2018.
  • Pérez-D’Arpino & Shah (2015) Pérez-D’Arpino, C. and Shah, J. A. Fast target prediction of human reaching motion for cooperative human-robot manipulation tasks using time series classification. In 2015 IEEE international conference on robotics and automation (ICRA), pp. 6175–6182. IEEE, 2015.
  • Ravi et al. (2017) Ravi, D., Wong, C., Lo, B., and Yang, G.-Z. A deep learning approach to on-node sensor data analytics for mobile or wearable devices. IEEE journal of biomedical and health informatics, 21(1):56–64, 2017.
  • Saad & Mansinghka (2017) Saad, F. A. and Mansinghka, V. K. Temporally-reweighted chinese restaurant process mixtures for clustering, imputing, and forecasting multivariate time series. arXiv preprint arXiv:1710.06900, 2017.
  • Toshev & Szegedy (2014) Toshev, A. and Szegedy, C. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660, 2014.
  • Van Den Oord et al. (2016) Van Den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. W., and Kavukcuoglu, K. Wavenet: A generative model for raw audio. SSW, 125, 2016.
  • Yazdanbakhsh & Dick (2017) Yazdanbakhsh, O. and Dick, S. Forecasting of multivariate time series via complex fuzzy logic. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 47(8):2160–2171, 2017.
  • Yeo (2017) Yeo, K. Model-free prediction of noisy chaotic time series by deep learning. arXiv preprint arXiv:1710.01693, 2017.
  • Young et al. (2018) Young, T., Hazarika, D., Poria, S., and Cambria, E. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3):55–75, 2018.
  • Zhang (2003) Zhang, G. P. Time series forecasting using a hybrid arima and neural network model. Neurocomputing, 50:159–175, 2003.
  • Zhang et al. (2018) Zhang, Z., Zhang, and Khelifi. Multivariate Time Series Analysis in Climate and Environmental Research. Springer, 2018.
  • Zheng et al. (2016) Zheng, Y., Liu, Q., Chen, E., Ge, Y., and Zhao, J. L. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Frontiers of Computer Science, 10(1):96–112, 2016.