1 Introduction
Time series are observations recorded sequentially over time. In a univariate time series, data are collected from a single source, so each observation is a scalar; in a multivariate time series, observations are recorded from multiple sources simultaneously, so each data point is a multidimensional vector. Time series applications arise in a broad range of domains, including health care
(Jones et al., 2009), climate (Zhang et al., 2018), robotics (Pérez-D'Arpino & Shah, 2015) and stock markets (Maknickienė et al., 2011). Various statistical and machine learning algorithms have been proposed for time series classification (Yazdanbakhsh & Dick, 2017; Che et al., 2018; Saad & Mansinghka, 2017; Dobigeon et al., 2007; Zhang, 2003). A general approach to time series classification is to split the time series into equal-size segments using a fixed-length sliding window and extract handcrafted features from the segments for the classification task. The features are usually statistical measurements or features extracted from another domain, such as the Fourier and wavelet domains (Jiang & Yin, 2015; Ravi et al., 2017; Lin et al., 2003). In multivariate time series classification, information is commonly extracted from each variate separately, and the features are concatenated for the classification task (Zheng et al., 2016; Liu et al., 2019; Chambon et al., 2018). However, this may ignore the relations between the data items that comprise each component of an observation.
Deep learning approaches substitute handcrafted features with features learned automatically from data. In convolutional neural networks (CNNs), the features are obtained by learning filters convolved with small subregions of the data (Krizhevsky et al., 2012; LeCun et al., 1998). The performance of a wide range of applications has been improved by replacing traditional approaches with deep learning methods (Van Den Oord et al., 2016; Toshev & Szegedy, 2014; Young et al., 2018). Various deep learning methods have been employed for time series classification (Zheng et al., 2016; Liu et al., 2019; Gao et al., 2018). However, in most studies, handcrafted features are still injected into the network to improve its performance (Ravi et al., 2017; Ignatov, 2018).
In this paper, we propose a convolution-based learning algorithm for multivariate time series classification. In this algorithm, a multivariate time series is treated as a one-channel image in which each row corresponds to one of the variates of the time series. The CNN employed in this work is inspired by the WaveNet architecture (Van Den Oord et al., 2016), which applied dilated convolution operations to audio synthesis. We show that this structure is able to learn the relations between variates in the time series and use their mutual information to reduce noise and improve performance. The method is applied to two human activity recognition time series (WISDM v.1.1 (Kwapisz et al., 2011) and WISDM v.2 (Lockhart et al., 2011)), and its performance is compared against published results on these time series.
2 Related Work
Various approaches have been proposed for time series analysis. (Zhang, 2003) combined autoregressive integrated moving average (ARIMA) and neural network models for time series prediction. (Yazdanbakhsh & Dick, 2017) used neuro-complex fuzzy systems for multivariate time series prediction. (Che et al., 2018) applied an extension of recurrent neural networks (RNNs) to multivariate time series classification with missing values. (Yeo, 2017) employed a long short-term memory (LSTM) model for chaotic time series forecasting. (Orozco et al., 2018) developed an ordinal regression deep neural network based on LSTM. (Saad & Mansinghka, 2017) proposed a Bayesian nonparametric method for multivariate time series forecasting.
Convolution-based approaches have been proposed for time series analysis as well. (Borovykh et al., 2017) used the WaveNet model for conditional multivariate time series prediction, where the forecast of each variate is conditioned on the other variates in the time series. (Zheng et al., 2016) designed a multi-channel CNN in which each channel takes one variate of a multivariate time series as input and learns features individually; classification is done by applying a multilayer perceptron network to the combined features of the channels. (Lee et al., 2017) applied a CNN to fault classification and diagnosis in semiconductor manufacturing, where the relations between variates are explored only over time. (Liu et al., 2019) performed multivariate time series classification in a four-stage process: the time series is converted to a 3-D tensor and then passed through a univariate convolution stage, a multivariate convolution stage and a fully connected stage. (Gao et al., 2018) studied multivariate time series by partitioning the data into groups based on the covariance structure of the time series. (Chambon et al., 2018) employed deep learning for sleep stage classification; features are extracted separately from each channel of polysomnography (PSG) signals and combined to give a classification label. (Gamboa, 2017) reviewed deep learning approaches for time series analysis. (Fawaz et al., 2019) benchmarked several well-known deep learning algorithms for univariate and multivariate time series classification.
Moreover, the WISDM time series have been studied using deep learning approaches. (Ravi et al., 2017) applied deep learning algorithms to spectrogram features of the time series. (Ignatov, 2018) extracted features from each variate of the time series separately using a CNN; moreover, statistical features are added to the network as additional information.
3 Methods
Our proposed system is shown in Figure 1. It has three components: 1) an image module, 2) a CNN module and 3) a fully connected module. A detailed description of each module follows:
3.1 Image Module
In this module, the multivariate time series is first split into equal-size segments. A sliding window is moved over the time series to give batches of the time series; the choice between an overlapping and a non-overlapping sliding window is application-dependent. Each segment is a multidimensional array of size (d × w), where w is the length of the sliding window and d is the number of variates in the time series. Given the segments of the time series, we transform each segment into a one-channel image in which each variate of the segment forms a row. The outputs of this module are one-channel images of size (d × w).
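As an illustration, the segmentation step can be sketched in NumPy as follows (function and variable names are ours; the authors' implementation details are not specified):

```python
import numpy as np

def segment_series(series, window, step=None):
    """Split a multivariate time series of shape (d, T) into
    one-channel 'images' of shape (d, window).

    A non-overlapping window is the special case step == window;
    a smaller step yields overlapping segments."""
    d, T = series.shape
    step = step or window  # default: non-overlapping
    starts = range(0, T - window + 1, step)
    return np.stack([series[:, s:s + window] for s in starts])

# Example: 3 accelerometer channels, 1000 samples
x = np.random.randn(3, 1000)
segments = segment_series(x, window=100)          # non-overlapping
print(segments.shape)                             # (10, 3, 100)
overlapping = segment_series(x, window=200, step=20)
print(overlapping.shape)                          # (41, 3, 200)
```

Each returned slice is one (d × w) one-channel image ready for the CNN module.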
3.2 CNN Module
The CNN module comprises stacks of dilated convolution and strided convolution layers with ReLU activations. Dilated convolution layers extract features from the segments of the multivariate time series obtained from the image module, and strided convolution layers reduce the spatial dimension. The first layer in the CNN module is a dilated convolution layer; its filters slide over a one-channel image of size (d × w) with stride 1 in both the horizontal and vertical directions. Sliding the filters in both directions extracts features within and between variates simultaneously, so there is no need to study each variate individually at the beginning of the process. Moreover, employing a dilation rate in the convolution allows the model to learn relations between observations that are far apart. The next layers are stacks of strided and dilated convolution layers that process the obtained feature maps. Filters in the strided convolution layers are applied to each row separately; they therefore reduce only the number of columns of the feature maps and preserve the number of rows.
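The effect of dilation can be seen in a minimal NumPy implementation of a 'valid' dilated, strided 2-D convolution (our own sketch for illustration; the actual layers would be standard framework implementations):

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation=(1, 1), stride=(1, 1)):
    """'Valid' 2-D cross-correlation with dilation and stride."""
    kh, kw = kernel.shape
    dh, dw = dilation
    sh, sw = stride
    # Effective kernel extent after inserting (dilation - 1) gaps
    eh, ew = dh * (kh - 1) + 1, dw * (kw - 1) + 1
    H, W = image.shape
    out = np.zeros(((H - eh) // sh + 1, (W - ew) // sw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i*sh : i*sh + eh : dh, j*sw : j*sw + ew : dw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A width-3 kernel with dilation rate 2 covers 5 columns, so distant
# observations interact without adding parameters.
img = np.arange(25.0).reshape(5, 5)
k = np.ones((1, 3))
print(dilated_conv2d(img, k, dilation=(1, 2)).shape)  # (5, 1)
```

The output at each position sums input columns two steps apart, which is how dilation widens the receptive field along the time axis.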
3.3 Fully Connected Module
In this module, fully connected layers are applied to the output of the CNN module, and then a softmax layer computes the probabilities of the predicted classes. Cross-entropy is used as the loss function.
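The softmax and cross-entropy computation can be written compactly in NumPy (a minimal sketch; the array values are purely illustrative):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -np.log(softmax(logits)[label])

logits = np.array([2.0, 1.0, 0.1])  # scores for 3 hypothetical classes
probs = softmax(logits)
print(probs)                         # class probabilities summing to 1
print(cross_entropy(logits, 0))      # loss when class 0 is correct
```

Minimizing this loss pushes the probability mass of each segment toward its true activity label.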
4 Experiments
To evaluate the performance of the proposed model, we apply the model to two public multivariate time series containing accelerometer data. The results are compared against published results on these datasets.
4.1 Time Series
WISDM v1.1 (Kwapisz et al., 2011) contains accelerometer data (x, y, z channels) collected from 36 users carrying an Android phone in their front pocket while performing a set of daily activities: walking, jogging, upstairs, downstairs, sitting and standing. Table 1 shows the details of this data set.
We create two data sets from this time series so that we can compare our results with published ones. The first data set (v1-split) consists of non-overlapping segments of length 100 (5 s); the segments are split 80%/20% into training and testing sets, giving 8347 training and 2643 testing instances. The second data set (v1-individual) uses the data of 28 users as the training set and 8 users as the testing set; the training and testing sets are segmented with a sliding window of length 200 (10 s) and step size 20, giving 41729 training and 13162 testing instances.
WISDM v.2 (Lockhart et al., 2011) contains x, y, z accelerometer channels collected from 536 users while walking, jogging, climbing stairs, sitting, standing and lying down. Table 2 shows the details of this time series.
Non-overlapping segments of length 200 (10 s) are obtained from the time series; the segments are split 80%/20% into training and testing sets, giving 10396 training and 4456 testing segments.
4.2 Network Analysis
The network designed for WISDM v1-individual has six layers: DL (filter size = (3,20), number of filters = 32, dilation rate = (1,2)) → SL (filter size = (1,4), number of filters = 32, stride = (1,4)) → DL (filter size = (3,3), number of filters = 32, dilation rate = (1,2)) → SL (filter size = (1,4), number of filters = 32, stride = (1,4)) → FL (units = 1024) → FL (units = 6), where DL, SL and FL stand for dilated convolution layer, strided convolution layer and fully connected layer, respectively.
The network designed for WISDM v1-split is as follows: DL (filter size = (3,10), number of filters = 32, dilation rate = (1,2)) → SL (filter size = (1,4), number of filters = 32, stride = (1,4)) → DL (filter size = (3,3), number of filters = 32, dilation rate = (1,2)) → SL (filter size = (1,2), number of filters = 32, stride = (1,2)) → FL (units = 1024) → FL (units = 6).
To train the networks, we use a batch size of 256 on a single GPU with the Adam optimizer (Kingma & Ba, 2014). We also apply regularization, with a separate weight for v1-individual and for v1-split.
The network designed for WISDM v.2 is trained in the same way as WISDM v.1.1, with its own regularization weight; the model has 8 layers: DL (filter size = (3,10), number of filters = 32, dilation rate = (1,2)) → SL (filter size = (1,2), number of filters = 32, stride = (1,2)) → DL (filter size = (3,3), number of filters = 32, dilation rate = (1,2)) → SL (filter size = (1,2), number of filters = 32, stride = (1,2)) → DL (filter size = (3,3), number of filters = 64, dilation rate = (1,1)) → SL (filter size = (1,2), number of filters = 64, stride = (1,2)) → FL (units = 512) → FL (units = 6).
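The column widths of the feature maps through such a stack can be traced with standard convolution arithmetic. The sketch below assumes 'valid' padding along the time axis (the paper does not state its padding scheme) and tracks only the column dimension, since the strided filters have height 1 and the text states that row counts are preserved; the layer names follow the WISDM v.2 stack above:

```python
def conv_out(width, kernel, dilation=1, stride=1):
    """Output width of a 'valid' convolution along one axis."""
    effective = dilation * (kernel - 1) + 1  # dilated kernel extent
    return (width - effective) // stride + 1

# Column widths through the WISDM v.2 stack, starting from
# segments of length 200 (kernel, dilation, stride per layer).
width = 200
for name, k, d, s in [("DL1", 10, 2, 1), ("SL1", 2, 1, 2),
                      ("DL2", 3, 2, 1), ("SL2", 2, 1, 2),
                      ("DL3", 3, 1, 1), ("SL3", 2, 1, 2)]:
    width = conv_out(width, k, d, s)
    print(name, width)
```

Under these assumptions the width shrinks 200 → 182 → 91 → 87 → 43 → 41 → 20 before the fully connected layers.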
Table 3 shows the performance of our network in terms of F1-score (Goutte & Gaussier, 2005) on WISDM v1-split. We compare our results against (Ravi et al., 2017), which passes spectrogram features of non-overlapping segments of the time series to a deep learning algorithm.
Table 3. F1-score comparison on WISDM v1-split.

Activity     Our Method   (Ravi et al., 2017)
Walking      97.4%        99.3%
Jogging      98.3%        99.5%
Upstairs     86.4%        95.3%
Downstairs   80.5%        95.1%
Sitting      98%          98.2%
Standing     94.9%        97.6%
Table 4 shows our results for WISDM v1-individual in terms of recall (Goutte & Gaussier, 2005) and compares them to (Ignatov, 2018), which creates training and testing segments based on individuals; that network uses a CNN to extract features from each variate separately and combines them with statistical features for classification. Table 5 shows the performance of our method on WISDM v.2 in terms of F1-score and compares it with (Ravi et al., 2017).
Table 4. Recall comparison on WISDM v1-individual.

Activity     Our Method   (Ignatov, 2018)
Walking      97.8%        97.8%
Jogging      95.5%        98.5%
Upstairs     78.5%        72.2%
Downstairs   75.1%        87.0%
Sitting      90.9%        82.6%
Standing     91.8%        93.3%
Table 5. F1-score comparison on WISDM v.2.

Activity     Our Method   (Ravi et al., 2017)
Walking      96.6%        97.2%
Jogging      96.9%        97.9%
Stairs       63.1%        79.3%
Sitting      91.2%        88.2%
Standing     87.2%        82.1%
Lying Down   90.7%        87.2%
Tables 3 and 5 indicate that the features extracted automatically by our model can be as effective as spectrogram features. On WISDM v.2, our accuracy is 1% lower for the walking and jogging activities, and our model outperforms (Ravi et al., 2017) on the static activities of sitting, standing and lying down (3.9% higher on average). However, the model cannot compete on the stairs activity, where we have the fewest instances (1.9%). On WISDM v1-split, (Ravi et al., 2017) outperforms our model in all activities; our accuracy is 8.9% and 14.6% lower for upstairs and downstairs, respectively, and 1.5% lower on average for the other activities. Table 4 indicates that our model outperforms the combination of statistical features and CNN features (Ignatov, 2018) for the upstairs and sitting activities, and has similar performance on walking. For jogging, downstairs and standing, our recall is 5.5% lower on average.
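The per-class F1-score and recall used in Tables 3–5 can be computed as follows (a minimal NumPy sketch with toy labels, not the authors' evaluation code):

```python
import numpy as np

def recall(y_true, y_pred, cls):
    """Fraction of true instances of `cls` that were predicted as `cls`."""
    true_cls = (y_true == cls)
    return np.mean(y_pred[true_cls] == cls)

def f1(y_true, y_pred, cls):
    """Harmonic mean of per-class precision and recall."""
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    precision = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * precision * rec / (precision + rec)

# Toy labels for three classes (e.g. walking=0, jogging=1, stairs=2)
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])
print(f1(y_true, y_pred, 1))      # per-class F1
print(recall(y_true, y_pred, 1))  # per-class recall
```

Both metrics are computed one class at a time, which is why the tables report a separate score for each activity.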
5 Conclusion
We have proposed a new algorithm for multivariate time series classification. After splitting a multivariate time series into equal-size segments, we convert the segments to one-channel images; the one-channel images are then processed using stacks of dilated and strided convolutions. The features extracted for the classification task take into account both the inter- and intra-variate relations. Our experiments show that the proposed model can be as effective as models working with handcrafted features such as spectrogram and statistical features.
References
 Borovykh et al. (2017) Borovykh, A., Bohte, S., and Oosterlee, C. W. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
 Chambon et al. (2018) Chambon, S., Galtier, M. N., Arnal, P. J., Wainrib, G., and Gramfort, A. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(4):758–769, 2018.
 Che et al. (2018) Che, Z., Purushotham, S., Cho, K., Sontag, D., and Liu, Y. Recurrent neural networks for multivariate time series with missing values. Scientific reports, 8(1):6085, 2018.
 Dobigeon et al. (2007) Dobigeon, N., Tourneret, J.-Y., and Scargle, J. D. Joint segmentation of multivariate astronomical time series: Bayesian sampling with a hierarchical model. IEEE Transactions on Signal Processing, 55(2):414–423, 2007.
 Fawaz et al. (2019) Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L., and Muller, P.-A. Deep learning for time series classification: a review. Data Mining and Knowledge Discovery, pp. 1–47, 2019.
 Gamboa (2017) Gamboa, J. C. B. Deep learning for time-series analysis. arXiv preprint arXiv:1701.01887, 2017.
 Gao et al. (2018) Gao, J., Murphey, Y. L., and Zhu, H. Multivariate time series prediction of lane changing behavior using deep neural network. Applied Intelligence, 48(10):3523–3537, 2018.
 Goutte & Gaussier (2005) Goutte, C. and Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European Conference on Information Retrieval, pp. 345–359. Springer, 2005.
 Ignatov (2018) Ignatov, A. Real-time human activity recognition from accelerometer data using convolutional neural networks. Applied Soft Computing, 62:915–922, 2018.
 Jiang & Yin (2015) Jiang, W. and Yin, Z. Human activity recognition using wearable sensors by deep convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1307–1310. ACM, 2015.
 Jones et al. (2009) Jones, S. S., Evans, R. S., Allen, T. L., Thomas, A., Haug, P. J., Welch, S. J., and Snow, G. L. A multivariate time series approach to modeling and forecasting demand in the emergency department. Journal of biomedical informatics, 42(1):123–139, 2009.
 Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105, 2012.
 Kwapisz et al. (2011) Kwapisz, J. R., Weiss, G. M., and Moore, S. A. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter, 12(2):74–82, 2011.
 LeCun et al. (1998) LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 Lee et al. (2017) Lee, K. B., Cheon, S., and Kim, C. O. A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes. IEEE Transactions on Semiconductor Manufacturing, 30(2):135–142, 2017.
 Lin et al. (2003) Lin, J., Keogh, E., Lonardi, S., and Chiu, B. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp. 2–11. ACM, 2003.
 Liu et al. (2019) Liu, C.-L., Hsaio, W.-H., and Tu, Y.-C. Time series classification with multivariate convolutional neural network. IEEE Transactions on Industrial Electronics, 66(6):4788–4797, 2019.
 Lockhart et al. (2011) Lockhart, J. W., Weiss, G. M., Xue, J. C., Gallagher, S. T., Grosner, A. B., and Pulickal, T. T. Design considerations for the WISDM smart phone-based sensor mining architecture. In Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data, pp. 25–33. ACM, 2011.
 Maknickienė et al. (2011) Maknickienė, N., Rutkauskas, A. V., and Maknickas, A. Investigation of financial market prediction by recurrent neural network. Innovative Technologies for Science, Business and Education, 2(11):3–8, 2011.
 Orozco et al. (2018) Orozco, B. P., Abbati, G., and Roberts, S. Mordred: Memory-based ordinal regression deep neural networks for time series forecasting. arXiv preprint arXiv:1803.09704, 2018.
 Pérez-D'Arpino & Shah (2015) Pérez-D'Arpino, C. and Shah, J. A. Fast target prediction of human reaching motion for cooperative human-robot manipulation tasks using time series classification. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 6175–6182. IEEE, 2015.
 Ravi et al. (2017) Ravi, D., Wong, C., Lo, B., and Yang, G.-Z. A deep learning approach to on-node sensor data analytics for mobile or wearable devices. IEEE Journal of Biomedical and Health Informatics, 21(1):56–64, 2017.
 Saad & Mansinghka (2017) Saad, F. A. and Mansinghka, V. K. Temporally-reweighted Chinese restaurant process mixtures for clustering, imputing, and forecasting multivariate time series. arXiv preprint arXiv:1710.06900, 2017.
 Toshev & Szegedy (2014) Toshev, A. and Szegedy, C. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660, 2014.
 Van Den Oord et al. (2016) Van Den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. W., and Kavukcuoglu, K. Wavenet: A generative model for raw audio. SSW, 125, 2016.
 Yazdanbakhsh & Dick (2017) Yazdanbakhsh, O. and Dick, S. Forecasting of multivariate time series via complex fuzzy logic. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 47(8):2160–2171, 2017.
 Yeo (2017) Yeo, K. Model-free prediction of noisy chaotic time series by deep learning. arXiv preprint arXiv:1710.01693, 2017.
 Young et al. (2018) Young, T., Hazarika, D., Poria, S., and Cambria, E. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3):55–75, 2018.
 Zhang (2003) Zhang, G. P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50:159–175, 2003.
 Zhang et al. (2018) Zhang, Z., Zhang, and Khelifi. Multivariate Time Series Analysis in Climate and Environmental Research. Springer, 2018.
 Zheng et al. (2016) Zheng, Y., Liu, Q., Chen, E., Ge, Y., and Zhao, J. L. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Frontiers of Computer Science, 10(1):96–112, 2016.