1 Introduction
Over the past few decades, with the ever-accelerating development of machine learning and deep learning, many previously intractable problems in finance, healthcare, media, and other application fields can be cast as time series classification (TSC) problems and then solved with advanced deep learning tools; examples include disease diagnosis from time series of physiological parameters, classifying heart arrhythmias from ECG signals [1], and human activity recognition [2]. Among these tools, Deep Neural Networks (DNNs) such as Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) [3] and one-dimensional Convolutional Neural Networks (CNNs) [4, 5, 6] have achieved state-of-the-art results. However, when a large labeled training set is not available, which is often the case with time series data, these DNNs overfit severely [7, 8]. For example, even though CNNs can achieve impressive performance when combined with the Dynamic Time Warping nearest-neighbor classifier (1NN-DTW) [9], they suffer from overfitting in TSC tasks, and the problem worsens when samples are scarce or the patterns in the data are time-variant [10]. In short, because time series data are difficult to collect and annotate, DNNs can rarely be applied to small-scale time series data sets [11]. Therefore, studies that solve TSC with other techniques have appeared over the past decade. Although many metrics have been proposed in previous works (e.g., Dynamic Time Warping (DTW) [12], edit distance [13], elastic distance [14]), they concentrate only on single-view [15] or univariate time series (u.t.s.) classification tasks [16, 17] rather than on multi-view time series. Moreover, since these traditional methods rely heavily on large sample sizes and label sets, they tend to perform poorly in both model efficiency and accuracy.
Thanks to the increasing number and variety of sensors, information about the same object can be collected from multiple perspectives. Such mutually enriching and supporting information can greatly help machine learning tasks by offering higher-quality, more diverse information and thus improving model performance [18]. Compared with traditional single-view methods, multi-view learning yields better results and has received increasing attention over the past few years [19, 20]. Recent popular multi-view learning methods include co-training, multiple kernel learning, and subspace learning [19]. Hence, when applicable, multi-view data are usually preferred over single-view data. However, in TSC tasks, either the existing methods handle only single-view data, ignoring the benefits of mutually supporting views, or the existing multi-view learning methods cannot be applied directly to multi-view time series, because many of its unique properties are ignored.
On the other hand, transfer learning has received considerable research attention recently. It studies how to store knowledge gained while solving one problem and apply it to a different but related problem. In other words, transfer learning methods can learn from a source task that has sufficient labeled data [21]. Such methods yield satisfying performance in computer vision [22, 23] and social media analytics [24, 25, 26]. However, comparatively little work has examined their performance on time series classification problems. In light of all these challenges, we propose a novel approach that handles classification tasks on multi-view time series data sets through transfer learning. Overall, our contributions are as follows:

We propose a dynamic inter-view importance measurement that captures the correlation between different views more robustly and enhances interpretability when combined with knowledge transfer.

We combine cutting-edge density estimation techniques with classical univariate and multivariate time series distance measurements: the density estimation tools approximate the posterior distribution of the similarity features captured by the time series distance algorithms.

We propose the concept of an adaptive transfer degree for multi-view time series data, sampled from the approximated posterior distribution. During training on the source domains and views, this transfer degree controls how much knowledge is transferred.

We validate our framework on several widely used time series classification models and report results on several open data sets, showing that the proposed method generalizes well and can significantly improve classification accuracy.
2 Related Work
Since deep learning models have exhibited impressive performance in many application fields over the past decades, they have also been applied to TSC problems. A three-layer Fully Convolutional Network with an average pooling layer designed for time series classification was introduced in [10]. Fawaz et al. [3] proposed new data augmentation techniques to avoid overfitting. In [27], the authors modified the cost function to make the FCN model more sensitive to skewed time series data sets. For forecasting time series of spatial processes, a dynamical spatio-temporal model was proposed in [28]. For limited time series data sets, such as Electronic Health Records (EHRs) sequential data, Che et al. [29] recently trained a generative adversarial network with a CNN to generate satisfactory risk predictions. Despite all these applications of deep learning to time series data across various domains, obstacles to applying DNNs to other data resources remain; the fundamental challenge is still the availability of big data across different domains [30]. On the other hand, transfer learning has become a heated research topic and has also been applied to time series data mining tasks. In this setting, a model is learned simultaneously on the source and target domains to minimize the effect of cross-domain discrepancy on the learned model [31, 32].
For time series anomaly detection, the authors of [33] proposed a transfer learning approach that uses a 1NN-DTW classifier to select which time series to transfer from the source to the target data set. Beyond classification, transfer learning has also been applied to time series forecasting [34], where the authors used a model trained on the historical wind-speed data of an old farm to predict wind speed at another farm. Similar techniques have appeared in time series recognition tasks: a model was trained under similar conditions for acoustic phoneme recognition before being applied to post-traumatic stress disorder diagnosis [35].
Meanwhile, the notion of time series itself has broadened: sensor development enables improvements in model performance by taking multi-view time series data sets into consideration, and these have received wide attention across many application domains in recent years. Because the multiple views support one another and thus enrich the information about the object, multi-view time series have been applied to many machine learning problems, such as clustering [36, 37] and classification [38, 39] via deep multi-view representation learning.
3 Problem Definition
In this section, we present formal definitions of the time series classification problem and the multi-view time series classification problem. We start with the simplest definition, that of a univariate time series.
A univariate time series is an ordered sequence of real values; its length is the number of values it contains.
We then define the multivariate time series.
A multivariate time series is an ordered sequence of fixed-dimensional vectors, where each vector is the observation at one timestamp and the number of vectors is the length of the time series. Owing to the increasing number of sensors, information about the same object can be collected from multiple perspectives. We therefore give a formal definition of multi-view multivariate time series.
Multi-View Multivariate Time Series. A multi-view m.t.s. is a set of time series data collected from multiple views, characterized by the time series observed in each view, the number of measurements per time series, the length of the time series, and the total number of views.
Next we introduce the formal definition of Transfer Learning Based MultiView Multivariate Time Series Classification.
Transfer Learning Based Multi-View Multivariate Time Series Classification. Consider a set of multi-view m.t.s. together with a set of class labels shared by all views. Specifically, we define the last view as the target view and all remaining views as source views. The task of transfer learning based multi-view m.t.s. classification is first to learn a classifier from the source views and then to transfer the learned knowledge and features (the network's weights) to the target view and task.
For any set, we also use notation for that set with one given element removed.
Given the definitions above, it is natural to expect that a framework for multi-view multivariate TSC may consist of solutions to both multivariate TSC and univariate TSC subtasks. Here we introduce another technique that we apply in the following sections.
The technique of Importance Sampling (IS) can be used to improve sampling efficiency. Its basic idea is straightforward: instead of sampling from the nominal distribution, we draw samples from an alternative distribution and assign an appropriate weight to each sample. Most recent works focus on its application to stochastic gradient descent. Applying this perspective to different views, we construct a dynamic inter-view importance metric that measures how much each source view contributes to the target view.
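As a toy illustration of the importance-sampling idea (unrelated to the paper's estimator), the following sketch estimates an expectation under one Gaussian using samples drawn from a wider alternative distribution and reweighting each sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: E_p[X^2] under the nominal p = N(0, 1); the true value is 1.
# We instead sample from a wider alternative q = N(0, 2^2) and reweight.
n = 200_000
sigma_q = 2.0
x = rng.normal(0.0, sigma_q, size=n)

def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

# Importance weights w(x) = p(x) / q(x)
w = np.exp(log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, 0.0, sigma_q))

estimate = np.mean(w * x ** 2)
print(estimate)  # close to 1.0
```

The same reweighting view is what lets a value sampled from an alternative (here, approximated posterior) distribution serve as an importance score.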
A Dynamic Inter-View Importance Score is a metric that indicates one view's importance to another view via values sampled from the alternative distribution (for this task, the approximated posterior distribution of the density in latent space).
The notion of dynamic inter-view importance is similar, to some degree, to view-level similarity, but it carries more uncertainty.
4 Adaptive Transfer Learning of MultiView Time Series Classification
In this section, we present our novel approach. We first give a brief overview of the framework and then elaborate on the details.
Since inter-view importance is hard to define on multi-view time series data sets, we propose a dynamic measurement constructed by the following steps. First, we compute similarities between corresponding multivariate time series in different views. After computing all the pairwise similarities, we map them into a latent space and apply density estimation methods, such as kernel density estimation or normalizing flows, to approximate the posterior similarity distributions. Then, from an importance-sampling perspective, a value sampled from the approximated posterior distribution of inter-view importance serves as the importance value for the target view, which controls the knowledge transfer degree of each source view in the pre-training process.
4.1 Computation of InterView Importance Value
To implement transfer learning on multi-view time series data, we need to capture inter-view importance. In this part, we list several candidate measurements. Among them, we choose Dynamic Time Warping (DTW) and Bag of SFA Symbols (BOSS) to calibrate the inter-view importance, as these two measures show great performance in univariate time series cases; we expect this performance to carry over.
The complete procedure for computing inter-view importance values is as follows.
Consider a source view and a target view whose time series are ordered correspondingly; that is, series at the same position in the source and target views correspond to each other. Notice that these series are multivariate, so before calculating similarities we decompose each multivariate time series into its univariate component series along every dimension shared by both views. We then compute the pairwise distances between the corresponding decomposed univariate time series of the two views:
(1) 
Here, we list several widely used time series distance measurements; recall that we chose Dynamic Time Warping (DTW) and Bag of SFA Symbols (BOSS) in this paper.

Dynamic Time Warping (DTW): can yield better performance when the series lengths differ. The dynamic time warping distance is given as
(2) $\mathrm{DTW}(x, y) = \min_{\pi} \sqrt{\sum_{(i, j) \in \pi} (x_i - y_j)^2},$
where $\pi$ represents the set of all admissible alignment paths, i.e., sequences of index pairs. Under most circumstances, shape-based approaches give better results on small-scale time series data sets with little noise and few outliers.
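A minimal dynamic-programming sketch of the DTW distance described above (without the lower-bounding and windowing optimizations that production implementations use):

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between two 1-D sequences via dynamic programming.
    D[i, j] holds the minimal cumulative squared cost of aligning
    x[:i] with y[:j]; the answer is the square root of D[n, m]."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])  # same shape, shifted in time
print(dtw_distance(a, b))  # 0.0: DTW aligns the time shift away
```

Because the warping path may repeat indices, the shifted copy `b` incurs zero cost even though its length differs, which is exactly why DTW suits series of unequal length.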

Bag of SFA Symbols (BOSS):
BOSS slides windows over each series to form words. Instead of PAA [22], it applies a truncated Discrete Fourier Transform (DFT) to each window, and the truncated coefficients are discretized through a technique called Multiple Coefficient Binning (MCB). The words obtained over all windows form a word distribution for each series.
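A heavily simplified, BOSS-style word extraction might look as follows; note that the equal-width binning here is only a stand-in for MCB (which learns bin edges from training data), and all names are illustrative:

```python
import numpy as np
from collections import Counter

def sfa_words(series, window=16, n_coefs=3, n_bins=4):
    """Simplified BOSS-style transform: slide a window over the series,
    keep the magnitudes of the first few non-DC Fourier coefficients,
    discretize each coefficient into alphabet symbols, and count words."""
    windows = np.lib.stride_tricks.sliding_window_view(series, window)
    # Truncated DFT: magnitudes of the first n_coefs non-DC coefficients
    coefs = np.abs(np.fft.rfft(windows, axis=1))[:, 1:n_coefs + 1]
    words = []
    for row in coefs:
        symbols = []
        for j, c in enumerate(row):
            # Equal-width bins per coefficient (a crude stand-in for MCB)
            lo, hi = coefs[:, j].min(), coefs[:, j].max()
            edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
            symbols.append("abcd"[np.searchsorted(edges, c)])
        words.append("".join(symbols))
    return Counter(words)  # the word distribution ("bag")

rng = np.random.default_rng(1)
s = np.sin(np.linspace(0, 8 * np.pi, 128)) + 0.05 * rng.normal(size=128)
hist = sfa_words(s)
print(hist.most_common(3))
```

Two series can then be compared through their word histograms, which is the similarity that BOSS-based distances build on.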
Each decomposed univariate time series distance between the source and target views is a sound measure of importance value, and all of these measurements are then transformed, adjusted by the length of the multivariate time series and the number of views.
(3)
We compute all of the above pairwise distances and collect the values into an observation set in a latent space. After mapping all elements into the latent space, we are ready to construct an approximated posterior distribution of importance values.
4.2 Density Estimation
In this part, we approximate a posterior distribution that describes the importance relationships between the source views and the target view. Below, we separately elaborate on the high-dimensional and low-dimensional scenarios.

In a high-dimensional scenario, a normalizing flow model is constructed as an invertible transformation that maps the observed data points in latent space to a standard Gaussian latent variable, as in nonlinear Independent Component Analysis. Stacking simple invertible transformations is the key idea in designing a flow model: the overall map is composed of a series of invertible flows, each with a tractable Jacobian determinant. This makes sampling efficient, since it can be performed by inverting the flows one by one, and likewise training by maximum likelihood, because the model density is easy to compute and differentiate with respect to the flows [43] via the change of variables formula
(4) $\log p(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|.$
When computing the Jacobian determinant, we set a threshold for adding a stochastic perturbation to balance computational complexity and precision, in case the matrices are singular. The trained flow model can be viewed as a maximum a posteriori estimate of the inter-view importance values in latent space.
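As a minimal illustration of the change-of-variables bookkeeping, the following toy flow uses a single elementwise affine layer; real flow models stack richer invertible couplings, but the log-likelihood arithmetic is the same. All names are illustrative:

```python
import numpy as np

class AffineFlow:
    """Toy invertible flow z = exp(log_s) * x + t, applied elementwise.
    The Jacobian of an elementwise affine map is diagonal, so its
    log-determinant is simply the sum of the log-scales."""
    def __init__(self, log_s, t):
        self.log_s = np.asarray(log_s, dtype=float)
        self.t = np.asarray(t, dtype=float)

    def forward(self, x):   # data -> latent
        return np.exp(self.log_s) * x + self.t

    def inverse(self, z):   # latent -> data (used for sampling)
        return (z - self.t) * np.exp(-self.log_s)

    def log_prob(self, x):
        """log p(x) = log N(f(x); 0, I) + log |det df/dx|."""
        z = self.forward(x)
        d = z.shape[-1]
        log_base = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)
        log_det = np.sum(self.log_s)
        return log_base + log_det

flow = AffineFlow(log_s=[0.5, -0.3], t=[1.0, -2.0])
x = np.array([[0.2, 0.7]])
# Invertibility: inverse(forward(x)) recovers x exactly
print(flow.inverse(flow.forward(x)), flow.log_prob(x))
```

Sampling proceeds by drawing a standard Gaussian latent vector and pushing it through `inverse`, mirroring the sampling step described in Section 4.3.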

In low-dimensional scenarios, kernel density estimation is a good fit. The univariate kernel density estimator for a continuous variable, based on a sample $x_1, \dots, x_n$ and evaluated at a point $x$, can be expressed as
(5) $\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$
where $K$ is the kernel function, a symmetric weighting function, and $h$ is the smoothing parameter or bandwidth.
For multivariate kernel density estimation, let $x_1, \dots, x_n$ be a sample of $d$-variate random vectors drawn from a common distribution with density function $f$. The kernel density estimate is
(6) $\hat{f}_H(x) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i),$
where $H$ is the bandwidth (or smoothing) matrix, which is symmetric and positive definite, and the kernel $K_H$ can be written in terms of $K$ as
(7) $K_H(x) = |H|^{-1/2}\, K\!\left(H^{-1/2} x\right),$
which is plainly a symmetric multivariate density. For simplicity, we directly choose the standard multivariate normal kernel,
(8) $K(x) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2} x^{\top} x\right).$
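The univariate estimator (5) with a Gaussian kernel can be sketched directly; the function name and the synthetic data below are illustrative:

```python
import numpy as np

def gaussian_kde_1d(samples, x_eval, bandwidth):
    """Univariate Gaussian kernel density estimate:
    f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h)."""
    samples = np.asarray(samples, dtype=float)
    u = (x_eval[:, None] - samples[None, :]) / bandwidth
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # standard normal kernel
    return K.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=5000)
grid = np.linspace(-4.0, 4.0, 161)
density = gaussian_kde_1d(data, grid, bandwidth=0.3)

# A valid density estimate integrates to ~1 and peaks near the true mode (0)
integral = density.sum() * (grid[1] - grid[0])
print(integral, grid[np.argmax(density)])
```

The bandwidth trades bias for variance: a small `h` tracks sampling noise, while a large `h` oversmooths the estimate.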
4.3 Importance Value Sampling and Matrix Norm Computation
After constructing the posterior distribution, we can conduct importance value sampling. We start by sampling importance values in mini-batches: a batch of fixed size is drawn from the standard Gaussian base distribution,
(9)
and then passed through the trained normalizing flow. In the low-dimensional scenario, we instead draw samples from the approximated kernel density. In this way, we acquire new batches of dynamic inter-view importance values from the approximated distribution. Putting all the sampled vectors into a matrix, we have
(10) 
Unfolding each sampled vector, we have
(11) 
Here we compute a norm per matrix. The elements of these sample-composed matrices carry the importance value information in every dimension, which can be accumulated via the matrix norm. We then compute the matrix norm by
(12)
Finally, we arrive at the output: the desired probabilistic representation of the inter-view importance value between a source view and the target view, which describes the degree of knowledge transfer in the pre-training process. We elaborate on how these dynamic importance values control the degree of knowledge transfer in the experiment section.
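The sampling and norm-accumulation steps can be sketched end to end; everything here (the smoothed-bootstrap KDE sampler standing in for the flow, the 8×8 batch shape, the synthetic per-view values) is an illustrative assumption, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_kde(values, batch_size, bandwidth, rng):
    """Draw from a Gaussian-KDE fit of `values` (smoothed bootstrap):
    pick a stored value at random, then add Gaussian noise of scale h."""
    idx = rng.integers(0, len(values), size=batch_size)
    return values[idx] + rng.normal(0.0, bandwidth, size=batch_size)

# Hypothetical similarity observations in latent space for 3 source views
view_values = [rng.normal(mu, 0.2, size=200) for mu in (0.8, 0.5, 0.3)]

scores = []
for vals in view_values:
    batch = sample_kde(vals, batch_size=64, bandwidth=0.1, rng=rng)
    M = batch.reshape(8, 8)                  # sample-composed matrix
    scores.append(np.linalg.norm(M, "fro"))  # accumulate via matrix norm

# Normalize into transfer degrees for the pre-training schedule
degrees = np.array(scores) / np.sum(scores)
print(degrees)
```

The normalized `degrees` play the role of the probabilistic importance representation: a source view whose similarity values concentrate at larger magnitudes receives a larger transfer degree.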
4.4 Model Architecture
We select a one-dimensional Fully Convolutional Network (FCN) [10] and a Long Short-Term Memory (LSTM) [3] network to construct our adaptive transfer learning framework. We chose these networks for their robustness: they have achieved state-of-the-art results on several data sets from the UCR archive and the UEA repository. Note, however, that our adaptive transfer learning framework is entirely independent of the chosen neural networks.
LSTM network:
LSTM(256) → Dense(classes) → softmax

FCN:
Conv1D, 128 filters → BN + ReLU + Dropout(0.2)
Conv1D, 256 filters → BN + ReLU + Dropout(0.2)
Conv1D, 128 filters → BN + ReLU + Dropout(0.2)
Dense(classes) → softmax

MLP:
Dense(128) → ReLU
Dense(128) → ReLU
Dense(classes) → softmax
The structure of the Multilayer Perceptron (MLP).
Table 4: Classification accuracy on the three data sets.

Model     | Daily and Sports Activity | Movement | Self-Regulation of SCPs
FCN       | 0.76842                   | 0.61538  | 0.70307
DTW-FCN   | 0.79342                   | 0.68269  | 0.78840
BOSS-FCN  | 0.78815                   | 0.684262 | 0.79522
LSTM      | 0.41711                   | 0.53846  | 0.45051
DTW-LSTM  | 0.48378                   | 0.58653  | 0.60409
BOSS-LSTM | 0.43509                   | 0.58854  | 0.59727
MLP       | 0.94123                   | 0.55769  | 0.70307
DTW-MLP   | 0.94737                   | 0.58654  | 0.76451
BOSS-MLP  | 0.94693                   | 0.58121  | 0.75768
5 Experiment Result
5.1 Data Set

UCI Daily and Sports Activity Data Set [44] contains motion sensor data of 19 daily and sports activities, each performed by 8 subjects for 5 minutes. In particular, the subjects were asked to perform the activities in their own styles, without any restrictions. As a result, the time series samples for each activity show considerable inter-subject variation in speed and amplitude, which makes accurate classification extremely difficult. During data collection, nine sensors were placed on each of five units: torso, right arm, left arm, right leg, and left leg. The 5-minute time series collected from each subject is divided into 5-second segments, giving 480 segments per activity, and each segment is treated as a multivariate time series sample of size 45 × 125.

Indoor User Movement Prediction from RSS Data Set represents a real-life benchmark in the area of Ambient Assisted Living applications. The binary classification task consists of predicting the pattern of user movements in real-world office environments from time series generated by a Wireless Sensor Network (WSN). The input data contains temporal streams of radio signal strength (RSS) measured between the nodes of a WSN comprising 5 sensors: 4 anchors deployed in the environment and 1 mote worn by the user. In the provided data set, the RSS signals have been rescaled to the interval [-1, 1], separately for the set of traces collected from each anchor. The target data consists of a class label indicating whether the user's trajectory will lead to a change in spatial context (i.e., a room change) or not.

Self-Regulation of SCPs Data Set was taken from a healthy subject who was asked to move a cursor up and down on a computer screen. During the recording, the subject received visual feedback of his slow cortical potentials: cortical positivity led to a downward movement of the cursor, and cortical negativity to an upward one. Each trial lasted 6 s. During every trial, the task was presented visually by a highlighted goal at either the top or bottom of the screen, indicating positivity or negativity, from second 0.5 until the end of the trial; the visual feedback was shown from second 2 to second 5.5. Only this 3.5-second interval of every trial is provided for training and testing, giving 896 samples per channel for every trial.
5.2 Experiment Setup

For the UCI Daily and Sports Activity data set, we regard the information from sensors on different parts of the body as different views, so this data set yields 5 views. We randomly pick 4 of the 5 as source views and the remaining one as the target view, and we use 6 of the 8 subjects as the training set and the other 2 as the testing set. Since the multivariate time series in all views are 9-dimensional, we select the high-dimensional solution (normalizing flow) for latent-space density estimation after computing the inter-view importance. After acquiring the importance scores for the 4 source views, we treat each value as the importance score from the corresponding source view to the target view and use it to control the proportion of pre-training. We set 200 pre-training epochs in total across these 4 source views, each taking up a proportion equal to its corresponding importance score. For the loss function, we choose categorical cross-entropy. The number of epochs assigned to each source view is
(13) $E_s = E \cdot \alpha_s,$
where $E$ denotes the total number of pre-training epochs, $\alpha_s$ the importance score, and $E_s$ the assigned number of epochs for a specific source view $s$.
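The epoch-allocation rule can be sketched as follows; the function name and the example scores are hypothetical, and the rounding remainder is handed out to the most important views so the budget is met exactly:

```python
import numpy as np

def allocate_epochs(importance_scores, total_epochs):
    """Split a pre-training epoch budget across source views in proportion
    to their importance scores, keeping the total exactly equal."""
    scores = np.asarray(importance_scores, dtype=float)
    shares = scores / scores.sum()
    epochs = np.floor(shares * total_epochs).astype(int)
    # Distribute the rounding remainder to the most important views first
    for i in np.argsort(-shares)[: total_epochs - epochs.sum()]:
        epochs[i] += 1
    return epochs

# Hypothetical importance scores for 4 source views, 200-epoch budget
epochs = allocate_epochs([0.9, 0.5, 0.4, 0.2], 200)
print(epochs)  # sums to 200, ordered by importance
```

A source view with a larger importance score thus dominates the pre-training schedule before fine-tuning on the target view.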

Indoor User Movement Prediction from RSS Data Set contains 4 anchor sensors. By considering them as different views, we get 4 views, but with different numbers of timestamps (the average length is 42). The following length-harmonization options are available:

Pad the shorter sequences with zeros so that all series have equal length.

Find the maximum time series length and pad the shorter sequences with their last-row values.

Identify the minimum time series length in each data set and truncate all other series to that length; however, this leads to a huge information loss.

Calculate the average series length, truncate all longer-than-average series, and pad all shorter-than-average series.
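These length-harmonization options can be sketched as a small helper (the average-length variant with zero padding is shown; names are illustrative):

```python
import numpy as np

def to_common_length(series_list, target_len=None):
    """Truncate longer-than-target series and zero-pad shorter ones.
    By default, the target is the (rounded) average series length."""
    if target_len is None:
        target_len = int(round(np.mean([len(s) for s in series_list])))
    out = []
    for s in series_list:
        s = np.asarray(s, dtype=float)[:target_len]  # truncate the tail
        pad = target_len - len(s)
        out.append(np.pad(s, (0, pad)))              # zero-pad the tail
    return np.stack(out), target_len

# Three series of lengths 30, 42, 54; their average length is 42
views = [np.ones(30), np.ones(42), np.ones(54)]
batch, L = to_common_length(views)
print(batch.shape, L)  # (3, 42) 42
```

Swapping the padding value or the target-length rule reproduces the other three options listed above.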
After these preprocessing procedures, we randomly select 3 of the 4 views as source views and the remaining one as the target view. Applying the model described above, we obtain the desired probabilistic representation of the corresponding importance scores. We set 120 epochs for pre-training. The detailed algorithm is the same as for the UCI Daily and Sports Activity data set above.


For the Self-Regulation of SCPs data set, we take the time series from 6 channels as a 6-view time series data set. We randomly pick 5 of the 6 views as source views and the remaining one as the target view, and then apply the proposed model to obtain the probabilistic representation, from which we get each view's corresponding importance score. We set 100 epochs for pre-training. The detailed algorithm for assigning weights during pre-training is the same as for the UCI Daily and Sports Activity data set above.
For all network training, we applied the same batch size and used the Adam optimizer. To account for error and randomness, we repeated every experiment 5 times and report the average classification accuracy.
5.3 Result Analysis
We run 3 baseline models (Long Short-Term Memory Recurrent Neural Network (LSTM-RNN), Fully Convolutional Network (FCN), and Multi-Layer Perceptron (MLP)) and 6 adaptive transfer learning frameworks (Dynamic Time Warping (DTW)-LSTM, Bag of SFA Symbols (BOSS)-LSTM, DTW-FCN, BOSS-FCN, DTW-MLP, and BOSS-MLP) on these 3 data sets. The fine-tuned classification accuracy results are shown in the table and related figures.
As shown in Figures 2, 3, and 4, in most scenarios our proposed approaches outperform the baselines. Thanks to the pre-training process, the proposed approaches reach a high classification accuracy early on and maintain their lead in classification accuracy throughout network training. As listed in Table 4, the proposed approaches also reach better accuracy after the training process. We additionally provide the density estimation results of the latent space for the different data sets.
6 Conclusion
In this paper, we presented an adaptive transfer learning framework for multi-view multivariate time series data. We looked at multi-view time series data through the lens of importance sampling, measuring the importance value of a specific source-view time sequence for the target-view time sequence. The inter-view importance was computed by the following procedure. First, we calculated the decomposed corresponding pairwise univariate time series distances. Second, we mapped the importance values into a latent space for density estimation of the observations. Finally, we arrived at an approximated posterior distribution, discussing in particular the scenarios where the input dimension is high or low. We then sampled several importance values to compute a composed matrix norm as the output importance score, which also indicates the degree of knowledge transfer in the pre-training process. On average, our proposed adaptive transfer learning framework demonstrates generally improved classification performance over several state-of-the-art baseline models.
References
 [1] P. D. Grunwald, The Minimum Description Length Principle (Adaptive Computation and Machine Learning), The MIT Press, 2007.
 [2] F. Petitjean, G. Forestier, G. I. Webb, A. E. Nicholson, Y. Chen, and E. Keogh,Dynamic Time Warping Averaging of Time Series Allows Faster and More Accurate Classification,IEEE International Conference on Data Mining, 2014, pp. 470–479.
 [3] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P. Muller,Data augmentation using synthetic data for time series classification with deep residual networks, CoRR, vol. abs/1808.02455, 2018.
 [4] S. Das Bhattacharjee, B. V. Balantrapu, W. Tolone, and A. Talukder,Identifying extremism in social media with multiview contextaware subset optimization , 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 3638–3647.

 [5] M. Langkvist, L. Karlsson, and A. Loutfi, A review of unsupervised feature learning and deep learning for time-series modeling, Pattern Recognition Letters, pp. 11–24, 2014.
 [6] S. Li, Y. Li, and Y. Fu,Multiview time series classification: A discriminative bilinear projection approach, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ser. CIKM ’16, 2016, pp. 989–998.
 [7] Z. Cui, W. Chen, and Y. Chen,MultiScale Convolutional Neural Networks for Time Series Classification, ArXiv, 2016.
 [8] I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to Sequence Learning with Neural Networks, Neural Information Processing Systems, 2014, pp. 3104–3112.
 [9] Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista,The UCR Time Series Classification Archive, July 2015.
 [10] Z. Wang, W. Yan, and T. Oates, Time series classification from scratch with deep neural networks: A strong baseline, CoRR, vol. abs/1611.06455, 2016.
 [11] J. Cristian Borges Gamboa,Deep Learning for TimeSeries Analysis, ArXiv, 2017.
 [12] S. Seto, W. Zhang, and Y. Zhou,Multivariate time series classification using dynamic time warping template selection for human activity recognition, 2015 IEEE Symposium Series on Computational Intelligence, pp. 1399–1406, 2015.
 [13] P.F. Marteau and S. Gibet,On recursive edit distance kernels with application to time series classification,IEEE transactions on neural networks and learning systems, vol. 26, no. 6, June 2015.
 [14] J. Lines and A. Bagnall,Time series classification with ensembles of elastic distance measures, Data Min. Knowl. Discov., vol. 29, no. 3, pp. 565–592, May 2015.
 [15] E. Keogh and S. Kasetty,On the need for time series data mining benchmarks: a survey and empirical demonstration,Data Mining and Knowledge Discovery, 7(4):349–371, 2003.
 [16] Z. Xing, J. Pei, and S. Y. Philip, Early classification on time series, Knowledge and information systems, 31(1):105–127, 2012.

 [17] A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 92–100, ACM, 1998.
 [18] C. Xu, D. Tao, and C. Xu, A survey on multiview learning, arXiv preprint arXiv:1304.5634, 2013.
 [19] Z. Fang and Z. Zhang, Simultaneously combining multiview multilabel learning with maximum margin classification, In Proceedings of IEEE International Conference on Data Mining, pages 864–869. IEEE, 2012.
 [20] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson,How transferable are features in deep neural networks?, in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds., 2014.
 [21] G. Csurka,Domain adaptation for visual applications: A comprehensive survey, CoRR, vol. abs/1702.05374, 2017.

 [22] S. Das Bhattacharjee and A. Talukder, Graph clustering for weapon discharge event detection and tracking in infrared imagery using deep features, 2017. [Online]. Available: https://doi.org/10.1117/12.2277737
 [23] S. Das Bhattacharjee, B. V. Balantrapu, W. Tolone, and A. Talukder, Identifying extremism in social media with multiview contextaware subset optimization, 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 3638–3647.
 [24] S. Das Bhattacharjee, A. Talukder, and B. V. Balantrapu,Active learning based news veracity detection with feature weighting and deepshallow fusion, 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 556–565.
 [25] S. Das Bhattacharjee, V. S. Paranjpe, and W. Tolone, Identifying malicious social media contents using multiview contextaware active learning, Future Generation Computer Systems, Elsevier, 2017.
 [26] S. Das Bhattacharjee, J. Yuan, Z. Jiaqi, and Y. Tan,Contextaware graphbased analysis for detecting anomalous activities, 2017 IEEE International Conference on Multimedia and Expo (ICME), 2017, pp. 1021–1026.
 [27] Y. Geng and X. Luo, CostSensitive Convolution based Neural Networks for Imbalanced TimeSeries Classification, ArXiv eprints, 2018.
 [28] A. Ziat, E. Delasalles, L. Denoyer, and P. Gallinari, SpatioTemporal Neural Networks for SpaceTime Series Forecasting and Relations Discovery, IEEE International Conference on Data Mining, 2017, pp. 705–714.
 [29] Z. Che, Y. Cheng, S. Zhai, Z. Sun, and Y. Liu,Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records, IEEE International Conference on Data Mining, 2017, pp. 787–792.
 [30] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.A. Muller,Deep learning for time series classification: a review, ArXiv, 2018.
 [31] M. Baktashmotlagh, M. Faraki, T. Drummond, and M. Salzmann,Learning factorized representations for openset domain adaptation, CoRR, vol. abs/1805.12277, 2018.
 [32] M. Long and J. Wang, Learning transferable features with deep adaptation networks, CoRR, vol. abs/1502.02791, 2015.
 [33] V. Vercruyssen, W. Meert, and J. Davis,Transfer Learning for Time Series Anomaly Detection, Workshop and Tutorial on Interactive Adaptive Learning colocated with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017, pp. 27–36.
 [34] D. Zhan, S. Yi and D. Jiang, SmallScale Demographic Sequences Projection Based on Time Series Clustering and LSTMRNN. ICDM Workshops 2018.
 [35] J. Serra, S. Pascual, and A. Karatzoglou, Towards a universal neural network encoder for time series,CoRR, vol. abs/1805.03908, 2018.

 [36] Y. Guo, Convex subspace representation learning from multiview data, Proceedings of the 27th AAAI Conference on Artificial Intelligence, volume 1, page 2, 2013.
 [37] Y. Li, F. Nie, H. Huang, and J. Huang, Largescale multiview spectral clustering via bipartite graph, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pages 2750–2756, 2015.
 [38] W. Wang, R. Arora, K. Livescu, and J. Bilmes, On deep multiview representation learning, Proceedings of the 32nd International Conference on Machine Learning, pages 1083–1092, 2015.
 [39] M. Kan, S. Shan, H. Zhang, S. Lao, and X. Chen, Multiview discriminant analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):188–194, 2016.

 [40] P.-Y. Zhou and K. C. Chan, A feature extraction method for multivariate time series classification using temporal patterns, Advances in Knowledge Discovery and Data Mining, pages 409–421, Springer, 2015.
 [41] H. Hayashi, T. Shibanoki, K. Shima, Y. Kurita, and T. Tsuji, A recurrent probabilistic neural network with dimensionality reduction based on time-series discriminant component analysis, IEEE Transactions on Neural Networks and Learning Systems, 26(12):3021–3033, 2015.
 [42] Y. Zheng, Q. Liu, E. Chen, J. L. Zhao, L. He, and G. Lv, Convolutional nonlinear neighbourhood components analysis for time series classification, Advances in Knowledge Discovery and Data Mining, pages 534–546. Springer, 2015.
 [43] S. Yi, D. Zhan, Z. Geng, W. Zhang and C. Xu, FISGAN: GAN with Flowbased Importance Sampling, arXiv, preprint, arXiv:1910.02519.
 [44] M. Lichman. UCI machine learning repository, 2013 .