1 Introduction
Time series data are ubiquitous. Some examples include stock prices [KAZEM2013947], currency exchange rates, sales data [CHEN20107696], biomedical measurements [Wacker2016TimefrequencyTI], astronomical data [Rebbapragada2009], and weather data collected over time [CHEN20112856].
However, for many applications only small labeled datasets are available for training machine learning methods, which often results in poor performance on the task at hand.
The simplest remedy for a small training set is to collect more labeled data, but this is often infeasible or too expensive. Another solution is data augmentation, the technique of creating synthetic data to enlarge a dataset.
In the case of classification, training on augmented datasets can increase the performance of the classifier. In fact, it is well known that a training dataset that is too small can cause overfitting [10.1007/9783319080109_33] and that overfitting decreases as the size of the dataset increases [Nonnemaker] [Brain].
While many dataset augmentation techniques exist for image data (for instance, images can be flipped, translated or rotated [Wang2017]), these methods do not generalize well to time series. For an image, a simple visual comparison can confirm whether a transformation has changed the nature of the data; for time series data such a check is not possible. This is the main reason why data augmentation for time series classification has been limited mainly to two relatively simple techniques: time slicing and time warping [LeGuennec].
Time slicing is a method inspired by the computer vision community, which consists in cropping slices from time series and performing classification at the slice level. This method was introduced for time series in [Cui]. Time slicing can be less effective, because cutting the time series tends to remove temporal correlation in the data. Time warping is a time-series-specific method consisting of warping a randomly selected slice of a time series by stretching it, i.e., speeding it up or slowing it down [LeGuennec]. This method, in theory, should not alter the distribution of the data significantly. Its main problem is that it does not generalize well, and in some cases (such as astronomy), the time scale has significant physical meaning. As a result, the time-warped data may have a very different interpretation.
However, these two methods work properly only on regularly sampled time series. In real settings, irregular time sampling is a critical problem in data analysis. Irregular sampling can occur because of several issues, such as scheduling patterns, technical faults in sensing devices, imprecision of the sensors and timing devices, or human errors.
The result is a time series where the data points’ position in time is irregular. Irregular time sampling can result in substantial uncertainty about the values of the underlying time series, thus making it more difficult to mimic the time series in a realistic manner. Furthermore, irregularity makes the data difficult to handle with standard classification methods that assume fixed-dimensional feature spaces, and it prevents the application of basic data augmentation approaches.
In this paper we propose the Time-Conditional Generative Adversarial Network (TCGAN), a method for generating new irregularly sampled time series, with the objective of augmenting unbalanced datasets in time series classification problems.
Given a dataset of irregularly sampled time series, our goal is to generate new time series which mimic the ones in the dataset in a realistic way. Note that we neither generate missing data points in the time series, nor regularize irregular timestamps. Instead, we generate new irregularly sampled time series. To this end, we implement a time-aware conditional generative adversarial network (TCGAN). Our method works by conditioning the generator and the discriminator on the timestamps. The goal of TCGAN is to discover the latent space of the time series in order to mimic the time series dynamics. We aim at covering a realistic problem setting, and therefore we assume that the time series are noisy.
We evaluate our model in two different scenarios: with synthetic datasets and with three real world datasets with unbalanced classes.
In the synthetic scenario we compare the performance of a classifier trained with data generated by TCGAN against the performance of the same classifier trained on the original data. The classifier is implemented with a simple convolutional neural network (CNN) and the test is always run on the original data. Results show that a classifier trained on TCGAN-generated data performs well, even with very short time series and small training sets.
In the real world experiment, we consider a classification problem with unbalanced classes and we use TCGAN to generate time series for the class with the smaller training set, so as to move to a perfectly balanced setting. On the real dataset classification problems, we also compare our method with state-of-the-art data augmentation techniques for time series, such as time slicing and time warping.
Results show that our method always performs better than the other approaches, both for regularly sampled and for irregularly sampled time series. We achieve particularly good performance with small training sets and short, noisy, irregularly sampled time series.
2 Related work
Time series generation is a specific application problem of the broad field of sequential data generation, where a sequence is dictated by a temporal variable. Sequence generation may be applied to continuous or discrete elements. In this section, we discuss the related work for data augmentation techniques for sequential data, and in particular data augmentation for time series.
2.1 Discrete Sequential Token Generation
The major interest in sequential data generation has been on discrete tokens in fields like NLP, where the challenge is to generate appropriate sequences of words. For example, Yu et al. [Yu2016] proposed a GAN-based approach for natural language processing that generates sequences of discrete tokens using a GAN trained with a reinforcement learning approach. Recently, conditional GAN architectures have also been used in NLP, including translation [DBLP:journals/corr/YangCWX17] and dialogue generation conditioned on a particular sentiment [li2017adversarial]. These methods aim to infer the next value of a sequence, but they do not demonstrate the capacity to generate new data in order to augment a dataset.
2.2 Temporal Data Generation using Generative Adversarial Networks
In 2018 Hyland et al. [20.500.11850/236194] proposed a GAN based on a Recurrent Neural Network for both the generator and the discriminator, in order to produce realistic real-valued multidimensional time series for medical data. In this work they introduced the train on synthetic, test on real methodology to test the quality of the generated data. Mogren et al. [Mogren2016] proposed a solution to generate continuous-valued sequences that aims to produce polyphonic music using a GAN with LSTM generator and discriminator. In this work they succeeded in producing realistic data, but they did not consider cases with irregular time sampling and noisy signals.
More recently, other works aimed to generate new data from data sources with missing observations. Yoon et al. [pmlrv80yoon18a] proposed a model to reconstruct missing data, in which the generator (G) receives as input the real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. There are two main differences between the model proposed by Yoon et al. and our approach: the aim of their research is not to create new data from noisy data but to reconstruct missing points, and their approach is not dedicated to time series.
2.3 Data Augmentation
Data augmentation is a data generation strategy typically used for supervised problems in machine learning, with the objective of producing relevant data points for improving the learning of ML solutions (for instance, classifiers). The most commonly used data augmentation method for time series is the time slicing window technique, originally introduced for deep CNNs in [Cui2016MultiScaleCN]. This method takes inspiration from computer vision [Zhang2016] and when used for images it can guarantee, at some levels of cropping, that an image divided into slices maintains the same information as the original image. The method does not give the same guarantees for time series data, because it is not obvious that the discriminative information is maintained when a region of the time series is cropped. Nevertheless, this method was used in several time series classification problems, such as in [KVAMME2018207], where CNNs were used to improve mortgage delinquency prediction using customers' historical transactional data, and in [Lines2015], where it was used to improve the accuracy of a Support Vector Machine classifier for electroencephalographic time series data.
It is important to note that with the time slicing technique, the model classifies each subsequence alone and then classifies the whole time series using a majority vote over the set of subsequences. This can cause the loss of important information about the time series data distribution. In contrast, the method that we propose in this paper does not crop time series into shorter subsequences, and uses the discriminative properties of the whole time series.
Other techniques for augmenting time series data have been proposed in the literature, such as jittering, scaling, warping and permutation. For example, the authors in [Um] created a data augmentation method for wearable sensor time series data to classify the motor state of Parkinson’s disease patients. In [LeGuennec], the authors propose a method for data augmentation that is a mixture of time slicing and time warping, using the time warping technique to create new data and time slicing to create time series of the same length. This method was used to improve their deep CNN for time series classification. Recently Fawaz et al. [fawaz] used dynamic time warping to augment time series datasets in order to increase the classification performance of a deep residual network. This work shows how data augmentation can drastically improve classification accuracy.
3 Technical Background
In this section we introduce the technical background on Generative Adversarial Networks (GANs) and their conditional variant (CGANs), which we use in the rest of the paper.
GANs were introduced by Goodfellow et al. [NIPS2014_5423] as a framework for training a generative model. A GAN consists of two models which play a two-player minimax game:

a generative model, G, that has the goal of capturing the data distribution;

a discriminative model, D, that has the task of identifying if a sample comes from the training data or from G.
The generative model G has to learn a distribution $p_g$ over data $x$ by building a mapping function from a prior noise distribution $p_z(z)$ to the data space, $G(z; \theta_g)$, where $\theta_g$ are the parameters of the model, e.g. the weights of the multilayer perceptron implementing $G$. The discriminator $D(x; \theta_d)$ is instead a second multilayer perceptron implementing a binary classifier, which outputs a single scalar representing the probability that $x$ came from the training data rather than from $p_g$. The two models are trained together to play the following two-player minimax game:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$
This model can be extended to a conditional model [DBLP:journals/corr/MirzaO14] if both the generator and the discriminator are conditioned on some extra information $y$. The conditioning is performed by feeding $y$ into both the discriminator and the generator as additional input:

in the generator, the prior input noise $p_z(z)$ and $y$ are combined in a joint hidden representation, and the adversarial training framework allows for considerable flexibility in how this hidden representation is composed;

in the discriminator, $x$ and $y$ are presented as inputs to a discriminative function.
The objective function of the two-player minimax game is now:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)))] \tag{2}$$
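As an illustration of the conditional objective, the following minimal sketch estimates the two expectation terms of the value function from a mini-batch of discriminator outputs. The function name `conditional_gan_value` and the example probabilities are hypothetical and chosen only for illustration; the discriminator outputs are assumed to be probabilities strictly between 0 and 1.

```python
import math


def conditional_gan_value(d_real, d_fake):
    """Empirical estimate of V(D, G) on one mini-batch.

    d_real: discriminator outputs D(x|y) on real samples, each in (0, 1).
    d_fake: discriminator outputs on generated samples G(z|y), each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from generated data outputs 0.5 everywhere.
undecided = conditional_gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes the value towards its maximum, 0.
confident = conditional_gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator is trained to increase this quantity, while the generator is trained to decrease its second term.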
4 TCGAN Model
In this section, we propose a model based on the CGAN framework to solve the problem addressed in the introduction. The model that we propose is a Conditional Generative Adversarial Network (see Figure 1) composed of two CNNs, one for the generative part (G) and one for the discriminative part (D).
We now define the input and output spaces of our model, each with an associated probability distribution:

$Z$ is a noise space used to seed the generative model. Samples $z \in Z$ are drawn from a noise distribution $p_z(z)$. In our experiments $p_z$ is a simple Gaussian noise distribution with fixed mean and standard deviation.

$T$ is the space of timestamps used to condition the generative and the discriminative model.

$X$ is the data space which represents a time series output from the generator or input to the discriminator. Values $x \in X$ are the data of the time series. Using the time series in the training data and their associated conditional data, we can define a density model $p_{data}(x|t)$. This is exactly the density model we wish to replicate with the overall model in this paper.
We can define in this way the objective function of our model as:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|t)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|t)))] \tag{3}$$
where $t$ is a sorted vector of timestamps sampled at random from $T$. Notice that the model is also able to generate new time series corresponding to timestamps not present in the training set.
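As a minimal sketch of how the two generator inputs could be drawn, the snippet below samples a Gaussian noise vector and a sorted vector of random timestamps. The function name `sample_generator_inputs`, the noise dimension, the timestamp interval, and the standard normal noise parameters are assumptions made for illustration, not the settings used in the experiments.

```python
import random


def sample_generator_inputs(length, noise_dim, t_min=0.0, t_max=1.0, seed=None):
    """Draw one (noise, timestamps) pair to feed a conditional generator.

    The Gaussian parameters (mean 0, standard deviation 1) and the
    interval [t_min, t_max] are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Noise vector that seeds the generator.
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    # Irregular timestamps: sampled at random, then sorted in time.
    t = sorted(rng.uniform(t_min, t_max) for _ in range(length))
    return z, t


z, t = sample_generator_inputs(length=24, noise_dim=8, seed=1)
```

Because the timestamps are sampled rather than taken from the training set, the same procedure can request values at time points that never appear in the training data.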
4.1 Generative Network
We can define the generative network as a function $G: Z \times T \to X$, which takes as input $z$ and $t$ and outputs a time series $x$. The generative network G is a CNN, which takes as input noise and timestamps and outputs the values of the time series for the given timestamps. This transformation is done through four transposed convolution (or deconvolution) layers with ReLU activation functions and batch normalization at each layer except for the last one.
The generative network adjusts its parameters to minimize $\log(1 - D(G(z|t)))$, where $z$ is the noise vector and $t$ is the timestamp vector.
4.2 Discriminative Network
The discriminative network implements a function $D: X \times T \to [0, 1]$. This network takes as input real data $x$ or generated data $G(z|t)$, and their associated timestamps $t$, and gives as output a binary value, deciding whether the data is real or generated. It is composed of two convolutional layers, each followed by a max-pooling layer. At the end there is a fully connected layer.
The discriminative network adjusts its parameters to maximize $\log D(x|t)$, where $x$ is the time series vector and $t$ is the timestamp vector.
5 Experiments
In this section we validate the performance of TCGAN using synthetic data and three realworld datasets. We first describe the datasets used for the experiments [bagnall16bakeoff], in Section 5.1.
In Section 5.2, we report on the results of the experiment over synthetic irregularly sampled data, by comparing the performance of a binary classifier trained over the original and the generated data to distinguish two curve types (sine and sawtooth).
In Section 5.3, we quantitatively evaluate the performance of TCGAN in improving the classification using three real-world datasets of regularly sampled data [bagnall16bakeoff], modified to create unbalanced datasets. We compare the results of data augmentation performed with TCGAN against the ones obtained with the time slicing and time warping methods.
In Section 5.4 we evaluate the performance of TCGAN against time slicing and time warping, with the same setting as before but considering the case of irregularly sampled datasets. In this setting, the datasets are created starting from the original three real datasets used above, by randomly removing different shares (from 10% to 40%) of points from each time series. We then apply TCGAN to generate new time series.
We run all the experiments with 10-fold randomization and we report the Area Under the Receiver Operating Characteristic Curve (AUROC) as the performance metric (mean value across the repeated experiments, along with the standard deviation).
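For reference, the reported metric can be sketched as follows. AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count one half), which the hypothetical helper `auroc` computes by direct pairwise comparison. The helper `summarize` aggregates repeated runs; using the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def summarize(aurocs):
    """Mean and sample standard deviation over repeated runs."""
    return mean(aurocs), stdev(aurocs)


example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of examples, which is acceptable for test sets of a few hundred series.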
5.1 Datasets
This section describes the synthetic and real world datasets used for experiments.
5.1.1 Synthetic data
As realistic input data we construct one synthetic dataset, composed of two classes: sine waves and sawtooth waves (see Fig. 2). The parameters considered for producing the synthetic input data for the experiment are: the training set size, i.e., the number of curves used to train the TCGAN; and the time series length, i.e., the number of points in each time series.
Each sine and sawtooth wave is constructed as follows:

We select uniformly at random, within a given time interval, as many points as the time series length; these are the timestamps;

We define an amplitude through a value selected at random from a normal distribution with given mean and standard deviation;

We assume a given period, phase shift, and vertical shift;

We add Gaussian noise to each point of the series.
We repeat the process once per curve until the training set reaches the required size. Figure 2 reports exemplary real and TCGAN-generated curves for the sine and sawtooth classes.
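The construction steps above can be sketched as follows. All numeric values (time interval, amplitude distribution, period, phase shift, vertical shift, noise level) are illustrative assumptions and not the settings used in the experiments; the function names `sawtooth`, `make_curve`, and `make_class` are hypothetical.

```python
import math
import random


def sawtooth(t, period):
    """Sawtooth wave rising linearly from -1 to 1 over each period."""
    frac = (t / period) % 1.0
    return 2.0 * frac - 1.0


def make_curve(kind, length, rng, t_range=(0.0, 10.0), amp_mean=1.0,
               amp_std=0.1, period=5.0, phase=0.0, offset=0.0, noise_std=0.05):
    """Generate one noisy, irregularly sampled curve (all defaults are assumed)."""
    # Step 1: irregular timestamps, drawn uniformly at random and sorted.
    t = sorted(rng.uniform(*t_range) for _ in range(length))
    # Step 2: random amplitude from a normal distribution.
    amplitude = rng.gauss(amp_mean, amp_std)
    # Step 3: evaluate the clean wave with the chosen period and shifts.
    if kind == "sine":
        clean = [amplitude * math.sin(2.0 * math.pi * (ti - phase) / period) + offset
                 for ti in t]
    elif kind == "sawtooth":
        clean = [amplitude * sawtooth(ti - phase, period) + offset for ti in t]
    else:
        raise ValueError("kind must be 'sine' or 'sawtooth'")
    # Step 4: additive Gaussian noise on every point.
    x = [c + rng.gauss(0.0, noise_std) for c in clean]
    return t, x


def make_class(kind, n_curves, length, seed=0):
    """Repeat the construction n_curves times for one class."""
    rng = random.Random(seed)
    return [make_curve(kind, length, rng) for _ in range(n_curves)]


sine_curves = make_class("sine", n_curves=5, length=40)
```

Each curve is returned as a pair of timestamps and values, so that the timestamps can later be used as the conditioning input.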
5.1.2 Real world data
The real world datasets are taken from the UCR Time Series Classification archive [bagnall16bakeoff]. Table 1 summarizes the characteristics of the three datasets used for the experiment. Note that we use shorter time series than the original ones, and we artificially create irregular sampling by removing points at random from the original series.
Table 1: Characteristics of the three real world datasets.
Data sets  |  Starlight Curves  |  Power Demand  |  ECG200  |
Time series length  |  80  |  24  |  96  |
1st training class size  |  20  |  100  |  27  |
2nd training class size  |  200  |  200  |  93  |
Test set class size  |  100  |  200  |  40  |
Starlight curves: This dataset is composed of astronomical light curves (brightness of celestial objects). The dataset has phase-aligned starlight curves of fixed length [Rebbapragada2009]. The curves are divided into three classes, but we use only two classes (#2 and #3) in our experiments. In the experiments we use only the first 80 points of every curve.
To create a dataset of unbalanced classes we use a training set of 20 curves for class #2 and 200 for class #3. The test set consists of 100 curves for each class.
Power Demand: The data was derived from twelve monthly electrical power demand time series in Italy [Lonardi]. The classification task is to distinguish days from October to March (inclusive) from days from April to September. The length of the time series is 24 data points. To create unbalanced classes we use a training set of 100 time series from the first class and 200 from the second class. The test set consists of 200 curves for each class.
ECG200: This dataset includes time series tracing the electrical activity recorded during one heartbeat in human hearts, with 96 data points each [Olszewski:2001:GFE:935627]. The series are divided into two classes, i.e., normal heartbeat and myocardial infarction. To create unbalanced classes we pick a training set of 27 time series from the first class and 93 from the second class. The test set includes 40 series for each class.
5.2 Classification performance over synthetic data
The first experiment aims at evaluating the quality of the data generated by TCGAN over the synthetic dataset described in the previous section. To verify the quality of the curves generated by TCGAN when fed with the synthetic data, we apply the method introduced in [20.500.11850/236194]. In particular, we consider a binary classifier implemented as a CNN for distinguishing curve classes (sine vs. sawtooth waves in our case). In addition, the classifier receives the timestamps, so that it can learn the relationship between values and timestamps. We compare the performance of the classifier trained on TCGAN-generated data (see Algorithm 1) with that of the same classifier trained on real data, evaluating both classifiers on real data and measuring performance by AUROC.
We expect that the performance of the classifier trained on generated data is comparable with the performance of the classifier trained on real data. If the TCGAN trained classifier does not succeed in classifying real data, it means that the generated data points are too different from the real ones, i.e. the TCGAN does not succeed in mimicking the real distribution of the curves.
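The evaluation protocol can be sketched as follows: the same training routine is run once on generated data and once on real data, and both resulting models are scored on the same real test set. To keep the sketch self-contained, a toy threshold classifier on a single feature stands in for the CNN and accuracy stands in for AUROC; both substitutions, and all function names and data values, are assumptions made only for illustration.

```python
def fit_threshold(train):
    """Toy stand-in for the CNN: threshold halfway between the class means."""
    pos = [v for v, label in train if label == 1]
    neg = [v for v, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold, test):
    """Fraction of test points on the correct side of the threshold."""
    correct = sum(1 for v, label in test if (v > threshold) == (label == 1))
    return correct / len(test)


def train_on_synthetic_test_on_real(fit, score, real_train, synthetic_train, real_test):
    """Score the same model family trained on each source, always on real test data."""
    return {
        "synthetic": score(fit(synthetic_train), real_test),
        "real": score(fit(real_train), real_test),
    }


# Hypothetical one-dimensional data, as (feature, label) pairs.
real_train = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1)]
synthetic_train = [(0.15, 0), (0.25, 0), (0.85, 1), (0.95, 1)]
real_test = [(0.0, 0), (0.3, 0), (0.8, 1), (1.1, 1)]

result = train_on_synthetic_test_on_real(
    fit_threshold, accuracy, real_train, synthetic_train, real_test)
```

Comparable scores for the two entries indicate that the generated data carries the class information needed to classify real curves.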
We report the results in Table 2 and in Figure 3. We used training set sizes and time series lengths from 40 to 100, in steps of 10. Each experiment is repeated 10 times. The average AUROC values reported in the table show that the classifier trained on GAN-generated time series reaches performance comparable to that of the same classifier trained on real time series.
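Table 2: AUROC (mean ± standard deviation) of the classifier trained on GAN-generated and on real synthetic curves, by training set size and time series length.
Training Set  |  Time Series Length  |
Size  |  Type  |  40  |  50  |  60  |  70  |  80  |  90  |  100  |
40  |  GAN  |  0.970 ± 0.015  |  0.966 ± 0.031  |  0.995 ± 0.005  |  0.991 ± 0.011  |  0.995 ± 0.005  |  1.0 ± 0.0  |  1.0 ± 0.0  |
50  |  GAN  |  0.970 ± 0.024  |  0.993 ± 0.004  |  1.0 ± 0.0  |  0.990 ± 0.008  |  1.0 ± 0.0  |  0.993 ± 0.009  |  1.0 ± 0.0  |
60  |  GAN  |  0.988 ± 0.010  |  0.991 ± 0.006  |  0.997 ± 0.003  |  0.988 ± 0.010  |  1.0 ± 0.0  |  0.988 ± 0.015  |  0.994 ± 0.003  |
70  |  GAN  |  0.983 ± 0.006  |  0.949 ± 0.055  |  0.996 ± 0.003  |  0.992 ± 0.007  |  0.992 ± 0.007  |  1.0 ± 0.0  |  0.989 ± 0.010  |
80  |  GAN  |  0.981 ± 0.012  |  0.931 ± 0.031  |  0.971 ± 0.028  |  0.993 ± 0.006  |  0.990 ± 0.003  |  0.984 ± 0.015  |  0.981 ± 0.026  |
90  |  GAN  |  0.977 ± 0.0  |  0.972 ± 0.016  |  0.961 ± 0.033  |  0.988 ± 0.005  |  0.997 ± 0.002  |  0.988 ± 0.011  |  0.997 ± 0.002  |
100  |  GAN  |  0.937 ± 0.027  |  0.948 ± 0.026  |  0.955 ± 0.032  |  0.976 ± 0.022  |  1.0 ± 0.0  |  0.990 ± 0.004  |  0.998 ± 0.002  |
40  |  Real  |  0.996 ± 0.005  |  0.993 ± 0.010  |  0.993 ± 0.006  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |
50  |  Real  |  1.0 ± 0.0  |  0.992 ± 0.008  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |
60  |  Real  |  0.995 ± 0.007  |  0.993 ± 0.006  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |
70  |  Real  |  0.992 ± 0.008  |  1.0 ± 0.0  |  0.966 ± 0.047  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |
80  |  Real  |  1.0 ± 0.0  |  0.996 ± 0.003  |  0.996 ± 0.003  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  0.997 ± 0.002  |
90  |  Real  |  0.997 ± 0.002  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |
100  |  Real  |  0.995 ± 0.004  |  0.998 ± 0.002  |  0.998 ± 0.002  |  1.0 ± 0.0  |  1.0 ± 0.0  |  1.0 ± 0.0  |  0.998 ± 0.002  |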
5.3 Classification performance over real, regularly sampled time series
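Table 3: AUROC (mean ± standard deviation) of the CNN classifier on regularly sampled real datasets.
Dataset  |  Real data  |  Time Slicing  |  Time Warping  |  TCGAN  |
Starlight curves  |  0.7127 ± 0.1371  |  0.7534 ± 0.0082  |  0.9840 ± 0.0099  |  0.9851 ± 0.0156  |
Power Demand  |  0.6211 ± 0.1762  |  0.7152 ± 0.0932  |  0.7988 ± 0.0836  |  0.8336 ± 0.1553  |
ECG200  |  0.7014 ± 0.0335  |  0.6666 ± 0.0836  |  0.7227 ± 0.0391  |  0.7882 ± 0.0122  |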
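Table 4: AUROC (mean ± standard deviation) on the Starlight dataset when the two classes are rebalanced with TCGAN-generated data.
Data  |  Ratio  |  AUROC  |
Original  |  0.1  |  0.6798 ± 0.0222  |
Augmented with TCGAN generated  |  0.2  |  0.9574 ± 0.0082  |
Augmented with TCGAN generated  |  0.3  |  0.9710 ± 0.0037  |
Augmented with TCGAN generated  |  0.4  |  0.9740 ± 0.0037  |
Augmented with TCGAN generated  |  0.5  |  0.9750 ± 0.0035  |
Augmented with TCGAN generated  |  0.6  |  0.9790 ± 0.0066  |
Augmented with TCGAN generated  |  0.7  |  0.9780 ± 0.0059  |
Augmented with TCGAN generated  |  0.8  |  0.9800 ± 0.0035  |
Augmented with TCGAN generated  |  0.9  |  0.9840 ± 0.0037  |
Augmented with TCGAN generated  |  1.0  |  0.9851 ± 0.0024  |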
We now compare TCGAN against other methods for time series data augmentation, namely time slicing and time warping, on real world datasets. We apply data augmentation in binary time series classification problems with unbalanced datasets, and we show that our augmentation approach, applied to the class with fewer data points, leads to better classifier performance.
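As a minimal sketch of the rebalancing step, the helper below tops up the smaller class with generated series until both classes have the same size. The name `rebalance` and the `generate` callback, which stands in for drawing one series from a trained generator, are hypothetical.

```python
def rebalance(minority, majority, generate):
    """Return the minority class extended with synthetic series up to the majority size.

    `generate` is a zero-argument callable standing in for sampling one
    series from a trained generator (an assumption of this sketch).
    """
    n_missing = max(0, len(majority) - len(minority))
    synthetic = [generate() for _ in range(n_missing)]
    return minority + synthetic


# Class sizes of the Starlight training set in Table 1: 20 versus 200 curves.
minority = [[0.0] for _ in range(20)]
majority = [[1.0] for _ in range(200)]
balanced = rebalance(minority, majority, generate=lambda: [0.5])
```

With these class sizes, 180 generated series are added to the smaller class.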
Notice that in this experiment the sampling interval in the time series is regular. We perform the test also in this setting because the other methods do not work with irregular time samples. The metric chosen for this purpose is the AUROC. We use the same classification model architecture (CNN) in all cases.
Time slicing is a method inspired by the computer vision community, which consists in cutting the time series into slices and performing classification at the slice level. During training, the classifier learns how to classify each slice, where the size of the slice is a parameter of the method. At test time, the time series is classified by classifying each slice taken from it and applying a majority vote over all the slices to decide the predicted label. In our experiment we divide each time series into only 3 slices, because the time series we consider are short to begin with. In this way, using time slicing, we triple the number of samples.
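A minimal sketch of time slicing with a majority vote is given below. It assumes non-overlapping slices of equal size, with any trailing points that do not fill a slice being dropped; the names `slice_series` and `classify_by_majority` and the toy slice classifier are hypothetical.

```python
from collections import Counter


def slice_series(x, n_slices=3):
    """Cut a series into n_slices non-overlapping slices of equal size.

    Trailing points that do not fill a whole slice are dropped (an assumption).
    """
    size = len(x) // n_slices
    return [x[i * size:(i + 1) * size] for i in range(n_slices)]


def classify_by_majority(x, slice_classifier, n_slices=3):
    """Classify each slice, then return the most frequent slice label."""
    votes = [slice_classifier(s) for s in slice_series(x, n_slices)]
    return Counter(votes).most_common(1)[0][0]


# Toy slice classifier: label 1 if the slice mean is positive, else 0.
def toy_classifier(s):
    return 1 if sum(s) / len(s) > 0 else 0


series = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
label = classify_by_majority(series, toy_classifier)
```

In this example the three slices vote 1, 0 and 1, so the series is assigned label 1 even though one slice on its own would have been misclassified.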
Time warping consists of warping a randomly selected slice of a time series by adjusting its speed, i.e. by speeding it up or slowing it down. The size of the slice and the warping ratio are parameters of this method. In this paper, we consider only two warping ratios, inspired by the results of [LeGuennec].
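The warping operation can be sketched as resampling the selected slice by linear interpolation, so that it occupies more points (slowing down) or fewer points (speeding up). The use of linear interpolation, the function names `warp_slice` and `random_warp`, and the ratio used in the example are assumptions for illustration.

```python
import random


def warp_slice(x, start, end, ratio):
    """Resample x[start:end] to ratio times its length by linear interpolation.

    ratio > 1 stretches the slice (slows it down); ratio < 1 compresses it.
    """
    segment = x[start:end]
    if len(segment) < 2:
        raise ValueError("the slice must contain at least two points")
    new_len = max(2, round(len(segment) * ratio))
    warped = []
    for i in range(new_len):
        # Position of the i-th new point on the original slice index axis.
        pos = i * (len(segment) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        warped.append(segment[lo] * (1.0 - frac) + segment[hi] * frac)
    return x[:start] + warped + x[end:]


def random_warp(x, slice_len, ratio, rng):
    """Warp one randomly positioned slice of length slice_len."""
    start = rng.randrange(0, len(x) - slice_len + 1)
    return warp_slice(x, start, start + slice_len, ratio)


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
stretched = warp_slice(ramp, 1, 5, 2.0)
```

Warping changes the length of the series, which is why [LeGuennec] combine it with slicing to obtain series of equal length.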
Table 3 shows the performance of the CNN classifier on each dataset (Section 5.1.2): without modification; augmented with time slicing; augmented with time warping; and augmented with TCGAN. As the table shows, TCGAN achieves the best AUROC on all three datasets.
We can observe how the rebalancing influences the classification performance in Table 4. In the table and in the figure we report the AUROC when we rebalance the two classes with the data generated by the GAN in the Starlight dataset.
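Table 5: AUROC (mean ± standard deviation) of the CNN classifier on irregularly sampled real datasets.
Dataset  |  Real data  |  Time Slicing  |  Time Warping  |  TCGAN  |
Starlight Curves  |  0.6798 ± 0.0222  |  0.5200 ± 0.0041  |  0.9508 ± 0.0041  |  0.9750 ± 0.0040  |
Power Demand  |  0.5011 ± 0.0042  |  0.5020 ± 0.1240  |  0.5322 ± 0.0053  |  0.6999 ± 0.0356  |
ECG200  |  0.5724 ± 0.2410  |  0.5233 ± 0.0210  |  0.6474 ± 0.0341  |  0.7202 ± 0.0546  |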
5.4 Classification performance over irregularlysampled time series
We now analyze the performance of TCGAN over irregularly sampled time series. To do so, we generate an irregular version of the real time series by randomly removing 20% of the data in each curve, thus creating irregular time sampling.
Again, we compare the performance of the CNN classifier over the datasets (described in Section 5.1.2): without modification, augmented with time slicing, augmented with time warping, and augmented with TCGAN. For time slicing and time warping, before using the classifier, we fill in the missing points using interpolation, because these approaches do not support irregularity.
Table 5 describes the CNN classifier performance over the three datasets. The table reports a distinct decrease of performance for each of the three methods compared with the performance of Table 3, as expected due to the missing data points. Also in this setting TCGAN outperforms the other methods and maintains good performance. Note that the performance decrease is dataset-dependent, and varies in particular with the complexity of the time series and its length.
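The two preprocessing steps can be sketched as follows: `drop_points` removes a random share of the points to create irregular sampling, and `interpolate` maps the remaining points back onto a regular grid for the baselines. The function names are hypothetical, and linear interpolation with constant extension at the edges is an assumption, since the interpolation scheme is not specified in the text.

```python
import random


def drop_points(t, x, fraction, rng):
    """Remove a random fraction of the points, keeping temporal order."""
    n_keep = len(t) - round(len(t) * fraction)
    keep = sorted(rng.sample(range(len(t)), n_keep))
    return [t[i] for i in keep], [x[i] for i in keep]


def interpolate(t_obs, x_obs, t_grid):
    """Linearly interpolate observed points onto an increasing grid.

    Grid points outside the observed range take the nearest observed value.
    Assumes t_obs is strictly increasing.
    """
    out = []
    j = 0
    for tg in t_grid:
        if tg <= t_obs[0]:
            out.append(x_obs[0])
            continue
        if tg >= t_obs[-1]:
            out.append(x_obs[-1])
            continue
        # Advance to the observed interval [t_obs[j], t_obs[j + 1]] containing tg.
        while t_obs[j + 1] < tg:
            j += 1
        w = (tg - t_obs[j]) / (t_obs[j + 1] - t_obs[j])
        out.append(x_obs[j] * (1.0 - w) + x_obs[j + 1] * w)
    return out


rng = random.Random(0)
t = [float(i) for i in range(10)]
x = [2.0 * v for v in t]
t_irr, x_irr = drop_points(t, x, 0.2, rng)
x_filled = interpolate(t_irr, x_irr, t)
```

TCGAN does not need the second step, because it is conditioned directly on the irregular timestamps.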
5.5 Evaluating augmentation over varying percentage of missing values
Finally, we evaluate our method with respect to different imbalance ratios in the training set, still considering irregularly sampled time series. To this aim, we define an experimental setting with different training sets, built from the original one by removing 10%, 20%, 30%, and 40% of the training points of one of the two classes. Figure 4 shows the classification results for the different datasets with increasing amounts of missing values. The figure shows that augmenting the less represented class becomes more and more important as the imbalance increases. It also shows that TCGAN outperforms time warping at all levels of imbalance.
6 Conclusion
We proposed a generative model for augmentation of time series data with irregular sampling, called TCGAN. This novel architecture generalizes conditional GANs so as to deal with the unique characteristics of irregularly sampled time series. Various experiments with synthetic and real-world datasets show that TCGAN outperforms state-of-the-art data augmentation techniques for time series classification, obtaining good results also with small training sets and short, noisy and irregularly sampled time series. In the classification problems over real datasets, our method performs better, in terms of AUROC, than any existing approach we are aware of. The proposed TCGAN approach can have significant impact, as it can be helpful in the common scenarios of time series data with irregular sampling, noise, missing points, and insufficient samples.