IB-GAN: A Unified Approach for Multivariate Time Series Classification under Class Imbalance

10/14/2021 ∙ Grace Deng, et al. ∙ Amazon ∙ Cornell University

Classification of large multivariate time series with strong class imbalance is an important task in real-world applications. Standard methods of class weights, oversampling, or parametric data augmentation do not always yield significant improvements for predicting minority classes of interest. Non-parametric data augmentation with Generative Adversarial Networks (GANs) offers a promising solution. We propose Imputation Balanced GAN (IB-GAN), a novel method that joins data augmentation and classification in a one-step process via an imputation-balancing approach. IB-GAN uses imputation and resampling techniques to generate higher quality samples from randomly masked vectors than from white noise, and augments classification through a class-balanced set of real and synthetic samples. Imputation hyperparameter p_miss allows for regularization of classifier variability by tuning innovations introduced via generator imputation. IB-GAN is simple to train and model-agnostic, pairing any deep learning classifier with a generator-discriminator duo and resulting in higher accuracy for under-observed classes. Empirical experiments on open-source UCR data and proprietary 90K product dataset show significant performance gains against state-of-the-art parametric and GAN baselines.


1 Background and Motivation

Multivariate time series classification (MTSC) is a growing field where complex features are collected in the form of time-indexed sequences with varying lengths. Often a single observation unit (product, video, recommendation, etc.) can be described by multiple time series metrics with strong inter-temporal dependencies and a mixture of numeric, categorical, and semantic features. Many recent works using variants of CNNs or RNNs have shown impressive performance in MTSC tasks [fawaz2019deep, karim2017lstm, zhao2017convolutional, zheng2016exploiting, karim2019multivariate]. However, many real-world datasets exhibit strong class imbalance, where far fewer samples are observed from the minority classes of interest. The three baseline methods for addressing imbalance are class weights, upsampling, and downsampling, each with drawbacks [buda2018systematic, Weiss2007CostSensitiveLV]. Downsampling leads to poor classifier performance when too many samples are discarded, while upsampling leads to overfitting by reusing data from the minority class. Data augmentation techniques such as SMOTE [chawla2002smote, dablain2021deepsmote] create additional synthetic samples by oversampling and linearly interpolating k-nearest neighbors, but SMOTE remains a parametric model constrained by computation time and does not generalize well to high-dimensional datasets [lusa2012evaluation]. Techniques for image augmentation [shorten2019survey] include transformations, cropping, noise injection, and random erasing [zhong2020random]; however, far fewer methods have been developed for time series or more general data types.

Large-scale non-parametric data augmentation [fawaz2018data, shorten2019survey, mariani2018bagan] with Generative Adversarial Networks (GANs) [goodfellow2014generative] offers a promising solution. Our paper presents Imputation Balanced GAN (IB-GAN), a novel imputation-balancing approach to data augmentation that can pair any deep learning classifier with a generator-discriminator model to address class imbalance in MTSC.

1.1 Related Works.

Modern MTSC methods involve combinations of Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks, with variants such as stacked CNNs with Squeeze-and-Excitation blocks [karim2019multivariate], deep CNNs [zheng2016exploiting], and attention networks [zhang2020tapnet]. These works are not designed for imbalanced data. A 2019 survey [fawaz2019deep] notes that very little has been done for imbalanced classes in time series classification beyond class weights. Fawaz et al. [fawaz2018data] and Tran et al. [tran2017bayesian] showed that data augmentation with deep learning models improves classification. We propose that any of these classifiers with a proven track record for MTSC can be paired with GANs to augment the training process, i.e., to include synthetic data for under-sampled classes.

The GAN has two competing models: the generator creates synthetic data by learning the true data distribution, and the discriminator distinguishes real from synthetic samples. The vanilla GAN [goodfellow2014generative] has been used in many generative tasks [oza2019progressive, sheng2019unsupervised]. Deep Convolutional GANs (DC-GAN) use convolution layers [radford2015unsupervised] to improve training stability via learned feature maps and object representations. Conditional GANs [mirza2014conditional] extend DC-GAN and condition on auxiliary information such as class labels. Additional techniques such as batch normalization [salimans2016improved], leaky-ReLU activation, max pooling, and dropout can all improve GAN training. GANs designed specifically for imputation, such as GAIN [yoon2018gain], Colla-GAN [lee2019collagan], and Mis-GAN [li2019misgan, luo2018multivariate], as well as GANs for sequence generation [yoon2019time, esteban2017real, zhang2021missing, guo2019data], have also been introduced.

More recent works focus on generating better samples. The Balancing GAN (BAGAN) adopts an autoencoder-decoder structure to learn class conditioning in the latent space [mariani2018bagan]. The Auxiliary Classifier GAN (AC-GAN) [odena2017conditional, gong2019twin] embeds a secondary classifier in the discriminator to stabilize GAN training, but does not apply it directly for classification; a second-stage classifier is required. The Rumi formulation for GANs [NEURIPS2020_29405e2a] trains the generator to specifically produce samples from a desired class. The Triple-GAN [li2017triple] uses a generator to produce pseudo-samples and a classifier to produce pseudo-labels, and the discriminator tries to predict whether a pseudo-pair of (sample, label) is real or synthetic. All of these methods (BAGAN, AC-GAN, etc.) were designed with image data in mind.

These GAN methods suffer from two drawbacks. First, they are specifically designed for data generation but require a separate classification step, resulting in a two-stage training process in which the generative model does not learn from classification feedback (see Figure 1). Second, these methods generate synthetic data from white or latent noise [odena2017conditional, mariani2018bagan], which risks unrealistic or repetitive samples due to mode collapse. The Imputation Balanced GAN (IB-GAN) addresses these two challenges through a joint training process and a two-pronged imputation-balancing approach.

Figure 1: IB-GAN versus the two-step process of existing GAN data augmentation methods. IB-GAN directly utilizes the classifier model in its triplet for classification; see Figure 2 for the detailed set-up.

1.2 Contributions.

We propose the Imputation Balanced GAN (IB-GAN) for imbalanced classification with three key contributions:

  1. We present a unified one-step process (Figure 1) for joint data augmentation and classification under class imbalance, consisting of a triplet of generator (G), discriminator (D), and classifier (C) models that is agnostic to different neural net architectures.

  2. We adapt a novel imputation approach for generating synthetic samples from randomly masked vectors; data quality is improved via direct feedback from the classification and imputation losses. A tuning parameter p_miss regulates innovations in generator imputations while preventing the mode collapse common in standard GANs initialized with white noise.

  3. We balance classifier training via resampling techniques and synthetic samples for under-observed classes, with significant performance gains against state-of-the-art baselines.

A full overview of IB-GAN is given in Figure 2. Theoretically, a range of GAN variants (vanilla [goodfellow2014generative], conditional [mirza2014conditional], Info-GAN [chen2016infogan], etc.) and deep learning classifiers (CNN, RNN, LSTM, VGG-16) is possible for IB-GAN given the model-agnostic framework; we explore these options in Section 3.1. Elaborating on contribution (2), the IB-GAN generator initializes data generation with randomly masked data vectors using the MCAR missing mechanism [mealli2015clarifying], which leverages existing information to produce better quality samples. The resulting imputations are analogous to image perturbations in computer vision applications [zheng2016improving, poursaeed2018generative]. Empowered by these new weighted resampling and data masking steps, the final IB-GAN classifier generalizes well to hold-out sets with strong class imbalance.

2 Methods and Technical Solutions

We briefly discuss the challenge of training unbiased classifiers from highly imbalanced data. Methods such as class weights may cause numerical instability [gong2016novel]; data augmentation, e.g., via GANs, poses a better solution. We then introduce the main IB-GAN framework and objective functions.

2.1 Data Augmentation for Imbalanced Classification.

Without loss of generality, we assume all random variables involved are discrete with probability measure $P$.

Suppose the observed data come from realizations of the random pair $(X, Y)$. $Y$ is the class label, taking values in a finite class label set $\mathcal{Y} = \{1, \dots, K\}$. Denote the prior label probability as $\pi_k = P(Y = k)$ for $k \in \mathcal{Y}$. $X$ is a $d$-dimensional random vector (some characteristic features for class $Y$), where $X \in \mathbb{R}^d$. Given $Y = k$, denote the conditional probability as $P(X = x \mid Y = k)$. The task is to train a classifier $C$ that maps $X$ to $Y$, or equivalently predicts the class label probabilities. We will use the latter definition of the task. That is, for $x \in \mathbb{R}^d$, $C(x) = (C_1(x), \dots, C_K(x))$, where $C_k(x) \in [0, 1]$ is the predicted probability that $x$ belongs to class $k$, and we have $\sum_{k \in \mathcal{Y}} C_k(x) = 1$. The classifier maximizing the equal class weight negative cross-entropy loss:

$$\mathcal{L}(C) = \mathbb{E}_{(X, Y)} \Big[ \textstyle\sum_{k \in \mathcal{Y}} \mathbb{1}\{Y = k\} \log C_k(X) \Big] \qquad (1)$$

is known as the Bayes classifier and can be expressed as $C^{*}(x) = (C^{*}_1(x), \dots, C^{*}_K(x))$, where $C^{*}_k(x) = P(Y = k \mid X = x)$. This is the optimal classifier for the data distribution of $(X, Y)$. However, our focus is on minimizing the overall classification error when we treat each class equally. Under this criterion, the Bayes classifier is suboptimal: the highly unbalanced prior label probabilities $\pi_k$ bias it towards labels with high prior probability. Let $\tilde{Y}$ be a uniform random variable over the class label set $\mathcal{Y}$, i.e., $P(\tilde{Y} = k) = 1/K$. The joint distribution of $(\tilde{X}, \tilde{Y})$ is defined through letting $P(\tilde{X} = x \mid \tilde{Y} = k) = P(X = x \mid Y = k)$. Then the optimal balanced class classifier is $\tilde{C}^{*}(x)$, where $\tilde{C}^{*}_k(x) = P(\tilde{Y} = k \mid \tilde{X} = x)$.

In order to recover the optimal balanced class classifier from a data distribution with class imbalance, a common method is to apply inverse class weights in the loss. The classifier that maximizes the negative cross-entropy loss with inverse class weights:

$$\mathcal{L}_{w}(C) = \mathbb{E}_{(X, Y)} \Big[ \textstyle\sum_{k \in \mathcal{Y}} \tfrac{1}{\pi_k} \mathbb{1}\{Y = k\} \log C_k(X) \Big] \qquad (2)$$

is $\tilde{C}^{*}$. Solving this empirical optimization problem with highly unbalanced inverse weights $1/\pi_k$ can cause numerical instability issues.

A better solution takes the form of data augmentation. Suppose we have additional samples for classifier training that are realizations of a random pair $(X', Y')$. For example, additional training samples can be derived from resampling of existing data, transformations and rotations, or model-based data augmentation. Let the prior label probability of the additional data distribution be $\pi'_k = P(Y' = k)$ for $k \in \mathcal{Y}$, with conditional probability $P(X' = x \mid Y' = k)$. We define the loss that combines the two sources of data through a hyperparameter $\gamma \in [0, 1]$ as

$$\mathcal{L}_{\gamma}(C) = (1 - \gamma)\, \mathbb{E}_{(X, Y)} \Big[ \textstyle\sum_{k \in \mathcal{Y}} \mathbb{1}\{Y = k\} \log C_k(X) \Big] + \gamma\, \mathbb{E}_{(X', Y')} \Big[ \textstyle\sum_{k \in \mathcal{Y}} \mathbb{1}\{Y' = k\} \log C_k(X') \Big] \qquad (3)$$

From Jensen's inequality, the classifier that maximizes $\mathcal{L}_{\gamma}$ is the posterior of the mixture distribution that draws from $(X, Y)$ with probability $1 - \gamma$ and from $(X', Y')$ with probability $\gamma$. Data augmentation gives us control over the prior label probabilities $\pi'_k$. If $\gamma > 0$ and we choose $\pi'_k$ such that $(1 - \gamma)\pi_k + \gamma \pi'_k = 1/K$ for every $k$, then under optimal conditions where the augmented data have the same conditional distribution as the original data, $P(X' = x \mid Y' = k) = P(X = x \mid Y = k)$, the mixture distribution coincides with that of $(\tilde{X}, \tilde{Y})$ and the optimal classifier under $\mathcal{L}_{\gamma}$ equals $\tilde{C}^{*}$. Thus, with suitable choices of $\pi'_k$ and hyperparameter $\gamma$, we can effectively train the balanced class classifier through a combination of true and GAN-generated samples. We now introduce the IB-GAN, which achieves data augmentation and balanced classification in a unified process.
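
To make the balance condition above concrete, here is a minimal numerical sketch (class priors and $\gamma$ are illustrative values, not from the paper) that solves for augmentation priors $\pi'_k$ making the mixture prior uniform:

```python
import numpy as np

# Choose augmentation priors pi'_k so that the mixture prior
# (1 - gamma) * pi_k + gamma * pi'_k is uniform over the K classes.
pi = np.array([0.80, 0.15, 0.05])          # observed (imbalanced) priors pi_k (illustrative)
K = len(pi)
gamma = 0.7                                 # weight on the augmented data (illustrative)

pi_aug = (1.0 / K - (1.0 - gamma) * pi) / gamma
assert np.all(pi_aug >= 0), "gamma too small: increase the share of augmented data"

print(np.round(pi_aug, 3))                  # augmentation priors pi'_k, e.g. [0.133 0.412 0.455]
print((1 - gamma) * pi + gamma * pi_aug)    # mixture prior: [1/3, 1/3, 1/3]
```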

2.2 Imputation Balanced GAN.

A standard GAN consists of two models [goodfellow2014generative]: the generator G, which learns the underlying data distribution for a data vector from a noise distribution, and the discriminator D, which classifies whether a sample comes from the true data distribution or is synthetic. For the IB-GAN (Figure 2), we introduce a new component, the classifier C, which predicts which class a sample belongs to, regardless of whether it is real or synthetic. In contrast to prior work [odena2017conditional, tran2017bayesian, li2017triple], the IB-GAN separates the discriminator and classifier into two different models with a separate element-wise imputation loss and classifier loss, corresponding to the "imputation" and "balancing" aspects of IB-GAN, respectively.

Figure 2: IB-GAN triple-model framework (blue) includes novel resampling and data masking steps (red) for joint data augmentation and classification. [Diagram: time series, metadata, and labels pass through weighted resampling and data masking into the generator (imputation); the resulting synthetic data, together with the real data, feed both the discriminator and the classifier.]

2.2.1 Weighted Resample:

Let X be a d-dimensional multivariate time series with sequence length T, with optional metadata B and class labels Y. For a given mini-batch size, the weighted resampling step first samples real data and corresponding labels from the true training data. For the samples to be synthesized, we sample again (before masking) with weighted probability from each class, so that minority classes are drawn more often. Together, the real and resampled batches contain balanced samples from each class.
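
As a rough illustration of this step, the sketch below draws a class-balanced mini-batch using inverse-frequency sampling weights; the function name and exact batch composition are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def weighted_resample(X, B, Y, batch_size, rng=None):
    """Sketch of the weighted resampling step: draw a mini-batch with per-sample
    probabilities inversely proportional to class frequency, so minority classes
    are over-represented before masking and imputation.
    X: (n, T, d) time series, B: (n, m) metadata, Y: (n,) integer class labels."""
    rng = rng or np.random.default_rng()
    classes, counts = np.unique(Y, return_counts=True)
    inv_freq = {c: 1.0 / cnt for c, cnt in zip(classes, counts)}
    probs = np.array([inv_freq[y] for y in Y])
    probs /= probs.sum()                     # normalize to a sampling distribution
    idx = rng.choice(len(Y), size=batch_size, replace=True, p=probs)
    return X[idx], B[idx], Y[idx]
```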

2.2.2 Data Masking with p_miss:

The proportion of masked values for each sample is specified by the hyper-parameter p_miss. A low p_miss ensures imputed samples will be more realistic, with greater similarity to the original data; a higher p_miss encourages greater innovation and variety. It can be seen as a form of regularization on the generator's imputation variability. At p_miss = 1, IB-GAN is analogous to combining a Conditional GAN generating data from white noise with a classifier (Naive GAN); at p_miss = 0, IB-GAN is equivalent to training on weighted bootstraps of the original data. The optimal choice of p_miss will likely depend on the class ratio and sample size, and can be found via grid search in practice.

Let M be the component-wise indicator of whether each component of a resampled vector is masked (replaced with random white noise) or keeps its real value. M has the same dimension as the data, and applying it produces the randomly masked data vectors that are passed to the generator.
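
A minimal sketch of this masking step is shown below, assuming standard-normal white noise as the fill value; setting p_miss = 1 reduces to pure noise (the Naive GAN case) and p_miss = 0 returns the original data.

```python
import numpy as np

def mcar_mask(x, p_miss, rng=None):
    """Sketch of MCAR data masking: every component of x is independently masked
    with probability p_miss and replaced with white noise. Returns the masked
    vector and the indicator M (1 = real value kept, 0 = masked), which the
    discriminator later tries to recover."""
    rng = rng or np.random.default_rng()
    M = (rng.random(x.shape) > p_miss).astype(np.float32)    # keep indicator
    noise = rng.standard_normal(x.shape).astype(np.float32)  # white-noise fill
    x_masked = M * x + (1.0 - M) * noise
    return x_masked, M
```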

2.2.3 IB-GAN Triplet Training:

The generator G takes the masked vectors as input and imputes synthetic samples, where the paired labels are not synthetic labels but rather the true class labels of the samples being imputed, a key difference from Triple-GAN [li2017triple]. Let the prior label probability of the synthetic data be $\pi'_k$ for $k \in \mathcal{Y}$, with a conditional distribution that depends on the generator G. Both the true and synthetic samples are now inputs to the discriminator and classifier. The classifier learns on a class-balanced sample, and the discriminator attempts to recover the mask M via the element-wise imputation loss. If the generator is performing well, the discriminator should find it difficult to distinguish between true and imputed values.

We then jointly optimize the triplet through the combined losses and hyper-parameters below (Eqs. 4-6), where we weigh the augmented samples during training of the classifier through a weight that depends on the current discriminator D. Through these weights, the effect of the augmented samples kicks in smoothly during the training of the classifier.

(4)
(5)
(6)

During initial epochs, the generator imputes lower-quality synthetic samples and the discriminator easily identifies fake vs. real components. Hence, the weights are approximately zero and the classifier is mainly updated through real samples. As the generator improves, the discriminator approaches the optimum of the original GAN setting, where it can no longer distinguish real from imputed components; the weights then approach one, and the augmented samples contribute to the IB-GAN classifier equivalently to real samples.
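
Since Eqs. 4-6 are not reproduced here, the following TensorFlow sketch only paraphrases the training logic described above: the generator imputes from masked vectors, the discriminator tries to recover the mask element-wise, and the classifier learns on real plus synthetic samples weighted by the current discriminator output. The model interfaces, loss forms, and weighting function are assumptions for illustration, not the paper's exact objectives.

```python
import tensorflow as tf

# Assumes G, D, C are tf.keras.Model instances: G maps (masked inputs, labels) to
# imputed samples, D scores realness element-wise, and C outputs class probabilities.
bce = tf.keras.losses.BinaryCrossentropy()
cce = tf.keras.losses.SparseCategoricalCrossentropy()
cce_per_sample = tf.keras.losses.SparseCategoricalCrossentropy(reduction="none")
g_opt, d_opt, c_opt = (tf.keras.optimizers.Adam() for _ in range(3))

def train_step(G, D, C, x_real, y_real, x_masked, mask, y_synth):
    with tf.GradientTape(persistent=True) as tape:
        x_synth = G([x_masked, y_synth], training=True)      # imputation from masked vectors
        d_real = D(x_real, training=True)                     # element-wise realness scores
        d_synth = D(x_synth, training=True)
        # Discriminator: score real components as 1 and imputed components as 0,
        # i.e., try to recover the mask M.
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(mask, d_synth)
        # Generator: make imputed components look real to the discriminator.
        g_adv = bce(tf.ones_like(d_synth), d_synth)
        # Classifier: real samples plus synthetic samples, the latter weighted by the
        # discriminator's average realness score (roughly 0 early on, near 1 later).
        w = tf.stop_gradient(tf.reduce_mean(d_synth, axis=list(range(1, len(d_synth.shape)))))
        c_loss = cce(y_real, C(x_real, training=True)) + tf.reduce_mean(
            w * cce_per_sample(y_synth, C(x_synth, training=True)))
        g_loss = g_adv + c_loss                               # classification feedback reaches G
    for model, loss, opt in ((D, d_loss, d_opt), (G, g_loss, g_opt), (C, c_loss, c_opt)):
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))
    del tape
```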

3 Empirical Evaluation

3.1 UCR MTS Classification.

We first apply IB-GAN to open-source multivariate time series datasets from the popular UCR archive [dau2019ucr]. CharacterTrajectories is a 20-class MTSC dataset with roughly 3,000 samples, and Epilepsy is a smaller 4-class MTSC dataset with roughly 300 samples. Data imbalance is introduced by randomly dropping 75% of the samples from half of the classes.
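
A sketch of this imbalance protocol could look as follows; the helper function is hypothetical, and which classes become minorities is chosen at random here since the paper does not specify the selection rule.

```python
import numpy as np

def introduce_imbalance(X, Y, frac_drop=0.75, rng=None):
    """Sketch: randomly drop `frac_drop` of the samples from half of the classes."""
    rng = rng or np.random.default_rng(0)
    classes = np.unique(Y)
    minority = rng.choice(classes, size=len(classes) // 2, replace=False)
    keep = np.ones(len(Y), dtype=bool)
    for c in minority:
        idx = np.flatnonzero(Y == c)
        dropped = rng.choice(idx, size=int(frac_drop * len(idx)), replace=False)
        keep[dropped] = False
    return X[keep], Y[keep]
```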

Two classifier architectures are compared to showcase the flexibility of the IB-GAN framework: a simple CNN with 2 Conv1D layers and a state-of-the-art VGG-like network (3 VGG blocks with Conv2D and max pooling layers). Each classifier choice is paired with 3 GAN architectures: Conditional GAN, vanilla GAN, and InfoGAN. Standard baselines are classifier only, class weights, upsampling, and downsampling. State-of-the-art GAN baselines are AC-GAN, BAGAN, SMOTE, and Naive GAN (equivalent to generating data from white noise via a Conditional GAN). The filter size for the convolution layers equals the time series dimension.
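
For concreteness, a minimal Keras sketch of the simple CNN baseline is given below; layer widths and kernel sizes are illustrative choices rather than the paper's exact configuration.

```python
import tensorflow as tf

def simple_cnn_classifier(seq_len, n_channels, n_classes):
    """Sketch of the simple CNN baseline: two Conv1D layers with pooling and a
    softmax head. Filter counts and kernel sizes are illustrative."""
    inputs = tf.keras.Input(shape=(seq_len, n_channels))
    x = tf.keras.layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling1D()(x)
    x = tf.keras.layers.Conv1D(128, kernel_size=3, padding="same", activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```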

Findings. Each experiment is run for 5 replicates, and average classifier performance is listed in Table 1. The two metrics are Balanced Accuracy [kelleher2020fundamentals] and F1-score; both are macro-averaged across classes and penalize low performance in minority classes. The default IB-GAN classifier, which uses a Conditional GAN as the generative model for imputation, outperforms all baselines. IB-VanillaGAN and IB-InfoGAN demonstrate comparably high performance when utilizing the more powerful VGG classifier.

With both types of classifiers, SOTA GAN baselines such as AC-GAN and BAGAN did not show significant improvements over standard baselines, e.g., upsampling. The superior performance of the 3 IB-GAN variants compared to two-stage processes such as AC-GAN and BAGAN indicates that joint training for data augmentation and classification is a far more effective approach. In comparison, the folded classifier in AC-GAN and the additional autoencoder-decoder in BAGAN did not generate better quality samples that translated into classification gains. Comparison with Naive GAN also indicates that samples generated via imputation of randomly masked data vectors (IB-GAN) contribute a greater performance boost than samples generated from white noise (Naive GAN). Even with a small injection of novelty at p_miss = 10%, IB-GAN yields significantly higher Balanced Accuracy and F1-score. Finally, the imputation-balancing approach translates to IB-GAN classification with lower variance (see error bars), and stable prediction results are important for downstream applications.

The parametric SMOTE method also has relatively low performance, especially for smaller data sizes; SMOTE performs poorly for Epilepsy but has comparable performance to BAGAN and AC-GAN for CharacterTrajectories. SMOTE is constrained by forming new samples via linear combinations of existing samples, while IB-GAN performs consistently across sample sizes (Section 3.2.2). Details for additional UCR experiments and a Kaggle EEG time series dataset with an LSTM classifier are reported in Tables 4 and 5 of the Appendix. This testifies to the wide applicability of IB-GAN to time series of different lengths and the flexibility of classifier choice.
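
Both reported metrics are readily computed with scikit-learn (toy labels shown):

```python
from sklearn.metrics import balanced_accuracy_score, f1_score

# Balanced Accuracy is the macro-average of per-class recall, and F1 is macro-averaged
# over classes, so both penalize poor performance on minority classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]
print(balanced_accuracy_score(y_true, y_pred))     # mean per-class recall
print(f1_score(y_true, y_pred, average="macro"))   # macro-averaged F1
```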

Figure 3: IB-GAN performance as p_miss increases, a form of regularization on the generator and the level of novelty or innovation in synthetic samples.
Figure 4: Average Balanced Accuracy, F1-score, and PR-AUC by training sample sizes for Imputation Balanced GAN and baselines evaluated on 30k test set for trending products data.
CNN Classifier: CharacterTrajectories VGG Classifier: CharacterTrajectories
Experiment Balanced Accuracy F1-score Balanced Accuracy F1-score
IB-GAN
IB-InfoGAN
IB-VanillaGAN
Conditional (Naive) GAN
ACGAN + Second Stage Classifier
BAGAN + Second Stage Classifier
SMOTE + Second Stage Classifier
Baseline Class Weights
Baseline Upsample
Baseline Downsample
Baseline Classifier
CNN Classifier: Epilepsy VGG Classifier: Epilepsy
Experiment Balanced Accuracy F1-score Balanced Accuracy F1-score
IB-GAN
IB-InfoGAN
IB-VanillaGAN
Conditional (Naive) GAN
ACGAN + Second Stage Classifier
BAGAN + Second Stage Classifier
SMOTE + Second Stage Classifier
Baseline Class Weights
Baseline Upsample
Baseline Downsample
Baseline Classifier
Table 1: UCR Datasets - Average prediction metrics over 5 replicates for Imputation Balanced GAN and baselines with respect to standard multi-class CNN and VGG classifiers.

3.2 Predicting Trending Products.

We now apply IB-GAN to an empirical problem setting. The motivating application is predicting trending products on a large e-commerce website based on past time series metrics; such products make up a minority class of all newly-launched products. This is known as the cold-start problem in search, learning-to-rank, and recommendation systems. An additional challenge is the existence of time-invariant features based on product metadata, which correspond to the metadata B in the IB-GAN set-up.

We can frame this problem as a large-scale imbalanced MTSC task with a proprietary dataset of 90K products, split into 60K for training and 30K for testing. Each item has a binary label Y indicating whether it is a top-ranked item, 15 time-invariant metadata features B, and 8 daily time series metrics X spanning 3 weeks. The full feature set is the joint vector (X, B). B is different from the class labels Y; it is additional auxiliary information, e.g., product characteristics. Class imbalance is strong, with only 11% of samples in the 1-class. Standard accuracy is an inappropriate measure; an initial baseline CNN classifier with 89% testing accuracy has a precision of 0.29 and a recall of 0.01 for the 1-class.

Given this large dataset, we conduct ablation experiments for IB-GAN training: (1) How does the choice of p_miss affect IB-GAN generator quality and classification accuracy? (2) How does IB-GAN perform across different training sample sizes?

3.2.1 Novelty Parameter Tuning.

Using the full product dataset, we compare IB-GAN classifier performance at different p_miss levels against benchmarks. We use a state-of-the-art VGG classifier with blocks of Conv1D, batch normalization, and max pooling layers; a similar architecture is adopted for the discriminator-generator duo, which conditions on class labels analogously to a Conditional GAN. Each model is run for 100 epochs across 10 replicates. Figure 3 shows that p_miss = 10% is a good default value, with an average Balanced Accuracy of 78.8%. The nearest benchmark is SMOTE with 64.1%. The three metrics are fairly consistent when p_miss is between 20% and 45%. As the novelty in synthetic samples increases past 40-50%, classification performance declines on average. A higher p_miss dictates greater variability in generator output, which results in synthetic samples that are less similar to the original data. This leads to greater variability in prediction accuracy; see the error bands for F1-score and PR-AUC. See Table 2 in the Appendix for results of standard and GAN baselines.

3.2.2 Training Data Size.

To understand IB-GAN performance across sample sizes, we fix p_miss and randomly sample subsets of the 60K training data. The same VGG classifier and IB-GAN set-up are used as in the previous experiment. Figure 4 shows average Balanced Accuracy, F1-score, and PR-AUC over 10 replicates, evaluated on the fixed 30K test set, for IB-GAN and select baselines.

The IB-GAN classifier outperforms all benchmarks, with consistent Balanced Accuracy and F1-score across sample sizes up to 60,000, the maximum number of training samples. As sample size increases, upsampling and SMOTE become more effective due to the increased number of examples from the minority class. When sample sizes are small, upsampling leads to overfitting and SMOTE is unable to interpolate with enough variety to create meaningful new training samples. Meanwhile, GAN-based methods such as IB-GAN and even Naive GAN largely avoid this issue and are recommended for applications with small data sizes. Error bars for IB-GAN metrics tend to be narrower than for Naive GAN; the imputation mechanism and tuning parameter p_miss both regulate IB-GAN generator variability and ensure that the generated synthetic samples actually enable the classifier to learn signals from minority classes. See Table 3 for details.

4 Significance and Impact

We have proposed the novel IB-GAN method for joint data augmentation and multivariate time series classification with highly imbalanced data in a unified one-step process. The framework consists of a triplet of generator (G), discriminator (D), and classifier (C) models that works seamlessly with many choices of GAN architectures and deep learning classifiers. Compared to prior methods, IB-GAN does not require a cumbersome second-stage classifier, and it directly incorporates classification loss as feedback to improve synthetic data quality. IB-GAN uses a unique imputation and balancing approach that leverages existing training data to generate higher quality synthetic samples, and it enables better classification performance on under-observed classes by learning on a class-balanced training set. The hyperparameter p_miss directly tunes the similarity vs. novelty of synthetic samples while side-stepping the mode collapse of standard GAN training. Empirical results on UCR datasets show IB-GAN and its variants achieving significant performance gains against state-of-the-art parametric and GAN baselines. Ablation studies with a trending product dataset demonstrate that IB-GAN performance is robust across sample sizes (narrower confidence intervals) and p_miss levels up to 50% under MCAR.

The IB-GAN framework is quite versatile and can easily be extended to more complex datasets such as images, video, and text; future work can identify the proper design of generator-discriminator and classifier architecture suitable for these tasks. IB-GAN can also be applied to mitigate data bias and fairness issues in many ML applications; see Section 5 in Appendix for a short note on fairness.


5 Appendix: A Short Note on Fairness

The IB-GAN can potentially be extended to mitigate data bias in algorithmic and AI fairness [dwork2012fairness, corbett2017algorithmic, mehrabi2019survey]. A key source of data bias [dressel2018accuracy, olteanu2019social] is class imbalance for minority or protected classes (gender, race, age, etc.). Imbalanced classification leads to poor performance in both standard metrics and fairness metrics such as statistical parity, demographic parity, or predictive parity. Prior work considered data augmentation techniques (e.g., SMOTE) to generate synthetic samples for minority classes and improve model fairness in financial credit, education, and other social policies [iosifidis2019adafair, hutt2019evaluating]. IB-GAN with the optimal generator also imputes synthetic samples with an emphasis on minority classes of interest, such that the conditional distribution of the imputed samples matches the true data distribution. The IB-GAN classifier learns from a balanced dataset with equal representation of each class, and it outperforms SMOTE and other data augmentation techniques for under-observed classes. Since the datasets in this work contain no sensitive attributes, the authors consider improvements in specific fairness metrics via IB-GAN an open problem and a future research direction.

6 Appendix: Additional Experiment Details

6.1 Predicting Trending Products.

Table 2 reports macro-averaged Balanced Accuracy (equivalent to macro-averaged Recall in scikit-learn), F1-score, and PR-AUC with standard errors, evaluated on a fixed 30K test product dataset, for 10 replicates of IB-GAN classifiers at different p_miss levels. Table 3 reports the average Balanced Accuracy, F1-score, and PR-AUC with standard errors, also evaluated on the test dataset, for 10 replicates of IB-GAN classifiers with a fixed p_miss and various training sample sizes.

Performance for various baselines is also reported. Naive GAN is analogous to IB-GAN with 100% missingness (p_miss = 1), where the generator is initialized with white noise. The IB-GAN generator and discriminator follow a Conditional GAN architecture where the multivariate time series metrics X, metadata B, and class labels Y are all taken as inputs. Each replicate takes about 30 minutes on an ml.m5.24xlarge instance.

Classifier Type Balanced Accuracy F1-score PR-AUC
10% Imputed IB-GAN
20% Imputed IB-GAN
30% Imputed IB-GAN
40% Imputed IB-GAN
50% Imputed IB-GAN
Naive GAN
SMOTE
Baseline
Baseline Class Weights
Baseline Downsample
Metadata Only
Baseline Upsample
Table 2: Trending Products: Classifier Performance by Imputation Level, 10 Replicates.
Sample Size Classifier Type Balanced Accuracy F1-score PR-AUC
IB-GAN
Naive GAN
SMOTE
Baseline Class Weights
Baseline Downsample
Baseline
Baseline Upsample
IB-GAN
Naive GAN
SMOTE
Baseline Class Weights
Baseline Downsample
Baseline
Baseline Upsample
IB-GAN
Naive GAN
SMOTE
Baseline Class Weights
Baseline Downsample
Baseline
Baseline Upsample
IB-GAN
Naive GAN
SMOTE
Baseline Class Weights
Baseline Downsample
Baseline
Baseline Upsample
IB-GAN
Naive GAN
SMOTE
Baseline Class Weights
Baseline Downsample
Baseline
Baseline Upsample
IB-GAN
Naive GAN
SMOTE
Baseline Class Weights
Baseline Downsample
Baseline
Baseline Upsample
Table 3: Trending Products: Classifier Performance by Sample Size, 10 Replicates.

6.2 Multivariate Time Series Classification with UCR Data.

Table 4 reports macro-averaged Balanced Accuracy and F1-score with standard errors for the IB-GAN classifier with a fixed p_miss against select benchmarks on 8 additional open-source MTS datasets from the UCR archive. The data have already been pre-processed and split into training and testing sets. We utilize the same VGG architecture for all GAN and baseline classifiers, with 2 Conv1D layers, each with ReLU activation, max pooling, and batch normalization. For IB-GAN and Naive GAN, the discriminator and generator models also have 2 Conv1D layers with ReLU activation. Each replicate trains for 20 epochs and takes about 20 minutes.

Dataset Classifier Type Balanced Accuracy F1-score
FingerMovements IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
HandMovementDirection IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
Handwriting IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
Heartbeat IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
Libras IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
RacketSports IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
SelfRegulationSCP1 IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
SpokenArabicDigits IB-GAN
Naive GAN
SMOTE
Baseline Weights
Baseline Upsample
Baseline
Baseline Downsample
Table 4: UCR MTS Datasets: Classifier Performance, 5 Replicates.

6.3 EEG-Emotion Classification with LSTM.

To replicate IB-GAN's success on a much longer sequence and with the popular LSTM classifier for time series, we utilize an open-source Kaggle dataset of EEG brainwave time series to classify the subject's emotional state from EEG: Neutral, Positive, or Negative. There are a total of 2,132 samples over 2,549 time periods, and the classes are roughly equally distributed. 50% of the Neutral class is randomly removed to introduce class imbalance.

Given the long time series, the IB-GAN classifier is a standard LSTM model with 50 hidden neurons, paired with a Conditional GAN-like generator-discriminator duo built from dense layers. The discriminator uses a sigmoid activation to predict the probability that a sample is real or fake, with 0.5 as the cutoff threshold. The generator uses a softmax function to predict the class of each sample. Loss functions for the generator and discriminator are binary and categorical cross-entropy, respectively. We use the Adam optimizer with default parameters instead of stochastic gradient descent for faster convergence. The hyper-parameter p_miss is varied from 10% to 60%, and classification metrics are compared against standard and GAN baselines. The IB-GAN classifier at every imputation level outperforms all benchmarks in terms of Balanced Accuracy and F1-score, averaging 0.876 Balanced Accuracy at p_miss = 30%, followed by 0.873 at 10% and 0.865 at 20%. The boxplot in Fig. 5 shows the distribution of the two measures across 5 replicates. Table 5 reports macro-averaged Balanced Accuracy and F1-score with standard errors for the IB-GAN classifier at various p_miss levels. Each GAN experiment replicate trains for 100 epochs, taking about 70 minutes.
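
A minimal Keras sketch of this LSTM classifier, under the stated 50-hidden-unit configuration and default Adam optimizer, might look like:

```python
import tensorflow as tf

def lstm_classifier(seq_len, n_channels, n_classes=3):
    """Sketch of the LSTM classifier described above: one LSTM layer with 50 hidden
    units followed by a softmax output over the three emotion classes."""
    inputs = tf.keras.Input(shape=(seq_len, n_channels))
    x = tf.keras.layers.LSTM(50)(inputs)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    # Adam with default parameters, as noted above, for faster convergence.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```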

Figure 5: EEG-Emotion Classification with LSTM - Performance at various imputation levels.
Classifier Type Balanced Accuracy F1-score
10% IB-GAN
20% IB-GAN
30% IB-GAN
40% IB-GAN
50% IB-GAN
60% IB-GAN
Naive GAN
LSTM SMOTE
LSTM Downsample
LSTM Baseline
LSTM Upsample
LSTM Class Weights
Table 5: EEG-Emotion: Classifier Performance, 5 Replicates.