. However, it is difficult to learn accurate distributions from scarce samples of the minority classes. A common strategy to tackle the class imbalance problem is to increase the number of minority samples in order to better represent the distributions of the minority classes. In particular, previous methods usually increased the size of minority classes by replicating or interpolating samples from the minority classes. However, repetition may cause over-fitting because the samples from minority classes are overemphasised. On the other hand, because data normally sit in a high-dimensional space, interpolation is nontrivial and may generate low-quality samples due to the complexity of the data manifold [Ando and Huang(2017)].
Recently, Generative Adversarial Networks (GANs) have shown potential to tackle class imbalance problems because, theoretically, they are able to reproduce the distributions of minority classes through adversarial learning. In the training process of GANs, the generator learns the mapping from a latent encoding space to the minority class distribution, and the discriminator determines whether an input sample is actually drawn from the minority class or created by the generator [Goodfellow et al.(2014)Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, and Bengio]. As these two networks confront each other constantly, the performance of the generator and the discriminator improves alternately. Finally, the generator reproduces the distributions of the minority classes such that the discriminator can no longer distinguish generated samples from actual minority samples.
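The adversarial game described above is commonly formalised by the minimax objective of Goodfellow et al. [Goodfellow et al.(2014)Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, and Bengio]:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z(z)}\big[\log \big(1 - D(G(z))\big)\big]
```

where $p_{\mathrm{data}}$ is, in our setting, the minority class distribution and $p_z$ the latent prior.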
There are many successful applications of GANs. However, GANs can easily get stuck at a local optimum when they try to learn the distributions from scarce samples of the minority classes, a phenomenon also known as mode collapse [Arjovsky and Bottou(2017)], as shown in Figure 1 (a). A more effective training strategy is in high demand for GANs to avoid being trapped at a local optimum. Previous attempts tried to avoid local optima by improving the adversarial learning objectives [Mao et al.(2017)Mao, Li, Xie, Lau, Wang, and Paul Smolley] [Zhao et al.(2016)Zhao, Mathieu, and LeCun] [Arjovsky et al.(2017)Arjovsky, Chintala, and Bottou]. However, these strategies still used a single adversarial learning target in the training process, which might fail to fully overcome the local optimum problem and might also have other limitations; e.g., the Wasserstein distance has non-convergent limit cycles near the equilibrium [Nagarajan and Kolter(2017)].
Considering the problems and limitations of those previously proposed methods, we propose a new model, namely Annealing Genetic GAN (AGGAN), to incorporate simulated annealing genetic algorithm into the training process of GANs to avoid the local optimum trapping problem (Figure 1 (b)). The primary contributions of the proposed method are summarised as follows:
We develop a strategy to incorporate simulated annealing genetic algorithm into the training process of GANs to avoid the local optimum trapping.
Through theoretical analysis, we prove that the simulated annealing genetic algorithm enables our proposed AGGAN to reproduce the distributions closest to the minority classes.
We also conduct comprehensive experimental studies and show that our proposed AGGAN method can solve the class imbalance problem efficiently and effectively.
2 Related Work
Class Imbalance Problem
Class imbalance is a common problem in practical classification tasks. The scarce samples of a minority class make it difficult for the classifier to find the boundaries between the distributions of different classes correctly. Therefore, the key to solving class imbalance is to learn an accurate distribution of the minority classes from limited samples. Oversampling, a common method to tackle the class imbalance problem, improves the learning of minority-class distributions by increasing the number of minority samples [He and Garcia(2009)]. Random oversampling, the Synthetic Minority Over-sampling Technique (SMOTE) [Chawla et al.(2002)Chawla, Bowyer, Hall, and Kegelmeyer] and Borderline-SMOTE [Han et al.(2005)Han, Wang, and Mao] are commonly used oversampling methods for classic imbalance problems. However, when dealing with data in a high-dimensional space, the quality of the synthesised data points could still be compromised due to noise and poor distance measurement in the high-dimensional space [Blagus and Lusa(2013)].
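The interpolation at the heart of SMOTE can be sketched in a few lines: a synthetic minority sample is placed at a random point on the segment between a minority point and one of its minority-class neighbours. The points below are hypothetical 2-D features, and the full k-nearest-neighbour search is omitted; this is only a minimal illustration of the interpolation step:

```python
import random

def smote_interpolate(x, neighbour, rng=random.Random(0)):
    """Create one synthetic sample on the segment between x and a neighbour."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]

# Two hypothetical minority-class samples in 2-D feature space.
x = [1.0, 2.0]
neighbour = [3.0, 4.0]
synthetic = smote_interpolate(x, neighbour)
# The synthetic point lies somewhere on the segment between x and neighbour.
```

In high dimensions this segment can leave the data manifold, which is exactly the failure mode discussed above.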
Generative Adversarial Networks
Recently, GANs have achieved great success in image generation [Zhang et al.(2017)Zhang, Xu, Li, Zhang, Wang, Huang, and Metaxas][Mao et al.(2019)Mao, Lee, Tseng, Ma, and Yang], image-to-image synthesis [Isola et al.(2017)Isola, Zhu, Zhou, and Efros]
, image super-resolution [Ledig et al.(2017)Ledig, Theis, Huszár, Caballero, Cunningham, Acosta, Aitken, Tejani, Totz, Wang, et al.] and other applications, owing to their excellent capability of learning data distributions when abundant training samples are available. In addition, GANs have also shown potential to solve class imbalance problems by learning the distributions of minority classes. Although GANs have been successfully applied in many tasks, due to the limited number of samples in the minority class, GANs may only learn part of the minority class distribution by the end of training, and therefore be trapped in a local optimum. Some studies have enabled GANs to learn a more accurate distribution during the training process by utilising improved adversarial learning objectives (e.g., LSGAN [Mao et al.(2017)Mao, Li, Xie, Lau, Wang, and Paul Smolley], energy-based GAN [Zhao et al.(2016)Zhao, Mathieu, and LeCun] and WGAN [Arjovsky et al.(2017)Arjovsky, Chintala, and Bottou]). Nevertheless, limitations remain when fixed adversarial training objectives are used in the training of GANs. More recently, Evolutionary-GAN (E-GAN) [Wang et al.(2019)Wang, Xu, Yao, and Tao] was proposed, in which multiple generators are created by different adversarial objectives to overcome the limitations of fixed adversarial objectives, and the best-performing generator is always kept during training. However, the local optimum trapping problem has yet to be addressed.
3 Annealing Genetic GAN (AGGAN)
In this paper, we propose AGGAN, which aims to learn an accurate distribution of the minority class. First, our AGGAN uses different adversarial learning objectives to improve the performance of the generator. Second, our AGGAN incorporates the mechanism of simulated annealing (SA) into the training so that the model can converge to the distribution closest to the minority class. In particular, in lieu of the normal training of GANs, our AGGAN is trained as an evolutionary process, with the discriminator as the environment and the generator as the individual. Each evolutionary iteration is divided into two steps, (1) generating the best-fit offspring and (2) updating the generator, as illustrated in Figure 2.
3.1 Generating the Best-Fit Offspring
In each iteration, the parent generator gives birth to different offspring via various adversarial learning objectives. Each offspring represents a solution in the parameter space of the generator network. The individual fitness of each offspring is evaluated based on the diversity and quality of the samples it generates. Then the best-fit offspring is retained while the other offspring are eliminated. This process of generating the best-fit offspring reflects the concept of ‘survival of the fittest’ in the genetic algorithm (GA). The strategy of using different adversarial objectives overcomes the limitations of a fixed objective and helps the final learned generator achieve better performance.
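The selection step above can be sketched as follows: each candidate objective produces one offspring (a parameter update of the parent), every offspring is scored by a fitness function, and only the fittest is kept. The function names and the toy one-parameter "generator" below are illustrative assumptions, not the paper's implementation:

```python
def best_fit_offspring(parent, objectives, fitness):
    """Produce one offspring per adversarial objective and keep the fittest."""
    offspring = [update(parent) for update in objectives]
    return max(offspring, key=fitness)

# Toy example: "generators" are 1-D parameters; three different objectives
# yield three different updates, and fitness prefers values near 2.0.
objectives = [lambda p: p + 0.5, lambda p: p - 0.5, lambda p: p * 1.1]
fitness = lambda p: -abs(p - 2.0)
best = best_fit_offspring(1.0, objectives, fitness)  # keeps 1.5
```

In AGGAN the "objectives" are one gradient step under each adversarial loss, and fitness measures sample quality and diversity.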
3.2 Updating the Generator
In our study, we propose to use the mechanism of SA to update the generator. If the individual fitness of the best-fit offspring $g'$ is higher than that of the current generator $g$, $g$ will be updated to $g'$ with a probability of 1. If the individual fitness of $g'$ is lower than that of the previous generation $g$, $g$ will be updated to $g'$ with a probability of $e^{\Delta F / T}$, determined by the current temperature $T$ and the difference $\Delta F$ between the two individual fitness values. The temperature gradually decreases from the initial temperature $T_0$ according to the annealing coefficient $\lambda$. In doing so, updating with a decreasing probability in a worse direction enables AGGAN to asymptotically converge to the global optimum.
Finally, after updating the individual, the environment (i.e., the discriminator) is updated, and the training loop of our AGGAN starts the next evolutionary iteration. As the training progresses, the data generated by G gradually approach the true distribution, which helps D continuously improve its classification accuracy.
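The update rule of Section 3.2 can be sketched with a Metropolis-style acceptance test and a geometric cooling schedule. The exponential acceptance probability and the geometric decay $T_k = \lambda^k T_0$ are one standard SA formulation assumed here for illustration:

```python
import math
import random

def sa_accept(delta_f, temperature, rng):
    """Accept a better offspring always; a worse one with prob exp(delta_f / T)."""
    if delta_f >= 0:
        return True
    return rng.random() < math.exp(delta_f / temperature)

def anneal(t0, coeff, steps):
    """Geometric cooling schedule: T_k = coeff**k * t0."""
    return [t0 * coeff ** k for k in range(steps)]

rng = random.Random(0)
temps = anneal(t0=10.0, coeff=0.9, steps=5)
# Early on (high T), a worse offspring (delta_f < 0) is accepted often;
# as T decreases, the acceptance probability exp(delta_f / T) shrinks,
# so the generator settles instead of wandering.
```

Accepting occasional worse updates at high temperature is what lets the generator escape the local optima discussed in Section 1.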
4 Theoretical Analysis
In this section, we perform a theoretical analysis of the proposed AGGAN. We will prove that incorporating simulated annealing genetic algorithm into the training process of GANs can reinforce our AGGAN to learn the distribution closest to the minority class, that is, the AGGAN can converge to the global optimum solution with a probability of 1.
For elaboration purposes, we consider the training of GANs as a combinatorial optimisation problem. We consider this combinatorial optimisation problem as a pair $(S, f)$, where $S$ is a finite set representing the solution space of the generator and $f$ is the objective function. The aim is to find a global optimum $g^* \in S$ that minimises $f$. It is of note that the finiteness of $S$ implies that $f$ has at least one minimum over $S$.
Below we provide the definitions of ‘generating the best-fit offspring’ and ‘updating the generator’ in AGGAN from a mathematical perspective and prove that the simulated annealing genetic algorithm makes AGGAN converge to the optimum solution as the training progresses.
|$g'_k$|The best offspring of $g_k$|$\hat{g}_k$|The best so far|
|$S$|Solution space of the generator|$S_k$|Solution space for the $k$-th iteration|
|$\{g_k\}_{k \ge 0}$|The solution sequence with initial state $g$|||
For readability, we first define the symbols used in Table 1. Then we give the mathematical definitions of ‘generating the best-fit offspring’ and ‘updating the generator’ in AGGAN. In addition, in order to obtain monotonicity without changing the training mechanism, we add a second element to each individual, which represents the best solution so far.
Generating the best-fit offspring
First, we use a choice function to find the parent $g_k$ from $S_k$. Then $g_k$ generates offspring under the production function. Finally, we retain the best-fit offspring $g'_k$ according to an individual fitness function. The steps above describe the whole process of ‘generating the best-fit offspring’. Since the choice function, production function and individual fitness function will not be mentioned in the analysis below, we collectively denote the above process by $g'_k = \mathrm{Gen}(g_k)$.
As the parameters of the choice function, production function and individual fitness function do not depend on the number of iterations, $\mathrm{Gen}$ follows the same distribution regardless of the number of iterations.
Updating the generator
The process of ‘updating the generator’ uses the mechanism of SA. We define it as follows:

$$g_{k+1} = \begin{cases} g'_k, & \text{if } \Delta F \ge 0 \text{ or } r < e^{\Delta F / T_k}, \\ g_k, & \text{otherwise,} \end{cases}$$

where $\Delta F$ is the difference between the individual fitness of $g'_k$ and $g_k$, $r$ is a random variable uniformly distributed between 0 and 1, and $T_k$ denotes the temperature parameter for our AGGAN.
We combine $\mathrm{Gen}$ and the SA update to obtain the operator $U$ representing the iterative process of the generator, $g_{k+1} = U(g_k)$, and we use $g_k$ to denote the generator in the $k$-th iteration.
4.2 Proof of Convergence
In this section, according to Corollary 1, we will prove that $\{g_k\}$ satisfies the following properties, which ensure that our AGGAN converges to the global optimum with a probability of 1.
Monotonicity The minimum value of $f$ on $S_k$ decreases as $k$ becomes larger.
Homogeneity $\{g_k\}$ is a Markov chain, and if its transition probabilities have the same distribution at every iteration then the chain is homogeneous.
First, we discuss the monotonicity of $\{g_k\}$. Since $\hat{g}_k$ always keeps the best value so far, for any $k$ we have $\min_{g \in S_{k+1}} f(g) \le \min_{g \in S_k} f(g)$.
Then we discuss the homogeneity. Due to the memory-less property of GANs, for any $k$, the conditional probability distribution of $g_{k+1}$ (conditional on both past and present states) depends only upon $g_k$, not on the sequence of states that preceded it. Then we have

$$P(g_{k+1} = j \mid g_0 = i_0, \ldots, g_k = i) = P(g_{k+1} = j \mid g_k = i),$$

so that $\{g_k\}$ has the Markov property. Since the updates $U$ have the same distribution at every iteration, then

$$P(g_{k+1} = j \mid g_k = i) = P(g_{l+1} = j \mid g_l = i)$$

for any $k, l$ and $i, j \in S$. So $\{g_k\}$ is homogeneous.
Let $\{g_k\}$ be the solution sequence generated by AGGAN and let the following conditions be satisfied:
(a) $\{g_k\}$ is monotone
(b) $\{g_k\}$ is homogeneous
(c) for every $g \in S$ there exists at least one accessible optimum.
Then $\{g_k\}$ almost surely reaches an optimum.
As shown above, we can prove that our AGGAN converges to the global optimum, which is the solution closest to the minority class distribution, with a probability of 1.
5 Experimental Studies and Discussion
We have used a collection of 6 image datasets for our experiments, namely MNIST [LeCun et al.(1998)LeCun, Bottou, Bengio, and Haffner], Fashion-MNIST [Xiao et al.(2017)Xiao, Rasul, and Vollgraf], SVHN [Netzer et al.(2011)Netzer, Wang, Coates, Bissacco, Wu, and Ng], CIFAR-10 [Krizhevsky et al.(2009)Krizhevsky, Hinton, et al.], CelebA [Liu et al.(2015)Liu, Luo, Wang, and Tang], and LSUN [Yu et al.(2015)Yu, Seff, Zhang, Song, Funkhouser, and Xiao]. We evaluate our method in two imbalanced settings: two-class and multi-class. For binary classification, because all the selected datasets are multi-class datasets, we randomly select two classes from each dataset: Digit 5 (positive) and Digit 6 (negative) from MNIST, Sandal (positive) and Sneaker (negative) from Fashion-MNIST, Airplane (positive) and Automobile (negative) from CIFAR-10, Digit 8 (positive) and Digit 9 (negative) from SVHN, Eyeglasses (positive) and No-eyeglasses (negative) from CelebA, and Church (positive) and Classroom (negative) from LSUN. For multi-class classification, we transform the originally balanced training sets in the same way as Mullick et al. [Mullick et al.(2019)Mullick, Datta, and Das]. We define the Imbalance Ratio (IR) as the number of training samples in the largest class divided by that in the smallest one.
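The IR defined above can be computed directly from label counts; a small sketch with a hypothetical label list:

```python
from collections import Counter

def imbalance_ratio(labels):
    """IR = size of the largest class / size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical training labels: 100 majority samples vs. 10 minority samples.
labels = ["majority"] * 100 + ["minority"] * 10
ir = imbalance_ratio(labels)  # 10.0
```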
5.1 Implementation Details
In the binary experiments, we have compared AGGAN against a baseline classifier network (CN), Os+CN (with a randomly oversampled training set), ACGAN [Odena et al.(2017)Odena, Olah, and Shlens] and Evolutionary-GAN [Wang et al.(2019)Wang, Xu, Yao, and Tao] (the version whose discriminator includes a classifier) to prove the effectiveness of our method. The same network structures are used for these different methods to achieve a fair comparison. In particular, our AGGAN and E-GAN use the same adversarial learning objectives to generate multiple offspring generators in the experiments, including the minimax loss, the modified minimax loss and the least-squares loss. The same evaluation function has been used to measure the individual fitness of the generators. In the multi-class setting, we compare the proposed method with three state-of-the-art algorithms, namely Class-Balanced [Cui et al.(2019)Cui, Jia, Lin, Song, and Belongie], DOS [Ando and Huang(2017)] and GAMO [Mullick et al.(2019)Mullick, Datta, and Das]. All our experiments have been repeated 5 times to mitigate any bias due to randomisation, and the means of the index values are reported.
5.2 Classification Performance
Table 2 shows the accuracy in binary classification under different imbalance ratios. The experimental results on the original imbalanced data and the randomly oversampled data indicate that data imbalance can significantly affect the performance of the classifier, and simply repeating the minority data does not lead to better performance. The experimental results using GANs for oversampling indicate that GANs can mitigate the class imbalance problem. Meanwhile, we observe that when the IR is low, e.g., 10, the different GANs perform equally well. However, as the degree of imbalance increases, the advantages of AGGAN become obvious. When the IR reaches 100, the proposed AGGAN outperforms all other GANs significantly. Figure 3 shows the three indicators (precision, recall, and F1-score) of the majority and minority classes on the testing dataset under different methods. We can see that for imbalanced data, the recall of the minority class and the precision of the majority class are low. This is because the classifier will assign data to the majority class as much as possible to improve the accuracy. The proposed method outperforms the other three methods, which is particularly evident in the recall of the minority class. This provides solid evidence that the ability of AGGAN to reconstruct the distribution of the minority class allows more minority data to be classified correctly and improves the performance of the classifier significantly. Table 3 shows the experimental results on multi-class classification. It indicates that AGGAN still performs well on more complex imbalanced datasets and achieves the state-of-the-art level. Compared with ACGAN, E-GAN uses the idea of evolution, and our method further combines the genetic and simulated annealing mechanisms. The experimental results of these three methods fully indicate the effectiveness of the genetic and simulated annealing modules in AGGAN.
In Figure 4 (a), we visualise the features of the MNIST and Fashion-MNIST datasets before and after using AGGAN to balance the data, respectively. We obtain features by forwarding images to a classifier pre-trained on the original training set, and features of a specific category in each figure are represented in the same colour. The test set of each category is represented by translucent dots of the corresponding colour. The first column shows the original imbalanced training datasets, in which the training samples of the minority class can only cover part of the minority distribution of the test set (which is balanced). As a result, many samples belonging to the minority class will be misclassified. However, in the second column of Figure 4 (a), we can see that after using AGGAN for oversampling, the minority samples in the training set can almost cover the complete distribution of the minority class in the test set, so the performance of the classifier can be significantly improved. These results indicate that AGGAN can learn a realistic distribution from scarce minority samples, and in turn prove the superior performance of AGGAN from the perspective of data distribution.
5.4 Hyper-parameter Analysis
For a better understanding of the roles of the initial temperature $T_0$ and the annealing coefficient $\lambda$ proposed in AGGAN, we use the CIFAR-10 dataset with IR 10 to show the accuracy and training epochs of the proposed method in Figure 4 (b). We search over a range of values for $T_0$ and $\lambda$. We observe that at the beginning of training, smaller values of $T_0$ and $\lambda$ make the model converge faster, while larger values make the model converge more slowly; however, with increasing training epochs, the latter eventually achieve higher accuracy. These results indicate that AGGAN is robust to different values of the hyper-parameters.
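The trade-off observed above follows from the cooling schedule: a larger initial temperature or annealing coefficient keeps the temperature, and hence the acceptance probability for worse updates, high for longer. A small sketch (with hypothetical settings and an arbitrary "cold" threshold) compares how many epochs two configurations take to cool down:

```python
def epochs_until_cold(t0, coeff, threshold=0.01):
    """Epochs before the temperature T_k = coeff**k * t0 drops below a threshold."""
    k, t = 0, t0
    while t >= threshold:
        t *= coeff
        k += 1
    return k

# A larger initial temperature / slower decay needs more epochs to cool,
# i.e. the model explores (accepts worse updates) for longer before settling.
fast = epochs_until_cold(t0=1.0, coeff=0.5)
slow = epochs_until_cold(t0=10.0, coeff=0.9)
```

This mirrors the observation that larger $T_0$ and $\lambda$ slow early convergence but allow a longer exploration phase.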
In this work, we propose a novel training strategy for GANs, dubbed AGGAN, which aims to reproduce the distributions closest to the ones of the minority classes using limited data samples. Both theoretical analysis and comprehensive experimental studies have shown the robustness and efficacy of our AGGAN.
- [Aarts et al.(1989)Aarts, Eiben, and Van Hee] Emile Hubertus Leonardus Aarts, Ágoston E Eiben, and KM Van Hee. A general theory of genetic algorithms. 1989.
- [Ando and Huang(2017)] Shin Ando and Chun Yuan Huang. Deep over-sampling framework for classifying imbalanced data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 770–785. Springer, 2017.
- [Arjovsky and Bottou(2017)] Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. Stat, 1050, 2017.
- [Arjovsky et al.(2017)Arjovsky, Chintala, and Bottou] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
- [Blagus and Lusa(2013)] Rok Blagus and Lara Lusa. Smote for high-dimensional class-imbalanced data. BMC Bioinformatics, 14(1):106, Mar 2013.
- [Chawla et al.(2002)Chawla, Bowyer, Hall, and Kegelmeyer] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
- [Cui et al.(2019)Cui, Jia, Lin, Song, and Belongie] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- [Galar et al.(2011)Galar, Fernandez, Barrenechea, Bustince, and Herrera] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(4):463–484, 2011.
- [Goodfellow et al.(2014)Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, and Bengio] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [Han et al.(2005)Han, Wang, and Mao] Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing, pages 878–887. Springer, 2005.
- [He and Garcia(2009)] Haibo He and Edwardo A Garcia. Learning from imbalanced data. IEEE Transactions on knowledge and data engineering, 21(9):1263–1284, 2009.
- [Isola et al.(2017)Isola, Zhu, Zhou, and Efros] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
- [Krizhevsky et al.(2009)Krizhevsky, Hinton, et al.] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
- [LeCun et al.(1998)LeCun, Bottou, Bengio, and Haffner] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- [Ledig et al.(2017)Ledig, Theis, Huszár, Caballero, Cunningham, Acosta, Aitken, Tejani, Totz, Wang, et al.] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017.
- [Liu et al.(2015)Liu, Luo, Wang, and Tang] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
- [Mao et al.(2019)Mao, Lee, Tseng, Ma, and Yang] Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, and Ming-Hsuan Yang. Mode seeking generative adversarial networks for diverse image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
- [Mao et al.(2017)Mao, Li, Xie, Lau, Wang, and Paul Smolley] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2794–2802, 2017.
- [Mullick et al.(2019)Mullick, Datta, and Das] Sankha Subhra Mullick, Shounak Datta, and Swagatam Das. Generative adversarial minority oversampling. In Proceedings of the IEEE International Conference on Computer Vision, pages 1695–1704, 2019.
- [Nagarajan and Kolter(2017)] Vaishnavh Nagarajan and J Zico Kolter. Gradient descent gan optimization is locally stable. In Advances in Neural Information Processing Systems, pages 5585–5595, 2017.
- [Netzer et al.(2011)Netzer, Wang, Coates, Bissacco, Wu, and Ng] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
- [Odena et al.(2017)Odena, Olah, and Shlens] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier gans. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2642–2651. JMLR. org, 2017.
- [Wang et al.(2019)Wang, Xu, Yao, and Tao] Chaoyue Wang, Chang Xu, Xin Yao, and Dacheng Tao. Evolutionary generative adversarial networks. IEEE Transactions on Evolutionary Computation, 2019.
- [Xiao et al.(2017)Xiao, Rasul, and Vollgraf] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- [Yu et al.(2015)Yu, Seff, Zhang, Song, Funkhouser, and Xiao] Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
- [Zhang et al.(2017)Zhang, Xu, Li, Zhang, Wang, Huang, and Metaxas] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 5907–5915, 2017.
- [Zhao et al.(2016)Zhao, Mathieu, and LeCun] Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.