CDE-GAN: Cooperative Dual Evolution Based Generative Adversarial Network

08/21/2020 ∙ by Shiming Chen, et al. ∙ University of Tasmania

Generative adversarial networks (GANs) have become a popular deep generative model for real-world applications. Despite many recent efforts on GANs, mode collapse and instability remain open problems caused by their adversarial optimization difficulties. In this paper, motivated by the cooperative co-evolutionary algorithm, we propose a Cooperative Dual Evolution based Generative Adversarial Network (CDE-GAN) to circumvent these drawbacks. In essence, CDE-GAN incorporates dual evolution with respect to generator(s) and discriminators into a unified evolutionary adversarial framework; it thus exploits their complementary properties and injects dual mutation diversity into training to steadily diversify the estimated density in capturing multiple modes, and to improve generative performance. Specifically, CDE-GAN decomposes the complex adversarial optimization problem into two subproblems (generation and discrimination), and each subproblem is solved with a separate subpopulation (E-Generators and E-Discriminators), evolved by an individual evolutionary algorithm. Additionally, to keep the balance between E-Generators and E-Discriminators, we propose a Soft Mechanism that coordinates them to conduct effective adversarial training. Extensive experiments on one synthetic dataset and three real-world benchmark image datasets demonstrate that the proposed CDE-GAN achieves competitive and superior performance in generating high-quality and diverse samples over baselines. The code and more generated results are available at our project homepage: https://shiming-chen.github.io/CDE-GAN-website/CDE-GAN.html.


I Introduction and Motivation

Generative adversarial networks (GANs) [13] are popular methods for generative modeling, using game-theoretic training schemes to implicitly learn a given probability density. With the potential power of capturing high-dimensional probability distributions, GANs have been successfully deployed for various synthesis tasks, e.g., image generation [54], video prediction [46, 49], text synthesis [51, 22], and image super-resolution [21].

A GAN is a framework describing the interaction between two different models, i.e., generator(s) and discriminator(s), which together solve a min-max game optimization problem. During learning, the generator tries to learn the real data distribution by generating realistic-looking samples that can fool the discriminator, while the discriminator attempts to differentiate between samples from the data distribution and those produced by the generator. Although the pioneering GAN work provided some analysis of the convergence properties of the approach [13, 14], it assumed that updates occurred in pure function space, allowed arbitrarily powerful generator and discriminator networks, and modeled the resulting optimization objective as a convex-concave game, thereby yielding well-defined global convergence properties. In addition, this analysis assumed that the discriminator network is fully optimized between generator updates, an assumption that does not mirror the practice of GAN optimization. In practice, there are a number of well-documented failure modes for GANs, such as mode collapse (the generator learns only limited patterns from the given large-scale datasets, or assigns all of its probability mass to a small region of the space [4]) and instability (the discriminator is able to easily distinguish between real and fake samples during the training phase [31, 5, 3]).

Therefore, many recent efforts on GANs have aimed to overcome these optimization difficulties by developing various adversarial training approaches, i.e., modifying the optimization objective, training additional discriminators, training multiple generators, and using evolutionary computation. The first method is a typical way to control the optimization gradients of the discriminator or generator parameters, which helps GANs steadily arrive at the equilibrium points of the optimization procedure under proper conditions [3, 27, 48, 15, 28]. Compared to traditional GANs, which perform single-objective optimization, the second method revisits the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem, thereby overcoming the lack of useful gradient signal provided by the discriminator [32, 10, 31, 9, 1]. The third method simultaneously trains multiple generators with the objective that a mixture of their induced distributions approximates the data distribution, so the mode collapse problem can be alleviated [43, 4, 18, 11]. However, the aforementioned GAN methods are limited by their specified adversarial optimization strategies. The last method introduces evolutionary computation into the optimization of GANs to improve generative performance [36, 47, 6, 44, 22]. In fact, the existing evolutionary GANs evolve a population of generators (e.g., E-GAN [47], CatGAN [22]) or GANs (Lipizzaner [36], Mustangs [44]) to play the adversarial game, with the result that the GANs evolve in a relatively static environment and their potential power is limited. In essence, training GANs is a large-scale optimization problem, which is challenging [47, 1]. This is why most existing GAN methods fall into instability and mode collapse.
To overcome these challenges, we propose a novel insight for GANs, in which a cooperative dual evolution paradigm is proposed for the dynamic multi-objective optimization of a GAN.

Recently, evolutionary computation (EC), as a powerful method for complex real-world optimization problems, has been used to solve many deep learning challenges, e.g., optimizing deep learning hyper-parameters [42, 40, 39, 37, 45] and designing network architectures [41, 38]. Meanwhile, researchers have also attempted to apply EC to GANs to improve their robustness and mitigate degenerate GAN dynamics [36, 47, 6, 44, 22]. Among them, Lipizzaner [36] and Mustangs [44] use a spatially distributed competitive co-evolutionary algorithm to provide diversity in the genome space of GANs. In [47], Wang introduced E-GAN, which injects diversity into training with an evolutionary population. Accordingly, we attempt to train GANs using a cooperative dual evolutionary paradigm, which is effective for large-scale optimization tasks and distribution diversity learning [12, 33, 16, 24, 53, 55, 8, 52].

In light of the above observations, we propose a cooperative dual evolution based generative adversarial network, called CDE-GAN, to train the model effectively and to steadily improve generative performance. In essence, CDE-GAN incorporates dual evolution with respect to generator(s) and discriminators into a unified evolutionary adversarial framework; it thus exploits their complementary properties and injects dual mutation diversity into training to steadily diversify the estimated density in capturing multiple modes. Our strategy towards this goal is to decompose the complex adversarial optimization problem into two subproblems (generation and discrimination) and to solve each subproblem with a separate subpopulation (E-Generators and E-Discriminators), evolved by an individual evolutionary algorithm (including individual mutations and cooperative fitness functions). Unlike the existing EC based GANs [36, 47, 44], CDE-GAN simultaneously evolves a population of generators (population) and an array of discriminators (environment) by operating their various objective functions (mutations), which are interpretable and complementary. Intuitively, CDE-GAN integrates multiple discriminators and EC into a unified framework, taking advantage of their complementary strengths for the adversarial optimization of GANs. During the training of E-Generators, the generators of CDE-GAN, acting as parents, undergo different mutations to produce offspring that adapt to the dynamic environment. Meanwhile, we score the quality and diversity of the samples generated by the evolved offspring as a fitness score. According to this fitness score, poorly performing offspring are removed, and the remaining well-performing offspring are preserved and used in the next generation of training. Given the current optimal generators, a similar mechanism holds for E-Discriminators with its own mutations and fitness function, so the discriminators provide more promising gradients to the generators for distribution diversity learning.
To keep the balance between E-Generators and E-Discriminators, we propose a Soft Mechanism to bridge them for effective and stable adversarial training. In this way, CDE-GAN possesses three key benefits: 1) the cooperative dual evolution (E-Generators and E-Discriminators) injects diversity into training so that CDE-GAN can cover different data modes, which significantly mitigates mode collapse; 2) CDE-GAN casts adversarial training as a multi-objective dynamic optimization problem, in which the multiple discriminators provide informative feedback gradients to the generators, stabilizing the training process; 3) the complementary mutations in E-Generators and E-Discriminators help the model place a fair distribution of probability mass across the modes of the data-generating distribution.

To summarize, this study makes the following salient contributions:

  • We propose a novel GAN method, the cooperative dual evolution based generative adversarial network (CDE-GAN). Benefiting from the proposed E-Generators, E-Discriminators, and Soft Mechanism, CDE-GAN incorporates dual evolution with respect to generator(s) and discriminators into a unified evolutionary adversarial framework to overcome the adversarial optimization difficulties of GANs, i.e., mode collapse and instability.

  • We introduce suitable evaluation and selection mechanisms for E-Generators and E-Discriminators, i.e., complementary mutations and interpretable fitness functions, to make the whole training process more efficient and stable.

  • We carry out extensive experiments on several benchmark datasets to verify our claims, and to show that our proposed method achieves clear advantages over existing methods.

The remainder of this paper is organized as follows. Section II reviews related work on generative adversarial networks. The proposed CDE-GAN method is illustrated in Section III. The performance and evaluation are provided in Section IV. Section V presents the discussion. Section VI provides a summary and the outlook for future research.

II Related Work

In the following, we provide a comprehensive review of GANs that develop various adversarial training approaches to overcome the optimization difficulties, grouped by method: modifying the training objective, multi-discriminator GANs, multi-generator GANs, and evolutionary computation for GANs.

II-A Modifying Training Objective for GANs

Modifying the training objective of GANs is a typical way to improve and stabilize their optimization. Radford et al. [34] introduced DCGAN to improve training stability. In [3], Arjovsky not only proposed Wasserstein-GAN, which minimizes a reasonable and efficient approximation of the Earth Mover (EM) distance to promote training stability, but also theoretically analyzed the corresponding optimization problem. Meanwhile, Metz et al. [27] introduced a method to stabilize GANs and increase diversity by defining the generator objective with respect to an unrolled optimization of the discriminator. Based on the idea that gradient signals from a Denoising AutoEncoder (DAE) can guide the generator towards producing samples whose activations are close to the manifold of real data activations, Denoising Feature Matching (DFM) was proposed to improve GAN training [48]. SN-GAN introduced spectral normalization, a new normalization technique, to stabilize the training of the discriminator and achieved promising generative performance [28]. In general, although some of these methods are practically and theoretically well-founded, convergence still remains elusive in practice.

II-B Multi-discriminator for GANs

Unlike traditional GANs, which perform single-objective optimization, some works revisit the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem, thereby overcoming the lack of useful gradient signal provided by the discriminator. Nguyen et al. [32] proposed D2GAN, combining the KL and reverse KL divergences into a unified objective function, to exploit the complementary statistical properties of these divergences and effectively diversify the estimated density in capturing multiple modes. Durugkar et al. [10] simultaneously introduced multiple discriminators into the GAN framework and weakened the discriminators of GMAN, which provides informative feedback and better guides the generator towards amassing distribution mass in the approximately true data region. In [31], Neyshabur proposed an array of discriminators, each of which looks at a different random low-dimensional projection of the data, to play the adversarial game with a single generator; the individual discriminators thus fail to reject generated samples perfectly and continuously provide meaningful gradients to the generator throughout training. In [9], Doan argued that less expressive discriminators are smoother and have a coarse-grained view of the mode map, which forces the generator to cover a wide region of the true data distribution; to this end, he proposed an online adaptive curriculum learning framework that trains the generator against an ensemble of discriminators. Albuquerque et al. [1] framed the training of multi-discriminator based GANs as a multi-objective optimization problem and analysed its effectiveness. Overall, although multi-discriminator based GANs achieve promising results, they neglect to exploit the prominence of the generator.

II-C Multi-generator for GANs

Multi-generator based GANs simultaneously train multiple generators with the objective that a mixture of their induced distributions approximates the data distribution, so the mode collapse problem can be alleviated. Motivated by boosting methods, Tolstikhin et al. [43] trained a mixture of generators by sequentially training and adding new generators to the mixture. Arora et al. [4] introduced the MIX+GAN framework to optimize the minimax game with the reward function being the weighted average reward over pairs of generators and discriminators. In [11], Ghosh proposed MAD-GAN, which trains a set of generators using a multi-class discriminator that predicts which generator produced a sample while detecting whether the sample is fake. Additionally, MGAN [18] was developed to overcome the mode collapse problem, together with its theoretical analysis. Indeed, multi-generator based GANs break the generator-discriminator balance and fail to explore the potential power of the discriminator.

II-D Evolutionary Computation for GANs

In fact, the aforementioned GAN methods are limited by their specified adversarial optimization strategies, which heavily constrain optimization performance during training. Since EC has been successfully applied to many deep learning challenges [42, 40, 39, 37, 45], some researchers have attempted to overcome the training problems of GANs using EC techniques. In [36], Schmiedlechner proposed Lipizzaner, which provides population diversity by training a two-dimensional grid of GANs (each cell contains a generator-discriminator pair) with a distributed evolutionary algorithm. Wang et al. [47] introduced E-GAN, which injects mutation diversity into the adversarial optimization of GANs by training the generator with three independent objective functions and then selecting the best-performing resulting generator for the next batch. Based on E-GAN, Liu et al. [22] developed CatGAN with hierarchical evolutionary learning for category text generation. In [44], Toutouh proposed Mustangs, hybridizing E-GAN and Lipizzaner to combine the mutation and population approaches to improving the diversity of GANs. Costa et al. [6] developed COEGAN, which uses neuro-evolution and coevolution in GAN training to provide a more stable training method and the automatic design of neural network architectures. However, these evolutionary GANs evolve a population of generators or GANs to play the adversarial game, with the result that the GANs evolve in a relatively static environment and their potential power is limited. To this end, we propose a cooperative dual evolution based GAN, which simultaneously evolves a set of generators and an array of discriminators by operating different objective functions (mutations) to explore a better balance between the generator and discriminator of GANs.

III Proposed Method

Fig. 1: The pipeline of CDE-GAN. In brief, CDE-GAN decomposes the complex adversarial optimization problem into two subproblems (generation and discrimination), and each subproblem is solved with a separate subpopulation (i.e., E-Generators and E-Discriminators), evolved by an individual evolutionary algorithm (including individual Variations, Evaluations, and Selections). The best offspring of E-Generators and E-Discriminators serve as new parents to produce the next generation's individuals (i.e., children). Furthermore, a Soft Mechanism is proposed to coordinate E-Generators and E-Discriminators to conduct effective adversarial training.

Motivated by the success of cooperative co-evolutionary algorithms in large-scale optimization and diversity learning [12, 33, 16, 24, 53, 55, 8, 52], in this section we propose a Cooperative Dual Evolution based Generative Adversarial Network (CDE-GAN) to circumvent the drawbacks (i.e., instability and mode collapse) of GANs. In essence, CDE-GAN incorporates dual evolution with respect to the generator and discriminator into a unified evolutionary adversarial framework; it thus exploits their complementary properties and injects dual mutation diversity into training to steadily diversify the estimated density in capturing multiple modes, thereby improving generative performance. As shown in Figure 1, CDE-GAN decomposes the complex adversarial optimization problem into two subproblems (generation and discrimination), and each subproblem is solved with a separate subpopulation (i.e., Evolutionary Generators and Evolutionary Discriminators, termed E-Generators and E-Discriminators, respectively), evolved by an individual evolutionary algorithm. To this end, we first propose a Soft Mechanism to keep the balance between E-Generators and E-Discriminators and to coordinate them to conduct effective adversarial training. Then, we introduce E-Generators and E-Discriminators, including their own Variations, Evaluations, and Selections.

III-A Soft Mechanism

For the sake of keeping the balance between E-Generators and E-Discriminators, we propose a Soft Mechanism to bridge them to conduct effective adversarial training. In practice, the generator's learning is impeded when it is trained against a far superior discriminator: the generator is unlikely to generate any samples considered "realistic" by the discriminator's standards, and thus receives uniformly negative feedback [10]. To this end, we use a Soft Mechanism based on the classical Pythagorean means, parameterized by λ, to weaken the discriminators of CDE-GAN. Specifically, the generator(s) trains against a softmax-weighted arithmetic average of the I different discriminators, while each discriminator maximizes its own objective function. It can be formulated as

(1)    L_soft = Σ_{k=1}^{I} ω_k ℓ_k,    ω_k = e^{λ ℓ_k} / Σ_{j=1}^{I} e^{λ ℓ_j},

where ℓ_k denotes the generator's loss against the k-th discriminator D_k, ω_k ≥ 0 with Σ_k ω_k = 1, and λ ≥ 0. When λ → ∞, the generator trains against only the best discriminator (i.e., a single weak discriminator); when λ = 0, it trains against an ensemble with equal weights. Neither extreme is desired. Equation 1 is a formulation of multi-objective optimization [7]; thus CDE-GAN casts adversarial training as a multi-objective dynamic problem, and the multiple discriminators provide informative feedback gradients to the generators, stabilizing the training process. Following [10], we fix λ in all our experiments.
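The softmax weighting above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's code: the function name, the stacking interface, and the default `lam` are our assumptions.

```python
import torch

def soft_weighted_loss(d_losses, lam=1.0):
    """Softmax-weighted average of the generator's per-discriminator losses.

    lam = 0 weights all I discriminators equally; a large lam approaches
    training against the single best (highest-loss) discriminator.
    """
    losses = torch.stack(list(d_losses))          # shape (I,)
    weights = torch.softmax(lam * losses, dim=0)  # w_k = e^{lam*l_k} / sum_j e^{lam*l_j}
    return (weights * losses).sum()
```

With `lam=0` the call reduces to the plain mean of the losses, while a very large `lam` concentrates the weight on the highest loss.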

III-B E-Generators

In fact, most existing GAN methods (e.g., GANs based on modified training objectives, multiple discriminators, or multiple generators) are limited by their specified adversarial optimization strategies, which heavily affect optimization performance during training. Multi-generator based GANs, in particular, simultaneously train multiple generators with the objective that a mixture of their induced distributions approximates the data distribution, so the mode collapse problem can be alleviated. To this end, we build E-Generators, which evolves a set of generators in a given dynamic environment (E-Discriminators) based on an evolutionary algorithm. E-Generators is an individual subpopulation that solves one subproblem (generation) and cooperates with E-Discriminators to solve the adversarial multi-objective optimization problem of GANs. Building on the success of E-GAN [47], we adopt similar variations (mutations), evaluation function (fitness function), and selection for the evolution of E-Generators. The evolutionary process of E-Generators is presented in Algorithm 1.

Require: the generator G with parameters θ; the discriminators D with parameters w; the batch size m; the number of iterations T; the discriminator's updating steps per iteration n_D; the number of parents for the generator and for the discriminators; the number of mutations for the generator and for the discriminators; Adam hyper-parameters α, β_1, β_2; the hyper-parameter γ of the generator's fitness function.*

1: Initialize the generator's parameters θ and the discriminators' parameters w.
2: for each training iteration do
3:     for each discriminator parent do    ▷ E-Discriminators Evolution
4:         for each discriminator mutation do
5:             for n_D updating steps do
6:                 Sample a batch of real samples x ∼ p_data.
7:                 Sample a batch of noise samples z ∼ p_z.
8:                 Update the child discriminator via Adam on its mutation objective.
9:                 Calculate the child's fitness (Equation 10).
10:            end for
11:        end for
12:    end for
13:    Sort the discriminator individuals by fitness and select the survivors as new parents.
14:    for each generator parent do    ▷ E-Generators Evolution
15:        for each generator mutation do
16:            Sample a batch of noise samples z ∼ p_z.
17:            Update the child generator via Adam on its mutation objective against the weighted discriminators.
18:            Calculate the child's fitness (Equation 7).
19:        end for
20:    end for
21:    Sort the generator children by fitness and select the survivors as new parents.
22: end for

*Default values: , , , , , and .

Algorithm 1: The training algorithm of CDE-GAN.
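The survivor-selection step that both subpopulations of Algorithm 1 share can be sketched generically in plain Python. The function name, its arguments, and the toy "individuals" below are purely illustrative; the selection direction is written for fitness maximization, as used for E-Generators.

```python
import copy

def evolve(parents, mutations, fitness, n_survivors,
           include_parents=False, maximize=True):
    """One evolution step shared by E-Generators and E-Discriminators (sketch).

    Each parent is copied and updated by every mutation; the resulting
    individuals are scored by `fitness`, and only the best n_survivors
    continue as the next generation's parents.  E-Discriminators also
    let the current parents compete (include_parents=True).
    """
    children = [mutate(copy.deepcopy(p)) for p in parents for mutate in mutations]
    pool = children + (list(parents) if include_parents else [])
    pool.sort(key=fitness, reverse=maximize)
    return pool[:n_survivors]
```

With toy integer "individuals" and arithmetic "mutations", `evolve([1, 2], [lambda x: x + 1, lambda x: x * 2], fitness=lambda x: x, n_survivors=2)` keeps the two highest-scoring children.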

CDE-GAN applies three complementary mutations corresponding to three different minimization objectives w.r.t. the generator G, i.e., the G-Minimax mutation (M_G^minimax), the G-Heuristic mutation (M_G^heuristic), and the G-Least-Square mutation (M_G^least-square). These correspond to the original GAN [13], the non-saturated GAN (NS-GAN) [13], and the least-squares GAN (LSGAN) [25, 26], respectively. In contrast to the mutations of E-GAN, which train the generator(s) against a single specified discriminator, the mutations of E-Generators train against multiple evolutionary discriminators. The G-Minimax mutation corresponds to the minimization objective of the generator in the original GAN [13], defined as

(2)    M_G^minimax = (1/2) E_{z∼p_z}[log(1 − D(G(z)))].

In fact, the G-Minimax mutation minimizes the Jensen-Shannon divergence (JSD) between the data and model distributions. If the discriminators distinguish generated samples with high confidence, the gradients tend to vanish and the G-Minimax mutation fails to work; if the discriminators cannot completely distinguish real from fake samples, the G-Minimax mutation provides informative gradients for adversarial training. Thus, the G-Minimax mutation typically evolves the best offspring later in the training process of CDE-GAN. Additionally, the G-Heuristic mutation is non-saturating when the discriminator convincingly rejects the generated samples, and thus avoids vanishing gradients. It is formulated as follows:

(3)    M_G^heuristic = −(1/2) E_{z∼p_z}[log D(G(z))].

However, the G-Heuristic mutation may lead to training instability and fluctuations in generative quality, because it pushes the data and model distributions away from each other. As for the G-Least-Square mutation, it is inspired by LSGAN [25], which applies the least-squares criterion to both generator and discriminator. It can be written as

(4)    M_G^least-square = E_{z∼p_z}[(D(G(z)) − 1)^2].

Similar to G-Heuristic, the G-Least-Square mutation effectively avoids vanishing gradients when the discriminator easily recognizes the generated samples. Meanwhile, the G-Least-Square mutation partly avoids mode collapse, because it neither assigns an extremely high cost to generating fake samples nor an extremely low cost to mode dropping. Thus, these different mutations provide various training strategies for E-Generators, injecting mutation diversity into training to diversify the estimated density in capturing multiple modes, and constructing a complementary population of generators for steady training. See [13, 25, 2, 47] for more theoretical analysis of these three objective functions.
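Under the convention that a discriminator outputs a probability D(·) ∈ (0, 1), the three generator mutations can be sketched as PyTorch losses. The epsilon for numerical stability and the batched-tensor interface are our assumptions, not the paper's code.

```python
import torch

EPS = 1e-8  # numerical stability for the log terms

def g_minimax(d_fake):
    # minimize log(1 - D(G(z))): saturates when D confidently rejects fakes
    return torch.log(1.0 - d_fake + EPS).mean()

def g_heuristic(d_fake):
    # non-saturating heuristic: maximize log D(G(z)), written as a loss
    return -torch.log(d_fake + EPS).mean()

def g_least_square(d_fake):
    # LSGAN-style: pull D(G(z)) toward the "real" label 1
    return ((d_fake - 1.0) ** 2).mean()
```

Each function takes the discriminator's outputs on generated samples and returns a scalar loss to minimize; against multiple discriminators, the per-discriminator values would be combined by the Soft Mechanism.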

After producing the next generation's children with different mutations, we evaluate each child's quality using a fitness function F that depends on the current environment (i.e., the discriminators D_k). Considering two typical properties of generated samples, quality (the generated samples are realistic enough to fool a superior discriminator [35]) and diversity (the model distribution is more likely to cover the real data distribution, largely avoiding mode collapse [17]), F consists of two fitness scores, i.e., a quality fitness score F_q and a diversity fitness score F_d, for evaluating the generative performance of CDE-GAN. On the one hand, the samples produced by a generator are fed into the discriminators and the sum of their outputs is calculated, which is termed

(5)    F_q = Σ_{k=1}^{I} E_{z∼p_z}[D_k(G(z))].

The higher the quality score a generator achieves, the more realistic its generated samples. Reflecting the quality of the generators at each evolutionary step, the discriminators are constantly upgraded to be optimal during the training process. On the other hand, we also focus on the diversity of generated samples and attempt to gather a better group of generators to circumvent the mode collapse issue in the adversarial optimization of GANs. According to [30], a gradient-based regularization term can stabilize GAN optimization and suppress mode collapse. To this end, the minus log-gradient-norm of optimizing D is used to measure the diversity fitness score of generated samples:

(6)    F_d = − log ‖ ∇_D ( −E_{x∼p_data}[log D(x)] − E_{z∼p_z}[log(1 − D(G(z)))] ) ‖.

When an evolved generator obtains a relatively high F_d value, which corresponds to small gradients of D, its generated samples tend to spread out enough to keep the discriminators from having obvious countermeasures. Therefore, F is formulated as

(7)    F = F_q + γ F_d,

where γ is used to balance the quality and diversity of generated samples. Generally, a higher fitness score leads to higher training efficiency and better generative performance. Finally, a simple yet useful survivor selection strategy, (μ, λ)-selection [19], is employed to select the next generation according to the highest fitness scores of the existing individuals.
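The fitness F = F_q + γ F_d can be sketched for a set of discriminators as follows. The module interfaces, the cross-entropy discriminator objective used inside the gradient norm, and the default `gamma` are illustrative assumptions.

```python
import torch

def generator_fitness(g, discriminators, z, real, gamma=0.5):
    """Score one evolved generator as F = F_q + gamma * F_d (sketch).

    F_q sums the discriminators' scores on generated samples; F_d is the
    minus log norm of the discriminators' gradients on their objective.
    """
    fake = g(z)
    # Quality fitness: sum of discriminator outputs on fake samples.
    f_q = sum(d(fake).mean() for d in discriminators)
    # Diversity fitness: -log || grad_D of the discriminator objective ||.
    grads = []
    for d in discriminators:
        d_loss = -(torch.log(d(real) + 1e-8).mean()
                   + torch.log(1.0 - d(fake) + 1e-8).mean())
        grads += [t.reshape(-1) for t in
                  torch.autograd.grad(d_loss, list(d.parameters()),
                                      retain_graph=True)]
    f_d = -torch.log(torch.cat(grads).norm() + 1e-8)
    return (f_q + gamma * f_d).item()
```

Small discriminator gradients (the generator's samples spread out) raise F_d, while realistic samples raise F_q, matching Equations 5-7 in spirit.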

III-C E-Discriminators

Recently, some works have revisited the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem [32, 10, 31, 9], which overcomes the lack of useful gradient signal provided by a single discriminator. To this end, we develop E-Discriminators, which evolves a population of discriminators, to provide meaningful gradients for the E-Generators of CDE-GAN. In fact, E-Discriminators possesses two advantages that help CDE-GAN steadily achieve promising generative performance: 1) the evolutionary mechanism provides a dynamic strategy for the discriminators, so the trade-off between generator(s) and discriminators is well adjusted during training; 2) individual discriminators are unable to reject generated samples perfectly and thus continue to provide meaningful gradients to the generator throughout training. Specifically, given the optimal generators, a similar evolutionary mechanism to that of E-Generators holds for E-Discriminators, with its own mutations and fitness function. The evolutionary process of E-Discriminators is presented in Algorithm 1.

Here, we take various objective functions as the mutations for E-Discriminators. To keep the objective functions of E-Discriminators corresponding to those of E-Generators, we adopt two different objectives w.r.t. the discriminators D_k: the D-Minimax mutation (M_D^minimax) and the D-Least-Square mutation (M_D^least-square). According to [13], the D-Minimax mutation is the objective function shared by the original GAN and NS-GAN; it can be formulated as

(8)    M_D^minimax = −(1/2) E_{x∼p_data}[log D(x)] − (1/2) E_{z∼p_z}[log(1 − D(G(z)))].

Indeed, the D-Minimax mutation adopts the sigmoid cross-entropy loss for the discriminator, which leads to vanishing gradients when updating the generator using fake samples that are on the correct side of the decision boundary but still far from the real data [26]. Specifically, the non-saturating loss saturates when the input is relatively large, while the minimax objective saturates when the input is relatively small. Thus, the generative performance of E-GAN [47] is limited because it relies on a single discriminator with the minimax objective. In light of this observation, we further adopt the least-squares objective function as the second mutation of E-Discriminators:

(9)    M_D^least-square = (1/2) E_{x∼p_data}[(D(x) − 1)^2] + (1/2) E_{z∼p_z}[D(G(z))^2].

Because the least-squares objective function penalizes samples that lie far on the correct side of the decision boundary, the D-Least-Square mutation is capable of moving the fake samples toward the decision boundary. To this end, it helps CDE-GAN generate samples that are closer to the real data. Overall, M_D^minimax and M_D^least-square are complementary to each other and provide a promising optimization direction for the evolution of E-Discriminators, effectively adjusting the trade-off between generator(s) and discriminators.
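The two discriminator mutations can be sketched analogously, again assuming probability-valued outputs D(·) ∈ (0, 1); the 1/2 scaling follows the usual minimax and LSGAN conventions, and the function names are illustrative.

```python
import torch

EPS = 1e-8  # numerical stability for the log terms

def d_minimax(d_real, d_fake):
    # cross-entropy objective shared by GAN / NS-GAN, as a loss to minimize
    return -0.5 * (torch.log(d_real + EPS).mean()
                   + torch.log(1.0 - d_fake + EPS).mean())

def d_least_square(d_real, d_fake):
    # LSGAN objective: push real outputs toward 1 and fake outputs toward 0
    return 0.5 * (((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean())
```

Unlike the cross-entropy loss, the least-squares loss still produces gradients for confidently classified samples that sit far from the decision boundary.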

To evaluate the subpopulation evolved by E-Discriminators, we take the minus log-gradient-norm of optimizing the discriminator as the fitness function F_D, which corresponds to the diversity fitness score of generated samples in F:

(10)    F_D = − log ‖ ∇_D M_D ‖,

where M_D denotes the discriminator's mutation objective.

There are two reasons for this setting: 1) the gradient significantly reveals the training status of GANs; when the generator is able to generate realistic samples, the discriminators will not confidently reject the generated samples (i.e., they are updated with small gradients), whereas when the generator collapses to a small region, the discriminators will subsequently label the collapsed points as fake with obvious countermeasures (i.e., they are updated with large gradients); 2) the cooperative fitness functions (Equation 6 and Equation 10) of E-Generators and E-Discriminators keep them adversarially consistent, which improves the training stability of CDE-GAN. Therefore, F_D is effective for indicating whether the model falls into mode collapse, and thus guides E-Discriminators to evolve in a meaningful direction.
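The minus log-gradient-norm used in Equations 6 and 10 reduces to a small helper; the parameter-list interface and epsilon are assumptions for illustration.

```python
import torch

def log_grad_norm_fitness(loss, params, eps=1e-8):
    """Minus log-gradient-norm fitness: small gradients => high fitness."""
    grads = torch.autograd.grad(loss, list(params), retain_graph=True)
    norm = torch.cat([g.reshape(-1) for g in grads]).norm()
    return -torch.log(norm + eps).item()
```

For a toy quadratic loss 0.5·‖p‖² at p = (3, 4), the gradient is p itself, so the fitness is −log 5.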

After evaluation, the new parents for the next evolution are selected following the principle of survival of the fittest, similarly to the selection of E-Generators. Specifically, we first initialize the discriminators. In each training step, every existing discriminator evolves a set of children with different mutations, so a pool of individuals (including the current parents and children) compete. After sorting, the individuals possessing the lowest fitness scores survive for the next evolution during adversarial training.

Fig. 2: Experiments on the CIFAR-10 dataset for hyper-parameter analysis. (a) Inception score evaluation for CDE-GANs with various balance factors γ. (b) Inception score evaluation for CDE-GANs with various numbers of discriminators.
DCGAN [34, 15, 47]
Generative network                                            Discriminative network
Input: Noise (100 dim.)                                       Input: Image
layer 1  Fully connected and Reshape; ReLU                    layer 1  Convolution (4, 4, 128), stride=2; LeakyReLU
layer 2  Transposed Convolution (4, 4, 512), stride=2; ReLU   layer 2  Convolution (4, 4, 256), stride=2; LeakyReLU
layer 3  Transposed Convolution (4, 4, 256), stride=2; ReLU   layer 3  Convolution (4, 4, 512), stride=2; LeakyReLU
layer 4  Transposed Convolution (4, 4, 128), stride=2; ReLU   layer 4  Fully connected (1); Sigmoid/Least squares
layer 5  Transposed Convolution (4, 4, 3), stride=2; Tanh     Output: Real or Fake (Probability)
Output: Generated Image

MLP with 3 Layers [25, 26]
Generative network                                            Discriminative network
Input: Noise (256 dim.)                                       Input: Point
layer 1  Fully connected (128); ReLU                          layer 1  Fully connected (128); LeakyReLU
layer 2  Fully connected (128); ReLU                          layer 2  Fully connected (128); LeakyReLU
layer 3  Fully connected (2); Linear                          layer 3  Fully connected (1); Sigmoid/Least squares
Output: Generated Point                                       Output: Real or Fake (Probability)

MLP with 4 Layers [15]
Generative network                                            Discriminative network
Input: Noise (256 dim.)                                       Input: Point
layer 1  Fully connected (128); ReLU                          layer 1  Fully connected (128); ReLU
layer 2  Fully connected (128); ReLU                          layer 2  Fully connected (128); ReLU
layer 3  Fully connected (128); ReLU                          layer 3  Fully connected (128); ReLU
layer 4  Fully connected (2); Linear                          layer 4  Fully connected (1); Sigmoid/Least squares
Output: Generated Point                                       Output: Real or Fake (Probability)

TABLE I: Architectures of the Generative and Discriminative Networks Used in This Work, i.e., DCGAN Model, MLP with 3 Layers, and MLP with 4 Layers.
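The spatial sizes implied by Table I follow from the standard convolution arithmetic. The sketch below checks them in pure Python, assuming a 64x64 image resolution, an initial 4x4 reshape, and padding 1 for every 4x4 stride-2 layer; all three are assumptions on our part, since the exact sizes were lost from the table:

```python
def conv_out(size, k=4, s=2, p=1):
    """Spatial output size of a convolution: floor((n + 2p - k)/s) + 1."""
    return (size + 2 * p - k) // s + 1

def tconv_out(size, k=4, s=2, p=1):
    """Spatial output size of a transposed convolution: (n-1)*s - 2p + k."""
    return (size - 1) * s - 2 * p + k

# Generator path: a fully connected layer reshapes the noise to a small
# map (assumed 4x4 here), then each 4x4 stride-2 transposed convolution
# in Table I doubles the spatial size.
g_sizes = [4]
for _ in range(4):          # layers 2-5 of the DCGAN generator
    g_sizes.append(tconv_out(g_sizes[-1]))

# Discriminator path: each 4x4 stride-2 convolution halves the input.
d_sizes = [g_sizes[-1]]     # start from the generated image size
for _ in range(3):          # layers 1-3 of the DCGAN discriminator
    d_sizes.append(conv_out(d_sizes[-1]))
```

Under these assumptions the generator grows 4 → 8 → 16 → 32 → 64 and the discriminator shrinks 64 → 32 → 16 → 8 before the final fully connected layer.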

IV Experiments and Evaluation

In the subsequent sections, we first introduce the implementation details and hyper-parameter analysis of our experiments. We then qualitatively and quantitatively analyse the generative performance of CDE-GAN to verify our claims. Finally, we demonstrate the advantages of our method over 16 state-of-the-art methods, including training-objective-modifying GANs, multi-generator based GANs, multi-discriminator based GANs, and evolutionary computation based GANs.

IV-A Implementation Details

In the following experiments, we use the default hyper-parameter values listed in Algorithm 1. We conduct extensive experiments on one synthetic dataset (i.e., a mixture of 8 Gaussians arranged in a circle) and three real-world benchmark datasets (i.e., CIFAR-10 [20], LSUN-Bedrooms [50], and CelebA [23]) to prove the effectiveness of CDE-GAN. Furthermore, we adopt the same network architecture (DCGAN) as existing works [34, 15, 47] for the real-data experiments to facilitate direct comparison. For the sake of fair comparison, we select MLPs (with 3 layers [25, 26] or 4 layers [15]) as the model architecture of CDE-GAN for the toy experiments, generating 512 points to cover the modes. The model architectures are displayed in Table I. The noise vectors are sampled from a uniform distribution, with 100 dimensions for the real-world datasets and 256 dimensions for the synthetic dataset. All experiments are performed on a single NVIDIA 1080Ti graphics card with 11GB memory. We use PyTorch for the implementation of all experiments.

Additionally, we use the inception score (IS) [35] to quantitatively evaluate the performance of the proposed method. IS is a common quantitative evaluation metric for image generation with GANs; a higher IS indicates better sample quality. Note that, in this paper, IS is calculated with the TensorFlow implementation using 50k randomly generated samples. Meanwhile, we also evaluate our method qualitatively.
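For reference, the inception score follows directly from its definition, IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the label distribution an Inception network assigns to a sample and p(y) is the marginal over all samples. The toy class-probability lists below stand in for real Inception outputs; this is a definition sketch, not the TensorFlow evaluation code used in the paper:

```python
import math

def inception_score(pyx_list):
    """IS = exp(mean over samples of KL(p(y|x) || p(y))): high when
    each conditional is confident (quality) while the marginal over
    samples stays spread out (diversity)."""
    n = len(pyx_list)
    k = len(pyx_list[0])
    # Marginal label distribution p(y) over the generated samples.
    py = [sum(p[j] for p in pyx_list) / n for j in range(k)]
    kl_mean = sum(
        sum(p[j] * math.log(p[j] / py[j]) for j in range(k) if p[j] > 0)
        for p in pyx_list
    ) / n
    return math.exp(kl_mean)

# Confident, diverse predictions score higher than uniform ones.
diverse = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]
uniform = [[1 / 3] * 3] * 3
```

A generator whose samples all map to one confident class would drive p(y) toward that class and the score back toward 1, which is why IS also penalizes mode collapse.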


(a) Gaussian kernel estimation with MLP of 3 layers
(b) Gaussian kernel estimation with MLP of 4 layers

Fig. 3: Dynamic results of Gaussian kernel estimation over generator iterations for different GANs. For each pair of images, the left one shows the data distribution (real data in blue, generated data in red), and the right one shows the KDE plot of the generated data to its left. From top to bottom, the rows are the results of the original GAN, NS-GAN, LSGAN, E-GAN, and CDE-GAN (Ours).

IV-B Experiment 1: Hyper-Parameter Analysis

In principle, two hyper-parameters are closely tied to the performance of CDE-GAN, i.e., the balance factor (see Equation 7) and the number of discriminators. Since the balance factor weighs the measurement of sample quality against sample diversity, it directs generator selection in E-Generators and thus affects the effectiveness of CDE-GAN. Meanwhile, we analyze how the number of discriminators affects the sample diversity of the corresponding generator, and select a proper number of discriminators to balance time consumption against the generative performance of CDE-GAN.

IV-B1 Balance Factor

In fact, quality and diversity of the synthesized objects are two key goals of a generative task. Analogously, we consider both measurements when evaluating CDE-GAN on image generation. Here, we embed a balance factor into the fitness score of the generator to balance the quality and diversity of generated samples during generator updates. Similar to [47], if the balance factor is set too small, the diversity fitness score is almost ignored during training; if it is set too large, the model becomes unstable, since the gradient norm of the discriminators can vary greatly during training. Accordingly, a relatively small range of values is considered in our experiments.

To select a proper balance factor for CDE-GAN, we run a grid search on CIFAR-10. As shown in Fig. 2(a), we conduct experiments with various balance factors. The results show that CDE-GAN fails at the beginning of training and converges slowly when the balance factor is set to 1, while it attains promising generative performance and comparable convergence speed when the factor is set to a relatively small value, i.e., 0.1 or 0.01. Based on these observations, we adopt this setting in the later experiments on the real-world datasets.
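The role of the balance factor can be sketched from the fitness definition: the overall generator fitness is the quality score plus a weighted diversity score. `generator_fitness` and its toy inputs below are illustrative stand-ins (the paper's quality term comes from the discriminators' outputs, and `gamma` plays the role of the balance factor studied in Fig. 2(a)):

```python
import math

def generator_fitness(quality, disc_grad, gamma=0.1):
    """Fitness of a generator candidate: quality score plus a
    gamma-weighted diversity score, where diversity is the minus
    log-gradient-norm of the discriminator.  gamma trades sample
    quality against sample diversity."""
    norm = math.sqrt(sum(g * g for g in disc_grad))
    return quality + gamma * (-math.log(norm))

# With a small gradient norm the diversity term is positive, so a
# larger gamma rewards diversity more strongly.
f_low = generator_fitness(1.0, [0.1, 0.1], gamma=0.01)
f_high = generator_fitness(1.0, [0.1, 0.1], gamma=1.0)
```

This also makes the instability argument concrete: since the gradient-norm term can swing widely during training, a large `gamma` lets that noise dominate candidate ranking.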

IV-B2 Number of Discriminators

Multi-discriminator based GANs frame GAN training as a multi-objective optimization problem, which helps overcome the lack of a meaningful gradient signal from a single discriminator. Here, we can also extend CDE-GAN to a multi-discriminator based GAN. Specifically, we let E-Discriminators evolve multiple discriminators with various mutations. According to [10, 1, 29], the number of discriminators is closely related to the diversity of the generated samples. Therefore, we conduct experiments to analyze how the number of discriminators affects the generative performance of CDE-GAN and to select a promising number for the later experiments.

In Fig. 2(b), we report box-plots of the inception score for CDE-GANs with different numbers of discriminators on CIFAR-10 across 3 independent runs. The results clearly show that increasing the number of discriminators yields better generative performance for CDE-GAN. Note that CDE-GAN is more stable when more discriminators survive for evolution, which benefits from the effectiveness of multi-objective optimization methods [1]. Moreover, we believe that CDE-GAN can further improve its generative performance if more discriminators survive during training. To trade off time consumption against efficacy, we conduct the later experiments with two discriminators for CDE-GAN, unless otherwise stated.

IV-C Experiment 2: Generative Performance Evaluation

In this section, we qualitatively and quantitatively analyse the generative performance of CDE-GAN to support our claims. Here, we take the original GAN (GAN) [13], the non-saturated GAN (NS-GAN) [13], the least squares GAN (LSGAN) [25, 26], the Wasserstein GAN (WGAN) [3], and the evolutionary GAN (E-GAN) [47] as baselines for comparison and discussion, because these GAN models are closely related to our method.

IV-C1 Qualitative Evaluation

Learning a Gaussian mixture distribution to evaluate the diversity of GANs is a popular experimental setting, as it intuitively reveals whether a GAN suffers from mode collapse. A model suffering from mode collapse generates samples around only a few of the modes. To validate the effectiveness of our proposed method, we first qualitatively compare CDE-GAN with the baselines on a synthetic dataset, a 2-D mixture of 8 Gaussians. For a fair comparison, we adopt the experimental design proposed in [25, 26], which trains GANs with a 3-layer MLP network architecture. Meanwhile, the numbers of surviving parents in E-Discriminators and E-Generators of CDE-GAN are set to 1, i.e., during each evolutionary step, only the best candidate is kept. We train each method for 400k generator iterations with the same network architecture. Fig. 3(a) reports the dynamic results of the data distribution and the Kernel Density Estimation (KDE) plots for the different baselines. We can see that all baselines generate samples around only a few of the valid modes of the data distribution, i.e., 6 modes for the original GAN and LSGAN, 2 modes for NS-GAN, and 8 for E-GAN (with parts of the modes only weakly covered), which shows that they suffer from mode collapse to a greater or lesser degree. In contrast, CDE-GAN successfully learns all modes of the Gaussian mixture distribution. These experiments demonstrate that the cooperative dual evolutionary strategy effectively circumvents mode collapse in GAN training.
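The toy dataset above can be generated in a few lines; `eight_gaussian_ring` and its `radius`/`std` values are illustrative choices on our part (the paper only specifies 8 Gaussians arranged in a circle and 512 generated points):

```python
import math
import random

def eight_gaussian_ring(n, radius=2.0, std=0.02, seed=0):
    """Sample n points from a mixture of 8 Gaussians whose means are
    evenly spaced on a circle -- the standard toy set for diagnosing
    mode collapse."""
    rng = random.Random(seed)
    centers = [(radius * math.cos(i * math.pi / 4),
                radius * math.sin(i * math.pi / 4)) for i in range(8)]
    # Pick a mode uniformly, then add isotropic Gaussian noise.
    return [(rng.gauss(cx, std), rng.gauss(cy, std))
            for cx, cy in (rng.choice(centers) for _ in range(n))]

points = eight_gaussian_ring(512)
```

A collapsed generator would place all 512 points near one or two of the 8 centers, which is exactly what the KDE plots in Fig. 3 visualize.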

Furthermore, we repeat the same experiment with a 4-layer MLP network architecture to further evaluate the stability of CDE-GAN. The results in Fig. 3(b) show that all methods converge faster than CDE-GAN with the 3-layer MLP. Notably, all baselines still tend to generate only a few modes, while CDE-GAN is less prone to this problem and achieves better convergence speed. This reveals another advantage of CDE-GAN: architecture robustness.

To demonstrate the potential generative power of CDE-GAN, we show several samples generated by our model trained on the three datasets, i.e., CIFAR-10, LSUN-Bedrooms, and CelebA, in Fig. 4. Note that the presented samples are randomly drawn, not cherry-picked. On CIFAR-10, CDE-GAN is capable of generating visually recognizable images of frogs, airplanes, horses, etc. On LSUN-Bedrooms, it produces bedrooms with various styles and different views, with beds and windows clearly displayed. On CelebA, it synthesizes face images possessing various attributes (e.g., gender, age, expression, and hairstyle). These results confirm the quality and diversity of the samples generated by our model.

Fig. 4: Samples generated by our proposed CDE-GAN trained on various natural image datasets. Please see many more results in our project homepage.
Fig. 5: Inception scores of different GAN methods on CIFAR-10. The settings indicate the numbers of discriminators or generators used, and "with GP" denotes that a gradient penalty (GP) is used.

IV-C2 Quantitative Evaluation

Our qualitative observations above are confirmed by the quantitative evaluations. To demonstrate the merits of the proposed CDE-GAN over the baseline methods, we train these methods on CIFAR-10 and plot their inception scores over the training process with the same network architecture. As shown in Fig. 5, CDE-GAN attains a higher inception score within 100k generator iterations. Specifically, after 40k iterations, CDE-GANs with different settings consistently outperform all baselines. Meanwhile, the baseline methods suffer from different training problems, e.g., instability at convergence (WGAN and LSGAN) or failure to learn (GAN). This shows that the cooperative dual evolution method benefits the adversarial training of GANs.

Methods Inception Score
E-GAN (, without GP) 6.88 0.10
E-GAN (, without GP) 6.71 0.06
E-GAN (, without GP) 6.96 0.09
E-GAN (, without GP) 6.72 0.09
E-GAN (, with GP) 7.13 0.07
E-GAN (, with GP) 7.23 0.08
E-GAN (, with GP) 7.32 0.09
E-GAN (, with GP) 7.34 0.07
(Ours) CDE-GAN (, without GP) 6.85 0.05
(Ours) CDE-GAN (, without GP) 6.93 0.09
(Ours) CDE-GAN (, without GP) 7.06 0.09
(Ours) CDE-GAN (, without GP) 7.35 0.06
(Ours) CDE-GAN (, with GP) 7.05 0.05
(Ours) CDE-GAN (, with GP) 7.18 0.05
(Ours) CDE-GAN (, with GP) 7.48 0.10
(Ours) CDE-GAN (, with GP) 7.51 0.05
TABLE II: Comparison with E-GAN [47] on CIFAR-10 With or Without GP.

Since E-GAN [47] is the method most similar to ours, we further take E-GAN as a baseline for stability analysis. In Table II, we vary the number of generators for E-GAN and the number of discriminators for CDE-GAN, and train each model for 150k generator iterations. Note that we obtain the E-GAN results using the experimental settings of [47] and the code provided by its authors (https://github.com/WANG-Chaoyue/EvolutionaryGAN-pytorch). The results show that CDE-GAN achieves better performance on CIFAR-10 than E-GAN overall. Furthermore, when E-GAN uses a gradient penalty (GP) term during training, its generative performance improves greatly compared to its original version, while our CDE-GAN achieves only a slight improvement with GP. This reveals that E-GAN is unstable during training due to its single evolution, so the GP term is effective for regularizing the discriminator to provide significant gradients for updating the generators. Benefiting from cooperative dual evolution, CDE-GAN injects diversity into training and can thus cover different data modes. Furthermore, as the number of discriminators increases, the balance between the generator and discriminators of CDE-GAN is better adjusted, and CDE-GAN continuously achieves obvious improvements. This shows that multi-objective optimization is effective for stabilizing the training process of CDE-GAN, so the discriminators can continuously provide informative gradients for generator updates, performing the same function as GP.
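The GP term discussed above, from WGAN-GP [15], penalizes the discriminator's input-gradient norm for deviating from 1: lam * (||grad_x D(x)|| - 1)^2. For a linear toy discriminator D(x) = w . x the input gradient is w everywhere, so the penalty has a closed form; `gp_linear` and its inputs are an illustrative sketch of the definition, not the autograd-based computation used in practice:

```python
import math

def gp_linear(w, lam=10.0):
    """Gradient-penalty term for a linear discriminator D(x) = w . x:
    its gradient with respect to the input is w everywhere, so
    lam * (||grad|| - 1)^2 reduces to lam * (||w|| - 1)^2."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return lam * (norm - 1.0) ** 2

penalty_unit = gp_linear([1.0, 0.0])   # unit-norm weights: no penalty
penalty_big = gp_linear([3.0, 4.0])    # norm 5: heavily penalized
```

Keeping the gradient norm near 1 is what prevents the discriminator's feedback to the generator from vanishing or exploding, the very stabilizing effect the dual evolution of CDE-GAN aims to obtain without GP.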

Fig. 6: Samples generated by different methods on various natural image datasets. The samples generated by different methods are provided by the original literatures, i.e., MAD-GAN [11], Lipizzaner [36], Mustangs [44], Stabilizing-GAN [31], and acGAN [9].
Methods Addit. Superv. Info. Inception Score
Real data 11.24 0.12
GAN-GP [13] 6.93 0.08
DCGAN [34] 6.64 0.14
WGAN-GP [3] 6.68 0.06
SN-GAN [28] 7.42 0.08
GMAN () [10] 6.40 0.19
D2GAN [32] 7.15 0.07
HV() [1] 7.32 0.26
MicroGAN ()[29] 6.77 0.00
MIX+WGAN () [4] 4.36 0.04
MGAN () [18] 8.33 0.10
E-GAN () [47] 7.34 0.07
(Ours) CDE-GAN () 7.51 0.05
TABLE III: Comparison with State-of-the-Art Methods on CIFAR-10. The Bracketed Settings Represent the Numbers of Discriminators or Generators for Different Methods; Addit. Superv. Info. Denotes That Additional Supervised Information is Used by the Method. The Best Two Results are Marked with Bold and Underline.

IV-D Experiment 3: Comparisons With State-of-the-Art Methods

In this section, we compare CDE-GAN with 16 state-of-the-art GAN methods, including modifying training objective based GANs (GAN [13], DCGAN [34], WGAN [3], and SN-GAN [28]), multi-discriminator based GANs(D2GAN [32], GMAN [10], Stabilizing-GAN [31], acGAN [9], HV [1], and MicroGAN [29]), multi-generator based GANs (MIX+WGAN [4], MGAN [18], MAD-GAN [11]), and evolutionary computation based GANs (Lipizzaner [36], E-GAN [47], and Mustangs [44]).

Specifically, we first report the inception scores obtained by CDE-GAN and the baselines on CIFAR-10 in Table III. Overall, the experimental results show that CDE-GAN outperforms almost all baselines under their optimal settings (except for MGAN). The reason is that MGAN uses additional supervised information (generators' labels) to support learning, while CDE-GAN is trained in a completely unsupervised manner. Indeed, this technique is compatible with our method, so integrating it could be a promising avenue for future work.

It is worth noting that CDE-GAN frames adversarial training as a multi-objective optimization problem when it uses two or more discriminators. Thus, we further discuss the advantage of CDE-GAN over multi-discriminator based GANs. In the second group of Table III, all multi-discriminator based GANs, using various numbers of discriminators, are inferior to our CDE-GAN, i.e., GMAN, D2GAN (2 discriminators), HV, and MicroGAN. This further demonstrates that cooperative dual evolution is effective for the multi-objective optimization and adversarial training of GANs.

Finally, we present visual quality comparisons between several state-of-the-art methods (i.e., Stabilizing-GAN [31], acGAN [9], MAD-GAN [11], Lipizzaner [36], and Mustangs [44]) and our method on CelebA. As shown in Fig. 6, most of the face images generated by Lipizzaner and Mustangs are incomplete or blurred, while our CDE-GAN generates high-fidelity face images. This demonstrates that the proposed cooperative dual evolutionary strategy performs better in adversarial training than other evolutionary computation based GANs. Furthermore, CDE-GAN also shows advantages over the other GAN methods, i.e., MAD-GAN with 3 generators, Stabilizing-GAN with 24 discriminators, and acGAN with 5 discriminators. Note that MAD-GAN trains its model using additional supervised information (generators' labels), and Stabilizing-GAN uses the cropped version of the CelebA images.

V Discussion

We now discuss the advantages and limitations of the proposed CDE-GAN method in more detail. First, we analyze how and why our method circumvents the mode collapse and instability problems of GANs.

  • Compared to a single-evolution strategy, CDE-GAN injects dual diversity into training, benefiting from the cooperative dual evolution. It decomposes the complex adversarial optimization problem into two subproblems (i.e., generation and discrimination), and each subproblem is solved with a separate subpopulation (i.e., E-Generators and E-Discriminators), evolved by an individual evolutionary algorithm. In this way, the dual evolutionary populations inject dual diversity into training, enabling the model to effectively cover different data modes. This significantly mitigates the mode-collapse pathology of GAN training. The experiments in Sections IV-C and IV-D intuitively verify these claims.

  • CDE-GAN frames adversarial training as a multi-objective dynamic optimization problem when multiple discriminators are used, so the discriminators provide informative feedback gradients to the generator, stabilizing the training process. Ideally, the generator should always receive strong gradients from the discriminator during training. However, because the discriminator quickly learns to distinguish real from fake samples, single-objective optimization based GANs find this difficult to ensure, and thus cannot provide a meaningful error signal to improve the generator thereafter. In contrast, multi-objective optimization simultaneously minimizes the losses provided by different models so as to favor worse discriminators, thus providing more useful gradients to the generator. The experiments in Sections IV-B2 and IV-C2 support this conclusion.

  • The complementary mutations in E-Generators and E-Discriminators help the model place a fair distribution of probability mass across the modes of the data-generating distribution. These mutations allow CDE-GAN to evolve in different possible directions during the various training stages, guiding it to cover the data distribution fairly. As shown in Fig. 3(a), CDE-GAN effectively covers all 8 modes of the Gaussian distribution, while the compared baselines suffer from mode collapse to a greater or lesser degree. Sections III-B and III-C provide more theoretical analysis of this merit.
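The multi-objective point above can be made concrete with one simple aggregation scheme: softmax-weighting the per-discriminator generator losses, as in GMAN-style weighted averaging [10]. This is an illustrative assumption on our part, not the paper's exact Soft Mechanism; `weighted_generator_loss` and `tau` are hypothetical names:

```python
import math

def weighted_generator_loss(disc_losses, tau=1.0):
    """Softmax-weighted aggregation of per-discriminator generator
    losses: discriminators the generator currently fails to fool
    (high loss) receive larger weights, so they dominate the update
    and keep supplying informative gradients."""
    weights = [math.exp(l / tau) for l in disc_losses]
    z = sum(weights)
    return sum(w * l for w, l in zip(weights, disc_losses)) / z

# One easy discriminator (loss 0.2) and one hard one (loss 2.0):
# the aggregate is pulled toward the hard discriminator.
agg = weighted_generator_loss([0.2, 2.0])
```

The temperature `tau` interpolates between a plain average (large `tau`) and taking only the hardest discriminator's loss (small `tau`).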

Indeed, one limitation of CDE-GAN is that each iteration costs more time. Theoretically, if we regard one generator update as a training step, CDE-GAN requires additional operations per step, scaling with the numbers of survivors and mutations (all notations are defined in Algorithm 1, and the number of surviving generator parents is set to 1 in this paper). In practice, the per-iteration time costs of CDE-GANs are reported in Table IV. Specifically, training a CDE-GAN model using the DCGAN architecture with different numbers of discriminators costs around 0.039, 0.079, 0.158, and 0.303 seconds per generator iteration (excluding generating images for the score test) for the four settings in Table IV, respectively, on a single GPU. However, since the cooperative dual evolutionary strategy is effective for adversarial training, CDE-GAN requires significantly fewer training steps to achieve the same generative performance as the other baselines (see Fig. 5).

Methods Time/Iteration (Seconds)
CDE-GAN() 0.039 0.000
CDE-GAN() 0.079 0.003
CDE-GAN() 0.158 0.021
CDE-GAN() 0.303 0.002
TABLE IV: Time Cost of CDE-GANs at Each Generator Iteration.

VI Conclusion

In this paper, we proposed a novel GAN (CDE-GAN) that incorporates cooperative dual evolution with respect to the generator(s) and discriminators into a unified evolutionary adversarial framework, to circumvent the adversarial optimization difficulties of GANs, i.e., mode collapse and instability. Notably, the dual evolution provides a dynamic strategy for the generator(s) and discriminators, exploits their complementary properties, and injects dual mutation diversity into learning, which diversifies the estimated density in capturing multiple modes and improves the generative performance of CDE-GAN. Additionally, we introduced a Soft Mechanism to balance E-Generators and E-Discriminators for effective and stable training. The competitive results on one synthetic dataset (i.e., a 2-D mixture of 8 Gaussians) and three real-world image datasets (i.e., CIFAR-10, LSUN-Bedrooms, and CelebA) demonstrate the superiority and great potential of cooperative dual evolution for GANs. Extensive experiments also show that CDE-GAN has obvious advantages over all the compared state-of-the-art methods.

In the future, we will further improve CDE-GAN to speed up learning. Meanwhile, we will also apply evolutionary computation based GANs to other generative tasks, e.g., text synthesis and video prediction.

VII Acknowledgment

The authors would like to thank Dr. Chaoyue Wang at the School of Computer Science, University of Sydney, for his assistance with coding and theoretical suggestions.

References

  • [1] I. Albuquerque, J. Monteiro, T. Doan, B. Considine, T. H. Falk, and I. Mitliagkas (2019) Multi-objective training of generative adversarial networks with multiple discriminators. In Proceedings International Conference on Machine Learning (ICML), pp. 202–211.
  • [2] M. Arjovsky and L. Bottou (2017) Towards principled methods for training generative adversarial networks. In Proceedings International Conference on Learning Representations (ICLR), pp. 1–17.
  • [3] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In Proceedings International Conference on Machine Learning (ICML), pp. 214–223.
  • [4] S. Arora, R. Ge, Y. Liang, T. Ma, and Y. Zhang (2017) Generalization and equilibrium in generative adversarial nets (GANs). In Proceedings International Conference on Machine Learning (ICML), pp. 224–232.
  • [5] D. Berthelot, T. Schumm, and L. Metz (2017) BEGAN: boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717.
  • [6] V. Costa, N. Lourenço, J. Correia, and P. Machado (2019) COEGAN: evaluating the coevolution effect in generative adversarial networks. In Proceedings The Genetic and Evolutionary Computation Conference (GECCO), pp. 374–382.
  • [7] K. Deb (2001) Multi-objective optimization using evolutionary algorithms. Chichester, U.K.: Wiley.
  • [8] W. Ding, C. Lin, and Z. Cao (2019) Deep neuro-cognitive co-evolution for fuzzy attribute reduction by quantum leaping PSO with nearest-neighbor memeplexes. IEEE Transactions on Cybernetics 49, pp. 2744–2757.
  • [9] T. Doan, J. Monteiro, I. Albuquerque, B. Mazoure, A. Durand, J. Pineau, and R. D. Hjelm (2019) Online adaptative curriculum learning for GANs. In Proceedings AAAI Conference on Artificial Intelligence.
  • [10] I. Durugkar, I. Gemp, and S. Mahadevan (2017) Generative multi-adversarial networks. In Proceedings International Conference on Learning Representations (ICLR), pp. 1–14.
  • [11] A. Ghosh, V. Kulharia, V. P. Namboodiri, P. H. S. Torr, and P. K. Dokania (2018) Multi-agent diverse generative adversarial networks. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8513–8521.
  • [12] C. K. Goh and K. C. Tan (2009) A competitive-cooperative coevolutionary paradigm for dynamic multiobjective optimization. IEEE Transactions on Evolutionary Computation 13, pp. 103–127.
  • [13] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio (2014) Generative adversarial nets. In Proceedings Advances in Neural Information Processing Systems (NIPS).
  • [14] I. J. Goodfellow (2016) NIPS 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160.
  • [15] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of Wasserstein GANs. In Proceedings Advances in Neural Information Processing Systems (NIPS), pp. 5767–5777.
  • [16] S. He, G. Jia, Z. Zhu, D. A. Tennant, Q. Huang, K. Tang, J. Liu, M. Musolesi, J. K. Heath, and X. Yao (2016) Cooperative co-evolutionary module identification with application to cancer disease module discovery. IEEE Transactions on Evolutionary Computation 20, pp. 874–891.
  • [17] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings Advances in Neural Information Processing Systems (NIPS), pp. 6629–6640.
  • [18] Q. Hoang, T. D. Nguyen, T. Le, and D. Q. Phung (2018) MGAN: training generative adversarial nets with multiple generators. In Proceedings International Conference on Learning Representations (ICLR), pp. 1–24.
  • [19] O. Kramer (2016) Machine learning for evolution strategies. Cham, Switzerland: Springer.
  • [20] A. Krizhevsky (2009) Learning multiple layers of features from tiny images.
  • [21] C. Ledig, L. Theis, F. Huszár, J. A. Caballero, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 105–114.
  • [22] Z. Liu, J. Wang, and Z. Liang (2020) CatGAN: category-aware generative adversarial networks with hierarchical evolutionary learning for category text generation. In Proceedings AAAI Conference on Artificial Intelligence.
  • [23] Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In Proceedings IEEE International Conference on Computer Vision (ICCV), pp. 3730–3738.
  • [24] X. Lu, S. Menzel, K. Tang, and X. Yao (2018) Cooperative co-evolution-based design optimization: a concurrent engineering perspective. IEEE Transactions on Evolutionary Computation 22, pp. 173–188.
  • [25] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley (2017) Least squares generative adversarial networks. In Proceedings IEEE International Conference on Computer Vision (ICCV), pp. 2813–2821.
  • [26] X. Mao, Q. Li, H. Xie, Z. Wang, R. Y. K. Lau, and S. P. Smolley (2019) On the effectiveness of least squares generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, pp. 2947–2960.
  • [27] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein (2017) Unrolled generative adversarial networks. In Proceedings International Conference on Learning Representations (ICLR), pp. 1–25.
  • [28] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida (2018) Spectral normalization for generative adversarial networks. In Proceedings International Conference on Learning Representations (ICLR), pp. 1–26.
  • [29] G. Mordido, H. Yang, and C. Meinel (2020) MicrobatchGAN: stimulating diversity with multi-adversarial discrimination. In Proceedings Winter Conference on Applications of Computer Vision (WACV), pp. 3050–3059.
  • [30] V. Nagarajan and J. Z. Kolter (2017) Gradient descent GAN optimization is locally stable. In Proceedings Advances in Neural Information Processing Systems (NIPS), pp. 5591–5600.
  • [31] B. Neyshabur, S. Bhojanapalli, and A. Chakrabarti (2017) Stabilizing GAN training with multiple random projections. arXiv preprint arXiv:1705.07831.
  • [32] T. D. Nguyen, T. Le, H. Vu, and D. Q. Phung (2017) Dual discriminator generative adversarial nets. In Proceedings Advances in Neural Information Processing Systems (NIPS), pp. 2670–2680.
  • [33] M. N. Omidvar, X. Li, Y. Mei, and X. Yao (2014) Cooperative co-evolution with differential grouping for large scale optimization. IEEE Transactions on Evolutionary Computation 18, pp. 378–393.
  • [34] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In Proceedings International Conference on Learning Representations (ICLR), pp. 1–15.
  • [35] T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training GANs. In Proceedings Advances in Neural Information Processing Systems (NIPS), pp. 2234–2242.
  • [36] T. Schmiedlechner, A. Al-Dujaili, E. Hemberg, and U. O'Reilly (2018) Towards distributed coevolutionary GANs. In Proceedings AAAI 2018 Fall Symposium, pp. 1–6.
  • [37] Y. Sun, H. Wang, B. Xue, Y. Jin, G. G. Yen, and M. Zhang (2020) Surrogate-assisted evolutionary deep learning using an end-to-end random forest-based performance predictor. IEEE Transactions on Evolutionary Computation 24, pp. 350–364.
  • [38] Y. Sun, B. Xue, G. G. Yen, and M. Zhang (2020) Automatically designing CNN architectures using genetic algorithm for image classification. IEEE Transactions on Cybernetics.
  • [39] Y. Sun, B. Xue, G. G. Yen, and M. Zhang (2020) Evolving deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation 24, pp. 394–407.
  • [40] Y. Sun, B. Xue, M. Zhang, and G. G. Yen (2019) A particle swarm optimization-based flexible convolutional autoencoder for image classification. IEEE Transactions on Neural Networks and Learning Systems 30, pp. 2295–2309.
  • [41] Y. Sun, B. Xue, M. Zhang, and G. G. Yen (2020) Completely automated CNN architecture design based on blocks. IEEE Transactions on Neural Networks and Learning Systems 31, pp. 1242–1254.
  • [42] Y. Sun, G. G. Yen, and Z. Yi (2019) Evolving unsupervised deep neural networks for learning meaningful representations. IEEE Transactions on Evolutionary Computation 23, pp. 89–103.
  • [43] I. O. Tolstikhin, S. Gelly, O. Bousquet, C. Simon-Gabriel, and B. Schölkopf (2017) AdaGAN: boosting generative models. In Proceedings Advances in Neural Information Processing Systems (NIPS), pp. 5424–5433.
  • [44] J. Toutouh, E. Hemberg, and U. O'Reilly (2019) Spatial evolutionary generative adversarial networks. In Proceedings The Genetic and Evolutionary Computation Conference (GECCO), pp. 472–480.
  • [45] J. Toutouh, E. Hemberg, and U. O'Reilly (2020) Re-purposing heterogeneous generative ensembles with evolutionary computation. In Proceedings The Genetic and Evolutionary Computation Conference (GECCO), pp. 425–434.
  • [46] S. Tulyakov, M. Liu, X. Yang, and J. Kautz (2018) MoCoGAN: decomposing motion and content for video generation. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1526–1535.
  • [47] C. Wang, C. Xu, X. Yao, and D. Tao (2019) Evolutionary generative adversarial networks. IEEE Transactions on Evolutionary Computation 23, pp. 921–934.
  • [48] D. Warde-Farley and Y. Bengio (2017) Improving generative adversarial networks with denoising feature matching. In Proceedings International Conference on Learning Representations (ICLR), pp. 1–11.
  • [49] J. Xie, Y. Lu, R. Gao, S. Zhu, and Y. Wu (2020) Cooperative training of descriptor and generator networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, pp. 27–45.
  • [50] F. Yu, Y. Zhang, S. Song, A. Seff, and J. Xiao (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365. Cited by: §IV-A.
  • [51] L. Yu, W. Zhang, J. Wang, and Y. Yu (2017) SeqGAN: sequence generative adversarial nets with policy gradient. In Proceedings AAAI Conference on Artificial Intelligence, Cited by: §I.
  • [52] Q. Zhang, S. Yang, S. Jiang, R. Wang, and X. Li (2020) Novel prediction strategies for dynamic multiobjective optimization. IEEE Transactions on Evolutionary Computation 24, pp. 260–274. Cited by: §I, §III.
  • [53] X. Y. Zhang, Y. Gong, Y. Lin, J. Zhang, S. Kwong, and J. Zhang (2019) Dynamic cooperative coevolution for large scale optimization. IEEE Transactions on Evolutionary Computation 23, pp. 935–948. Cited by: §I, §III.
  • [54] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings IEEE International Conference on Computer Vision (ICCV), pp. 2242–2251. Cited by: §I.
  • [55] J. Zou, Q. Li, S. Yang, J. Zheng, Z. Peng, and T. Pei (2019) A dynamic multiobjective evolutionary algorithm based on a dynamic evolutionary environment model. Swarm and Evolutionary Computation 44, pp. 247–259. Cited by: §I, §III.