Generative adversarial networks (GANs) (Goodfellow et al., 2014) gained relevance for presenting impressive results, mainly for image synthesis in the field of computer vision. A GAN combines two components, a discriminator and a generator, trained as adversaries in an algorithm designed to minimize a previously defined cost function. The generator is trained without direct access to the data distribution it is trying to capture: the discriminator learns to distinguish between fake samples and the real input data, and the generator learns to synthesize samples that resemble the input dataset.
Recently, GANs were improved to generate high-resolution images in large-scale datasets (Karras et al., 2018; Brock et al., 2018). However, there are still open problems regarding the training of GANs. Vanishing gradient and the mode collapse are the most common issues, making the training of GANs hard. There are strategies to minimize these problems, but they remain fundamentally unsolved (Gulrajani et al., 2017; Salimans et al., 2016).
In a GAN, the discriminator and the generator are deep neural networks whose architectures must be defined beforehand. The topology and hyperparameters are usually chosen empirically, which spends human time on repetitive tasks such as fine-tuning. However, there are approaches that can automate the design of neural network architectures.
Neuroevolution is a technique that applies evolutionary algorithms to provide the automatic design of neural networks. In neuroevolution, both the network architecture (e.g., topology, hyperparameters and the optimization method) and the parameters (e.g., weights) can be evolved. NeuroEvolution of Augmenting Topologies (NEAT)(Stanley and Miikkulainen, 2002) is a well-known algorithm that evolves the weights and topologies of neural networks. NEAT was also successfully applied in a coevolution context (Stanley and Miikkulainen, 2004). The NEAT model was also expanded to work on larger search spaces, such as deep neural networks, in the DeepNEAT (Miikkulainen et al., 2017) method.
This paper presents a model called coevolutionary generative adversarial networks (COEGAN), first proposed in (Costa et al., 2019), that combines neuroevolution and coevolution in the coordination of the GAN training algorithm. Our evolutionary algorithm is based on the approach used in DeepNEAT. We extended and adapted DeepNEAT to work in the context of GANs, making use of the competitive relationship between the generator and the discriminator to apply a coevolution model. Hence, the subpopulations of generators and discriminators each follow their own evolutionary path. To validate our model, experiments were conducted using MNIST (LeCun, 1998) and Fashion-MNIST (Xiao et al., 2017) as input datasets for the discriminator component. We show that our model is better than a random search at discovering architectures. We also compare our model with a reference architecture based on DCGAN (Radford et al., 2015), and show that the training stability is improved and that our results are better than those of manually designed networks with similar power. Code available at https://github.com/vfcosta/coegan.
This section introduces the concepts of evolutionary algorithms and generative adversarial networks employed in this paper and presents works related to the proposed model.
2.1. Evolutionary Algorithms
Evolutionary algorithms (EAs) are a family of algorithms inspired by biological evolution, simulating the evolutionary mechanisms found in nature (Sims, 1994). Several variations of evolutionary algorithms have been proposed to solve a wide variety of problems. In this context, neuroevolution was proposed to apply evolutionary algorithms to evolve neural networks (Stanley and Miikkulainen, 2002). Neuroevolution can be used to evolve the weights, topology and hyperparameters of a neural network. In this paper, we are particularly interested in the use of neuroevolution to automate the design of the network architecture and its parameters. This automation is even more relevant for bigger models such as deep neural networks, which produce large search spaces (Assunção et al., 2018; Miikkulainen et al., 2017).
NeuroEvolution of Augmenting Topologies (NEAT) is a well-known method used to evolve the topology and weights of neural networks. NEAT encodes in the genotype the structure and weights of the neural network. The genes represent neurons and connections between them, including the weights used in the transformation to the phenotype (i.e., the resulting neural network). The evolution occurs through mutation and crossover. The growth strategy follows a complexifying mechanism where the genome starts small and gradually grows with the generations. In NEAT, not only the final architecture is important, but the intermediary solutions also contribute to the final solution, since the weights are transferred through generations(Stanley and Miikkulainen, 2002). DeepNEAT (Miikkulainen et al., 2017) was proposed to extend the NEAT model to deep neural networks. In DeepNEAT, each gene in the genotype represents an entire layer of the neural network. This approach makes it possible to discover deeper models.
Coevolution is the simultaneous evolution of at least two distinct species (Hillis, 1990; Rawal et al., 2010). In (Stanley and Miikkulainen, 2004), NEAT was applied in a competitive coevolution environment. In competitive coevolution, individuals of two or more species compete against each other. Therefore, the fitness function reflects this competition: the scores of the different species are inversely related, so a gain for one species corresponds to a loss for the other (Stanley and Miikkulainen, 2004; Sims, 1994; Rawal et al., 2010).
2.2. Generative Adversarial Networks
Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) became relevant for the performance achieved in generative tasks, mainly in the field of computer vision. A GAN combines a discriminator D and a generator G, trained simultaneously as adversaries, to create strong generators and discriminators. The discriminator aims to distinguish between real data and fake samples, given an input distribution. The generator aims to output fake samples that deceive the discriminator, capturing the data distribution used as input for D.
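The loss function of the discriminator is defined as follows:

$$J^{(D)}(D, G) = -\mathbb{E}_{x \sim p_{data}}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \tag{1}$$

For the generator, the non-saturating version of the loss function is defined by:

$$J^{(G)}(G) = -\mathbb{E}_{z \sim p_z}[\log D(G(z))] \tag{2}$$

where $p_{data}$ is the distribution of the real data and $p_z$ is the noise distribution used as input for the generator.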
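As a minimal numerical sketch of these two losses, assuming the standard cross-entropy formulation of Goodfellow et al. (2014), the expectations can be approximated by averages over a mini-batch. The function names and the discriminator outputs below are hypothetical and serve only as an illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Mean of -log D(x) on real samples plus mean of -log(1 - D(G(z))) on fakes."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss_non_saturating(d_fake):
    """Mean of -log D(G(z)): small when the discriminator is fooled."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability of "real") for one mini-batch.
d_real = [0.9, 0.8, 0.95]
d_fake = [0.1, 0.3, 0.2]
print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss_non_saturating(d_fake))
```

When the discriminator is undecided (all outputs equal to 0.5), the discriminator loss equals 2 log 2 and the generator loss equals log 2.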
Vanishing gradient and mode collapse are the most common problems regarding training stability in GANs. The vanishing gradient occurs when the discriminator becomes powerful enough that it is no longer fooled by the generator, which prevents useful gradients from flowing to the generator. This causes the whole training progress to stagnate. Mode collapse, on the other hand, causes the generator to capture only a portion of the input distribution. Several approaches have tried to minimize these problems, but they remain unsolved (Salimans et al., 2016; Gulrajani et al., 2017).
Distinct variations of the original GAN were proposed in order to improve the stability and the performance of the model, such as WGAN (Arjovsky et al., 2017) and LSGAN (Mao et al., 2017). However, a study found no empirical evidence that those proposals are superior to the original GAN (Lucic et al., 2017).
Other models extend the original GAN proposal by modifying aspects of the training algorithm. The method described by (Karras et al., 2018) progressively grows a GAN during the training phase. This mechanism increases the number of layers in the discriminator and generator as the training proceeds, augmenting the resolution of images at each phase. The model proposed in (Karras et al., 2018) grows in a preconfigured way during training, without an evolutionary algorithm guiding the process. Therefore, we can consider this predefined progression in the number of layers as a first step towards the evolution of generative adversarial models.
Recently, a model was proposed to use evolutionary algorithms in GANs (Wang et al., 2018). Their approach used a simple model to evolve GANs, with a mutation operator that can switch only the loss function of the individuals. They also use very small populations of individuals, only large enough to cover the loss functions predefined in the model. Our proposal differs from theirs by modeling the GAN as a coevolution problem. Besides, in our case, the evolution does not take the loss function into account and occurs only in the network architecture. Coevolution was also used to train GANs in (Schmiedlechner et al., 2018). In this case, the training method focused on the evolution of weights and does not evolve the network architecture.
In (Costa et al., 2019), a model called Coevolutionary Generative Adversarial Networks (COEGAN) was proposed. In this model, neuroevolution and coevolution are used in the coordination of the training algorithm. So, COEGAN extended and adapted the approach of DeepNEAT (Miikkulainen et al., 2017) to the context of GANs in a coevolution environment.
The genome of the COEGAN model is represented as an array of genes. The phenotype transformation maps this array into a sequence of layers in a deep neural network. Three types of genes are used in our method: linear, convolution and transpose convolution. Each gene has an activation function, randomly chosen from the following set: ReLU, LeakyReLU, ELU, Sigmoid and Tanh.
The convolution and transpose convolution layers only have the number of output channels as a random parameter. The stride and kernel size are dynamically calculated based on the requirements of the output size of each layer. In addition, the number of input channels is also dynamically calculated, based on the previous layer. The linear layer only has the number of output features as the random parameter. The number of input features is defined based on the setup of the previous layer. Therefore, genes belonging to a genome have only the activation function, output features and output channels subject to the mutation operation.
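The genome representation described above can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the names `Gene`, `random_gene` and `describe` are hypothetical, the value ranges are taken from Table 1, and the sketch tracks only feature and channel counts, ignoring spatial dimensions, stride and kernel size.

```python
import random
from dataclasses import dataclass

ACTIVATIONS = ["ReLU", "LeakyReLU", "ELU", "Sigmoid", "Tanh"]


@dataclass
class Gene:
    kind: str        # "linear", "conv" or "deconv" (transpose convolution)
    out_size: int    # output features (linear) or output channels (conv/deconv)
    activation: str  # one of ACTIVATIONS


def random_gene(kind, rng):
    """Draw the only random attributes of a gene: output size and activation."""
    low, high = (32, 1024) if kind == "linear" else (16, 128)
    return Gene(kind, rng.randint(low, high), rng.choice(ACTIVATIONS))


def describe(genome, in_size):
    """Map a genome to layer descriptions; each input size comes from the previous layer."""
    layers = []
    for gene in genome:
        layers.append(f"{gene.kind}({in_size} -> {gene.out_size}) + {gene.activation}")
        in_size = gene.out_size
    return layers


rng = random.Random(42)
discriminator = [random_gene("conv", rng), Gene("linear", 1, "Sigmoid")]
for layer in describe(discriminator, in_size=1):
    print(layer)
```

Only `out_size` and `activation` are stored as free attributes, which mirrors the statement that these are the only gene attributes subject to mutation.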
Figure 1 and Figure 2 represent examples of the genotype of a discriminator and a generator, respectively. In Figure 1, the genotype contains a convolutional section (defined by the Conv2d layer) followed by a linear section. In Figure 2, the genotype contains a linear section followed by a transpose convolutional section (expressed by the Deconv2d layer). Following the classical GAN approach, the output of a discriminator is the probability that the input sample is a real sample, i.e., a sample drawn from the input dataset. The generator, on the other hand, outputs a fake sample with the same characteristics (i.e., dimensions and channels) as a real sample.
In COEGAN, the population is composed of two subpopulations, one of generators and one of discriminators. Inside each subpopulation, a speciation mechanism inspired by the strategy used in NEAT is applied in order to promote innovation. This mechanism ensures that individuals with new layers have the chance to survive long enough to become as powerful as individuals from previous generations.
For the current proposal of COEGAN, we are only interested in the evolution of the neural network architecture. The number of parameters internal to the neural network, such as weights and biases, is too large, and evolving them would increase the computational complexity. Therefore, the parameters of the resulting neural networks are trained by gradient descent and are not part of the evolution.
The fitness used in discriminators is based on the loss obtained from the regular GAN, i.e., the fitness is equivalent to Eq. 1. For generators, preliminary experiments demonstrated that using the loss (Eq. 2) does not represent a good measure for quality in this case. The loss for generators is not stable enough during the training, making it not suitable to be used as selection criteria in COEGAN.
Therefore, the Fréchet Inception Distance (FID) (Heusel et al., 2017) was used in COEGAN as the fitness for generators. FID is becoming the standard measurement to compare the performance of the generative component of GANs (Lucic et al., 2017), having a better representation of diversity and quality than other metrics, such as the Inception Score (Salimans et al., 2016). Using the FID score, we put selection pressure on generators and direct the evolution of the population towards strong generators with respect to this metric. In FID, a hidden layer of Inception Net (Szegedy et al., 2016), trained on ImageNet (Russakovsky et al., 2015), is used to transform images into a feature space, interpreted as a continuous multivariate Gaussian. This transformation is applied to a subset of the real dataset and to fake samples created by the generator. The mean and covariance of the two resulting Gaussians are estimated, and the Fréchet distance between these Gaussians is given by:

$$FID(x, g) = \lVert \mu_x - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_x + \Sigma_g - 2(\Sigma_x \Sigma_g)^{1/2}\right) \tag{3}$$

where $(\mu_x, \Sigma_x)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance estimated from the real and the generated samples, respectively.
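As a rough illustration of the Fréchet distance between two Gaussians, the sketch below handles only the special case of diagonal covariance matrices, where the squared distance reduces to the sum of squared mean differences plus the sum of squared differences between standard deviations. This simplification is an assumption made to keep the example dependency-free; the full FID uses complete covariance matrices of Inception features and a matrix square root. The function name `frechet_distance_diag` is hypothetical.

```python
import math


def frechet_distance_diag(mu_x, var_x, mu_g, var_g):
    """Squared Frechet distance between two Gaussians with diagonal covariances.

    mu_* are per-dimension means and var_* are per-dimension variances.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_x, mu_g))
    # Trace term: v_x + v_g - 2 * sqrt(v_x * v_g), summed over dimensions.
    cov_term = sum(vx + vg - 2.0 * math.sqrt(vx * vg) for vx, vg in zip(var_x, var_g))
    return mean_term + cov_term


# Hypothetical feature statistics for real (x) and generated (g) samples.
real_mu, real_var = [0.0, 0.0], [1.0, 1.0]
fake_mu, fake_var = [3.0, 4.0], [1.0, 1.0]
print(frechet_distance_diag(real_mu, real_var, fake_mu, fake_var))  # 25.0
```

The distance is zero when both Gaussians have identical statistics and grows as the generated features drift away from the real ones, which is why lower values indicate better generators.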
3.2. Variation Operator
The variation operator used in COEGAN to breed new individuals is mutation. We also experimented with a crossover operator, but the results indicated that it brings too much instability into the system. Thus, we chose to keep only mutation in the COEGAN proposal.
The mutation consists of the following kinds of operations: add a new layer, remove a layer, and change an existing layer. The addition operator adds a new layer into the genotype. This new layer is randomly drawn from a set of possible layers: linear and convolution for discriminators; linear and transpose convolution for generators.
The remove operation randomly chooses an existing layer and excludes it from the genotype. The change operation, on the other hand, modifies the attributes and the activation function of an existing layer. In this case, the activation function is randomly chosen from the set of possibilities listed before. Furthermore, specific attributes of layers can also be changed: the number of output features for linear layers, and the number of output channels for convolution layers. The mutation of these attributes follows a uniform distribution, delimited by a predefined range.
In the breeding process, the parameters (weights and bias) are copied when the genes involved in the mutation are compatible. So, the new individual will keep the training information from the previous generation. However, when the specific attributes of a linear or convolution layer change, the trained parameters are not copied and the layer will be trained from the beginning. This is caused by the change in the shape of weights, making them incompatible with the new layer.
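The three mutation operations and the weight-copy rule can be sketched as below for a discriminator genome. This is a minimal sketch under several assumptions that the text does not specify: the three mutations are tested independently, a new layer is inserted at a random position, and genes are plain dictionaries with a hypothetical `weights` entry. The rates, the six-layer limit and the attribute ranges are the ones reported in the experimental setup. The sketch also ignores that a change in one layer alters the input shape of the next one.

```python
import copy
import random

ACTIVATIONS = ["ReLU", "LeakyReLU", "ELU", "Sigmoid", "Tanh"]
# Ranges for the mutable attribute of each layer type (taken from Table 1).
RANGES = {"linear": (32, 1024), "conv": (16, 128)}


def new_gene(kind, rng):
    """Create an untrained gene (weights=None) with random attributes."""
    low, high = RANGES[kind]
    return {"kind": kind, "out": rng.randint(low, high),
            "act": rng.choice(ACTIVATIONS), "weights": None}


def mutate(genome, rng, max_layers=6, p_add=0.2, p_remove=0.1, p_change=0.1):
    """Return a mutated copy of a non-empty discriminator genome (list of gene dicts)."""
    child = copy.deepcopy(genome)  # untouched genes keep their trained weights
    if rng.random() < p_add and len(child) < max_layers:
        position = rng.randrange(len(child) + 1)
        child.insert(position, new_gene(rng.choice(["linear", "conv"]), rng))
    if rng.random() < p_remove and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    if rng.random() < p_change:
        gene = rng.choice(child)
        gene["act"] = rng.choice(ACTIVATIONS)
        low, high = RANGES[gene["kind"]]
        new_out = rng.randint(low, high)  # uniform draw within the range
        if new_out != gene["out"]:
            gene["out"] = new_out
            gene["weights"] = None  # shape changed: train this layer from scratch
    return child


rng = random.Random(7)
parent = [{"kind": "conv", "out": 32, "act": "ReLU", "weights": "trained-w0"},
          {"kind": "linear", "out": 64, "act": "Tanh", "weights": "trained-w1"}]
print(mutate(parent, rng))
```

Copying the parent before mutating it keeps the trained parameters of every gene that the mutation does not reshape, which is the mechanism that transfers training information across generations.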
3.3. Pairing Strategy
In a competitive coevolution environment, discriminators and generators must be paired to calculate the fitness for individuals. In this context, several approaches can be used to pair individuals, such as all vs. all, random, and all vs. best (Sims, 1994). The all vs. all approach pairs each discriminator with each generator to calculate the fitness for each individual. In this case, the fitness for discriminators will be the average of the losses obtained by each training pair. The random approach randomly pairs individuals from the discriminator and generator populations. In the all vs. best strategy, each individual in one population is paired with the best individual from the other population.
Preliminary experiments indicated that the all vs. all strategy is the most stable for COEGAN. This strategy improves the variability of the environment for both discriminators and generators during the training, helping to avoid common problems in the training of GANs. The trade-off is the execution time, since the number of training pairs grows with the product of the sizes of the two populations.
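The all vs. all pairing can be sketched as a double loop over both populations, with the fitness of each discriminator computed as the average of the losses obtained against every generator. The function name and the loss table below are hypothetical; in a real run, each entry would come from training and evaluating the corresponding pair.

```python
def all_vs_all_discriminator_fitness(n_discriminators, n_generators, pair_loss):
    """Average, for each discriminator, the loss obtained against every generator.

    pair_loss(d, g) is a hypothetical callable that returns the discriminator
    loss for the pair (discriminator d, generator g).
    """
    fitness = []
    for d in range(n_discriminators):
        losses = [pair_loss(d, g) for g in range(n_generators)]
        fitness.append(sum(losses) / len(losses))
    return fitness


# Hypothetical loss table: rows are discriminators, columns are generators.
loss_table = [[1.0, 3.0],
              [0.5, 1.5]]
print(all_vs_all_discriminator_fitness(2, 2, lambda d, g: loss_table[d][g]))
# [2.0, 1.0]
```

With populations of size n on each side, the loop evaluates n × n pairs, which is the source of the execution-time cost of this strategy.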
The selection phase of COEGAN is based on the original proposal of NEAT (Stanley and Miikkulainen, 2002). The populations of generators and discriminators are divided into species, each containing individuals with similar network structures. The similarity criterion between individuals is based on the parameters of the genome that are related to the evolutionary algorithm. Thus, we do not consider the weights and biases of each layer in this calculation.
The criterion is the distance between two genomes, defined as the number of genes that exist exclusively in one of them. Individuals are grouped into species based on this distance and a threshold. The threshold is adjusted automatically by the COEGAN algorithm to fit a predefined number of species. Tournament selection is also applied to select the best individuals inside each species.
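The distance and the threshold-based grouping can be sketched as follows. Several details are assumptions made for illustration, since the text does not specify them: genes are identified by hypothetical integer ids, an individual joins the first species whose representative is within the threshold, and the threshold is adjusted by a fixed step.

```python
def genome_distance(genome_a, genome_b):
    """Number of genes that exist exclusively in one of the two genomes."""
    return len(set(genome_a) ^ set(genome_b))


def speciate(genomes, threshold):
    """Group genomes into species; the first member of each species is its representative."""
    species = []
    for genome in genomes:
        for members in species:
            if genome_distance(genome, members[0]) <= threshold:
                members.append(genome)
                break
        else:
            species.append([genome])  # no compatible species: start a new one
    return species


def adjust_threshold(threshold, n_species, target, step=1):
    """Move the threshold so that the next grouping approaches the target species count."""
    if n_species > target:
        return threshold + step          # too many species: be more permissive
    if n_species < target:
        return max(0, threshold - step)  # too few species: be stricter
    return threshold


# Genomes as lists of hypothetical gene ids.
population = [[1, 2], [1, 2, 3], [7, 8, 9]]
print(genome_distance([1, 2, 3], [2, 3, 4]))    # 2
print(len(speciate(population, threshold=1)))   # 2
```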
To validate the performance of our method, we evaluate COEGAN on the MNIST (LeCun, 1998) and Fashion-MNIST (Xiao et al., 2017) datasets. We compare COEGAN against a random search method and a reference architecture based on DCGAN. The random search method is similar to COEGAN, but instead of the fitness described in Section 3.1, a randomly assigned value represents the fitness of individuals in the population. All other characteristics of the random method, such as the pairing strategy, remain the same as in COEGAN. The DCGAN model is a well-defined set of architectural constraints, used as a reference in several works related to the evaluation of GANs (Lucic et al., 2017; Karras et al., 2018). We follow this approach to build a reference architecture (based on DCGAN) to compare our results with commonly used models in the context of GANs.
4.1. Experimental Setup
Table 1 presents the parameters used in our experiments. The number of generations used in all experiments is 50. We used 10 individuals for each population of generators and discriminators. These evolutionary parameters do not apply to the DCGAN experiments, as DCGAN is not trained by an evolutionary algorithm. In order to limit the computational resources used in our experiments, the size of the genome was limited to six layers. To emulate a network with similar power, the DCGAN architecture used in the experiments also contains six layers. We use three species for each population of generators and discriminators, which allows an average of about three individuals per species. We empirically defined a probability of 20%, 10% and 10% for the add, remove and change mutations, respectively. A higher probability for these mutations causes the premature convergence of the system, leading to performance issues and instability in the GAN training process. Hence, the probability rates were kept low but sufficient to create diversity in the population through generations.
For each pair of generator and discriminator, 20 batches were executed per generation (Table 1), with the batch size of . Therefore, in our scenario of a population composed of 10 generators and 10 discriminators with the all vs. all pairing strategy, each individual will execute 200 batches per generation. The DCGAN experiment is not an evolutionary algorithm and contains only one discriminator and one generator. In this case, we set the number of batches to to keep it comparable with COEGAN and the random search method. The optimizer used in the training method was Adam (Kingma and Ba, 2015) with a learning rate of .
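The training budget per generation follows directly from the values in Table 1 (20 batches per pair, 10 generators, 10 discriminators) under the all vs. all pairing strategy. The short calculation below makes the arithmetic explicit; the variable names are illustrative.

```python
batches_per_pair = 20   # "Batches per generation" in Table 1
n_generators = 10       # population size (generators)
n_discriminators = 10   # population size (discriminators)

# With all vs. all pairing, every individual trains against the whole opposing population.
batches_per_discriminator = batches_per_pair * n_generators
batches_per_generator = batches_per_pair * n_discriminators
pairs_per_generation = n_generators * n_discriminators

print(batches_per_discriminator)  # 200
print(batches_per_generator)      # 200
print(pairs_per_generation)       # 100
```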
Table 1. Parameters used in the experiments.

|Parameter|Value|
|---|---|
|Number of generations|50|
|Population size (generators)|10|
|Population size (discriminators)|10|
|Add Layer rate|20%|
|Remove Layer rate|10%|
|Change Layer rate|10%|
|Output features range|[32, 1024]|
|Output channels range|[16, 128]|
|Root mean squared error samples|1000|
|Batches per generation|20|
To compare the results of COEGAN, the random search method and the DCGAN based network, we use the FID score (Heusel et al., 2017), Inception score (Salimans et al., 2016) and the root mean squared error. The root mean squared error is calculated between samples created by the generator and real samples randomly drawn from the input dataset.
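A minimal sketch of the root mean squared error between generated and real samples is given below. The text does not state how generated and real samples are matched, so this example assumes that both sets contain the same number of flattened images of equal size, compared element by element; the function name `rmse` is illustrative.

```python
import math


def rmse(generated, real):
    """Root mean squared error over all pixels of two equally shaped image sets.

    Each image is a flat list of pixel values; images are compared pairwise in order.
    """
    total, count = 0.0, 0
    for generated_image, real_image in zip(generated, real):
        for generated_pixel, real_pixel in zip(generated_image, real_image):
            total += (generated_pixel - real_pixel) ** 2
            count += 1
    return math.sqrt(total / count)


# Two hypothetical 4-pixel images per set.
fake_images = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
real_images = [[2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]]
print(rmse(fake_images, real_images))  # 2.0
```

A value of zero would mean that the generator reproduces the sampled real images exactly, so a non-zero error indicates that the generated samples differ from the training data.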
All figures in this section contain plots with curves representing the average of the results from repeated executions, together with a confidence interval.
Figure 3 displays the average of losses for the best generator and discriminator found for COEGAN in each generation. As stated in Section 3.1, this figure indicates that the use of the loss function as the fitness for generators is not a good metric to assess the performance of an individual. We can see the value of the loss increases with generations as well as some instability in the values.
Figure 4 presents the average progression of layers in the genome of individuals belonging to the population of generators and discriminators. The number of layers gradually increases with generations, demonstrating that the speciation mechanism used in COEGAN protects the innovation and creates a propitious environment for individuals with more layers.
Figure 5 displays the average number of times a gene was reused during the training process on the MNIST dataset. The results for this metric indicate that the training information is kept through generations, as described in Section 3.2.
The root mean squared error is displayed in Figure 6, comparing the results for COEGAN, the random method and DCGAN. We introduced this metric to ensure that the samples created by generators are different from the data contained in the input dataset. Thus, Figure 6 indicates that all methods create some innovation in the new samples, with the DCGAN method being better for this metric.
Figure 7 shows the average of the Inception Score (higher is better) for generators in COEGAN, the random method and DCGAN. For this metric, the DCGAN provides the best results. However, COEGAN is better than the random approach, demonstrating that our choice for fitness is relevant for the evolutionary algorithm proposed in this paper.
The Fréchet Inception Distance (FID) (Heusel et al., 2017) of the generators in COEGAN, the random method and DCGAN is displayed in Figure 8 (lower is better). We can see that the FID for COEGAN is better than the results of the random method and DCGAN. Moreover, the random method displayed a lot of variability in the FID results, mainly caused by the stochastic process introduced by this approach. The study in (Lucic et al., 2017) found that the FID score is a better representation of the diversity and quality of generated samples than other metrics. Thus, based on this study and the results displayed in Figure 8, the best generator found in COEGAN outperforms the generator in the reference architecture based on DCGAN.
Figure 9 contains samples generated by the best generator found in COEGAN trained with the MNIST dataset after 50 generations. We can observe a good representation of the MNIST dataset in the generated samples. We found no evidence of the vanishing gradient or the mode collapse problem in any execution of COEGAN. As individuals with these issues perform worse than others, they are eventually not selected by the evolutionary algorithm, which prevents these problems from persisting through generations. Furthermore, a diverse population of generators and discriminators can increase the variability provided in the training process when compared to a regular GAN. This variation contributes to a stronger training algorithm, helping to prevent the mode collapse and the vanishing gradient problems.
Figures 10 and 11 represent the best architecture found by COEGAN after 50 generations. Both architectures are composed of a combination of linear and convolutional layers (represented in the images by Conv2d and Deconv2d). It is relevant to note that not only the final architecture is important but also the process that constructs the final models, because of the mechanism that transfers the learned weights through generations. Notably, COEGAN found models for the generator and the discriminator with fewer layers than the reference architecture based on DCGAN, but with better performance with respect to the FID metric.
The same methodology to assess the performance of COEGAN (used with the MNIST dataset in Section 4.2.1) was applied with the Fashion-MNIST dataset. Figures 12, 13, 14, 15, 16 and 17 present results with characteristics similar to the previous results on the MNIST dataset. As the Fashion-MNIST dataset is slightly more complex than MNIST, these results suggest that our method can be applied to more elaborate datasets. However, experiments with larger datasets such as CelebA (Liu et al., 2015) and CIFAR-10 (Krizhevsky and Hinton, 2009) should be conducted to support this statement.
As in the MNIST results, we can see in Figure 17 that the FID score for COEGAN outperforms the other methods. The Inception Score is still better for DCGAN on the Fashion-MNIST dataset. The progression in the number of layers, presented in Figure 13, is also similar to the MNIST results.
Figure 18 contains samples generated by the best generator found in COEGAN trained with the Fashion-MNIST dataset after 50 generations. We can see a variety of images being generated, following the distribution of the input dataset. As in the results with the MNIST dataset, we found no evidence of the vanishing gradient or the mode collapse problem in any execution.
Generative adversarial networks (GAN) gained relevance for presenting impressive results in the field of computer vision. However, stability problems such as the vanishing gradient and the mode collapse problem make the training of a GAN a difficult task.
We present in this paper a model called COEGAN, first proposed in (Costa et al., 2019), which uses neuroevolution and coevolution in the coordination of the GAN training process. COEGAN makes use of the adversarial characteristics of a GAN to apply a coevolution environment. The model was designed with inspiration from NEAT (Stanley and Miikkulainen, 2004) and DeepNEAT (Miikkulainen et al., 2017), and also from recent advances in GANs, such as (Karras et al., 2018).
In this paper, we presented experiments made with the MNIST and Fashion-MNIST datasets to assess the efficiency of COEGAN. We found no evidence of the vanishing gradient or the mode collapse problem in any execution of the experiments with COEGAN on either dataset. The selection process and the variation introduced by a diverse population of generators and discriminators contributed to preventing these issues. Thus, COEGAN presented a more stable training solution than regular GANs. We compared our results with a random search method and also with a reference architecture based on DCGAN. The results showed that COEGAN achieved a better FID than DCGAN and the random method for both datasets. However, the Inception Score of the DCGAN model was better than that of COEGAN. The Inception Score is a metric that has limitations in representing the diversity and quality of the samples, and it is gradually being replaced by the FID score in the analysis of the quality of GANs. We also show that COEGAN is better than a random search model, demonstrating the efficiency of the evolutionary algorithm proposed in this paper. It is also important to note that COEGAN discovered models for generators and discriminators with fewer layers than the DCGAN used in our experiments. This indicates that the evolutionary process that leads to the final models in COEGAN is also relevant to the performance of the models, mainly because of the mechanism that transfers weights through generations.
As future work, we will apply the same experiments to larger datasets, such as CelebA (Liu et al., 2015) and CIFAR-10 (Krizhevsky and Hinton, 2009). We will also expand the parameters used in the experiments in this paper to enable the generation of larger networks. Thus, a larger population of generators and discriminators can be used with a larger limit on the number of genes in the genome.
- Arjovsky et al. (2017). Wasserstein generative adversarial networks. International Conference on Machine Learning, pp. 214–223.
- Assunção et al. (2018). Evolving the topology of large scale deep neural networks. European Conference on Genetic Programming, pp. 19–34.
- Brock et al. (2018). Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096.
- Costa et al. (2019). Coevolution of generative adversarial networks. In International Conference on the Applications of Evolutionary Computation.
- Goodfellow et al. (2014). Generative adversarial nets. In NIPS.
- Gulrajani et al. (2017). Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5769–5779.
- Heusel et al. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6629–6640.
- Hillis (1990). Co-evolving parasites improve simulated evolution as an optimization procedure. Physica D: Nonlinear Phenomena 42 (1-3), pp. 228–234.
- Karras et al. (2018). Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations.
- Kingma and Ba (2015). Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR).
- Krizhevsky and Hinton (2009). Learning multiple layers of features from tiny images. Technical report, Citeseer.
- LeCun (1998). The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
- Liu et al. (2015). Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738.
- Lucic et al. (2017). Are GANs created equal? A large-scale study. arXiv preprint arXiv:1711.10337.
- Mao et al. (2017). Least squares generative adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2813–2821.
- Miikkulainen et al. (2017). Evolving deep neural networks. arXiv preprint arXiv:1703.00548.
- Radford et al. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
- Rawal et al. (2010). Constructing competitive and cooperative agent behavior using coevolution. In Computational Intelligence and Games (CIG), 2010 IEEE Symposium on, pp. 107–114.
- Russakovsky et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252.
- Salimans et al. (2016). Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242.
- Schmiedlechner et al. (2018). Towards distributed coevolutionary GANs. arXiv preprint arXiv:1807.08194.
- Sims (1994). Evolving 3D morphology and behavior by competition. Artificial Life 1 (4), pp. 353–372.
- Stanley and Miikkulainen (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation 10 (2), pp. 99–127.
- Stanley and Miikkulainen (2004). Competitive coevolution through evolutionary complexification. Journal of Artificial Intelligence Research 21, pp. 63–100.
- Szegedy et al. (2016). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
- Wang et al. (2018). Evolutionary generative adversarial networks. arXiv preprint arXiv:1803.00657.
- Xiao et al. (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.