1 Introduction
Generative Adversarial Networks (GANs) have attracted much research interest since their introduction [1], with a wide range of applications such as image generation, text-to-image synthesis, and style transfer [2, 3, 4], among many others. A GAN learns a target distribution by engaging two deep neural networks, the generator G and the discriminator D, in a minimax game. The generator G aims to generate samples that resemble real samples from the target distribution, while the discriminator D aims to distinguish the generated samples from the real samples. The model is then trained with simultaneous SGD until a Nash equilibrium is reached.
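To make the training procedure concrete, the following is a minimal sketch of one alternating SGD step with the original (non-saturating) GAN loss; the generator G, the discriminator D, their optimizers, and the assumption that D returns one logit per sample are our own illustrative choices and are not specified in this paper.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, z_dim=128):
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    # Discriminator step: push D(real) towards "real" and D(G(z)) towards "fake".
    fake = G(torch.randn(n, z_dim)).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: fool the discriminator (non-saturating loss).
    g_loss = F.binary_cross_entropy_with_logits(D(G(torch.randn(n, z_dim))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```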
Due to the minimax optimization and simultaneous SGD, GAN training is known to suffer from instabilities. To mitigate the issue, one string of works focuses on the choice of GAN objective function. Notably, in Wasserstein GAN [5], the authors propose to minimize the Wasserstein distance between the model and target distributions instead of the original Jensen-Shannon divergence. In LSGAN [6], the authors consider a least-squares loss, which corresponds to minimizing the Pearson $\chi^2$ divergence between the distributions. In f-GAN [7], the authors show that any f-divergence can be used as the GAN objective. Another line of works focuses on regularization and normalization techniques, especially the Lipschitz continuity of the discriminator and the conditioning of the generator [8]. Prominent examples include the gradient penalty [9], which penalizes the model when the gradient norm moves away from 1, and spectral normalization [10], which normalizes the largest singular value of each layer's weight matrix using the power iteration method.
Different from existing approaches, we investigate the direction of automating the design of neural architectures to stabilize GAN training and improve performance. There is empirical evidence [9, 10] suggesting that the generator and discriminator architectures have an impact on the stability of GAN training, and hence on the quality and diversity of the images a GAN generates. Despite this early evidence, we observe that DCGAN-style [11] and ResNet [9] architectures are by far the most prevalent in the GAN literature. Such architectures are built upon modules that have been highly successful primarily in discriminative tasks, and their optimality for constructing generative models is questionable.
Neural architecture search (NAS) has emerged as a promising research direction in recent years. On benchmark data sets including Penn Treebank, CIFAR-10 and ImageNet, NAS algorithms have proven capable of designing architectures that rival or even outperform the best human-invented architectures [12, 13, 14]. The direct application of NAS to GAN architecture design is, however, non-trivial, for at least two reasons. First, the generator of a GAN consists of upsampling modules, which are almost never used in image classification; typical image classification networks only use downsampling modules, so we cannot borrow directly from well-studied NAS practice. Second, GAN architectures have been much less explored. Compared with the traditional NAS application of image classification, we therefore aim to search through a large variety of topological structures with less human prior knowledge imposed. To the best of our knowledge, we are the first to perform automated architecture design of deep generative models.

To automate the architecture design, we use reinforcement learning. In our algorithm, an RNN controller encodes the architectures of the upsampling, downsampling, and normal modules in GAN. We carefully craft the search space and propose a new form of reward-shaping function so that the algorithm is guided faster towards promising architectures. We have performed a comprehensive experimental study to evaluate the novelty of the identified GAN architectures, their performance, and their transferability.
In sum, our main contributions are as follows.

- We present AGAN, the first automated neural architecture search algorithm specifically designed for optimizing neural network architectures in deep generative models.
- We identify three novel, modularized architectures, AGAN-A, AGAN-B, and AGAN-C, with distinct topologies.
- In a comprehensive experimental study, we find that AGAN-A, AGAN-B, and AGAN-C perform comparably to the best GAN models designed by human experts. In addition, AGAN-C outperforms the state-of-the-art models under the same regularization techniques for unsupervised image generation tasks on CIFAR-10.
- We empirically evaluate and confirm that the modules learned by AGAN are transferable to other data sets such as STL-10.
The rest of the paper is organized as follows. In Section 2, we present an overview of GAN and NAS. We describe our methodology in Section 3 and present our experimental evaluation of AGAN in Section 4. In Section 5 we briefly discuss the differences we observed between our search and traditional NAS searches and conclude the paper.
2 Related Work
Equipped with multilayer perceptrons as the generator and discriminator, the original GAN [1] can successfully learn the data distribution of MNIST, but fails at more complicated image generation tasks. In DCGAN [11], the authors propose a novel class of CNNs as generator and discriminator, together with a set of architecture guidelines for stable convolutional GAN training. Most notably, in the generator, the spatial activation size is doubled every layer while the number of output channels is halved; the discriminator largely mirrors the generator in reverse. Gulrajani et al. [9] propose a ResNet architecture for GAN on CIFAR-10. In particular, the residual blocks in the generator perform nearest-neighbor upsampling before the second convolution, while some blocks in the discriminator perform average pooling after the second convolution. Many later GAN models are built upon the DCGAN-style or ResNet architecture, such as SN-GAN [10], SAGAN [15] and BigGAN [16]. In SAGAN, the authors propose a self-attention layer that models the non-local dependency between high-resolution and low-resolution feature maps. They also make a minor modification to the discriminator by altering the number of hidden-layer output channels in residual blocks. In BigGAN, the authors introduce further architectural changes, including a shared class embedding and skip connections from the latent variable z.

Concerning the architecture design of GAN, a particular line of work focuses on how to make use of label information to improve the performance of GAN. In the classic conditional GAN [17] framework, label information is concatenated to the input or hidden representations to model a conditional distribution. Miyato et al. [18] propose a projection-based way to incorporate label information into the discriminator. De Vries et al. [19] introduce Conditional Batch Normalization for visual question answering tasks, which learns a scale and a shift for each class label. Conditional Batch Norm is widely used in image generation models [10, 15, 16] to provide label information for the generator of GAN.

A neural architecture search algorithm requires an objective, quantitative metric to measure the performance of the underlying models. In the case of GAN, well-studied methods such as kernel density estimation (KDE, or Parzen window estimation) have been questioned as suitable indicators of the visual fidelity of generated images [20]. Inception Score (IS) [21] and Fréchet Inception Distance (FID) [40] are arguably the most popular evaluation metrics in the literature. IS uses a pretrained Google Inception model [22] to classify generated samples. It is defined as
$$\mathrm{IS}(p_g) = \exp\Big(\mathbb{E}_{x \sim p_g}\big[ D_{\mathrm{KL}}\big(p(y \mid x) \,\|\, p(y)\big) \big]\Big),$$
where $p_g$ is the generated distribution, $p(y \mid x)$ is the conditional label distribution through the Inception model, and $p(y)$ is the marginal of $p(y \mid x)$ over $p_g$. Similarly, FID uses the Google Inception model as a feature extractor and computes the distance between the real distribution $p_r$ and $p_g$ as
$$\mathrm{FID}(p_r, p_g) = \|\mu_r - \mu_g\|_2^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big),$$
where $\mu_r$, $\mu_g$, $\Sigma_r$, $\Sigma_g$ are the means and covariances of the real and generated distributions of the extracted features.
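Both metrics can be computed directly from Inception outputs. Below is a small sketch (our own helper names, NumPy/SciPy), where p_yx holds the class probabilities p(y|x) of generated samples and feats_r, feats_g hold the Inception features of real and generated samples.

```python
import numpy as np
from scipy import linalg

def inception_score(p_yx, eps=1e-12):
    # exp of the mean KL divergence between p(y|x) and the marginal p(y)
    p_y = p_yx.mean(axis=0, keepdims=True)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

def fid(feats_r, feats_g):
    # squared mean difference plus a trace term over the feature covariances
    mu_r, mu_g = feats_r.mean(axis=0), feats_g.mean(axis=0)
    cov_r = np.cov(feats_r, rowvar=False)
    cov_g = np.cov(feats_g, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g).real
    return float(((mu_r - mu_g) ** 2).sum() + np.trace(cov_r + cov_g - 2 * covmean))
```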
NAS became a mainstream research topic after Zoph and Le [12] found a state-of-the-art recurrent cell on Penn Treebank and a highly competitive architecture on CIFAR-10 using Reinforcement Learning (RL). Various RL methods have been successfully applied to NAS, including vanilla policy gradient [23, 24], Proximal Policy Optimization (PPO) [13, 25] and Q-learning [26, 27]. An alternative approach is to use evolutionary algorithms [28, 14, 29], maintaining and evolving a large population of neural architectures. In contrast to the aforementioned gradient-free optimization methods, Liu et al. [30] propose a gradient-based search strategy built on a continuous relaxation of the architecture representation. Other gradient-based approaches include Neural Architecture Optimization (NAO) [31] and ProxylessNAS [32].
Inspired by the Google Inception model, Zoph et al. [13] and Zhong et al. [27] propose a search space based on two types of convolutional cells, named the normal cell and the reduction cell. This design leads to a simplified yet high-quality search space and enables the transferability of the architectures found by NAS. It is widely adopted by many later works [33, 25, 14, 30, 31]. Our work also falls into the category of searching for cell topology, but differs in the following ways:

- In all previous RL-based NAS algorithms, the convolutional cell solely consists of unary and binary operations, except for the final concatenation. In other words, the candidate cell topology can only be a DAG with in-degree no greater than 2. Our architecture representation allows searching through cells with arbitrary topology.
- Previous works search for discriminative models composed of normal and downsampling modules, whereas we search for generative models in which upsampling modules play a significant role.
3 Method
Our work makes use of the Neural Architecture Search with Reinforcement Learning framework proposed in [12]. A controller recurrent neural network (RNN) samples architectures of the generator and discriminator of GAN simultaneously. The sampled architectures are then sent to computation nodes for training and evaluation with the Inception Score. The resulting performance is used as feedback to update the controller RNN parameters using the REINFORCE rule [34]. Below we provide a detailed description of the three critical components of our design: (i) the controller architecture, (ii) the set of operations used to construct a GAN (i.e., the search space), and (iii) training the controller with reinforcement learning.

3.1 Controller architecture
The controller is a two-layer LSTM consisting of three segments (Figure 1), programming the upsampling module in the generator and the downsampling and normal modules in the discriminator, respectively. In each segment, the controller iteratively outputs a candidate operation for the module and an adjacency vector indicating which tensors will be fed into the incoming operation; the output, either an operation or an adjacency vector, is then fed into the next step as input.
All operations are sampled through a softmax classifier with sample temperature $T$ and logit clipping constant $C$ [35]:
$$P(o_t) = \mathrm{softmax}\big(C \cdot \tanh(W_{o}\, h_t / T)\big),$$
where $o_t$ is the output operation and $h_t$ is the output of the last hidden layer at the current time step. $o_t$ is fed into the controller RNN through an embedding layer; the embedding parameters are only shared within the same segment.
The adjacency vector is sampled from an element-wise independent multivariate Bernoulli distribution:
$$v_t \sim \mathrm{Bernoulli}\big(\sigma(W_{v}\, h_t)\big),$$
where $v_t$ is the adjacency vector and $\sigma$ is the sigmoid activation. $v_t$ is fed into the controller RNN through a linear projection layer whose parameters are similarly shared within the same segment.
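The sketch below illustrates how one controller segment could interleave the two sampling steps above. It is a simplification under our own assumptions: the hidden size, the number of candidate operations, the maximum adjacency length, and the temperature and clipping constants are placeholder values, and the exact ordering of actions within a segment is abstracted.

```python
import torch
import torch.nn as nn

NUM_OPS, HIDDEN, MAX_PREV = 10, 100, 6    # placeholder sizes, not from the paper
T, C = 5.0, 2.5                           # assumed temperature and logit clipping constant

class ControllerSegment(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTMCell(HIDDEN, HIDDEN)
        self.op_embed = nn.Embedding(NUM_OPS, HIDDEN)   # embedding shared within the segment
        self.adj_proj = nn.Linear(MAX_PREV, HIDDEN)     # projects sampled adjacency vectors back in
        self.op_head = nn.Linear(HIDDEN, NUM_OPS)
        self.adj_head = nn.Linear(HIDDEN, MAX_PREV)

    def sample(self, num_ops):
        h = torch.zeros(1, HIDDEN); c = torch.zeros(1, HIDDEN)
        x = torch.zeros(1, HIDDEN)                      # start token (learnable in practice)
        ops, adjs, log_probs = [], [], []
        for _ in range(num_ops):
            # sample an adjacency vector from an element-wise Bernoulli(sigmoid(logits))
            h, c = self.lstm(x, (h, c))
            p = torch.sigmoid(self.adj_head(h))
            adj = torch.bernoulli(p)
            log_probs.append((adj * p.log() + (1 - adj) * (1 - p).log()).sum())
            x = self.adj_proj(adj)
            # sample an operation via softmax with temperature T and tanh logit clipping C [35]
            h, c = self.lstm(x, (h, c))
            logits = C * torch.tanh(self.op_head(h) / T)
            dist = torch.distributions.Categorical(logits=logits)
            op = dist.sample()
            log_probs.append(dist.log_prob(op).sum())
            x = self.op_embed(op)
            ops.append(int(op)); adjs.append(adj.squeeze(0))
        return ops, adjs, torch.stack(log_probs).sum()
```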
3.2 The Search Space
The outputs of each segment in the controller RNN are used to program a module in the child model. At each time step, we select tensors according to the sampled adjacency vector and feed their sum into the next operation.
More precisely, each module takes the outputs of the previous two modules, $h_{i-1}$ and $h_{i-2}$, as inputs. The output sequence of a controller RNN segment always starts and ends with an operation. The module is constructed as follows:
1. Apply the first operation to a module input to form the skip connection, and initialize the set of intermediate tensors $\mathcal{T}$ with the module inputs and the resulting tensor.
2. For each subsequent operation, select tensors from $\mathcal{T}$ according to its adjacency vector; if no tensor is selected, the input of the module is selected.
3. Apply the operation to the sum of the selected tensors and add the resulting tensor to $\mathcal{T}$.
4. Repeat Step 2 and Step 3 for the remaining operations.
5. Concatenate the tensors in $\mathcal{T}$ that have never served as an input to form the final output.
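The sketch below illustrates how a module could be assembled from the sampled operations and adjacency vectors following Steps 1-5; the choice of which input receives the first (skip) operation and the channel bookkeeping are our own assumptions.

```python
import torch

def run_module(prev, prev_prev, ops, adjs):
    """Assemble a module from sampled operations and adjacency vectors.

    ops  -- callables [op_1, ..., op_K] chosen by the controller
    adjs -- 0/1 vectors, adjs[k] selecting among tensors produced so far
    """
    # Step 1: apply the first operation to a module input to form the skip path
    # (which input receives it is an assumption on our part).
    tensors = [prev_prev, prev, ops[0](prev)]
    used = [False, False, False]
    for op, adj in zip(ops[1:], adjs):
        # Step 2: select tensors according to the adjacency vector; if nothing
        # is selected, fall back to the module input.
        selected = [t for t, bit in zip(tensors, adj) if bit]
        for i, bit in enumerate(adj[: len(tensors)]):
            used[i] = used[i] or bool(bit)
        if not selected:
            selected = [prev]
        # Step 3: apply the operation to the sum of the selected tensors.
        tensors.append(op(sum(selected)))
        used.append(False)
    # Step 5: concatenate tensors that never fed another operation (channel dim).
    leaves = [t for t, u in zip(tensors, used) if not u]
    return torch.cat(leaves, dim=1)
```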
In addition, we adopt the following heuristics to ensure the computation graph is well-defined:

- The first operation is interpreted as an upsampling (downsampling) operation if the previous module is an upsampling (downsampling) module.
- For upsampling modules, the operations applied to the module inputs are interpreted as upsampling operations.
- For downsampling modules, the operations applied right before the concatenation are interpreted as downsampling operations.
- Convolutions are applied to the final output to keep the number of channels constant.
The meta-architectures of the generator and the discriminator are manually determined as follows. Starting with a linear layer, the generator consists of a stack of upsampling modules, followed by a final convolution and output activation. The discriminator starts with a convolution, followed by downsampling modules, normal modules, a global sum pooling layer and a linear layer. For the conditional version of the model, the discriminator logits are augmented with a projection layer as in [18].
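As an illustration of the fixed generator meta-architecture, the following sketch stacks a list of searched upsampling modules between the initial linear layer and the output convolution; the channel width, initial spatial size, and output activation are placeholders of our choosing, not the paper's settings.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Meta-architecture sketch: linear layer -> searched upsampling modules ->
    final convolution and output activation (all sizes are illustrative)."""
    def __init__(self, up_modules, z_dim=128, ch=256, bottom=4):
        super().__init__()
        self.ch, self.bottom = ch, bottom
        self.fc = nn.Linear(z_dim, ch * bottom * bottom)   # initial linear layer
        self.ups = nn.ModuleList(up_modules)               # searched upsampling modules
        self.out = nn.Sequential(nn.BatchNorm2d(ch), nn.ReLU(),
                                 nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh())

    def forward(self, z):
        h = self.fc(z).view(z.size(0), self.ch, self.bottom, self.bottom)
        for m in self.ups:
            h = m(h)
        return self.out(h)
```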
We use the hinge loss [36] as the objective function:
$$L_D = \mathbb{E}_{x \sim q_{\mathrm{data}}}\big[\max(0,\, 1 - D(x))\big] + \mathbb{E}_{z \sim p(z)}\big[\max(0,\, 1 + D(G(z)))\big],$$
$$L_G = -\,\mathbb{E}_{z \sim p(z)}\big[D(G(z))\big],$$
where $q_{\mathrm{data}}$ is the data distribution and $p(z)$ is the prior over the latent variable $z$.
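A minimal sketch of the hinge loss, assuming D returns raw logits and G(z) returns fake samples:

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real_logits, d_fake_logits):
    # E[max(0, 1 - D(x))] + E[max(0, 1 + D(G(z)))]
    return F.relu(1.0 - d_real_logits).mean() + F.relu(1.0 + d_fake_logits).mean()

def g_hinge_loss(d_fake_logits):
    # -E[D(G(z))]
    return -d_fake_logits.mean()
```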
To cover a large variety of candidate architectures, we collect the following set of operations as our normal operations:

- identity
- convolution
- convolution
- dilated convolution
- depthwise-separable convolution
- depthwise-separable convolution
- depthwise-separable convolution
- then convolution
- then convolution
- then convolution
- max pooling
- max pooling
- average pooling
- average pooling
- average pooling
For upsampling modules, based on state-of-the-art GAN architectures we consider two different types of upsampling operations: 1) transposed convolution, or 2) nearest-neighbor interpolation followed by any convolution in the list above. For downsampling modules, motivated by the optimized residual blocks in [9], we include two types of atomic operations: 1) convolution followed by strided average pooling, and 2) strided average pooling followed by convolution.

We use the BN-ReLU-Conv ordering for all convolutional operations in G, and ReLU-Conv for all convolutional operations in D. There is no Batch Normalization or ReLU between the two halves of the factorized (one kernel then the other) convolutions.
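The following sketch shows how the convolution wrappers and the atomic up/downsampling operations described above could be implemented; the kernel sizes, stride, and scale factors are illustrative choices on our part.

```python
import torch.nn as nn

def g_conv(in_ch, out_ch, k):
    # BN -> ReLU -> Conv ordering for every convolution in the generator
    return nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(),
                         nn.Conv2d(in_ch, out_ch, k, padding=k // 2))

def d_conv(in_ch, out_ch, k):
    # ReLU -> Conv ordering for every convolution in the discriminator
    return nn.Sequential(nn.ReLU(), nn.Conv2d(in_ch, out_ch, k, padding=k // 2))

def up_op(in_ch, out_ch, k):
    # upsample-then-convolution: nearest-neighbor interpolation followed by a conv
    return nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                         g_conv(in_ch, out_ch, k))

def down_op(in_ch, out_ch, k):
    # convolution followed by strided average pooling
    return nn.Sequential(d_conv(in_ch, out_ch, k), nn.AvgPool2d(2))
```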
3.3 Training with Reinforcement Learning

We use the REINFORCE rule [34] to update the controller RNN parameters $\theta$. Let $a_{1:T}$ be the output sequence of the controller, including both operational and connectivity choices. We have the following update rule for $\theta$:
$$\nabla_\theta J(\theta) \approx \sum_{t=1}^{T} \nabla_\theta \log P(a_t \mid a_{1:t-1}; \theta)\,(R - b),$$
where $R$ is the reward for taking the actions $a_{1:T}$ and $b$ is a baseline for variance reduction. In particular, when $a_t$ is an operation or an adjacency vector, $\log P(a_t \mid a_{1:t-1}; \theta)$ can be computed through the softmax or sigmoid cross-entropy, respectively.

We measure the performance of GAN using the Inception Score. More precisely, we propose a reward-shaping function of the Inception Score with two constants that make the reward more sensitive as IS approaches its optimal value. Due to the instability of GAN training, the Inception Score should ideally be averaged over multiple runs of GAN training to ensure a reliable measurement. In practice, however, we found that the proposed NAS algorithm works with a single training run per sampled architecture.
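A minimal sketch of a single REINFORCE update with a moving-average baseline, taking the summed log-probability of all sampled actions (as returned, for example, by the controller sketch in Section 3.1) and a reward shaped from the Inception Score:

```python
import torch

def reinforce_update(optimizer, log_prob_sum, reward, baseline, decay=0.95):
    """One policy-gradient step for the controller.

    log_prob_sum -- summed log-probabilities of all sampled actions (with grad)
    reward       -- scalar reward shaped from the Inception Score
    baseline     -- running baseline used for variance reduction
    """
    baseline = decay * baseline + (1.0 - decay) * reward
    loss = -(reward - baseline) * log_prob_sum     # negated: optimizers minimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return baseline
```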
4 Experiments
4.1 Data Sets
We used two data sets in our experimental study. The CIFAR-10 data set consists of 60,000 32x32 color images in 10 different classes, divided into a training set of 50,000 images and a testing set with the remaining 10,000 images. Only the training set is used in our experiments. The STL-10 data set consists of 96x96 color images. It is composed of images in 10 different classes, with 500 training images and 800 testing images per class, and an additional 100,000 unlabeled images for unsupervised learning.

For data preprocessing, we follow the setup in [10] by rescaling the images and then adding random noise, for both data sets.
4.2 Experimental Procedure
The controller used in our experimental study is a two-layer LSTM consisting of three segments. Each segment outputs a sequence of actions (operations and adjacency vectors), encoding a DAG of nodes. We use sample temperature and logit clipping when sampling operations, and when sampling adjacency vectors. The controller is trained using policy gradient, and we augment the loss with an entropy term to encourage better exploration. The controller is updated whenever a batch of rewards is collected from child models. We train on Titan X GPUs for several days.
When constructing the GAN model, we fix the number of channels in both the generator and the discriminator. We find that using global sum pooling instead of global average pooling in the penultimate layer of the discriminator stabilizes training. We use the Adam optimizer [37]. The discriminator is updated for multiple steps per generator update step. To evaluate an architecture, the model is trained for a fixed number of steps; the Inception Score is then calculated on generated samples divided into groups. For reward shaping, we set one constant to the Inception Score of the real data.
The model with the highest Inception Score (when trained for the reduced number of steps) is generated at an early stage of the search. The controller, however, continues to learn a distribution that samples better-performing models on average. In fact, the best models when trained to full size are generated at the later stages of the architecture search.
4.3 Learning GAN architectures on CIFAR-10
For the task of supervised image generation on CIFAR-10, we take the top candidate models discovered in the architecture search and train them for a larger number of steps. We scale up the models by doubling the number of channels in both the generator and the discriminator. The label information is fed into G via Conditional Batch Normalization (CBN) [19] and into D via projection [18]. We use Spectral Normalization [10] for the discriminator but not the generator. The best architectures are reported in Table 1.
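A minimal sketch of Conditional Batch Normalization, which learns a per-class scale and shift on top of a shared (non-affine) batch normalization; the module and parameter names are ours.

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)   # shared normalization
        self.gamma = nn.Embedding(num_classes, num_features)   # per-class scale
        self.beta = nn.Embedding(num_classes, num_features)    # per-class shift
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, y):
        out = self.bn(x)
        g = self.gamma(y).unsqueeze(-1).unsqueeze(-1)           # (N, C, 1, 1)
        b = self.beta(y).unsqueeze(-1).unsqueeze(-1)
        return g * out + b
```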
Method  Inception Score  FID
Real data
DCGAN-style:
SteinGAN
DCGAN  [38]
Salimans et al.  [21]
AC-GAN
SGAN
ResNet:
WGAN-GP
SN-GAN  [18]  [18]
BigGAN
Ours:
AGAN-A
AGAN-B
AGAN-C
AGAN-A and AGAN-B outperform all DCGAN-style architectures. The best architecture we found, AGAN-B, also outperforms all ResNet architectures with the same input resolution or fewer parameters. In particular, the BigGAN [16] architecture operates at a higher input resolution and has substantially more parameters; the architectures we propose have far fewer parameters in comparison.
We also train models with the same topology for unsupervised image generation tasks. We drop the projection layer in D and use Batch Normalization in place of CBN in G. We find that scaling up does not guarantee a performance gain in this setting. All of the proposed architectures outperform DCGAN-style architectures, and AGAN-C outperforms all ResNet architectures in terms of Inception Score.
Method  Inception Score  FID
Real data
DCGAN-style:
BEGAN
DCGAN  [39]  [40]
MMD GAN  [41]
WGAN-GP  [10]  24.8 [40]
Salimans et al.
LR-GAN
SN-GAN
DFM
Coulomb GANs  27.3
ResNet:
WGAN-GP
SN-GAN
Ours:
AGAN-A
AGAN-B
AGAN-C
In Figure 6 we decipher the architecture of the learned model AGAN-A. Note that the topologies of all three modules (upsampling, downsampling, and normal) are quite different from the modules used in existing models. The architecture is a hybrid between Inception and ResNet: each cell, as deciphered here, contains multiple branches, and cells are stacked together in a way resembling ResNet, as shown previously in Figure 3. We believe this is the first time Inception-ResNet hybrid architectures have been used for GAN. The cells we find are also quite different from the Inception cells typically used in discriminative models, which provides evidence supporting our original hypothesis that optimal GAN architectures may differ substantially from those of discriminative models. The architectures of AGAN-B and AGAN-C bear some resemblance to AGAN-A and we omit their diagrams for brevity.
Note that the upsample (downsample) operations following prev will only be applied when the module is preceded by an upsampling (downsampling) module.
4.4 Transferability of AGAN
One potential advantage of a modularized search space is that it enables the transferability of the learned architectures: modules found on smaller data sets can be used as building blocks to construct networks for larger data sets, where direct neural architecture search may be infeasible or unfavorable. In this experiment, we empirically evaluated the transferability of two of our learned module sets, namely AGAN-A and AGAN-C.
Our STL-10 network has the same meta-architecture as the one for CIFAR-10, with the distinction that the first upsampling module in G takes a larger input size. We resize the STL-10 images to a lower resolution. As shown in Table 3, despite their topologies not being optimized for STL-10, AGAN-A and AGAN-C achieve highly competitive performance, outperforming all DCGAN-style architectures. The experiment provides evidence suggesting that the architectures we identified may be applicable to a wide range of data sets.
Method  Inception Score  FID
Real data
DCGAN-style:
DCGAN  [42]
DFM
SN-GAN
ResNet:
WGAN-GP  [43]  [10]
SN-GAN
Splitting GAN
CAGAN
Ours:
AGAN-A
AGAN-B
AGAN-C
Learned operations: (a) upsampling module: convolution, and the factorized then convolution; (b) downsampling module: convolution, and dilated convolution; (c) normal module: average poolings.
5 Discussion & Conclusion
As illustrated in Figure 7, in our search of GAN architectures the controller RNN learns drastically different distributions over operations for the three module types. The upsampling modules predominantly favor plain convolutions and the factorized (one kernel then the other) convolutions; the downsampling modules favor plain convolutions and dilated convolutions; the normal modules favor average poolings. This justifies our choice of segmenting the controller RNN. We point out that this is in contrast to RL-based NAS algorithms for image classifiers [25, 27, 13], where both the normal cell and the reduction cell choose mostly among depthwise-separable convolutions and max and average poolings.
In addition, we observe that:

- the upsampling modules prefer upsample-then-convolution operations over transposed convolutions;
- the downsampling modules prefer convolution-then-downsample operations over downsample-then-convolution operations;
- the normal modules mostly consist of average poolings, and hence have very few parameters;
- depthwise-separable convolutions are absent from most networks at the later stages of the search.
In our experiments we observe that the order of operations within the same module matters considerably. For example, in downsampling modules, whether we perform downsampling at the beginning or at the end of the module can have a significant impact on the overall performance, though the exact mechanism of this effect is not clear.
In conclusion, we present AGAN, the first neural architecture search algorithm for deep generative models. We demonstrate that, with careful design of the controller architecture and the search space, an RL-based NAS algorithm can discover highly competitive architectures that rival the best human-invented GAN architectures. Further reducing model size and enabling fast inference are on our future research agenda.
References
 [1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
 [2] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew P. Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. Photo-realistic single image super-resolution using a generative adversarial network. CoRR, abs/1609.04802, 2016.
 [3] Scott E. Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text to image synthesis. CoRR, abs/1605.05396, 2016.
 [4] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017.
 [5] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 214–223, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
 [6] Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, and Zhen Wang. Multi-class generative adversarial networks with the L2 loss function. CoRR, abs/1611.04076, 2016.
 [7] Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 271–279. Curran Associates, Inc., 2016.
 [8] Augustus Odena, Jacob Buckman, Catherine Olsson, Tom Brown, Christopher Olah, Colin Raffel, and Ian Goodfellow. Is generator conditioning causally related to GAN performance? In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3849–3858, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
 [9] Ishaan Gulrajani, Faruk Ahmed, Martín Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of wasserstein gans. CoRR, abs/1704.00028, 2017.
 [10] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. CoRR, abs/1802.05957, 2018.
 [11] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.
 [12] Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. CoRR, abs/1611.01578, 2016.
 [13] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. CoRR, abs/1707.07012, 2017.
 [14] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. CoRR, abs/1802.01548, 2018.
 [15] Han Zhang, Ian J. Goodfellow, Dimitris N. Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv:1805.08318, 2018.
 [16] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. CoRR, abs/1809.11096, 2018.
 [17] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014.
 [18] Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. CoRR, abs/1802.05637, 2018.
 [19] Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, and Aaron C. Courville. Modulating early visual processing by language. CoRR, abs/1707.00683, 2017.
 [20] Lucas Theis, Aäron van den Oord, and Matthias Bethge. A note on the evaluation of generative models. CoRR, abs/1511.01844, 2016.
 [21] Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. CoRR, abs/1606.03498, 2016.
 [22] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
 [23] Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In AAAI, 2018.
 [24] Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-level network transformation for efficient architecture search. arXiv preprint arXiv:1806.02639, 2018.
 [25] Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. CoRR, abs/1802.03268, 2018.
 [26] Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. CoRR, abs/1611.02167, 2016.
 [27] Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. BlockQNN: Efficient block-wise neural network architecture generation. CoRR, abs/1808.05584, 2018.
 [28] Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Quoc V. Le, and Alex Kurakin. Large-scale evolution of image classifiers. CoRR, abs/1703.01041, 2017.
 [29] Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. Hierarchical representations for efficient architecture search. CoRR, abs/1711.00436, 2017.
 [30] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: differentiable architecture search. CoRR, abs/1806.09055, 2018.
 [31] Renqian Luo, Fei Tian, Tao Qin, and TieYan Liu. Neural architecture optimization. CoRR, abs/1808.07233, 2018.
 [32] Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware. CoRR, abs/1812.00332, 2018.
 [33] Chenxi Liu, Barret Zoph, Jonathon Shlens, Wei Hua, LiJia Li, Li FeiFei, Alan L. Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. CoRR, abs/1712.00559, 2017.
 [34] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Machine Learning, pages 229–256, 1992.
 [35] Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. CoRR, abs/1611.09940, 2016.
 [36] Jae Hyun Lim and Jong Chul Ye. Geometric GAN. arXiv preprint arXiv:1705.02894, 2017.
 [37] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
 [38] Yihao Feng, Dilin Wang, and Qiang Liu. Learning to draw samples with amortized Stein variational gradient descent. In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11–15, 2017, 2017.
 [39] Xun Huang, Yixuan Li, Omid Poursaeed, John E. Hopcroft, and Serge J. Belongie. Stacked generative adversarial networks. CoRR, abs/1612.04357, 2016.
 [40] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. Gans trained by a two timescale update rule converge to a nash equilibrium. CoRR, abs/1706.08500, 2017.
 [41] Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, Martin Heusel, Hubert Ramsauer, and Sepp Hochreiter. Coulomb gans: Provably optimal nash equilibria via potential fields. CoRR, abs/1708.08819, 2017.
 [42] David Warde-Farley and Yoshua Bengio. Improving generative adversarial networks with denoising feature matching. In ICLR, 2017.
 [43] Guillermo L. Grinblat, Lucas C. Uzal, and Pablo M. Granitto. Class-splitting generative adversarial networks. CoRR, abs/1709.07359, 2017.