1 Introduction
Generative Adversarial Networks (GANs) [13] have become the most popular approach to training generative models. They have attracted much attention from the community because of their ability to generate visually appealing samples without requiring an explicit analytic form of the objective function. The idea behind GANs is to use a binary classifier, the so-called discriminator. The discriminator learns to distinguish data (real) samples from generated (fake) samples, and as a result it represents the data manifold via its scalar scores in the form of a likelihood. Training the generator of a GAN amounts to maximizing the discriminator's likelihood scores on fake samples; in other words, the generator tries to confuse the discriminator into accepting its outputs as real. GAN training is an adversarial process in which the discriminator and generator compete with each other to improve themselves. Although GANs are an attractive approach, training them with only the real/fake label is challenging because this supervisory signal is a weak constraint. The generator can easily cheat the discriminator by, e.g., always creating identical samples that are nonetheless assigned high likelihood by the discriminator. This explains why GANs suffer from serious issues, such as gradient vanishing and mode collapse [12, 1], which prevent the model from covering all modes of the data distribution. Many GAN variants have proposed new constraints to overcome this ill-posed problem.
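The adversarial game described above can be sketched numerically. Below is a minimal, hypothetical NumPy illustration of the original GAN losses of [13]; the function names and toy scores are ours, not the paper's:

```python
import numpy as np

def bce_discriminator_loss(d_real, d_fake):
    # The discriminator maximizes E[log D(x)] + E[log(1 - D(G(z)))];
    # we return the negated sum so both players minimize a loss.
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def bce_generator_loss(d_fake):
    # Non-saturating generator loss: maximize E[log D(G(z))],
    # i.e. confuse the discriminator into scoring fakes as real.
    return -np.mean(np.log(d_fake))

# Toy likelihood scores in (0, 1): D is confident on this batch.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.1, 0.2])
print(bce_discriminator_loss(d_real, d_fake))
print(bce_generator_loss(d_fake))
```

When the discriminator separates the two batches well, its loss is small while the generator's loss is large, which is exactly the competition the text describes.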
In the literature, many constraints have been proposed for the discriminator. These constraints force the discriminator's gradients not to vanish, so that the generator can use them to learn and improve itself. Intuitively, these constraints smooth out the discriminator's decision boundary between real and fake samples in order to avoid sharp gradients along this region and to enable distant samples to contribute more to generator training. Among the most noticeable regularization techniques are those enforcing Lipschitz conditions [2, 14, 29, 17, 27, 19, 23]. However, these techniques have their own disadvantages, for example the divergence issue [33] when the regularization becomes too strong towards the end of training. Overcoming this requires careful design of the training procedure [32, 15].
An alternative, also commonly used, class of constraints is based on an autoencoder. The autoencoder reconstructs the real samples and hence guides the generator to produce samples resembling the real modes. This increases the chance of competition between the discriminator and the generator on many modes of the data distribution, and therefore potentially encourages the discriminator to create better gradients that lead to a better generator. However, the downside of the autoencoder is blurriness. Although some recent works [18, 31] overcame this problem by using the high-level features of discriminators, the texture and shape of objects in the generated images still do not look realistic.
A recent GAN [6] proposed new constraints via a self-supervised learning strategy [11]. The authors augment new samples via image rotation and assign them pseudo-labels. In addition to training the discriminator to distinguish real and fake samples, they train it to predict the correct labels of rotated images. They train the generator to minimize the classification loss as the discriminator recognizes the transformations of the generator's outputs; in other words, the generator is trained to create images whose transformed versions' correct pseudo-labels are easily recognized by the discriminator. Although the results are encouraging, the discriminator does not take the generated samples into account for the classification task, and it is not clear how the self-supervised tasks help to improve GAN in this work. In fact, the proposed generator objective of [6] minimizes a cross-entropy loss, which does not necessarily help to create samples resembling real samples. For example, as in the original GAN, the generator may create collapsed samples that are recognized as real by the discriminator with high probability, while their rotated versions are still classified correctly according to their pseudo-label ground truth.
In this work, we propose an improved self-supervised GAN, which introduces an adversarial way of using self-supervised learning. In particular, we first propose to train the discriminator to classify the correct pseudo-labels of real transformed samples (obtained from data samples via geometric transformations) as the classification task. This classification task improves the GAN model when combined with the original GAN task [13] of distinguishing data (real) versus generated (fake) samples. We then propose two further improvements: (i) we train the discriminator to simultaneously distinguish the class of generated samples from the pseudo-classes of real samples. We consider this adversarial training for the discriminator; it significantly improves the discriminator, and hence the generator and the overall model performance. (ii) In addition to confusing the GAN task, we propose a new generator objective that fools the classification task of the discriminator by creating samples whose transformed versions the discriminator recognizes as real pseudo-classes. Importantly, instead of minimizing the cross-entropy of transformed fake samples as in previous work, we match the cross-entropy loss computed over fake transformed samples to that of the real transformed ones. We find that this stabilizes training and boosts performance significantly when combined with the adversarial training of the discriminator. We investigate our proposed techniques within the state-of-the-art AE-based GAN model [31]. Although [31] demonstrated that combining autoencoder and gradient penalty constraints improves GAN training and achieves state-of-the-art performance, integrating our techniques further boosts the performance of this baseline model. We find that leveraging all of these constraints appropriately stabilizes GAN training and establishes new state-of-the-art performance on the CIFAR-10 and STL-10 datasets.
2 Related Work
While training GANs with conditional signals (e.g., class labels) [25, 33, 4] attains promising results, training GANs in the unconditional setting is still challenging. In the original GAN [13], a single signal (real or fake) per sample is provided to train the discriminator, and the discriminator is used to guide the generator. With this signal alone, the generator or discriminator may fall into ill-posed settings, easily getting stuck at bad local minima while still satisfying the signal constraints. Therefore, many regularizations have been proposed to reduce this problem, the most popular being to enforce (or approximate) a Lipschitz condition on the discriminator via weight clipping [1], gradient penalty constraints [14, 29, 17, 27, 19], consensus constraints [22, 21], or spectral normalization [23]. Constraining the discriminator in such ways prevents its gradients from vanishing and avoids a sharp decision boundary between the real and fake classes. Otherwise, because data points are very sparse in a high-dimensional manifold, a discriminator without strong constraints can always find a perfect decision boundary between real and generated data points if it is powerful enough. This is likely the main cause of the gradient vanishing issues of GANs.
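As a concrete illustration of the gradient-penalty idea from [14] mentioned above, here is a hedged NumPy sketch using a toy linear critic (our own simplification; real implementations differentiate through the full network):

```python
import numpy as np

def gradient_penalty(w, x_real, x_fake, eps):
    # WGAN-GP-style penalty [14]: sample points on straight lines
    # between real and fake data and push the critic's input-gradient
    # norm toward 1. For illustration we use a linear critic
    # D(x) = w . x, whose input gradient is w everywhere, so the
    # penalty can be computed in closed form.
    x_hat = eps * x_real + (1.0 - eps) * x_fake  # interpolated samples
    grad = np.broadcast_to(w, x_hat.shape)       # dD/dx for a linear critic
    grad_norm = np.linalg.norm(grad, axis=1)
    return np.mean((grad_norm - 1.0) ** 2)
```

For this linear critic, the penalty vanishes exactly when the weight vector has unit norm, i.e. when the critic is 1-Lipschitz.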
Although regularizations improve the stability of GANs, using a single supervisory signal as in the original GAN [13] still leads to challenging optimization problems, because the discriminator scores are highly dependent on the generated samples. If the generator collapses to some particular modes of the data distribution, it can only create samples around these modes; subsequently, there is no competition to train the discriminator around the other modes. As a result, the gradients around these modes may vanish, and it becomes impossible to guide the generator to model the entire data distribution. Using more supervisory signals simplifies the optimization process, for example through self-supervised learning in the form of an autoencoder. AAE [20] guides the generator towards creating more realistic samples; it is a potential solution that partly prevents the generator from generating identical samples. It steers the generated samples towards the real samples to reduce the disjointness between the two distributions, and is therefore less prone to overfitting and gradient vanishing. However, the problem with using an autoencoder is that pixel-wise reconstruction with a simple norm causes blurriness. VAE/GAN [18], which combined VAE [16] and GAN, suggests a better solution: while the discriminator of the GAN enables feature-wise reconstruction to overcome the blur, the VAE constrains the generator better to reduce mode collapse. ALI [10] and BiGAN [9] jointly train data/latent samples in the GAN framework to put more constraints on the discriminator and the generator. InfoGAN [7] infers a disentangled representation of the latent code by maximizing mutual information. In addition to feature-wise reconstruction, [31, 30] combine two different types of supervisory signals, real/fake signals and self-supervised signals in the form of an autoencoder, which leads to stable convergence and better generated images, and prevents mode collapse.
Although the feature-wise distance for autoencoders often reconstructs sharper images, the reconstructed images still cannot reproduce realistic details of textures or shapes.
Recently, self-supervised learning has been getting much attention from the community, as it helps to close the gap between supervised and unsupervised models in classification tasks [8, 26, 34, 35, 24, 11]. This technique encourages classifiers to learn better feature representations with pseudo-labels, and it has also been applied to GANs [6]. However, the usage of the self-supervised task in that work simply follows the idea of [11], and it is unclear how the classification task helps the model. Moreover, although using self-supervised learning to train the discriminator is simple, making self-supervised learning effective for the generator is not trivial.
3 Proposed Method
In our work, we adopt an autoencoder-based method, Dist-GAN [31], as our baseline model, because it has already demonstrated that the combination of gradient penalty and autoencoder constraints achieves state-of-the-art GAN results. We discuss adversarial self-supervised learning (in short, training self-supervised learning within an adversarial process) for the discriminator and the generator, and how to integrate it into the baseline model. Our model consists of three main components: we use the regularized autoencoder (consisting of the encoder E and decoder G) as in [31], and we propose new objectives for the discriminator D and the generator G to improve the model. The decoder and the generator share all parameters. In our model, we first train the autoencoder; we then train the discriminator to distinguish real and fake samples (the GAN task) and to predict the correct augmentation labels (the classification task); finally, we train the generator to match real and fake scores in combination with matching the cross-entropy losses computed over the transformations of these samples. Our components and the training algorithm are presented in Fig. 1 and Alg. 1. To highlight our main contributions, we first discuss our proposed discriminator and generator objectives and then recap the regularized autoencoder.
3.1 Discriminator Objective
Our discriminator objective (Eq. 1) consists of two parts: (i) the GAN objective $\mathcal{L}_D^{gan}$, which trains the discriminator to distinguish between real and fake samples; (ii) the classification objective $\mathcal{L}_D^{c}$, which trains the classifier to predict the correct labels of samples augmented via image transformations. The discriminator and the classifier are the same network (shared parameters), except for two different heads: one last fully-connected layer that returns a single dimension (real or fake) for the discriminator, and another that returns $K+1$ dimensions of pseudo-classes for the classifier. $\lambda_d$ is a constant selected through empirical experiments.
$\mathcal{L}_D = \mathcal{L}_D^{gan} + \lambda_d \, \mathcal{L}_D^{c}$   (1)
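A minimal sketch of how the two terms of Eq. 1 combine, assuming pre-computed scalar loss values (the function name is ours):

```python
def discriminator_objective(loss_gan, loss_cls, lambda_d):
    # Eq. 1 (sketch): the total discriminator loss is the real/fake
    # GAN term plus the self-supervised classification term, weighted
    # by the empirically chosen constant lambda_d.
    return loss_gan + lambda_d * loss_cls
```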
3.1.1 GAN-based Objective
The discriminator part for GAN is written in Eq. 2. It differs from the GAN objective of [13] in that our model considers the reconstructed samples $G(E(x))$ as "real", represented by an additional expectation term over reconstructions, so that the gradients from the discriminator do not saturate too quickly. This constraint slows down the convergence of the discriminator and couples its convergence to that of the autoencoder. It is another regularization technique with a goal similar to those of [3], [23] and [31, 30]. In our method, we use a small weight $\lambda$ for this reconstruction term in the discriminator objective. We observe that this term is important at the beginning of training. However, towards the end, especially for complex image datasets, the reconstructed samples become less useful, as their quality may be lower than that of the real samples. We also observe that after a certain number of training iterations, most models no longer significantly improve image quality when training continues with the same $\lambda$. From this point, we start to decay $\lambda$ over the iterations $t$, where $t$ is counted from that point onward. Here, $\mathbb{E}$ denotes expectation; $\mathbb{E}_{x \sim P_d}$ and $\mathbb{E}_{z \sim P_z}$ may be written as $\mathbb{E}_x$ and $\mathbb{E}_z$ for short, where $P_d$ and $P_z$ are the data distribution and the prior noise distribution. $\lambda_p$ is a constant, and the gradient penalty is computed at interpolated samples $\hat{x} = \epsilon x + (1-\epsilon) G(z)$, where $\epsilon$ is a uniform random number in $[0, 1]$; this penalty enforces sufficient gradients from the discriminator to train the generator. For the hinge loss, the "log" terms are replaced by their hinge counterparts in Eq. 2.
$\mathcal{L}_D^{gan} = -\mathbb{E}_x \log D(x) - \lambda \, \mathbb{E}_x \log D(G(E(x))) - \mathbb{E}_z \log\big(1 - D(G(z))\big) + \lambda_p \, \mathbb{E}_{\hat{x}} \big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2$   (2)
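The reconstruction-as-real term and the $\lambda$ decay can be sketched as follows. This is a hinge-loss sketch under our own assumptions; in particular, the decay schedule shown is hypothetical, since the paper's exact schedule and constants are not reproduced here:

```python
import numpy as np

def d_gan_loss_hinge(d_real, d_fake, d_rec, lam):
    # Hinge sketch of the GAN part: reconstructed samples G(E(x)) are
    # scored as "real" with a small weight lam, which slows the
    # discriminator's convergence and couples it to the autoencoder.
    loss_real = np.mean(np.maximum(0.0, 1.0 - d_real))
    loss_fake = np.mean(np.maximum(0.0, 1.0 + d_fake))
    loss_rec = lam * np.mean(np.maximum(0.0, 1.0 - d_rec))
    return loss_real + loss_fake + loss_rec

def decayed_lam(lam0, t, t0):
    # Hypothetical decay: keep lam0 until iteration t0, then shrink
    # proportionally to t0 / t as the reconstructions lose usefulness.
    return lam0 if t <= t0 else lam0 * t0 / t
```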
3.1.2 Classification-based Objective
The second part of the discriminator objective is for the classification task. We apply self-supervised learning techniques to augment samples with geometric transformations and train the discriminator to predict the correct pseudo-labels of these samples. In particular, we apply geometric transformations $T_k$ to an original input $x$ to create new samples $T_k(x)$, and assign each transformed sample the pseudo-label $k$. We consider these augmented data samples as real transformation classes (the 1st to $K$-th classes), and the generated samples as the fake transformation class (the $(K+1)$-th class). In order to train $D$ as a multi-class classifier, we add another head to $D$ in addition to the conventional real/fake output: a fully-connected (FC) layer with $K+1$ softmax outputs. The discriminator can therefore also be called a classifier in this case. The goal in this section is to train the classifier to predict the geometric transformation applied to the image. We train the classifier to distinguish the real classes and the fake class by minimizing the objective of Eq. 3. Note that we do not rotate the generated samples when training the discriminator, because enforcing the discriminator to recognize the correct classes of transformed fake samples makes the discriminator worse; the generated samples themselves can be very noisy, especially at the beginning of training. In addition, there seems to be some overlap between the GAN task and the classification task, because both learn to classify the fake samples; however, it is important to have both tasks, because each has its own responsibility: the GAN task distinguishes real from fake samples to approximate the distribution, and the classification task learns useful feature representations to improve the first. Indeed, if either of them is removed, the performance gets significantly worse. It is also worth noting that [6] only proposes the first term of our objective (Eq. 3) and does not benefit from the generated samples during training.
$\mathcal{L}_D^{c} = -\mathbb{E}_{x \sim P_d}\,\mathbb{E}_{k} \log C_k(T_k(x)) - \mathbb{E}_{x \sim P_g} \log C_{K+1}(x)$   (3)
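The two cross-entropy terms of Eq. 3 can be sketched with NumPy as follows (a simplified stand-in; the logits, shapes, and function names are our assumptions):

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax over K+1 classes.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def classifier_loss(real_logits, real_labels, fake_logits, num_real_classes):
    # Sketch of Eq. 3: cross-entropy on transformed real images with
    # their rotation pseudo-labels (classes 0..K-1), plus cross-entropy
    # assigning all (untransformed) generated samples to fake class K.
    p_real = softmax(real_logits)
    p_fake = softmax(fake_logits)
    n = len(real_labels)
    ce_real = -np.mean(np.log(p_real[np.arange(n), real_labels]))
    ce_fake = -np.mean(np.log(p_fake[:, num_real_classes]))
    return ce_real + ce_fake
```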
Here, $C_k(T_k(x))$ is the softmax predicted probability of the $k$-th class on a data sample $x$ transformed by $T_k$. Training the classifier to predict the pseudo-labels of the real transformation classes encourages the discriminator to learn useful feature representations of images, and therefore leads to better decisions when distinguishing real from fake samples. In addition, we train the classifier to simultaneously distinguish the fake transformation class, which is a type of adversarial training like in the original GAN [13]. The classifier learning to recognize fake samples against the pseudo-classes of real ones likely creates better gradients to guide the generator. It is adversarial training because there is a competition between the discriminator and the generator on the classification task. This is an important finding of our work, which helps to further improve the baseline model. It is worth noting that when we discuss adversarial training in our work, we mean it for the self-supervised learning (classification) task; adversarial training for the GAN task is the default. In our model, the well-trained discriminator/classifier also provides a good feature-wise distance for the reconstruction task (Section 3.3), which trains a better autoencoder, because we use discriminator features to form the reconstruction objective. Previous experiments [31] showed that on synthetic data, when the reconstruction is nearly perfect, this autoencoder-based model can approximate the data distribution well. We constrain the discriminator by the reconstruction; therefore, higher-quality reconstruction leads to better quality and convergence of the discriminator, and hence to more realistic generated samples.
3.2 Generator Objective
A recent work [6] proposed a way to integrate the self-supervised technique into GAN via image rotations [11]. However, it is unclear how much the discriminator and the generator contribute to these improvements. Moreover, this technique is not always applicable to other GAN methods. For example, using the self-supervised technique of [6] for our generator causes our model to diverge and reduces the quality of generated images (Section 4.1).
$\mathcal{L}_G = \mathcal{L}_G^{gan} + \lambda_g \, \mathcal{L}_G^{c}$   (4)
In this work, we propose a new generator objective (Eq. 4) consisting of two terms. The first term is the GAN task $\mathcal{L}_G^{gan}$, motivated by [31] and shown in Eq. 5. The intuition of this term is that the discriminator can model the data manifold through its scalar values; to approximate the data distribution, we would like to match the two manifolds. However, this is challenging in high dimensions. Therefore, we indirectly align the distribution of discriminator scores on real samples with the distribution of discriminator scores on generated samples.
$\mathcal{L}_G^{gan} = \big| \mathbb{E}_{x \sim P_d} D(x) - \mathbb{E}_{z \sim P_z} D(G(z)) \big|$   (5)
The second term is the classification task, $\mathcal{L}_G^{c}$. In [6], the generator aims to create samples whose transformed versions $T_k(G(z))$ the discriminator can easily assign the correct pseudo-labels. In contrast, our term matches the self-supervised tasks to train the generator. Our intuition is that if the generator distribution is similar to the real distribution, the classification performance on its transformed samples should be similar to that on samples transformed from real data. In other words, if real and fake samples come from similar distributions, the same tasks applied to real and fake samples should result in similar behaviors. In particular, given the cross-entropy loss computed on real samples, we train the generator to create samples that match this loss. We form the cross-entropy loss of the multi-class classification as shown in Eq. 6. Here, we train the generator to confuse the classifier so that it recognizes fake transformed samples with the same performance as it recognizes transformed classes obtained from real samples. As the classifier learns to distinguish the real versus fake transformation classes, it creates good gradients, and the generator benefits from these gradients to learn and confuse the classifier. This adversarial process is similar to the original GAN [13], but now applied to multiple classes. $\lambda_g$ is a constant selected through empirical experiments, and we use the $\ell_1$ norm for both the GAN task and the classification task. In our implementation, we randomly select a geometric transformation for each data sample when training the discriminator, and the same transformations are applied to the generated samples when matching the self-supervised tasks to train the generator.
$\mathcal{L}_G^{c} = \big| \mathbb{E}_{x \sim P_d}\,\mathbb{E}_k [-\log C_k(T_k(x))] - \mathbb{E}_{z \sim P_z}\,\mathbb{E}_k [-\log C_k(T_k(G(z)))] \big|$   (6)
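The full generator objective reduces to two $\ell_1$ matching terms; below is a minimal NumPy sketch under our own assumptions (pre-computed discriminator scores and cross-entropies, hypothetical function name):

```python
import numpy as np

def generator_objective(d_real, d_fake, ce_real, ce_fake, lambda_g):
    # Sketch of Eq. 4: l1 matching of mean discriminator scores on real
    # vs. fake samples (the GAN term, Eq. 5), plus l1 matching of the
    # classification cross-entropies computed on real vs. fake
    # transformed samples (the SS term, Eq. 6), weighted by lambda_g.
    gan_term = abs(np.mean(d_real) - np.mean(d_fake))
    ss_term = abs(ce_real - ce_fake)
    return gan_term + lambda_g * ss_term
```

Both terms vanish when the fake batch behaves exactly like the real batch, which is the matching intuition described above.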
3.3 Regularized Autoencoder
We use a regularized autoencoder (AE) in our model to prevent the generator from collapsing severely and to guide the generator in producing samples that resemble real ones, as shown in recent works [31, 30]. We use an autoencoder objective similar to [31]:
(7) 
Eq. 7 is the objective of our regularized AE. The first term is the reconstruction error of a conventional AE. The second term is the distance constraint, similar to [31], which regularizes the mapping from latent to data samples. Here, $G$ is the GAN generator (the decoder of the AE), $E$ is the encoder, and the weighting constant is set as suggested by the original work. $\Phi(x)$ denotes the features of sample $x$ computed through the last convolution layer of the discriminator $D$, and $d_z$ is the dimension of the latent samples $z$. We reuse the autoencoder hyper-parameters of the original model and focus our analysis on our main contributions as discussed in the previous sections (3.1, 3.2).
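As a heavily simplified stand-in for the regularized AE objective, consider the sketch below; note that the latent regularizer here is only a placeholder, since the exact latent-data distance constraint of [31] is not reproduced in this sketch, and all names are ours:

```python
import numpy as np

def ae_objective(feat_x, feat_rec, e_x, z, lambda_w):
    # Feature-wise reconstruction ||Phi(x) - Phi(G(E(x)))||^2, plus a
    # placeholder latent regularizer pulling E(x) toward the prior
    # sample z, weighted by lambda_w.
    rec = np.mean((feat_x - feat_rec) ** 2)
    reg = np.mean((e_x - z) ** 2)
    return rec + lambda_w * reg
```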
4 Experimental Results
We conduct experiments to investigate the effectiveness of our proposed adversarial self-supervised learning on the CIFAR-10 and STL-10 datasets. Images of STL-10 are resized following [23]. We use the DCGAN [28] architecture with the standard "log" loss, and the SN-GAN [23] and ResNet [14] architectures with the "hinge" loss; we use the "hinge" loss for SN-GAN and ResNet because it attains better performance than the standard "log" loss, as shown in [23]. We describe these networks in the supplementary material. In our model, the encoder network is the mirror of the generator network. We measure the diversity and quality of generated samples via the Fréchet Inception Distance (FID) [15]. FID is computed with 10K real samples and 5K generated samples, as in SN-GAN [23], unless stated otherwise. FID is computed every 10K iterations during training and visualized with a smoothing window of 5. We train our method for 300K iterations and report the FID at the last iteration, except for standard SN-GAN on CIFAR-10, where we report it at about 120K iterations because continuing the training does not improve the FID. We conduct the ablation studies and parameter fine-tuning on the DCGAN, SN-GAN and ResNet architectures, and use their best settings to compare with state-of-the-art methods. Dist-GAN [31] is our main baseline in the ablation studies. We train the models using the Adam optimizer, with one learning-rate and $(\beta_1, \beta_2)$ setting for the DCGAN and SN-GAN architectures and another for the ResNet architecture [14]. We fix the latent dimension and use a mini-batch size of 64 for all experiments.
4.1 Ablation Study
At first, we aim to seek good values of $\lambda_d$ and $\lambda_g$ for our proposed method. However, estimating both at the same time is expensive. Therefore, we first seek a good $\lambda_d$ for the classification task of the discriminator (Eq. 1). We train the classification task of the discriminator with only the real transformed samples, like [6]. We follow the geometric transformation of [11], which is simple yet effective and achieved the best performance in self-supervised tasks, to augment images and assign their pseudo-labels. In particular, we train the discriminator to recognize the 2D rotations applied to the input image: we rotate the input image with $K$ rotations and assign the rotated copies pseudo-labels from 1 to $K$, with $K$ following [11]. This ablation study uses DCGAN on CIFAR-10. Fig. 2a shows that training the discriminator with the self-supervised learning task stabilizes the baseline and makes the model converge faster; this technique improves the FID score over our baseline. This study suggests a good $\lambda_d$ for the DCGAN architecture.

The second study seeks a good $\lambda_g$ for the self-supervised task of the generator (Eq. 4). Experimental conditions are exactly the same as in the first study, except that we fix the best $\lambda_d$ from the previous experiment. We consider the version with this best $\lambda_d$ as the self-supervised baseline (SS). First, we investigate the influence of our self-supervised task for the generator on the overall performance. For that, we train the classification task of the discriminator without adversarial training (no fake class), as in the previous experiment, and train the generator in two ways: with an objective similar to SS-GAN [5] and with our proposed objective in Eq. 6. We carry out the investigation with the DCGAN architecture on CIFAR-10, as shown in Fig. 2b. The results show that training our generator with an objective similar to [5] causes a divergence issue and does not help to improve the performance. However, when we use our proposed generator objective (Eq. 4), the performance is better than the self-supervised baseline. This confirms the usefulness of our proposed generator objective.
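The rotation-based augmentation used in this ablation can be sketched as follows, following the four RotNet rotations (0/90/180/270 degrees) of [11]; square images are assumed, and the helper name is ours:

```python
import numpy as np

def rotation_augment(images):
    # Build the rotated copies used as self-supervised pseudo-classes:
    # one copy per rotation k, labeled with pseudo-label k.
    # images: array of shape (N, H, W), with H == W.
    rotated, labels = [], []
    for k in range(4):
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        labels.append(np.full(len(images), k))
    return np.concatenate(rotated), np.concatenate(labels)
```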
Table 1: FID scores compared to state-of-the-art methods. (R) denotes the ResNet architecture; the last column reports FID computed with 10K real and 10K generated samples.

Method  |  CIFAR-10  |  STL-10  |  CIFAR-10 (R)  |  STL-10 (R)  |  CIFAR-10 (R) (10K-10K)
GAN-GP [23]  |  37.7  |  -  |  -  |  -  |  -
WGAN-GP [23]  |  40.2  |  55.1  |  -  |  -  |  -
SN-GAN [23]  |  25.5  |  43.2  |  21.70 ± .21  |  40.10 ± .50  |  19.73
SS-GAN [6]  |  -  |  -  |  -  |  -  |  15.65
Dist-GAN [31]  |  22.95  |  36.19  |  17.61 ± .30  |  28.50 ± .49  |  13.01
GN-GAN [30]  |  21.70  |  30.80  |  16.47 ± .28  |  -  |  -
Ours (SS)  |  21.40  |  29.79  |  14.97 ± .29  |  27.98 ± .38  |  12.37
Ours (SS + adversarial, G)  |  19.05  |  28.70  |  14.75 ± .28  |  28.24 ± .23  |  12.15
Third, we want to understand the influence of adversarial training (with a fake class) for the classification task, given the best $\lambda_d$ and $\lambda_g$ from the previous studies. This experiment also uses the DCGAN architecture on CIFAR-10 (Fig. 2b). We now train the discriminator with adversarial learning (simultaneously distinguishing the fake class in the classification task). Note that in our experiments, when we mention adversarial training, we mean using it for the classification task. We first carry out experiments with our proposed generator objective. When considering adversarial training (with the fake class) for the discriminator, our method improves FID significantly compared to the non-adversarial version. We also try the generator objective of [5]. Although FID is slightly improved over the self-supervised baseline, using this generator objective still collapses. In contrast, our proposed objective is stable and achieves the best FID among all versions (Fig. 2b). This confirms the importance of combining our adversarial self-supervised learning with our proposed generator objective. The reason a generator objective like that of [5] leads to corruption is perhaps that maximizing it violates the GAN task of our generator, which does not support matching $D(x)$ and $D(G(z))$ in the first term. This violation is similar to the gradient penalty [14]: it may be useful at the beginning but diverges at the end. Intuitively, our new objective (Eq. 6) does not cause such a violation, because when the data and generator distributions are matched, their classification losses should be similar as well. This study again verifies the hypothesis of our proposed techniques.
Fourth, in the previous experiments, we figured out how the classification task helps to improve the GAN task. Although there is seemingly an overlap between the GAN task and the classification task, as both classify the same fake samples, having both tasks in the model is important. For instance, if we remove the GAN task from our model (for both the discriminator and the generator), the model collapses immediately in the first iterations, as shown in Fig. 2c. This means the GAN task still plays an important role in our GAN model. We also consider adversarial training of the discriminator objective as in Eq. 3, but now rotating the fake samples and considering these rotated samples as belonging to the fake class. The result in Fig. 2c does not support rotating the fake samples when training the discriminator, because doing so likely creates noise and degrades its learning. We conduct this study with DCGAN on CIFAR-10 in a setup similar to the previous studies, using the same $\lambda_d$ and $\lambda_g$ when the classification task is used.
Fifth, we also investigate the proposed techniques with other network architectures on the CIFAR-10 and STL-10 datasets. At first, we repeat the first experiment for the SN-GAN and ResNet architectures to select their best $\lambda_d$, as shown in the first row of Fig. 3. The results suggest different best values of $\lambda_d$ for SN-GAN and for ResNet. We observe that when the network is powerful (e.g., ResNet), the best $\lambda_d$ gets smaller. Perhaps the more powerful network has a better capability to learn good feature representations via the GAN task alone; in contrast, the smaller networks (DCGAN, SN-GAN) are harder to train and therefore need more contribution from the classification task. Then, we study a good $\lambda_g$ for these architectures, as shown in the second row of Fig. 3. Here, we seek $\lambda_g$ for our generator objective in the case where the classifier is trained with the fake class (adversarial training), similar to our third study with the DCGAN architecture. The generator objective helps to boost the performance significantly (if the choice of $\lambda_g$ is good), especially for SN-GAN on CIFAR-10. Our proposed techniques also reduce the divergence issue, as shown in the first column of Fig. 3. Although the baseline with ResNet achieves almost saturated performance, our techniques are still able to improve this model further. It is worth noting that the FID of our self-supervised baseline (SS) already reaches performance similar to that of SAGAN [33], the state-of-the-art conditional GAN (see the discussion in the supplementary material), so it is hard to improve much further even when combined with the adversarial training and our proposed generator objective. This study again confirms the effectiveness and robustness of our proposed techniques on various architectures. We observe one good choice of $(\lambda_d, \lambda_g)$ for DCGAN and ResNet on CIFAR-10, and another for SN-GAN on CIFAR-10 and STL-10.
With the combination of adversarial self-supervised learning for the discriminator and our proposed generator objective, our best versions significantly outperform the baseline across various network architectures and datasets.
4.2 Comparison with state-of-the-art methods
In this section, we compare the best settings of our proposed method with the state of the art on the benchmark CIFAR-10 and STL-10 datasets, as shown in Table 1. We compare results obtained with the SN-GAN [23] and ResNet [14, 23] architectures. As shown in Table 1, our method significantly outperforms the baseline Dist-GAN and other GAN methods, especially on the STL-10 dataset. This confirms the effectiveness of combining the GAN task and the classification task in a single model. It is worth noting that SN-GAN attains its best results at about 100K iterations, yet this model diverges if training continues; a similar observation is discussed in [6]. We also compare with the recent SS-GAN [6], which also integrates the self-supervised technique to improve the GAN model. In this case, for a fair comparison, we compute FID with 10K real samples and 10K fake samples, as in that work. Our model achieves a much better FID score than SS-GAN with the same ResNet architecture on the CIFAR-10 dataset. Fig. 4 shows some generated examples of our model with ResNet architectures on the CIFAR-10 and STL-10 datasets.
5 Conclusion
We propose to train the GAN model with adversarial self-supervised learning. First, we show that self-supervised learning helps to improve the discriminator (the self-supervised baseline) and hence enhances the quality of generated images. Then, we propose to train the discriminator adversarially and introduce a new generator objective that matches the cross-entropy loss between real and fake samples. The combination of adversarial training (for the discriminator) and cross-entropy matching (for the generator) further boosts the performance of the self-supervised baseline across various network architectures on the CIFAR-10 and STL-10 datasets. The best version of our proposed method significantly outperforms the baseline and establishes new state-of-the-art FID scores on these benchmark datasets. Although we investigate our proposed techniques mainly within an autoencoder-based GAN model, we believe they are orthogonal to, and have the potential to improve, other GAN methods.
References
 [1] M. Arjovsky and L. Bottou. Towards principled methods for training generative adversarial networks. arXiv preprint arXiv:1701.04862, 2017.
 [2] M. Arjovsky and L. Bottou. Towards principled methods for training generative adversarial networks. arXiv preprint arXiv:1701.04862, 2017.
 [3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. ICML, 2017.
 [4] A. Brock, J. Donahue, and K. Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
 [5] D. M. Chen, G. Baatz, K. Koser, S. S. Tsai, R. Vedantham, T. Pylvanainen, K. Roimela, X. Chen, J. Bach, M. Pollefeys, B. Girod, and R. Grzeszczuk. City-scale landmark identification on mobile devices. In CVPR, 2011.
 [6] T. Chen, X. Zhai, and N. Houlsby. Self-supervised GAN to counter forgetting. arXiv preprint arXiv:1810.11598, 2018.
 [7] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2172–2180, 2016.
 [8] C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. In CVPR, 2015.
 [9] J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
 [10] V. Dumoulin, I. Belghazi, B. Poole, A. Lamb, M. Arjovsky, O. Mastropietro, and A. Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
 [11] S. Gidaris, P. Singh, and N. Komodakis. Unsupervised representation learning by predicting image rotations. ICLR, 2018.
 [12] I. Goodfellow. Nips 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
 [13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
 [14] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
 [15] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. Gans trained by a two timescale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
 [16] D. P. Kingma and M. Welling. Autoencoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
 [17] N. Kodali, J. Abernethy, J. Hays, and Z. Kira. On convergence and stability of gans. arXiv preprint arXiv:1705.07215, 2017.
 [18] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
 [19] K. Liu. Varying k-Lipschitz constraint for generative adversarial networks. arXiv preprint arXiv:1803.06107, 2018.
 [20] A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow. Adversarial autoencoders. In International Conference on Learning Representations, 2016.
 [21] L. Mescheder, A. Geiger, and S. Nowozin. Which training methods for gans do actually converge? In International Conference on Machine Learning, pages 3478–3487, 2018.
 [22] L. Mescheder, S. Nowozin, and A. Geiger. The numerics of gans. In Advances in Neural Information Processing Systems, pages 1825–1835, 2017.
 [23] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. ICLR, 2018.
 [24] M. Noroozi, H. Pirsiavash, and P. Favaro. Representation learning by learning to count. In ICCV, 2017.
 [25] A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. In ICML, 2017.
 [26] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
 [27] H. Petzka, A. Fischer, and D. Lukovnicov. On the regularization of wasserstein gans. arXiv preprint arXiv:1709.08894, 2017.
 [28] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
 [29] K. Roth, A. Lucchi, S. Nowozin, and T. Hofmann. Stabilizing training of generative adversarial networks through regularization. In Advances in Neural Information Processing Systems, pages 2018–2028, 2017.
 [30] N.-T. Tran, T.-A. Bui, and N.-M. Cheung. Improving gan with neighbors embedding and gradient matching. In AAAI Conference on Artificial Intelligence (AAAI), 2018.
 [31] N.-T. Tran, T.-A. Bui, and N.-M. Cheung. Dist-GAN: An improved gan using distance constraints. In ECCV, 2018.
 [32] Y. Yazıcı, C.-S. Foo, S. Winkler, K.-H. Yap, G. Piliouras, and V. Chandrasekhar. The unusual effectiveness of averaging in gan training. arXiv preprint arXiv:1806.04498, 2018.
 [33] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
 [34] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV, 2016.
 [35] R. Zhang, P. Isola, and A. A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, 2017.