When Relation Networks meet GANs: Relation GANs with Triplet Loss

02/24/2020 ∙ by Runmin Wu, et al. ∙ Dalian University of Technology

Though recent research has achieved remarkable progress in generating realistic images with generative adversarial networks (GANs), the lack of training stability is still a lingering concern of most GANs, especially on high-resolution inputs and complex datasets. Since the randomly generated distribution can hardly overlap with the real distribution, training GANs often suffers from the gradient vanishing problem. A number of approaches have been proposed to address this issue by constraining the discriminator's capabilities using empirical techniques, such as weight clipping, gradient penalty, and spectral normalization. In this paper, we provide a more principled alternative solution to this issue. Instead of training the discriminator to distinguish real and fake input samples, we investigate the relationship between paired samples by training the discriminator to separate paired samples drawn from the same distribution and those drawn from different distributions. To this end, we explore a relation network architecture for the discriminator and design a triplet loss that yields better generalization and stability. Extensive experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvement on various vision tasks, including unconditional and conditional image generation and image translation. Our source code is available at: <https://github.com/JosephineRabbit/Relation-GAN>


1 Introduction

Since first proposed in [8], generative adversarial networks (GANs) have witnessed rapid development and found numerous applications in many computer vision tasks, such as image generation [8, 13, 44], person re-identification [2], and image super-resolution [31]. They have also been extended to natural language processing [38], video sequence synthesis [6], and speech synthesis [30].

Though tremendous success has been achieved in many fields, training GANs remains a tricky process that suffers from many issues, including the instability between the generator and the discriminator as well as an extreme sensitivity to network architecture and hyper-parameters. It has been proved that most of these issues stem from the fact that the supports of the target distribution and the generated distribution are often of low dimension with respect to the ambient space, and therefore disjoint most of the time, causing the discriminator to collapse to a function that hardly provides gradients to the generator.

To remedy this issue, recent works propose to leverage the Integral Probability Metric (IPM) through techniques such as Gradient Penalty [9] and Spectral Normalization [28]. In IPM-based GANs, the discriminator is constrained to a specific class of functions so that it does not grow too quickly, which alleviates vanishing gradients.

However, existing IPM methods also have their limits. For instance, the hyperparameter tuning of the gradient penalty is mostly empirical, while spectral normalization imposes constraints on every conv-layer, which hinders the learning capacity of the discriminator.

In [13], the authors argue that non-IPM-based GANs are missing a relativistic discriminator, which IPM-based GANs already possess. A relativistic discriminator is necessary to make the training process analogous to divergence minimization and to produce sensible predictions based on the prior knowledge that half of the samples in the mini-batch are fake. Although they have shown the power of the relativistic discriminator, the potential of comparing the relation between the real and fake distributions remains to be explored.

In this paper, we explicitly study the effect of relation comparison in GANs by training the discriminator to determine whether the input paired samples are drawn from the same distribution (either real or fake). A relation network is presented, acting as the discriminator, and a new triplet loss is designed for training GANs. In this way, the aforementioned problem of disjoint supports can be alleviated by projecting and merging the low-dimensional data distributions into a high-dimensional feature space.

Figure 1: Illustration of our Relation GAN; “C” denotes the concatenation operation. Our discriminator consists of two modules, an embedding module (EM) and a relation module (RM). The relation discriminator is expected to enlarge the difference in relation score between asymmetric pairs and symmetric pairs, while the generator is expected to reduce this difference.

Mathematically, we prove that our new triplet loss defines a divergence and can achieve a Nash equilibrium at which the generated data distribution converges to the real distribution. In addition, we analyze the oscillatory behavior that GANs exhibit on the Dirac-GAN problem and demonstrate that the proposed Relation GAN is locally convergent even without any regularization.

Extensive experiments are conducted on conditional and unconditional image generation and image translation tasks. The promising performance demonstrates that the proposed Relation GAN has great potential in various applications of GANs.

In summary, the contributions of this paper are twofold.

  • We propose a new training strategy for GANs to better leverage the relation between samples. Instead of separating real samples from generated ones, the discriminator is trained to determine whether a pair of samples comes from the same distribution.

  • We propose a relation network architecture as the discriminator and a triplet loss for training GANs. We show both theoretically and empirically that the relation network together with the triplet loss gives rise to a generated density that can exactly match that of the real data.

Extensive experiments on the 2D grid [27], Stacked MNIST [17], CelebA [21], LSUN [42], and CelebA-HQ [22] datasets confirm that our proposed method performs favourably against state-of-the-art methods such as relativistic GAN [13], WGAN-GP [9], Least Squares GAN (LSGAN) [24], and the vanilla GAN [8].

2 Related Work

The vanilla GAN [8] minimizes the JS divergence between two distributions, leading to the gradient vanishing problem when the two distributions are disjoint. Recent works try to address this issue by designing new objective functions [24, 32, 37, 1] or more sophisticated network architectures [14, 45, 5, 33]. Others investigate regularization and/or normalization to constrain the capacity of the discriminator [28, 9, 16]. Recently, a new method [13] was proposed to explore a relativistic discriminator. In the following, we review recent works using different objective functions and a special case, relativistic GANs, which is closely related to our approach.

2.1 Different Objective Functions in GANs

Generally, there are two kinds of loss functions in GANs: the minimax GAN and the non-saturating (NS) GAN. In the former, the discriminator minimizes the negative log-likelihood of the binary classification task; in the latter, the generator maximizes the probability of generated samples being real. The non-saturating loss is known to outperform the minimax variant empirically. Among works on new objectives, loss-sensitive GAN [32] tries to solve the gradient vanishing problem by focusing on training samples with low authenticity. WGAN [1] proposes the Wasserstein distance to replace the JS divergence, which can measure the distance between two distributions even when they are disjoint. In addition, [1] also proposes to add noise to both real and generated samples to further alleviate the impact of disjoint distributions. [9] improves WGAN by replacing the weight-clipping constraint with a gradient penalty, which enforces the Lipschitz constraint on the discriminator by penalizing the norm of its gradient. DRAGAN [16] combines components of WGAN and LSGAN and improves the loss function to a certain extent; training stability is controlled by continually updating the coefficient of the penalty term.

2.2 Relativistic GANs

Instead of training the discriminator to predict the absolute probability of an input sample being real, the relativistic GAN [13] proposes a relativistic discriminator, which estimates the probability that a given real sample is more realistic than a randomly sampled fake one. Although it bears a similar spirit, our method differs from [13] in that we adopt a relation network as the discriminator to estimate the relation score of a paired input. In comparison, the discriminator in [13] treats input samples separately and relies on a ranking loss (e.g., hinge loss) to explore their relation. The idea of merging features and comparing the relation between samples from two distributions has not been explored in the GAN literature. In addition, our method proposes a new triplet loss to leverage the power of paired relation comparison, allowing more stability and better diversity for GANs without applying any IPM methods.

3 The Relation GAN Framework

3.1 Relation Net Architecture

In traditional GANs, a discriminator is trained to distinguish real samples from fake ones, and a generator is trained to confuse the discriminator by generating realistic samples. Consider a real data distribution $\mathbb{P}$ and the data distribution $\mathbb{Q}$ produced by the generator. Rather than training the discriminator on real and fake data independently, we propose to train a discriminator that predicts a relation score for a paired input, indicating whether the paired samples are from the same distribution (either $\mathbb{P}$ or $\mathbb{Q}$).

Inspired by the success of relation network architectures in other computer vision areas [39], our discriminator consists of two modules, an embedding module and a relation module, as shown in Figure 1. For a pair of input samples, the embedding module first maps each sample into a high-dimensional feature space. The corresponding features are then merged and fed into the relation module to produce the relation score for the input pair. For ease of description, we call paired inputs containing both real and fake samples asymmetric pairs, and those containing samples from the same distribution (either real or fake) symmetric pairs. The training process is then formulated as a min-max game (see Section 3.2), where the discriminator aims to maximize the relation scores of asymmetric sample pairs and minimize those of symmetric ones. Meanwhile, the generator is trained to confuse the discriminator by minimizing the relation scores of asymmetric sample pairs containing real and generated samples.
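To make the architecture concrete, below is a minimal PyTorch-style sketch of a relation discriminator. The layer widths, depths, and the concatenation-based merge are our illustrative assumptions, not the paper's exact configuration (the experiments use ResNet/DCGAN backbones):

```python
import torch
import torch.nn as nn

class RelationDiscriminator(nn.Module):
    """Sketch: an embedding module (EM) maps each sample to a feature
    vector; the two embeddings are concatenated and a relation module (RM)
    maps the pair to a scalar relation score."""

    def __init__(self, in_ch=3, feat_dim=128):
        super().__init__()
        # Embedding module, shared by both samples of the pair.
        self.embed = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        # Relation module, consuming the merged pair of embeddings.
        self.relate = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.LeakyReLU(0.2),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, x1, x2):
        h = torch.cat([self.embed(x1), self.embed(x2)], dim=1)  # the "C" op
        return self.relate(h).squeeze(1)  # one relation score per pair
```

Because the embedding weights are shared, the relation score is a learned comparison in feature space rather than two independent real/fake judgments.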

3.2 The Min-Max Game

The min-max game in training GANs is conducted by optimizing the losses of $D$ and $G$ iteratively. In non-IPM GANs, the generalized losses of $D$ and $G$ can be presented as follows:

$$L_D = \mathbb{E}_{x_r \sim \mathbb{P}}\big[f_1(D(x_r))\big] + \mathbb{E}_{x_f \sim \mathbb{Q}}\big[f_2(D(x_f))\big] \tag{1}$$

and

$$L_G = \mathbb{E}_{x_r \sim \mathbb{P}}\big[g_1(D(x_r))\big] + \mathbb{E}_{x_f \sim \mathbb{Q}}\big[g_2(D(x_f))\big] \tag{2}$$

where $f_1$, $f_2$, $g_1$, and $g_2$ are scalar-to-scalar functions, $\mathbb{P}$ is the distribution of real data, and $\mathbb{Q}$ denotes the generated data distribution, with $x_r$ drawn from $\mathbb{P}$ and $x_f$ drawn from $\mathbb{Q}$.

Let $f_1(y) = -\log\sigma(y)$, $f_2(y) = -\log(1-\sigma(y))$, $g_1(y) = 0$, and $g_2(y) = -\log\sigma(y)$, where $\sigma$ is the sigmoid function. Eqs. (1) and (2) then become the loss functions of the standard (non-saturating) GAN [8].

In our Relation GAN, the loss functions of $D$ and $G$ take paired inputs:

$$L_D = \mathbb{E}_{x_r, x_r' \sim \mathbb{P},\, x_f \sim \mathbb{Q}}\big[f_1(D(x_r, x_f)) + f_2(D(x_r, x_r'))\big] \tag{3}$$

and

$$L_G = \mathbb{E}_{x_r \sim \mathbb{P},\, x_f \sim \mathbb{Q}}\big[g(D(x_r, x_f))\big] \tag{4}$$

where $f_1$, $f_2$, and $g$ are also scalar-to-scalar functions.

The goal of the relation discriminator is to learn a loss function that separates symmetric and asymmetric sample pairs by a desired margin. The generator can then be trained to minimize this margin by generating realistic samples.

Inspired by the success of the triplet loss [7], we formulate a similar loss function in our Relation GAN as follows:

$$L_D = \mathbb{E}_{x_r, x_r' \sim \mathbb{P},\, x_f \sim \mathbb{Q}}\Big[\max\big(0,\; D(x_r, x_r') - D(x_r, x_f) + d(x_r, x_f)\big)\Big] \tag{5}$$

and

$$L_G = \mathbb{E}_{x_r \sim \mathbb{P},\, x_f \sim \mathbb{Q},\, x_f' \sim \mathbb{Q}'}\Big[\max\big(0,\; D(x_r, x_f) - D(x_f, x_f') + d(x_r, x_f)\big)\Big]$$

where $x_r$ and $x_r'$ are samples from the real data distribution, $x_f$ is a sample from the generated data distribution, and $x_f'$ is a sample from the distribution $\mathbb{Q}'$ generated by the generator at the previous optimization step. We use a distance metric $d(\cdot,\cdot)$ to replace the constant ‘margin’ in the original triplet loss. This variable constraint enforces a smaller difference of relation scores when the distance between the two compared samples is smaller, which is more flexible than the original fixed margin. Our experiments also show the superiority of this adaptive margin over a fixed one.
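A minimal sketch of how these two losses could be computed with the relation discriminator sketched above; the mean-squared pixel distance used for the adaptive margin $d(\cdot,\cdot)$ is our illustrative choice, since the paper only requires some distance metric:

```python
import torch

def adaptive_margin(a, b):
    # Distance metric d(., .) used as a per-pair margin; plain mean-squared
    # pixel distance is an assumed choice of metric.
    return (a - b).flatten(1).pow(2).mean(dim=1)

def relation_d_loss(D, x_r, x_r2, x_f):
    # Eq. (5), as reconstructed: push the asymmetric score D(x_r, x_f)
    # above the symmetric score D(x_r, x_r2) by at least d(x_r, x_f).
    m = adaptive_margin(x_r, x_f)
    return torch.relu(D(x_r, x_r2) - D(x_r, x_f) + m).mean()

def relation_g_loss(D, x_r, x_f, x_f_prev):
    # Generator counterpart: shrink the gap between the asymmetric pair
    # (x_r, x_f) and a symmetric fake pair built from samples x_f_prev
    # produced by the generator at the previous optimization step.
    m = adaptive_margin(x_r, x_f)
    return torch.relu(D(x_r, x_f) - D(x_f, x_f_prev) + m).mean()
```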

3.3 A Variant Loss

Since the training batch size is limited, the sampled distribution of each batch may deviate from the real data distribution. For an input batch of $N$ paired samples, the discriminator loss in Section 3.2 can be written as follows:

$$L_D = \frac{1}{N}\sum_{i=1}^{N} \max\Big(0,\; D\big(x_r^{(i)}, x_r'^{(i)}\big) - D\big(x_r^{(i)}, x_f^{(i)}\big) + d\big(x_r^{(i)}, x_f^{(i)}\big)\Big) \tag{6}$$

where $i$ indexes the samples in the batch. Our triplet loss is designed to reduce the relation scores of symmetric sample pairs and increase those of asymmetric ones.

However, when the real sample distribution is fairly uniform with small variance, the original loss is strict and prone to be disturbed by outliers within a batch. For these cases, we design a variant of our new triplet loss as follows:

$$L_D = \max\Big(0,\; \frac{1}{N}\sum_{i=1}^{N}\big(D(x_r^{(i)}, x_r'^{(i)}) - D(x_r^{(i)}, x_f^{(i)}) + d(x_r^{(i)}, x_f^{(i)})\big)\Big) \tag{7}$$

where $i$ again indexes the samples in a batch. Because the batch is averaged before the hinge is applied, the variant loss is more relaxed and not easily disturbed by extreme samples in the same batch; it performs better on evenly distributed datasets.

Thus, we suggest employing the variant triplet loss on uniformly distributed data, e.g., datasets containing only a single class. Our experimental results on single-class datasets such as CelebA and LSUN confirm this.
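The only difference between the two losses is where the hinge is applied; a minimal sketch, reusing an assumed relation discriminator D and distance metric d as in the previous snippet:

```python
import torch

def d_loss_strict(D, x_r, x_r2, x_f, d):
    # Eq. (6): hinge per sample, then average over the batch --
    # every outlier pair contributes its own max(0, .) term.
    gap = D(x_r, x_r2) - D(x_r, x_f) + d(x_r, x_f)
    return torch.relu(gap).mean()

def d_loss_variant(D, x_r, x_r2, x_f, d):
    # Eq. (7): average the gap over the batch first, then one hinge --
    # extreme samples are smoothed out before the max(0, .) is taken.
    gap = D(x_r, x_r2) - D(x_r, x_f) + d(x_r, x_f)
    return torch.relu(gap.mean())
```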

4 Theory Proof and Analysis

As discussed in the introduction, the optimal discriminator of most GANs corresponds to a divergence. In this section, we first prove that the proposed discriminator based on the relation network also has this property, and then show distributional consistency under our Lipschitz continuity assumption.

4.1 A New Divergence

A divergence is a function of two distributions satisfying the following definition:

Definition 1 If $\mathcal{D}(\mathbb{P}, \mathbb{Q})$ is a function of two distributions $\mathbb{P}$ and $\mathbb{Q}$ that satisfies the following properties:

1. $\mathcal{D}(\mathbb{P}, \mathbb{Q}) \ge 0$;

2. $\mathcal{D}(\mathbb{P}, \mathbb{Q}) = 0$ if and only if $\mathbb{P} = \mathbb{Q}$;

then $\mathcal{D}$ is a divergence between $\mathbb{P}$ and $\mathbb{Q}$.

Assumption 1 In the training process, before $G$ reaches the optimum, a real sample $x_r$ ought to be more realistic than a generated sample $x_f$, and the discriminator ought to give a bigger relation score to the asymmetric paired input $(x_r, x_f)$ than to the symmetric input $(x_r, x_r')$. That $x_r$ ought to be more realistic than $x_f$ also means that $d(x_r, x_f)$ is bigger than $d(x_r, x_r')$.

Under this assumption, we show in Supplementary 1 that the loss function of our relation discriminator is also a divergence.

4.2 Distributional Consistency

We use $D_\psi$ to denote the parameterized discriminator and $G_\theta$ to denote the parameterized generator. Based on [32], we use the following definition of the Lipschitz assumption on the data density:

Definition 2 For any two samples $x$ and $y$, a function $f$ is Lipschitz continuous with respect to a distance metric $d$ with a bounded Lipschitz constant $\kappa$ if

$$|f(x) - f(y)| \le \kappa\, d(x, y).$$

Assumption 2 The data density is supported on a compact set, and it is Lipschitz continuous w.r.t. $d$ with a bounded constant $\kappa$ in the sense of Definition 2.

We then show the existence of a Nash equilibrium such that both the discriminator function and the density of generated samples are Lipschitz. As in [32], the two losses are convex in $D_\psi$ and in $G_\theta$; hence, according to Sion's theorem [36], with $D_\psi$ and $G_\theta$ being optimized there exists a Nash equilibrium $(D_{\psi^*}, G_{\theta^*})$ such that both $D_{\psi^*}$ and the density $p_{\theta^*}$ of generated samples are Lipschitz. We can then prove that, when reaching this Nash equilibrium, the density $p_{\theta^*}$ of the samples generated by $G_{\theta^*}$ converges to the real data distribution $p_r$, which is Lemma 1:

Lemma 1 Under Assumption 2, for a Nash equilibrium $(D_{\psi^*}, G_{\theta^*})$ as above, we have $p_{\theta^*} = p_r$ almost everywhere.

Thus, $p_{\theta^*}$ converges to $p_r$. The proof of this lemma is given in Supplementary 2.

 
Figure 2: The numerical solutions on Dirac GANs. Panels: (a) Vanilla GAN, (b) WGAN, (c) WGAN-GP, (d) GAN-QP, (e) Relation GAN.

4.3 The Convergence

In the literature, GANs are often treated as dynamic systems to study their training convergence [25, 26, 29, 11]. This idea dates back to the Dirac GAN [25], which describes a simple yet prototypical counterexample for understanding whether GAN training is locally or globally convergent. To further analyze the convergence of training the proposed Relation GAN, we also adopt the Dirac GAN framework. However, [25] only discusses the situation where the data distributions are 1-D; we extend this analysis to the 2-D case to gain a better understanding.

Definition 3 The Dirac-GAN consists of a generator distribution $p_\theta = \delta_\theta$ and a linear discriminator $D_\psi(x) = \psi^\top x$, where $\theta$, the parameter of the generator, is a 2-D vector, and $\psi$ represents the parameter of the discriminator. The real data distribution $p_r$ is a Dirac distribution concentrated at a fixed point.

In other words, the real sample is a fixed vector, and the fake sample is $\theta$, which is also the parameter of the generator. The discriminator uses the simplest linear model, $D_\psi(x) = \psi^\top x$, whose weight $\psi$ is also its only parameter. The Dirac GAN asks whether, in such a minimalist model, the fake sample eventually converges to the true sample; in other words, whether $\theta$ finally converges to the real point. Specifically, in Relation GAN, our Dirac discriminator can be simplified to a linear embedding followed by a linear relation function, $D(x_1, x_2) = \psi_2^\top\, c(\psi_1 x_1,\, \psi_1 x_2)$, where $\psi_1$ and $\psi_2$ denote the parameters of the embedding module and the relation module, respectively, and $c$ is the merge (concatenation) operation.

Based on the dynamic analysis for GANs in Supplementary 3, we obtain the numerical solutions of the GANs' dynamic equations from a common initial point, as Figure 2 shows. In [25], the authors find that most unregularized GANs are not locally convergent. In our 2-D Dirac GAN setting, the numerical solutions of WGAN [1], WGAN-GP [9], GAN-QP [37], and the vanilla GAN [8] either oscillate near the real sample point or fail to converge to it, while our Relation GAN succeeds in converging. This indicates that our GAN has good local convergence.
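To make the dynamic-system view concrete, the sketch below simulates the 2-D Dirac GAN of Definition 3 under the vanilla non-saturating loss with simultaneous gradient updates, following the counterexample of [25]. The real point is placed at the origin, and the step size and initial point are arbitrary choices of ours; replacing the gradient field with that of the relation discriminator would reproduce the comparison in Figure 2.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def dirac_gan_vanilla(theta0, psi0, lr=0.1, steps=5000):
    """Simultaneous gradient ascent for a 2-D Dirac GAN: real data is a
    Dirac mass at the origin, the fake sample is theta, and the critic is
    linear, D(x) = psi . x. With f(t) = -log(1 + exp(-t)), the generator
    ascends f(D(theta)) and the discriminator ascends f(-D(theta)); these
    iterates are known to circle the equilibrium instead of converging [25]."""
    theta = np.asarray(theta0, dtype=float)
    psi = np.asarray(psi0, dtype=float)
    trajectory = [theta.copy()]
    for _ in range(steps):
        s = psi @ theta
        d_theta = sigmoid(-s) * psi   # generator gradient, f'(s) * psi
        d_psi = -sigmoid(s) * theta   # discriminator gradient, -f'(-s) * theta
        theta = theta + lr * d_theta
        psi = psi + lr * d_psi
        trajectory.append(theta.copy())
    return np.stack(trajectory)

# theta should converge to the real point (the origin) but does not:
traj = dirac_gan_vanilla([1.0, 0.5], [0.5, -1.0])
print(np.linalg.norm(traj, axis=1)[::1000])  # distances fail to decay to 0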

5 Experiments

We first evaluate the proposed Relation GAN on 2D synthetic datasets and the Stacked MNIST dataset to demonstrate the diversity of the generated data and the stability of the generator. We then perform image generation tasks with our method to show its superiority in synthesizing natural images. Finally, an ablation study is conducted to verify the effects of the feature merging mechanism in relation nets and the proposed triplet loss.

Figure 3: Comparison on 2D datasets. Panels: (a) Vanilla GAN, (b) LSGAN, (c) WGAN-GP, (d) Relativistic GAN, (e) Relation GAN.

5.1 The Diversity of Generated Data

2D Datasets

We compare the effect of our relation discriminator on the 2D 8-Gaussian, 2D 25-Gaussian, and 2D Swiss-roll distributions. The experimental settings follow [41]. The results generated by our method and four popular methods under the same settings are shown in Figure 3. Compared with the other methods, ours better fits these 2D distributions.

Stacked MNIST For the Stacked MNIST [17] experiments, we use the setting and code of [41]. Each of the three channels in each sample is classified by a pre-trained MNIST classifier, and the resulting three digits determine which of the 1000 modes the sample belongs to. We measure the number of modes captured with this pre-trained classifier. We choose the Adam [15] optimizer for all experiments. Our results are shown in Table 1. We find that our Relation GAN achieves the best mode coverage, reaching all 1,000 modes.

Table 1: Stacked MNIST
Loss              Modes
LSGAN             985 ± 10
WGAN-GP           643 ± 7
Vanilla GAN       923 ± 18
Relativistic GAN  828 ± 58
Ours              1000 ± 0
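A sketch of the mode-counting protocol described above; the generator G and the pretrained classifier mnist_clf are assumed callables (hypothetical names), with G producing (B, 3, 28, 28) Stacked-MNIST samples:

```python
import torch

@torch.no_grad()
def count_modes(G, mnist_clf, n_batches=260, batch=100, z_dim=128):
    """Classify each of the three channels of a generated sample with a
    pretrained MNIST classifier, read the three digits as a code in
    {0, ..., 999}, and count how many distinct codes appear."""
    seen = set()
    for _ in range(n_batches):
        x = G(torch.randn(batch, z_dim))  # (B, 3, 28, 28) samples
        digits = [mnist_clf(x[:, c:c + 1]).argmax(dim=1) for c in range(3)]
        codes = digits[0] * 100 + digits[1] * 10 + digits[2]
        seen.update(codes.tolist())
    return len(seen)  # at most 1000 modes
```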


5.2 Unconditional Image Generation

Datasets We provide comparisons on four datasets, namely CIFAR-10 [4], CelebA [21], LSUN-BEDROOM [42], and CelebA-HQ [22]. The LSUN-BEDROOM dataset [42] contains 3M images, which are randomly partitioned into a test set of around 30k images and a training set containing the rest. We use the version of CelebA-HQ with 30k images. Due to limited computational resources, we only compare our method with Relativistic GAN and WGAN-GP on CelebA-HQ.

Settings For CIFAR-10, we use the ResNet [10] architecture proposed in [41] (with spectral normalization layers removed). For CelebA, LSUN, and CelebA-HQ, we use a DCGAN architecture as in [28]. We apply the Adam optimizer in all experiments with the hyperparameters listed in Table 2, using one discriminator update per generator update and a batch size of 64. Other details of our experimental settings are provided in the Supplementary.

Table 2: Experiment Settings (Adam learning rates and momenta)
Dataset     Learning rates    β1   β2     Iterations
CIFAR-10    0.0002 / 0.0001   0.9  0.999  600k
CelebA      0.0002 / 0.0001   0.9  0.999  400k
LSUN        0.0001 / 0.0001   0    0.9    400k
CelebA-HQ   0.0001 / 0.0001   0    0.9    250k
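A minimal training-loop sketch matching the reported schedule (Adam, one discriminator update per generator update, batch size 64); the learning rates and betas below are the LSUN row of Table 2, and sample_real, G, D, and the relation losses from the Section 3 sketches are assumed:

```python
import torch

def train(G, D, sample_real, z_dim=128, iters=400_000, batch=64):
    # Adam settings taken from the LSUN row of Table 2.
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.0, 0.9))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))
    x_f_prev = None
    for step in range(iters):
        x_r, x_r2 = sample_real(batch), sample_real(batch)  # two real batches
        x_f = G(torch.randn(batch, z_dim))
        # One discriminator update per generator update.
        opt_d.zero_grad()
        relation_d_loss(D, x_r, x_r2, x_f.detach()).backward()
        opt_d.step()
        if x_f_prev is not None:  # Eq. (5) needs fakes from step t-1
            opt_g.zero_grad()
            relation_g_loss(D, x_r, x_f, x_f_prev).backward()
            opt_g.step()
        x_f_prev = x_f.detach()
```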

Evaluation To compare the sample quality of different models, we consider three different scores: IS [35], FID [11], and KID [3], which are based on the Inception network [40] pre-trained on ImageNet [34].

Results and Analysis Some randomly generated samples on the three datasets, together with more generated images and evaluation scores, are provided in Supplementary 6. From Table 3 we find that RelationGAN is also highly competitive on single-class datasets, i.e., CelebA and LSUN, while achieving the best performance on CIFAR-10. As discussed in Section 3.3, the variant loss of Eq. (7) is more relaxed and suitable for evenly distributed datasets, while the loss of Eq. (6) is stricter and performs better on multi-class or harder datasets (it also performs best on Stacked MNIST).

                     CIFAR-10                                   CelebA
                     FID           KID          IS              FID           KID          IS
Vanilla GAN          26.46 ± 0.12  1.88 ± 0.061 6.73 ± 0.081    34.43 ± 0.15  3.01 ± 0.044 2.68 ± 0.020
LS-GAN               14.9 ± 0.061  1.31 ± 0.056 7.74 ± 0.12     19.63 ± 0.11  1.84 ± 0.045 2.5 ± 0.021
WGAN-GP              63.56 ± 0.14  8.01 ± 0.068 3.56 ± 0.038    66.06 ± 0.27  9.06 ± 0.081 2.60 ± 0.029
Relativistic         23.96 ± 0.15  1.88 ± 0.061 0.061 ± 0.081   26.71 ± 0.10  2.08 ± 0.050 3.02 ± 0.024
RelationGAN (Eq. 6)  13.52 ± 0.060 1.26 ± 0.052 7.74 ± 0.18     25.37 ± 0.14  2.07 ± 0.044 2.65 ± 0.029
RelationGAN (Eq. 7)  47.96 ± 0.30  8.88 ± 0.072 3.32 ± 0.026    11.99 ± 0.064 1.10 ± 0.038 3.17 ± 0.036

                     LSUN                                       CelebA-HQ
                     FID           KID          IS              FID           KID          IS
Vanilla GAN          38.17 ± 0.28  6.61 ± 0.076 4.57 ± 0.010    –             –            –
LS-GAN               150.61 ± 0.33 21.75 ± 0.11 3.57 ± 0.043    –             –            –
WGAN-GP              14.93 ± 0.16  1.45 ± 0.042 3.77 ± 0.098    68.5 ± 0.19   7.71 ± 0.065 2.31 ± 0.017
Relativistic         40.84 ± 0.23  2.97 ± 0.045 4.08 ± 0.049    32.24 ± 0.21  2.27 ± 0.056 1.96 ± 0.038
RelationGAN (Eq. 6)  70.24 ± 0.37  5.89 ± 0.078 4.4 ± 0.056     27.87 ± 0.17  2.21 ± 0.047 2.13 ± 0.0052
RelationGAN (Eq. 7)  12.59 ± 0.11  1.37 ± 0.038 3.70 ± 0.081    26.17 ± 0.12  2.62 ± 0.043 2.15 ± 0.030
Table 3: Comparisons of FID, KID scores, and IS. RelationGAN (Eq. 6) and RelationGAN (Eq. 7) denote Relation GAN with the objective functions in Eq. (6) and Eq. (7), respectively. The best two scores are shown in red and green, respectively.

5.3 Conditional Image Generation

We compare with MSGAN [23], one of the best conditional GAN models, on the conditional CIFAR-10 dataset. The experiment simply replaces the MS-loss in [23] with our relation loss. Table 4 reports the FID results.

      MSGAN   RelationGAN
FID   28.73   24.88
Table 4: The comparison of FID scores on the CIFAR-10 dataset.

5.4 Image Translation

In addition to image generation, GANs have also made promising progress on image translation, showing great success in a range of tasks including style transfer, image enhancement, image super-resolution, and image segmentation. We conduct experiments on image style transfer and image super-resolution.

Image Style Transfer For the image style transfer task, we adopt CycleGAN as our baseline model to translate Monet's paintings into photographs. The FID score is applied to evaluate the quality of the generated images; a lower FID represents a smaller perceptual difference between target-domain images and generated images. Table 5 shows the comparison of FID scores. We find that both variants of the relation loss perform better than the original adversarial loss in CycleGAN.

Image Super-Resolution For the image super-resolution task, we employ SRGAN [18] equipped with the recently proposed relativistic loss as our baseline, which we denote as SRGAN. The training and validation sets are sampled from VOC2012: the training set has 16,700 images and the validation set has 425 images. We compare PSNR and SSIM on three popular SR benchmarks: Set5 [43], Set14 [19], and Urban100 [12].

                     FID (M→P)   FID (P→M)
CycleGAN             34.00       2.48
RelationGAN (Eq. 6)  33.60       2.26
RelationGAN (Eq. 7)  33.71       2.21
Table 5: The comparison of FID scores on style transfer results. M→P denotes painting-to-photo and P→M denotes photo-to-painting. RelationGAN (Eq. 6) and RelationGAN (Eq. 7) denote the losses in Eq. (6) and Eq. (7), respectively.

Table 6 lists the PSNR and SSIM of the two approaches on the three benchmark datasets. We observe that the proposed relation loss performs better than the baseline on all of them.

             Set5            Set14           Urban100
             PSNR    SSIM    PSNR    SSIM    PSNR    SSIM
SRGAN        28.40   0.82    25.37   0.73    23.36   0.71
RelationGAN  28.59   0.83    25.52   0.73    23.47   0.72
Table 6: Comparison of SRGAN and RelationGAN on benchmark datasets (4× upscaling); measures are PSNR (dB) and SSIM, highest in each column marked in bold in the original.

5.5 Ablation Study

We conduct the ablation study on the image generation datasets. We first compare our triplet loss with the siamese loss [7]; the results are shown in Table 7, and the formulation of the siamese loss is given in Supplementary 4. Second, we take a closer look at the impact of our embedding module and relation module. The “(m+n)” entries in Table 8 represent different discriminator architectures, where the embedding module contains m res-blocks and the relation module contains n res-blocks. For instance, “(0+3)” means the samples are concatenated together after the first conv-layer and then put into the relation module (RM), which contains 3 res-blocks. “no EM” represents the setting in which the paired inputs are packed together at the very beginning of the discriminator, as in [20]. All experiments are conducted on CIFAR-10.

Results and Analysis From Table 7, we find that the results of the proposed triplet loss are much better than those of the siamese loss; the “–” denotes model collapse during training. The results in Table 8 show that a bigger embedding module enhances performance, which also demonstrates the effectiveness of our embedding strategy.

          CIFAR-10   CelebA
Triplet   13.42      11.9
Siamese   107.3      –
Table 7: Ablation of losses (FID).
          FID (CIFAR-10)
no EM     38.9
(0+3)     37.37
(1+2)     28.89
(2+1)     28.80
(3+0)     13.52
Table 8: Results of different architectures.

6 Conclusion

In this paper we propose Relation GAN. A relation network architecture is designed and used as the discriminator, which is trained to determine whether a paired input of samples comes from the same distribution or not. The generator is jointly trained with the discriminator to confuse its decision using a triplet loss.

Mathematically, we prove that the optimal discriminator based on the relation network defines a divergence, indicating that the distance between the generated data distribution and the real data distribution becomes progressively smaller during training. We also prove that the generated data distribution converges to the real data distribution at the Nash equilibrium. In addition, we analyze our method and several other GANs as dynamic systems, and demonstrate the excellent convergence of our GAN by analyzing the dynamics of the Dirac GANs.

The experimental results on simple 2D distributions and Stacked MNIST verify the effectiveness of Relation GAN, especially in addressing the mode collapse problem. Our Relation GAN not only achieves state-of-the-art performance on unconditional and conditional image generation tasks with basic architectures and training settings, but also achieves promising results on image translation tasks compared with other GAN losses.

References

  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 214–223. Cited by: §2.1, §2, §4.3.
  • [2] S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (2018) Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, neurips 2018, 3-8 december 2018, montréal, canada. Cited by: §1.
  • [3] M. Binkowski, D. J. Sutherland, M. Arbel, and A. Gretton (2018) Demystifying MMD gans. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, Cited by: §5.2.
  • [4] Y. Boureau, F. R. Bach, Y. LeCun, and J. Ponce (2010) Learning mid-level features for recognition. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pp. 2559–2566. Cited by: §5.2.
  • [5] A. Brock, J. Donahue, and K. Simonyan (2018) Large scale GAN training for high fidelity natural image synthesis. CoRR abs/1809.11096. Cited by: §2.
  • [6] H. Cai, C. Bai, Y. Tai, and C. Tang (2018) Deep video generation, prediction and completion of human action sequences. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II, pp. 374–390. Cited by: §1.
  • [7] V. K. B. G, G. Carneiro, and I. D. Reid (2016) Learning local image descriptors with deep siamese and triplet convolutional networks by minimizing global loss functions. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 5385–5394. Cited by: §3.2, §5.5.
  • [8] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 2672–2680. Cited by: §1, §1, §2, §3.2, §4.3.
  • [9] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein gans. In Conference and Workshop on Neural Information Processing Systems, pp. 5769–5779. Cited by: §1, §1, §2.1, §2, §4.3.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp. 770–778. Cited by: §5.2.
  • [11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 6629–6640. Cited by: §4.3, §5.2.
  • [12] J. Huang, A. Singh, and N. Ahuja (2015) Single image super-resolution from transformed self-exemplars. In 2015 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp. 5197–5206. Cited by: §5.4.
  • [13] A. Jolicoeur-Martineau (2018) The relativistic discriminator: a key element missing from standard GAN. CoRR abs/1807.00734. Cited by: §1, §1, §1, §2.2, §2.
  • [14] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of GANs for improved quality, stability, and variation. In 6th International Conference on Learning Representations, ICLR 2018. Cited by: §2.
  • [15] D. P. Kingma and J. Ba (2014) Adam: A method for stochastic optimization. CoRR abs/1412.6980. Cited by: §5.1.
  • [16] N. Kodali, J. D. Abernethy, J. Hays, and Z. Kira (2018) On convergence and stability of gans. CoRR. Cited by: §2.1, §2.
  • [17] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel (1989) Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 2, [NIPS Conference, Denver, Colorado, USA, November 27-30, 1989], pp. 396–404. Cited by: §1, §5.1.
  • [18] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi (2017) Photo-realistic single image super-resolution using a generative adversarial network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 105–114. Cited by: §5.4.
  • [19] X. Li and M. T. Orchard (2001) New edge-directed interpolation. IEEE Trans. Image Processing 10 (10), pp. 1521–1527. Cited by: §5.4.
  • [20] Z. Lin, A. Khetan, G. C. Fanti, and S. Oh (2018) PacGAN: the power of two samples in generative adversarial networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., pp. 1505–1514. Cited by: §5.5.
  • [21] Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pp. 3730–3738. Cited by: §1, §5.2.
  • [22] M. Lucic, K. Kurach, M. Michalski, S. Gelly, O. Bousquet, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (2018) Are gans created equal? a large-scale study. In Advances in Neural Information Processing Systems 31, pp. 700–709. Cited by: §1, §5.2.
  • [23] Q. Mao, H. Lee, H. Tseng, S. Ma, and M. Yang (2019) Mode seeking generative adversarial networks for diverse image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 1429–1437. Cited by: §5.3.
  • [24] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley (2017) Least squares generative adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2813–2821. Cited by: §1, §2.
  • [25] L. M. Mescheder, A. Geiger, and S. Nowozin (2018) Which training methods for gans do actually converge?. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 3478–3487. Cited by: §4.3, §4.3.
  • [26] L. M. Mescheder, S. Nowozin, and A. Geiger (2017) The numerics of gans. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 1823–1833. Cited by: §4.3.
  • [27] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein (2017) Unrolled generative adversarial networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, Cited by: §1.
  • [28] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida (2018) Spectral normalization for generative adversarial networks. CoRR abs/1802.05957. Cited by: §1, §2, §5.2.
  • [29] V. Nagarajan and J. Z. Kolter (2017) Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 5591–5600. Cited by: §4.3.
  • [30] S. Pascual, A. Bonafonte, and J. Serrà (2017) SEGAN: speech enhancement generative adversarial network. CoRR abs/1703.09452. Cited by: §1.
  • [31] C. Ledig et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. Cited by: §1.
  • [32] G. Qi (2017) Loss-sensitive generative adversarial networks on lipschitz densities. CoRR abs/1701.06264. Cited by: §2.1, §2, §4.2, §4.2.
  • [33] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Cited by: §2.
  • [34] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115 (3), pp. 211–252. Cited by: §5.2.
  • [35] T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training gans. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp. 2226–2234. Cited by: §5.2.
  • [36] M. Sion (1958) On general minimax theorems.. Pacific J. Math. 8 (1), pp. 171–176. Cited by: §4.2.
  • [37] J. Su (2018) GAN-QP: a novel GAN framework without gradient vanishing and Lipschitz constraint. CoRR. Cited by: §2, §4.3.
  • [38] S. Subramanian, S. Rajeswar, F. Dutil, C. Pal, and A. C. Courville (2017) Adversarial generation of natural language. In Proceedings of the 2nd Workshop on Representation Learning for NLP, Rep4NLP@ACL 2017, Vancouver, Canada, August 3, 2017, pp. 241–251. Cited by: §1.
  • [39] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. S. Torr, and T. M. Hospedales (2018) Learning to compare: relation network for few-shot learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 1199–1208. Cited by: §3.1.
  • [40] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. Cited by: §5.2.
  • [41] H. Thanh-Tung, T. Tran, and S. Venkatesh (2019) Improving generalization and stability of generative adversarial networks. In International Conference on Learning Representations, ICLR 2019. Cited by: §5.1, §5.1, §5.2.
  • [42] F. Yu, Y. Zhang, S. Song, A. Seff, and J. Xiao (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. CoRR abs/1506.03365. Cited by: §1, §5.2.
  • [43] R. Zeyde, M. Elad, and M. Protter (2010) On single image scale-up using sparse-representations. In Curves and Surfaces, pp. 711–730. Cited by: §5.4.
  • [44] H. Zhang, I. J. Goodfellow, D. N. Metaxas, and A. Odena (2018) Self-attention generative adversarial networks. CoRR abs/1805.08318. Cited by: §1.
  • [45] H. Zhang, I. J. Goodfellow, D. N. Metaxas, and A. Odena (2018) Self-attention generative adversarial networks. CoRR abs/1805.08318. Cited by: §2.