Since they were first proposed, generative adversarial networks (GANs) have developed rapidly and found numerous applications in computer vision tasks, such as image generation [8, 13, 44], person re-identification, and image super-resolution. They have also recently been extended to natural language processing, video sequence synthesis, and speech synthesis.
Although tremendous success has been achieved in many fields, training GANs remains a tricky process that suffers from many issues, including instability between the generator and the discriminator and extreme sensitivity to network architecture and hyper-parameters. It has been shown that most of these issues arise because the supports of both the target distribution and the generated distribution are often low-dimensional relative to the base space and are therefore misaligned most of the time, causing the discriminator to collapse to a function that provides hardly any gradient to the generator.
To remedy this issue, recent works propose techniques based on the Integral Probability Metric (IPM), such as gradient penalty and spectral normalization. In IPM-based GANs, the discriminator is constrained to a specific class of functions so that it does not grow too quickly, which alleviates vanishing gradients.
However, the existing IPM methods also have their limits. For instance, the hyper-parameter tuning of the gradient penalty is mostly empirical, while spectral normalization imposes constraints on every convolutional layer, which hinders the learning capacity of the discriminator.
The authors of the relativistic GAN argue that non-IPM-based GANs are missing a relativistic discriminator, which IPM-based GANs already possess. The relativistic discriminator is necessary to make the training process analogous to divergence minimization and to produce sensible predictions based on the prior knowledge that half of the samples in the mini-batch are fake. Although they have shown the power of the relativistic discriminator, the potential of comparing the relation between the real and fake distributions remains to be explored.
In this paper, we explicitly study the effect of relation comparison in GANs by training the discriminator to determine whether the input paired samples are drawn from the same distribution (either real or fake). A relation network is presented, acting as the discriminator. A new triplet loss is also designed for training the GANs. In this way, the aforementioned problem of disjoint supports can be alleviated by projecting and merging the low-dimensional data distributions into a high-dimensional feature space.
Mathematically, we prove that our new triplet loss is a divergence and can reach a Nash equilibrium at which the generated data distribution converges to the real distribution. In addition, we analyze the oscillatory behavior that GANs exhibit on the Dirac-GAN and demonstrate that the proposed Relation GAN is locally convergent even without regularization.
Extensive experiments are conducted on conditional and unconditional image generation and on image translation tasks. The promising performance demonstrates that the proposed Relation GAN has great potential in various applications of GANs.
In summary, the contributions of this paper are as follows.
We propose a new training strategy for GANs that better leverages the relation between samples. Instead of separating real samples from generated ones, the discriminator is trained to determine whether a pair of samples comes from the same distribution.
We propose a relation network architecture as the discriminator and a triplet loss for training GANs. We show both theoretically and empirically that the relation network together with the triplet loss gives rise to a generated density that can exactly match that of the real data.
Extensive experiments on the 2D grid, Stacked MNIST, CelebA, LSUN, and CelebA-HQ datasets confirm that our method performs favourably against state-of-the-art methods such as relativistic GAN, WGAN-GP, Least Squares GAN (LSGAN), and vanilla GAN.
2 Related Work
The vanilla GAN minimizes the JS divergence between two distributions, which leads to vanishing gradients when the two distributions are disjoint. Recent works try to address this issue by designing new objective functions [24, 32, 37, 1] or more sophisticated network architectures [14, 45, 5, 33]. Others investigate regularization and/or normalization to constrain the capacity of the discriminator [28, 9, 16]. Recently, a relativistic discriminator has also been proposed. In the following, we review recent works that use different objective functions, as well as relativistic GANs, a special case closely related to our approach.
2.1 Different Objective Functions in GANs
Generally, there are two kinds of loss functions in GANs: the minimax GAN and the non-saturating (NS) GAN. In the former, the discriminator minimizes the negative log-likelihood for the binary classification task. In the latter, the generator maximizes the probability of generated samples being real. The non-saturating loss is known to outperform the minimax variant empirically. Among other objectives, the loss-sensitive GAN tries to solve the gradient vanishing problem by focusing on training samples with low authenticity. WGAN replaces the JS divergence with the Wasserstein distance, which can measure the distance between two distributions even when they are disjoint. In addition, adding noise to both real and generated samples has been proposed to further alleviate the impact of disjoint distributions. WGAN-GP improves WGAN by replacing the weight clipping constraint with a gradient penalty, which enforces the Lipschitz constraint on the discriminator by penalizing the norm of its gradient. DRAGAN combines elements of WGAN and LSGAN and improves the loss function only to a certain extent; its training stability is controlled by constantly updating the coefficient of the latter term.
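The difference between the minimax and non-saturating generator losses can be illustrated with a small sketch. It assumes a sigmoid discriminator output D = sigmoid(logit) and the standard textbook forms log(1 - D) and -log D; the function names are hypothetical and chosen only for this example.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def minimax_generator_loss(logit_fake: float) -> float:
    # Minimax form: the generator minimizes log(1 - D(G(z))).
    return math.log(1.0 - sigmoid(logit_fake))


def non_saturating_generator_loss(logit_fake: float) -> float:
    # Non-saturating form: the generator minimizes -log D(G(z)).
    return -math.log(sigmoid(logit_fake))


def grad_minimax(logit_fake: float) -> float:
    # d/dl log(1 - sigmoid(l)) = -sigmoid(l)
    return -sigmoid(logit_fake)


def grad_non_saturating(logit_fake: float) -> float:
    # d/dl (-log sigmoid(l)) = -(1 - sigmoid(l))
    return -(1.0 - sigmoid(logit_fake))


# When the discriminator confidently rejects a fake sample (very negative
# logit), the minimax gradient is close to zero while the non-saturating
# gradient stays close to -1.
confident_reject = -6.0
saturated = grad_minimax(confident_reject)
non_saturated = grad_non_saturating(confident_reject)
```

In this toy setting the minimax gradient shrinks exactly when the generator is doing badly, which is the usual argument for preferring the non-saturating variant.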
2.2 Relativistic GANs
Instead of training discriminators to predict the absolute probability of an input sample being real, the relativistic GAN proposes a relativistic discriminator, which estimates the probability that a given real sample is more realistic than a randomly sampled fake sample. Although it bears a similar spirit, our method differs from the relativistic GAN in that we adopt a relation network as the discriminator to estimate the relation score of a paired input. In comparison, the discriminator in the relativistic GAN treats input samples separately and relies on a ranking loss (e.g., hinge loss) to explore their relation. The idea of merging the features and comparing the relation between samples from two distributions has not been explored in the GAN literature. In addition, our method proposes a new triplet loss to leverage the power of paired relation comparison, allowing more stability and better diversity for GANs without applying any IPM methods.
3 The Relation GAN Framework
3.1 Relation Net Architecture
In traditional GANs, a discriminator is trained to distinguish real samples from fake ones, and a generator is trained to confuse the discriminator by generating realistic samples. Consider a real data distribution and the data distribution produced by the generator. Rather than training the discriminator on real and fake data independently, we propose to train a discriminator that predicts a relation score for a paired input, indicating whether the paired samples come from the same distribution (either real or generated).
Inspired by the success of the relation net architecture in other computer vision areas, our discriminator consists of two modules, an embedding module and a relation module, as shown in Figure 1. For a pair of input samples, the embedding module first maps each sample into a high-dimensional feature space. The corresponding features are then merged and fed into the relation module to produce the relation score for the input pair. For ease of description, we call paired inputs containing both real and fake samples asymmetric pairs, and those containing samples from the same distribution (either real or fake) symmetric pairs. The training process is then formulated as a min-max game (see Section 3.2), where the discriminator aims to maximize the relation scores of asymmetric sample pairs and minimize those of symmetric ones. Meanwhile, the generator is trained to confuse the discriminator by minimizing the relation scores of asymmetric sample pairs containing real and generated samples.
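The two-module structure described above can be sketched with a toy NumPy model. This is only an illustration under stated assumptions: the embedding module is a single shared linear layer with ReLU, the features are merged by concatenation, and the relation module is a small two-layer network. The class name, layer sizes, and random initialization are all hypothetical.

```python
import numpy as np


class RelationDiscriminator:
    """Toy relation discriminator: shared embedding module + relation module."""

    def __init__(self, in_dim, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Embedding module: one linear layer shared by both inputs (assumption).
        self.w_embed = rng.normal(0.0, 0.1, size=(in_dim, feat_dim))
        # Relation module: two-layer network on the merged features (assumption).
        self.w_rel1 = rng.normal(0.0, 0.1, size=(2 * feat_dim, hidden_dim))
        self.w_rel2 = rng.normal(0.0, 0.1, size=hidden_dim)

    def embed(self, x):
        # Map each sample into the feature space.
        return np.maximum(x @ self.w_embed, 0.0)

    def relation_score(self, x_a, x_b):
        # Merge the two embeddings by concatenation, then score the pair.
        merged = np.concatenate([self.embed(x_a), self.embed(x_b)], axis=-1)
        hidden = np.maximum(merged @ self.w_rel1, 0.0)
        return hidden @ self.w_rel2


disc = RelationDiscriminator(in_dim=4, feat_dim=8, hidden_dim=16)
real_a = np.ones((5, 4))
real_b = np.ones((5, 4)) * 0.9
fake = np.zeros((5, 4))
symmetric_scores = disc.relation_score(real_a, real_b)   # real-real pairs
asymmetric_scores = disc.relation_score(real_a, fake)    # real-fake pairs
```

Each call returns one relation score per pair in the batch; training would push asymmetric scores up and symmetric scores down on the discriminator side.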
3.2 The Min-Max Game
The min-max game in training GANs is conducted by optimizing the losses of the discriminator and the generator iteratively. In non-IPM GANs, the generalized losses of the discriminator and the generator can be presented as follows:
where and are scalar-to-scalar functions, is the distribution of real data, and denotes the generated data distribution.
In our Relation GAN, the loss functions of the discriminator and the generator are formulated as follows:
where and are also scalar to scalar function.
The goal of the relation discriminator is to learn a parameterized loss function that separates symmetric and asymmetric sample pairs by a desired margin. The generator can then be trained to minimize this margin by generating realistic samples.
Inspired by the success of the triplet loss, we formulate a similar loss function for our Relation GAN as follows:
Here, two samples are drawn from the real data distribution, one sample is drawn from the generated data distribution, and one sample is drawn from the data produced by the generator in the previous optimization step. We use a distance metric to replace the constant "margin" in the original triplet loss. This variable constraint leads to a smaller difference in relation scores when the distance between the two compared samples is smaller, which is more flexible than the original fixed margin. Our experiments also show the superiority of the new triplet loss with this variable margin.
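As a rough illustration of a triplet loss whose margin is a distance rather than a constant, the sketch below assumes a hinge form and an L2 distance between the compared real and generated samples. Both choices, and all function and argument names, are assumptions made for this example and may differ from the exact formulation used in Relation GAN.

```python
import numpy as np


def distance_margin_triplet_loss(score_sym, score_asym, x_real, x_fake):
    """Hinge-style triplet loss with a per-pair, distance-based margin.

    score_sym:  relation scores of symmetric pairs, shape (batch,)
    score_asym: relation scores of asymmetric pairs, shape (batch,)
    x_real, x_fake: the compared samples, shape (batch, ...)
    """
    batch = len(x_real)
    diff = (x_real - x_fake).reshape(batch, -1)
    # Assumed margin: L2 distance between the two compared samples.
    margin = np.linalg.norm(diff, axis=1)
    # Penalize pairs whose asymmetric score does not exceed the symmetric
    # score by at least the margin.
    return float(np.mean(np.maximum(0.0, margin + score_sym - score_asym)))


x_real = np.zeros((2, 3))
x_fake = np.ones((2, 3))            # L2 distance to x_real is sqrt(3)
well_separated = distance_margin_triplet_loss(
    np.array([0.0, 0.0]), np.array([2.0, 2.0]), x_real, x_fake)
not_separated = distance_margin_triplet_loss(
    np.array([0.0, 0.0]), np.array([0.0, 0.0]), x_real, x_fake)
```

With this form, samples that are close together demand only a small gap between the two relation scores, which mirrors the flexibility argument made in the text.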
3.3 A Variant Loss
Since the training batch size is limited, the sampled distribution of each batch may deviate from the real data distribution. For an input batch of paired samples, the loss function in (3.2) can be written as follows:
Our triplet loss is designed to reduce the relation scores of symmetric sample pairs and increase those of asymmetric ones.
However, when the real sample distribution is fairly uniform with small variance, the original loss is strict and easily disturbed by outliers in a batch. For these cases, we design a variant of our new triplet loss as follows:
Here the subscript indexes the samples in a batch. The variant loss is more relaxed and not easily disturbed by extreme samples in the same batch. It performs better on evenly distributed datasets.
Thus, we suggest employing the variant triplet loss on uniformly distributed data, e.g., datasets containing a single class. Our experimental results on single-class datasets such as CelebA and LSUN confirm this.
4 Theory Proof and Analysis
As discussed in the introduction, the optimal discriminator of most GANs is a divergence. In this section, we first prove that the proposed discriminator based on the relation net also has this property, and then show distributional consistency under our Lipschitz continuity assumption.
4.1 A New Divergence
A divergence is a function of two variables that satisfies the following definition:
Definition 1 If a function of two variables satisfies the following properties:
Then it is a divergence between the two variables.
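For reference, a divergence in the usual statistical sense satisfies the two properties below. The symbols $\mathcal{D}$, $p$, and $q$ are notation assumed here for illustration; this is the standard textbook form and may differ in detail from the definition intended in this section.

```latex
\mathcal{D}[p, q] \ge 0,
\qquad
\mathcal{D}[p, q] = 0 \iff p = q .
```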
Assumption 1 In the training process, when not reach the optimal , ought to be more realistic than , and ought to give bigger relation score to the paired input than . ought to be more realistic than also means, is bigger than
Under this assumption, we show the loss function of our relation discriminator is also a divergence in Supplementary 1.
4.2 Distributional Consistency
We use to denote the parameterized function discriminator and to denote the parameterized function of generator. Based on , we use the definition of Lipschitz assumption of data density as follows:
Definition 2 For any two samples and , the loss function is Lipschitz continuous with respect to a distance metric if
with a bounded Lipschitz constant , i.e.,
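For reference, a Lipschitz condition of this kind is commonly written as below. The symbols $F$ (the function), $x$ and $z$ (the two samples), $\Delta$ (the distance metric), and $\kappa$ (the Lipschitz constant) are assumed notation; this is a sketch of the usual form rather than a reproduction of the original equation.

```latex
\lvert F(x) - F(z) \rvert \le \kappa \, \Delta(x, z),
\qquad
\kappa < +\infty .
```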
Assumption 2 The data density is supported in a compact set and it is Lipschitz continuous wrt with a bounded constant which is satisfied with Definition 2. Then we show the existence of Nash equilibrium such that both the function and the density of generated samples are Lipschitz. Same as the , we have both (, ) and (, ) are convex in and in . Then, according to the Sion’s theorem , with and being optimized, there exists a Nash equilibrium (, ) We also have the following lemma.
Lemma 1 Under Assumption 2, there exists a Nash equilibrium such that both the discriminator function and the density of generated samples are Lipschitz.
We can then prove that, at the Nash equilibrium, the density of the samples produced by the generator converges to the real data distribution, which is stated in Lemma 2:
Lemma 2 Under Assumption 2, for a Nash equilibrium in Lemma 1, we have
Thus, the generated density converges to the real data density. The proof of this lemma is given in Supplementary 2.
Figure 2 panels: (a) Vanilla GAN, (b) WGAN, (c) WGAN-GP, (d) GAN-QP, (e) Relation GAN.
4.3 The Convergence
In the literature, GANs are often treated as dynamic systems in order to study their training convergence. This idea can be traced back to the Dirac GAN, which describes a simple yet prototypical counterexample for understanding whether GAN training is locally or globally convergent. To further analyze the convergence of training the proposed Relation GAN, we also adopt the Dirac GAN theory. However, the original Dirac GAN analysis only discusses the case where the data distributions are 1-D. We extend this theory to the 2-D case to gain a better understanding.
Definition 3 The Dirac-GAN consists of a (univariate) generator distribution and a linear discriminator , where denotes the parameter of the generator, is a 2-D vector, and represents the parameter of the discriminator. The real data distribution is a Dirac distribution concentrated at .
Suppose the real sample point is a fixed vector, and the fake sample is another vector that also serves as the parameter of the generator. The discriminator uses the simplest linear model, whose weights are the parameters of the discriminator. The Dirac GAN asks whether, in such a minimal model, the fake sample eventually converges to the real sample. Specifically, in Relation GAN, our Dirac discriminator can be simplified into a form whose parameters are those of the embedding module and the relation module.
Based on the dynamic analysis of GANs in Supplementary 3, we obtain numerical solutions of the GANs' dynamic equations from a given initial point, as shown in Figure 2. Prior work finds that most unregularized GANs are not locally convergent. In our 2-D Dirac GANs, the numerical solutions of WGAN, WGAN-GP, GAN-QP, and vanilla GAN likewise oscillate near the real sample or struggle to converge to the real sample point, while our Relation GAN succeeds in converging. This indicates that our GAN has good local convergence.
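To illustrate the kind of non-convergence discussed above, the following sketch simulates simultaneous gradient descent-ascent for an unregularized vanilla GAN objective on a 2-D Dirac-GAN. It is a minimal sketch under assumptions: the real data sit at the origin, the discriminator is linear, D(x) = psi · x, and the objective uses f(t) = -log(1 + exp(-t)). The step size, initial point, and all names are illustrative and are not taken from the paper's own simulation.

```python
import numpy as np


def f_prime(t):
    # Derivative of f(t) = -log(1 + exp(-t)), i.e. 1 / (1 + exp(t)).
    return 1.0 / (1.0 + np.exp(t))


def simulate_dirac_gan(theta0, psi0, step=0.1, n_steps=500):
    """Simultaneous gradient updates on L(theta, psi) = f(psi . theta) + f(0).

    theta: generator parameter (position of the fake sample, 2-D)
    psi:   weights of the linear discriminator (2-D)
    """
    theta = np.asarray(theta0, dtype=float)
    psi = np.asarray(psi0, dtype=float)
    trajectory = [theta.copy()]
    for _ in range(n_steps):
        g = f_prime(theta @ psi)
        # Generator descends on L, discriminator ascends on L, using the
        # same (old) values of theta and psi on the right-hand side.
        theta, psi = theta - step * g * psi, psi + step * g * theta
        trajectory.append(theta.copy())
    return np.array(trajectory), psi


theta0, psi0 = [1.0, 0.5], [0.5, -1.0]
trajectory, psi_final = simulate_dirac_gan(theta0, psi0)
initial_norm = np.sqrt(np.sum(np.square(theta0)) + np.sum(np.square(psi0)))
final_norm = np.sqrt(np.sum(trajectory[-1] ** 2) + np.sum(psi_final ** 2))
```

For this update rule the joint norm of (theta, psi) grows by a factor of sqrt(1 + step² · f'²) at every step, so the iterates circle around the equilibrium and drift outward instead of converging to it.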
5 Experiments
We first evaluate the proposed Relation GAN on the 2D synthetic dataset and the Stacked MNIST dataset to demonstrate the diversity of the generated data and the stability of the generator. We then perform image generation tasks with our method to show its superiority in synthesizing natural images. Finally, an ablation study is conducted to verify the effects of the feature merging mechanism in relation nets and of the proposed triplet loss.
Figure 3 panels: (a) Vanilla GAN, (b) LSGAN, (c) WGAN-GP, (d) Relativistic GAN, (e) Relation GAN.
5.1 The Diversity of Generated Data
We evaluate our relation discriminator on the 2D 8-Gaussian distribution, the 2D 25-Gaussian distribution, and the 2D swissroll distribution. The experimental settings follow prior work. The results generated by our method and four popular methods under the same settings are shown in Figure 3. Compared with the other methods, ours fits these 2D distributions better.
For Stacked MNIST, each of the three channels in each sample is classified by a pre-trained MNIST classifier, and the resulting three digits determine which of the 1000 modes the sample belongs to. We measure the number of modes captured with the pre-trained classifier. We use the Adam optimizer for all experiments. Our results are shown in Table 5.1. We find that our Relation GAN achieves the best mode coverage, reaching all 1,000 modes.
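The mode-counting step described above can be sketched as follows. The sketch assumes the per-channel digit predictions have already been produced by a classifier; the function name and the way the three digits are combined into a single mode index are illustrative choices.

```python
def count_modes(digit_triples):
    """Count distinct Stacked MNIST modes among classified samples.

    digit_triples: iterable of (d1, d2, d3), the digits (0-9) predicted for
    the three channels of each generated sample.
    """
    # Three digits identify one of 10 * 10 * 10 = 1000 possible modes.
    modes = {100 * d1 + 10 * d2 + d3 for d1, d2, d3 in digit_triples}
    return len(modes)


# Two samples share the mode (0, 0, 1); the third falls into (9, 9, 9).
predictions = [(0, 0, 1), (0, 0, 1), (9, 9, 9)]
covered = count_modes(predictions)
```

A generator that covers every mode would yield a count of 1000 on a sufficiently large set of samples.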
5.2 Unconditional Image Generation
Datasets We provide comparisons on four datasets, namely CIFAR-10, CelebA, LSUN-BEDROOM, and CelebA-HQ. The LSUN-BEDROOM dataset contains 3M images, which are randomly partitioned into a test set of around 30k images and a training set containing the rest. We use the version of CelebA-HQ with 30k images. We only compare our method with Relativistic GAN and WGAN-GP on CelebA-HQ due to limited computational resources.
Settings For CIFAR-10, we use the ResNet architecture proposed in prior work (with spectral normalization layers removed). For CelebA, LSUN, and CelebA-HQ, we use a DCGAN architecture. We apply the Adam optimizer in all experiments, as Table 5.2 shows. We use one discriminator update per generator update and a batch size of 64. Other details of our experimental settings are provided in the Supplementary.
Evaluation To compare the sample quality of different models, we consider three scores: Inception Score (IS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID), all of which are based on the pre-trained Inception network.
Results and Analysis Some randomly generated samples on three datasets are shown in the figure of generated samples. More generated images and evaluation scores are provided in Supplementary 6. From Table 3, we find that Relation GAN is highly competitive on single-class datasets, i.e., CelebA and LSUN, and achieves the best performance on CIFAR-10. As discussed in Sec. 3.3, the variant loss is more relaxed and suits evenly distributed datasets, while the loss in Eq. (3.3) is stricter and performs better on multi-class or harder datasets (it also performs best on Stacked MNIST).
5.3 Conditional Image Generation
We compare with MSGAN, one of the best conditional GAN models, on the conditional CIFAR-10 dataset. The experiment is conducted by simply replacing the MS loss in MSGAN with the relation loss. Table 4 presents the FID results.
5.4 Image Translation
In addition to image generation, GANs have also made promising progress in image translation. They have shown great success in a range of image translation tasks, including style transfer, image enhancement, image super-resolution, and image segmentation. We conduct three related experiments on image style transfer and image super-resolution.
Image Style Transfer For the image style transfer task, we adopt CycleGAN as our baseline model to translate Monet's paintings into photographs. The FID score is used to evaluate the quality of the generated images. Table 5 compares the FID scores of the generated images; a lower FID indicates a smaller perceptual difference between the target-domain images and the generated images. We find that the relation loss performs better than the original adversarial loss in CycleGAN and performs best overall.
Image Super Resolution For the image super-resolution task, we employ SRGAN with the relativistic loss, the most recently proposed loss for GANs, as our baseline, and denote this baseline as SRGAN. The training and validation sets are sampled from VOC2012; the training set has 16700 images and the validation set has 425 images. We compare PSNR and SSIM on three popular SR datasets: Set5, Set14, and Urban100.
Table 6 lists the PSNR and SSIM of different approaches on five datasets. We can observe that the FID scores of the proposed algorithm are better than those of the original method on the photo-painting datasets.
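For readers unfamiliar with PSNR, the metric can be sketched as below. This is the standard definition, 10 · log10(MAX² / MSE), written as a minimal NumPy function; the function name and the default peak value of 255 for 8-bit images are assumptions for illustration and do not describe the evaluation code used in the experiments.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
worst_case = psnr(black, white)   # MSE equals MAX^2, so PSNR is 0 dB
identical = psnr(black, black)    # no error, so PSNR is infinite
```

Higher PSNR means the super-resolved image is closer to the reference in a pixel-wise sense; SSIM complements it with a structural comparison.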
5.5 Ablation Study
We conduct the ablation study on image generation datasets. We first compare our triplet loss with the siamese loss; the results are shown in Table 7. The formulation of the siamese loss function is given in Supplementary 4. Second, we take a closer look at the impact of our embedding module and relation module. The paired numbers in Table 8 denote different discriminator architectures: the first is the number of res-blocks in the embedding module (EM) and the second is the number of res-blocks in the relation module (RM). For example, "(0+3)" means that the samples are concatenated after the first convolutional layer and then fed into a relation module containing 3 res-blocks. "no EM" means that the paired inputs are packed together at the very beginning of the discriminator. All experiments are conducted on CIFAR-10.
Results and Analysis From Table 7, we find that the results of the proposed triplet loss are much better than those of the siamese loss. The "-" indicates that the model collapsed during training. The results in Table 8 show that a larger EM improves performance, which also demonstrates the effectiveness of our embedding strategy.
6 Conclusion
In this paper, we propose Relation GANs. A relation network architecture is designed and used as the discriminator, which is trained to determine whether a pair of input samples comes from the same distribution. The generator is jointly trained with the discriminator, using a triplet loss, to confuse its decision.
Mathematically, we prove that the optimal discriminator based on the relation network is a divergence, indicating that the distance between the generated data distribution and the real data distribution becomes progressively smaller during training. We also prove that the generated data distribution converges to the real data distribution at the Nash equilibrium. In addition, we analyze our method and several other GANs as dynamic systems, and demonstrate that our GAN has excellent convergence by analyzing the dynamic system of the Dirac GANs.
The experimental results on simple 2D distributions and Stacked MNIST verify the effectiveness of Relation GAN, especially in addressing the mode collapse problem. Our Relation GAN not only achieves state-of-the-art performance on unconditional and conditional image generation tasks with basic architectures and training settings, but also achieves promising results in image translation tasks compared with other GAN losses.
-  (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 214–223. Cited by: §2.1, §2, §4.3.
-  (2018) Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, neurips 2018, 3-8 december 2018, montréal, canada. Cited by: §1.
-  (2018) Demystifying MMD gans. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, Cited by: §5.2.
-  (2010) Learning mid-level features for recognition. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pp. 2559–2566. Cited by: §5.2.
-  (2018) Large scale GAN training for high fidelity natural image synthesis. CoRR abs/1809.11096. Cited by: §2.
-  (2018) Deep video generation, prediction and completion of human action sequences. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II, pp. 374–390. Cited by: §1.
-  (2016) Learning local image descriptors with deep siamese and triplet convolutional networks by minimizing global loss functions. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 5385–5394. Cited by: §3.2, §5.5.
-  (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 2672–2680. Cited by: §1, §1, §2, §3.2, §4.3.
-  (2017) Improved training of wasserstein gans. In Conference and Workshop on Neural Information Processing Systems, pp. 5769–5779. Cited by: §1, §1, §2.1, §2, §4.3.
-  (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. Cited by: §5.2.
-  (2017) GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 6629–6640. Cited by: §4.3, §5.2.
-  (2015) Single image super-resolution from transformed self-exemplars. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp. 5197–5206. Cited by: §5.4.
-  (2018) The relativistic discriminator: a key element missing from standard GAN. CoRR abs/1807.00734. Cited by: §1, §1, §1, §2.2, §2.
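-  (2018) Progressive growing of gans for improved quality, stability, and variation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. Cited by: §2.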
-  (2014) Adam: A method for stochastic optimization. CoRR abs/1412.6980. Cited by: §5.1.
-  (2018) On convergence and stability of gans. CoRR. Cited by: §2.1, §2.
-  (1989) Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 2, [NIPS Conference, Denver, Colorado, USA, November 27-30, 1989], pp. 396–404. Cited by: §1, §5.1.
-  (2017) Photo-realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 105–114. Cited by: §5.4.
-  (2001) New edge-directed interpolation. IEEE Trans. Image Processing 10 (10), pp. 1521–1527. Cited by: §5.4.
-  (2018) PacGAN: the power of two samples in generative adversarial networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., pp. 1505–1514. Cited by: §5.5.
-  (2015) Deep learning face attributes in the wild. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pp. 3730–3738. Cited by: §1, §5.2.
-  (2018) Are gans created equal? a large-scale study. In Advances in Neural Information Processing Systems 31, pp. 700–709. Cited by: §1, §5.2.
-  (2019) Mode seeking generative adversarial networks for diverse image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 1429–1437. Cited by: §5.3.
-  (2017) Least squares generative adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2813–2821. Cited by: §1, §2.
-  (2018) Which training methods for gans do actually converge?. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 3478–3487. Cited by: §4.3, §4.3.
-  (2017) The numerics of gans. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 1823–1833. Cited by: §4.3.
-  (2017) Unrolled generative adversarial networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, Cited by: §1.
-  (2018) Spectral normalization for generative adversarial networks. CoRR abs/1802.05957. Cited by: §1, §2, §5.2.
-  (2017) Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 5591–5600. Cited by: §4.3.
-  (2017) SEGAN: speech enhancement generative adversarial network. CoRR abs/1703.09452. Cited by: §1.
-  (2017) Photo-realistic single image super-resolution using a generative adversarial network. Cited by: §1.
-  (2017) Loss-sensitive generative adversarial networks on lipschitz densities. CoRR abs/1701.06264. Cited by: §2.1, §2, §4.2, §4.2.
-  (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Cited by: §2.
-  (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115 (3), pp. 211–252. Cited by: §5.2.
-  (2016) Improved techniques for training gans. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp. 2226–2234. Cited by: §5.2.
-  (1958) On general minimax theorems.. Pacific J. Math. 8 (1), pp. 171–176. Cited by: §4.2.
-  (2018) GAN-QP: a novel GAN framework without gradient vanishing and Lipschitz constraint. Cited by: §2, §4.3.
-  (2017) Adversarial generation of natural language. In Proceedings of the 2nd Workshop on Representation Learning for NLP, Rep4NLP@ACL 2017, Vancouver, Canada, August 3, 2017, pp. 241–251. Cited by: §1.
-  (2018) Learning to compare: relation network for few-shot learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 1199–1208. Cited by: §3.1.
-  (2016) Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. Cited by: §5.2.
-  (2019) Improving generalization and stability of generative adversarial networks. In International Conference on Learning Representations. Cited by: §5.1, §5.1, §5.2.
-  (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. CoRR abs/1506.03365. Cited by: §1, §5.2.
-  (2010) On single image scale-up using sparse-representations. In Curves and Surfaces, pp. 711–730. Cited by: §5.4.
-  (2018) Self-attention generative adversarial networks. CoRR abs/1805.08318. Cited by: §1, §2.