A Simple yet Effective Way for Improving the Performance of GANs

11/19/2019 · Yong-Goo Shin et al. · Korea University

This paper presents a simple but effective way to improve the performance of generative adversarial networks (GANs) without imposing training overhead or modifying the network architectures of existing methods. The proposed method employs a novel cascading rejection (CR) module for the discriminator, which extracts multiple non-overlapping features in an iterative manner. The CR module helps the discriminator effectively distinguish between real and generated images, which results in strong penalization of the generator. In order to deceive the more robust discriminator containing the CR module, the generator learns to produce images that are more similar to real images. Since the proposed CR module requires only a few simple vector operations, it can be readily applied to existing frameworks with marginal training overhead. Quantitative evaluations on various datasets including CIFAR-10, Celeb-HQ, LSUN, and tiny-ImageNet confirm that the proposed method significantly improves the performance of GANs and conditional GANs in terms of the Frechet inception distance (FID), which reflects the diversity and visual quality of the generated images.


1 Introduction

Generative adversarial networks (GANs) [7] based on deep convolutional neural networks (CNNs) have shown considerable success in capturing complex and high-dimensional image data, and have been utilized in numerous applications including image-to-image translation [12, 4, 36], image inpainting [32, 26, 24], and text-to-image synthesis [23, 11]. Despite the recent advances, however, the training of GANs is known to be unstable and sensitive to the choice of hyper-parameters [33]. To address this problem, some researchers have proposed novel generator and discriminator structures [13, 33, 35]. These methods effectively improve the image generation task on challenging datasets such as ImageNet [15], but they are difficult to apply to various other applications since they impose training overhead or require modifications to the network architecture.

Several works [1, 8, 19, 21, 17, 3] attempted to stabilize the training of GANs by using novel loss functions or regularization terms. Arjovsky et al. [1] applied the Wasserstein distance to the adversarial loss function, which shows better training stability than the original loss function. Gulrajani et al. [8] extended the method in [1] by adding a gradient regularization term, called the gradient penalty, to further stabilize the training procedure. Miyato et al. [19] proposed a weight normalization technique called spectral normalization, which limits the spectral norm of the weight matrices to stabilize the training of the discriminator. Combined with the projection-based discriminator in [20], this approach significantly improved the performance of the image generation task on ImageNet [15].

Recently, Chen et al. [3] combined GANs with self-supervised learning by adding an auxiliary loss function. Although this method improves the performance of GANs, it needs an additional task-specific network and objective functions for self-supervised learning, which results in extra computational load during training. Mao et al. [17] proposed a simple regularization term which maximizes the ratio of the distance between generated images with respect to the distance between their latent vectors. This technique imposes no training overhead and does not require modification of the network structure, which makes it readily applicable to various applications.

Inspired by the method in [17], this paper presents a simple yet effective way that greatly improves the performance of GANs without modifying the original network architectures or imposing training overhead. In general, the discriminator of GANs extracts features using multiple convolutional layers and predicts whether the input image is real or fake using a fully connected layer which produces a single scalar value, i.e. a probability value. Indeed, the operation of the fully connected layer predicting the single probability value is equivalent to an inner product. In other words, to predict the probability, this layer conducts the inner product between a single embedding vector, i.e. a weight vector, and an image feature vector obtained via the CNN. In the inner product process, however, the discriminator unintentionally ignores the part of the feature space which is perpendicular to the weight vector. Since the generator is trained through adversarial learning, which focuses on deceiving the discriminator, it produces images without considering the ignored feature space. For instance, if the discriminator first learns the global structure for distinguishing between real and generated images, the generator will naturally attempt to produce images having a global structure similar to that of real images without considering the local structure. In other words, the generator fails to fully capture the complex and high-dimensional feature space of the image data.

Figure 1: Example of the inner product. In the inner product process, the feature space which is perpendicular to w is ignored.

To alleviate this problem, we propose a novel cascading rejection (CR) module which extracts different features in an iterative procedure. The CR module leads the discriminator to effectively distinguish between real and generated images, which results in strong penalization of the generator. In order to deceive the robust discriminator equipped with the CR module, the generator produces images that are more similar to the real images. Since the proposed CR module needs only a few simple vector operations, it can be readily applied to existing frameworks with marginal training overhead. We conducted extensive experiments on various datasets including CIFAR-10 [28], Celeb-HQ [13, 16], LSUN [31], and tiny-ImageNet [5, 30]. Experimental results show that the proposed method significantly improves the performance of GANs and conditional GANs in terms of the Frechet inception distance (FID), which indicates the diversity and visual quality of the generated images.

In summary, in this paper we present:

  • A simple but effective technique for improving the performance of GANs without imposing training overhead or modifying the network structures.

  • A novel CR module which guides the discriminator to consider non-overlapping features of the images when distinguishing between real and generated images. By strongly penalizing the generator through the discriminator equipped with the CR module, the proposed method significantly improves the performance of GANs and conditional GANs in terms of the FID.

2 Preliminaries

2.1 Generative adversarial networks

Typically, GANs [7] consist of a generator G and a discriminator D. In GANs, both networks are trained simultaneously: G is trained to create new images which are indistinguishable from real images, whereas D is optimized to differentiate between real and generated images. This relation can be considered as a two-player min-max game in which G and D compete with each other. Formally, G (D) is trained to minimize (maximize) the loss function, called the adversarial loss, as follows:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]    (1)

where z and x denote a random noise vector and a real image sampled from the noise distribution p_z(z) and the real data distribution p_data(x), respectively. It is worth noting that D(x) and D(G(z)) are scalar values indicating the probabilities that x and G(z) came from the data distribution.

Figure 2: Example of the problem of the inner product in the discriminator. Even if the generator generates low-quality images lying in the ignored feature space, the discriminator cannot distinguish between real and generated images.

Figure 3: The illustration of the CR module.

Conditional GANs (cGANs), which aim at producing class-conditional images, have been actively researched [18, 21, 19, 34]. The cGANs techniques usually add conditional information y, such as a class label or a text description, to both the generator and the discriminator in order to control the data generation process in a supervised manner. This can be formally expressed as follows:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z, y), y))]    (2)

By training the networks based on the above equation, the generator can select the image category to be generated, which is not possible when employing the standard GANs framework.
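As a concrete reference, the sketch below (our own, not the authors' code) writes the value function of Eqs. (1) and (2) in TensorFlow, assuming hypothetical tensors d_real = D(x) and d_fake = D(G(z)) that already contain probabilities in (0, 1).

```python
import tensorflow as tf

def adversarial_value(d_real, d_fake, eps=1e-8):
    # V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]; D maximizes this value
    # while G minimizes it (Eq. 1). For cGANs (Eq. 2) the same expression is
    # used, with the condition y fed to both D(x, y) and G(z, y).
    return (tf.reduce_mean(tf.math.log(d_real + eps))
            + tf.reduce_mean(tf.math.log(1.0 - d_fake + eps)))
```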

2.2 Revisiting the Fully Connected Layer

To train the discriminator using Eqs. 1 and 2, the discriminator should produce a single scalar value as its output. To this end, in the last layer, the discriminator usually employs a fully connected layer with a single output channel, which acts like an inner product between an embedding vector w, i.e. a weight vector, and an image feature vector v obtained through multiple convolutional layers. Even if the last layer consists of several pixels, as in PatchGAN [12], the discriminator conducts the inner product for each pixel and averages all values for the adversarial loss function.

The inner product of v with w is illustrated in Fig. 1. As shown in Fig. 1, the inner product produces a scalar value, but it ignores the part of the feature space which is perpendicular to w. In other words, the discriminator only considers the feature space which is parallel to w when predicting the probability value for the adversarial loss. This problem often makes it difficult for the discriminator to effectively penalize the generator. For instance, as shown in Fig. 2, even if the generator produces low-quality images lying in the ignored feature space, the discriminator cannot distinguish between real and generated images. In other words, the generator can minimize the adversarial loss by producing low-quality images in the ignored feature space, which results in performance degradation of the generator. To alleviate this problem, this paper proposes the CR module, which encourages the discriminator to consider the ignored feature space in the last layer.
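The following toy NumPy snippet (ours, for illustration only) makes this concrete: any component of the feature vector that is perpendicular to the weight vector leaves the inner product, and hence the discriminator output, unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                       # embedding (weight) vector
v = rng.normal(size=8)                       # image feature vector from the CNN
v_perp = v - (v @ w) / (w @ w) * w           # component of v perpendicular to w

# Adding any multiple of v_perp to v leaves the inner product unchanged,
# so the single-output fully connected layer cannot "see" that component.
print(np.isclose(v @ w, (v + 3.0 * v_perp) @ w))  # True
```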

3 Proposed Method

3.1 Cascading rejection module

In Euclidean space, the fully-connected layer producing a single scalar value as an output is equivalent to the inner product P(v, w) as follows:

P(v, w) = v \cdot w = \|v\| \|w\| \cos\theta    (3)

where v and w indicate the input feature vector and the embedding vector, i.e. the weight vector, of the last fully connected layer, respectively. From this formulation, we observe that the feature ignored by the inner product, v_rej, can be obtained by the vector rejection of v from w, which is defined as follows:

v_{rej} = v - \frac{v \cdot w}{w \cdot w} w    (4)

In other words, by minimizing the adversarial loss using an additional probability value obtained through the inner product of v_rej and another weight vector ŵ, the discriminator is able to consider the ignored feature space.

Based on these observations, we propose the CR module, which iteratively conducts the inner product and vector rejection processes. Fig. 3 illustrates the proposed CR module, where v indicates the input feature vector of the CR module, which is obtained through the multiple convolutional layers in the discriminator. The iterative vector rejection process generates the vectors v_2, …, v_N, each of which represents the feature ignored by the previous inner product operation, whereas the iterative inner product produces N scalar values p_1, …, p_N, which indicate the probabilities that v came from the real data distribution. Note that the v_i are non-overlapping with each other since they are obtained through the vector rejection operation. By using the probabilities obtained via the CR module, the adversarial loss of the discriminator and that of the generator can be rewritten as

L_D = -\sum_{i=1}^{N} \lambda_i \left( \mathbb{E}_{x \sim p_{data}}[\log p_i^r] + \mathbb{E}_{z \sim p_z}[\log(1 - p_i^f)] \right)    (5)
L_G = -\sum_{i=1}^{N} \lambda_i \, \mathbb{E}_{z \sim p_z}[\log p_i^f]    (6)

where p_i^r (p_i^f) indicates the i-th probability value when the input of the discriminator is a real (generated) image, and λ_i is a hyper-parameter which controls the relative importance of each loss term. In order to predict the probability using the more essential features in the earlier stages of the CR module, we set λ_i as follows:

(7)

When N is one, the loss functions in Eqs. 5 and 6 are equivalent to the original ones in Eq. 1. In contrast, when N is larger than one, the discriminator and the generator should consider the ignored feature space in order to minimize L_D and L_G, respectively. It is worth noting that since the iterative inner product and vector rejection processes are simple vector operations, the proposed CR module imposes almost no training overhead. In addition, since the proposed CR module is appended after the last fully connected layer in the discriminator, there is no modification to the existing architecture of the discriminator.
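To make the procedure concrete, the following sketch (our own reading of Eqs. (3)–(6) and Fig. 3, not the authors' released code) implements the iterative inner product and vector rejection for a batch of feature vectors; the function names, the per-stage weight list, and the sigmoid mapping from inner products to probabilities are our assumptions, and the per-stage weights λ_i follow whatever schedule Eq. (7) prescribes.

```python
import tensorflow as tf

def cascading_rejection(v, weights, eps=1e-8):
    """v: [batch, k] feature vectors from the discriminator backbone.
    weights: list of N weight vectors, each of shape [k].
    Returns the N stage outputs p_1, ..., p_N (raw inner products)."""
    outputs = []
    for w in weights:
        p = tf.reduce_sum(v * w, axis=-1)            # inner product, Eq. (3)
        outputs.append(p)
        coeff = p / (tf.reduce_sum(w * w) + eps)     # projection coefficient
        v = v - coeff[:, None] * w                   # vector rejection, Eq. (4)
    return outputs

def cr_adversarial_losses(p_real, p_fake, lambdas, eps=1e-8):
    # Weighted sum over the N stages, mirroring Eqs. (5) and (6); the raw
    # inner products are mapped to probabilities with a sigmoid here
    # (an assumption about the exact parameterization).
    d_loss, g_loss = 0.0, 0.0
    for lam, pr, pf in zip(lambdas, p_real, p_fake):
        pr, pf = tf.sigmoid(pr), tf.sigmoid(pf)
        d_loss += -lam * (tf.reduce_mean(tf.math.log(pr + eps))
                          + tf.reduce_mean(tf.math.log(1.0 - pf + eps)))
        g_loss += -lam * tf.reduce_mean(tf.math.log(pf + eps))
    return d_loss, g_loss
```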

Figure 4: The illustration of the conditional projection discriminator [20].

3.2 Conditional Cascading Rejection module

In this subsection, we introduce the CR module for cGANs, called the conditional cascading rejection (cCR) module. Among the various cGANs frameworks [18, 23, 25, 22, 21, 20], we design the cCR module based on the conditional projection discriminator [20], which shows superior performance to other existing discriminators for cGANs. As depicted in Fig. 4, the conditional projection discriminator takes an inner product between the embedded condition vector w_y, which is a different vector for each given condition, and the feature vector of the discriminator, i.e. v, so as to impose a regularity condition. Based on this regularity condition, like the standard discriminator, the conditional projection discriminator predicts the conditional probability P(v, w, w_y) as follows:

P(v, w, w_y) = v \cdot w + v \cdot w_y = v \cdot (w + w_y)    (8)

Based on the above equation, we design the cCR module by replacing the w in the CR module with (w + w_y), as shown in Fig. 5. More specifically, the i-th vector rejection process of the cCR module can be expressed as follows:

v_{i+1} = v_i - \frac{v_i \cdot (w_i + w_y)}{\|w_i + w_y\|^2} (w_i + w_y)    (9)

Unlike the CR module for GANs, the cCR module conducts the vector rejection by considering the conditional information. After predicting N probabilities via the cCR module, the adversarial loss can be easily computed using Eqs. 5 and 6.
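A corresponding sketch of the conditional variant (again our interpretation of Eq. (9) and Fig. 5, with the per-sample condition embedding w_y as an assumed input, not the authors' implementation) simply replaces each stage weight vector with its sum with the condition embedding:

```python
import tensorflow as tf

def conditional_cascading_rejection(v, weights, w_y, eps=1e-8):
    """v: [batch, k] features; weights: list of N [k] stage weight vectors;
    w_y: [batch, k] embedded condition vector, one per sample (as in the
    projection discriminator). Returns the N conditional stage outputs."""
    outputs = []
    for w in weights:
        u = w + w_y                                   # (w_i + w_y), Eq. (9)
        p = tf.reduce_sum(v * u, axis=-1)             # conditional inner product
        outputs.append(p)
        coeff = p / (tf.reduce_sum(u * u, axis=-1) + eps)
        v = v - coeff[:, None] * u                    # conditional rejection
    return outputs
```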

Figure 5: The illustration of the cCR module. In this paper, we propose the cCR based on the conditional projection discriminator in [20].

4 Experiments

4.1 Implementation details

In order to evaluate the effectiveness of the CR module, we conducted extensive experiments using the CIFAR-10 [28], LSUN [31], Celeb-HQ [13, 16], and tiny-ImageNet [5, 30] datasets. The CIFAR-10 [28] and LSUN [31] datasets consist of 10 classes, whereas tiny-ImageNet [5, 30], which is a subset of ImageNet [5], is composed of 200 classes. Among the large number of images in the LSUN dataset, we randomly selected 30,000 images for each class. In addition, we resized the images from the Celeb-HQ, LSUN, and tiny-ImageNet datasets to 64 × 64 pixels. For the objective function, we adopted the hinge version of the adversarial loss. The hinge loss with the CR module is defined as follows:

L_D = \sum_{i=1}^{N} \lambda_i \left( \mathbb{E}_{x \sim p_{data}}[\max(0, 1 - p_i^r)] + \mathbb{E}_{z \sim p_z}[\max(0, 1 + p_i^f)] \right)    (10)
L_G = -\sum_{i=1}^{N} \lambda_i \, \mathbb{E}_{z \sim p_z}[p_i^f]    (11)
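A minimal sketch of the hinge-style losses in Eqs. (10) and (11), reusing the per-stage outputs of the cascading_rejection sketch above (the stage inner products are treated directly as hinge logits; this is our reading, not the authors' exact implementation):

```python
import tensorflow as tf

def cr_hinge_losses(logits_real, logits_fake, lambdas):
    """logits_real/logits_fake: lists of N [batch] stage outputs;
    lambdas: list of N per-stage weights (Eq. 7)."""
    d_loss, g_loss = 0.0, 0.0
    for lam, lr, lf in zip(lambdas, logits_real, logits_fake):
        d_loss += lam * (tf.reduce_mean(tf.nn.relu(1.0 - lr))    # real term
                         + tf.reduce_mean(tf.nn.relu(1.0 + lf))) # fake term
        g_loss += -lam * tf.reduce_mean(lf)                      # generator term
    return d_loss, g_loss
```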

Since all parameters in the generator and the discriminator, including the CR module, are differentiable, we performed optimization using the Adam optimizer [14], a stochastic optimization method with adaptive estimation of moments. We set the Adam parameters β1 and β2 to 0 and 0.9, respectively, and set the learning rate to 0.0002. During the training procedure, we updated the discriminator five times per generator update. For the CIFAR-10 dataset, we used a batch size of 64 and trained the generator for 100k iterations, whereas the generators for the Celeb-HQ and LSUN datasets were trained for 100k iterations with a batch size of 32. For tiny-ImageNet, we set the batch size to 64 and trained the generator for 400k iterations (about 100 epochs). Our experiments were conducted on an Intel(R) Xeon(R) CPU E3-1245 v5 and an RTX 2080 Ti GPU, and implemented in TensorFlow.

Table 1: Detailed architectures of the generator and discriminator used for the 32 × 32 CIFAR-10 dataset. In our experiments, we follow the implementation of Miyato et al. [20]. (a) Architecture of the generator, (b) Architecture of the discriminator.
Table 2: Detailed architectures of the generator and discriminator used for the 64 × 64 Celeb-HQ, LSUN, and tiny-ImageNet datasets. In our experiments, we follow the implementation of Miyato et al. [20]. (a) Architecture of the generator, (b) Architecture of the discriminator.
Figure 6: Detailed architectures of the ResBlock used in our experiments. (a) ResBlock of the discriminator, (b) ResBlock of the generator.

4.2 Baseline models

In this work, we employed the generator and discriminator architectures of the leading cGANs scheme [20, 19] as our baseline models. The detailed architectures of the two models, for 32 × 32 and 64 × 64 resolutions, are presented in Tables 1 and 2, respectively. Both models employ multiple residual blocks (ResBlocks) [9], as depicted in Fig. 6. In the discriminator, we employed spectral normalization [19] for all layers, including the proposed CR module. For the discriminator, down-sampling (average pooling) is performed after the second convolutional layer, whereas the generator up-samples the feature maps using nearest-neighbor interpolation prior to the first convolutional layer.

Dataset               | Standard GANs | Proposed () | Proposed () | cGANs [20] | Proposed () | Proposed ()
CIFAR-10 [28]         | 18.26         | 16.90       | 16.46       | 11.34      | 10.87       | 10.92
Celeb-HQ [13, 16]     | 11.81         | 11.11       | 8.49        | -          | -           | -
LSUN [31]             | 26.86         | 24.37       | 21.48       | 22.69      | 19.09       | 19.96
tiny-ImageNet [5, 30] | 39.12         | 38.37       | 36.57       | 32.15      | 31.75       | 29.69
Table 3: Comparison of the proposed method with the standard GANs and cGANs on the CIFAR-10, Celeb-HQ, LSUN, and tiny-ImageNet datasets in terms of the FID. The first three columns report GANs results and the last three columns report cGANs results.

4.3 Comparison of Sample Quality

Evaluation metrics. To evaluate the performance of the generator, we employed a principled and comprehensive metric called the Frechet inception distance (FID) [10], which measures the visual quality and diversity of the generated images. The FID is obtained by calculating the Wasserstein-2 distance between the distribution of the real images, p_r, and that of the generated ones, p_g, in the feature space of the Inception model [27], which is defined as follows:

\mathrm{FID} = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)    (12)

where (μ_r, Σ_r) and (μ_g, Σ_g) are the means and covariances of the samples drawn from p_r and p_g, respectively. Lower FID scores indicate better quality of the generated images. There is an alternative approximate measure of image quality, called the inception score (IS). However, since the IS has some flaws, as discussed in [2, 3], we employed the FID as the main metric in this paper.
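For reference, Eq. (12) can be computed with standard NumPy/SciPy routines once the Inception features of real and generated samples are available; the snippet below is a generic implementation of this formula, not code taken from the paper.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feat_real, feat_fake):
    """feat_real, feat_fake: [num_samples, dim] Inception feature arrays."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    sigma_r = np.cov(feat_real, rowvar=False)
    sigma_g = np.cov(feat_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)  # matrix square root
    covmean = covmean.real                  # drop tiny imaginary parts
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```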

Figure 7: A graph showing FID scores of the GANs over the training iteration on the tiny-ImageNet dataset.

Results. To demonstrate the advantage of the CR module, we conducted extensive experiments by adjusting the value of N. In our experiments, we randomly generated 50,000 images for the CIFAR-10, LSUN, and tiny-ImageNet datasets and 30,000 images for the Celeb-HQ dataset. Table 3 summarizes the performance of the proposed method; the best results for each dataset are highlighted in the original table. As shown in Table 3, the CR module significantly improves the performance of GANs. In addition, Fig. 7 shows the FID results on the tiny-ImageNet dataset during the training procedure. As shown in Fig. 7, the proposed method outperforms the standard GANs throughout training. These results reveal that the proposed method effectively improves GAN performance by making the discriminator consider the otherwise ignored features. Thus, we confirmed that, by strongly penalizing the generator using the CR module, the proposed method leads the generator to produce images that are more similar to the real images, which results in lower FID scores.

Figure 8: A graph showing FID scores of the cGANs over the training iteration on the tiny-ImageNet dataset.
Figure 9: Examples of the generated images on Celeb-HQ, LSUN and tiny-ImageNet datasets. (a) Generated images on Celeb-HQ dataset, (b) Generated images on LSUN dataset, (c) Generated images on tiny-ImageNet dataset.

Moreover, to demonstrate the validity of the cCR module, we conducted additional cGAN experiments on the CIFAR-10, LSUN, and tiny-ImageNet datasets, excluding the Celeb-HQ dataset, which does not contain conditional information, i.e. class labels. We employed the same baseline models as in the GAN experiments, but replaced the batch normalization (BN) in the generator with the conditional batch normalization layer [6]. Also, the CR module in the discriminator was replaced with the cCR module. As shown in Table 3, the proposed method shows better performance than the cGANs [20], which reveals that the performance of cGANs can be improved by applying the cCR module to the cGANs framework. We notice that the cCR module with a larger N shows worse performance than that with a smaller N on datasets having a small number of classes, such as CIFAR-10 and LSUN, whereas the larger N shows better performance on the tiny-ImageNet dataset consisting of 200 classes. These results indicate that a small N is enough to penalize the generator when training on datasets having a small number of classes. In other words, to achieve the best performance, the N of the cCR module should be adjusted depending on the number of given conditions.

Table 4: Detailed architectures of the generator and discriminator for image-to-image translation with CR module. (a) Architecture of the generator, (b) Architecture of the discriminator.

Fig. 9 shows example images generated on the Celeb-HQ, LSUN, and tiny-ImageNet datasets. As depicted in Fig. 9, the proposed method allows the generator to produce visually pleasing images. In addition, the proposed method requires only a small number of additional network parameters: one extra weight vector per additional rejection stage, each with the dimension of the last layer in the discriminator, which is negligible compared to the overall number of discriminator parameters. Thus, the proposed CR module can be added to the discriminator with marginal training overhead. It is worth noting that this work does not intend to design optimal generator and discriminator architectures for the CR and cCR modules; there could be other structures that lead to better performance and generate higher-quality images. Instead, we care more about whether it is possible to improve the performance of the GANs and cGANs frameworks by simply adding the CR and cCR modules, respectively, to the discriminator.

Figure 10: Examples of the generated images on the photo ↔ Monet dataset [36].
Direction     | CycleGAN [36] | Proposed method ()
Photo → Monet | 100.74        | 95.27
Monet → Photo | 168.50        | 152.09
Table 5: Comparison of the proposed method with CycleGAN [36] on the photo ↔ Monet dataset in terms of the FID.

4.4 Image-to-image translation with CR module

To demonstrate the generalization ability of the proposed method, we applied the CR module to an image-to-image translation scheme. We selected CycleGAN [36], one of the state-of-the-art frameworks for image-to-image translation with unpaired training data, as the baseline framework. The detailed architectures are described in Table 4. Note that we did not conduct global pooling on the last layer of the discriminator. Instead, in the CR module, we predicted probabilities for each pixel using a convolutional layer, which is equivalent to applying a fully connected layer at every spatial location. In addition, in the ResBlock of the generator, we employed instance normalization (IN) [29] instead of BN.
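The per-pixel variant can be sketched as follows (our own reading, not the released code, assuming the final feature map is treated as a feature vector at every spatial location and the resulting logit maps are averaged before the hinge loss):

```python
import tensorflow as tf

def patchwise_cascading_rejection(feat_map, weights, eps=1e-8):
    """feat_map: [batch, h, w, k] final discriminator feature map;
    weights: list of N [k] weight vectors.
    Returns N per-pixel logit maps of shape [batch, h, w]."""
    v = feat_map
    outputs = []
    for w in weights:
        p = tf.reduce_sum(v * w, axis=-1)         # per-pixel inner product
        outputs.append(p)
        coeff = p / (tf.reduce_sum(w * w) + eps)
        v = v - coeff[..., None] * w              # per-pixel vector rejection
    # Each map can be averaged with tf.reduce_mean before entering the hinge loss.
    return outputs
```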

In our experiments, we evaluated the performance on the photo ↔ Monet dataset from [36]. We resized the images in the photo ↔ Monet dataset to a fixed resolution and trained the generator for 100k iterations using the hinge version of the adversarial loss in Eqs. 10 and 11. Experimental results are presented in Fig. 10 and Table 5. As shown in Table 5, the proposed method significantly improves the FID score, which indicates that the proposed CR module can be applied to a discriminator having several pixels in its last layer, such as PatchGAN [12]. Thus, we confirmed that the proposed CR module can be easily utilized to improve the performance of other GAN-based applications.

5 Conclusion

In this paper, we have introduced a straightforward method for improving the performance of GANs. By using the non-overlapping features obtained via the proposed CR module, the discriminator effectively penalizes the generator during the training procedure, which improves the performance of the generator. One of the main advantages of the CR module is that it can be readily integrated with existing discriminator architectures. Moreover, our experiments reveal that, without imposing training overhead, the discriminator with the CR module significantly improves the performance of the baseline models. In addition, the generalization ability of the proposed method was demonstrated by applying the CR module to the cGANs and image-to-image translation frameworks. We expect that the proposed method will be applicable to various GAN-based applications.

References

  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875.
  • [2] S. Barratt and R. Sharma (2018) A note on the inception score. arXiv preprint arXiv:1801.01973.
  • [3] T. Chen, X. Zhai, M. Ritter, M. Lucic, and N. Houlsby (2019) Self-supervised GANs via auxiliary rotation loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12154–12163.
  • [4] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789–8797.
  • [5] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
  • [6] V. Dumoulin, J. Shlens, and M. Kudlur (2017) A learned representation for artistic style. In Proceedings of ICLR.
  • [7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  • [8] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5767–5777.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • [10] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637.
  • [11] S. Hong, D. Yang, J. Choi, and H. Lee (2018) Inferring semantic layout for hierarchical text-to-image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7986–7994.
  • [12] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134.
  • [13] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
  • [14] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
  • [16] Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738.
  • [17] Q. Mao, H. Lee, H. Tseng, S. Ma, and M. Yang (2019) Mode seeking generative adversarial networks for diverse image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1429–1437.
  • [18] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
  • [19] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957.
  • [20] T. Miyato and M. Koyama (2018) cGANs with projection discriminator. arXiv preprint arXiv:1802.05637.
  • [21] A. Odena, C. Olah, and J. Shlens (2017) Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning, pp. 2642–2651.
  • [22] A. Odena (2016) Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583.
  • [23] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee (2016) Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396.
  • [24] M. Sagong, Y. Shin, S. Kim, S. Park, and S. Ko (2019) PEPSI: fast image inpainting with parallel decoding network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11360–11368.
  • [25] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242.
  • [26] Y. Shin, M. Sagong, Y. Yeo, S. Kim, and S. Ko (2019) PEPSI++: fast and lightweight network for image inpainting. arXiv preprint arXiv:1905.09010.
  • [27] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
  • [28] A. Torralba, R. Fergus, and W. T. Freeman (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (11), pp. 1958–1970.
  • [29] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
  • [30] L. Yao and J. Miller (2015) Tiny ImageNet classification with convolutional neural networks. CS 231N.
  • [31] F. Yu, Y. Zhang, S. Song, A. Seff, and J. Xiao (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.
  • [32] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang (2018) Free-form image inpainting with gated convolution. arXiv preprint arXiv:1806.03589.
  • [33] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena (2018) Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318.
  • [34] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915.
  • [35] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas (2018) StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (8), pp. 1947–1962.
  • [36] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232.