Adversarial training [goodfellow2014generative] has become one of the most popular tools for solving generation and manipulation problems, for example image generation [goodfellow2014generative, radford2015unsupervised], super-resolution [ledig2017photo], image-to-image translation [isola2017image, zhu2017unpaired], text-to-image synthesis [reed2016generative], and others. Common to all these works is the discriminator-to-generator information flow via a loss function: the generator's output images are fed into the discriminator, which produces a 'real/fake' score for each image in terms of a pre-defined loss function. This score is back-propagated to the generator through gradients.
Recent research in the GAN field discusses the design of the loss function and regularization terms, for example the basic cross-entropy loss [goodfellow2014generative], the Wasserstein distance [arjovsky2017wasserstein], spectral normalization [miyato2018spectral], and the relativistic discriminator [jolicoeur2018relativistic]. This discussion has contributed significantly to the advancement of GANs, and using a discriminator has become highly effective. Yet to date, after training the discriminator is forsaken and its deep understanding of the data distribution is lost. This seems wasteful to us; hence, we seek a way to enjoy the discriminator also during test-time. In addition, encapsulating the discriminator information into a single score loses the spatial understanding of which regions are more 'real' and which are considered 'fake'. In the current scheme only limited spatial information flows through back-propagation, because the gradients are averaged over each batch.
In this paper we propose a different approach that explicitly exploits the discriminator's activations, at test-time, in order to improve the generator output. We propagate the discriminator information back to the generator using an iterative feedback loop, as illustrated in Figure 1. The overall framework is as follows: we start with classic training of the generator and discriminator. Then, at test-time, the generator produces an output image which is fed into the discriminator in order to compute its feedback. The discriminator activations are fed into a third module, which we name the feedback module. The goal of this module is to convert the discriminator activations into 'corrections' which can then be added to the original generator activations. We repeat this process iteratively until convergence.
The main contributions of the Adversarial Feedback Loop (AFL) are two-fold. First, to the best of our knowledge, our novel adversarial feedback loop is the first to use the discriminator at test-time. Second, our scheme makes the spatial discriminative information accessible to the generator, allowing it to 'correct' artifacts and distortions, thus producing higher-quality images. A few motivational examples are presented in Figure 2, where it can be seen that our pipeline takes images produced by the generator and corrects the regions that suffered from artifacts.
Experiments on the CIFAR-10 [krizhevsky2009learning] and CelebA [liu2015faceattributes] data-sets for the task of unsupervised image generation show that combining AFL with state-of-the-art GAN methods improves the Inception Score (IS) [salimans2016improved], an improvement that is also evident qualitatively as better visual quality. In addition, when integrated with ESRGAN [wang2018esrgan], a state-of-the-art method for super-resolution, AFL can further improve the results; it achieves a higher Perceptual Index [blau20182018] and lower RMSE, making the results more visually appealing and more faithful to the ground truth.
2 Related Work
The idea of exploiting the discriminator features served as motivation for some previous works. The GAN-based methods of [wang2017high, larsen2015autoencoding] proposed to use a loss based on features extracted from the discriminator layers. They compared the discriminator features of fake images to the discriminator features of real images, in a similar manner to the renowned perceptual loss [dosovitskiy2016generating]. In all those methods, the utility of the discriminator layers was limited to training-time, which is different from our suggestion to enjoy its benefits also at test-time.
The concept of feedback has already made its way into the training framework of several previous works that exploit iterative estimation of the output, aiming at better final results. In [oberweger2015training] a feedback loop was trained for hand pose estimation, in [Carreira_2016_CVPR] feedback was used for human pose estimation, while [li2016iterative] proposed to use feedback for the problem of instance segmentation. [zamir2017feedback] suggests a general feedback learning architecture based on recurrent networks that can benefit from early quick predictions and from a hierarchical structure of the output in label space. An interesting solution for the task of video frame prediction was presented in [lotter2016deep], which introduced an unsupervised recurrent network that feeds the predictions back to the model. Feedback was also used for super-resolution by [haris2018deep], who suggest a network that uses the error feedback from multiple up- and down-scaling stages.
To the best of our knowledge, none of these previous methods has proposed applying the concept of feedback in the framework of GANs. To place our proposed framework in the context of the terminology common in works on feedback paradigms, one can think of the generator as a 'predictor', the discriminator as an 'error estimator', and the feedback modules as closing the loop, converting the errors from the discriminator feature space to the generator feature space.
Figure 3: (a) Generator–Discriminator–Feedback module pipeline. (b) Dual-input feedback module.
In this section we present our AFL framework. As discussed in the introduction, all current methods use the discriminator for adversarial training only. We wish to change that and explicitly use the knowledge that the discriminator has gathered also at test-time. This way the discriminator can 'leak' information to the generator by providing feedback on the generator's failures, thus assisting the generator in fixing them. We designed a solution that is generic so that it can be integrated with any GAN-based network.
3.1 Framework Overview
Given a generator-discriminator architecture, $G$ and $D$ respectively, we denote their layers by $g_i$ and $d_i$, where $i$ is the layer index.
These layers (or a subset of them) are connected via feedback modules $f_i$, each of which consists of two convolutional layers. The input to each feedback module $f_i$ is the activation map $d_i$ of the corresponding layer of the discriminator $D$. The output of each feedback module is added to the corresponding activation map $g_i$ of the generator, thus forming a skip-connection, such that the generator activation maps change to:

$g_i \;\rightarrow\; g_i + f_i(d_i).$
See Figure 3(a) for illustration. Each feedback module is further associated with a scalar parameter $\alpha_i$ that multiplies its output. Setting $\alpha_i = 0$ deactivates the $i$'th module altogether, while $\alpha_i > 0$ tunes the contribution of the feedback.
The basic feedback module connects equally-sized activation layers of the discriminator and the generator. We also suggest a slightly more complex form, where the feedback modules are given as input the activation maps of both the discriminator and the generator (concatenated, denoted $[d_i, g_i]$), as illustrated in Figure 3(b), such that the generator activation maps change to:

$g_i \;\rightarrow\; g_i + f_i([d_i, g_i]).$
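As a shape-level sketch, the two correction forms can be written as follows (a minimal numpy stand-in; `feedback_basic`, `feedback_dual`, `f_basic` and `f_dual` are hypothetical names, and the toy lambdas stand in for the two-convolutional-layer feedback modules):

```python
import numpy as np

def feedback_basic(g, d, f, alpha):
    # Basic form: g' = g + alpha * f(d)
    return g + alpha * f(d)

def feedback_dual(g, d, f, alpha):
    # Dual-input form: g' = g + alpha * f([d, g]) (channel concatenation)
    return g + alpha * f(np.concatenate([d, g], axis=0))

rng = np.random.default_rng(0)
g = rng.standard_normal((8, 4, 4))   # C x H x W generator activation map
d = rng.standard_normal((8, 4, 4))   # matching discriminator activation map

# Toy stand-ins for the feedback modules: any map returning a tensor
# shaped like g qualifies.
f_basic = lambda x: 0.5 * x                       # C -> C channels
f_dual = lambda x: 0.25 * x[:8] + 0.25 * x[8:]    # 2C -> C channels

g1 = feedback_basic(g, d, f_basic, alpha=0.1)
g2 = feedback_dual(g, d, f_dual, alpha=0.1)
assert g1.shape == g.shape and g2.shape == g.shape
```

The key design point is that both forms are additive skip-connections, so the feedback can only perturb the generator's activations, never replace them.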
The training scheme consists of two phases. The first phase is identical to the common practice in training GANs: the feedback modules are inactive and we apply standard adversarial training, in which the generator $G$ and the discriminator $D$ are trained according to the selected base method. The outcome is a trained generator and a trained discriminator that can differentiate between real images and fake images produced by the generator.
In the second training phase we activate the feedback modules and train them. This is done while freezing the generator $G$, but allowing the discriminator $D$ to keep updating. This way the feedback modules learn to correct the generator results, based on the feedback given by the discriminator, in order to improve them. Since the output of the generator improves, we must allow the discriminator to continue refining its weights.
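The two-phase schedule can be summarized as follows (a minimal sketch with a hypothetical `training_schedule` helper; the boolean flags only mark which components would receive gradient updates at each step):

```python
# G = generator, D = discriminator, F = feedback modules.
def training_schedule(num_phase1_steps, num_phase2_steps):
    steps = []
    for _ in range(num_phase1_steps):
        # Phase 1: standard adversarial training, feedback inactive.
        steps.append({"G": True, "D": True, "F": False})
    for _ in range(num_phase2_steps):
        # Phase 2: freeze G, train the feedback modules, keep refining D.
        steps.append({"G": False, "D": True, "F": True})
    return steps

sched = training_schedule(2, 3)
assert all(s["D"] for s in sched)                     # D trains throughout
assert all(not s["F"] for s in sched[:2])             # F inactive in phase 1
assert all(s["F"] and not s["G"] for s in sched[2:])  # G frozen in phase 2
```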
We next describe the steps of the second phase of the training in further detail:
Given an input $z$ (e.g., a random vector), the generator produces an initial output image $y_0 = G(z)$ that is fed into the discriminator.
We set $t = 1$ and use the following update equation:

$y_t = G_F(z \mid y_{t-1}),$

where $G_F$ denotes the generator with the outputs of all the feedback modules aggregated into it, i.e., each connected activation map is updated as

$g_i + \alpha_i f_i(d_i) \quad \text{or} \quad g_i + \alpha_i f_i([d_i, g_i]),$

depending on which feedback module type is used, with the discriminator activations $d_i$ computed from $y_{t-1}$. In practice, in almost all our experiments two iterations sufficed, i.e., until we get $y_2$.
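The iterative loop can be sketched numerically as follows (hypothetical linear stand-ins for $G$, $D$ and the aggregated feedback $F$; for brevity the per-layer corrections are collapsed into a single additive correction on the output, rather than on the internal activations as in the actual framework):

```python
import numpy as np

rng = np.random.default_rng(1)
W_g = 0.1 * rng.standard_normal((16, 8))
W_d = 0.1 * rng.standard_normal((16, 16))
W_f = 0.1 * rng.standard_normal((16, 16))

G = lambda z: W_g @ z            # generator: latent -> 'image' vector
D = lambda y: np.tanh(W_d @ y)   # discriminator features of an image
F = lambda d: W_f @ d            # aggregated feedback correction

def afl_generate(z, alpha=0.1, iters=2):
    y = G(z)                      # initial output y_0
    for _ in range(iters):        # y_t = y_{t-1} + alpha * F(D(y_{t-1}))
        y = y + alpha * F(D(y))
    return y

z = rng.standard_normal(8)
y0, y2 = G(z), afl_generate(z, iters=2)
assert y0.shape == y2.shape == (16,)
```

Note that with `alpha=0` the loop reduces exactly to the plain generator output, mirroring how setting all $\alpha_i = 0$ deactivates the feedback modules.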
The feedback modules are trained with the same objective as the baseline generator (e.g., cross-entropy, Wasserstein distance, etc.), while replacing every instance of the generator output $G(z)$ with the feedback-corrected output $y_t$.
At test-time we freeze the entire network, including the generator, the discriminator and the feedback modules. The activation levels of the feedback modules are tuned by setting the scalars $\alpha_i$, i.e., $g_i \rightarrow g_i + \alpha_i f_i(\cdot)$.
Typically the impact of the corrections from the feedback modules needs to be attenuated, and we have found empirically that best results are obtained when $\alpha_i < 1$. This way only the stronger corrections really contribute.
Note that because of the batch-norm layer in each feedback module, its output signal is forced to have the same strength (variance) as the generator features, such that multiplying the output by a small $\alpha_i$ is sufficient to preserve the original features while marginally correcting them. Unless otherwise specified, all our experiments use the same small value of $\alpha_i$.
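A small numerical illustration of this point (hypothetical data; the normalization step stands in for the batch-norm layer): once the correction is normalized to the same variance as the generator features, $\alpha$ directly controls the relative size of the change.

```python
import numpy as np

rng = np.random.default_rng(2)
g = rng.standard_normal(10_000)           # generator features (unit variance)
raw = 5.0 * rng.standard_normal(10_000)   # raw feedback output, wrong scale
corr = (raw - raw.mean()) / raw.std()     # batch-norm-like normalization

alpha = 0.1
g_new = g + alpha * corr                  # attenuated correction

# With corr normalized to unit variance, the relative change to the
# generator features is governed directly by alpha.
rel_change = np.linalg.norm(g_new - g) / np.linalg.norm(g)
assert abs(rel_change - alpha) < 0.05
```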
In principle the test-time process could also be repeated iteratively, however, we have found that it suffices to run a single iteration to achieve satisfactory results.
In this section we present experiments conducted on several data-sets, tasks and networks that demonstrate the contributions of our method. In all cases we add our AFL to existing methods while adopting their training details: the architecture, the loss function, the normalization term and the hyper-parameters. Our modifications are confined to the second training phase, in which we train the feedback modules, and to the testing phase, where we use the feedback modules during generation.
4.1 Empirical analysis on a simple 2D case
Before diving into applications, we first perform empirical evaluation of AFL in a simple, yet illustrative, scenario. The goal here is to show that AFL is effectively able to utilize the discriminator information in order to improve the generated results.
The scenario we chose is generation of 2D coordinates that lie on a 'Swiss roll'. The generator gets random input points and is trained to generate points that fit the 'real' data distribution, represented by the discriminator. The discriminator is trained to classify each sample as 'real' or 'fake' according to a given ground-truth Swiss-roll data distribution.
As architecture, we chose both the generator and the discriminator to consist of a sequence of four fully-connected layers. For the feedback we used a single module, which corrects the input of the last layer of the generator. The objective we used was the WGAN-GP [gulrajani2017improved] adversarial loss. Please refer to the supplementary material for implementation details. As the baseline model, the generator and the discriminator were first trained for a number of iterations. Then we froze the generator, added the feedback module, and trained it with the same discriminator for additional iterations.
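For reference, a ground-truth 'Swiss roll' distribution can be sampled along the following lines (a hypothetical `swiss_roll_2d` helper; the spiral parameterization and noise level are illustrative assumptions, not the exact values used in our experiments):

```python
import numpy as np

def swiss_roll_2d(n, noise=0.05, seed=0):
    """Sample n points along a noisy 2D spiral ('Swiss roll')."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))    # spiral parameter
    pts = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
    pts /= 4.5 * np.pi                               # rough normalization
    return pts + noise * rng.standard_normal((n, 2))

real = swiss_roll_2d(1024)
assert real.shape == (1024, 2)
assert np.isfinite(real).all()
```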
Our results are presented in Figure 4. It can be observed that the baseline generator succeeds in generating samples close to the real data distribution; however, using the proposed AFL pipeline improves the generation accuracy and results in a distribution that is much closer to the real one. The AFL module identifies the inaccuracies in the generated points, corrects them, and leads to better results.
Figure 4: (a) Test input variance equal to the training input variance. (b) Test input variance different from the training input variance.