Adversarial Feedback Loop

11/20/2018 · by Firas Shama, et al.

Thanks to their remarkable generative capabilities, GANs have gained great popularity and are used abundantly in state-of-the-art methods and applications. In a GAN-based model, a discriminator is trained to learn the real data distribution. To date, it has been used only during training, where it teaches the generator to produce real-looking outputs. In this paper we propose a novel method that makes explicit use of the discriminator at test-time, in a feedback manner, in order to improve the generator's results. To the best of our knowledge, this is the first time a discriminator is involved at test-time. We claim that the discriminator holds significant information on the real data distribution that could be useful at test-time as well, a potential that has not been explored before. Our approach does not alter the conventional training stage. At test-time, however, it feeds the generator output into the discriminator and uses feedback modules (convolutional blocks) to translate the features of the discriminator layers into corrections to the features of the generator layers, which eventually yield a better generator result. Our method can contribute to both conditional and unconditional GANs. As demonstrated by our experiments, it improves the results of state-of-the-art networks for super-resolution and image generation.




1 Introduction

Adversarial training [goodfellow2014generative] has become one of the most popular tools for solving generation and manipulation problems, for example image generation [goodfellow2014generative, radford2015unsupervised], super-resolution [ledig2017photo], image-to-image translation [isola2017image, zhu2017unpaired], text-to-image [reed2016generative] and others. Common to all these works is the discriminator-to-generator information flow via a loss function: the generator's output images are fed into the discriminator, which produces a 'real-fake' score for each image in terms of a pre-defined loss function. This score is back-propagated to the generator through gradients.

Recent research in the GAN field discusses the design of the loss function and regularization terms, for example the basic cross-entropy loss [goodfellow2014generative], the Wasserstein distance [arjovsky2017wasserstein], spectral normalization [miyato2018spectral] and the relativistic discriminator [jolicoeur2018relativistic]. This discussion has contributed significantly to the advancement of GANs, and using a discriminator has become highly effective. To date, however, after training the discriminator is discarded and its deep understanding of the data distribution is lost. This seems wasteful to us; hence, we seek a way to benefit from the discriminator during test-time as well. In addition, encapsulating the discriminator information into a single score loses the spatial understanding of which regions are more 'real' and which are considered 'fake'. In the current scheme only limited spatial information flows with the back-propagation, because the gradients are averaged over each batch.

In this paper we propose a different approach that explicitly exploits the discriminator's activations at test-time in order to improve the generator output. We propagate the discriminator information back to the generator via an iterative feedback loop, as illustrated in Figure 1. The overall framework is as follows. We start with classic training of the generator and discriminator. Then, at test-time, the generator produces an output image which is fed into the discriminator in order to compute its feedback. The discriminator activations are fed into a third module which we name the feedback module. The goal of this module is to convert the discriminator activations into 'corrections' which can then be added to the original generator activations. We repeat this process iteratively until convergence (in practice, a small number of iterations).
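As a rough illustration of this loop (not the authors' implementation), the sketch below treats the generator, discriminator and feedback module as toy linear maps with hypothetical names; the only point is the direction of information flow, from discriminator activations back into the generator's activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical): each "network" is a fixed linear map whose
# intermediate activations we can read and overwrite.
W_g = rng.standard_normal((8, 4))        # generator: latent (4) -> activations (8)
W_d = rng.standard_normal((8, 8))        # discriminator: output space -> features
W_f = 0.1 * rng.standard_normal((8, 8))  # feedback module: D-features -> G-corrections

def generate(z, correction=None):
    g = W_g @ z                  # generator activations
    if correction is not None:
        g = g + correction       # feedback-corrected activations
    return g                     # toy "output image"

def feedback(y):
    d = W_d @ y                  # discriminator activations on the current output
    return W_f @ d               # translate them into generator corrections

z = rng.standard_normal(4)
y = generate(z)                  # iteration 0: plain generator output
for t in range(2):               # a couple of feedback iterations usually suffice
    y = generate(z, correction=feedback(y))
```

In a real network the corrections would be added to intermediate convolutional activations rather than to a single vector, but the control flow is the same.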

The main contributions of the Adversarial Feedback Loop (AFL) are two-fold. First, to the best of our knowledge, our novel adversarial feedback loop is the first use of the discriminator at test-time. Second, our scheme makes the spatial discriminative information accessible to the generator, allowing it to 'correct' artifacts and distortions, thus producing higher-quality images. A few motivational examples are presented in Figure 2, where it can be seen that our pipeline takes images produced by the generator and corrects the regions that suffered from artifacts.

Experiments on the CIFAR-10 [krizhevsky2009learning] and CelebA [liu2015faceattributes] data-sets for the task of unsupervised image generation show that combining AFL with state-of-the-art GAN methods improves the Inception Score (IS) [salimans2016improved], which is evident also qualitatively as better visual quality. In addition, when integrated with ESRGAN [wang2018esrgan], a state-of-the-art method for super-resolution, AFL can further improve the results; it achieves a better Perceptual Index [blau20182018] and lower RMSE, making the results more visually appealing and more faithful to the ground truth.

Figure 2: Contribution of AFL: Face generation results of (top) DCGAN [radford2015unsupervised] and (middle) DCGAN integrated into the proposed AFL framework. Faces generated by DCGAN+AFL are sharper and show fewer artifacts. (bottom) The differences between the images generated without and with AFL highlight that AFL corrects the spatial regions that suffer from artifacts, for example the cheek of the left-most face and the eyes of the second face from the left.

2 Related Work

The idea of exploiting the discriminator features has served as motivation for some previous works. The GAN-based methods [wang2017high, larsen2015autoencoding] proposed to use a loss based on features extracted from the discriminator layers: they compared the discriminator features of a fake image to the discriminator features of real images, in a manner similar to the renowned perceptual loss [dosovitskiy2016generating]. In all those methods, the utility of the discriminator layers was limited to training-time, which differs from our suggestion to enjoy its benefits at test-time as well.

The concept of feedback has already made its way into the training framework of several previous works that exploit iterative estimation of the output, aiming at better final results. In [oberweger2015training] a feedback loop was trained for hand pose estimation; in [Carreira_2016_CVPR] feedback was used for human pose estimation; and [li2016iterative] proposed to use feedback for the problem of instance segmentation. [zamir2017feedback] suggest a general feedback learning architecture based on recurrent networks that can benefit from early quick predictions and from a hierarchical structure of the output in label space. An interesting solution for the task of video frame prediction was presented in [lotter2016deep], which introduced an unsupervised recurrent network that feeds the predictions back into the model. Feedback was also used for super-resolution by [haris2018deep], who suggest a network that uses error feedback from multiple up- and down-scaling stages.

To the best of our knowledge, none of these previous methods has proposed applying the concept of feedback in the framework of GANs. To place our proposed framework in the context of the terminology common in works discussing feedback paradigms, one can think of the generator as a 'predictor' and the discriminator as an 'error estimator', while the feedback modules close the loop and convert the errors from the discriminator feature space to the generator feature space.

(a) Generator–Discriminator–Feedback module pipeline (b) Dual-input feedback module
Figure 3: The feedback framework: (a) The proposed feedback module passes information from the discriminator to the generator thus “learning” how to correct the generated image in order to make it more real in terms of the discriminator score. (b) It is also possible to let the feedback module consider both the features of the discriminator and the features of the generator.

3 Method

In this section we present our AFL framework. As discussed in the introduction, all current methods use the discriminator for adversarial training only. We wish to change that and explicitly use the knowledge the discriminator has gathered at test-time as well. This way the discriminator can 'leak' information to the generator, providing feedback on the generator's failures and thus assisting the generator in fixing them. We designed a generic solution that can be integrated with any GAN-based network.

3.1 Framework Overview

Given a generator-discriminator architecture, G and D respectively, we denote their layers by g_i and d_i, where i is the layer index. These layers (or a subset of them) are connected via feedback modules f_i, each of which consists of two convolutional layers. The input to each feedback module is the activation map d_i of the corresponding layer of the discriminator. The output of each feedback module is added to the corresponding activation map g_i of the generator, thus forming a skip-connection, such that the generator activation maps change to:

g̃_i = g_i + α_i · f_i(d_i)     (1)

See Figure 3(a) for illustration. Each feedback module is further associated with a scalar parameter α_i that multiplies its output. Setting α_i = 0 deactivates the i-th module altogether, while α_i ≠ 0 tunes the contribution of the feedback.
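The per-layer correction of Eq. (1) can be sketched as follows. This is an illustrative stand-in, not the paper's code: the two-convolution feedback module is replaced by an arbitrary 1x1 channel-mixing map, and the value of α_i is chosen arbitrarily:

```python
import numpy as np

def feedback_correction(g_i, d_i, f_i, alpha_i):
    """Eq. (1): corrected generator activations g~_i = g_i + alpha_i * f_i(d_i).

    f_i is any callable mapping a discriminator activation map to a
    generator-shaped correction (in the paper, two convolutional layers).
    """
    return g_i + alpha_i * f_i(d_i)

rng = np.random.default_rng(1)
C = rng.standard_normal((16, 16))                       # hypothetical 1x1 "conv" weights
f_i = lambda d: np.einsum('oc,chw->ohw', C, d)          # channel mixing over (C, H, W)

g_i = rng.standard_normal((16, 8, 8))                   # generator activation map
d_i = rng.standard_normal((16, 8, 8))                   # same-sized discriminator map
g_tilde = feedback_correction(g_i, d_i, f_i, alpha_i=0.2)

# alpha_i = 0 deactivates the module: the activations are left unchanged.
assert np.allclose(feedback_correction(g_i, d_i, f_i, 0.0), g_i)
```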

The basic feedback module connects equally-sized activation layers of the discriminator and the generator. We also suggest a slightly more complex form, in which the feedback modules are given as input the activation maps of both the discriminator and the generator (concatenated, denoted [d_i, g_i]), as illustrated in Figure 3(b), such that the generator activation maps change to:

g̃_i = g_i + α_i · f_i([d_i, g_i])     (2)
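A minimal sketch of this dual-input variant, under the same illustrative assumptions as before (a 1x1 channel-mixing map standing in for the convolutional feedback module), concatenates the two maps along the channel axis before computing the correction:

```python
import numpy as np

def dual_feedback(g_i, d_i, f_i, alpha_i):
    """Eq. (2) variant: the feedback module sees [d_i, g_i] concatenated along
    the channel axis, so its correction can also condition on the generator's
    own current activations."""
    stacked = np.concatenate([d_i, g_i], axis=0)   # shape (2C, H, W)
    return g_i + alpha_i * f_i(stacked)

rng = np.random.default_rng(2)
C2 = rng.standard_normal((16, 32))                 # maps 2C channels back to C
f_i = lambda x: np.einsum('oc,chw->ohw', C2, x)

g_i = rng.standard_normal((16, 8, 8))
d_i = rng.standard_normal((16, 8, 8))
g_tilde = dual_feedback(g_i, d_i, f_i, alpha_i=0.1)
```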

3.2 Training

The training scheme consists of two phases. The first phase is identical to the common practice in training GANs. The feedback modules are inactive and we apply standard adversarial network training, in which the generator, , and the discriminator, , are trained according to the selected base method. The outcome is a trained generator and a trained discriminator that can differentiate between real images and fake images produced by the generator.

The second training phase is where we activate the feedback modules and train them. This is done while freezing the generator, but allowing the discriminator to keep updating. This way the feedback modules learn to correct the generator results based on the feedback given by the discriminator. Since the output of the generator improves, we must allow the discriminator to continue refining its weights.

We next describe the steps of the second training phase in further detail:

First iteration

Given an input z (e.g., a random vector), the generator produces an initial output image y_0 = G(z) that is fed into the discriminator.

The t-th iteration

For t ≥ 1 we use the following update equation:

g̃_i^t = g_i + h_i^t     (3)

where h_i^t is the correction produced by the feedback modules:

h_i^t = α_i · f_i(d_i^{t-1})     (4)

or

h_i^t = α_i · f_i([d_i^{t-1}, g_i^{t-1}])     (5)

depending on which feedback module type is used, with d_i^{t-1} (and g_i^{t-1}) denoting the activations computed on the previous output y_{t-1}. The corrected activations g̃_i^t yield the next output y_t. In practice, in almost all our experiments two iterations sufficed, i.e., we stop after obtaining y_2.

The Objective

The feedback modules are trained with the same objective as the baseline generator (e.g., cross-entropy, Wasserstein distance, etc.), while replacing every instance of the generator output G(z) with the corrected output y_t.
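To make the substitution concrete, here is a toy computation with the non-saturating cross-entropy generator objective, one of the baseline losses mentioned above. The discriminator scores are made-up numbers for illustration; the only point is that the same loss formula is evaluated on the corrected output y_t instead of the plain output G(z):

```python
import numpy as np

def generator_loss(d_score):
    """Non-saturating cross-entropy generator objective: -log D(y)."""
    return -np.log(d_score + 1e-12)

# Hypothetical discriminator scores in (0, 1): plain output G(z) vs the
# feedback-corrected output y_t that replaces it in the objective.
score_plain, score_corrected = 0.30, 0.55
loss_plain = generator_loss(score_plain)
loss_corrected = generator_loss(score_corrected)

# A correction that makes the output look more 'real' lowers the loss,
# which is the signal that trains the feedback modules.
assert loss_corrected < loss_plain
```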

3.3 Testing

At test-time we freeze the entire network, including the generator, the discriminator and the feedback modules. The activation levels of the feedback modules are tuned by setting α_i = α for all i, i.e.:

g̃_i = g_i + α · f_i(d_i)     (6)

Typically the impact of the corrections from the feedback modules needs to be attenuated, and we have found empirically that the best results are obtained when α < 1. This way only the stronger corrections really contribute.

Note that, because of the batch-norm layer in each feedback module, its output signal is forced to have the same strength (variance) as the generator features, such that multiplying the output by a small α is sufficient to preserve the original features and only marginally correct them. Unless otherwise specified, all our experiments use the same fixed value of α.
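The variance argument can be checked numerically. In this sketch both signals are drawn with unit variance, mimicking a batch-normalized feedback output that matches the generator features in strength; the value of α is illustrative (the exact value used in the paper is not given in this excerpt):

```python
import numpy as np

rng = np.random.default_rng(3)
g = rng.standard_normal(1000)   # generator features
f = rng.standard_normal(1000)   # batch-normalized feedback output: same variance as g

alpha = 0.2                     # illustrative small attenuation factor
g_corrected = g + alpha * f

# The correction's energy scales with alpha^2, so a small alpha preserves
# the original features and only marginally adjusts them.
assert np.var(alpha * f) < 0.1 * np.var(g)
```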

In principle the test-time process could also be repeated iteratively; however, we have found that a single iteration suffices to achieve satisfactory results.

4 Experiments

In this section we present experiments conducted on several data-sets, tasks and networks that demonstrate the contributions of our method. In all cases we add our AFL to existing methods while adopting their training details: the architecture, the loss function, the normalization term and the hyper-parameters. Our modifications are restricted to the second training phase, in which we train the feedback modules, and to the testing phase, where we use the feedback modules during generation.

4.1 Empirical analysis on a simple 2D case

Before diving into applications, we first perform an empirical evaluation of AFL in a simple, yet illustrative, scenario. The goal here is to show that AFL is able to effectively utilize the discriminator information in order to improve the generated results.

The scenario we chose is the generation of 2D coordinates that lie on a 'Swiss roll'. The generator gets a random input point and is trained to generate points that fit the 'real' data distribution, represented by the discriminator. The discriminator is trained to classify each sample as 'real' or 'fake' according to a given ground-truth swiss-roll data distribution.

As architecture, we chose both the generator and the discriminator to consist of a sequence of four fully-connected layers. For the feedback we used a single module, which corrects the input of the last layer of the generator. The objective we used was the WGAN-GP [gulrajani2017improved] adversarial loss. Please refer to the supplementary material for implementation details. As the baseline model, the generator and the discriminator were first trained for a fixed number of iterations; then we froze the generator, added the feedback module, and trained it with the same discriminator for additional iterations.
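For readers who want to reproduce this toy setting, a 2D swiss-roll ground-truth distribution can be sampled as below. The spiral parameterization and noise level are our own illustrative choices, not taken from the paper's supplementary material:

```python
import numpy as np

def swiss_roll_2d(n, rng):
    """Sample n points from a noisy 2D 'swiss roll' (spiral), serving as the
    ground-truth 'real' distribution for the toy discriminator."""
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))   # angle along the spiral
    x = t * np.cos(t)
    y = t * np.sin(t)
    noise = 0.1 * rng.standard_normal((n, 2))
    return np.stack([x, y], axis=1) + noise

pts = swiss_roll_2d(512, np.random.default_rng(4))
```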

Our results are presented in Figure 4. It can be observed that the baseline generator succeeds in generating samples close to the real data distribution; however, using the proposed AFL pipeline improves the generation accuracy and results in a distribution that is much closer to the real one. The AFL module identifies the inaccuracies in the generated points, corrects them, and leads to better results.

(a) Test input variance equal to the training input variance
(b) Test input variance larger than the training input variance
Figure 4: Swiss-roll generation: results of generating points that lie on a swiss roll. (a) When the variance of the random input at test-time matches the variance of the inputs used for training, the baseline generator does a decent job; adding AFL corrects the small inaccuracies and yields a distribution almost identical to the real one. (b) When the variance of the random input is increased, the baseline generator fails. Conversely, using AFL still succeeds, to some extent, in reproducing the swiss-roll distribution.