Precise Recovery of Latent Vectors from Generative Adversarial Networks

02/15/2017 ∙ by Zachary C. Lipton, et al. ∙ University of California, San Diego

Generative adversarial networks (GANs) transform latent vectors into visually plausible images. It is generally thought that the original GAN formulation gives no out-of-the-box method to reverse the mapping, projecting images back into latent space. We introduce a simple, gradient-based technique called stochastic clipping. In experiments, for images generated by the GAN, we precisely recover their latent vector pre-images 100% of the time. Additional experiments demonstrate that this method is robust to noise. Finally, we show that even for unseen images, our method appears to recover unique encodings.




1 Introduction

Deep convolutional neural networks (CNNs) are now standard tools for machine learning practitioners. Currently, they outperform all other computer vision techniques for discriminative learning problems, including image classification and object detection. Generative adversarial networks (GANs) (Goodfellow et al., 2014; Radford et al., 2015) adapt deterministic deep neural networks to the task of generative modeling.

GANs consist of a generator and a discriminator. The generator maps samples from a low-dimensional latent space onto the space of images. The discriminator tries to distinguish between images produced by the generator and real images. During training, the generator tries to fool the discriminator. After training, researchers typically discard the discriminator. Images can then be generated by drawing samples from the latent space and passing them through the generator.

While the generative capacity of GANs is well known, how best to perform the reverse mapping (from image space to latent space) remains an open research problem. Donahue et al. (2016) suggest an extension to GANs in which a third model explicitly learns the reverse mapping. Creswell & Bharath (2016) suggest that inverting the generator is difficult, noting that, in principle, a single image may map to multiple latent vectors. They propose a gradient-based approach to recover latent vectors and evaluate the process on the reconstruction error in image space.

We reconstruct latent vectors by performing gradient descent over the components of the latent representations and introduce a new technique called stochastic clipping. To our knowledge, this is the first empirical demonstration that DCGANs can be inverted to arbitrary precision. Moreover, we demonstrate that these reconstructions are robust to added noise. After adding small amounts of Gaussian noise to images, we nonetheless recover the latent vector with little loss of fidelity.

In this research, we also seek insight regarding optimization over neural network loss surfaces. We seek answers to the following questions: (i) Will the optimization achieve the globally minimal loss of zero, or get stuck in sub-optimal critical points? (ii) Will the optimization recover precisely the same input every time? Across our experiments, we find that on a pre-trained DCGAN network, gradient descent with stochastic clipping recovers the true latent vector 100% of the time to arbitrary precision.

Related Work: Several papers attempt gradient-based methods for inverting deep neural networks. Mahendran & Vedaldi (2015) invert discriminative CNNs to understand hidden representations. Creswell & Bharath (2016) invert the generators of GANs but do not report finding faithful reconstructions in the latent space. We note that the task of finding pre-images for non-convex mappings has a history in computer vision dating at least as far back as Bakır et al.

2 Gradient-Based Input Reconstruction and Stochastic Clipping

To invert the mappings learned by the generator, we apply the following idea. For a latent vector z, the generator produces an image G(z). We then initialize a new, random vector z' of the same shape as z; this new vector maps to a corresponding image G(z'). In order to reverse engineer the input z, we successively update the components of z' so as to push the representation G(z') closer to the original image G(z). In our experiments we minimize the squared L2 norm, yielding the following optimization problem:

min_{z'} ||G(z) - G(z')||_2^2

We optimize over z' by gradient descent, performing the update z' ← z' - η ∇_{z'} ||G(z) - G(z')||_2^2 until some convergence criterion is met. The learning rate η is attenuated over time.

Note that this optimization has a global minimum of zero, achieved when G(z') = G(z). However, we do not know whether the solution that achieves this global minimum is unique. Moreover, the optimization is non-convex, and thus we know of no theoretical reason that it should precisely recover the original vector.
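This inversion loop can be sketched as follows. To keep the example self-contained, the trained DCGAN generator is replaced by a small differentiable stand-in (a random linear map followed by tanh), and the learning rate and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generator": a fixed random linear map followed by tanh.
# The paper's G is a trained DCGAN; this toy only illustrates the loop.
A = rng.standard_normal((64, 10)) / np.sqrt(10)

def G(z):
    return np.tanh(A @ z)

def loss_grad(z_hat, x_target):
    # Gradient of ||G(z') - x||_2^2 with respect to z', analytic for this toy G.
    pre = A @ z_hat
    residual = np.tanh(pre) - x_target
    return 2.0 * A.T @ (residual * (1.0 - np.tanh(pre) ** 2))

# Target image produced from a hidden latent vector.
z_true = rng.uniform(-1.0, 1.0, size=10)
x = G(z_true)

# Start from a random "imposter" latent vector and descend.
z_hat = rng.uniform(-1.0, 1.0, size=10)
initial_loss = np.sum((G(z_hat) - x) ** 2)
lr = 0.01  # assumed constant here; the paper attenuates the rate over time
for _ in range(20000):
    z_hat = z_hat - lr * loss_grad(z_hat, x)
final_loss = np.sum((G(z_hat) - x) ** 2)
```

With a real DCGAN, the gradient would come from automatic differentiation through the network rather than a hand-derived formula.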

In many cases, we know that the original input comes from a bounded domain. For DCGANs, all latent vectors are sampled uniformly from the [-1, 1] hyper-cube. To enforce this constraint, we apply a modified optimization in which each gradient update is followed by a clipping step. With standard clipping, we replace components that are too large with the maximum allowed value and components that are too small with the minimum allowed value. Standard clipping precisely recovers a large fraction of vectors z.

For the failure cases, we noticed that the reconstructions had some components stuck at either -1 or 1. Because the latent vectors are drawn from a continuous distribution, we know that the probability that a component should lie exactly at the boundary is zero. To prevent these reconstructions from getting stuck, we introduce a heuristic technique called stochastic clipping. When using stochastic clipping, instead of setting out-of-range components to -1 or 1, we reassign the clipped components uniformly at random within the allowed range. While this cannot guard against interior local minima, it helps if the only local minima contain components stuck at the boundary.
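The two clipping rules can be sketched as a pair of NumPy functions; the function names are ours, and the [-1, 1] default range matches the latent domain described above. Either function would be applied to z' after each gradient step.

```python
import numpy as np

def standard_clip(z, low=-1.0, high=1.0):
    # Pins out-of-range components to the nearest boundary value.
    return np.clip(z, low, high)

def stochastic_clip(z, low=-1.0, high=1.0, rng=None):
    # Resamples out-of-range components uniformly at random from the
    # allowed range, so no component can get stuck at the boundary.
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float).copy()
    out_of_range = (z < low) | (z > high)
    z[out_of_range] = rng.uniform(low, high, size=int(out_of_range.sum()))
    return z
```

Note that both rules leave in-range components untouched; they differ only in how violations are repaired.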

3 Experiments

We now summarize our experimental findings. All experiments are conducted with DCGANs as described by Radford et al. (2015) and re-implemented in TensorFlow by Amos (2016). First, we visualize the reconstruction process, showing the imposter image after initialization and after successive iterations of gradient descent (Figure 1). The final reconstruction produces an image indistinguishable from the original.

Figure 1: Reconstruction visualizations: (a) imposter initialization; (b, c) reconstructions after successive iterations; (d) original images.

Next, we consider the fidelity of reconstruction after a fixed budget of updates. In Table 1, we show that even with conservative thresholds for determining reconstruction success, stochastic clipping recovers 100 percent of latent vectors. We evaluate these numbers over repeated trials.

Table 1: Percentage of successful reconstructions at several accuracy thresholds, comparing no clipping, standard clipping, and stochastic clipping.

We then consider the robustness of these reconstructions to noise. We apply Gaussian white noise ε to the generated images, attempting to reconstruct z from G(z) + ε. Our experiments show that even for substantial levels of noise, the reconstruction error in z-space is low and appears to grow proportionally to the added noise (Figure 2).
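The noise injection step of this experiment can be sketched as below; the helper name is ours, and the inversion itself would proceed exactly as in Section 2, just with the noisy image as the target.

```python
import numpy as np

def add_gaussian_noise(image, variance, rng):
    # Gaussian white noise with the given variance, same shape as the image.
    return image + rng.normal(0.0, np.sqrt(variance), size=image.shape)

rng = np.random.default_rng(1)
x = np.zeros((8, 8))            # stand-in for a generated image G(z)
x_noisy = add_gaussian_noise(x, 0.1, rng)
```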

Figure 2: z-space reconstruction error grows proportionally to the added noise; stochastic clipping appears more robust to noise than the other methods. Panels: (a) z-space reconstruction error under noise; (b) G(z) and G(z') for noise variance .1 and recovery by stochastic clipping. Note that a noise variance of .1 corresponds to a standard deviation of roughly .32 in pixel values.

Finally, we ask whether, for unseen images, the recovered vector is always the same. To determine the consistency of the recovered vector, we recover 1000 vectors for the same image and report the average pairwise distance between reconstructions (Table 2).
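The consistency measure, and the random-vector baseline used in Table 2, can be sketched as follows; the latent dimensionality of 100 and the sample count here are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def mean_pairwise_distance(Z):
    # Average Euclidean distance over all pairs of rows of Z.
    return float(np.mean([np.linalg.norm(a - b)
                          for a, b in combinations(Z, 2)]))

# Baseline: average distance between vectors drawn uniformly from the
# [-1, 1]^d latent hyper-cube (d = 100 assumed for illustration).
rng = np.random.default_rng(0)
baseline = mean_pairwise_distance(rng.uniform(-1, 1, size=(200, 100)))
```

In high dimension the baseline concentrates near sqrt(2d/3) ≈ 8.16 for d = 100, so recovered vectors with much smaller average pairwise distance indicate a consistent encoding.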

Table 2: Average pairwise distance of recovered vectors for unseen images, comparing no clipping, standard clipping, stochastic clipping, and a baseline. The baseline score is the average distance between vectors sampled randomly from the latent space.

4 Conclusions

We show that GAN generators can, in practice, be inverted to arbitrary precision. These inversions are robust to noise, and the inversions appear unique even for unseen images. Stochastic clipping is both more accurate and more robust than standard clipping. We suspect that stochastic clipping should also give better and more robust reconstructions of images from discriminative CNN representations, and we leave these experiments to future work.