Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion

by Xin Dong, et al.
Harvard University

Understanding the behavior and vulnerability of pre-trained deep neural networks (DNNs) can help to improve them. Analysis can be performed via reversing the network's flow to generate inputs from internal representations. Most existing work relies on priors or data-intensive optimization to invert a model, yet struggles to scale to deep architectures and complex datasets. This paper presents a zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation. The crux of our method is to invert the DNN in a divide-and-conquer manner while re-syncing the inverted layers via cycle-consistency guidance with the help of synthesized data. As a result, we obtain a single feed-forward model capable of inversion with a single forward pass without seeing any real data of the original task. With the proposed approach, we scale zero-shot direct inversion to deep architectures and complex datasets. We empirically show that modern classification models on ImageNet can, surprisingly, be inverted, allowing an approximate recovery of the original 224x224px images from a representation after more than 20 layers. Moreover, inversion of generators in GANs unveils the latent code of a given synthesized face image at 128x128px, which can even, in turn, improve defective synthesized images from GANs.





1 Introduction

We focus on the invertibility of an arbitrary pre-trained deep neural network (DNN), i.e., recovering its input from the intermediate representations. The impact of such inversion is two-fold. First, inversion provides insights into how DNNs manipulate and process information at different depths. Inverting an intermediate representation back to the input space unveils how the network filters information and hints at better designing and utilizing deep models for downstream tasks. Second, inversion of widely used DNNs is valuable for analyzing their vulnerability when access to the hidden activations is given.

Mobile and edge devices increasingly rely on DNNs for complex tasks (howard2017mobilenets; chen2018enhanced; deng2013new; hinton2012deep), while suffering stringent battery and memory constraints (zhang2018shufflenet; han2015learning). Split computing (SC) has been a popular approach to overcome this by executing the first several layers on devices and the remaining layers in the cloud (kang2017; choi2018deep; cohen2020lightweight). SC is regarded as privacy-preserving because mobile devices share only features instead of the original data (kang2017; Jeong2018; eshratifar2019jointdnn; li2018auto; jankowski2020joint; eshratifar2019bottlenet). Given that no prior work has successfully inverted the released features in the absence of private priors, the paradigm is presumed safe (RN84; pagliari2020crime; matsubara2020head; yao2020deep).

Figure 1: Our method is capable of inverting off-the-shelf pretrained networks in a single step without access to the original data. As a result, an inversion network is learned that enables output-to-input synthesis (Sec. 3). The method is applicable to discriminative (Sec. 4.1) and generative (Sec. 4.2) models trained on complex datasets. Our empirical results shed light on (i) a deeper understanding of how DNNs filter information, (ii) potential defense techniques against adversarial attacks, (iii) estimating a latent code for an image generated by a GAN, (iv) improving corrupted generated images, and even (v) a potential discriminative feature for real vs. generated images (more in Sec. 4). ‘Re-generated’ denotes a new generation from the latent code unveiled by the inversion model given the corresponding images on the left.

Albeit important, DNNs’ invertibility remains an ill-posed and challenging problem (Arora2015): it depends heavily on model architecture (jacobsen2018revnet), weight distribution (QiLei2019), and learning objectives (geirhos2018imagenet). Take the popular activation function in modern DNNs, the Rectified Linear Unit (ReLU) (nair2010rectified), for example. It passes positive inputs unchanged and outputs zero otherwise. This saturation prevents anyone from exactly recovering its input. Linear layers such as convolutional and fully-connected layers are also barely invertible. They can be expressed as a matrix product between a weight matrix and the layer input. The weight matrix has a left inverse if it has full column rank and an expansion rate (the ratio of output to input dimension) greater than one. Such a condition on the expansion rate is hardly satisfied in modern neural networks (e.g., ResNet (He2015) and RepVGG (ding2021repvgg)). What’s more, DNNs are typically trained with discriminative objectives, which are designed to extract class-relevant features and mask out the rest (goodfellow2009measuring; szegedy2013intriguing; glorot2011domain).
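The two obstacles above can be checked numerically. The following is a minimal NumPy sketch, not taken from the paper: the matrix sizes, the random seed, and the use of the Moore-Penrose pseudo-inverse as the left inverse are all illustrative assumptions. It shows that ReLU maps distinct inputs to the same output, and that a linear layer is exactly recoverable when it expands its input but not when it contracts it.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Two different inputs that collapse to the same ReLU output,
# so the input cannot be recovered from the output alone.
a = np.array([1.0, -2.0, 3.0])
b = np.array([1.0, -5.0, 3.0])
same_output = np.array_equal(relu(a), relu(b))

rng = np.random.default_rng(0)  # illustrative seed
x = rng.standard_normal(4)

# Expanding layer: 6 outputs from 4 inputs (expansion rate 1.5).
# A random tall matrix has full column rank, so pinv is a left inverse.
W_tall = rng.standard_normal((6, 4))
x_rec_tall = np.linalg.pinv(W_tall) @ (W_tall @ x)

# Contracting layer: 3 outputs from 4 inputs (expansion rate 0.75).
# No left inverse exists; pinv only recovers the row-space component of x.
W_wide = rng.standard_normal((3, 4))
x_rec_wide = np.linalg.pinv(W_wide) @ (W_wide @ x)

tall_error = np.linalg.norm(x_rec_tall - x)
wide_error = np.linalg.norm(x_rec_wide - x)
print(f"ReLU collision: {same_output}")
print(f"expanding layer error:   {tall_error:.2e}")
print(f"contracting layer error: {wide_error:.2e}")
```

The contracting case is the common one in classification networks, which is why a learned, approximate inverse is needed rather than an algebraic one.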

Current approaches to model inversion require specific assumptions to be met to succeed, even for shallow architectures and simple datasets. For example, one line of work provides theoretical proofs of invertibility assuming that the weights of DNNs are random-like (Arora2015; Gilbert2017; behrmann2018analysis) and that expansion rates are larger than a certain constant (QiLei2019; aberdam2020and). However, these conditions are hardly satisfied by existing trained DNNs (Cai2019WeightNB; Li2020AdditivePQ). Other works formulate the inversion problem as a generative task (Dosovitskiy2016; Nguyenplugplay2017; RN58; hand2018phase). They either leverage a pre-trained generative adversarial network (e.g., BigGAN (brock2018large)) or train a GAN from scratch with large amounts of real data. The generator in the GAN acts as the inversion model to recover the input given the hidden activations. In this scenario, knowledge of and access to the pre-trained GAN or the original dataset are necessary. This makes inversion impractical in various application scenarios: in practice, neither a pre-trained GAN nor a massive amount of original images is provided, amid cost, privacy, and proprietary concerns.

This paper considers the problem of zero-shot model inversion: we study the viability of learning an inversion model that faithfully reverses the original function mapping, given only a target pre-trained model without access to the original training data. Our proposed method unifies the inversion of both discriminative and generative models, as shown in Fig. 1. To overcome the increased complexity and non-linearity of the target model as depth increases, we propose a divide-and-conquer inversion method that partitions the inversion into sequential layer/block-wise inversion sub-problems. To support optimization of the modules in the inversion model, we exploit synthetic data generated by minimizing the discrepancy between the feature statistics of a noise input and those stored in batch normalization (BN) layers (santos2019learning; yin2020dreaming; xu2020generative). With synthetic data proxies, we are able to optimize the inversion model to enforce feature embedding similarity with respect to the original counterpart. We call this cycle consistency-guided inversion. Combining the aforementioned techniques enables us to successfully scale zero-shot model inversion to deep architectures and complicated datasets without assumptions on model or dataset priors.

In summary, we make the following contributions: (i) We propose a new inversion paradigm based on a divide-and-conquer strategy and a cycle-consistency loss. (ii) We demonstrate that the method generalizes to both discriminative and generative models, even for complex datasets (i.e., ImageNet (deng2009imagenet) and CelebA (liu2015faceattributes)), surpassing state-of-the-art baselines. (iii) We analyze inversion insights, including image recovery, quality enhancement, latent code projection, and adversarial sample characterization.

2 Related Work

We next review previous work and group it into four categories: invertible neural networks, analytical inversion, generative inversion, and input optimization based inversion.

Table 1: Comparison of our approach with prior work, including (Dosovitskiy2016; RN58). Criteria compared: no pre-trained prior; no adversarial training; single-pass inversion; no exotic arch./weight constraint; unify inversion of multiple models.

Invertible Neural Networks.

Invertible neural networks (INN) (jacobsen2018revnet; behrmann2019invertible_res; grathwohl2018ffjord; behrmann2021understanding; chang2018reversible) are a family of neural networks that can be treated as one-to-one function approximators because of their special architectures and restricted weight distributions. For example, i-RevNet (jacobsen2018revnet) consists of well-designed RevBlocks, which are interleaved with convolution, reshuffling, and partitioning, and i-ResNet (behrmann2019invertible_res) imposes a Lipschitz condition on the weights of ResNet (He2015). Earlier literature on flow-based generative models (dinh2014nice; dinh2016density; kingma2018glow) shares similar design principles.

Analytical Inversion.

Most INNs rely on exotic architectures and regularization, and thus cannot be applied to standard model architectures. Analytical inversion relaxes the constraint on model architecture and studies the theoretical invertibility of standard feedforward neural networks (arora2014provable). For example, (Arora2015; Gilbert2017) theoretically show that the approximate reverse of a feedforward layer can be obtained by taking the transpose of the weight matrix, based on the hypothesis that weights in the target network are random-like, which does not always hold for a pre-trained network (Cai2019WeightNB; Li2020AdditivePQ). Despite their simplicity, these methods only scale to the CIFAR-10 level, and the recovered images are noisy. (Gilbert2017), in addition, makes the strong assumption that the activation function is concatenated ReLU (shang2016understanding), which keeps the positive and negative values separately. Besides discriminative models, (QiLei2019; Fangchang2018) investigate the theoretical invertibility of deep generative models. (QiLei2019) indicates that per-layer inversion can be achieved through linear programming in polynomial time if the weights are random-like and the layer output dimension is larger than double that of the input, and empirically inverts a simple generator on the MNIST dataset.

Generative Inversion.

Despite the great progress of analytical inversion, there is still a discrepancy between the conditions of the theories and modern pre-trained models. As a learning-based and data-driven approach, generative inversion aims to learn a generative model that reverses the target model’s input-output mapping. (Nguyenplugplay2017; teterwak2021understanding) leverage a pre-trained generator (e.g., BigGAN (brock2018large)) as the ‘learned prior’ and optimize the latent space of the generator to maximize the activation of the target network. If exquisite pre-trained generators are not accessible, a generative adversarial network can be trained on the original dataset of the target model (RN58). The generator takes features as input and outputs the reconstructed images, and the binary discriminator distinguishes real and reconstructed images to compete with the generator. The dependence on the pre-trained generator, the original dataset, and unstable adversarial learning limits the practicability of generative inversion. In addition, only a limited number of layers of a relatively shallow architecture (i.e., AlexNet) are inverted in the mentioned works, which have yet to scale to modern deep architectures.

Input Optimization Based Inversion.

Given an arbitrary pre-trained discriminative model such as a classifier, (yin2020dreaming; mordvintsev2015deepdream; cai2020zeroq; haroush2020knowledge; chawla2021data) show that one can reveal certain information about the training set by optimizing the input to match pre-stored statistics or to maximize the activation of neurons. However, these methods are all designed for untargeted model inversion and cannot reconstruct the input faithfully given the feature embedding. In addition, these methods remain computationally heavy.


Model inversion is also related to autoencoders (vincent2010stacked; hinton2011transforming), since the target and inversion models function similarly to an encoder-decoder pair. However, the encoder and decoder in an autoencoder are trained jointly with a massive amount of data in an end-to-end manner. In our case, only an individually pre-trained target model is accessible. In line with previous findings on layer-wise training of autoencoders (e.g., deep belief networks (hinton2006fast; bengio2007greedy)), we also find greedy layer-wise training beneficial for inversion model training, as detailed in Sec. 3.3.

To summarize the above literature, one common trend is that researchers continually reduce the assumptions made by previous studies, for example, exotic architectures, random weight distributions, and specific layer expansion rates. In this paper, we focus on direct model inversion with minimal assumptions, i.e., with neither the original dataset nor exquisite priors, but only the target pre-trained model. We unify the inversion of both discriminative and generative deep models and scale the inversion to deep architectures and complex datasets.

3 Method

We first formulate the problem and then describe the proposed inversion method. Consider a multi-layer neural network (discriminative or generative) as a transfer function composed of a sequence of layers and trained on a dataset. Each layer is parameterized and may include its associated batch normalization and activation. Unless otherwise specified, we consider the widely used ReLU as the activation function; however, all the results in this paper can be extended to its variants. We also refer to sub-networks that span a contiguous range of layers, from a single layer (e.g., the first layer) up to the whole network.

In this paper, we consider the following problem: given a pre-trained network, is it possible to learn its (approximate) reverse function without access to the original training dataset? This is practical for scenarios where the original data is not available and learning a direct decoder is not possible. More specifically, we expect to learn a new function that maps the output of the network back to a recovered input that is close to the original input.

3.1 Divide-and-Conquer Inversion

DNNs have a compositional structure, consisting of layers or even blocks operating on the propagated tensor called a feature map. Several studies have revealed that neural networks produce more complicated feature maps of higher non-linearity and capacity as the depth increases (bianchini2014complexity; eldan2016power; telgarsky2016benefits; raghu2017expressive). As a result, the difficulty of DNN inversion was reported in previous work (Gilbert2017; RN58; Dosovitskiy2016; Fangchang2018).

To circumvent the difficulty of approximating all stacked layers jointly from scratch, we first partition the overall inversion problem into several layer-wise (or block-wise) inversion sub-problems before integrating them for refinement. To this end, we introduce a simple yet effective inversion strategy called Divide-and-Conquer Inversion (DCI) that progressively inverts the computational flow of DNNs and gradually refines the inverted modules. Compared with a direct end-to-end inversion, DCI has two advantages: (i) a single layer (or block) has less non-linearity and complexity, and is thus easier to invert; (ii) DCI provides a richer supervision signal across layers (or blocks), while the overall inversion only utilizes supervision at the two ends of the target model. Both strengths make the optimization more effective, stable, and data-efficient.

Starting from the first layer, DCI inverts each layer with two simultaneous goals: (i) inverting the target layer, and (ii) ensuring that the newly inverted layer works well jointly with all previously inverted layers. The second objective is necessary because each inversion is inexact, and small errors are amplified when propagated. Therefore, we adjust the representations according to the accumulated error. For the first objective, we minimize a layer reconstruction loss between the input of the target layer and its reconstruction from the layer output.

As for the second objective, we aim to ensure the reconstruction quality of the current temporary “overall” inversion model up to the current layer. This translates into minimizing the distance between the inverted input and the original input, for which we introduce a second loss term.
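One way to write these two objectives, as a sketch under assumed notation (the symbols below are illustrative and not taken from the source): let $f_k$ be the $k$-th target layer, $g_k$ its inversion module, $x_1$ the network input, and $x_{k+1} = f_k(x_k)$ the output of layer $k$. The distance $\|\cdot\|$ is left unspecified; the choice of norm is an assumption.

```latex
% Assumed notation: f_k target layer, g_k inversion module,
% x_1 network input, x_{k+1} = f_k(x_k).
\begin{align}
\mathcal{L}_{\text{layer}}(k) &= \big\| g_k(x_{k+1}) - x_k \big\|, \\
\mathcal{L}_{\text{input}}(k) &= \big\| (g_1 \circ g_2 \circ \cdots \circ g_k)(x_{k+1}) - x_1 \big\|.
\end{align}
```

The first term supervises one module in isolation; the second supervises the whole chain of modules inverted so far, which is where accumulated error becomes visible.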


3.2 Cycle-Consistency Guided Inversion

We further re-exploit the target model for stronger inversion guidance, taking advantage of the unique setup of the inversion problem. As we reverse the computation of the target model, this forms a natural loop with the original computation flow. Inspired by the perceptual metric (Richard2018) and cycle-consistent image translation (zhu2017unpaired), we explore cycle consistency to measure the quality of reconstructed inputs by re-checking them with the target model. Intuitively, if the reconstructed input is faithfully and semantically close to the original input, then the direct model should produce similar, if not exact, feature responses at all layers. To this end, we cycle the reconstructed input back to the direct (target) model and minimize the distance between the features of the reconstructed input and those of the original input at various depths. We refer to this distance as the cycle consistency loss.

This fully utilizes the features of the original input twice to provide richer supervision during optimization of the inversion model. In the layer reconstruction loss, we use features of the original input as the reconstruction objective. In addition, the cycle consistency loss also uses features of the original input as a reference and enforces the inverted input to have features similar to those of the original input. The final optimization objective for an inversion layer combines the above losses, with a hyper-parameter balancing the terms.
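A possible form of the cycle consistency loss and of the combined objective is sketched below. The notation is assumed for illustration and is not taken from the source: $f_j$ are the target layers, $F_{1:j} = f_j \circ \cdots \circ f_1$ is the sub-network up to layer $j$, $g_j$ are the inversion modules, $x_1$ is the original input, and $x_{k+1}$ is the output of layer $k$. $\mathcal{L}_{\text{layer}}$ and $\mathcal{L}_{\text{input}}$ denote the layer and input reconstruction losses of Sec. 3.1. Which term the hyper-parameter $\lambda$ weights, and the choice of norm, are assumptions.

```latex
% Assumed notation: F_{1:j} = f_j \circ \cdots \circ f_1 (target sub-network),
% g_j inversion modules, \hat{x}_1 reconstructed input.
\begin{align}
\hat{x}_1 &= (g_1 \circ \cdots \circ g_k)(x_{k+1}), \\
\mathcal{L}_{\text{cycle}}(k) &= \sum_{j=1}^{k} \big\| F_{1:j}(\hat{x}_1) - F_{1:j}(x_1) \big\|, \\
\mathcal{L}(k) &= \mathcal{L}_{\text{layer}}(k) + \mathcal{L}_{\text{input}}(k) + \lambda\, \mathcal{L}_{\text{cycle}}(k).
\end{align}
```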

3.3 Training Strategy

DCI divides the computation of a feed-forward neural network into several parts and inverts the computational flow progressively. One straightforward strategy is to sequentially optimize each individual inversion module, starting from the first (i.e., input) layer. We observe that the inversion error accumulates as we move deeper into the model. To mitigate this accumulation, we use an improved training strategy. After optimizing a given inversion module, we further take all previous inversion modules into consideration and fine-tune all inversion modules up to the current one with the same loss to reduce the accumulated inversion error. For the first layer, we skip this fine-tuning because there is no inversion module before it.
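The effect of refining earlier modules against accumulated error can be illustrated on a toy problem. The sketch below is not the paper's training procedure: it assumes a small random ReLU network, uses affine inversion modules, and replaces gradient-based fine-tuning with closed-form least-squares refits; all sizes, names, and the seed are hypothetical. It first inverts each layer in isolation, then refits the earlier modules on the outputs of the later ones.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
n, width, depth = 512, 6, 3     # hypothetical toy sizes

def relu(x):
    return np.maximum(x, 0.0)

# Toy target network: `depth` random ReLU layers of equal width.
weights = [rng.standard_normal((width, width)) / np.sqrt(width)
           for _ in range(depth)]

# feats[0] is the input batch, feats[k] the output of layer k.
feats = [rng.standard_normal((n, width))]
for W in weights:
    feats.append(relu(feats[-1] @ W.T))

def fit_affine(src, dst):
    """Least-squares affine map src -> dst; returns (matrix, bias)."""
    design = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return sol[:-1], sol[-1]

def apply_affine(module, h):
    A, b = module
    return h @ A + b

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def invert(modules, y):
    """Run the inversion modules from the deepest layer back to the input."""
    h = y
    for module in reversed(modules):
        h = apply_affine(module, h)
    return h

# Divide: invert each layer on its own, using the true layer input as target.
layerwise = [fit_affine(feats[k + 1], feats[k]) for k in range(depth)]
naive_error = mse(invert(layerwise, feats[depth]), feats[0])

# Conquer: refit earlier modules on the outputs of the later modules, so
# each one compensates for the error accumulated before it reaches it.
refined = list(layerwise)
h = apply_affine(refined[depth - 1], feats[depth])
for k in range(depth - 2, -1, -1):
    refined[k] = fit_affine(h, feats[k])
    h = apply_affine(refined[k], h)
refined_error = mse(h, feats[0])

print(f"chained layer-wise modules: {naive_error:.4f}")
print(f"after refinement:           {refined_error:.4f}")
```

Each refit still uses the true intermediate feature as its target, which mirrors the richer per-layer supervision that DCI relies on.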

3.4 Data Sampling

One remaining challenge in optimizing the inversion modules with the above objective is the input data. When the target model is a generative model (i.e., the generator of a generative adversarial network), it is effortless to obtain input data by sampling random latent codes from a Normal (or Uniform) distribution. However, when the target model is an image classification model, it is not feasible to know or sample images from the underlying image distribution. Although one can model a superset of input images by sampling each pixel independently from a uniform distribution, this superset would be too different from the real data distribution, and in practice too loose to serve as input proxies (yin2020dreaming).

It may seem that the original training data is needed for fine-tuning the inversion model, but the target model itself is sufficient to provide enough guidance. Inspired by the progress in adversarial-free generative models (li2017mmd; binkowski2018demystifying) and data-free knowledge distillation (yin2020dreaming; xu2020generative), we re-use these techniques to generate a small synthetic set for fine-tuning. The method we choose minimizes the discrepancy between the feature statistics of synthetic data and the statistics stored in batch normalization (BN) layers. The objective has three parts: (i) the distance between the feature statistics (mean and variance) computed at each layer on the synthetic data and their corresponding moving averages stored in BN layers; (ii) the cross entropy between the model prediction on the synthetic input and a randomly assigned label; and (iii) a secondary image regularization, such as total variation, that makes the derived images more natural (yin2020dreaming).

Minimizing the discrepancy of feature statistics between synthetic data and real data (i.e., BN statistics) is equivalent to reducing the integral probability metric (li2017mmd; binkowski2018demystifying; muller1997integral) between the distributions of synthetic data and real data (we detail this intuitive justification in the appendix). Thus, the synthetic data can be treated as data from a distribution that is close to the underlying real data distribution, and can act as a reasonable proxy for the model inversion task.
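The feature-statistics and image-regularization terms can be sketched in NumPy as follows. This is an illustrative stand-in rather than the authors' implementation: the function names, the L2 distance on means and variances (the form used in DeepInversion-style synthesis), the anisotropic total variation, and the toy stored statistics are all assumptions.

```python
import numpy as np

def bn_statistics_loss(features, running_mean, running_var):
    """Distance between batch statistics and stored BN statistics.

    features: (N, C, H, W) activations entering a BN layer.
    running_mean, running_var: (C,) moving averages stored in that layer.
    """
    batch_mean = features.mean(axis=(0, 2, 3))
    batch_var = features.var(axis=(0, 2, 3))
    return float(np.linalg.norm(batch_mean - running_mean)
                 + np.linalg.norm(batch_var - running_var))

def total_variation(images):
    """Anisotropic total variation of a batch of images (N, C, H, W)."""
    dh = np.abs(images[:, :, 1:, :] - images[:, :, :-1, :]).sum()
    dw = np.abs(images[:, :, :, 1:] - images[:, :, :, :-1]).sum()
    return float(dh + dw)

rng = np.random.default_rng(0)  # illustrative seed
# Hypothetical stored statistics for a 3-channel BN layer.
running_mean = np.array([0.5, -1.0, 2.0])
running_var = np.array([1.0, 4.0, 0.25])

# Features whose statistics match the stored ones, versus plain noise.
scale = np.sqrt(running_var)[None, :, None, None]
shift = running_mean[None, :, None, None]
matched = shift + scale * rng.standard_normal((64, 3, 8, 8))
noise = rng.standard_normal((64, 3, 8, 8))

loss_matched = bn_statistics_loss(matched, running_mean, running_var)
loss_noise = bn_statistics_loss(noise, running_mean, running_var)
print(f"matched features: {loss_matched:.3f}")
print(f"noise features:   {loss_noise:.3f}")
```

In an actual synthesis loop, the input image would be optimized by gradient descent so that this loss, summed over all BN layers and combined with the cross-entropy and regularization terms, decreases.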

Naively combining BN-guided data synthesis and other existing inversion approaches is suboptimal due to the following reasons:

  • Prior work (RN58; teterwak2021understanding) trains a generator (in a GAN framework) as the inversion model, whereas GAN training is very sensitive and challenging given only synthetic data (kodali2017convergence; tran2021data; zhao2020diffaugment). The resulting efficacy falls short of DCI, as we will show later in Sec. 4.3.

  • Prior work (Nguyenplugplay2017; RN58; teterwak2021understanding) usually requires the whole training set (more than M images) to train the GAN for inversion. Yet deriving massive synthetic data remains slow (yin2020dreaming).

In contrast, without any adversarial learning, DCI does not have the above disadvantages when using synthetic data because of its progressive inversion nature and cycle consistency supervision. DCI is data-efficient and can be enabled by only K synthetic images.

4 Experiments

We next demonstrate the efficacy of our method on inverting discriminative and generative models.

Figure 2: ImageNet inversion results given only a pretrained RepVGG without additional information. Panels: real images from the ImageNet1K validation set, and images recovered from RepVGG feature embeddings after 21 blocks. Note that the recovered images have contextually correct backgrounds in realistic scenarios and are close proxies of the real samples. Best viewed in color.

4.1 Classifier Inversion on the ImageNet1K Dataset

We first study inversion of classifiers. For this set of experiments, we consider the inversion of RepVGG (ding2021repvgg), one of the state-of-the-art classification models with a deep yet mathematically neat architecture. It has a ResNet-like (He2015) multi-branch topology during training, and a mathematically equivalent VGG-like (simonyan2014very) inference-time architecture, achieved by layer folding. Specifically, we target a RepVGG-A0 (ding2021repvgg) that has convolution layers, each of which consists of a convolution, a ReLU, and a folded BN layer, yielding a top-1 validation accuracy on ImageNet (based on the authors’ CVPR’21 released repo and models). We invert one convolution layer at a time using our DCI strategy, starting from the input layer. Each inversion layer mimics its corresponding target counterpart. Each inversion layer is optimized via Adam (kingma2014adam) for K iterations. We use K synthetic images as detailed in Sec. 3.4 for the optimization. See the appendix for additional experimental details.

Main Inversion Results.

Fig. 2 shows inversion results on ImageNet1K. Remarkably, inversion recovers a very close pixel-wise proxy of the original image from a deep embedding (after 21 blocks in Fig. 2), preserving original semantic and visual attributes, including color, orientation, outline, and position.

Inversion at Different Layer Depths.

Figure 3: Inversion from features at increased depths. (Left) ‘0’ denotes the real image; the numbers indicate layer depth. (Right) PSNR and LPIPS at various layer depths. The shaded region is the standard deviation across the ImageNet testing set.

As noticed, edges are blurred in the inverted images, likely because of information filtering due to the classification nature of the model. To examine this hypothesis, we visualize the inverted images from different depths of the target model in Fig. 3. As the depth increases, the inverted images become blurrier and lose more details. To quantify the changes, we also report the peak signal-to-noise ratio (PSNR) (psnr) and the learned perceptual metric (LPIPS) (Richard2018) in Fig. 3 (right), which are consistent with the qualitative observation.
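For reference, PSNR follows the standard definition 10·log10(MAX² / MSE). The helper below is a minimal sketch with a hypothetical function name, assuming images scaled to [0, 1]. LPIPS is not sketched because it requires a pretrained perceptual network.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
reference = np.zeros((3, 8, 8))
reconstruction = np.full((3, 8, 8), 0.1)
print(f"{psnr(reference, reconstruction):.2f} dB")
```

Higher PSNR means a closer pixel-wise match, whereas LPIPS is a distance, so lower is better.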

One intriguing fact shown in Fig. 3 is the consistent preservation of high-level information across most layers, which challenges prior security arguments in split computing (kang2017; Jeong2018; eshratifar2019jointdnn; li2018auto; jankowski2020joint; eshratifar2019bottlenet). For example, as shown in Fig. 3, we can achieve nearly perfect recovery from intermediate-layer outputs that have already passed through three strided convolutions, which are perceived as lossy operations. We also find that increasing the depth is not always effective in making inversion harder, so deploying deeper sub-networks on devices is not an easy mitigation of such risks. As shown in Fig. 3 (right), features taken across a range of consecutive layers still result in recovered images of similar quality in terms of PSNR and LPIPS. Finally, we find that the last two layers dramatically degrade the inversion quality. One can still recognize the class of inverted images after the penultimate (i.e., the last convolution) layer. However, if we invert features after the final (i.e., the fully-connected) layer, only the predominant color is recognizable (inversion results for both layers are included in the appendix). This may indicate that class-invariant information is quickly filtered out towards the end of the model, while the initial stages focus on feature extraction. This is also in line with observations in transfer learning (NIPS2014_375c7134; li2020rifle; long2015learning; dollar2018rethinking; zoph2020rethinking) and self-supervised learning (he2020momentum; grill2020bootstrap; chen2020big).

Figure 4: For each image group: top left is legitimate image (base + random perturbation ), top right is adversarial image (base + adversarial perturbation  (madry2017towards)). Both perturbations are of the same magnitude. Image pairs are vertically aligned. Tables match image pairs left to right.

Recovering Adversarial Samples.

We next leverage inversion to understand the mechanism of adversarial images (kurakin2016adversarial). When adversarial samples are crafted, the adversarial perturbation is optimized to maximally disrupt the network prediction (43405). We find that the inversion model behaves differently for randomly (Gaussian noise) perturbed images and adversarially perturbed ones. As shown in Fig. 4, randomly perturbed images share close proxy semantics with the original images. However, when we invert features of adversarial samples crafted by PGD (madry2017towards), the recovered images reveal high-order chaos. This implies that not only the prediction results but also the intermediate features are disrupted by adversarial samples. The visualization of feature maps corresponding to images with random and adversarial perturbations in the appendix reinforces this finding. It also provides heuristic, large-scale evidence for prior work arguing that random perturbations of the same magnitude do not disturb the network (carlini2017adversarial; lee2018simple). The discrepant inversion behaviors for adversarial and legitimate samples may inspire defense methods.

Figure 5: CelebA inversion results. Panels: SNGAN-generated samples (the first generation), and recovered images (the second generation). The recovered latent codes unveil images that are very close to the original samples, with similar visual features, style, and orientation. Inversion improves the quality of some defective samples.

4.2 GAN Inversion on the CelebA Dataset

We next shift to inversion of generative models. For this experiment we focus on the popular SNGAN architecture (miyato2018spectral) trained on the CelebA dataset. The SNGAN consists of one fully-connected layer at the beginning, a stack of residual blocks built from convolution layers, and one convolution layer at the end (based on a released implementation and models). We break the inversion of the whole SNGAN model into inversion of the fully-connected layer, the residual blocks, and the last convolution layer, with the optimization objective of Sec. 3.2 and the training strategy in Sec. 3.3. For consistency with prior GAN literature, we use G to denote the target generator and z to denote the sampled latent code that generates an image G(z). We aim to learn an inversion model that faithfully reconstructs z given G(z), a problem that has large practical impact (e.g., image compression) but remains challenging (QiLei2019; styleganv2).

Main Inversion Results for Latent Code Recovery.

Fig. 5 visualizes our inversion results of the target generator. To demonstrate efficacy we first use the inversion model to recover latent codes, and then pass the recovery back to the generator. Without bells and whistles, the second generation images faithfully align with the first generation. In addition, recovered faces are visually very similar to the original targets. One interesting observation is that the inversion and re-generation process purify the secondary attributes such as backgrounds, but preserve main attributes of the faces. This further indicates that our inversion takes semantic information into consideration on account of the cycle consistency guidance.

Figure 6: Linear interpolation of inverted latent codes.

Figure 7: Inversion improves the failure cases of SNGAN-generated samples. (a) Defective SNGAN-generated images. (b) Improved images via inversion-reprojection.

Figure 8: Inverting real images. Unlike generated images, real images are changed after inversion. (a) Real images from the CelebA validation set. (b) Re-synthesized latent codes from inversion.

Interpolation of Recovered Latent Codes.

To show the validity of the recovered latent code space, we conduct linear interpolation between recovered latent codes, as in Fig. 6. The smooth transitions between adjacent samples indicate that the recovered latent codes fit well in the input space of SNGAN.
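Linear interpolation between two latent codes can be sketched as below. The function name, latent dimension, and step count are hypothetical, and the random vectors stand in for codes recovered by an inversion model. Each interpolated code would then be passed through the generator to produce one frame of the transition.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Return `num_steps` codes evenly spaced on the segment z_start -> z_end."""
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * z_start + a * z_end for a in alphas])

rng = np.random.default_rng(0)   # illustrative seed
z_a = rng.standard_normal(128)   # stand-in for a recovered latent code
z_b = rng.standard_normal(128)   # stand-in for another recovered code
path = interpolate_latents(z_a, z_b, num_steps=5)
print(path.shape)
```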

Improving Defective Generated Images.

One favorable by-product of inversion and re-generation is that defective first-generation images can be greatly improved after re-projection, as shown in Fig. 7. Here a first-generation image is generated by SNGAN starting from a randomly sampled latent code. The random latent code, as well as the first-generation image, could be an outlier that slightly diverges from the distribution of normal inputs. Since the inversion model is trained for the output-to-input mapping, it finds the closest in-distribution latent code for the original outlier latent code during recovery, and hence improves image quality.

Real vs. Generated Images.

All the above experiments are conducted on first-generation images from SNGAN. We next check whether the inversion model behaves differently on real images, as noticed by Karras et al., who report that the latent code found for a real image yields a synthesis distant from the original sample (styleganv2). For a fair comparison, we first save both real and generated images in PNG format (wiki:Portable_Network_Graphics) and use strictly the same inversion pipeline for both.

Although SNGAN is trained to map a base distribution (e.g., a Normal distribution) to the underlying distribution of real images, there is still a discrepancy between SNGAN’s output distribution and the underlying distribution. We observe that the inversion model is able to capture this discrepancy, behaving differently on generated and real images, as shown in Fig. 8. The inversion model can still faithfully invert generated images, but it shifts real images and alters their style after inversion and re-generation: real images and their second generation merely share orientation and color. This may inspire future work on detecting deep fake images (tolosana2020deepfakes; chesney2018deep).

Figure 9: Qualitative comparison to prior inversion methods. Panels: original images; (a) DeePSiM (RN58); (b) DeepDream; (c) DeepInversion (yin2020dreaming); this work.
Method                               | Inference Time (s) | PSNR  | LPIPS
DeepDream (mordvintsev2015deepdream) | 0.55K              | 9.53  | 0.90
DeepInversion (yin2020dreaming)      | 1.92K              | 10.93 | 0.60
This work                            | 0.015              | 18.83 | 0.44
Inference time measured on NVIDIA V100 GPU at batch size .
Table 2: Quantitative comparison to prior methods.
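For readers unfamiliar with the PSNR column, the sketch below computes it under the standard definition, 10 * log10(MAX^2 / MSE). It assumes pixel values normalized to [0, 1] (so MAX = 1.0), which the text does not state, and the pixel values used are illustrative only. LPIPS relies on a learned network and is not covered here.

```python
import math


def psnr(original, recovered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    max_value is the largest possible pixel value; 1.0 is an assumption here.
    """
    if not original or len(original) != len(recovered):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - r) ** 2 for o, r in zip(original, recovered)) / len(original)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Illustrative pixels: every recovered value is off by 0.1, so MSE is about 0.01.
original = [0.0, 0.5, 1.0, 0.25]
recovered = [0.1, 0.4, 0.9, 0.35]
score = psnr(original, recovered)  # approximately 20 dB
```

Higher PSNR means a smaller mean squared error between the recovered and original pixels.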

4.3 Comparison to Prior Work

Next, we compare our method with other approaches under the zero-shot model inversion setup, which optimize either an auxiliary network ((a) DeePSiM (RN58)) or input tensors ((b) DeepDream (mordvintsev2015deepdream) and (c) DeepInversion (yin2020dreaming)). We include their detailed setups in the Appendix.

We show a qualitative comparison in Fig. 9 and quantitative results in Table 2. The inferior result of DeePSiM (trained with synthetic data) demonstrates that previous generative inversion approaches (Nguyenplugplay2017; RN58; teterwak2021understanding) are incompatible with BN-guided data synthesis, as discussed in Sec. 3.4. DeepDream yields unrecognizable images compared to our method, while DeepInversion results in improved features and semantics, even though a gap remains between recovered and original images. Both DeepDream and DeepInversion require K forward and backward passes to optimize inputs, while our method needs only one forward pass through the inversion model and is hence much more efficient.

4.4 Inverting More Architectures on ImageNet1K

To demonstrate the general applicability of the proposed method, we next show inversion results for additional networks on the ImageNet dataset, covering varying architectures (ResNet-18 and ResNet-50) and training recipes (standard (He2015) and self-supervised (chen2020mocov2)).

Inverting ResNet-18.

We start with ResNet-18 inversion. For this experiment we base the network on the implementation and pre-trained model of ResNet-18 from the PyTorch model zoo (torchvision). ResNet-18 contains units of BasicBlock, each of which consists of two convolution layers (with BN and ReLU) and a shortcut connection. BasicBlock has a multi-branch architecture because of the shortcut connection. We invert the BasicBlock as a whole unit. The first four sequential layers in ResNet-18 are convolution, BN, ReLU, and max pooling layers. Since it is impossible to invert max pooling without additional information (estrach2014signal), we invert the sequence of the first four layers as a whole block, which we refer to as the initial block. As before, each inversion block mimics its corresponding target counterpart but with reversed input and output dimensions.
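Why max pooling cannot be inverted in isolation can be seen on a tiny example. The sketch below is a hypothetical pure-Python 2x2 max pooling (not the torchvision layer) applied to two different inputs that produce the same output, so no function of the output alone can tell them apart.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a list-of-lists image with even sides."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Two different inputs (illustrative values) ...
image_a = [[1, 9],
           [3, 2]]
image_b = [[9, 0],
           [0, 0]]

# ... collapse to the same pooled output, so pooling is not injective.
pooled_a = max_pool_2x2(image_a)
pooled_b = max_pool_2x2(image_b)
```

Both the position of the maximum and the three smaller values are discarded, which is consistent with treating the first four layers as one block rather than inverting the pooling layer on its own.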

Fig. 10 shows inversion results of the pre-trained ResNet-18 on ImageNet. With the proposed method, we are able to recover recognizable input images after up to blocks (the initial block and BasicBlock’s) that contain convolution layers and one max pooling layer in total. We observe that inversion can still faithfully restore semantically correct, high-fidelity images that contain visual details similar to their original counterparts.

Real images of px. from the ImageNet1K validation set.

Recovered images from ResNet-18 feature embeddings after convolution layers .

Figure 10: ImageNet inversion results given only a pretrained ResNet-18 without original data. Note that recovered images have contextually correct backgrounds, in realistic scenarios, of close proxy to real samples. Best viewed in color.

Inverting ResNet-50 (standard (He2015) and self-supervised MoCo V2 (chen2020mocov2)).

We next move on to invert a deeper architecture, ResNet-50. For additional insight into network behavior, we study inversion of both a normally trained network as in (He2015) and the recent self-supervised network trained as in MoCo V2 (chen2020mocov2). For this experiment, we break the overall ResNet-50 architecture into five sub-networks and invert one sub-network at a time. As with ResNet-18, the first sub-network is the initial block. The other four sub-networks consist of Bottleneck’s respectively (He2015).
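The sub-network decomposition implies that the inverse of the first k sub-networks is obtained by applying the per-block inversion modules in reverse order. The sketch below illustrates only this composition. It assumes toy affine blocks with exact analytic inverses, whereas the paper's inversion modules are learned networks and only approximate; all names and values are hypothetical.

```python
def make_affine_block(scale, shift):
    """Return a toy (forward, inverse) pair standing in for one sub-network."""
    def forward(x):
        return [scale * v + shift for v in x]

    def inverse(y):
        return [(v - shift) / scale for v in y]

    return forward, inverse


# Three toy blocks; a real model would use learned layers instead.
blocks = [
    make_affine_block(2.0, 1.0),
    make_affine_block(0.5, -3.0),
    make_affine_block(4.0, 0.25),
]


def embed(x, depth):
    """Run the input through the first `depth` forward blocks."""
    for forward, _ in blocks[:depth]:
        x = forward(x)
    return x


def invert(features, depth):
    """Undo the first `depth` blocks by applying their inverses last-to-first."""
    for _, inverse in reversed(blocks[:depth]):
        features = inverse(features)
    return features


x = [1.0, -2.0, 0.5]
features = embed(x, depth=3)
recovered = invert(features, depth=3)
```

Because each inversion module only has to undo its own block, the modules can be trained one at a time, and inversion from any intermediate depth reuses the modules of the shallower blocks.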

Real images of px. from the ImageNet1K validation set.

Recovered images from ResNet-50 (standard) feature embeddings after convolution layers .

Recovered images from ResNet-50 (MoCo V2 (chen2020mocov2)) feature embeddings after convolution layers .

Figure 11: ImageNet inversion results without original data given only a pretrained ResNet-50 with changing training recipes (standard (He2015) and self-supervised (chen2020mocov2)). Best viewed in color.

Fig. 11 shows the main inversion results of ResNet-50 on ImageNet. We observe that a stronger feature extractor preserves more information; compare the second and third rows in Fig. 11. Self-supervised pretraining leads to stronger inversion than a standard training recipe for the same network architecture. This is consistent with the recent findings in (yin2021see). These results demonstrate that we are able to recover recognizable input images for up to sub-networks, which have convolution layers and one max pooling layer in total.


We observe that the last convolution, pooling, and fully-connected layers remain lossy for inversion, which may stem from the fact that the network is pretrained heavily for the classification task. Yet, we observe inversion viability up to the -st convolutional block. This reveals that DNNs retain such a rich image-specific representation that even pixel-wise recovery is viable.


We have shown the feasibility of inverting deep models on complex datasets while alleviating most constraints imposed by prior work, such as model, training, or dataset priors. Our zero-shot method unifies the inversion of discriminative and generative models under one approach. We further presented an extensive analysis of the inversion behavior of large-scale networks, characterizing their behavior in filtering information, defending against adversarial attacks, improving defective generation, and responding differently to generated and real samples.

Broader Impact

This work contributes to a deeper understanding of DNNs’ invertibility. We conducted a range of experiments to study the properties that inversion induces. This will help both research and industrial communities to investigate data security vulnerabilities of machine-learning-as-a-service and to improve their pre-trained models. The method also reduces the amount of data required during inversion compared to prior art, helping to alleviate environmental burdens at the same time.


Appendix A

A.1 Additional Details for Main Manuscript

Lossy Final Fully-Connected Layer.

In the main manuscript, we show that the recovered images from RepVGG feature embeddings after convolution layers preserve original semantic and visual attributes. However, as we discuss in Limitations, we find that information in the feature embeddings decays rapidly through the last two layers in RepVGG. This may indicate that class-invariant information is quickly filtered out towards the end of the model while the initial stages focus on feature extraction, which is aligned with prior observations in transfer learning (NIPS2014_375c7134; li2020rifle; long2015learning; dollar2018rethinking; zoph2020rethinking) and self-supervised learning (he2020momentum; grill2020bootstrap; chen2020big). We visualize the above findings in Fig. 12. One can still recognize the class of inverted images after the penultimate (i.e., the -th convolution) layer. However, if we invert features after the final (i.e., the fully-connected) layer, only the predominant color is recognizable.

Figure 12: Results of inversion from the penultimate layer (i.e., the last convolution layer (b)) and the final layer (i.e., the final fully-connected layer (c)). (a) Original images. (b) Inversion from the last convolution layer. (c) Inversion from the final fully-connected layer.

Experimental Setup for Our Method.

For all experiments in the main manuscript, each inversion step is optimized via Adam (kingma2014adam) for K iterations. We set the initial learning rate as during optimization of a certain inversion module (K iterations) and during the following fine-tuning of all layers up to such that (K iterations). We use a cosine annealing schedule with warm restarts to adjust the learning rate during optimization (cosine; loshchilov2016sgdr). The coefficient of the cycle consistency loss in Eq. 4 is to ensure it has a similar magnitude to the other loss terms. We use K synthetic images as detailed in Sec. 3.4 for the optimization.
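For reference, the cosine annealing schedule with warm restarts can be sketched as follows. This is a simplified version with a fixed cycle length (the original SGDR formulation also allows cycles to grow), and the base learning rate and cycle length used in the example are placeholders, not the values used in the experiments.

```python
import math


def sgdr_learning_rate(step, base_lr, cycle_length, min_lr=0.0):
    """Cosine-annealed learning rate that restarts every `cycle_length` steps."""
    t_cur = step % cycle_length  # position inside the current cycle
    cosine = math.cos(math.pi * t_cur / cycle_length)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


# Placeholder hyper-parameters, chosen only for illustration.
base_lr, cycle_length = 0.1, 10
schedule = [sgdr_learning_rate(s, base_lr, cycle_length) for s in range(25)]
```

Within each cycle the rate decays from `base_lr` towards `min_lr` along a half cosine, then jumps back to `base_lr` at the start of the next cycle.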

Experimental Setup for Prior Work.

We next elaborate on details for prior baselines:

  • DeePSiM (RN58) learns a generator as the inversion model, which takes feature embeddings from the target model as the latent code and outputs recovered images. The generator is trained adversarially against an extra binary discriminator. For a fair comparison, we replace the ImageNet training set used by (RN58) with K synthetic samples for optimization of the inversion model.

  • DeepDream (mordvintsev2015deepdream) and DeepInversion (yin2020dreaming) both back-propagate gradients onto inputs to optimize them towards natural images. For a fair comparison, we replace the CE loss in the original setup (yin2020dreaming; mordvintsev2015deepdream) and use distance instead between synthesized and target embeddings to invert the same target layers as in the proposed method. We scale this loss to a similar magnitude as the other loss terms with a scaling factor of , keeping the other scaling terms the same as in (yin2020dreaming). We study a randomly sampled validation batch of batch size for this comparison. We use the setting that consumes K updates per batch as in (yin2020dreaming), given the requirement for feature map dimension consistency.
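The input-optimization baselines can be illustrated with a minimal sketch of gradient descent on an input so that its embedding matches a target embedding. Several assumptions are made here: the feature extractor is a toy linear map rather than a DNN, the distance is taken to be squared Euclidean (the text does not preserve which distance is used), the image priors of the original methods are omitted, and all names and values are hypothetical.

```python
def features(weights, x):
    """Toy linear feature extractor standing in for the target layers."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of embedding distance)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def optimize_input(weights, target, steps, lr):
    """Gradient descent on the input so that its features match `target`."""
    x = [0.0] * len(weights[0])
    for _ in range(steps):
        # One "forward pass": compute the residual in feature space.
        residual = [f - t for f, t in zip(features(weights, x), target)]
        # One "backward pass": gradient of the squared distance w.r.t. x.
        grad = [
            2.0 * sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(len(x))
        ]
        x = [v - lr * g for v, g in zip(x, grad)]
    return x


weights = [[1.0, 0.5],
           [0.0, 2.0]]
true_input = [0.3, -0.7]
target = features(weights, true_input)
recovered = optimize_input(weights, target, steps=500, lr=0.05)
```

Every recovered input costs one forward and one backward pass per update, repeated for many updates, whereas a trained inversion model produces its output in a single forward pass.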