1 Introduction
We focus on the invertibility of an arbitrary pretrained deep neural network (DNN), i.e., recovering its input from the intermediate representations. The impact of such inversion is twofold. First, inversion provides insights into how DNNs manipulate and process information at different depths. Inverting an intermediate representation back to the input space unveils how the network filters information and hints at better designing and utilizing deep models for downstream tasks. Second, inversion of widely used DNNs is valuable for analyzing their vulnerability when access to the hidden activations is given.
Mobile and edge devices increasingly rely on DNNs for complex tasks (howard2017mobilenets; chen2018enhanced; deng2013new; hinton2012deep), while operating under stringent battery and memory constraints (zhang2018shufflenet; han2015learning). Split computing (SC) has been a popular approach to overcome this by executing the first several layers on the device and the remaining layers in the cloud (kang2017; choi2018deep; cohen2020lightweight). SC is often regarded as privacy-preserving, since mobile devices share only features instead of the original data (kang2017; Jeong2018; eshratifar2019jointdnn; li2018auto; jankowski2020joint; eshratifar2019bottlenet). Given that no prior work has successfully inverted the released features in the absence of private priors, the paradigm is presumed safe (RN84; pagliari2020crime; matsubara2020head; yao2020deep).
Albeit being important, DNNs' invertibility remains an ill-posed and challenging problem (Arora2015): it highly depends on the model architecture (jacobsen2018revnet), weight distribution (QiLei2019), and learning objectives (geirhos2018imagenet). Take the popular activation function in modern DNNs, the Rectified Linear Unit (ReLU) (nair2010rectified), for example. It passes positive inputs through unchanged and outputs zero otherwise. This saturating behavior prevents anyone from accurately recovering its input. Linear layers such as convolutional and fully-connected layers are also barely invertible. They can be expressed as a matrix product between a weight matrix and the layer input. The resulting weight matrix has a left inverse only if it has full column rank, which requires a greater-than-one expansion rate (output dimension exceeding input dimension). This condition on the expansion rate is rarely satisfied in modern neural networks (e.g., ResNet (He2015) and RepVGG (ding2021repvgg)). What's more, DNNs are typically trained with discriminative objectives, which are designed to extract class-relevant features and mask out the rest (goodfellow2009measuring; szegedy2013intriguing; glorot2011domain).

Current approaches to model inversion require specific assumptions to be met to succeed, even for shallow architectures and simple datasets. For example, one line of work provides theoretical proofs of invertibility assuming the weights of DNNs are random-like (Arora2015; Gilbert2017; behrmann2018analysis) and expansion rates are larger than a certain constant (QiLei2019; aberdam2020and). However, these conditions are hardly satisfied by existing trained DNNs (Cai2019WeightNB; Li2020AdditivePQ). Other works formulate the inversion problem as a generative task (Dosovitskiy2016; Nguyenplugplay2017; RN58; hand2018phase). They either leverage a pretrained generative adversarial network (e.g., BigGAN (brock2018large)) or train a GAN from scratch with large amounts of real data. The generator in the GAN acts as the inversion model to recover the input given the hidden activations. In this scenario, knowledge of and access to the pretrained GAN or the original dataset are necessary. This makes inversion infeasible in many application scenarios: in practice, neither a pretrained GAN nor massive amounts of original images are available, due to cost, privacy, and proprietary concerns.

This paper considers the problem of zero-shot model inversion: we study the viability of learning an inversion model that faithfully reverses the original function mapping, given only a target pretrained model without access to the original training data. Our proposed method unifies the inversion of both discriminative and generative models, as shown in Fig. 1. To overcome the increased complexity and non-linearity of the target model as depth increases, we propose a divide-and-conquer inversion method that partitions the inversion into sequential layer-/block-wise inversion sub-problems. To support optimization of the modules in the inversion model, we exploit synthetic data generated by minimizing the discrepancy between the feature statistics of noise inputs and those stored in batch normalization (BN) layers (santos2019learning; yin2020dreaming; xu2020generative). With synthetic data proxies, we are able to optimize the inversion model to enforce feature-embedding similarity with respect to the original counterpart. We call this cycle-consistency-guided inversion. Combining the aforementioned techniques enables us to successfully scale zero-shot model inversion to deep architectures and complicated datasets without assumptions on model or dataset priors.

In summary, we make the following contributions: (i) We propose a new inversion paradigm based on a divide-and-conquer strategy and a cycle-consistency loss. (ii) We demonstrate that the method generalizes to both discriminative and generative models, even for complex datasets (i.e., ImageNet (deng2009imagenet) and CelebA (liu2015faceattributes)), surpassing state-of-the-art baselines. (iii) We analyze inversion insights, including image recovery, quality enhancement, latent code projection, and adversarial sample characterization.
2 Related Work
We next review previous work, casting it into four categories: invertible neural networks, analytical inversion, generative inversion, and input-optimization-based inversion.





Table 1: Comparison with prior inversion families (invertible networks, analytical inversion, generative inversion, and input-optimization-based inversion). Ours is the only approach that is simultaneously zero-shot, free of pretrained priors and adversarial training, single-pass, free of exotic architecture/weight constraints, and able to unify the inversion of multiple model types.
Invertible Neural Networks.
Invertible neural networks (INNs) (jacobsen2018revnet; behrmann2019invertible_res; grathwohl2018ffjord; behrmann2021understanding; chang2018reversible) are a family of neural networks that can be treated as one-to-one function approximators because of their special architectures and restricted weight distributions. For example, i-RevNet (jacobsen2018revnet) consists of well-designed RevBlocks, which interleave convolution, reshuffling, and partitioning, and i-ResNet (behrmann2019invertible_res) adds a Lipschitz condition on the weights of ResNet (He2015). Earlier literature on flow-based generative models (dinh2014nice; dinh2016density; kingma2018glow) shares similar design principles.
Analytical Inversion.
Most INNs rely on exotic architectures and regularization, and thus cannot be applied to standard model architectures. Analytical inversion relaxes the constraint on model architecture and studies the theoretical invertibility of standard feed-forward neural networks (arora2014provable). For example, (Arora2015; Gilbert2017) theoretically show that the approximate reverse of a feed-forward layer can be obtained by taking the transpose of the weight matrix, based on the hypothesis that the weights in the target network are random-like, which does not always hold for a pretrained network (Cai2019WeightNB; Li2020AdditivePQ). Despite their simplicity, these methods only scale to CIFAR-10-level datasets, and the recovered images are noisy. (Gilbert2017), in addition, makes a strong assumption that the activation function is concatenated ReLU (shang2016understanding), which keeps positive and negative values in separate channels. Besides discriminative models, (QiLei2019; Fangchang2018) investigate the theoretical invertibility of deep generative models. (QiLei2019) indicates that per-layer inversion can be achieved through linear programming in polynomial time if the weights are random-like and the layer output dimension is larger than double that of the input, and empirically inverts a simple generator on the MNIST dataset.
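The random-weights condition above can be illustrated numerically: for a random-like layer with a greater-than-one expansion rate, the rows that survive the ReLU typically still span the input space, so a least-squares left inverse recovers the input exactly. The sketch below is our toy construction (not the cited papers' algorithm), using NumPy's pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 64, 256  # expansion rate 4 > 2, as the theory requires
W = rng.normal(0, 1.0 / np.sqrt(d_out), size=(d_out, d_in))  # random-like weights

x = rng.normal(size=d_in)
z = np.maximum(W @ x, 0.0)  # ReLU(Wx): roughly half the coordinates survive

# Keep only the active (positive) rows; with enough surviving rows,
# W restricted to them still has full column rank and thus a left inverse.
active = z > 0
W_a, z_a = W[active], z[active]
x_hat = np.linalg.pinv(W_a) @ z_a  # least-squares left inverse

print(np.allclose(x_hat, x, atol=1e-6))  # prints True
```

With a trained (non-random) weight matrix or a sub-unity expansion rate, the active rows may no longer have full column rank, which is exactly why these analytical guarantees fail on pretrained models.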
Generative Inversion.
Despite the great progress of analytical inversion, there is still a discrepancy between the conditions of the theory and modern pretrained models. As a learning-based and data-driven approach, generative inversion aims to learn a generative model that reverses the target model's input-output mapping. (Nguyenplugplay2017; teterwak2021understanding) leverage a pretrained generator (e.g., BigGAN (brock2018large)) as a 'learned prior' and optimize the latent space of the generator to maximize the activation of the target network. If exquisite pretrained generators are not accessible, a generative adversarial network can instead be trained on the original dataset of the target model (RN58). The generator takes features as input and outputs the reconstructed images, while a binary discriminator distinguishes real from reconstructed images to compete with the generator. The dependence on a pretrained generator, the original dataset, and unstable adversarial learning limits the practicability of generative inversion. In addition, the mentioned works invert layers only in relatively shallow architectures (i.e., AlexNet) and do not scale to modern deep architectures.
Input Optimization Based Inversion.
Given an arbitrary pretrained discriminative model such as a classifier, (yin2020dreaming; mordvintsev2015deepdream; cai2020zeroq; haroush2020knowledge; chawla2021data) show that one can reveal certain information about the training set via optimization to match pre-stored statistics or to maximize the activation of neurons. However, these methods are all designed for untargeted model inversion and cannot faithfully reconstruct the input given its feature embedding. In addition, these methods remain computationally heavy.
Autoencoder.
Model inversion is also related to autoencoders (vincent2010stacked; hinton2011transforming), given the similar functionality of the target/inversion model pair to an encoder-decoder pair. However, the encoder and decoder in an autoencoder are trained jointly, end-to-end, with a massive amount of data. In our case, only an individually pretrained target model is accessible. In line with previous findings on layer-wise training of autoencoders (e.g., (hinton2006fast; bengio2007greedy)), we also find greedy layer-wise training beneficial for inversion model training, as detailed in Sec. 3.3.

To summarize the above literature, one common trend is that researchers continually reduce the assumptions made by previous studies, for example, exotic architectures, random weight distributions, and specific layer expansion rates. In this paper, we focus on direct model inversion with minimal assumptions, i.e., with neither the original dataset nor exquisite priors but only the target pretrained model. We unify the inversion of both discriminative and generative deep models and scale the inversion to deep architectures and complex datasets.
3 Method
We first formulate the problem and then describe the proposed inversion method. Consider a multi-layer neural network (discriminative or generative) as a transfer function composed of $L$ layers, $F = f_L \circ \cdots \circ f_1$, trained on a dataset $\mathcal{D}$. Here $f_i$ is the $i$-th parameterized layer, which may include its associated batch normalization and activation. Unless otherwise specified, we consider the widely used ReLU as the activation function; however, all the results in this paper can be extended to its variants. $F_{i:j} = f_j \circ \cdots \circ f_i$ denotes the subnetwork from the $i$-th to the $j$-th layer, e.g., $F_{1:1} = f_1$ is the first layer and $F_{1:L} = F$ is the whole network.
In this paper, we consider the following problem: given a pretrained network $F$, is it possible to learn its (approximate) reverse function without access to $\mathcal{D}$? This is practical for scenarios where the original data is not available and learning a direct decoder is not possible. More specifically, we expect to learn a new function $F^{-1}$ such that $x' = F^{-1}(F(x))$, where $x'$ is the recovered input.
3.1 Divide-and-Conquer Inversion
DNNs have a compositional structure, consisting of layers or even blocks operating on a propagated tensor called a feature map. Several studies have revealed that neural networks produce more complicated feature maps of higher non-linearity and capacity as the depth increases (bianchini2014complexity; eldan2016power; telgarsky2016benefits; raghu2017expressive). As a result, the difficulty of DNN inversion has been reported in previous work (Gilbert2017; RN58; Dosovitskiy2016; Fangchang2018).

To circumvent the difficulty of approximating all stacked layers jointly from scratch, we first partition the overall inversion problem into several layer- (or block-)wise inversion sub-problems before integrating them together for refinement. To this end, we introduce a simple yet effective inversion strategy called Divide-and-Conquer Inversion (DCI) that progressively inverts the computational flow of DNNs and gradually refines it. Different from a direct end-to-end inversion, DCI has two advantages: (i) a single layer (or block) has less non-linearity and complexity, and thus is easier to invert; (ii) DCI provides richer supervision signals across layers (or blocks), whereas overall inversion only utilizes supervision at the two ends of the target model. Both strengths make the optimization more effective, stable, and data-efficient.
Starting from the first layer, DCI inverts each layer $f_i$ with an inversion module $g_i$, pursuing two simultaneous goals: (i) inverting the target layer $f_i$, and (ii) ensuring that the newly inverted layer works well jointly with all previously inverted layers $g_1, \ldots, g_{i-1}$. The necessity of the second objective comes from the non-exact inversion and the small deltas that are amplified during propagation; we therefore adjust representations according to the accumulated error. For the first objective, we minimize the layer reconstruction loss,
$\mathcal{L}_{\text{layer}} = \big\| g_i\big(f_i(x_i)\big) - x_i \big\|_2^2,$   (1)
where $x_i$ is the input of the target layer $f_i$. As for the second objective, we aim to ensure the reconstruction quality of the current temporary "overall" inversion model up to the $i$-th layer, such that $g_1 \circ \cdots \circ g_i\big(F_{1:i}(x)\big) \approx x$. This translates into minimizing the distance between the inverted input and the original input $x$, for which we introduce another loss term:
$\mathcal{L}_{\text{input}} = \big\| g_1 \circ \cdots \circ g_i\big(F_{1:i}(x)\big) - x \big\|_2^2.$   (2)
3.2 Cycle-Consistency Guided Inversion
We further re-exploit the target model for stronger inversion guidance, given the unique setup of the inversion problem. As we reverse the computation of the target model, this forms a natural loop with the original computation flow. Inspired by the perceptual metric (Richard2018) and cycle-consistent image translation (zhu2017unpaired), we explore cycle consistency to measure the quality of reconstructed inputs by re-checking them with the target model. Intuitively, if the reconstructed input is faithfully and semantically close to the original input, then the direct model should produce similar, if not exact, feature responses at all layers. To this end, we cycle the reconstructed input back through the direct (target) model and minimize the distance between the features of the reconstructed input and those of the original input at various depths. The cycle consistency for inversion is formally defined as follows:
$\mathcal{L}_{\text{cycle}} = \sum_{j=1}^{i} \big\| F_{1:j}(x') - F_{1:j}(x) \big\|_2^2,$   (3)

where $x' = g_1 \circ \cdots \circ g_i\big(F_{1:i}(x)\big)$ is the reconstructed input.
This enables full utilization of the features of the original input twice to provide richer supervision during optimization of the inversion model. In the layer reconstruction loss (Eq. (1)), we use features of the original input as the reconstruction objective. In addition, the cycle consistency loss (Eq. (3)) also uses features of the original input as a reference and enforces the inverted input to have similar features to the original input. With the above losses, the final optimization objective for an inversion layer can thus be expressed as
$\mathcal{L} = \mathcal{L}_{\text{layer}} + \mathcal{L}_{\text{input}} + \lambda\, \mathcal{L}_{\text{cycle}},$   (4)
where $\lambda$ is a hyperparameter.
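To make the interplay of Eqs. (1)-(3) concrete, the following minimal PyTorch sketch computes the combined objective for one inversion module on a toy stack of layers. All names here (`dci_loss`, the linear stand-ins for conv blocks, the weight `lam`) are ours for illustration, not the paper's released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy frozen target network: three "layers" standing in for conv blocks f_1..f_3.
target = nn.ModuleList([nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(3)])
for p in target.parameters():
    p.requires_grad_(False)

# Inversion modules g_i, one per target layer f_i.
inv = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)])

def dci_loss(i, x, lam=0.1):
    """Combined loss for inversion module i (0-indexed): Eq.(1) + Eq.(2) + lam * Eq.(3)."""
    # Forward through the target up to layer i, caching the input of each layer.
    feats = [x]
    for f in target[: i + 1]:
        feats.append(f(feats[-1]))
    # Eq.(1): invert the single layer f_i.
    loss_layer = F.mse_loss(inv[i](feats[i + 1]), feats[i])
    # Eq.(2): all inverted layers so far must jointly recover the input x.
    h = feats[i + 1]
    for g in reversed(inv[: i + 1]):
        h = g(h)
    loss_input = F.mse_loss(h, x)
    # Eq.(3): cycle consistency -- re-run the target on the recovered input
    # and match feature responses at every depth.
    loss_cycle, hc = 0.0, h
    for f, ref in zip(target[: i + 1], feats[1:]):
        hc = f(hc)
        loss_cycle = loss_cycle + F.mse_loss(hc, ref)
    return loss_layer + loss_input + lam * loss_cycle

x = torch.randn(8, 16)
loss = dci_loss(1, x)  # scalar tensor, minimized per module (then fine-tuned jointly, Sec. 3.3)
```

In the actual method, each module is first optimized with this loss, after which all modules up to the current one are fine-tuned together with the same objective, as described in Sec. 3.3.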
3.3 Training Strategy
DCI divides the computation of a feed-forward neural network into several parts and inverts the computational flow progressively. One straightforward strategy is to sequentially optimize each individual inversion module, starting from the first (i.e., input) layer. We observe that the inversion error accumulates as we move deeper into the model. To mitigate this accumulation issue, we utilize an improved training strategy. After the optimization of a certain inversion module $g_i$, we further take all previous inversion modules into consideration and fine-tune all modules up to $g_i$, i.e., $g_1, \ldots, g_i$, with the same loss to reduce the accumulated inversion error. When $i = 1$, we skip this fine-tuning because there is no inversion module before $g_1$.

3.4 Data Sampling
One remaining challenge in optimizing the inversion modules with Eq. (4) is the input data. When the target model is a generative model (i.e., the generator of a generative adversarial network), it is effortless to obtain input data by sampling random latent codes from a normal (or uniform) distribution. However, when the target model is an image classification model, it is not feasible to know or sample images from the underlying image distribution. Although one can model a superset of input images by sampling each pixel independently from a uniform distribution, this superset would be too different from the real data distribution and, in practice, too loose to serve as input proxies (yin2020dreaming).

It may seem that the original training data is needed to optimize the inversion model, but the target model itself is more than sufficient to provide guidance. Inspired by progress in adversarial-free generative models (li2017mmd; binkowski2018demystifying) and data-free knowledge distillation (yin2020dreaming; xu2020generative), we reuse these techniques to generate a small set of synthetic images. The method we choose minimizes the discrepancy between the feature statistics of synthetic data and the statistics stored in batch normalization (BN) layers:
$\min_{\hat{x}}\; \sum_{l} \Big( \big\| \mu_l(\hat{x}) - \hat{\mu}_l \big\|_2 + \big\| \sigma_l^2(\hat{x}) - \hat{\sigma}_l^2 \big\|_2 \Big) + \mathcal{L}_{\text{CE}}\big(F(\hat{x}), y\big) + \mathcal{R}(\hat{x}),$   (5)
where $\mu_l(\hat{x})$ and $\sigma_l^2(\hat{x})$ are feature statistics computed at the $l$-th layer, and $\hat{\mu}_l$ and $\hat{\sigma}_l^2$ are their corresponding moving averages stored in the BN layers. $\mathcal{L}_{\text{CE}}$ is the cross entropy between the prediction $F(\hat{x})$ and a randomly assigned label $y$, and $\mathcal{R}(\hat{x})$ is a secondary image regularization term, such as total variation, that makes the derived images more natural (yin2020dreaming). Minimizing the discrepancy of feature statistics between synthetic and real data (i.e., the BN statistics) is equivalent to reducing the integral probability metric (li2017mmd; binkowski2018demystifying; muller1997integral) between the distributions of synthetic and real data (we detail this intuitive justification in the appendix). Thus, the synthetic data can be treated as samples from a distribution close to the underlying real data distribution and can act as a reasonable proxy for the model inversion task.

Naively combining BN-guided data synthesis with other existing inversion approaches is suboptimal for the following reasons:


- Prior work (RN58; teterwak2021understanding) trains a generator (in a GAN framework) as the inversion model, whereas GAN training is very sensitive and challenging given only synthetic data (kodali2017convergence; tran2021data; zhao2020diffaugment). The resulting efficacy falls short of DCI, as we show later in Sec. 4.3.

- Prior work (Nguyenplugplay2017; RN58; teterwak2021understanding) usually requires the whole training set (millions of images) to train the GAN for inversion, yet deriving massive amounts of synthetic data remains slow (yin2020dreaming).
In contrast, without any adversarial learning, DCI avoids the above disadvantages when using synthetic data, owing to its progressive inversion nature and cycle-consistency supervision. DCI is data-efficient and can be enabled by only a few thousand synthetic images.
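A minimal sketch of the BN-statistics-guided synthesis of Eq. (5), in the spirit of (yin2020dreaming), is shown below on a toy classifier. The tiny network, loss weights, and step counts are ours, purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny frozen "pretrained" classifier with a BN layer holding running statistics.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
net.eval()
for p in net.parameters():
    p.requires_grad_(False)

bn = net[1]
x = torch.randn(4, 3, 16, 16, requires_grad=True)  # synthetic images being optimized
y = torch.randint(0, 10, (4,))                     # randomly assigned labels
opt = torch.optim.Adam([x], lr=0.05)

def bn_feature_stats(images):
    h = net[0](images)  # features entering the BN layer
    return h.mean(dim=(0, 2, 3)), h.var(dim=(0, 2, 3), unbiased=False)

losses = []
for _ in range(50):
    opt.zero_grad()
    mu, var = bn_feature_stats(x)
    # Match batch statistics to the moving averages stored in the BN layer.
    loss_bn = F.mse_loss(mu, bn.running_mean) + F.mse_loss(var, bn.running_var)
    loss_ce = F.cross_entropy(net(x), y)
    # Total-variation prior keeps images smooth (the secondary regularizer R).
    tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + (x[..., 1:] - x[..., :-1]).abs().mean()
    loss = loss_bn + 0.1 * loss_ce + 0.01 * tv
    loss.backward()
    opt.step()
    losses.append(float(loss))
```

For a real pretrained model, the statistics loss is summed over all BN layers rather than one, and many more optimization steps are used.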
4 Experiments
We next demonstrate the efficacy of our method on inverting discriminative and generative models.
Figure 2: Real images from the ImageNet-1K validation set (top) and images recovered from RepVGG feature embeddings after 21 blocks (bottom).
4.1 Classifier Inversion on the ImageNet-1K Dataset
We first study the inversion of classifiers. For this set of experiments, we consider the inversion of RepVGG (ding2021repvgg), one of the state-of-the-art classification models with a deep yet mathematically neat architecture. It has a ResNet-like (He2015) multi-branch topology during training and a mathematically equivalent VGG-like (simonyan2014very) inference-time architecture, obtained by layer folding. Specifically, we target a RepVGG-A0 (ding2021repvgg) whose convolution layers each consist of a convolution, a ReLU, and a folded BN layer, and which reaches its reported top-1 validation accuracy on ImageNet (based on the authors' CVPR'21 released repo and models at https://github.com/DingXiaoH/RepVGG). We invert one convolution layer at a time using our DCI strategy, starting from the input layer. Each inversion layer mimics its corresponding target counterpart and is optimized via Adam (kingma2014adam), using the synthetic images generated as detailed in Sec. 3.4. See the appendix for additional experimental details.
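As an illustration of what "mimicking the target counterpart" can look like, the sketch below pairs a stride-2 convolution block with a transposed-convolution inversion module whose output shape matches the block's input. The layer sizes are ours and the folded BN is omitted; the paper's exact module design may differ:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A RepVGG-style inference-time block: 3x3 conv (stride 2) + ReLU.
f_i = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())

# Its inversion module mirrors the shapes with a transposed convolution;
# output_padding=1 undoes the rounding of the strided convolution.
g_i = nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1, output_padding=1)

x = torch.randn(2, 8, 32, 32)
z = f_i(x)
assert g_i(z).shape == x.shape  # channel and spatial dims are restored

# In practice g_i's parameters are then optimized with the objective of Eq. (4).
```

The shape match is only a starting point; the reconstruction quality comes entirely from the subsequent optimization.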
Main Inversion Results.
Fig. 2 shows inversion results on ImageNet-1K. Remarkably, inversion recovers a very close pixel-wise proxy to the original image from the deep embedding output after 21 blocks, preserving original semantic and visual attributes, including color, orientation, outline, and position.
Inversion at Different Layer Depths.
As noticed, edges are blurred in the inverted images, likely caused by information filtering due to the classification nature of the model. To dig deeper into this hypothesis, we visualize the inverted images from different depths of the target model in Fig. 3. As the depth increases, the inverted images become more blurry, losing more details. To quantify the changes, we also provide the peak signal-to-noise ratio (PSNR) (psnr) and the learned perceptual metric (LPIPS) (Richard2018) in Fig. 3 (right), which are consistent with the qualitative observation.

One intriguing fact shown in Fig. 3 is the consistent preservation of high-level information across most layers, challenging prior security arguments in split computing (kang2017; Jeong2018; eshratifar2019jointdnn; li2018auto; jankowski2020joint; eshratifar2019bottlenet). For example, as shown in Fig. 3, we can achieve nearly perfect recovery from the outputs of an intermediate layer that has already passed three strided convolutions, commonly perceived as lossy operations. We also find that increasing the depth is not always effective in making inversion harder, which rules out easy mitigation of such risks by simply deploying deeper subnetworks on devices. As shown in Fig. 3 (right), features from a wide range of intermediate layers still yield recovered images of similar quality in terms of PSNR and LPIPS. Finally, we find that the last two layers dramatically degrade the inversion quality. One can still recognize the class of inverted images after the penultimate (i.e., last convolution) layer; however, if we invert features after the final (i.e., fully-connected) layer, only the predominant color is recognizable (we include both sets of inversion results in the appendix). This may indicate that class-invariant information is quickly filtered out towards the end of the model, while the initial stages focus on feature extraction. This is also in line with observations in transfer learning (NIPS2014_375c7134; li2020rifle; long2015learning; dollar2018rethinking; zoph2020rethinking) and self-supervised learning (he2020momentum; grill2020bootstrap; chen2020big).

Recovering Adversarial Samples.
We next leverage inversion to understand the mechanism of adversarial images (kurakin2016adversarial). During the crafting of adversarial samples, an adversarial perturbation is optimized to maximally disrupt the network prediction (43405). We find that the inversion model behaves differently for randomly (Gaussian noise) perturbed images and adversarially perturbed ones. As shown in Fig. 4, randomly perturbed images share close proxy semantics with the original images. However, when we perform inversion from the features of adversarial samples crafted by PGD (madry2017towards), the recovered images reveal high-order chaos. This implies that not only the prediction results but also the intermediate features are disrupted by adversarial samples. The visualization of feature maps corresponding to images with random and adversarial perturbations in the appendix reinforces this finding. It also provides heuristic, large-scale-dataset insights complementing prior work arguing that random perturbations of the same magnitude do not disturb the network (carlini2017adversarial; lee2018simple). The discrepant inversion behaviors for adversarial and legitimate samples may inspire defense methods.

Figure 5: SNGAN-generated samples (the first generation, top) and recovered images (the second generation, bottom).
4.2 GAN Inversion on the CelebA Dataset
We next shift to the inversion of generative models. For this experiment we focus on the popular SNGAN architecture trained on the CelebA dataset (miyato2018spectral). The SNGAN generator consists of a fully-connected layer at the beginning, a convolution layer at the end, and residual blocks in between, each comprising convolution layers (based on the implementation and models at https://github.com/kwotsin/mimicry/). We break the inversion of the whole SNGAN model into inversion of the fully-connected layer, the last convolution layer, and the residual blocks, using the optimization objective in Eq. (4) and the training strategy in Sec. 3.3. For mathematical consistency with prior GAN literature, we use $G$ to denote the target generator instead of $F$; $z$ denotes a sampled latent code that generates $G(z)$. We aim to learn $G^{-1}$ that faithfully reconstructs $z$ given $G(z)$, a problem that has large practical impact (e.g., image compression) but remains challenging (QiLei2019; styleganv2).
Main Inversion Results for Latent Code Recovery.
Fig. 5 visualizes our inversion results for the target generator. To demonstrate efficacy, we first use the inversion model to recover latent codes, and then pass the recovered codes back to the generator. Without bells and whistles, the second-generation images faithfully align with the first generation. In addition, the recovered faces are visually very similar to the original targets. One interesting observation is that the inversion and regeneration process purifies secondary attributes such as backgrounds, but preserves the main attributes of the faces. This further indicates that our inversion takes semantic information into consideration, owing to the cycle-consistency guidance.
Interpolation of Recovered Latent Codes.
To show the validity of the recovered latent code space, we conduct linear interpolation between recovered latent codes, as shown in Fig. 6. The results signify that the recovered latent codes fit well in the input space of SNGAN, yielding smooth transitions between adjacent samples.
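Linear interpolation between two recovered latent codes is straightforward; a minimal sketch (our helper, not part of the SNGAN codebase):

```python
import numpy as np

def lerp(z_a, z_b, steps=8):
    """Linear interpolation between two recovered latent codes.

    Returns a (steps, dim) array; each row is fed back to the generator G
    to render one frame of the transition."""
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - ts) * z_a + ts * z_b

z_a, z_b = np.random.default_rng(0).normal(size=(2, 128))  # stand-ins for recovered codes
path = lerp(z_a, z_b)
assert path.shape == (8, 128)
```

For Gaussian latent spaces, spherical interpolation (slerp) is sometimes preferred to stay on the typical-norm shell, but linear interpolation suffices for the smoothness check described above.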
Improving Defective Generated Images.
One favorable concomitant of inversion and regeneration is that defective first-generation images can be greatly improved after re-projection, as shown in Fig. 7. Here a first-generation image is produced by SNGAN from a randomly sampled latent code. The random latent code, as well as the first-generation image, could be an outlier, slightly diverging from the distribution of normal inputs. Since the inversion model is trained for the output-to-input mapping, it finds the closest in-distribution latent code for the original outlier latent code during recovery, hence improving image quality.
Real vs. Generated Images.
All the above experiments are conducted on first-generation images from SNGAN. We next check whether the inversion model behaves distinctly on real images, as noticed by Karras et al.: optimizing a latent code per real image results in a synthesis distant from the original sample (styleganv2). For a fair comparison, we first save both real and generated images in PNG format (wiki:Portable_Network_Graphics) and use strictly the same inversion pipeline for both.
Although SNGAN is trained to map a base distribution (e.g., a normal distribution) to the underlying distribution of real images, there is still a discrepancy between SNGAN's output distribution and the underlying distribution. We observe that the inversion model is able to capture this distribution discrepancy, demonstrating varying behaviors on generated and real images, as shown in Fig. 8. The inversion model can still faithfully invert generated images, but it shifts real images and alters their style after inversion and regeneration; instead, real images and their second generation merely share orientation and color. This may inspire future work on detecting deep-fake images (tolosana2020deepfakes; chesney2018deep).

Figure 9: Qualitative comparison. Original images; (a) DeePSiM (RN58); (b) DeepDream (mordvintsev2015deepdream); (c) DeepInversion (yin2020dreaming); this work.
Table 2: Quantitative comparison. Inference time measured on an NVIDIA V100 GPU.

Method                                  Inference Time (s)   PSNR    LPIPS
DeepDream (mordvintsev2015deepdream)    0.55K                9.53    0.90
DeepInversion (yin2020dreaming)         1.92K                10.93   0.60
This work                               0.015                18.83   0.44
4.3 Comparison to Prior Work
Next, we compare our method with other approaches under the zero-shot model inversion setup, which either optimize an auxiliary network ((a) DeePSiM (RN58)) or input tensors ((b) DeepDream (mordvintsev2015deepdream) and (c) DeepInversion (yin2020dreaming)). We include their detailed setups in the appendix.
We show the qualitative comparison in Fig. 9 and quantitative results in Table 2. The inferior result of DeePSiM (trained with synthetic data) demonstrates the incompatibility of previous generative inversion approaches (Nguyenplugplay2017; RN58; teterwak2021understanding) with BN-guided data synthesis, as discussed in Sec. 3.4. DeepDream yields unrecognizable images compared to our method, while DeepInversion recovers improved features and semantics even though a gap remains between recovered and original images. Both DeepDream and DeepInversion require thousands of forward and backward passes to optimize the inputs, while our method only needs one forward pass through the inversion model and is hence much more efficient.
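PSNR, as reported in Table 2, can be computed directly from the pixel-wise mean squared error; a minimal reference implementation (assuming images normalized to [0, 1]):

```python
import numpy as np

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two images in [0, max_val]."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3))
noisy = np.clip(clean + rng.normal(0, 0.05, clean.shape), 0.0, 1.0)
print(round(psnr(clean, noisy), 1))  # roughly 26 dB for sigma = 0.05
```

LPIPS, in contrast, is a learned metric that requires a pretrained feature network (Richard2018), so we omit it from this sketch.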
4.4 Inverting More Architectures on ImageNet-1K
To demonstrate the general applicability of the proposed method, we next show inversion results for additional networks on the ImageNet dataset, covering varying architectures (ResNet-18 and ResNet-50) and training recipes (standard (He2015) and self-supervised (chen2020mocov2)).
Inverting ResNet-18.
We start with ResNet-18 inversion. For this experiment we base the network on the implementation and pretrained model of ResNet-18 from the PyTorch model zoo (torchvision). ResNet-18 contains a sequence of BasicBlock units, each of which consists of two convolution layers (with BN and ReLU) and a shortcut connection. A BasicBlock has a multi-branch architecture because of the shortcut connection; we therefore invert each BasicBlock as a whole unit. The first four sequential layers in ResNet-18 are the convolution, BN, ReLU, and max-pooling layers. Since it is impossible to invert max pooling without additional information (estrach2014signal), we invert the sequence of these first four layers as a whole block, which we refer to as the initial block. As before, each inversion block mimics its corresponding target counterpart but with reversed input and output dimensions.

Fig. 10 shows the inversion results of pretrained ResNet-18 on ImageNet. With the proposed method, we are able to recover recognizable input images deep into the network, after the initial block and a series of BasicBlocks that together contain many convolution layers and one max-pooling layer. We observe that inversion can still faithfully restore semantically correct images of high fidelity that contain visual details similar to the original counterparts.
Figure 10: Real images from the ImageNet-1K validation set (top) and images recovered from ResNet-18 feature embeddings (bottom).
Inverting ResNet-50 (standard (He2015) and self-supervised MoCo V2 (chen2020mocov2)).
We next move on to inverting a deeper architecture, ResNet-50. For additional insight into network behavior, we study the inversion of both a normally trained network as in (He2015) and the recent self-supervised trained network MoCo V2 (chen2020mocov2) (based on https://github.com/facebookresearch/moco). For this experiment, we break the overall ResNet-50 architecture into five subnetworks and invert one subnetwork at a time. As for ResNet-18, the first subnetwork is the initial block; the other four subnetworks consist of Bottleneck units, respectively (He2015).
Fig. 11: Top: real images from the ImageNet-1K validation set. Middle: images recovered from ResNet-50 (standard) feature embeddings after the inverted convolution layers. Bottom: images recovered from ResNet-50 (MoCo V2 (chen2020mocov2)) feature embeddings after the inverted convolution layers.
Fig. 11 shows the main inversion results of ResNet-50 on ImageNet. We observe that a stronger feature extractor preserves more information; see a quick comparison between the second and third rows of Fig. 11. Self-supervised pretraining leads to stronger inversion when compared to a standard training recipe on the same network architecture, consistent with recent findings in (yin2021see). These results demonstrate that we are able to recover recognizable input images even after inverting multiple sub-networks, spanning many convolution layers and one max pooling layer in total.
Limitations
We observe that the last convolution, pooling, and fully-connected layers remain lossy for inversion, which may stem from the fact that the network is pretrained heavily for the classification task. Yet, we observe inversion viability deep into the stack of convolutional blocks. This unveils that DNNs extract such a fertile amount of image-specific representation that even pixel-wise recovery is viable.
Conclusions
We have shown the feasibility of network inversion of deep models on complex datasets, while alleviating most constraints imposed by prior work, such as model, training, or dataset priors. Our zero-shot method unifies discriminative and generative inversion under one approach. We further presented an extensive analysis of the inversion behavior of large-scale networks, characterizing how they filter information, defend against adversarial attacks, improve defective generation, and respond differently to generated and real samples.
Broader Impact
This work contributes to a deeper understanding of DNNs' invertibility. We conducted a range of experiments to study the properties the inversion induces. This will help both research and industrial communities investigate data-security vulnerabilities of machine-learning-as-a-service and improve their pretrained models. The method also reduces the amount of data required during inversion compared to prior art, helping to alleviate environmental burdens at the same time.
References
Appendix A Appendix
A.1 Additional Details for Main Manuscript
Lossy Final FullyConnected Layer.
In the main manuscript, we show that images recovered from RepVGG feature embeddings after the inverted convolution layers preserve the original semantic and visual attributes. However, as discussed in Limitations, we find that information in the feature embeddings decays rapidly through the last two layers of RepVGG. This may indicate that class-invariant information is quickly filtered out towards the end of the model while the initial stages focus on feature extraction, which aligns with prior observations in transfer learning (NIPS2014_375c7134; li2020rifle; long2015learning; dollar2018rethinking; zoph2020rethinking) and self-supervised learning (he2020momentum; grill2020bootstrap; chen2020big). We visualize the above findings in Fig. 12. One can still recognize the class of the inverted images after the penultimate layer (the last convolution layer). However, if we invert features after the final (fully-connected) layer, only the predominant color is recognizable.
Fig. 12: (a) Original images, alongside their inversions after the penultimate and the final layers (see text).
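One way to compare what the penultimate and final layers retain is to capture both embeddings with forward hooks; the tiny network below is a hypothetical stand-in for RepVGG, used only to illustrate the hook mechanics and the dimensionality gap between the two embeddings.

```python
import torch
import torch.nn as nn

# Toy stand-in network: conv features, then a final fully-connected layer.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),  # stand-in for the final FC layer
)

captured = {}

def save(name):
    # Returns a hook that stores the module's output under `name`.
    def hook(module, inp, out):
        captured[name] = out.detach()
    return hook

net[0].register_forward_hook(save("penultimate_conv"))
net[4].register_forward_hook(save("final_fc"))

_ = net(torch.randn(1, 3, 32, 32))
```

Here the conv embedding keeps a full spatial map while the FC output collapses to a short vector, mirroring why inversion from the final layer recovers little beyond dominant color.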
Experimental Setup for Our Method.
For all experiments in the main manuscript, each inversion step is optimized via Adam (kingma2014adam) for K iterations. We use one initial learning rate during optimization of a given inversion module (K iterations) and another during the subsequent fine-tuning of all inversion modules up to it (K iterations). We use a cosine annealing with warm restarts strategy to adjust the learning rate during optimization (cosine; loshchilov2016sgdr). The coefficient of the cycle-consistency loss in Eq. 4 is chosen to ensure it has a similar magnitude to the other loss terms. We use K synthetic images, as detailed in Sec. 3.4, for the optimization.
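The optimization schedule above can be sketched as follows. The module, learning rate, and iteration count are placeholders rather than the paper's exact values, but the Adam plus cosine-annealing-with-warm-restarts combination uses the standard PyTorch APIs.

```python
import torch
import torch.nn as nn

# Placeholder inversion module and hyperparameters (not the paper's values).
inv_module = nn.Conv2d(64, 3, 3, padding=1)
opt = torch.optim.Adam(inv_module.parameters(), lr=1e-3)
# Cosine annealing with warm restarts: first cycle of 100 steps,
# each subsequent cycle twice as long.
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=100, T_mult=2)

for it in range(300):
    opt.zero_grad()
    # Placeholder reconstruction loss on random data.
    loss = inv_module(torch.randn(2, 64, 8, 8)).pow(2).mean()
    loss.backward()
    opt.step()
    sched.step()  # one scheduler step per optimization iteration
```

In the full setup, this loop would run once per inversion module and again for the joint fine-tuning stage.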
Experimental Setup for Prior Work.
We next elaborate on details for prior baselines:

DeePSiM (RN58) learns a generator as the inversion model, which takes feature embeddings from the target model as latent codes and outputs recovered images. The generator is trained adversarially against an extra binary discriminator. For a fair comparison, we replace the ImageNet training set used by (RN58) with K synthetic samples for optimizing the inversion model.

DeepDream (mordvintsev2015deepdream) and DeepInversion (yin2020dreaming) both backpropagate gradients onto the inputs to optimize them towards natural images. For a fair comparison, we replace the CE loss in the original setup (yin2020dreaming; mordvintsev2015deepdream) with a distance loss between synthesized and target embeddings to invert the same target layers as in the proposed method. We scale this loss to a similar magnitude as the other loss terms, keeping the other scaling terms the same as in (yin2020dreaming). We study a randomly sampled validation batch for this comparison, and use the setting that consumes K updates per batch as in (yin2020dreaming), given the requirement for feature-map dimension consistency.
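The modified baseline, optimizing input pixels so that the target layer's embedding matches a given one, can be sketched as below. The two-layer network is a toy stand-in; the real setup uses the pretrained target model and the same target layers as the proposed method.

```python
import torch
import torch.nn as nn

# Toy frozen "target network" standing in for the pretrained model.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 8, 3, padding=1))
for p in net.parameters():
    p.requires_grad_(False)

# Target embedding to match (here produced from a random reference input).
target_emb = net(torch.randn(1, 3, 16, 16)).detach()

# Optimize the input pixels, not the network weights.
x = torch.randn(1, 3, 16, 16, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)

init_loss = (net(x.detach()) - target_emb).pow(2).mean().item()
for _ in range(200):
    opt.zero_grad()
    loss = (net(x) - target_emb).pow(2).mean()  # distance in feature space
    loss.backward()
    opt.step()
```

This replaces the classification (CE) objective of the original DeepDream/DeepInversion setup with a pure feature-matching objective, so the comparison targets the same embeddings as our method.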