Data from Model: Extracting Data from Non-robust and Robust Models

07/13/2020 ∙ by Philipp Benz, et al. ∙ KAIST Department of Mathematical Sciences

The essence of deep learning is to exploit data to train a deep neural network (DNN) model. This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model. We repeat the processes of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature-mapping information by measuring the accuracy drop on the original validation dataset. We perform this experiment for both non-robust and robust origin models. Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models. The success of this cyclic transformation can be attributed to a shared feature mapping existing in data and model. Using the same data, we observe that different DtM processes result in models with different features, especially across different network-architecture families, even when they achieve comparable performance.




1 Introduction

In deep learning applications such as image classification [6, 18], data is used to train deep neural network (DNN) models. This work explores the reverse process of generating data from the model, with one general question in mind: What is the relationship between data and model? This question cannot be addressed well by focusing only on the model training process, Data to Model (DtM) [8]. Thus we combine the widely adopted DtM with its reverse process, Data from Model (DfM). More specifically, we repeat the processes of DtM and DfM in sequence and measure the accuracy on the original validation dataset. In this chaining process, we always assume access to only either the data or the model generated in the previous step: in the DtM process, we can only access the data generated by the previous DfM process, and in the DfM process, we can only access the model generated by the previous DtM process. This chain is depicted in Figure 1.

Figure 1: The chain of performing DtM and DfM repetitively. The blue arrows indicate the DtM process and the red arrows indicate the DfM process.

Our work is mainly inspired by [7], which attributes the success of adversarial examples to the existence of non-robust features (mappings) in a dataset. During the typical DtM process, these non-robust features are learned by a model, which consequently carries the same non-robust features. This identifies the feature mapping as the link between data and model. In their work, the original training dataset is adopted as the background images in the data extraction process [7]. In this work, we explore the possibility of retrieving a learned feature mapping from a trained model without the original training dataset, which makes the DfM process more meaningful. Moreover, we iterate the DtM and DfM processes in sequence instead of performing them only once. Another aspect of this work is to explore whether such feature mappings are the same or different across training runs, for the same or different architectures.

To decode the learned features of a model into a dataset without knowledge of the original training data, we adopt random substitute datasets as background to increase sample diversity and introduce virtual logits to model the logit behavior of DNNs. Our experiments show that we can obtain models with similar properties as the original model in terms of both accuracy and robustness. We showcase the effectiveness of our approach on MNIST and CIFAR10 for both non-robust and robust origin models.

2 Related Work

DNNs are vulnerable to small, imperceptible perturbations [13]. This intriguing phenomenon led to various attack [4, 2, 9, 11, 16] and defense methods [3]. Explanations for the existence of adversarial examples have been explored in [4, 14, 17]. Ilyas et al. [7] attributed the phenomenon of adversarial examples to the existence of non-robust features. They introduce features as a mapping relationship from the input to the output. We adopt this definition and aim to extract this feature mapping from the model into the data. Model visualization methods [12, 10] can be seen as DfM methods without training a new model on the extracted data. Such methods are commonly further exploited for model compression techniques [5, 1]. Instead of compressing models, [15] aims to compress an entire dataset into only a few synthetic images. Training on these few synthetic images, however, leads to a serious performance drop.

3 Methodology

Given a K-class classification dataset D = {(x_i, y_i)} consisting of data samples x_i and their corresponding true classes y_i, a DNN f_θ (θ omitted from now on) parameterized through the weights θ is commonly trained via mini-batch stochastic gradient descent (SGD) to achieve argmax_k f(x_i)_k = y_i. In this work, we term this process DtM (Data to Model) and explore its reverse process of extracting Data from Model (DfM). More specifically, starting from the origin dataset D_0, DtM results in the origin model M_0; with DfM, a dataset D_1 can be extracted from M_0, which leads to model M_1 through DtM, and so on. During the DtM and DfM processes, we assume having no access to the previous models and datasets, respectively.
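The alternation of DtM and DfM can be sketched as a simple loop. The toy example below replaces the DNN with a hypothetical nearest-prototype "model" on 2D Gaussian data: `data_to_model` stands in for DtM training and `data_from_model` for DfM extraction, and accuracy on the origin data is tracked along the chain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy origin dataset D0: two Gaussian classes in 2D.
n = 200
X0 = np.vstack([rng.normal(-2, 1, (n, 2)), rng.normal(2, 1, (n, 2))])
y0 = np.array([0] * n + [1] * n)

def data_to_model(X, y):
    """DtM stand-in: 'train' a nearest-prototype model (one mean per class)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def data_from_model(protos, per_class=200):
    """DfM stand-in: extract a dataset by sampling around the stored prototypes."""
    X = np.vstack([rng.normal(p, 0.5, (per_class, 2)) for p in protos])
    y = np.repeat([0, 1], per_class)
    return X, y

def accuracy(protos, X, y):
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=-1)
    return (d.argmin(axis=1) == y).mean()

model = data_to_model(X0, y0)      # M0
accs = [accuracy(model, X0, y0)]
for _ in range(3):                 # repeat DfM -> DtM
    Xi, yi = data_from_model(model)
    model = data_to_model(Xi, yi)
    accs.append(accuracy(model, X0, y0))
print(accs)
```

In this toy setting the prototypes survive each round almost unchanged, so the accuracy on the origin data stays high along the chain; for DNNs the analogous retention is exactly what Table 1 measures.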

To retrieve features from a model and store them in the form of data, we deploy a norm-bounded variant of projected gradient descent (PGD) [9]. Due to the absence of the original dataset, we leverage substitute images x_s from a substitute dataset, as well as virtual logits t, as the target values for the gradient-based optimization process. We denote the logit output of a classifier for an input x as z(x), and use the distance between the network output logits z(x) and the virtual target logits t as the loss function minimized by PGD. The retrieved dataset consists of images x = x_s + δ, where δ indicates a perturbation vector optimized through PGD and x lies in the valid image range. The data samples and their respective output logit values represent the new dataset.
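As a minimal illustration of this extraction step, the sketch below replaces the DNN with a toy linear classifier and runs plain projected gradient descent toward random virtual target logits, clipping to [0, 1] after each step; the dimensions, learning rate, and step count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K = 16, 4                        # toy input dimension and class count
W = rng.normal(size=(K, d))         # toy linear "model": logits z(x) = W @ x

def extract(x_sub, t, steps=1000, lr=0.005):
    """Optimize a substitute image so the model's logits approach the
    virtual target logits t (squared-error loss, projection to [0, 1])."""
    x = x_sub.copy()
    for _ in range(steps):
        grad = 2 * W.T @ (W @ x - t)          # gradient of ||W x - t||^2
        x = np.clip(x - lr * grad, 0.0, 1.0)  # project to valid image range
    return x

x_sub = rng.uniform(0, 1, d)        # substitute "background" image
t = rng.normal(size=K)              # virtual target logits
x_new = extract(x_sub, t)
loss0 = np.sum((W @ x_sub - t) ** 2)
loss1 = np.sum((W @ x_new - t) ** 2)
print(loss0, loss1)
```

The logit loss shrinks during extraction, i.e., the optimized sample carries the target feature response while remaining a valid image; with a DNN the gradient comes from backpropagation instead of the closed form used here.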

After the DfM process, the retrieved dataset can be used in the DtM process by training the model weights to minimize the distance between the previously stored target logit vector and the model's output logit vector.
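This DtM step can likewise be sketched with a toy linear model: a fresh model is fitted by regressing the stored logit vectors, with least squares standing in for SGD on the squared-error logit loss; all shapes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, n = 8, 3, 200
W_teacher = rng.normal(size=(K, d))   # model the data was extracted from

# Retrieved dataset: samples plus the logit vectors stored during DfM.
X = rng.uniform(0, 1, (n, d))
Z = X @ W_teacher.T                   # stored target logits

# DtM: fit a fresh model by regressing the logits (least squares stands in
# for SGD on the squared-error logit loss).
B, *_ = np.linalg.lstsq(X, Z, rcond=None)
W_student = B.T

# Check agreement with the teacher's predictions on held-out inputs.
X_test = rng.uniform(0, 1, (50, d))
agree = ((X_test @ W_student.T).argmax(1) == (X_test @ W_teacher.T).argmax(1)).mean()
print(agree)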

We heuristically found that a simple scheme of Gaussian-distributed values works well for the virtual logits: the highest logit is drawn around a fixed top mean, the means of the remaining logits are equally spaced below it, and the order of the logit values is chosen randomly to introduce diversity into the dataset.
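A sketch of such a virtual-logit generator follows; the top mean, range, and standard deviation are illustrative assumptions, not the paper's constants.

```python
import numpy as np

rng = np.random.default_rng(3)

def virtual_logits(num_classes, top_mean=10.0, lo=-5.0, hi=5.0, std=1.0):
    """One dominant logit mean plus equally spaced means for the rest,
    Gaussian noise around each mean, and a random class order."""
    means = np.concatenate(([top_mean], np.linspace(lo, hi, num_classes - 1)))
    logits = rng.normal(means, std)
    rng.shuffle(logits)        # random order introduces dataset diversity
    return logits

t = virtual_logits(10)         # e.g. a target vector for a 10-class dataset
print(t)
```

Each draw yields one clearly dominant logit (the virtual "predicted class") at a random position, mimicking the confident logit profile of a trained classifier.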

In the above process, the origin model M_0 is non-robust. Following [7], we also use a robust model for M_0, obtained with adversarial training, and repeat the chaining process.

4 Experiments

4.1 DfM and DtM in sequence

Figure 2: Qualitative results for the DfM process starting from a non-robust origin model (left columns) and a robust origin model (right columns). The first row shows the origin dataset. The subsequent rows show the datasets obtained after each DtM and DfM step. The results are shown for the LeNet architecture with Fashion MNIST as the background images.

We sequentially apply DfM and DtM starting from the origin model M_0. The non-robust and robust origin models are obtained through standard and adversarial training, respectively, on LeNet for MNIST and VGG8 for CIFAR10. VGG8 refers to a VGG network with only one convolution layer between each max-pooling operation. To obtain dataset D_{i+1} from the preceding model M_i, images are extracted from M_i through the DfM process. For the background images, we choose Fashion MNIST and MS-COCO for MNIST and CIFAR10, respectively. The generated images are shown in Figure 2. In the subsequent DtM step, dataset D_{i+1} is then used to train the model M_{i+1}, an independent model of the same architecture. The accuracy of all models is reported on the original validation dataset. We present the results for five repetitions of this process in Table 1.

Table 1: Applying DtM and DfM in sequence for standard (non-robust) and adversarially trained (robust) models.

Table 2: Robustness evaluation of the models obtained during the chaining process for a non-robust (left) and a robust (right) origin model. The results are reported for the LeNet architecture on MNIST.

The qualitative results in Figure 2 show that the extracted images look totally different from the original images, which makes it tempting to expect that models trained on them will work poorly on the original validation dataset. Table 1, however, shows that comparable performance is achieved; this is due to similar feature mappings existing in the generated images despite the large visual discrepancy. Nonetheless, we observe a general trend that the accuracies of both the non-robust and robust models decrease with each DfM/DtM step; a slight accuracy increase for the robust VGG8 model in one step is the only exception. For the non-robust origin model, the accuracy drop is trivial in the first few iterations of the DfM and DtM process and becomes more noticeable in later iterations. For the robust origin model, the accuracy is retained better: the robust LeNet loses only a small amount of accuracy over the chain, while the non-robust LeNet decreases considerably more.

We further investigate the robustness of the models from the DfM/DtM process against adversarial examples. To this end, we evaluate the retrieved MNIST models on adversarial examples generated with PGD under different perturbation budgets, different numbers of update steps, and correspondingly calculated step sizes. Similar to the accuracy drop observed for clean images, a similar trend occurs when the models are under attack; however, the degradation is more severe for robustness. For example, the accuracy of the non-robust LeNet drops sharply even for a relatively weak attack, while its clean-image accuracy drops only slightly. Similar behavior is observed for the models originating from the robust M_0. It is worth mentioning that the models originating from the robust M_0 are consistently more robust than their counterparts originating from the non-robust M_0. After two subsequent DfM iterations starting from the robust M_0, the resulting model still retains notable robustness compared to the standard non-robust model. This result suggests that DfM can be applied as an alternative to adversarial training.

4.2 DfM and DtM on different architectures

Table 3: Cross-training of the datasets extracted from non-robust (top) and robust (bottom) models (VGG16, VGG19, ResNet18, ResNet50). The models were originally trained on CIFAR10; the robust models were adversarially trained with a variant of PGD. The rows indicate the model from which the data was extracted, and the columns indicate the trained model. The values indicate accuracy on the CIFAR10 test dataset.

The above analysis shows that DtM and DfM can be performed for the same, simple architecture with a limited performance drop. Here we apply DfM to standard and adversarially trained CIFAR10 models and train different state-of-the-art architectures on the extracted data. For simplicity, we stop the chaining process after obtaining M_1. The results in Table 3 show that all model architectures can be successfully trained on the extracted data. Similar to Table 1, a performance drop is observed for the data extracted from the non-robust M_0. For the robust M_0, however, we observe that the retrained models M_1 consistently outperform their corresponding origin model M_0 for both similar and different architectures, which is somewhat surprising.

4.3 Do different models learn different feature mappings?

Table 4: Cross-evaluation of datasets extracted from non-robust (top) and robust (bottom) models (VGG16, VGG19, ResNet18, ResNet50). The models were originally trained on CIFAR10. The robust models were obtained with adversarial training via a variant of PGD. The diagonal values were obtained for the same architecture but a different training run.

Given the possibility of extracting a certain feature mapping from a model, in this section we analyze whether different models trained from the same origin dataset have different feature mappings. To this end, we utilize feature images extracted from models trained under the same conditions and perform a cross-evaluation of the model accuracies on each other. The results for models with standard and adversarial training are reported in Table 4. We observe that the cross-evaluation accuracies are higher than random guessing, which indicates that some shared feature mappings are learned. However, for both non-robust and robust models, only in a few cases is a high accuracy achieved. This phenomenon can also be observed when the extracted dataset is evaluated on an independently trained instance of the same architecture as the original one. The relatively low cross-evaluation accuracies illustrate that models from different runs learn different features, which are not fully compatible with each other.
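The cross-evaluation protocol can be sketched with toy stand-ins: several "independently trained" linear models share a common component plus a per-run component (a hypothetical construction, since the real models come from separate DtM runs), each model labels a dataset, and every dataset is scored on every model.

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, n = 12, 5, 500

# Toy stand-ins for independently trained models: a shared component
# (common features) plus a per-run component (run-specific features).
W_shared = rng.normal(size=(K, d))
models = [W_shared + 0.8 * rng.normal(size=(K, d)) for _ in range(3)]

def extract_labels(W, X):
    return (X @ W.T).argmax(axis=1)    # dataset labeled by model W

X = rng.uniform(-1, 1, (n, d))
acc = np.zeros((3, 3))
for i, Wi in enumerate(models):        # rows: data-source model
    y_i = extract_labels(Wi, X)
    for j, Wj in enumerate(models):    # cols: evaluated model
        acc[i, j] = (extract_labels(Wj, X) == y_i).mean()
print(acc)
```

Above-chance off-diagonal agreement mirrors the shared feature mappings, while the gap to the diagonal mirrors the run-specific features that are not fully compatible across models.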

Another interesting observation is that model architectures from the same network family, for instance the VGG family, seem to share more common feature mappings than architectures from different families. For example, for the feature images extracted from the standard ResNet50, the ResNet networks exhibit a notably higher accuracy than the VGG networks. This phenomenon is more prevalent in non-robust models than in robust models. Overall, the results show that different models learn different feature mappings from the same dataset, even though they achieve comparable classification accuracy.

5 Conclusion

In this work, we introduced the Data from Model (DfM) process, a technique that reverses the conventional model training process by extracting data back from the model. Models trained on generated datasets that look totally different from the original dataset can achieve performance comparable to their counterparts trained on the origin dataset. The success of this technique confirms the feature mapping as the link between data and model. Our work provides insight into the relationship between data and model, as well as an improved understanding of model robustness.


  • [1] K. Bhardwaj, N. Suda, and R. Marculescu (2019) Dream distillation: a data-independent model compression framework. arXiv preprint arXiv:1905.07072. Cited by: §2.
  • [2] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In Symposium on Security and Privacy (SP), Cited by: §2.
  • [3] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay (2018) Adversarial attacks and defences: a survey. arXiv preprint arXiv:1810.00069. Cited by: §2.
  • [4] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), Cited by: §2.
  • [5] M. Haroush, I. Hubara, E. Hoffer, and D. Soudry (2019) The knowledge within: methods for data-free model compression. arXiv preprint arXiv:1912.01274. Cited by: §2.
  • [6] K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In European Conference on Computer Vision (ECCV), Cited by: §1.
  • [7] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry (2019) Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §1, §2, §3.
  • [8] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §1.
  • [9] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), Cited by: §2, §3.
  • [10] A. Mahendran and A. Vedaldi (2015) Understanding deep image representations by inverting them. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.
  • [11] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard (2017) Universal adversarial perturbations. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.
  • [12] A. Mordvintsev, C. Olah, and M. Tyka (2015) Inceptionism: going deeper into neural networks. Cited by: §2.
  • [13] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §2.
  • [14] T. Tanay and L. Griffin (2016) A boundary tilting perspective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690. Cited by: §2.
  • [15] T. Wang, J. Zhu, A. Torralba, and A. A. Efros (2018) Dataset distillation. arXiv preprint arXiv:1811.10959. Cited by: §2.
  • [16] C. Zhang, P. Benz, T. Imtiaz, and I. Kweon (2020) CD-UAP: class discriminative universal adversarial perturbation. In AAAI Conference on Artificial Intelligence (AAAI), Cited by: §2.
  • [17] C. Zhang, P. Benz, T. Imtiaz, and I. Kweon (2020) Understanding adversarial examples from the mutual influence of images and perturbations. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.
  • [18] C. Zhang, F. Rameau, S. Lee, J. Kim, P. Benz, D. M. Argaw, J. Bazin, and I. S. Kweon (2019) Revisiting residual networks with nonlinear shortcuts. In British Machine Vision Conference (BMVC), Cited by: §1.