Adversarial Out-domain Examples for Generative Models: Adversarial examples for GAN generators
Deep generative models are being increasingly used in a wide variety of applications. However, the generative process is not fully predictable and, at times, it produces an unexpected output. We refer to those outputs as out-domain examples. In the present paper we show that an attacker can force a pre-trained generator to reproduce an arbitrary out-domain example when fed with a suitable adversarial input. The main assumption is that those outputs lie in an unexplored region of the generator's codomain and hence have a very low probability of being naturally generated. Moreover, we show that this adversarial input can be shaped so as to be statistically indistinguishable from the set of genuine inputs. The goal is to look for an efficient way of finding these inputs in the generator's latent space.
The existence of adversarial examples has been demonstrated for a quite large set of deep learning architectures [adv, advseg, advrl]. An adversarial input is a carefully forged data instance that aims at driving the model into an incorrect or unexpected behaviour. Moreover, the adversarial setup requires that those instances be as indistinguishable as possible from genuine inputs.
In the present work, motivated by the extensive studies carried out on adversarial inputs for discriminative models, we extend the adversarial context to the increasingly popular field of generative models. In particular, we focus on the most promising class of architectures, called Generative Adversarial Networks (GANs) [goodfellow2014generative]. GANs
implicitly perform generative modeling of a target data distribution by training a deep neural network architecture. This is composed of two neural networks, a generator and a discriminator, that are trained simultaneously in a zero-sum game. In the end, the generator learns a deterministic mapping between a latent representation and an approximation of the target data distribution. What we show in the present work is that a pre-trained generator can be forced to reproduce an arbitrary output when fed with a suitable adversarial input. In particular, our findings show that the data space defined by the generator contains data instances having a very low probability of lying in the space of the expected outputs (i.e., the target data distribution). We refer to those outputs as out-domain examples and to the related adversarial inputs as
out-domain latent vectors, or OLV for short. Figure 1 shows a set of out-domain examples for a generator trained using a Progressive GAN [progan] (a pre-trained Progressive GAN available at https://tfhub.dev/google/progan-128/1). In that example, we found a set of inputs capable of forcing the generative model to produce images completely different from those belonging to the generator's training dataset, i.e., the CelebA dataset [celeba]. Moreover, we show that those OLVs can be forged so as to be statistically indistinguishable from known-to-be-trusted inputs. The existence of such adversarial inputs raises new practical questions about the use of GAN generators in an untrustworthy environment, such as a web application. The main contributions of the present paper may be summarized as follows:
We show that a generator may be forced to produce out-domain data instances which are arbitrarily different from those on which the generator was trained.
Our experiments refer to three common image datasets and to both standard and conditional GAN architectures: Deep Convolutional GAN (DCGAN) [dcgan] and Auxiliary Classifier GAN (ACGAN) [acgan].
We propose a first type of adversarial input for non-encoder-based generative models.
We investigate the nature of out-domain examples showing that their quality strongly depends on the dimension of both the latent and the data space.
The objective of a generative model is to learn a probability distribution $p_g$ that approximates a target data probability distribution $p_{data}$. Actually, in general, $p_{data}$ is unknown and can only be inferred from a limited set of samples. One of the most powerful approaches to train a generative model is the recently proposed Generative Adversarial Networks (GANs) framework. GANs require the simultaneous training of two neural networks: a generator $G$ and a discriminator $D$. This unsupervised training process is modeled as a zero-sum game (renamed adversarial training in this context) where $D$'s objective is to maximize the probability of discriminating between $p_g$ (which is a function of $G$) and $p_{data}$, whereas $G$'s objective is to make $p_g$ and $p_{data}$ indistinguishable in order to mislead $D$. A trained generator can also be intended as a deterministic mapping function between the latent space $Z$ and the data space $X$, i.e., $G \colon Z \to X$ (whereas $D \colon X \to [0,1]$). In other words, during the training process, the neural network $G$ receives as input a vector $z$ and produces $G(z)$. Each element of $z$ is a realization of a random vector distributed according to $p_z$, where $p_z$ is an arbitrary density function. Then, the optimization problem can be summarized by the following formulation:

$$\min_{\theta_g} \max_{\theta_d} \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \qquad (1)$$

where $\theta_g$ and $\theta_d$ are, respectively, the parameters of the generator and the discriminator network and $x$ is an instance from the training set.
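As a toy illustration, the GAN value function above can be estimated over a mini-batch of discriminator outputs. The sketch below assumes sigmoid discriminator scores in (0, 1); the batch values are hypothetical, not taken from any experiment:

```python
import numpy as np

def gan_value(d_real, d_fake):
    # Empirical estimate of E[log D(x)] + E[log(1 - D(G(z)))]
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Hypothetical discriminator outputs on a batch of real and generated samples.
d_real = np.array([0.90, 0.80, 0.95])   # D(x): close to 1 on real data
d_fake = np.array([0.10, 0.20, 0.05])   # D(G(z)): close to 0 on fakes
v = gan_value(d_real, d_fake)
```

The discriminator seeks to increase this value while the generator seeks to decrease it; with a confident discriminator the value approaches 0 from below.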
The use of random latent instances as input of $G$ makes it possible to explore the data space by generating new data instances, not necessarily available in the training set.
Many extensions to the original GAN framework have been successfully developed [dcgan, acgan, wgan, donahue2016adversarial]. One of the most influential works is [dcgan], in which the DCGAN architecture was proposed. DCGAN is capable of exploiting the full potential of Convolutional Neural Networks (CNNs) in both the generator and the discriminator. The GANs framework can be easily extended to train conditional generators using the ACGAN architecture [mirza2014conditional, acgan]. In this case, a supervised training approach is used for the purpose of learning a probability distribution conditioned on a set of classes $C$. During this training process, the class $c$ is chosen randomly (e.g., with uniform probability) from the set $C$ including all the possible classes. Then, the generator is modeled as a bivariate function $G(z, c)$ that receives as input an instance $z$ of the latent space labelled with its related class $c$. The ACGAN architecture improves the performance of the training process by adding an auxiliary classification task to the discriminator. The latter outputs two probability distributions, the first over the source (i.e., the probability that the instance comes from $p_{data}$) and the second over the class labels (i.e., the probability that the instance belongs to class $c$). In this case, the optimization problem can be parametrized by extending (1) as follows:

$$L_S = \mathbb{E}[\log P(S = \mathrm{real} \mid x)] + \mathbb{E}[\log P(S = \mathrm{fake} \mid G(z, c))]$$
$$L_C = \mathbb{E}[\log P(C = c \mid x)] + \mathbb{E}[\log P(C = c \mid G(z, c))]$$
$$L_D = L_S + L_C, \qquad L_G = L_C - L_S \qquad (2)$$

where $L_D$ and $L_G$ are the loss functions of the discriminator and the generator, respectively.
Some examples of adversarial inputs for encoder-based generative models like the Variational AutoEncoder (VAE) and VAE-GAN are analyzed in [vae_adversarial]. In the proposed scenario, given a data instance $x$, an attacker aims at producing an instance $x'$ that differs in a limited way from $x$ but which is capable of driving the VAE to reconstruct an output far from the original $x$. The reconstruction can be an approximation of an arbitrary data instance chosen by the attacker. All the results described by the authors refer to $x'$ as if it came from the same data distribution on which the VAE is trained. A similar attack scenario has been investigated also in [vae_attack2]. To the best of our knowledge, no other form of adversarial input targeting GAN generators has been proposed.

Many works investigate the possibility of finding or exploiting an inverse mapping from the data space to the latent space of a pre-trained generator [nguyen2017plug, creswell2018inverting, precise_recovery]. In [nguyen2017plug] a pre-trained generator is used to invert a discriminative model for the purpose of synthesizing novel images. It is noteworthy that the authors report a first case of a partial out-domain example. In particular, a generator trained on ImageNet was able to reproduce images belonging to classes known to the discriminative model but unknown to the generator.

Additionally, a technique to map images into a latent representation with a pre-trained generator is proposed in [creswell2018inverting] and in [precise_recovery]. In those works, the authors mention the possibility of mapping images which are not present in the training set into the generator's latent space. That is shown only for images coming from the same distribution on which the model is trained. The proposed inversion technique is essentially the same used in the present work and is based on the direct optimization of the generator input by a gradient-descent-based approach; the resulting inversion is used to estimate the chances of an image of being in the generator's training dataset by calculating the distance between the image and its reconstruction. In [creswell2018inverting], during the optimization process, the latent vectors are encouraged to be similar to those of the latent prior distribution by adding a penalty term to the loss function. This term is a weighted sum of the discrepancy between the mean and standard deviation of the latent vector and those of the latent prior distribution. We exploit the possibility of extending this penalty term beyond the second moment, given that just two moments might not be sufficient to correctly identify the latent vectors.
Let $X_G$ be the set of all possible data instances that can be generated by $G$ and let $Z$ be the set of all possible latent vectors coming from $p_z$. Then, an out-domain example for a generator $G$ is defined to be an element $x_{od} \in X_G$ such that:

$$x_{od} = G(z_{od}), \qquad P(x_{od} \sim p_{data}) \leq \epsilon_x, \qquad P(z_{od} \not\sim p_z) \leq \epsilon_z \qquad (3)$$

where $z_{od}$ is the out-domain latent vector (OLV) used to generate the out-domain example and both $\epsilon_x$ and $\epsilon_z$ are negligible probabilities. The underlying assumption of (3) is that the probability of $x_{od}$ belonging to the set of expected outputs is negligible. In the present work, we refer to the set of expected outputs of a generator with the term domain. The domain of a generator can also be intended as the semantic contents defined by $p_{data}$ (for instance, the domain of the MNIST dataset is the set of digit representations and the domain of the CelebA dataset is a set of human faces). For the purpose of finding a suitable out-domain latent vector $z_{od}$, we choose a target instance $x_t$ and then look for the $z$ that minimizes the distance between $G(z)$ and $x_t$. We refer to this process with the expression latent search.
Coherently with (3), the target instance $x_t$ is chosen ad hoc to be out of the generator's domain. In our experiments, this is done by sampling $x_t$ from a dataset different, and sufficiently distant from the semantic viewpoint (for instance, CIFAR10 and CelebA can be considered sufficiently distant, whereas MNIST and SVHN cannot), from the one used in the training process of the generator $G$. This scenario is depicted in Figure 2, where a set of out-domain examples from a DCGAN generator trained on CIFAR10 (right panel) is reported. In this case, target instances have been randomly chosen from five datasets different from CIFAR10.
In addition, as required by (3), an out-domain latent vector is considered a valid input for the generator if it lies in a dense region of the latent space. This implies that $z_{od}$ must be statistically indistinguishable from a valid latent vector sampled from the latent probability distribution $p_z$. From the adversarial perspective, this means that a defender is unable to tell apart a valid latent vector from an out-domain latent vector before the generation of $G(z_{od})$.
The recent success of generative models in the scientific [particlegan, molgan] and entertainment [animegan, musicgan] fields has inspired the development of many GAN-based software applications. These are often in the form of a web service with an interactive interface by which the user provides direct or indirect input to the model (one example can be found at https://make.girls.moe). Assuming white-box access to the generator model, an attacker can find out-domain latent vectors capable of driving the service to produce inadequate contents such as pornographic or offensive material.
The attacker can use these out-domain examples in order to perform a very effective and straightforward defacing attack directed at the generator's owner. Indeed, this type of web application and software allows users to share (e.g., on social networks) and save internal copies of the images created by the users. This scenario resembles a reflected or stored Cross-Site Scripting (XSS) attack, where the attacker is able to arbitrarily modify an image in the web page.
The white-box assumption is supported by the observation that these applications, in order to reduce the server load, often run the generative model client-side (using frameworks such as https://js.tensorflow.org/); additionally, pre-trained versions of open-source generators are frequently used.
We assume that the owner (referred to as the defender) performs a validation process on $z$ before the calculation of $G(z)$. This validation can be intended as a function $f \colon Z \to \{0, 1\}$. Therefore, the defender accepts to calculate $G(z)$ if and only if $f(z)$ is equal to $1$. In our attack scenario, this function is represented by a distributional hypothesis test. The null hypothesis ($H_0$) is that the vector $z$ is sampled from $p_z$. Thus, given a test statistic $T$ and for a fixed type I error $\alpha$, the decision rule can be formalized as follows:

$$f(z) = \begin{cases} 1 & \text{if } 1 - F_{T \mid H_0}(T(z)) \geq \alpha \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

where $F_{T \mid H_0}$ is the distribution of the test statistic under $H_0$ and $1 - F_{T \mid H_0}(T(z))$ corresponds to the classic $p$-value of confirmatory data analysis. The same scenario can be easily extended to conditional generators. In that case, we assume that the defender is able to arbitrarily choose and fix a data class $c$. The attacker aims at finding a suitable out-domain latent vector for the conditioned generator $G(\cdot, c)$.
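A minimal sketch of the defender's validation function for a standard normal prior, using a Kolmogorov-Smirnov test via SciPy's `stats.kstest`; the particular test and the level `alpha=0.05` are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from scipy import stats

def validate_latent(z, alpha=0.05):
    # f(z) = 1 iff the KS test does not reject H0: components of z ~ N(0, 1)
    _, p_value = stats.kstest(z, "norm")
    return 1 if p_value >= alpha else 0

# Ideal standard-normal quantiles stand in for a genuine latent vector.
z_ok = stats.norm.ppf((np.arange(1, 513) - 0.5) / 512)
# A clearly off-prior vector, far outside the bulk of N(0, 1).
z_bad = np.random.default_rng(0).uniform(5, 6, size=512)
```

The defender computes $G(z)$ only when the function returns 1, so the attacker's OLV must survive exactly this kind of check.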
As mentioned in Section 4, out-domain examples can be found by looking for the closest representation of an arbitrarily chosen target instance in the data space defined by $G$. Actually, by leveraging the differentiable nature of the generator and the structure of a well-formed latent-data mapping [dcgan], we can transform this search problem into an efficient optimization process as follows:

$$z_{od} = \operatorname*{arg\,min}_{z} \; d(G(z), x_t) + P(z) \qquad (5)$$

where $x_t$ is a given target instance, $d$ is a distance function and $P$ is a penalty term applied to $z$. The purpose of the penalty term is to force the solution to be consistent with $p_z$. More precisely, $P$ is defined as the weighted sum of the squared differences between the first $k$ sample moments of $z$ and the theoretical moments of a random variable distributed as $p_z$:

$$P(z) = \sum_{i=1}^{k} \lambda_i \, (\hat{\mu}_i(z) - \mu_i)^2 \qquad (6)$$

where $\mu_i$ is the $i$-th moment of $p_z$ and $\hat{\mu}_i(z)$ is the $i$-th sample moment of the latent vector $z$. The parameter $\lambda_i$ is the weight assigned to the $i$-th moment difference. In the case of conditional generators, the search process is performed by fixing a class $c$ as input of the generator function. This implies that the optimization process acts only on the latent vector $z$ and cannot modify the class representation $c$. More formally, in the conditional case, the problem can be reformulated as:

$$z_{od} = \operatorname*{arg\,min}_{z} \; d(G(z, c), x_t) + P(z) \qquad (7)$$
assuming that $c$ is randomly chosen from $C$ by the defender.
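The moment-based penalty term $P(z)$ described above can be sketched as follows; the theoretical moments used here are those of a standard normal prior, and the weights are hypothetical placeholders (the ones actually used in the experiments are listed in Table 9):

```python
import numpy as np

def moment_penalty(z, theo_moments, weights):
    # P(z) = sum_i  w_i * (i-th sample moment of z - i-th theoretical moment)^2
    total = 0.0
    for k, (mu_k, w_k) in enumerate(zip(theo_moments, weights), start=1):
        total += w_k * (np.mean(z ** k) - mu_k) ** 2
    return float(total)

theo = [0.0, 1.0, 0.0, 3.0]   # first four raw moments of N(0, 1)
lam = [1.0, 1.0, 1.0, 1.0]    # hypothetical weights, for illustration only
penalty = moment_penalty(np.zeros(100), theo, lam)  # all-zero latent vector
```

An all-zero vector matches the odd moments of the prior but misses the 2nd and 4th, so it is penalized; a vector genuinely sampled from the prior keeps the penalty close to zero.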
Starting from a random initialization of $z$, say $z_0$, obtained by sampling from $p_z$, we iteratively update the current latent vector according to the following rule:

$$z_{i+1} = z_i - \eta \, \nabla_z \left[ d(G(z_i), x_t) + P(z_i) \right] \qquad (8)$$

where $\eta$ is the learning rate.
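The update rule above can be sketched with a toy differentiable "generator": a fixed linear map stands in for a trained $G$ so that the MSE gradient is available in closed form. In a real attack the same loop backpropagates through the pre-trained network; all names and dimensions here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
d_latent, d_data = 8, 16                     # latent space as a bottleneck
W = rng.standard_normal((d_data, d_latent))  # toy linear generator G(z) = W z

def G(z):
    return W @ z

x_t = rng.standard_normal(d_data)   # arbitrary target instance
z = rng.standard_normal(d_latent)   # z_0 sampled from the prior
eta = 0.05                          # learning rate

mse0 = np.mean((G(z) - x_t) ** 2)
for _ in range(2000):
    grad = 2.0 * W.T @ (G(z) - x_t) / d_data  # gradient of the MSE w.r.t. z
    z -= eta * grad
mse = np.mean((G(z) - x_t) ** 2)
```

In the actual experiments the loop is driven by an automatic-differentiation framework (the paper uses Adam in TensorFlow) and the objective also includes the moment penalty.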
At each iteration of the optimization process, the distance function is computed between the target and . We tested and compared two distance functions: the mean squared error (MSE) and the cross entropy (XE).
It is noteworthy that in the cross-entropy case, the softmax function is used in order to ensure a unitary sum in both $x_t$ and $G(z)$. However, given its non-bijectivity, we force the comparison between the target and the generated image to be scale invariant (only the brightness ratio between pixels is compared). Nonetheless, although this approach diverts from the original objective of finding the closest (in the Euclidean sense) codomain instance to $x_t$, XE is able to provide a very good approximation (at least in the visual form) of $x_t$ with fewer training iterations than MSE.
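A sketch of the softmax-normalized cross-entropy distance described above. Note that softmax cancels a common additive offset of the pixel values, which is one concrete way the comparison becomes insensitive to overall brightness; the function names are ours:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))   # subtract the max for numerical stability
    return e / e.sum()

def xe_distance(x_target, x_gen):
    # Cross entropy between the unit-sum "brightness" distributions
    # obtained by softmax-normalizing both images.
    p, q = softmax(x_target.ravel()), softmax(x_gen.ravel())
    return float(-np.sum(p * np.log(q)))

x = np.linspace(0.0, 1.0, 16)
d_shift = xe_distance(x, x + 0.3)   # constant offset: distance unchanged
d_flip = xe_distance(x, x[::-1])    # genuinely different image: larger distance
```

The distance of an image to itself equals the entropy of its normalized distribution, which is the minimum attainable value; any genuinely different image scores strictly higher.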
The penalty term is used to ensure the indistinguishability of the out-domain vector from a trusted input. The main objective is to find a $z_{od}$ such that the probability of $f(z_{od}) = 1$ is maximized. This can be obtained by forcing the out-domain latent vector to have moments equal to those of a random variable distributed as $p_z$. Indeed, in probability theory, Moment Generating Functions (MGFs) have great practical relevance not only because they can be used to easily derive moments, but also because they uniquely identify a probability distribution function, a feature that makes them a handy tool for solving several problems. The MGF (if it exists) can be seen as an alternative specification useful to characterize a random variable. On the one hand, the MGF can be used to compute the $k$-th moment of a distribution as the $k$-th derivative of the MGF evaluated in 0 (the idea is to write the series expansion of $e^{tX}$ and then apply the expected value to both the LHS and the RHS of the equation; for further details see [feller2008introduction]). On the other hand, an MGF is useful to compare two different random variables, studying their behavior under limit conditions. Given a random variable $X$, its MGF is defined as the expected value of $e^{tX}$:

$$M_X(t) = \mathbb{E}[e^{tX}] \qquad (9)$$

If (9) holds, then the $k$-th moment of $X$, denoted by $\mu_k$, exists and is finite for any $k$:

$$\mu_k = \mathbb{E}[X^k] = \frac{d^k M_X(t)}{dt^k} \bigg|_{t=0} \qquad (10)$$
In our experiments, we tested and compared two common prior distributions, i.e., the standard normal and the continuous uniform distribution in $[-1, 1]$. Given the constraint imposed by the latter, we perform a hard clipping on the values of $z$ in order to force the latent vector to lie in the allowed hypercube. As proposed in [precise_recovery], we also tested the stochastic clipping method, but results showed no substantial improvement.
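For the uniform prior, the two clipping variants can be sketched as follows: hard clipping projects out-of-range components onto the hypercube boundary, while stochastic clipping, as in [precise_recovery], resamples them uniformly inside the hypercube (the example vector is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def hard_clip(z, lo=-1.0, hi=1.0):
    # Project out-of-range components onto the boundary of [lo, hi]^n.
    return np.clip(z, lo, hi)

def stochastic_clip(z, lo=-1.0, hi=1.0):
    # Resample out-of-range components uniformly inside [lo, hi]^n.
    z = z.copy()
    mask = (z < lo) | (z > hi)
    z[mask] = rng.uniform(lo, hi, size=int(mask.sum()))
    return z

z = np.array([-1.7, 0.2, 0.9, 2.5])
zh = hard_clip(z)
zs = stochastic_clip(z)
```

Both variants leave in-range components untouched; they differ only in how offending components are mapped back into the hypercube.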
The quality of the out-domain examples is evaluated on different DCGAN generators and on conditional generators trained within an ACGAN framework. For the sake of exposition, we will refer to each trained generator with the following compact notation:
In particular, a generator is trained for any combination of the following:
Architecture: DCGAN, ACGAN
Training Dataset: CIFAR10 [cifar10], SVHN [svhn] and a simple variation of MNIST [mnist], called ColorMNIST (only for DCGAN generators)
Latent space dimension:
Latent prior distribution: standard normal and uniform in $[-1, 1]$
For instance, DCGAN-CIFAR10-Normal-$d$ defines a DCGAN generator trained on CIFAR10 with a normal latent prior distribution and latent space dimension equal to $d$. The ColorMNIST dataset is obtained by applying a random background color to the original MNIST. The reason for that modification is to offer the generator the chance of representing a larger set of outputs (by adding colors to MNIST images we let the generator learn a larger number of RGB triplets) while keeping the complexity of MNIST virtually unaltered. All the generators' and discriminators' architectures, as well as the hyper-parameters and the training process, are the same proposed in [dcgan]. We tested three values of the latent space dimension that are commonly used in the literature.
The weights used in the regularization function are listed in Table 9. The validation process of the out-domain vectors is performed by fixing the type I error $\alpha$, both for the normal prior and for the uniform prior. We performed three different distributional tests, i.e., Kolmogorov-Smirnov, Shapiro-Wilk (only for the normal prior) and Anderson-Darling [shapiro1990test]. Results showed that, given the penalty term and an OLV, all the distributional tests lead to the same decision. The following results refer to the Anderson-Darling test [anderson1954test], which was finally chosen since its test statistic is based on the Cumulative Distribution Function (CDF) [ross2014first] and, compared to other tests, it places more weight on the values in the tails of the distribution.
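The Anderson-Darling validation can be sketched with SciPy's `stats.anderson`, which returns the statistic together with critical values at fixed significance levels (reported in percent). The 5% level used below is an illustrative assumption:

```python
import numpy as np
from scipy import stats

def passes_anderson_darling(z, alpha=0.05):
    # Accept z when the A-D statistic stays below the critical value at alpha.
    res = stats.anderson(z, dist="norm")
    # Significance levels are reported in percent, e.g. [15., 10., 5., 2.5, 1.]
    idx = int(np.argmin(np.abs(res.significance_level - alpha * 100)))
    return bool(res.statistic < res.critical_values[idx])

# Ideal normal quantiles stand in for a well-formed latent vector.
z_ok = stats.norm.ppf((np.arange(1, 257) - 0.5) / 257)
# A clearly non-normal, bimodal vector.
z_bad = np.concatenate([np.zeros(128), np.ones(129)])
```

The same check, evaluated component-wise on the OLV, is what the forged latent vectors must pass in the experiments.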
We defined a test set of target instances for the purpose of evaluating the capability of different generators to reproduce out-domain examples. This test set contains randomly chosen instances from four image datasets, i.e., Omniglot [omniglot], CelebA [celeba], UT-Zap50K [shoes] and Tiny ImageNet [tinyimagenet] (these datasets have no particular domain intersection with the training datasets). A random sample of images is selected from each dataset. In the case of Tiny ImageNet, the images are sampled from classes which are different from those of CIFAR10. All images are forced to share the same dimension in pixels and to be normalized to the same interval. To simplify the understanding of the results, the target distance function used for all the experiments is the Mean Squared Error (MSE). The average MSE (calculated between the target instance and the generated out-domain example) and the percentage of successfully passed statistical tests are computed on the test set and used as the main evaluation scores. In the latent search process, the Adam [adam] optimizer is used. All the experiments are performed using the TensorFlow framework [tensorflow]. The most relevant code used for the present work, along with an interactive proof of concept, is available at: https://github.com/pasquini-dario/OutDomainExamples.
Table 1 shows the results related to the DCGAN generators.
[Table 1: for each training dataset and latent space dimension, the average MSE and the test success rate, under (a) the normal and (b) the uniform latent distribution.]
All the forged OLVs successfully pass the distributional test, regardless of the chosen prior distribution. Several checks on the biases of the OLVs have been carried out, providing good results (see Section 7.2 of the supplementary material for further details). This is due to the fact that the penalty term strongly constrains the values of the latent vectors to a well-defined range. It is worth noticing that relaxing the moments penalty during the optimization process (5) would further reduce the MSE. In contrast to in-domain inversion [creswell2018inverting], we can state that out-domain examples do not take any significant advantage from latent vectors statistically close to those used during the training process. Figure 3 shows a set of target instances and related out-domain examples for a total of six generators. The upper panel reports the out-domain examples produced by three generators trained on different training sets but with the same latent space dimension and prior. When the training set of the generator is ColorMNIST, the method fails to find suitable OLVs capable of reproducing the target images. For the other two, the generator is able to provide a valid reconstruction for all the targets. The failure on ColorMNIST may be connected to the fact that it is less heterogeneous than SVHN and CIFAR10. By heterogeneity we intend the actual number of different pixels which are necessary in order to reproduce the same heterogeneity of the whole set (i.e., the entropy). It is reasonable to expect that the larger the variety of images in the training set, the larger the set of potential out-domain examples reproduced by the generator. As an estimator of that variety, we computed the Shannon entropy [jost2006entropy] (on the normalized pixel distribution) for a sample of images from each training set.
Results are depicted in Figure 5 and show that there is a strong dependence between the average MSE and the entropy of the training set.
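The heterogeneity estimator can be sketched as the Shannon entropy of the empirical pixel-value histogram; the 256-bin discretization and the pixel range [0, 1] are our assumptions, and the two toy "datasets" are synthetic stand-ins:

```python
import numpy as np

def pixel_entropy(images, bins=256):
    # Shannon entropy (in bits) of the normalized pixel-value distribution.
    counts, _ = np.histogram(images, bins=bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    p = p[p > 0]                       # drop empty bins (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
flat = np.full((10, 8, 8), 0.5)   # homogeneous sample: a single pixel value
varied = rng.random((10, 8, 8))   # heterogeneous sample: spread-out values
```

Under this proxy a low-variety set such as ColorMNIST scores low and a varied set such as CIFAR10 scores high, consistent with the dependence between training-set entropy and average MSE reported above.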
Figure 5 depicts a two-dimensional projection of a set of out-domain latent vectors and latent vectors directly sampled from $p_z$. This representation is obtained by applying the dimension reduction algorithm called t-distributed Stochastic Neighborhood Embedding (t-SNE) [maaten2008visualizing] to the vectors. It is possible to note how the out-domain latent vectors tend to be uniformly distributed in the space. In the case of the Omniglot dataset, the OLVs tend to cluster in a specific region and this may be due to its intrinsic homogeneity.
Even if the entropy is a sort of predictor of the success of our method, it is still possible, given a target image, a latent prior and a training set, to enhance the quality of the generated image by increasing the dimension of the latent space. As a matter of fact, we can observe, by looking at the lower panel of Figure 3, that an increase of the latent space dimension makes the generated image more similar to the target one. An additional motivation can be that the latent space acts as an information bottleneck for the target instance during the latent search process (the number of pixels of a target image is considerably larger than the biggest latent size analyzed). All these possibilities are evaluated in terms of MSE. Figure 6 shows the average MSE for each latent space dimension, latent prior and training set, confirming that the higher the entropy of the training set, the higher the probability of success in the generation and, at the same time, the larger the dimension of the latent space, the higher the quality of the reconstruction. Instead, there is no relevant difference in the quality of the out-domain examples when the latent prior distribution varies.
Figure 7 shows three examples of linear interpolation between latent vectors [dcgan]. The first row depicts a smooth and semantically meaningful transition between two random vectors sampled from $p_z$, referred to as in-domain latent vectors. By semantically meaningful transition, we mean that each image between the two interpolation points remains coherent with the generator's domain. The second row depicts the interpolation between an in-domain vector and an out-domain vector. In contrast with the first case, the transition is unbalanced and not particularly smooth. From the sequence, it can be noticed that the semantically valid attributes of the starting image, i.e., the black of the hair and the reflection on the forehead, are deformed to recreate the final MNIST digit. The last row shows the extreme case of interpolation between two out-domain latent vectors. In this case, the intermediate data instances never cross the in-domain set.
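The interpolations in Figure 7 follow the usual linear scheme; a minimal sketch is below (the two vectors are synthetic stand-ins, not actual in-domain or out-domain latent vectors):

```python
import numpy as np

def interpolate(z_a, z_b, steps=8):
    # Convex combinations (1 - a) * z_a + a * z_b for a in [0, 1];
    # feeding each row to G traces a path in the generator's data space.
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])

z_in = np.zeros(4)         # stand-in for an in-domain latent vector
z_out = 2.0 * np.ones(4)   # stand-in for an out-domain latent vector
path = interpolate(z_in, z_out, steps=5)
```

Decoding each intermediate vector with the generator yields the image sequences shown in the figure.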
[Table: for each training dataset and latent space dimension in the conditional setup, the average MSE and the test success rate, under (a) the normal and (b) the uniform latent distribution.]
Conditional generators are trained with the purpose of enforcing their outputs to be part of a semantically meaningful data class. Typically, this implies a better global coherence and quality in the definition of the generator's data space [acgan]. This is especially true for models trained with the ACGAN framework, in which the generators are encouraged to produce images that are classified by the discriminator as genuine and as belonging to their class. The experiments described below aim at finding out whether the conditional extension is sufficient to prevent the generation of out-domain examples.
We trained different ACGAN generators using the same set of parameters reported in Section 6.1. Also the architecture used for the generator and the discriminator is the same used for the previous DCGAN experiments; the only difference is the number of neurons in the generator's input layer and in the discriminator's output layer, due to the conditional setup. Training hyper-parameters are the same proposed in [acgan]. As aforementioned, in the conditional setup, the hypothesis is that the class is randomly chosen by the defender and the attacker cannot modify its representation during the latent search process. Tests and validation are performed as in Section 6.1, but they are evaluated conditionally on each class in the generators' class set. All the tested training sets are composed of 10 classes. In this case, the MSE is calculated as the average over the classes.

We are able to find an out-domain example for each image in the test set, conditionally on each class. We do not report the results when the training set is ColorMNIST, since it already failed in the less severe DCGAN experiment. As an example, Figure 9 shows the generated out-domain examples, conditionally on each class of CIFAR10, for four randomly chosen target instances in the test set. It can be noticed that the class has no relevant impact on the quality of the out-domain examples: the attack succeeds regardless of the class. The same happens when attacking the generators trained on SVHN. Results in terms of MSE are summarized in Table 6.2. It is possible to observe that the average MSE is uniformly larger compared to the DCGAN experiments due to the conditional setup. In Figure 10, we also report the distribution of the MSE, conditionally on each class, for each training set. No specific patterns are registered: the MSE distribution is approximately the same for each class and training set. However, there is a slight variability for CIFAR10, given the larger heterogeneity among its classes. The validation of the out-domain latent vectors is the same described in Section 6.1. Also in this case, all the latent vectors successfully pass the Anderson-Darling test. Moment distributions for each dataset, latent prior and latent space dimension are also checked. No relevant difference has been observed with respect to the unconditional generator experiments.
We showed how to forge suitable adversarial inputs capable of driving a trained generator to produce arbitrary data instances in output. This is possible for both conditional and unconditional generators. Additionally, we showed that an adversarial input can be shaped in order to be statistically indistinguishable from the set of trusted inputs. We also showed that the success of our method strongly depends on two main factors: the heterogeneity of the set on which the generator is trained and the latent space dimension.
In additional experiments, we found a set of generators showing a greater resilience to the generation of out-domain examples. In particular, the non-saturating GAN with ResNet architecture analyzed in [ganland] shows an inherent difficulty in producing out-domain examples even when the generator is trained on high-entropy datasets such as CIFAR10. We conjecture that this property is strongly related to the generator architecture.
In the described adversarial scenario, we supposed that an aware defender can only test the validity of the model's input in order to evaluate the genuineness of the latent vectors. However, it is possible to imagine a more powerful defender able to verify the generator's output in order to spot unexpected generations.
As future directions of activity, we expect to: i) investigate the generation of out-domain examples for other GAN architectures; ii) study the generation of out-domain examples in contexts other than images; iii) define a different way of finding out-domain latent vectors, by relaxing the constraint imposed by the penalty term $P$; iv) investigate the possibility of training an arbitrarily complex generator which is resilient to the generation of out-domain examples; v) evaluate the possibility of extending this defacing attack to a black-box scenario, using a local copy of the generator obtained by querying the remote model, as proposed in [blackbox].
Figures 12 and 13 show the distribution of the biases, with respect to each single moment, for the normal and the uniform priors and for each combination of latent space dimension and dataset. In particular, Figure 12 highlights that the estimation of the odd moments for the normal prior is precise. The second and the fourth moments are instead slightly overestimated and underestimated, respectively. For the uniform prior, Figure 13 shows that the bias for the second and the sixth moments is slightly positive (overestimation), while for the remaining moments the estimation is precise, with a quite large variance in the estimation of the third moment. No evident patterns are worth noticing when looking at the training dataset or the latent space dimension, for either prior distribution.
Recall that for the standard normal distribution, the 1st and 3rd moments are equal to 0, the 2nd (which coincides with the variance) is equal to 1 and the 4th is equal to 3. For the uniform in $[-1, 1]$, instead, the odd moments are all equal to 0, whereas the 2nd is equal to $1/3$, the 4th is equal to $1/5$ and the 6th is equal to $1/7$. Results are shown in Tables 3 and 4.
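These theoretical moments can be checked by direct numerical integration of $\mathbb{E}[X^k] = \int x^k p(x)\,dx$; the snippet below is a verification sketch, not part of the attack pipeline:

```python
import numpy as np
from scipy import integrate, stats

def raw_moment(pdf, k, lo, hi):
    # k-th raw moment E[X^k] computed by numerical integration of x^k * p(x).
    value, _ = integrate.quad(lambda x: x ** k * pdf(x), lo, hi)
    return value

# Standard normal: moments 1..4 are 0, 1, 0, 3.
norm_m = [raw_moment(stats.norm.pdf, k, -np.inf, np.inf) for k in (1, 2, 3, 4)]

# Uniform on [-1, 1] (constant density 1/2): even moments 2, 4, 6
# are 1/3, 1/5, 1/7.
unif_m = [raw_moment(lambda x: 0.5, k, -1.0, 1.0) for k in (2, 4, 6)]
```

For the uniform case the closed form is simply $\mathbb{E}[X^k] = \frac{1}{2}\int_{-1}^{1} x^k\,dx = \frac{1}{k+1}$ for even $k$, which matches the values above.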
Further out-domain examples for a DCGAN generator are depicted in Figure 14.