Deep neural networks (DNNs) have demonstrated great success in advancing the state-of-the-art performance of discriminative tasks krizhevsky2012imagenet,goodfellow2016deep,he2016deep,collobert2008unified,deng2013recent,silver2016mastering. However, recent research has found that DNNs are vulnerable to adversarial examples. These are carefully crafted instances that aim to induce arbitrary prediction errors in learning systems. Such adversarial examples contain perturbations of small magnitude, and they have shed light on understanding and discovering potential vulnerabilities of DNNs szegedy2013intriguing,goodfellow2014explaining,moosavi2016deepfool,papernot2016limitations,carlini2017towards,xiao2018generating,xiao2018spatially,xiao2018characterizing,xiao2019meshadv. Most existing work has focused on constructing adversarial examples in one of two ways: by adding pixel-wise perturbations goodfellow2014explaining, or by spatially transforming the image xiao2018spatially,engstrom2017rotation (e.g., via in-plane or out-of-plane rotation). Generating structured perturbations with semantically meaningful patterns is an important yet under-explored problem. At the same time, deep generative models have demonstrated impressive performance in learning disentangled semantic factors through data generation. They do so either in an unsupervised manner radford2015unsupervised,karras2017progressive,brock2018large or in a weakly-supervised manner based on semantic attributes yan2016attribute2image,choi2018stargan. Empirical findings in yan2016attribute2image,zhu2016generative,radford2015unsupervised demonstrated that a simple linear interpolation on the learned image manifold can produce smooth visual transitions between a pair of input images.
In this paper, we introduce a novel attack that generates structured perturbations with semantically meaningful patterns. Motivated by the findings mentioned above, we leverage an attribute-conditional image editing model choi2018stargan to synthesize adversarial examples. We do this by interpolating in feature-map space between the source image and its attribute-edited counterpart. Here, we focus on changing a single attribute dimension to achieve adversarial goals while keeping the generated adversarial image realistic (see the overview figure for details). To validate the effectiveness of the proposed attack, we consider two tasks, namely, face verification and landmark detection. We chose these tasks because face recognition has been extensively studied and existing models have been shown to be reasonably robust to attacks. We conduct both qualitative and quantitative evaluations on the CelebA dataset liu2015deep. More visualization results are available on the anonymous website: https://sites.google.com/view/generate-semantic-adv-example. The contributions of this work are threefold. First, we propose a novel attack method capable of generating structured adversarial perturbations guided by semantic attributes. This allows us to analyze the robustness of a recognition system against different types of semantic attacks. Second, the proposed attack exhibits high transferability and achieves a 65% black-box attack success rate on a real-world face verification platform. Third, the proposed method is more effective than pixel-wise perturbations in attacking existing defense methods, which could potentially open up new research opportunities and challenges in the long run.
Semantic image editing.
Semantic image synthesis and manipulation is a popular research topic in machine learning, graphics, and vision. Recent advances in deep generative models kingma2013auto,goodfellow2014generative,oord2016pixel and the empirical analysis of deep classification networks krizhevsky2012imagenet,simonyan2014very,szegedy2015going have driven tremendous breakthroughs over the past few years. These include high-fidelity pure image generation radford2015unsupervised,karras2017progressive,brock2018large, text-to-image generation mansimov2015generating,reed2016generative,van2016conditional,odena2017conditional,zhang2017stackgan,johnson2018image, and image-to-image translation isola2017image,zhu2017unpaired,liu2017unsupervised,wang2018high,hong2018learning. Semantic attributes farhadi2009describing,kumar2009attribute,parikh2011relative offer a compact and interpretable representation of factors of the physical world, and they have received considerable research attention. In particular, yan2016attribute2image proposed a probabilistic formulation to synthesize diverse and realistic portrait images from visual attributes. choi2018stargan tackled the face image editing problem with a multi-domain image-to-image translation network that treats attributes as an additional input condition. Besides attribute-based generation, our work is also related to other research on neural image editing and completion methods zhu2016generative,shu2017neural,li2017generative,sangkloy2017scribbler,xian2018texturegan. Empirical findings in yan2016attribute2image,zhu2016generative demonstrated that a simple linear interpolation on the natural image manifold can produce smooth visual transitions between a pair of input images.
Generating pixel-wise adversarial perturbations has been extensively studied szegedy2013intriguing,goodfellow2014explaining,moosavi2016deepfool,papernot2016limitations,carlini2017towards,xiao2018generating,xiao2018characterizing. A key aspect of this prior work is that it usually adopts a norm constraint on the pixel-wise perturbations to preserve the perceptual realism of the generated adversarial examples. Recently, xiao2018spatially,engstrom2017rotation proposed to spatially transform image patches instead of adding pixel-wise perturbations, which poses a new challenge for defending against such adversarial attacks. Most existing work is able to generate adversarial examples that are perceptually realistic. However, generating semantically meaningful perturbations remains relatively under-explored. In contrast, our method focuses on generating unrestricted perturbations brown2018unrestricted (i.e., perturbations that are not bounded by a norm constraint) with semantically meaningful patterns guided by visual attributes. Regarding semantic adversarial perturbations, our work is also related to the concurrent work bhattad2019big, which applies semantic transformations in the color or texture space. However, we argue that our method is able to generate adversarial examples in a more controllable fashion using visual attributes. We further analyze the robustness of the recognition system by generating adversarial examples guided by different visual attributes.
Semantic Adversarial Examples
Let $M$ be a machine learning model trained on a dataset $\mathcal{D} = \{(x, y)\}$ of image-label pairs, where $x \in \mathbb{R}^{H \times W \times C}$ and $y \in \mathbb{R}^{K}$ denote the image and the ground-truth label, respectively. Here, $H$, $W$, $C$, and $K$ denote the image height, image width, number of image channels, and label dimension, respectively. For each image $x$, the model makes a prediction $\hat{y} = M(x)$. To simplify notation, we assume that the model is an oracle, such that $M(x) = y$ holds for every image in the dataset. Consider a target image-label pair $(x^{\mathrm{tgt}}, y^{\mathrm{tgt}})$ with $y \neq y^{\mathrm{tgt}}$. A traditional attacker aims to synthesize an adversarial example $x^{\mathrm{adv}}$ such that $M(x^{\mathrm{adv}}) = y^{\mathrm{tgt}}$, either by adding pixel-wise perturbations to the original image $x$ or by spatially transforming it. In this work, we introduce the concept of a semantic attacker. A semantic attacker generates adversarial examples by adding semantically meaningful perturbations with a conditional generative model $G$. A traditional attacker usually produces unstructured pixel-wise perturbations, whereas the proposed method produces structured perturbations with semantic meaning.
Semantic image editing.
Let $c$ be an attribute representation reflecting the semantic factors of image $x$ (e.g., expression or hair color of a portrait image). The length of $c$ is the attribute dimension, and the $i$-th entry $c_i$ indicates the appearance of the $i$-th attribute. Our goal is to use the conditional generator $G$ for semantic image editing. For example, given a portrait image of a girl with black hair and blonde hair as the new attribute, the generator should synthesize a new image that turns the girl's hair from black to blonde. More specifically, we denote the augmented (new) attribute as $c^{\mathrm{new}}$, so that the synthesized image is given by $x^{\mathrm{new}} = G(x, c^{\mathrm{new}})$. In the special case when there is no attribute change ($c^{\mathrm{new}} = c$), the generator simply reconstructs the input: $G(x, c) = x$. The findings in bengio2013better,reed2014learning suggest that the synthesized image should lie close to the data manifold if we constrain the change of attribute values to be sufficiently small (e.g., we update only one semantic attribute at a time). In addition, we can potentially generate many such images by linearly interpolating between the semantic embeddings of the conditional generator using the original image and the synthesized image with the augmented attribute.
Attribute-space interpolation.
We start with a simple solution, assuming that the adversarial example can be found by directly interpolating in the attribute space. Let $c^{\mathrm{adv}}$ be the adversarial attribute vector used as input to the attribute-conditioned generator:

$$x^{\mathrm{adv}} = G(x, c^{\mathrm{adv}}), \qquad c^{\mathrm{adv}} = \alpha \cdot c + (1 - \alpha) \cdot c^{\mathrm{new}}, \quad \text{where } \alpha \in [0, 1].$$

This choice is supported by empirical results on attribute-conditioned image progression yan2016attribute2image,radford2015unsupervised. These results show that a well-trained generative model can synthesize a sequence of images with smooth attribute transitions.
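Attribute-space interpolation takes only a few lines. The following is a minimal sketch, assuming binary attribute vectors and a generator exposed as a callable `generator(x, c)`. `toy_generator` is a hypothetical stand-in, not StarGAN. It shifts each image channel by an amount derived from the attribute vector, so the example runs without a trained model.

```python
import numpy as np


def interpolate_attributes(c, c_new, alpha):
    """c_adv = alpha * c + (1 - alpha) * c_new, for a scalar alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * c + (1.0 - alpha) * c_new


def toy_generator(x, c):
    # Hypothetical stand-in for G(x, c): adds one offset per image channel,
    # taken from the first entries of the attribute vector.
    offsets = np.resize(c, x.shape[-1])
    return np.clip(x + 0.1 * offsets, 0.0, 1.0)


def attribute_space_image(x, c, c_new, alpha, generator=toy_generator):
    """x_adv = G(x, c_adv) with c_adv from attribute-space interpolation."""
    return generator(x, interpolate_attributes(c, c_new, alpha))


rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=(8, 8, 3))   # toy image, values in [0, 1]
c = np.zeros(17)                             # 17 binary attributes, all absent
c_new = c.copy()
c_new[0] = 1.0                               # add the first attribute
x_adv = attribute_space_image(x, c, c_new, alpha=0.5)
```

A single scalar controls the whole edit, so the search space of this variant is very small.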
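Feature-map interpolation.
Alternatively, we propose to interpolate using the feature map produced by the generator $G = G_{\mathrm{dec}} \circ G_{\mathrm{enc}}$. Here, $G_{\mathrm{enc}}$ is the encoder module, which takes the image (together with the attribute vector) as input and outputs the feature map. Similarly, $G_{\mathrm{dec}}$ is the decoder module, which takes the feature map as input and outputs the synthesized image. Let $f = G_{\mathrm{enc}}(x, c) \in \mathbb{R}^{H_F \times W_F \times C_F}$ be the feature map of an intermediate layer in the generator, where $H_F$, $W_F$, and $C_F$ indicate the height, width, and number of channels in the feature map.

$$f^{\mathrm{adv}} = \alpha \odot G_{\mathrm{enc}}(x, c) + (1 - \alpha) \odot G_{\mathrm{enc}}(x, c^{\mathrm{new}}), \qquad x^{\mathrm{adv}} = G_{\mathrm{dec}}(f^{\mathrm{adv}}).$$

Attribute-space interpolation is parameterized by a scalar $\alpha$. In contrast, we parameterize feature-map interpolation by a tensor $\alpha \in \mathbb{R}^{H_F \times W_F \times C_F}$ with the same shape as the feature map, where $0 \le \alpha_{i,j,k} \le 1$ for $1 \le i \le H_F$, $1 \le j \le W_F$, and $1 \le k \le C_F$. Compared to linear interpolation in the attribute space, this design gives more flexibility when interpolating between the original image and the synthesized image.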
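Feature-map interpolation replaces the scalar with a tensor. The minimal sketch below assumes the two encoder outputs $G_{\mathrm{enc}}(x, c)$ and $G_{\mathrm{enc}}(x, c^{\mathrm{new}})$ are already available as NumPy arrays; here they are random arrays acting as hypothetical stand-ins. The decoder is omitted. The function name `feature_map_interpolation` is illustrative.

```python
import numpy as np


def feature_map_interpolation(f_orig, f_new, alpha):
    """f_adv = alpha * f_orig + (1 - alpha) * f_new, element-wise over an (h, w, k) map."""
    if not (f_orig.shape == f_new.shape == alpha.shape):
        raise ValueError("alpha and both feature maps must have the same shape")
    alpha = np.clip(alpha, 0.0, 1.0)  # enforce 0 <= alpha[i, j, k] <= 1
    return alpha * f_orig + (1.0 - alpha) * f_new


rng = np.random.default_rng(1)
h, w, k = 4, 4, 8                      # toy feature-map size, not the generator's
f_orig = rng.normal(size=(h, w, k))    # stands in for G_enc(x, c)
f_new = rng.normal(size=(h, w, k))     # stands in for G_enc(x, c_new)
alpha = np.ones((h, w, k))
alpha[:, w // 2:, :] = 0.0             # right half taken from the edited features
f_adv = feature_map_interpolation(f_orig, f_new, alpha)
```

A constant tensor recovers a single scalar mixing weight. A spatially varying tensor lets the optimizer decide where the edited features appear, which is the extra flexibility this design provides.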
Adversarial Optimization Objectives
We find the adversarial image $x^{\mathrm{adv}}$ by minimizing the objective $\mathcal{L}(x^*)$ below with respect to the synthesized image $x^*$. Each synthesized image $x^*$ is produced by interpolation using the conditional generator $G$. In the objective function, the first term is the adversarial metric and the second term is a smoothness constraint, with $\lambda$ controlling the balance between the two. The adversarial metric is minimized once the model $M$ has been successfully attacked towards the target image-label pair $(x^{\mathrm{tgt}}, y^{\mathrm{tgt}})$. In identity verification, $y^{\mathrm{tgt}}$ is the identity representation of the target image. In landmark detection, $y^{\mathrm{tgt}}$ represents certain landmark coordinates.

$$x^{\mathrm{adv}} = \operatorname*{argmin}_{x^*} \mathcal{L}(x^*), \quad x^* \text{ given by attribute-space or feature-map interpolation},$$
$$\mathcal{L}(x^*) = \mathcal{L}_{\mathrm{adv}}(x^*; M, y^{\mathrm{tgt}}) + \lambda \cdot \mathcal{L}_{\mathrm{smooth}}(x^*).$$
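The following toy sketch shows how the objective can be minimized over the feature-map interpolation tensor. It is not the paper's implementation and rests on several assumptions:

- an identity decoder and an identity model, so the "embedding" of $x^{\mathrm{adv}}$ is $f^{\mathrm{adv}}$ itself;
- a squared distance to a hypothetical target embedding in place of $\mathcal{L}_{\mathrm{adv}}$;
- a squared total-variation term for $\mathcal{L}_{\mathrm{smooth}}$;
- projected gradient descent with hand-derived gradients instead of Adam.

All function names are illustrative.

```python
import numpy as np


def tv_loss_and_grad(alpha):
    """Squared total variation of alpha over its two spatial axes, plus its gradient."""
    dv = alpha[1:, :, :] - alpha[:-1, :, :]   # vertical neighbour differences
    dh = alpha[:, 1:, :] - alpha[:, :-1, :]   # horizontal neighbour differences
    loss = np.sum(dv ** 2) + np.sum(dh ** 2)
    grad = np.zeros_like(alpha)
    grad[1:, :, :] += 2 * dv
    grad[:-1, :, :] -= 2 * dv
    grad[:, 1:, :] += 2 * dh
    grad[:, :-1, :] -= 2 * dh
    return loss, grad


def objective_and_grad(alpha, f_orig, f_new, target, lam):
    """Toy L = L_adv + lam * L_smooth and its gradient with respect to alpha."""
    f_adv = alpha * f_orig + (1.0 - alpha) * f_new
    residual = f_adv - target                     # identity decoder and model assumed
    adv_loss = np.sum(residual ** 2)
    adv_grad = 2.0 * residual * (f_orig - f_new)  # d f_adv / d alpha = f_orig - f_new
    smooth_loss, smooth_grad = tv_loss_and_grad(alpha)
    return adv_loss + lam * smooth_loss, adv_grad + lam * smooth_grad


def optimize_alpha(f_orig, f_new, target, lam=0.1, lr=0.01, steps=200):
    """Projected gradient descent on alpha, clipped to [0, 1] after every step."""
    alpha = np.full(f_orig.shape, 0.5)
    history = []
    for _ in range(steps):
        loss, grad = objective_and_grad(alpha, f_orig, f_new, target, lam)
        history.append(float(loss))
        alpha = np.clip(alpha - lr * grad, 0.0, 1.0)
    return alpha, history


rng = np.random.default_rng(0)
f_orig, f_new, target = (rng.normal(size=(4, 4, 3)) for _ in range(3))
alpha, history = optimize_alpha(f_orig, f_new, target)
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

Clipping after each step keeps every entry of $\alpha$ inside $[0, 1]$, matching the constraint on the interpolation tensor.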
In the identity verification task, two images are considered to belong to the same identity if the corresponding identity embeddings from the verification model $M$ are sufficiently close. The adversarial metric for this task relies on a distance function between two identity embeddings from the model $M$; we use the normalized distance in our setting. In addition, we introduce a threshold parameter, a constant related to the false positive rate (FPR) threshold computed on the development set.
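The text above names the ingredients of the identity-verification metric but not its exact form. The following is a minimal sketch under stated assumptions:

- the metric is a hinge $\max(\kappa, d)$;
- $d$ is the L2 distance between unit-normalized embeddings;
- $\kappa$ is the threshold parameter.

Under this form, the loss stops decreasing once the distance falls below $\kappa$, that is, once the verifier would already accept the pair. The function names and the value $\kappa = 0.6$ are hypothetical.

```python
import numpy as np


def normalized_l2_distance(emb_a, emb_b):
    """L2 distance between unit-normalized embeddings (lies in [0, 2])."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    return float(np.linalg.norm(a / np.linalg.norm(a) - b / np.linalg.norm(b)))


def verification_adv_loss(emb_adv, emb_tgt, kappa):
    # Assumed hinge form: once the distance drops below kappa the loss is flat at
    # kappa, so the optimizer stops pushing beyond the acceptance threshold.
    return max(kappa, normalized_l2_distance(emb_adv, emb_tgt))


def same_identity(emb_a, emb_b, kappa):
    """Verifier decision: same identity if the distance is below kappa."""
    return normalized_l2_distance(emb_a, emb_b) < kappa


kappa = 0.6                        # hypothetical threshold value
emb_tgt = np.array([1.0, 0.0])
emb_far = np.array([0.0, 1.0])     # distance sqrt(2) from the target
emb_close = np.array([2.0, 0.1])   # nearly parallel to the target
```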
For structured prediction tasks such as landmark detection, we use the Houdini objective proposed in cisse2017houdini as our adversarial metric. We attack the target landmark $y^{\mathrm{tgt}}$ directly, because the corresponding target image is not defined for this task. The objective combines three components: a scoring function for each image-label pair, a threshold, and a task loss determined by the specific adversarial target.
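For reference, the Houdini surrogate of cisse2017houdini is $\mathrm{Houdini}(x, y) = P_{\gamma \sim \mathcal{N}(0,1)}[g(x, y) - g(x, \hat{y}) < \gamma] \cdot \ell(\hat{y}, y)$. In this expression, $g$ is the scoring function, $\hat{y}$ is the predicted label, and $\ell$ is the task loss. Because $\gamma$ is standard normal, the probability equals the Gaussian survival function evaluated at the score margin. The sketch below evaluates only this scalar; how the scores and the task loss are instantiated for landmark targets is left as an assumption for the caller.

```python
import math


def standard_normal_sf(t):
    """P[gamma > t] for gamma ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def houdini(score_ref, score_pred, task_loss):
    """Houdini surrogate: P[g(x, y) - g(x, y_hat) < gamma] * task_loss, gamma ~ N(0, 1).

    score_ref:  g(x, y), score of the reference label y
    score_pred: g(x, y_hat), score of the predicted label y_hat
    task_loss:  l(y_hat, y), e.g. a mean landmark distance (non-negative)
    """
    margin = score_ref - score_pred
    return standard_normal_sf(margin) * task_loss


# When the reference label scores far below the prediction, the probability
# factor approaches 1 and the surrogate approaches the task loss itself.
print(houdini(score_ref=-3.0, score_pred=0.0, task_loss=2.0))
```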
Interpolation smoothness.
The interpolation tensor $\alpha$ in the feature-map case has far more parameters than the scalar in the attribute-space case. We therefore enforce a smoothness constraint $\mathcal{L}_{\mathrm{smooth}}$ on the tensor used in feature-map interpolation. The smoothness loss encourages the interpolation tensor to consist of spatially piece-wise constant patches. This type of objective has been widely used for pixel-wise de-noising in natural image processing mahendran2015understanding,johnson2016perceptual.
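One common way to write such a smoothness penalty is a squared total-variation term over the two spatial axes of the interpolation tensor $\alpha$. This form is assumed here for illustration; the exact variant may differ:

$$\mathcal{L}_{\mathrm{smooth}}(\alpha) = \sum_{i,j,k} \Big[ (\alpha_{i+1,j,k} - \alpha_{i,j,k})^2 + (\alpha_{i,j+1,k} - \alpha_{i,j,k})^2 \Big],$$

where each sum runs only over indices whose neighbor exists. Each term penalizes the difference between spatially adjacent entries in the same channel. The penalty is zero exactly when $\alpha$ is constant along both spatial axes in every channel.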
In this paper, we focus on analyzing the proposed method in attacking state-of-the-art face recognition systems, for two reasons. First, face recognition has been extensively studied for decades, and state-of-the-art recognition systems are assumed to be reasonably robust. Second, face recognition has many real-world applications, such as (1) face identification for mobile payment and (2) landmark detection for face editing and stylization. A detailed analysis of semantic adversarial examples on such applications gives a better understanding of the vulnerability of current systems. We believe that the proposed method is a general semantic attack that is applicable to many other domains as well.

The experimental section is organized as follows:

- First, we analyze the quality of the generated adversarial examples and qualitatively compare our method with a pixel-wise optimization-based method carlini2017towards.
- Second, we provide both qualitative and quantitative results by controlling one semantic attribute at a time. For attack transferability, we evaluate the proposed attack under various settings and further demonstrate its effectiveness via black-box attacks against online face verification platforms.
- Third, we compare our method with the baseline against different defense methods on the face verification task.
- Finally, we demonstrate that the proposed attack also applies to face landmark detection.
We select ResNet-50 and ResNet-101 he2016deep trained on MS-Celeb-1M guo2016ms as our face verification models. The models are trained using two different objectives, namely, softmax loss sun2014deep,zhang2018accelerated and cosine loss wang2018cosface. For simplicity, we use the notation “R-N-S” for the model with an N-layer residual network backbone trained using softmax loss, and “R-N-C” for the same backbone trained using cosine loss. For the R-101-S model, we set the threshold parameter based on the false positive rate (FPR) for the identity verification task. Three different FPRs are used, each with its corresponding threshold value; please see the supplementary materials for details. We also need to distinguish between the FPR used to generate adversarial examples and the FPR used in evaluation. We therefore introduce two notations: “Generating FPR (G-FPR)” and “Test FPR (T-FPR)”. For the black-box API attacks, we use the online face verification services provided by Face++ faceplusplus and AliYun aliyun.
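The threshold parameter is tied to an FPR computed on a development set. The following is a minimal sketch of one standard way to obtain such a threshold. It assumes that a pair is accepted as the same identity when its distance is strictly below the threshold, and that the FPR is the fraction of different-identity (impostor) pairs accepted. The function names and the synthetic distances are hypothetical.

```python
import numpy as np


def threshold_for_fpr(impostor_distances, target_fpr):
    """Largest distance threshold whose FPR on impostor pairs is <= target_fpr.

    A pair is accepted as the same identity when its distance is strictly below
    the threshold.
    """
    d = np.sort(np.asarray(impostor_distances, dtype=float))
    n_allowed = int(np.floor(target_fpr * d.size))  # impostor pairs we may accept
    if n_allowed >= d.size:
        return float("inf")
    return float(d[n_allowed])  # accepts exactly the pairs with d < d[n_allowed]


def false_positive_rate(impostor_distances, threshold):
    """Fraction of impostor (different-identity) pairs wrongly accepted."""
    return float(np.mean(np.asarray(impostor_distances, dtype=float) < threshold))


# Usage with synthetic development-set distances (not real data).
dev_impostor = np.linspace(0.01, 1.0, 100)
kappa = threshold_for_fpr(dev_impostor, 0.05)
```

Very small FPRs need correspondingly many impostor pairs. For example, a target of one in a thousand is only meaningful with at least a thousand pairs.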
We select the Face Alignment Network (FAN) bulat2017far, trained on 300W-LP zhu2016face and fine-tuned on 300-W sagonas2013300, for 2D landmark detection. The network is constructed by stacking Hourglass networks newell2016stacked with hierarchical blocks bulat2017binarized. Given a portrait image as input, FAN outputs 2D heatmaps, from which the 2D landmarks can then be obtained.
In our experiments, we randomly sample distinct identities from CelebA liu2015deep. In practice, the generator introduces some reconstruction error (i.e., $G(x, c) \neq x$). To reduce it, we take one more step to obtain an updated feature map for feature-map interpolation. Based on attack effectiveness, we use the last convolutional layer before upsampling in the generator as the interpolated feature map. We also fix the parameter $\lambda$ for each task; this parameter balances the adversarial loss and the smoothness constraint in the overall objective. For landmark detection, $\lambda$ is set to 0.001. We use the Adam kingma2014adam optimizer to produce adversarial examples.

We use StarGAN choi2018stargan for attribute-conditional image editing. In particular, we re-trained the model on the CelebA dataset liu2015deep by aligning the face landmarks and then resizing the images to a fixed resolution. In addition, we select 17 identity-preserving attributes as our input condition, such as attributes related to facial expression and hair color. For each distinct identity pair $(x, x^{\mathrm{tgt}})$, we perform our attack guided by each of the 17 attributes. That is, we intentionally add or remove one specific attribute while keeping the rest unchanged. In total, for each image $x$, we generate 17 adversarial images with different augmented attributes.

We select the pixel-wise adversarial attack method carlini2017towards (referred to as CW) as our baseline for comparison. Unlike our method, CW does not require visual attributes as part of the system and generates only one adversarial example per instance. We refer to the corresponding attack success rate as the instance-wise success rate, in which attack success is computed per instance.
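Adding or removing one attribute while keeping the rest unchanged amounts to flipping one entry of a binary attribute vector at a time. The following is a minimal sketch, assuming binary attributes. The function name is hypothetical, and the attribute indices in the usage are arbitrary; they are not the 17 attributes used in the experiments.

```python
import numpy as np


def single_attribute_edits(c):
    """Return one augmented attribute vector per attribute.

    Row i flips attribute i (adds it if absent, removes it if present) and
    keeps every other attribute unchanged.
    """
    c = np.asarray(c, dtype=int)
    edits = np.tile(c, (c.size, 1))
    idx = np.arange(c.size)
    edits[idx, idx] = 1 - c
    return edits


c = np.zeros(17, dtype=int)
c[[2, 5]] = 1                      # arbitrary attributes present in the toy image
c_new_all = single_attribute_edits(c)
```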
Semantic Attacks on Identity Verification
We first compare our method with the pixel-wise attack method CW, both quantitatively and qualitatively. For a fair comparison, we also calculate the instance-wise attack success rate for our method. Each instance has 17 adversarial images generated with different augmented attributes. We count the attack on an instance as a success if any of its 17 images attacks successfully, and as a failure otherwise. The attack success rate using attribute-space interpolation is much lower than that using feature-map interpolation. We believe this is because feature-map interpolation has more parameters to optimize and is therefore more flexible. In addition, we found that the attack success rate is higher with a larger G-FPR.

The qualitative comparison figure shows the adversarial images and corresponding perturbations generated against R-101-S by our method and by CW. The text below each image gives the name of the augmented attribute. The sign before the name indicates “adding” (in red) or “removing” (in blue) the corresponding attribute from the original image. We see that our method is able to generate perceptually realistic examples guided by the corresponding attribute. In particular, our method places its perturbations in the regions correlated with the augmented attribute. In contrast, the perturbations of CW have no specific pattern and are evenly distributed across the image.
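Both the instance-wise rate described above and the per-attribute rates used in the following analysis can be computed from one boolean outcome matrix. The following is a minimal sketch with a hypothetical function name and toy outcomes, not experimental data.

```python
import numpy as np


def success_rates(success):
    """success[i, a] is True when the adversarial image for instance i and attribute a succeeds."""
    success = np.asarray(success, dtype=bool)
    # Instance-wise: an instance counts as one success if any attribute-guided attack works.
    instance_wise = float(success.any(axis=1).mean())
    # Per-attribute: success rate of all attacks guided by the same augmented attribute.
    per_attribute = success.mean(axis=0)
    return instance_wise, per_attribute


# Toy outcomes for 4 instances x 3 attributes (the experiments use 17 attributes).
outcomes = [[True, False, False],
            [False, False, False],
            [False, True, True],
            [False, False, True]]
instance_wise, per_attribute = success_rates(outcomes)
```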
Analysis: controlling single attribute.
One of the key advantages of our method is that it generates adversarial perturbations in a more controllable fashion, guided by semantic attributes. This allows us to analyze the robustness of a recognition system against different types of semantic attacks. We group the adversarial examples by augmented attribute in various settings. The per-attribute analysis figure presents the attack success rates against two face verification models, namely R-101-S and R-101-C, guided by different attributes. Bars are highlighted in light blue and blue for the two G-FPR settings, respectively. As the figure shows, with the larger G-FPR, our method achieves an almost 100% attack success rate across different attributes. With the smaller G-FPR, attacks guided by some attributes, such as Mouth Slightly Open and Arched Eyebrows, achieve an attack success rate below 50%. Attacks guided by other attributes, such as Pale Skin and Eyeglasses, are relatively less affected. In summary, attacks guided by attributes describing local shape (e.g., mouth, earrings) achieve lower attack success rates. Attacks guided by attributes related to color (e.g., hair color) or the entire face region (e.g., skin) achieve higher ones. This suggests that the face verification models used in our experiments are more robust to changes in local shape than to changes in color.

Another figure shows the adversarial examples with augmented semantic attributes against the R-101-S model, with the attribute names shown at the bottom. The upper images are generated by StarGAN with the augmented attribute. The lower images are the corresponding adversarial images with the same augmented attribute.
Analysis: semantic attack transferability.
To further understand the properties of our attack, we analyze its transferability under various settings. For each model and FPR, we select the successfully attacked adversarial examples from the previous experiment to construct our evaluation dataset. We then evaluate them on different models. One transferability table reports results among different models using the same FPR for generation and evaluation (G-FPR = T-FPR). A second table reports results using different FPRs for generation and evaluation. Adversarial examples generated against models trained with softmax loss exhibit greater transferability than those generated against models trained with cosine loss. We ran the same experiment with adversarial examples generated by CW and found that, unlike ours, they do not transfer (results in the supplementary materials). Surprisingly, adversarial examples generated against models with a smaller G-FPR achieve a high attack success rate when evaluated on models with a larger T-FPR. In particular, the adversarial examples generated against R-101-S have the best attack performance on other models. These findings motivate the black-box API attack analysis detailed in the following paragraph.
Black-box API attack.
In this experiment, we generate adversarial examples against R-101-S under the G-FPR settings listed in the table below. We evaluate our algorithm on two industry-level APIs, namely the Face++ and AliYun face verification platforms. To demonstrate the effectiveness of our method, we also generate pixel-wise adversarial examples with the CW method under the same settings. As the table shows, our method achieves a much higher attack success rate than CW on both APIs and at all FPR thresholds. For example, our adversarial examples generated with the strictest G-FPR setting achieve a 64.63% attack success rate on the Face++ platform.
| Attacker / Evaluation metric (attack success rate, %) | T-FPR = | T-FPR = | T-FPR = | T-FPR = |
|---|---|---|---|---|
| CW (G-FPR = ) | 16.24 | 3.55 | 4.50 | 0.00 |
| Ours (G-FPR = ) | 27.32 | 9.79 | 7.50 | 2.00 |
| CW (G-FPR = ) | 30.61 | 15.82 | 12.50 | 4.50 |
| Ours (G-FPR = ) | 57.22 | 38.66 | 29.50 | 17.50 |
| CW (G-FPR < ) | 41.62 | 24.37 | 19.00 | 12.00 |
| Ours (G-FPR < ) | 64.63 | 42.69 | 35.50 | 22.17 |
To measure the perceptual realism of the adversarial images generated by our method, we conduct a user study on Amazon Mechanical Turk (AMT), collecting annotations from 77 participants in total. The adversarial images generated by our method are selected as realistic in a larger fraction of trials than those generated by CW. This indicates that our semantic adversarial examples are more perceptually realistic than those of CW.
Semantic Attacks against Defense Methods
We evaluate the strength of the proposed attack by testing it against four existing defense methods, namely, Feature squeezing xu2017feature, Blurring li2017adversarial, JPEG dziugaite2016study, and AMI tao2018attacks. For AMI tao2018attacks, we first extract attribute witnesses with our aligned face images and then use them to construct an attribute-steered model. We use the fc7 layer of a pretrained VGG parkhi2015deep as the face representation. AMI yields a consistency score for each face image that indicates whether it is a benign image. The score is the cosine similarity between the representations from the original model and the attribute-steered model. At a fixed false positive rate on benign inputs, AMI achieves only limited detection accuracy against both our attack and CW.

The defense comparison figure shows that our attack is more robust against these defense methods than CW. The same G-FPR and T-FPR are used for evaluation. At the larger T-FPR, both our method and CW achieve high attack success rates. When the T-FPR goes down to the smaller value, our method marginally outperforms CW. Defense methods have proven to be effective against CW attacks on classifiers trained with ImageNet krizhevsky2012imagenet. However, our results indicate that face verification systems with a small T-FPR remain vulnerable even when protected by these defenses.
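The AMI consistency score described above is a cosine similarity. The following is a minimal sketch of scoring and flagging, assuming a simple detection rule: the two representations are given as vectors, and inputs scoring below a quantile of benign scores are flagged. That quantile is chosen to match a desired false positive rate. This thresholding rule and all names are assumptions for illustration.

```python
import numpy as np


def consistency_score(rep_original, rep_steered):
    """Cosine similarity between the two models' representations of one image."""
    a = np.asarray(rep_original, dtype=float)
    b = np.asarray(rep_steered, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def detection_threshold(benign_scores, false_positive_rate):
    # Flag inputs scoring below this quantile of benign scores, so that roughly
    # `false_positive_rate` of benign inputs are (wrongly) flagged.
    return float(np.quantile(np.asarray(benign_scores, dtype=float), false_positive_rate))


def flag_adversarial(score, threshold):
    """Low consistency between the two models suggests an adversarial input."""
    return score < threshold


benign_scores = np.linspace(0.5, 1.0, 101)  # synthetic scores, not measured data
threshold = detection_threshold(benign_scores, 0.2)
```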
Semantic Attacks on Landmark Detection
We also evaluate the effectiveness of our attack on face landmark detection, which is a structured prediction task. We select two attack targets, “Rotating Eyes” and “Out of Region”. “Rotating Eyes” means that we rotate the coordinates of the eyes in the image counter-clockwise by 90°. “Out of Region” means that we set a target bounding box and push all points out of the box. The evaluation metrics and the overall attack success rates for different attributes are reported in the supplementary materials. We find that our method can also attack landmark detection successfully. A corresponding figure illustrates the adversarial examples on landmark detection.
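The two attack targets can be turned into target landmark coordinates. The following is a minimal sketch under three assumptions:

- the eyes rotate about their own centroid in a y-up frame;
- "Out of Region" moves each inside point just past its nearest box edge by a small margin;
- the indices identifying the eye landmarks are supplied by the caller.

All of these choices and names are hypothetical; the exact target construction is not specified here.

```python
import numpy as np


def rotate_eyes_target(landmarks, eye_idx, degrees=90.0):
    """Rotate the eye landmarks counter-clockwise about their centroid.

    Uses a y-up Cartesian convention; in image coordinates (y pointing down)
    the same matrix appears as a clockwise rotation.
    """
    target = np.array(landmarks, dtype=float)
    eyes = target[eye_idx]
    center = eyes.mean(axis=0)
    theta = np.deg2rad(degrees)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    target[eye_idx] = (eyes - center) @ rot.T + center
    return target


def out_of_region_target(landmarks, box, margin=1.0):
    """Move each landmark inside box = (x0, y0, x1, y1) just past its nearest edge."""
    target = np.array(landmarks, dtype=float)
    x0, y0, x1, y1 = box
    for p in target:  # each p is a view into a row of `target`
        x, y = p
        if x0 <= x <= x1 and y0 <= y <= y1:
            side = int(np.argmin([x - x0, x1 - x, y - y0, y1 - y]))
            if side == 0:
                p[0] = x0 - margin
            elif side == 1:
                p[0] = x1 + margin
            elif side == 2:
                p[1] = y0 - margin
            else:
                p[1] = y1 + margin
    return target


pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
rotated = rotate_eyes_target(pts, eye_idx=[0, 1])   # first two rows act as "eyes"
pushed = out_of_region_target([[1.5, 0.0], [0.0, -1.9], [5.0, 5.0]], box=(-2, -2, 2, 2))
```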
In this paper, we presented a novel attack method capable of generating structured adversarial perturbations guided by semantic attributes. Compared to existing methods, our method works in a more controllable fashion, which allows for a detailed robustness analysis. Experimental evaluations on face verification and landmark detection demonstrated several desirable properties, including attack transferability and effectiveness against existing defense methods. We believe this work could potentially open up new research opportunities and challenges in the field of adversarial learning in the long run.