Query-Free Attacks on Industry-Grade Face Recognition Systems under Resource Constraints

02/13/2018 ∙ by Di Tang, et al. ∙ Indiana University Bloomington ∙ The Chinese University of Hong Kong

To attack a deep neural network (DNN) based Face Recognition (FR) system, one needs to build substitute models to simulate the target, so that the adversarial examples discovered on the substitutes also mislead the target. Such transferability is achieved in recent studies by querying the target to obtain data for training the substitutes. A real-world target, like the FR system of a law enforcement agency, however, is less accessible to the adversary. To attack such a system, a substitute of similar quality to the target is needed to identify their common defects. This is hard, since the adversary often does not have enough resources to train such a model (a commercial FR system can be trained over hundreds of millions of images). We found in our research, however, that a resource-constrained adversary can still effectively approximate the target's capability to recognize specific individuals, by training biased substitutes on additional images of those who want to evade recognition (the subjects) or the victims to be impersonated (together called Points of Interest, or PoIs). This is made possible by a new property we discovered, called Nearly Local Linearity (NLL), which models the observation that an ideal DNN model produces image representations whose mutual distances truthfully describe the differences in the input images as seen by humans. By simulating this property around the PoIs using the additional subject or victim data, we significantly improve the transferability of black-box impersonation attacks, by nearly 50%. In particular, we successfully attacked a commercial system trained over 20 million images, using 4 million images and 1/5 of the training time, while achieving 60% transferability in an impersonation attack and 89% in a dodging attack.


I Introduction

With its commercial success, Deep Learning (DL) based Face Recognition (FR) is haunted by security risks posed by adversaries who have already adapted to AI innovations. Prior research shows that adversarial examples can be found to mislead even state-of-the-art recognition algorithms [4, 7, 27, 29], causing them to misclassify these examples. More specifically, such adversarial examples are images derived by adding perturbations to normal images, for the purpose of inducing classification errors while keeping the changes small enough that the results remain barely distinguishable from the original images by humans. Indeed, a recent approach [2] alters merely 16 pixels to ensure misclassification on 32×32 images.

Attacking a straw-man. On the other hand, such adversarial learning risks need to be put into perspective. It remains unclear how realistic the discovered threats are, given that most of them rely on a white-box assumption about the target (the FR system they aim at), that is, the availability of full information about the target's parameters. In practice, however, an industry-grade system's parameters are often a commercial secret and cannot be easily acquired by unauthorized parties.

A more realistic way to understand a DL system's security properties is the black-box approach, in which the adversary queries the target, utilizes the information inferred through the queries to learn a substitute model, and then searches for adversarial examples that also work on the target model. Such an approach is based upon the transferability of adversarial examples across different models [13]: some examples mislabeled by one DL model are also found to be misclassified by another. A direct attempt to transfer adversarial examples through ensemble learning [13] was found to be less effective. To ensure high transferability, more recent approaches aggressively query the target to obtain adequate input-output samples for accurately simulating the target model. As a prominent example, a recent black-box attack needs to interact with the target at least 1,000 times [17].

With all the progress being made, a big gap still exists between the hypothetical attacks proposed and credible threats with practical impacts. In particular, querying security-critical FR systems is often expensive or even infeasible in practice: e.g., an FR ATM can immediately alert a card holder to a potential fraud once an impersonation attempt fails, making further probes unlikely to continue. Another problem of prior transferability studies is the simple datasets used to train their models: e.g., the MNIST database [12] includes only tens of thousands of images for recognizing ten handwritten digits. A real-world FR system, however, is typically trained over tens or even hundreds of millions of images for identifying millions of identities. Less clear is whether what is learned from such small-scale studies over toy examples is indeed applicable to real FR systems.

Cross-class transferability. To better understand the security guarantees of real-world FR systems, we revisited transferability in our research, assuming that the adversary cannot get any feedback from the target model and has limited resources. In our study, we trained multiple common deep neural networks (DNNs), including VGG, GoogLeNet and ResNet, and evaluated the transferability of adversarial examples across these models using the standard ensemble-learning based attack reported in prior research [13], under various settings (shallower substitute networks, different structures and less data) to simulate a resource-constrained attacker. This research sheds new light on transferability: e.g., for ResNet, the transferability from a 50-layer substitute to a 101-layer target is about 16.8%, compared with 24.6% between a 101-layer substitute and the same target, in an impersonation attack. More interesting is the significant impact of training data size: transferability dropped from 24.6% in an impersonation attack to 14.5% when the substitute was trained on a dataset one order of magnitude smaller than that of the target, and further to 7.1% for a training set two orders of magnitude smaller. Intuitively, substitutes learned with less data or a shallower model have a looser decision boundary and thus are less likely to ensure a misclassification on the target (which is better trained with more data). Overall, we only witnessed limited success in transferability, particularly when it comes to the impersonation attack: about 20% under different settings.

Our work. To attack an industry-grade FR system without querying it, a set of high-quality substitute models needs to be built to find common defects of DNN models similar to, or even better trained than, the target model. However, constructing such substitute models is hard, particularly with limited resources. In our research, we studied what the adversary could do to narrow this gap and enhance his odds of success. A unique observation we have is that even though the target model generally has a more precise decision boundary, the substitute model could still partially approach this boundary in some regions: for example, a criminal may leverage a large number of his own and his victim's photos to boost the substitute model's accuracy with regard to the identification of (just) these two individuals, for the purpose of finding the right makeup to cheat an FR ATM into authenticating him as the victim. This attack, which we call Asymmetric Cross-Class Image Transfer or EXCIT, is found to be completely feasible in our research, due to a new property called Nearly Local Linearity (NLL) discovered in our study.

More specifically, under a well-trained DL model, the difference between a pair of images' representations (i.e., the Cosine distance between the embedding vectors produced by the DL model) should be nearly linear in their similarity as seen by the human eye: in other words, as the images become increasingly dissimilar, the difference between their representations grows proportionally. This NLL property, as discovered in our research, can be approximated during the training process of our substitute models: using synthesized additional images of the victim and the attacker himself, we minimize the gap between the distances across the representations produced by the model and what they are supposed to be according to NLL. We found that such a model can effectively simulate a better-trained target model's behaviors around the images of interest to the attacker. (For the sake of simplicity, in the rest of the paper we call these images the Points of Interest, or PoIs.)

In our research, we implemented EXCIT and first evaluated it under the settings of our transferability study. We observed that the new technique vastly enhanced the effectiveness of the attacks, particularly for impersonation, from 20% (based upon the prior attack [13]) to 50%, even when the adversary only used 10% of the training data and half of the layers (thus saving the training time by 5 orders of magnitude). Further, we ran this approach against industry-grade systems including ColorReco, FaceVisa, Face++ and SenseTime (the SenseTime system was trained over tens of millions of photos). Using 4 million images collected from the web (the largest scale for this type of research), EXCIT was found to significantly elevate the chance of successful cross-model attacks compared with the naive query-free attack [13], from 11% to 62%, without any communication with the target model before the attack.

Contributions. The contributions of the paper are outlined as follows:

The NLL property and understanding of transferability. Our large-scale study reveals the impact that training data size has on the successful transfer of an adversarial instance from one model to another. More importantly, we discovered the nearly linear relation between input images and their representations (in terms of their differences) under an ideal model, which enables our query-free attack and might lead to a better understanding of the fundamental defects in DL models.

New techniques for query-free attacks. Based upon the new discovery, we designed a new attack technique that finds adversarial examples against a well-trained target model without querying the target and using limited resources. At the center of the technique is the use of additional synthesized images of the victim and attacker, together with the NLL property, to train substitute models capable of simulating the target model around the PoIs, even when the adversary only possesses a small amount of training data and much less computing resources. This makes an important step toward understanding the realistic threat of adversarial learning.

Implementation and evaluation. We implemented the technique and evaluated it over industry FR systems.

II Background

II-A Deep Learning and Face Recognition

Deep Neural Network. A Deep Neural Network (DNN) is a function that projects an input domain onto an output domain for classification and other purposes. Following the prior research [15], we formulate the DNN for image processing as below:

F(x) = softmax(Z(x)) = y

where x is the image serving as the input to the DNN, y is its output, typically a vector of probabilities for the image to be in different classes, and Z(x) is the "logits", the output of the layer right before the "softmax" layer; in other words, Z(x) is the unscaled probability vector serving as the input to the "softmax" layer.

In all the DNN structures we consider (VGG, GoogLeNet and ResNet), Z(x) can be further decomposed as:

Z(x) = C(φ(x))

where φ(x) is the feature vector of the input extracted by the DNN model. Specifically, the DNN projects x onto a feature space and φ(x) is the representation of x in that space. C is the classification function that transforms φ(x) into the "logits". Usually, C is a linear mapping of the form

C(φ(x)) = W·φ(x) + b

where W is the weight matrix and b is the bias vector.

A well-trained DNN is characterized by its capability to generate similar representations for similar inputs. This avoids the pitfall in which two similar inputs are mapped to two very different representations and, as a result, are assigned to two different classes. Note that in our research, the similarity between two representations is measured by the Cosine distance between them.
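To make the metric concrete, below is a minimal NumPy sketch (ours, not the code of any system discussed here) of the Cosine distance between two embedding vectors; the 512-dimensional embeddings are an assumed example size.

import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two embedding vectors: 1 - cos(u, v)."""
    u, v = np.asarray(u, dtype=np.float64), np.asarray(v, dtype=np.float64)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Two hypothetical 512-d face embeddings phi(x1), phi(x2).
phi_x1 = np.random.randn(512)
phi_x2 = np.random.randn(512)
print(cosine_distance(phi_x1, phi_x2))  # a small value means the DNN deems the faces similar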

Face recognition systems. Since the introduction of deep Convolutional Neural Networks (CNNs) [11], FR technologies have been evolving rapidly. As a prominent example, DeepFace [28] closed the gap between the recognition capabilities of human beings and machines. Further, DeepID3 [25] attained a 99.53% accuracy on the LFW dataset [6], exceeding the human performance of 99.2%. More recently, FaceNet [22] exploited a deep architecture to achieve a 99.63% accuracy on the same dataset.

More generically in the image processing area, three DNN models have been extensively used. VGG-16 [24], running 16 cascaded convolution layers, was reported to achieve state-of-the-art recognition results in the ImageNet Large-Scale Visual Recognition Challenge 2014 [20] (ILSVRC-2014), together with GoogLeNet [26], which involves 22 layers and the Inception architecture invented by Google for combining information from multiple views. Empowered by the pervasiveness of GPUs and Batch Normalization [8], ResNet-152 [5], the winner of the ILSVRC-2015 classification task, is armed with 152 layers and capable of propagating shallow features to deep layers.

II-B Adversarial Learning

The potential of deploying DNNs in real-world systems (e.g., self-driving cars) faces the security challenge of adversarial learning, an attack that manipulates the inputs to a DNN to cause misclassification. This attack was first discussed by Szegedy et al. [27], who pointed out the existence of adversarial examples, i.e., perturbed inputs that are similar to the original inputs but misclassified by the DNN into a different category. Such attacks can be targeted or not. In the non-targeted case, the attacker seeks adversarial examples that are misclassified into any category except the one they belong to. For instance, the adversary wants to fool a face recognition system by slightly changing his face so that it is misclassified as someone else. Formally speaking, the attacker changes his facial appearance from x to x' = x + Δx, causing the misclassification argmax F(x') ≠ argmax F(x). Here, the DNN output is a vector that describes the probabilities of the input belonging to different individuals. In a targeted attack, the adversary intends to impersonate a given individual t, by seeking a makeup Δx causing argmax F(x + Δx) = t. As we focus on FR problems, we will use dodging attacks to denote non-targeted attacks and impersonation attacks to denote targeted attacks.

Attack methods. To find adversarial examples, one needs to define the similarity between two images (the inputs), x and x', based upon a distance metric. Prior research on adversarial learning uses the L_p distance, with p being 0, 2 or ∞:

||x − x'||_p = ( Σ_{i=1}^{n} |x_i − x'_i|^p )^{1/p}

Here x_i − x'_i is the difference between the i-th pixels of the two input images.

Minimizing the L_0 distance, we can get an x' with the smallest number of pixels differing from those of the original input x. The Jacobian-based Saliency Map Attack (JSMA) [18] is an attack optimized under the L_0 distance. It iteratively picks the pixels that have the most impact on the result and modifies them, until either a given threshold (an upper bound on the number of pixels) is reached or an adversarial example is found.

Minimizing the L_2 distance, we can obtain an x' with the least modification, in terms of Euclidean distance, across all pixels of x and x'. The first attempt using this distance is L-BFGS [27], which minimizes the L_2 distance under the box constraint x' ∈ [0, 1]^n, where n is the number of pixels. It exploits the classical gradient descent method to find the optimal solution with a pre-defined learning rate ε:

x'_{k+1} = x'_k − ε · ∇J(x'_k)     (1)

where J is the objective function being minimized.

Minimizing the L_∞ distance, we can find an x' with the smallest maximum change to any pixel. Under this distance, the optimization algorithm seeks a region of pixels with similar intensities to modify. An example of such a prior attack is the Fast Gradient Sign Method (FGSM) [4], which iteratively updates x' to produce an adversarial example by stepping a small stride along the direction of the sign of the gradient of the loss with respect to the input.

In our research, we chose the L_2 distance, because it is a continuous metric and is sensitive to changes to any pixel in the input image. By comparison, L_0 (the number of different pixels) is not continuous, and L_∞ (the maximum pixel difference) might not capture a small modification of the input.
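For concreteness, a small NumPy sketch of the three metrics discussed above (our illustration; the image shape and the pixel range in [0, 1] are assumptions):

import numpy as np

def l0(x, x_adv):
    """Number of pixels that differ (any channel changed counts the pixel once)."""
    return int(np.sum(np.any(x != x_adv, axis=-1)))

def l2(x, x_adv):
    """Euclidean distance over all pixel values."""
    return float(np.linalg.norm((x - x_adv).ravel()))

def linf(x, x_adv):
    """Largest absolute change to any single pixel value."""
    return float(np.max(np.abs(x - x_adv)))

x = np.random.rand(112, 112, 3)                               # a face image with values in [0, 1]
x_adv = np.clip(x + 0.01 * np.random.randn(*x.shape), 0, 1)   # a slightly perturbed copy
print(l0(x, x_adv), l2(x, x_adv), linf(x, x_adv))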

Transferability. As mentioned earlier, transferability is the key to practical adversarial learning when the adversary cannot directly access the internal parameters of the target model. Prior research [13] demonstrates that around 20% of the adversarial examples discovered on one of three models (ResNet-152, VGG-16 and GoogLeNet) are also misclassified by the other two models under a dodging attack. A more recent study [16] further shows that transferability can happen even across different machine learning techniques: DNN, Logistic Regression (LR), Support Vector Machine (SVM) and k-Nearest Neighbors (kNN). Particularly, more than 60% of the adversarial examples discovered on LR or SVM were found to be still effective on the other model. When it comes to the impersonation attack, about 20% of adversarial examples were also reported to work across different DNN models [13]. These examples were found using an ensemble-based approach that descends along the summation of the gradients of several models.

A primary limitation of these prior studies is that they are all based upon datasets with few categories, such as ILSVRC-2012 with its 1,000 categories. Compared with industry-grade FR systems such as FaceVisa and ColorReco, which are trained to classify tens or even hundreds of thousands of identities, what has been learned from these studies can be less conclusive. Also importantly, the prior research either assumes that the substitute models are built upon similar or even identical datasets as the target, or at the very least assumes that the adversary is capable of continuously querying the target model to collect data (query results) for training the substitute models. As discussed earlier, in many cases these assumptions are still far from reality. Our research instead looked into transferability over a large dataset, when the adversary cannot query the target and has limited resources to train his substitute models.

II-C Threat Model

We consider an adversary who intends to perform a dodging attack or an impersonation attack on a target FR model that he cannot query. The adversary does not have access to the internal parameters of the target but has limited information about its architecture (e.g., ResNet, VGG or GoogLeNet) and its depth (e.g., about 100 layers for ResNet, though the precise number of layers is unknown to him). Such information about a commercial system is often made available through various public sources, such as research papers (e.g., the design of Face++ is described in [3]), technical reports and other online documents.

The target model studied in our research is assumed to be trained over a large amount of data, tens or even hundreds of millions of images, as commercial FR systems are. On the other hand, the adversary does not have that level of resources, though we do assume that he can still acquire millions of images publicly available online, as we did in this study. Further, the adversary can obtain thousands of images of himself and of the victim he wants to impersonate, and also sufficient resources from the cloud to train the substitute models over the data. We believe that these assumptions are all realistic, as demonstrated in our research: in particular, all the computing power required for training our attack models can be purchased from Amazon at an approximate cost of 10,000 dollars. Notably, if the adversary cannot obtain sufficient images of the victim from the Internet, he can follow the victim and record videos to obtain enough images taken in various scenarios and from different angles.

III Understanding Transferability across Asymmetric Models

To understand whether a DNN model is vulnerable to query-free attacks, we need to find out the challenges in simulating the target model's behaviors under limited resources and information. For this purpose, we conducted a large-scale study on transferability, using a dataset with 4 million images. Our research reveals the importance of training data size to a successful cross-model attack.

III-A Settings

Our study utilized the MegaFace Challenge 2 dataset [14], which includes more than 4 million photos of 672K identities, and Caffe [9], an open-source deep learning framework, to train the FR DNN models in our experiments. All experiments were conducted on an 8-GPU server, with each GPU equipped with 12GB of memory.

In our studies, we assume our target model is F, which outputs a vector y = F(x) for an input image x. Our study covers both dodging attacks and impersonation attacks. The criterion for a successful dodging attack is:

argmax_i F(x + Δx)_i ≠ s

where s is the owner of x. The criterion for a successful impersonation attack is:

argmax_i F(x + Δx)_i = t

where t is the victim to be impersonated and we ensure t ≠ s. For better understanding, in later paragraphs we use "subject" to indicate the person whose identity is s and "victim" to indicate the person whose identity is t.

In the following studies, we use four levels of training datasets to train our models. For the sake of clarity, we list them in Table I.

Short name Images Identities
L1
L2
L3
L4
TABLE I: Different levels of dataset.

Besides, when we say the transferability between two models is a given percentage, we mean that this fraction of the adversarial examples found on one model can successfully fool both models. Sometimes we use "success rate" interchangeably with transferability.

In this study, we use the C&W approach [2], the best method known to us, to find adversarial examples. It optimizes the following objective function:

min_δ  ||δ||_2 + c · f(x + δ)

For the dodging attack, f is defined as

f(x') = max( Z(x')_s − max_{i≠s} Z(x')_i , −κ )

For the impersonation attack, f becomes

f(x') = max( max_{i≠t} Z(x')_i − Z(x')_t , −κ )

Here, the adversarial example we find is x' = x + δ*, where δ* is the optimal solution of the above function. In that function, c is a parameter that balances the importance of the two components: the first component minimizes the distance between the adversarial example and the original image x, and the second component models the goal of the attack, either dodging or impersonation. Also, κ is a threshold indicating when the attack goal is achieved. We set the same c and κ for both the dodging attack and the impersonation attack in our experiments.

We further improve the performance of the above attack by exploiting the standard ensemble-based approach. Specifically, we use K substitutes and assemble them to solve the following function:

min_δ  ||δ||_2 + c · Σ_{k=1}^{K} f_k(x + δ)     (2)

where f_k is the attack term computed on the k-th substitute.
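To illustrate, here is a minimal PyTorch sketch of such an ensemble objective, assuming each substitute returns the logits Z(x) for a batch of images; the dodging and impersonation loss forms follow the standard C&W formulation, and the constants c and kappa are placeholders rather than the values used in the paper.

import torch

def cw_ensemble_loss(x_adv, x_orig, substitutes, s, t, c=1.0, kappa=0.0, impersonate=True):
    """Ensemble C&W-style objective (sketch of Eq 2): L2 distortion plus the
    attack term summed over all substitute models. s and t are integer class indices."""
    dist = torch.norm((x_adv - x_orig).flatten(), p=2)
    attack = 0.0
    for model in substitutes:                       # model(x) is assumed to return the logits Z(x)
        z = model(x_adv.unsqueeze(0)).squeeze(0)
        if impersonate:
            # push the victim's logit above every other logit
            others = torch.cat([z[:t], z[t + 1:]])
            attack = attack + torch.clamp(others.max() - z[t], min=-kappa)
        else:
            # dodging: push the subject's logit below some other logit
            others = torch.cat([z[:s], z[s + 1:]])
            attack = attack + torch.clamp(z[s] - others.max(), min=-kappa)
    return dist + c * attack

In practice this loss would be minimized over the perturbation with a gradient-based optimizer, exactly as the ensemble attack of [13] does.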

III-B Impacts of Structural Features

Structures. As mentioned earlier, to understand the impact of DNN structures on transferability, we looked into the three most prominent structures: VGG (https://github.com/davidgengenbach/vgg-caffe), GoogLeNet (https://github.com/BVLC/caffe/tree/master/models/bvlc-googlenet) and ResNet (https://github.com/KaimingHe/deep-residual-networks). In this study, for every structure, we trained a target model on an L2 dataset and four substitutes on four other L2 datasets, and we ensured that these five training datasets do not overlap with each other except for the images of the subjects and victims involved in the attacks.

Over these models, we analyzed the transferability of the dodging and impersonation attacks, using an ensemble learning method that integrates the adversarial examples found on 4 independently trained substitutes to find images causing the target model to misclassify. More specifically, in dodging attacks we built an attacking set containing 635 images from 100 identities, and in impersonation attacks we used 600 image pairs to construct the attacking set, each pair containing two images from two different identities. Note that the attacking set was inside the training sets of both the target model and the substitute models.

The experimental results are presented in Table II. As we can see from the table, for dodging we observed a transferability of about 95%, while for impersonation it was about 20%. This finding is largely in line with what is reported in prior research, indicating that transferring adversarial examples across different models is feasible, though less effective in the case of impersonation.

ResNet-101 GoogLeNet VGG-16
Dodging Impersonation Dodging Impersonation Dodging Impersonation
ResNet-50 95.1% 16.8% - -
ResNet-65 95.6% 18.0% - -
ResNet-80 96.3% 23.2% - -
ResNet-101 98% 24.6% 96.7% 20.2% 96.4% 20.3%
GoogLeNet 95% 16.5% 97.8% 23.1% 94.6% 18.5%
VGG-16 93.4% 17% 94.4% 18.1% 97.2% 22.3%
TABLE II: Transferability among different structures. The number in each cell is the success rate of using four substitute models with the row structure to attack the target model with the column structure.

Depths. Further, we looked into the impact of depth on transferability. For this purpose, we utilized ResNet, since the depth of its structure can be easily adjusted. More specifically, we built 4 ResNets with 50, 65, 80 and 101 cascaded convolutional layers, respectively. The compositions of their structures are presented in Table III. Using these structures, we trained models to repeat the experiments used to inspect the impact of different structures (four substitute models trained on four L2 training sets to attack the target model trained on an L2 training set).

ResNet-50 ResNet-65 ResNet-80 ResNet-101
Stage 1 1 Conv 1 Conv 1 Conv 1 Conv
Stage 2 3 Blocks 3 Blocks 3 Blocks 3 Blocks
Stage 3 4 Blocks 4 Blocks 4 Blocks 4 Blocks
Stage 4 5 Blocks 10 Blocks 15 Blocks 22 Blocks
Stage 5 3 Blocks 3 Blocks 3 Blocks 3 Blocks
TABLE III: Structures of different depths. ResNet structures can be divided into 5 stages, starting with a convolution layer, followed by four stages, each including a different number of “bottleneck” blocks.

In this study, we ran these models as substitutes to attack the target (ResNet-101), with both dodging and impersonation. As expected, the complexity of the network (its depth) indeed affects transferability: more layers make the DNN more capable and enhance transferability. Again, transferability tends to be low for the impersonation attack, around 16% when using ResNet-50 substitutes to attack ResNet-101.

III-C Impacts of Data Size

Training size and transferability. An important observation is that a real-world adversary typically cannot get as many photos as a large organization uses to train its industry-grade FR system. An important question we asked is what impact a relatively smaller dataset has on the chance of a successful cross-model attack. For this purpose, we trained VGG, GoogLeNet and ResNet-101 models on three levels of datasets: L1, L2 and L3. Again, all individuals were randomly drawn from our dataset and we made sure that there was no overlap between the substitutes' training sets and the target model's training set. In this study, we utilized substitute models with the same structure as the target model for both the dodging attack and the impersonation attack. The target model was built upon an L3 training set. The results are presented in Table IV. As we can see, training data size turns out to significantly affect transferability, and this influence is consistent across different structures. Transferability dropped when we reduced the data size from L3 to L2; compared with the impact of depth, the attack was less likely to succeed when we downsized the training set (L3 to L2) than when we removed layers (from 101 to 50).

L3 dataset L2 dataset
Dodging Impersonation Dodging Impersonation
ResNet-101 L3 dataset 98.8% 25.2% -
L2 dataset 81.3% 17.5% -
L1 dataset 34.5% 7.1% 71.2% 14.5%
GoogLeNet L3 dataset 97.3% 24.1% -
L2 dataset 79% 16.2% -
L1 dataset 32% 5.1% 69.4% 13.3%
VGG-16 L3 dataset 97.5% 23.9% -
L2 dataset 77% 16.3% -
L1 dataset 32.3% 6.3% 66.7% 12.9%
TABLE IV: Transferability among different levels of training sets. The number in each cell is the success rate of using four substitute models trained with the row setting to attack the target model trained with the column setting.

Further, we trained four substitute models on four L1 training sets and used them to attack a model trained on an L2 dataset. The results are shown in the right column of Table IV. An interesting observation is that the transferability from L1 models to the L2 model is actually lower (14.5%) than that from L2 to L3 (17.5%), even though the difference in training-set size is larger in the latter case than in the former. Intuitively, with the increase in training data, a substitute model becomes closer to a perfect model, and adding more data becomes less effective in improving the model's precision than when the model only learns from a small training set and is therefore much less accurate. Further analysis of this observation leads to the conclusion that the enhancement of transferability slows down as the data size goes up (see Appendix A).

Discussion. Our study shows that although both structural features and the size of the training set affect transferability, the impact of the latter is apparently more prominent. In practice, the structural information of many commercial FR systems can often be found in research articles, public papers and other sources. On the other hand, a deeper network with more data certainly needs more computing resources to train. For example, on our system with 8 GPUs, training a ResNet-101 model took 9 hours on an L1 training set, while only half of that time was needed for a 50-layer model over the same set. Most importantly, collecting a large number of high-quality images is often a challenge for the adversary: for example, SenseTime Ltd.'s model is reported to be built from more than 20M images, and to the best of our knowledge a dataset of this scale cannot be found on the Internet. Therefore, we believe that whether transferability can be enhanced in the presence of a relatively small set of training data is a critical question for assessing the practical impact of adversarial learning on FR systems.

Also, to attack a real-world system without querying it, the adversary needs to estimate his chance of success based upon the features of a given adversarial example, for example, the percentage of the pixels modified. This cannot be easily done, since the adversary does not have access to the target system and therefore cannot figure out the probability of success by testing his adversarial examples on the target. However, our study described above shows that the transferability between the substitute models and the target model can actually be gauged using the transferability between a target model learned from a smaller dataset and smaller substitute models, because the probability of success in the latter case is expected to be higher than that in the former. A more formal analysis of this observation is presented in Appendix A.

IV Query-Free Asymmetric Attack

To enhance transferability, ideally we need to make the substitute model very similar to, or even more accurate than, the target model. Although this is nearly impossible for most real-world adversaries, given their limited resources and particularly the much smaller training sets they are able to obtain, something can still be done to narrow the gap between the two models. A key observation is that a unique resource the adversary often has is abundant photos of the subject (often himself) in a dodging attack, and also those of the victim in an impersonation attack. Leveraging such images, we can train a model biased toward the subject, or the subject and victim pair. Even though such a model may be overfitted and its overall accuracy may therefore be below that of the target model, all we care about here is the target model's behavior around the subject and/or the victim (the PoIs), which we can potentially simulate in the substitute models using this resource (extra photos).

However, effective use of such a resource turns out to be challenging. Table V shows the experimental results when we directly duplicated the photos of the subjects and victims to the same size as the original dataset (L2 level), and used the duplicated images together with the original images to train VGG, GoogLeNet and ResNet substitutes for attacking target models trained over an L3 dataset. From the table, we do not see a significant improvement in the effectiveness of the attack compared with the attacks without such data augmentation.

Dodging Impersonation
ResNet-101 83%(81.3%) 18.2%(17.5%)
GoogLeNet 80.2%(79%) 15.8%(16.2%)
VGG-16 77.3%(77%) 16.8%(16.3%)
TABLE V: Transferability on the naively augmented training set. Four L2 substitutes attack an L3 target model. The number in brackets is the original transferability copied from Table IV.

Intuitively, the subject's and victim's photos alone are insufficient for simulating the relations a better-trained model establishes between them, and between the subject and the other identities in the dataset. Such relations need to be built upon other images and the way a well-trained DNN maps inputs to feature vectors. In the following, we show how such relations can be modeled using a new property discovered in our research, called Nearly Local Linearity (NLL), the key technique behind our EXCIT attack, which helps improve the substitute model in simulating the target model's behaviors around the subject and victims, boosting the transferability from below 30% (for impersonation) to above 60% on commercial systems (Section V-C).

IV-A Nearly Local Linearity

A well-trained model produces more accurate results than a poorly-trained one. However, this property by itself does not help obtain better transferability; we need a more refined characterization.

Observation. The key idea is that the representations produced by an ideal DNN model should accurately model human perception: when two images look very different, the distance between their representations should be large, and when the images appear similar, the distance should be small. So we designed experiments to figure out how the representation of a well-trained DNN model changes as the input image gradually changes from the subject to the victim. To measure this change, the right metrics need to be chosen. In our research, we found that the L_2 distance (as used in prior research [2]) and the Cosine distance can serve these purposes.

In our research, we trained three ResNet-101 models on three different datasets: L1, L2 and L4. From each dataset, we selected, uniformly at random, pairs of images (x_s, x_t), with the two images in a pair coming from two different identities (identity s and identity t). Then, between each image pair (x_s, x_t), we synthesized a series of 99 images by equidistant interpolation. Formally, the j-th image can be represented as:

x_j = x_s + (j/100) · (x_t − x_s),   j = 0, 1, ..., 100

In particular, x_0 = x_s and x_100 = x_t.

Then, we ran all three models on these interpolated images to get their representations, producing 101 representations per image pair. Further, we calculated the mean of the Cosine distances between every φ(x_j) and φ(x_0) over all image pairs:

d̄_j = (1/N) Σ_{pairs} Cos(φ(x_j), φ(x_0))

where N is the number of image pairs and Cos(·,·) denotes the Cosine distance. After that, we compared d̄_j with the corresponding regularized L_2 distance j/100. The results are shown in Fig 1.
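This measurement can be sketched in a few lines of NumPy (our code, not the authors'); embed stands for the feature extractor φ of the model under test and is an assumption.

import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nll_curve(embed, pairs, steps=100):
    """Mean cosine distance between phi(x_j) and phi(x_0) for interpolated images
    x_j = x_s + (j/steps) * (x_t - x_s), j = 0..steps, averaged over image pairs."""
    curve = np.zeros(steps + 1)
    for x_s, x_t in pairs:                         # each pair holds images of two identities
        base = embed(x_s)
        for j in range(steps + 1):
            x_j = x_s + (j / steps) * (x_t - x_s)
            curve[j] += cosine_distance(embed(x_j), base)
    curve /= len(pairs)
    return curve   # under NLL, curve[j] is roughly (j/steps) * curve[steps]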

Fig. 1: The NLL property. From left to right, the three panels show the results for the ResNet-101, VGG-16 and GoogLeNet models respectively; each panel plots the mean Cosine distance against the regularized L_2 distance for models trained on the L1, L2 and L4 datasets.

To more clearly demonstrate the different behaviors of different models, Fig 1 also lists each model's deviation, a metric measuring the difference between its mean Cosine distance curve and the diagonal.

As we can see from the figures, the relation between the L_2 distance and the Cosine distance approaches linearity as the training data size increases. In particular, it becomes almost linear for the ResNet-101 model trained on the L4 dataset (4 million images). Under identical experimental settings, we observed the same L_2 and Cosine distance relation for the VGG-16 and GoogLeNet models. This indicates that the relations between the subject and the interpolated images, and between their representations, can be captured by this NLL property.

Concept. Formally, we define the Nearly Local Linearity (NLL) property as follows:

Cos(φ(x_s), φ(x_α)) = α · Cos(φ(x_s), φ(x_t))     (3)

where x_α = (1 − α) · x_s + α · x_t, α ∈ [0, 1], and Cos(·,·) is the Cosine distance between two representations.

IV-B The EXCIT Attack

The discovery of the Nearly Local Linearity property in well-trained DNN models enables us to train a substitute model by not only leveraging the extra images of subjects and victims but also integrating such images into the model by approximating the relations between the images synthesized from them and the existing images in the dataset. For this purpose, we need to build a set of "transitional" images, like the interpolated images mentioned earlier, and redefine the optimization goals when training the substitute model so as to connect these images to others in the expected way. These are the key steps of our EXCIT attack, as elaborated below.

Subject-oriented data augmentation. To synthesize "transitional" images and enrich the training dataset, we designed a subject-oriented data augmentation algorithm (see Algorithm 1). This algorithm follows one key rule: keeping balance. Specifically, the algorithm keeps balance at three levels: first, it keeps the total number of synthesized images similar to the number of original images; second, it keeps the number of synthesized images between every image pair the same; third, it uses a uniform distribution to control the generation of the interpolation weights. Thus, the adversary can use a limited number of images to cover as many behaviors of a well-trained DNN model as possible.

In our algorithm, again, s is the subject and t is the victim. For the sake of simplicity, we set t = s to represent the dodging attack. Our algorithm takes the original dataset D, a subject-victim pair (s, t) and the number m of images to be synthesized between one image pair as its input, and outputs the augmented dataset D'. In the rest of the paper, we keep m at 10.

Input: , ,
Output:
1 ;
2 ;
3 ;
4 if  then
5       ;
6      
7 end if
8else
9       ;
10      
11 end if
12pairs = [];
13 for  to  do
14       randomly select a from ;
15       randomly select a from ;
16       pairs.append((,)) ;
17      
18 end for
19for  to  do
20       randomly select a from ;
21       randomly select a from ;
22       pairs.append((,)) ;
23      
24 end for
25for  (,) in pairs do
26       for i = 1 to m do
27             Sample from ;
28             ;
29             ;
30            
31       end for
32      
33 end for
34;
Algorithm 1 Subject-oriented data augmentation algorithm.
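Since the symbols in the listing above were lost, the following Python sketch reconstructs the augmentation only from the description in the text (balanced totals, m images per pair, uniform interpolation weights); the exact pairing strategy of the original Algorithm 1 may differ, and all names are ours.

import random

def augment(dataset, subject_imgs, victim_imgs, m=10):
    """Subject-oriented data augmentation (a sketch, not the authors' exact Algorithm 1).

    dataset:      list of images (e.g., NumPy arrays) in the original training set D
    subject_imgs: images of the subject s
    victim_imgs:  images of the victim t (pass subject_imgs again for a dodging attack)
    m:            number of synthesized images per image pair
    """
    synthesized = []
    n_pairs = max(1, len(dataset) // (2 * m))        # keep the synthesized set about the size of D
    for poi_imgs in (subject_imgs, victim_imgs):      # balance subject-side and victim-side pairs
        for _ in range(n_pairs):
            x = random.choice(poi_imgs)
            x_other = random.choice(dataset)
            for _ in range(m):
                alpha = random.uniform(0.0, 1.0)      # uniform interpolation weight
                synthesized.append((1.0 - alpha) * x + alpha * x_other)
    return dataset + synthesized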

Training NLL-enhanced substitutes. With the enriched data, we want to train a substitute model that approximates the NLL property for a given identity pair (the subject and the victim). For this purpose, we first train our substitute model on the original dataset D and then fine-tune it on the augmented dataset D'. The fine-tuned model is expected to come close to, or even surpass, a better-trained model around the PoIs, which elevates the transferability of the adversarial examples we discover.

Naturally, the standard "softmax" (cross-entropy) loss was used as the objective function to train our substitute models on the original dataset. On the augmented dataset, we built a triplet loss function according to the NLL property: for a tuple (x_s, x_α, x_t), the triplet loss measures how much Cos(φ(x_s), φ(x_α)) deviates from α · Cos(φ(x_s), φ(x_t)), the value prescribed by Eq 3.

However, the triplet loss cannot be used alone, as it only controls the relations among φ(x_s), φ(x_α) and φ(x_t), but gives no constraints on the absolute values of the output F(x_α), nor on the relations among the x_α of different tuples. So using only the triplet loss may break some valuable structures that have been learned from the original dataset.

To solve this problem, we study what F(x_α) should be when the representation follows the NLL property. Here, we directly give the conclusion and leave the detailed deduction to Appendix B. For a tuple (x_s, x_α, x_t), a well-trained DNN model will produce F(x_α) in the following form:

(4)

To determine the value of the parameter in Eq 4, we tested three ResNet-101 models trained on the L2, L3 and L4 datasets, respectively. The results are shown in Fig 2, according to which we chose the parameter value used in the rest of the paper; the curve of Eq 4 under this choice is also shown in Fig 2.

Fig. 2: DNN output patterns: the y-value is the value of the corresponding element in the output vector of the DNN models.

Combining the triplet loss and the result about the expected output of a well-trained DNN, we obtain the objective function for fine-tuning our substitute models:

(5)

which adds to the triplet loss the Kullback–Leibler divergence between the model's output F(x_α) and the expected output given by Eq 4. Indeed, this divergence can be simplified to contain just the two terms corresponding to the s-th and t-th elements of the output vector.
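A rough PyTorch sketch of this fine-tuning objective as we understand it: the triplet term penalizes deviation from the NLL relation, and the KL term pulls the softmax output on x_alpha toward the expected distribution of Eq 4, which is passed in as expected_out. The helper model.embed, the absolute-difference form of the triplet term and the equal weighting of the two terms are our assumptions.

import torch
import torch.nn.functional as F

def nll_finetune_loss(model, x_s, x_alpha, x_t, alpha, expected_out):
    """Fine-tuning objective (sketch of Eq 5): triplet NLL term + KL term.

    model.embed(x) -> feature vector phi(x);  model(x) -> softmax output F(x)
    expected_out   -> the target distribution for x_alpha prescribed by Eq 4
    """
    def cos_dist(u, v):
        return 1.0 - F.cosine_similarity(u, v, dim=-1)

    phi_s, phi_a, phi_t = model.embed(x_s), model.embed(x_alpha), model.embed(x_t)
    # Triplet term: Cos(phi_s, phi_alpha) should equal alpha * Cos(phi_s, phi_t).
    triplet = torch.abs(cos_dist(phi_s, phi_a) - alpha * cos_dist(phi_s, phi_t)).mean()
    # KL term: keep the output on x_alpha close to its expected form.
    kl = F.kl_div(torch.log(model(x_alpha) + 1e-12), expected_out, reduction="batchmean")
    return triplet + kl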

Finding adversarial examples. After generating multiple substitute models enhanced with the NLL property, we need to effectively assemble them to find transferable adversarial examples. The first question is how to assemble the gradients. The standard approach is to average the gradients computed from the individual substitute models (Eq 2). However, we found that we can outperform the standard approach by computing a weighted average of the gradients and clipping it according to the agreement among the substitutes. The details are listed in Algorithm 2.

Input: , ,
Output:
1 ;
2 ;
3 for k = 1 to  do
4       ;
5       ;
6      
7 end for
8;
9 for i = 1 to Dim(w) do
10       ;
11       ;
12       ;
13      
14 end for
15;
16 for i = 1 to Dim(w) do
17       if  then
18             ;
19            
20       end if
21      
22 end for
Algorithm 2 Gradients assembling algorithm.

Based on previous knowledge, if the representation of an adversarial example is very close, in terms of Cosine distance, to the representation of the target image, the example will be classified as the target's identity. So we want to find an adversarial example that simultaneously shrinks the Cosine distance to a very small value on all of our substitute models. Thus, we weight more heavily those models for which the current modification can hardly shrink the Cosine distance, and weight less those models for which the current modification has already achieved a small Cosine distance. This weighting is done in the first part of the algorithm. The rest of the algorithm clips the gradient in those dimensions where our substitute models do not agree on the direction, i.e., the dimensions where the variance of the gradients from our substitutes is large or their mean is small. In favor of a large mean value, we use the ratio between the mean and the standard deviation to measure the agreement in every dimension and set a bar on it.

Note also that the derivative of the Cosine distance scales inversely with the norm of the representation; so, in our searching algorithm, we regularize the derivative by multiplying it by the corresponding norm, which accelerates the decrease of the Cosine distance.
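The gradient-assembling step can be sketched as below, reconstructed only from the description above; the exact weighting scheme and the agreement threshold tau are assumptions, not the authors' parameters.

import numpy as np

def assemble_gradients(grads, cos_dists, tau=1.0):
    """Combine per-substitute gradients (a sketch in the spirit of Algorithm 2).

    grads:     list of K flattened gradients of the Cosine distance w.r.t. the input,
               each already scaled by the corresponding representation norm
    cos_dists: current Cosine distance to the target under each substitute
    tau:       agreement threshold (assumed; to be tuned empirically)
    """
    G = np.stack(grads)                           # shape (K, D)
    w = np.asarray(cos_dists, dtype=np.float64)   # heavier weight where the distance is still large
    w = w / w.sum()
    mean = w @ G                                  # weighted average per dimension
    std = G.std(axis=0) + 1e-12
    agreement = np.abs(mean) / std                # large mean, small variance => substitutes agree
    mean[agreement < tau] = 0.0                   # clip dimensions where substitutes disagree
    return mean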

The second question is how to find an adversarial example with a small modification. To solve this problem, we designed Algorithm 3, inspired by the success of multi-step searching algorithms.

Input: , , ,
Output:
1 ;
2 ;
3 ;
4 while  do
5       for i = 1 to  do
6             ;
7             ;
8             ;
9             ;
10             ;
11             update ;
12            
13       end for
14       ;
15      
16 end while
17;
Algorithm 3 Searching adversarial example algorithm.

Our searching algorithm is a multi-step algorithm that enlarges the modification limit step by step. In each step, the algorithm runs a fixed number of iterations to find the solution minimizing the following objective function:

(6)

where the balancing parameter is chosen to ensure that, even in the extreme case, the weight on the Cosine-distance term remains about 7 times larger than the weight on the modification term. Another advantage of this objective function is that it forces the algorithm to rapidly decrease the Cosine distance when the modification is small, and to bounce around the border when the modification approaches the limit.
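The multi-step search can be sketched as follows (our reconstruction): the budget schedule, step size, inner iteration count and the simple L2 projection are assumptions, and objective_grad stands for the gradient of the objective in Eq 6 assembled over the substitutes (e.g., via Algorithm 2).

import numpy as np

def multi_step_search(x, objective_grad, is_adversarial, r_init=1.0, r_step=1.0,
                      r_max=20.0, n_iter=100, lr=0.1):
    """Enlarge the modification budget r step by step; within each step, run n_iter
    gradient iterations while keeping the perturbation inside the L2 ball of radius r."""
    x_adv = x.copy()
    r = r_init
    while r <= r_max:
        for _ in range(n_iter):
            g = objective_grad(x_adv)                     # assembled gradient over the substitutes
            x_adv = np.clip(x_adv - lr * g, 0.0, 1.0)     # keep valid pixel values
            delta = x_adv - x
            norm = np.linalg.norm(delta)
            if norm > r:                                  # project back onto the current budget
                x_adv = x + delta * (r / norm)
            if is_adversarial(x_adv):
                return x_adv
        r += r_step                                       # enlarge the modification limit
    return None                                           # no adversarial example found within r_max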

IV-C Analysis

To find out how EXCIT enhances transferability in a query-free, black-box attack, we analyzed our implementation using a ResNet-101 model trained on an L3 dataset as the target model, and a set of ResNet-101 models trained on L2 datasets (no overlap with the target's training set except for the subjects and victims) as substitute models. Specifically, in the dodging attacks, we used the same attacking set as in Section III-B, containing 100 identities and their 635 images. In the impersonation attacks, we selected 600 image pairs from 10 identity pairs (subject-victim). For all identities involved in the impersonation attacks, we ensured that each of them has at least 100 images in the MegaFace dataset. Under both attack settings, four substitute models were trained with the EXCIT method on the augmented dataset, and the target model was attacked with the adversarial examples found on these four substitute models. Below we compare the results with those of the previous studies (Section III).

Transferability. As we can see from Fig 3 and Fig 4, in both attacks EXCIT improved the transferability: the improvement is evident for dodging (from 81.2% to 89.6%) and dramatic for impersonation (from 17.5% to 49.8%). Interestingly, for the dodging attack, our approach comes close to the attack using substitute models trained on the same level of dataset as the target model, indicating that our EXCIT model is indeed effective in recognizing the subject. This is further supported by the findings for the impersonation attack, in which none of the substitute models without the NLL enhancement could come even close to our performance, not even those as well-trained as the target model. In fact, even though our substitutes were trained only on L2 datasets, they achieved a transferability of 49.8% against the ResNet-101 target trained on the L3 dataset.

Fig. 3: Dodging performance. The left figure shows the distribution of modifications made by the approaches with and without the NLL enhancement; the right figure shows their transferabilities.
Fig. 4: Impersonation performance. The left figure shows the distribution of modifications made by the approaches with and without the NLL enhancement; the right figure shows their transferabilities.
ResNet-101 GoogLeNet VGG-16
with NLL 49.8% (72.1%) 46.8% 43.5%
without NLL 17.5% (25.2%) 16.2% 16.3%
TABLE VI: Impersonation transferability of the NLL-enhanced approach on different structures. The number in brackets is the transferability using substitute models trained on L3 datasets.

Further, our study shows that EXCIT also works on other DNN structures: we used the same setting (four L2 substitute models attacking an L3 target model) to test the effectiveness of EXCIT on the VGG-16 and GoogLeNet structures, and found (Table VI) that both attacks achieved around 45% transferability, way above the roughly 16% reported in the data-size study (Section III-C).

Distance from the subject. Without querying the target, the adversarial examples discovered by EXCIT naturally tend to be farther away from the original image. What we want to know, however, is whether, for a given distance from the subject, the adversarial examples found by our approach still have a higher probability of success compared with the attack without the NLL enhancement. For this purpose, we used four substitute models trained on L2 datasets to attack the target model trained on the L3 dataset, and compared both methods' average transferability under various distances. The results are given in Table VII. A more detailed study, restricting the search space within a certain radius, is given in Appendix C. From these results, we observe that NLL improves the transferability at every distance.

Dodging with NLL 87.7% 88.3% 89.6%
Dodging without NLL 79.5% 81.2% 81.2%
Impersonation with NLL 50.5% 47.2% 49%
Impersonation without NLL 14.5% 17.5% 17.5%
TABLE VII: Transferability under different distance constraints.

Training cost. To understand how EXCIT helps a resource-limited adversary, we evaluated the training cost of the substitute models on our 8-GPU server (12GB of memory per GPU). As illustrated in Table X, it took about 200 hours to train a ResNet-101 on the L3 dataset and more than 500 hours to build the model on the L4 dataset, while constructing a substitute model on an L2 dataset took 75 hours. Note that we could fully parallelize the training of 4 substitute models, but could not do this for a target model with the same amount of computing resources, due to communication overheads.

To further analyze the cost EXCIT can reduce, we trained four ResNet-50 substitute models and four ResNet-101 substitute models on L1 datasets to perform impersonation attacks against target ResNet-101 models trained on the L3 and L4 datasets. As we can see in Table VIII, with this small amount of training data, our approach elevated the transferability of these attacks to 30-40% against the target model trained on the L3 dataset and to around 20% against the target model trained on the L4 dataset, while the training time stayed at 5 to 9 hours per substitute model.

L3 L4
ResNet-50 L1 38.5% 16.8%
ResNet-101 L1 44.2% 23.2%
TABLE VIII: Impersonation transferability of the NLL-enhanced approach using L1 models as substitutes. Each cell is the result of using the row model to attack the column model.

We also compared the efficiency of our approach against attacks without the NLL enhancement. In the latter case, the only way to improve transferability is to train more substitute models and find their common adversarial examples. In our study, we built three experiments using 4, 8 and 16 substitute models, respectively, each model trained on an L1 dataset with the ResNet-101 structure. In these experiments, the target model was trained on the L4 dataset, also with the ResNet-101 structure. The results of the standard ensemble method over these models are presented in Table IX. As we can see, when attacking the L4 target, even with 16 substitutes the attack could not achieve the same level of transferability as the 4 NLL-enhanced substitutes, not even those with only 50 layers. In this case, the cost of EXCIT, in terms of training time, is no more than 16.8% of that of the direct attack (with 16 substitutes).

4 models 8 models 16 models
7.2% 12.7% 16.5%
TABLE IX: Impersonation transferability of the standard ensemble method using 4, 8 and 16 substitute models (no NLL) trained on L1 datasets.
Depth Dataset Time Memory
50 L1 5h 8x4.5G
101 L1 9h 8x6.5G
101 L2 75h 8x7.5G
101 L4 >500h 8x8G
TABLE X: Cost of training different models.

V Evaluation on Real-World Systems

We evaluated our approach, EXCIT, on four real-world systems. Three of them are online, with APIs available to the public, and the last one is a commercial system without open access, one of the products of SenseTime Ltd. We performed both dodging and impersonation attacks against them. The details of our experiments and our findings are elaborated below.

V-A Experimental Settings

Unlike the models built in our analysis, which were trained over the subject's and victim's images and output a vector specifying the probability of the input image belonging to every identity, a real-world FR system takes two photos as inputs and calculates a score for the similarity of the individuals in the two photos. Here is how we determined whether an adversarial example worked on the real-world FR systems:

In a successful dodging attack, we expect the target system to output a low score (below a threshold θ_d) for two images: the subject's original photo and the adversarial example generated by our approach from that photo. In a successful impersonation attack, the target system is supposed to output a high score (above a threshold θ_i) for two images: the victim's photo and the adversarial example generated by our approach from the subject's image, indicating that they belong to the same individual. As usual, the thresholds θ_d and θ_i are specified by the FR system.
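In code, the check amounts to the following (our helper functions; compare stands for the FR API's similarity score, and theta_d, theta_i are the system-specified thresholds named above):

def dodging_succeeds(compare, x_orig, x_adv, theta_d):
    """Dodging: the system should give a low similarity score to the subject's
    original photo and the adversarial example derived from it."""
    return compare(x_orig, x_adv) < theta_d

def impersonation_succeeds(compare, x_victim, x_adv, theta_i):
    """Impersonation: the system should give a high similarity score to the
    victim's photo and the adversarial example derived from the subject's photo."""
    return compare(x_victim, x_adv) >= theta_i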

In our experiments, we first trained 4 ResNet-101 substitute models, each on 3M photos randomly sampled from the 600K identities in the MegaFace Challenge 2 dataset. (We did not use all 4M images for each substitute, in an attempt to make the substitutes diverse.)

From all identities, we selected 10 individuals as the subjects of our dodging attack. For the impersonation attack, we sampled 10 subject-victim pairs. Every identity involved has at least 100 images in the MegaFace dataset. In the experiments, we randomly chose 10 of each individual's images for the dodging attack and 10 photo pairs for each subject-victim pair for the impersonation attack. For each of these subjects or subject-victim pairs, we used our method to augment the dataset and fine-tuned the substitute models on the augmented dataset.

V-B Attack on Online APIs

The three online APIs attacked in our research are ColorReco (http://www.colorreco.com/faceCompare), FaceVisa (http://www.facevisa.com/web/index/demo) and Face++ (https://www.faceplusplus.com/face-comparing/#demo). The models behind these APIs were trained with large amounts of data. For example, Face++ was built upon 5M photos of 20K identities [31] and FaceVisa upon 2M photos. They all demonstrate a high recognition accuracy on the Labeled Faces in the Wild (LFW) dataset [6]. In our experiments, we ran a Python script to automatically upload our test photo pairs to ColorReco and FaceVisa. For Face++, we had to do it manually due to the requirement of CAPTCHA solving.

The success rates of our attacks are presented in Table XI. Note that in these experiments, the thresholds for the different APIs are different and defined by the APIs themselves. Besides, we counted an attack as failed if the target FR system could not detect a face in the submitted adversarial example, even for the dodging attack. As we can see from the table, our approach achieved a higher accuracy in the dodging attack compared with the attacks without the NLL enhancement (Table VI). A much bigger boost, however, is observed for the impersonation attack, where EXCIT raised the success rates for all three systems from around 20% to 69-85%. Fig 5 further illustrates the distributions of the scores of our submitted photo pairs.

Fig. 5: Distributions of scores for impersonation attacks: from left to right, the results of ColorReco, FaceVisa and Face++, respectively.
ColorReco Facevisa Face++
dodging 98% 96% 95%
0.75 0.64 0.623
impersonation 74% 85% 69%
0.80 0.74 0.691
TABLE XI: Success rate against online APIs.

V-C Attack on Industrial System

Commercial FR systems are often better trained and more capable than the free FR APIs, which are mostly used for online demos. Such industry-grade systems are typically characterized by a large number of layers and by being trained over a massive amount of data on clusters of GPUs. The services they provide are not open to the public and are only available for purchase. In our research, we obtained the commercial SDK from SenseTime Ltd. through our collaborations. SenseTime's products are known to be among the leading FR systems [25], so the system we analyzed represents the state of the art in FR technologies. It was trained over 20M photos of 1M individuals, using a ResNet-like model, though the details of the structure are a commercial secret. The model tested in our study was estimated to require at least 14,000 hours (50 epochs) to train on our GPU server. By comparison, all 4 EXCIT substitutes used in our attack were trained for 2,500 hours in total. With less than 1/5 of the time spent on training the models, our approach achieved a high success rate for the 100 individuals selected for the dodging attack and the 100 pairs for the impersonation attack: in the former case, 89% transferability was achieved, compared to 70% without the NLL enhancement, and in the latter, we raised the transferability from 11% to 62%. Fig 6 further shows examples of the successful attacks. We have reported our findings to SenseTime and are helping them improve their system.

Fig. 6: Successful impersonation attacks on SenseTime. The three columns are the subject photos, our generated adversarial examples and the target photos, respectively. The modification (L_2 distance) of the first case is 13.41 and that of the second case is 7.22.

VI Related Works

Our approach utilizes a synthesized dataset to fine-tune our substitutes, for the purpose of approximating the NLL property at the PoIs. Synthesized data have also been used in prior research [17], but to support completely different techniques and for a different purpose. Specifically, the prior work uses synthesized data to query the target model in order to train the substitute (1,000 queries to achieve an 84.24% success rate in transferring adversarial examples). This is exactly the attack scenario our query-free approach is designed to avoid. Without communicating with the target model, the only option is to build a substitute as well-trained as the target, so as to capture their common structural weaknesses and enhance transferability. Such an attack turns out to be feasible by simulating the target's behaviors around the PoIs, based upon the NLL property we discovered and additional images collected from the victims and the attackers.

Our approach also exploits the ensemble method. The original ensemble-based approach was proposed by Liu et al. [13]. Their work focuses on the transferability among DNN models with different structures, whereas data size is the factor that really matters in the face recognition domain, as demonstrated earlier. Compared with their method, ours can increase the transferability from a model trained with insufficient data to a model trained with plenty of data, which goes beyond their method's capability. Another ensemble method was proposed by Sarkar et al. [21]. They trained a DL model to find "universal" perturbations fooling all of their pre-trained target models. In particular, their objective function combines the sum of the targets' (mis)classification losses and the scale of the perturbations found. The difference from their method is similar to the one described above; in our attack scenario, their method may not work.
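As an illustration of the ensemble idea (a sketch only, not the exact optimization used by EXCIT or in [13]), one gradient step against a set of substitutes might look as follows; `substitutes` and `cosine_loss` are assumed placeholders for the substitute models and the per-model attack loss.

```python
# Sketch of one ensemble-based attack step.
# Assumptions: each model in `substitutes` maps an image tensor to an embedding,
# and `cosine_loss` returns a scalar attack loss for that embedding.
import torch

def ensemble_attack_step(x_adv, substitutes, cosine_loss, step_size=1.0):
    x_adv = x_adv.clone().detach().requires_grad_(True)
    # Average the attack loss over all substitute models.
    loss = torch.stack([cosine_loss(m(x_adv)) for m in substitutes]).mean()
    loss.backward()
    with torch.no_grad():
        # Descend along the sign of the averaged gradient.
        return x_adv - step_size * x_adv.grad.sign()
```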

Concurrent to our work, mixup [30] also utilizes interpolated images to augment the dataset when optimizing a KL-divergence loss function. However, mixup labels the interpolated images with linearly interpolated labels, which, as we can see from Fig 2, is just a rough approximation of the DNN's real output. In contrast, based on the NLL property, we deduce the actual form of the output (see Eq 4) and further improve our data augmentation method using this form. Also, mixup is designed to improve classification accuracy, while our method is meant to transfer the NLL property across models. So we believe that our approach is likely to perform better when it comes to transferability, given our better prediction of the DNN outputs (which mixup merely estimates with a linear function).
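To make the contrast concrete, the sketch below shows the interpolation step shared by both approaches together with mixup's linear labeling; the NLL-based labels of Eq 4 (not reproduced here) would replace `mixup_label` in our augmentation.

```python
# Sketch of interpolation-based augmentation.
# Assumptions: images and labels are float tensors; the NLL-based label form
# from Eq 4 is not shown and would replace mixup_label in our scheme.
import torch

def interpolate_pair(x_a, x_b, lam):
    """Build a 'transitional' image between two faces."""
    return lam * x_a + (1.0 - lam) * x_b

def mixup_label(y_a, y_b, lam):
    """mixup's rough approximation: labels interpolated linearly [30]."""
    return lam * y_a + (1.0 - lam) * y_b
```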

VII Discussion

Understanding NLL. Unlike existing cross-model attacks, EXCIT does not interact with the target at all, so there is no way for our approach to exploit the target's specific defects. The reason we can still find highly transferable examples is that, by simulating better-trained models around the PoIs, our approach is likely to discover some common (potentially structural) defects fundamental to a certain type of DNN, and in the meantime avoid exploring the subspace unlikely to contain adversarial examples, given the reduction of training-specific weaknesses (e.g., lack of sufficient data) around the PoIs. With an NLL-enhanced substitute, we could even discover transferable examples whose modifications are restricted to a given facial region, e.g., around the eyes (Appendix D), which allows the prior attack [23] (using printed glasses to evade detection) to work in a query-free, black-box setting.

In the meantime, our understanding of transferability is still limited. Still unclear are questions such as whether there exist adversarial examples inherent to certain DNN structures, or even to the fundamental design of artificial neural networks. Further studies on these issues are certainly important.

Our current definition of NLL describes a relation between the Cosine distance of image representations and the distance between the corresponding input images. However, such a relation may not be general, particularly when it comes to non-FR problems: as an example, we found that although NLL-enhanced models still improve transferability on Cifar10 [10], a dataset for image classification, the enhancement is less significant (Table XII). This could be attributed to the unique features of FR: e.g., the differences between two faces can be added to another face to form a new face, which makes the "transitional" images easy to construct; also, FR tasks are characterized by a large number of training categories compared with other tasks (e.g., 1K categories for ILSVRC vs. 600K entities for Megaface), which forces the DNN model to map input images onto a high-dimensional sphere with maximal space utilization. All these features make NLL more effective on FR tasks. What is less clear, however, is how to extend the concept to improve transferability on other tasks, which should be studied in future research.

               Dodging   Impersonation
with NLL        100%         57%
without NLL     100%         63%
TABLE XII: Transferability on Cifar10. We used 4 substitutes (ResNet-20) trained on 10K images to attack the target model (ResNet-20) trained on 30K images.
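For readers who wish to probe the NLL-style relation on their own models, a minimal sketch follows; it assumes `embed` maps a single image tensor to its (1-D) face representation.

```python
# Sketch for probing the NLL-style relation empirically.
# Assumptions: `embed` returns a 1-D embedding for a single image tensor,
# and x_a, x_b are image tensors of identical shape.
import torch
import torch.nn.functional as F

def cosine_distance(u, v):
    return 1.0 - F.cosine_similarity(u, v, dim=-1)

def probe_nll(embed, x_a, x_b, steps=10):
    """Embedding-space distance from x_a as we interpolate towards x_b."""
    e_a = embed(x_a)
    distances = []
    for i in range(steps + 1):
        lam = i / steps
        x_mid = (1.0 - lam) * x_a + lam * x_b
        distances.append(cosine_distance(e_a, embed(x_mid)).item())
    return distances  # for an NLL-respecting model, roughly proportional to lam
```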

Defense. Defensive distillation [19] has been demonstrated to be effective against most previous attacks. However, as pointed out by Carlini et al. [2], a modified version of existing attacks breaks this defense. We found that it does not stop EXCIT either: more specifically, we implemented the defense (with a raised softmax temperature) on our L3 target model and ran four L2 substitute models to attack it. The result is that distillation only marginally reduced our transferability, for both the dodging and the impersonation attack.
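For reference, the core of defensive distillation is the temperature-scaled softmax sketched below; this is a generic illustration of the technique in [19], not our (or SenseTime's) exact training setup.

```python
# Sketch of the temperature-scaled softmax used in defensive distillation [19].
# A generic illustration: the teacher produces soft labels at a raised
# temperature, and the student is trained against them at the same temperature.
import torch
import torch.nn.functional as F

def distillation_soft_labels(teacher_logits, temperature):
    """Soft labels produced by the teacher at a raised temperature."""
    return F.softmax(teacher_logits / temperature, dim=-1)

def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy between temperature-softened student and teacher outputs."""
    soft_targets = distillation_soft_labels(teacher_logits, temperature)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```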

Alternatively, one could consider inserting a "secret" into the commercial system. Specifically, the system can be trained on a custom dataset in which every photo is covered by a secret pattern. Since no queries are supposed to be made to the target during an attack, the secret added to a commercial system could help mitigate the EXCIT threat. In general, however, defense against adversarial learning is known to be hard [1]. Further research is needed to find an effective way to defeat our attack.
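A minimal sketch of the "secret pattern" idea follows, assuming the secret is a fixed image blended into every training photo with a small weight; the actual scheme could be any transformation kept secret by the vendor.

```python
# Sketch of the "secret pattern" defense.
# Assumption: the secret is a fixed image blended into each training photo;
# both arrays are floats in [0, 1] with the same shape.
import numpy as np

def apply_secret_pattern(image, secret, alpha=0.05):
    """Blend a secret pattern into a training photo."""
    return np.clip((1.0 - alpha) * image + alpha * secret, 0.0, 1.0)
```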

Cost of the attack. As mentioned earlier, training the substitutes to attack SenseTime's FR system took 2,500 hours on our server. An estimated cost for such resources is about 10,000 dollars on Amazon AWS. The computing time can be shortened through parallelization, since all 4 substitutes can be trained simultaneously. The computing cost can also be reduced when the adversary attempts to impersonate multiple victims or hide multiple subjects: in this case, only one set of substitutes needs to be trained over our dataset, which can later be augmented with NLL data for different subjects or subject-victim pairs to support dodging or impersonation attacks.

VIII Conclusion

In this paper, we present our new understanding of DNN-based FR systems, in terms of their vulnerability to transferable attacks by a resource-constrained adversary. Our research shows that limited resources, particularly smaller training sets, can have a significant impact on the effectiveness of the attack. This matters because a real-world adversary typically cannot query the target frequently and needs to build substitutes as capable as, or even more powerful than, the target model with his limited resources. Narrowing such a resource gap, however, turns out to be feasible through a novel technique we developed. Specifically, we found that the adversary can make effective use of the extra information (images) about subjects and victims in his possession, by approximating the relations of these PoIs with other images in the training set, as characterized by the Nearly Local Linearity (NLL) property we discovered. As a result, by training NLL-enhanced models, we raise the transferability of both dodging and impersonation attacks against industry-grade systems by nearly 50%. Beyond our new techniques and findings, more effort is needed to better understand transferability and mitigate the threat it poses.

References

Appendix A Transferability prediction

Given a discovered adversarial example, the attacker needs some idea of how likely the example is to also mislead the target model. Also, in the presence of multiple examples, the most promising one should be given preference. One way to estimate the transferability of an example is to train multiple models as capable as the target, called target simulators or simply simulators, run the substitutes to attack them, and then collect statistics about the relation between the features of the adversarial examples discovered on the substitutes and the transferability of those examples. Given such a relation, the attacker can look at an example's features to estimate the likelihood that it fools the target. This simple approach, however, does not work in practice, as we do not have the resources (e.g., a large number of images) to build such powerful simulators.

Therefore, in our research, we took a different path with three steps: first, we train simulators on the data we have; second, we estimate the difference between the simulators and the target; third, we add the estimated difference to our simulators' outputs to predict the target's output. In this process, estimating the difference is challenging, as we do not know what the target model looks like. However, we do know the scale of the target model's training set. Thus, we built a function to estimate the difference between models trained on different numbers of images. Specifically, our study shows that as the training data grows, the substitute becomes more similar to the target, and the impact of their data-size difference becomes less prominent. Next, by adding the average difference to every simulator's output, we derive what the target model would output for the adversarial examples and can choose the best one, i.e., the one with the largest likelihood of fooling the target.

To build a function measuring the difference between two models trained on different numbers of images, we need a "bridge" connecting them. A natural idea is to leverage their losses, which measure how far each of them is from the perfect model, and then apply the triangle inequality to estimate the difference between them. The classic softmax loss is inappropriate here, because it does not satisfy the triangle inequality. Thus, we used the Cosine distance again, and define the Cosine distance loss of a model as follows:

We then measured the mean and the standard deviation of the Cosine distance loss for models trained on different data sizes. The results are shown in Fig 7.

Fig. 7: Cosine distance loss of models trained on different sizes of data.

Now, from the fitted curve, we can infer the mean (μ_t) and the standard deviation (σ_t) of the loss of the target model, trained on its (much larger) set of photos. Further, we assume the target model's and a simulator's losses obey the normal distributions N(μ_t, σ_t^2) and N(μ_s, σ_s^2) respectively, and roughly estimate their difference by assuming it also obeys a normal distribution, N(μ_t − μ_s, σ_t^2 + σ_s^2). Thus, for a given simulator, we can calculate the likelihood that its score, plus the estimated difference, will surpass the target's threshold (for a dodging attack) or fall below it (for an impersonation attack).
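The resulting likelihood computation can be sketched as follows; `threshold` stands for the target's decision threshold on the score, and the model difference is treated as a normal variable with the (estimated) mean and standard deviation above.

```python
# Sketch of the transferability estimate described above.
# Assumptions: mu_diff/sigma_diff are the estimated mean and standard deviation
# of the target-simulator difference, and `threshold` is the target's decision
# threshold on the score.
from scipy.stats import norm

def crossing_probability(sim_score, mu_diff, sigma_diff, threshold, mode):
    """Probability that the simulator's score, plus the estimated difference,
    crosses the target's threshold."""
    adjusted = norm(loc=sim_score + mu_diff, scale=sigma_diff)
    if mode == "dodging":      # want the adjusted score to surpass the threshold
        return adjusted.sf(threshold)
    else:                      # impersonation: want it to fall below
        return adjusted.cdf(threshold)
```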

n_1        n_2        mean     std. dev.
62,258     1,541,811  0.0188   0.1176
62,258     4,019,407  0.0371   0.1225
1,541,811  4,019,407  0.0163   0.1001
TABLE XIII: Statistics of the difference between models trained on n_1 and n_2 images.

Besides, in Table XIII, we list the measured values of the difference for several pairs of models, and observe that as the data size increases, the simulator becomes more similar to the target, and the impact of their data-size difference becomes less prominent (smaller mean and standard deviation).

Appendix B Deduction of Equation 4

For a DNN model,

And for an ideally-trained DNN model ,

as the perfection of the model. Thus, for a tuple ,

Based on the NLL property, we know that both relations hold true. So we get:

Using the results obtained above, we can also derive:

where

So,

By the same reasoning, we get:

In conclusion,

where

Besides, for an ideally-trained DNN model, we can make one further assumption, from which the desired result follows.

Combining the above,

Appendix C Performance under Different Distances

Without querying the target model, the adversarial examples discovered by EXCIT, through simulating a "better" model, naturally tend to be farther away from the subject. What we want to know, however, is whether, for a given distance constraint, the examples found by our approach still have a higher probability of success compared with the attack without the NLL enhancement. For this purpose, we modify the objective function to limit the search within a given distance constraint, as follows:

Here we use a penalty function that penalizes the distance once it exceeds the given bound. More specifically, we calculated its derivative as follows:

As we can see, when the distance exceeds the bound, the component involving the penalty function grows quickly, moving the objective function away from optimality. Therefore, in the optimal situation, the distance should not exceed the bound by much.
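Since the exact penalty function is not reproduced here, the sketch below uses a softplus-style barrier on the excess distance purely to illustrate the behavior described above; it is an assumption, not the paper's exact objective.

```python
# Sketch of a distance-penalized attack objective.
# Assumption: a softplus-style barrier on the excess L2 distance stands in for
# the penalty function, which is not reproduced in this text.
import torch
import torch.nn.functional as F

def penalized_objective(attack_loss, x_adv, x_orig, d, weight=10.0):
    """Attack loss plus a penalty that grows quickly once the distance exceeds d."""
    dist = torch.norm((x_adv - x_orig).flatten(), p=2)
    penalty = F.softplus(dist - d)  # small below d, grows roughly linearly above it
    return attack_loss + weight * penalty
```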

In our research, we evaluated our approach using this objective function under distance constraints of 5, 20 and 30, using 4 EXCIT substitutes trained on 600K photos of 100K identities to attack the target trained on 1.9M photos of 300K identities. The results are presented in Fig 8 and Table XIV. As we can see, under the various distance constraints, the adversarial examples found by our approach are always much more transferable than those found without the NLL enhancement. In the meantime, our approach tends to pick examples farther away from the subject, because the NLL property moves the decision boundary of the substitute model (with regard to the subject and the victim) closer to the ideal one, making it harder to find adversarial examples close to the subject's image; once such an example is found, however, it is more likely to lead to a successful attack.

Fig. 8: Distributions of modifications under different distance constraints.
                 5       20      30
with NLL        6.3%    51.3%    74%
without NLL     3.8%    21.8%   49.5%
TABLE XIV: Impersonation transferability of the NLL-enhanced approach under different distance constraints.

Appendix D Restricting the Modification to a Certain Region

A straightforward way to restrict modifications to a certain region is to quench (zero out) the derivatives outside the region while searching for adversarial examples. In this setting, however, finding adversarial examples becomes harder than before, so the constraint on the magnitude of the modifications needs to be fully relaxed. We demonstrate two examples restricting the modifications to the area around the eyes in Fig 9. We observe that the modifications become more severe: the distance between the generated adversarial example and the original photo is 24.63 in the first case and 27.04 in the second.

Fig. 9: Successful impersonation attacks within restricted region.
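A minimal sketch of the gradient-quenching step follows; `mask` is assumed to be a binary tensor that is 1 inside the allowed region (e.g., around the eyes) and 0 elsewhere.

```python
# Sketch of region-restricted attack updates.
# Assumption: `mask` is a binary tensor marking the allowed region (1 inside,
# 0 outside); `grad` is the gradient of the attack objective w.r.t. x_adv.
import torch

def masked_attack_step(x_adv, grad, mask, step_size=1.0):
    """Zero out ("quench") the gradient outside the allowed region before updating."""
    with torch.no_grad():
        return x_adv - step_size * (grad * mask).sign()
```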