Adversarial Learning with Margin-based Triplet Embedding Regularization

Yaoyao Zhong et al. · 09/20/2019

Deep neural networks (DNNs) have achieved great success on a variety of computer vision tasks; however, they are highly vulnerable to adversarial attacks. To address this problem, we propose to improve the local smoothness of the representation space by integrating a margin-based triplet embedding regularization term into the classification objective, so that the obtained model learns to resist adversarial examples. The regularization term consists of a two-step optimization that iteratively finds potential perturbations and penalizes them by a large margin. Experimental results on MNIST, CASIA-WebFace, VGGFace2 and MS-Celeb-1M show that our approach increases the robustness of the network against both feature-level and label-level adversarial attacks in simple object classification and deep face recognition.


1 Introduction

Deep neural networks (DNNs) have achieved great success [22, 40, 17, 19], significantly advancing a variety of challenging applications such as deep face recognition [7, 38, 49, 27, 47, 9, 48, 26] and autonomous driving [2, 8].

However, the contradiction between the vulnerability of DNNs and the demand for security has become increasingly obvious. On one hand, DNNs are vulnerable: previous works have discovered that, with elaborate strategies, DNNs can be easily fooled by test images carrying imperceptible noise [44]. Such images are named adversarial examples. Moreover, adversarial examples transfer across different models [33, 10], which means black-box attacks can be launched without knowing the details of the target models (architectures, parameters and defense methods). On the other hand, the demand for security arises in safety-critical domains driven by DNNs. Adversarial examples can attack physical-world DNN applications [23]. For instance, DNNs in an autonomous vehicle system can be confused by carefully manipulated road signs [12], and DNNs in a face recognition system are susceptible to feature-level adversarial attacks [37, 3, 41].

The existence of adversarial examples has given rise to a variety of research on adversarial defenses. One straightforward defense strategy is to increase the robustness of the model by injecting adversarial examples into the training process [44, 24, 46], which is essentially regularization through training data augmentation. This strategy is effective in closed-set classification such as object classification, but may not be suitable in open-set settings such as deep face recognition, where the number of training categories can reach the million level. Another strategy is to detect adversarial examples at inference time [29, 52, 41]. This strategy is appropriate for both open-set and closed-set classification, but it can be easily broken in the white-box setting where the specific defense method is known [5].

Figure 1: Schematic illustration of the defense before MTER training (top) versus after training (bottom). Arrows indicate the gradients arising from the optimization of the cost function. The same color represents the same predicted class.

In this paper, we propose a margin-based triplet embedding regularization (MTER) method to train robust DNNs. Our intuition is that by training a model to improve the local smoothness of the embedding space, with fewer singular points, it becomes more resistant to adversarial examples. The regularization term consists of two optimization steps applied iteratively: it first finds potential perturbations in the embedding space and then penalizes them by a large margin. A schematic illustration of the defense before and after MTER training is shown in Figure 1. Specifically, in the embedding space, a potential attack is generated from a source toward a target. We improve robustness by encouraging the hypothetical attacks to gradually approach the source class while moving far away from all other target classes. The result of an embedding space visualization experiment is shown in Figure 2. In the optimization, the large margin is not trivial: it strictly enforces the inter-class distance and the intra-class smoothness in the embedding space. Our contributions are as follows:

1. We propose to improve the robustness of DNNs by smoothing the embedding space, which is appropriate for DNNs trained in both open-set and closed-set classification settings.

2. We introduce a large margin into adversarial learning, which further guarantees the inter-class distance and the intra-class smoothness in the embedding space and therefore improves the robustness of DNNs.

3. Experimental results on MNIST [16], CASIA-WebFace [54], VGGFace2 [4] and MS-Celeb-1M [15] demonstrate the effectiveness of our method in simple object classification and deep face recognition.

2 Related work

2.1 Adversarial attacks

Szegedy et al. [44] first found that DNNs can be caused to misclassify images by a barely perceptible perturbation generated with a box-constrained L-BFGS method. Compared with the L-BFGS attack [44], Goodfellow et al. [13] propose a more time-saving and practical "fast" method (FGSM), which generates adversarial examples by performing a one-step gradient update along the direction of the sign of the gradient at each pixel.
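As a reference for the later sections, a minimal PyTorch sketch of this one-step update is given below. It assumes a classifier `model` that returns logits and images scaled to [0, 1]; the function name `fgsm` and its arguments are ours, not the original implementation.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """One-step attack along the sign of the input gradient of the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each pixel by epsilon in the direction that increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```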

Kurakin et al. [23] introduce a straightforward "basic iterative" method (BIM), which extends the "fast" method (FGSM) [13] by applying it multiple times with a small step size and clipping pixel values after each step to enforce the constraint. Moreover, to generate adversarial examples of a specific desired target class, Kurakin et al. [23] introduce the iterative least-likely class method. Iterative methods can attack DNNs with a higher success rate than the fast method under the same constraint level [23]. Similarly, another iterative attack method proposed by Moosavi-Dezfooli et al. [31], called DeepFool, is also reliable and efficient. Using linear approximation, DeepFool tries to generate the minimal perturbation at each step by moving toward the linearized decision boundary [31]. Based on DeepFool [31], Moosavi-Dezfooli et al. [30] propose image-agnostic adversarial attacks, which can fool DNNs with a single universal perturbation applied to images with high probability.

Apart from the generation of adversarial examples, there are also works focusing on the transferability of adversarial examples [33, 10, 51], adversarial examples in the physical world [23, 12], and adversarial examples in specific tasks such as face recognition systems [39, 14].

2.2 Defense methods

Defense methods can be classified into two categories: one improves the robustness of DNNs, while the other detects adversarial examples at inference time [29, 52, 41]. We mainly discuss the former, which is more closely related to our work.

Network distillation [1, 18] was originally proposed to reduce model size. Papernot et al. [34] introduce distillation as a defense method that improves robustness by feeding the class probabilities back to train the original model.

Adversarial training can provide regularization to DNNs [13]. Goodfellow et al. [13] first propose adversarial training, which increases robustness by injecting adversarial examples into the training process. Adversarial training was later applied and analyzed on the large training dataset ImageNet [24].

Despite the success of adversarial training against white-box attacks, defense against black-box attacks remains a problem due to the transferability of adversarial examples. To handle transferred black-box attacks, Tramer et al. [46] introduce ensemble adversarial training, which transfers one-step adversarial examples from other trained models, while Na et al. [32] propose cascade adversarial training, which transfers iterative attacks from an already trained model.

Around the same time, Dong et al. [11] and Na et al. [32] minimize both the cross-entropy loss and the distance between original and adversarial embeddings to improve vanilla adversarial training; these are the works most related to ours. However, our method differs from them in two aspects: (1) our MTER method is a straightforward and thorough feature adversary that is not limited by the number of training categories, and is therefore also appropriate for open-set classification; (2) we introduce a large margin into adversarial learning, which guarantees not only the intra-class smoothness but also the inter-class distance in the embedding space.

3 Margin-based Regularization

Our purpose is to train a DNN that has smooth representations with fewer singular points. We therefore consider a regularization term that exploits vulnerabilities and then fixes them in an iterative way.

Input: Training set D, model parameters θ, margin hyperparameter m, mini-batch size n.
Output: The final model parameters θ.
  // Construct the source image queue Q_s and the target image queue Q_t.
  Q_s = Q_t = {}.
  Randomly select categories in D as the source set S; the complementary set is the target set T.
  Q_s.append(S); Q_t.append(T).
  Shuffle Q_s and Q_t.
  while Q_s is not empty and Q_t is not empty do
     Take out mini-batches B_s and B_t with n samples each from Q_s and Q_t.
     Compute perturbations (3) in an iterative way (5) on B_s and B_t, based on the current model θ.
     Update θ by minimizing the cross entropy and the regularization term (10) on B_s, the perturbed batch, and B_t.
  end while
Algorithm 1 Margin-based triplet embedding regularization (MTER)

3.1 Exploiting the Vulnerability

First we consider the vulnerability exploitation, starting from some notation. Let D = {(x_i, y_i)} denote a labeled dataset, where x_i and y_i respectively denote an input image and the corresponding label. A DNN can be formulated as a chain

f(x) = f^(K)( f^(K−1)( ··· f^(1)(x) ) ),    (1)

parameterized by θ. The network is originally trained on the dataset D with the cross-entropy loss

J(θ, X, Y) = Σ_i CE( f(x_i; θ), y_i ),    (2)

where J(θ, X, Y) gives the sum of the cross entropies between the predictions f(x_i; θ) and the labels y_i.

Given a trained DNN, a source image and a target image, denoted as x_s and x_t with y_s ≠ y_t, we can find a small perturbation δ to the source image that produces an internal representation remarkably similar to that of the target image [37]. The vulnerability exploitation in the embedding space can be described as

δ* = argmin_δ ‖ e(x_s + δ) − e(x_t) ‖₂²,    (3)

subject to

‖ δ ‖_∞ ≤ ε,    (4)

where e(x) = φ(x) / ‖φ(x)‖₂ is the deep representation in the embedding space, normalized to unit length from φ(x); φ(·) is the mapping from an image to its representation at the embedding layer; and ε limits the maximum deviation of the perturbation.

For computational efficiency, we adopt the direction defined by the gradient of the metric loss function and form adversarial perturbations in an iterative way, referred to as the iterative feature target gradient sign method (IFTGSM):

x₀^adv = x_s,   x_{N+1}^adv = Clip_{x_s, ε}{ x_N^adv − α · sign( ∇_x ‖ e(x_N^adv) − e(x_t) ‖₂² ) },    (5)

where the element-wise clipping function

Clip_{x, ε}{ x′ } = min{ x_max, x + ε, max{ x_min, x − ε, x′ } }    (6)

keeps each pixel of x′ within the ε-neighborhood of x and within the valid pixel range [x_min, x_max], and the number of iterations is chosen heuristically. The perturbation can also be generated using a fast method, referred to as the fast feature target gradient sign method (FFTGSM):

x^adv = x_s − ε · sign( ∇_x ‖ e(x_s) − e(x_t) ‖₂² ).    (7)

We will use this fast attack method in the experiment on face recognition in Section 4.5.
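The following sketch is our reading of IFTGSM and FFTGSM (Eqs. (5)–(7)), not the authors' code. It assumes a function `embed(x)` that returns the L2-normalized representation e(x), images in [0, 1], and illustrative parameter names `alpha`, `epsilon` and `num_iter`.

```python
import torch

def iftgsm(embed, x_src, x_tgt, epsilon, alpha, num_iter):
    x_src = x_src.detach()
    with torch.no_grad():
        f_tgt = embed(x_tgt)                              # fixed target embedding
    x_adv = x_src.clone()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        # Squared Euclidean distance between perturbed source and target embeddings.
        dist = (embed(x_adv) - f_tgt).pow(2).sum()
        grad, = torch.autograd.grad(dist, x_adv)
        # Descend the distance, then project into the epsilon-ball and into [0, 1].
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = (x_src + (x_adv - x_src).clamp(-epsilon, epsilon)).clamp(0.0, 1.0)
    return x_adv.detach()

def fftgsm(embed, x_src, x_tgt, epsilon):
    # Single-step variant (Eq. (7)): one sign step of size epsilon.
    return iftgsm(embed, x_src, x_tgt, epsilon, alpha=epsilon, num_iter=1)
```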

3.2 Fixing the Vulnerability

Our goal is to improve the robustness of DNNs without modifying their architectures. The aforementioned attack fools a DNN by finding singular points in its internal representation space. Given the existence of this vulnerability, we find it possible to smooth the embedding space by jointly optimizing the original cross entropy and a large-margin triplet distance constraint as a regularization term.

With a source image x_s and a target image x_t, consider the triplet (x_s, x_s + δ, x_t), where δ is the aforementioned perturbation. Ideally, for all triplets generated from the training set, we would like the following constraint to be satisfied:

‖ e(x_s + δ) − e(x_s) ‖₂² + m < ‖ e(x_s + δ) − e(x_t) ‖₂².    (8)

However, due to the first optimization step in objective (3), the actual situation at a certain moment during training may be

‖ e(x_s + δ) − e(x_s) ‖₂² + m ≥ ‖ e(x_s + δ) − e(x_t) ‖₂².    (9)

Therefore, vulnerability exploitation and fixing constitute the two optimization steps of the adversarial learning, which strive to attack each other but together gradually improve the robustness of DNNs. Formally, we define the margin-based triplet embedding regularization (MTER) as

R_MTER = max( 0, ‖ e(x_s + δ) − e(x_s) ‖₂² − ‖ e(x_s + δ) − e(x_t) ‖₂² + m ),    (10)

where δ is obtained and updated by objective (3) and m is the margin. In practice, we apply the vulnerability exploitation and fixing in an iterative way, as precisely described in Algorithm 1. The margin m requires the similarity between the source image and the perturbed image to be much higher than that between the perturbed image and the target image; it is chosen based on the training dataset and the model capacity. We discuss this parameter in the ablation study in Section 4.3.
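Combining the cross entropy with the MTER term of Eq. (10) gives the joint training objective used in Algorithm 1. The sketch below reflects our reading of that objective, not the authors' code; `logits` and `embed` denote the classification head and the L2-normalized embedding of the same network, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def mter_objective(logits, embed, x_src, y_src, x_adv, x_tgt, margin):
    # Original classification objective on the clean source batch.
    ce = F.cross_entropy(logits(x_src), y_src)
    # Margin-based triplet term of Eq. (10): pull the perturbed image toward its
    # source embedding and push it away from the target embedding by the margin.
    f_src, f_adv, f_tgt = embed(x_src), embed(x_adv), embed(x_tgt)
    d_pos = (f_adv - f_src).pow(2).sum(dim=1)
    d_neg = (f_adv - f_tgt).pow(2).sum(dim=1)
    triplet = torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
    return ce + triplet
```

In Algorithm 1, `x_adv` would be produced by IFTGSM against the current model before each parameter update.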

4 Experiment

4.1 Experiment on Simple Image Classification

In this section, we first analyze the effect of the margin-based triplet embedding regularization (MTER) method on MNIST [16], a simple image classification task. We train ResNet [17] models using original training loss functions, adversarial training, and our MTER method, respectively. We test the different models assuming that the adversary knows the classification algorithm, model architecture and parameters, since a model's reliability is best demonstrated by robustness in the white-box setting.

We first give a brief description of the adversarial training method [24] and the attack methods FGSM [13], BIM [23], FTGSM [23] and ITGSM [23], which we test and compare against our method.

Fast gradient sign method (FGSM) [13] generates adversarial examples by perturbing inputs along the sign of the gradient of the original loss function with respect to the input image x:

x^adv = x + ε · sign( ∇_x J(θ, x, y) ),    (11)

where J(θ, x, y) gives the cross entropy between the prediction and the label y, and ε limits the maximum deviation of the perturbation.

Basic iterative method (BIM) [23] is a modification of FGSM [13] that applies it multiple times with a small step size:

x₀^adv = x,   x_{N+1}^adv = Clip_{x, ε}{ x_N^adv + α · sign( ∇_x J(θ, x_N^adv, y) ) },    (12)

where α is the step size, the clipping function is that of equation (6), and N denotes the number of iterations.

Compared with BIM [23], the iterative target gradient sign method (ITGSM) [23] leads the model to misclassify an image as a chosen target category:

x₀^adv = x,   x_{N+1}^adv = Clip_{x, ε}{ x_N^adv − α · sign( ∇_x J(θ, x_N^adv, y_target) ) },    (13)

where y_target is the target label we would like the model to predict. The targeted attack can also be launched in a fast style, referred to as the fast target gradient sign method (FTGSM) [23]:

x^adv = x − ε · sign( ∇_x J(θ, x, y_target) ).    (14)
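For concreteness, the two iterative label attacks can be sketched in the same conventions as the FGSM snippet of Section 2.1: BIM ascends the loss on the true label, while ITGSM descends the loss on a chosen target label (the least-likely class in our experiments). The step size `alpha` and iteration count `num_iter` are illustrative names.

```python
import torch
import torch.nn.functional as F

def iterative_label_attack(model, x, label, epsilon, alpha, num_iter, targeted):
    x = x.detach()
    x_adv = x.clone()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        grad, = torch.autograd.grad(loss, x_adv)
        # Ascend the loss for an untargeted attack, descend it for a targeted one.
        step = -alpha * grad.sign() if targeted else alpha * grad.sign()
        # Take the sign step, then clip into the epsilon-ball and into [0, 1].
        x_adv = (x + (x_adv.detach() + step - x).clamp(-epsilon, epsilon)).clamp(0.0, 1.0)
    return x_adv.detach()

# BIM:   iterative_label_attack(model, x, y_true, eps, alpha, T, targeted=False)
# ITGSM: y_ll = model(x).argmin(dim=1)   # least-likely class as the target label
#        iterative_label_attack(model, x, y_ll, eps, alpha, T, targeted=True)
```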

We use ResNet-18 [17] for training. The models are trained from scratch, without data augmentation. A fixed perturbation bound ε is applied for MNIST [16]. In the ITGSM [23] attack, the least-likely class is used as the target class. We use two types of original loss functions, Softmax and ArcFace [9]; the latter is a large-margin loss first used in deep face recognition. The feature scale is set to 10 and the angular margin to 0.4 for ArcFace [9]. We improve the robustness of the two original losses using adversarial training [24] and our method, respectively. The margin m in our MTER method is set to 0.2. We follow the adversarial training implementation of Kurakin et al. [24], which increases robustness by replacing half of the clean examples in each mini-batch with their adversarial counterparts during training. More specifically, we generate adversarial examples using FGSM [13] perturbations with respect to predicted rather than true labels, following [32, 36], to prevent "label leaking" [24], where the model tends to learn to classify adversarial examples more accurately than regular examples. The relative weight of adversarial examples in the loss is set to 0.3, following [24].
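A simplified sketch of this adversarial training baseline is given below; the exact loss weighting in [24] is slightly more involved, and the `fgsm` helper from Section 2.1 and all other names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def adversarial_training_loss(model, x, y, epsilon, adv_weight=0.3):
    half = x.size(0) // 2
    # Craft FGSM examples against the model's *predicted* labels (to avoid label
    # leaking), then replace half of the clean mini-batch with them.
    with torch.no_grad():
        y_pred = model(x[:half]).argmax(dim=1)
    x_adv = fgsm(model, x[:half], y_pred, epsilon)   # fgsm sketch from Section 2.1
    clean_loss = F.cross_entropy(model(x[half:]), y[half:])
    adv_loss = F.cross_entropy(model(x_adv), y[:half])
    return (1.0 - adv_weight) * clean_loss + adv_weight * adv_loss
```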

The results are shown in Table 1. As shown in the table, even though adversarial training is done with predicted labels, the label leaking phenomenon [24] still occurs. Our MTER method improves the robustness of the original models with both loss functions under FGSM [13], BIM [23] and ITGSM [23] attacks. For models trained with Softmax, our method sacrifices a little performance on clean images, while for models trained with ArcFace [9], a large-margin loss function, it even improves the accuracy on clean images. Besides, it outperforms the adversarial training method under BIM [23] and ITGSM [23] attacks. Even though, unlike adversarial training, we did not use these types of adversarial examples for data augmentation during training, our method still gains robustness against unknown attacks. This indicates that our method can improve the robustness of models on simple image classification tasks such as MNIST [16].

Method Clean FGSM [13] BIM [23] ITGSM [23]
Softmax loss 99.6 10.4 0.0 6.0
Adversarial training 99.6 99.9 3.1 47.2
Softmax+MTER (ours) 99.5 96.8 98.7 95.1
ArcFace Loss 99.5 28.6 1.7 24.7
Adversarial training 99.5 99.5 30.5 71.3
ArcFace Loss+MTER (ours) 99.6 96.6 98.0 95.3
Table 1: MNIST [16] test results (%) for ResNet-18 [17] models (perturbation bound ε at test time). The higher the accuracy, the more robust the target model.

4.2 Embedding Space Visualization

MNIST [16] is used for embedding space visualization. We modify ResNet-18 [17] by replacing the original fully connected layer with two fully connected layers and setting the embedding dimension to 2. We retrain networks on MNIST [16] with Softmax and with Softmax combined with our MTER method, respectively, and use the clean examples of the test set for visualization. In addition, for each class, we randomly choose a test sample and generate adversarial examples from it using BIM [23].
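A minimal sketch of this visualization setup is shown below, assuming a model whose embedding has been reduced to 2 dimensions; `embed`, `test_loader`, `adv_images` and `adv_labels` are placeholders for the quantities described above.

```python
import matplotlib.pyplot as plt
import torch

def plot_embeddings(embed, test_loader, adv_images, adv_labels):
    with torch.no_grad():
        for x, y in test_loader:                       # clean test examples
            f = embed(x)
            plt.scatter(f[:, 0].numpy(), f[:, 1].numpy(),
                        c=y.numpy(), s=2, cmap="tab10", alpha=0.3)
        f_adv = embed(adv_images)                      # BIM adversarial examples
        plt.scatter(f_adv[:, 0].numpy(), f_adv[:, 1].numpy(),
                    c=adv_labels.numpy(), marker="^", cmap="tab10",
                    edgecolors="k")
    plt.show()
```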

The results are shown in Figure 2. Circles represent the clean examples of the MNIST [16] test set; triangles represent BIM [23] adversarial examples, generated over a range of perturbation strengths for one sample image per class. We can observe that with our MTER method, the adversarial examples stay close to the clean examples and are distributed within the original class region of the embedding space. The inter-class margin is enlarged and the intra-class smoothness is improved, which underpins the robustness of the model.

Figure 2: Embedding space visualization on ResNet-18 [17] modified to have a 2-dimensional embedding. Models are trained on MNIST [16] with Softmax and with Softmax combined with the MTER method, respectively. Circles represent the clean examples of the MNIST [16] test set; triangles represent BIM [23] adversarial examples, generated over a range of perturbation strengths for one sample image per class.

4.3 Analysis on Margin

We use MNIST [16] to conduct an adversarial study [44, 13, 32, 53] to further analyze our MTER method. The only hyperparameter in our method is the margin m, so we would like to explore its influence and give advice on how to choose it in different settings.

First, we train LeNet-5 [25] and ResNet-18 [17] while varying the margin m over a range of values. We then test these models using the aforementioned attack methods, FGSM [13], BIM [23] and ITGSM [23]. The results are illustrated in Figure 3, which reveals a significant difference between the two types of models. Both obtain good test accuracy on MNIST [16]: 99.2% for LeNet-5 [25] and 99.6% for ResNet-18 [17]. However, for LeNet-5 [25], as the margin m increases, robustness to the different attacks improves gradually while accuracy on clean images decreases slightly. For ResNet-18 [17], both the accuracy on the clean test set and the robustness against adversarial attacks reach a relatively high level and remain unchanged as the margin increases.

Figure 3: LeNet-5 [25] and ResNet-18 [17] trained using MTER with varying margin m. Both types of models obtain good test accuracy on MNIST [16]. However, for LeNet-5 [25], the robustness to attacks improves gradually as the margin m increases, while for ResNet-18 [17] the robustness reaches a relatively high level and remains unchanged as the margin increases.
Figure 4: Accuracy on clean images and on adversarial examples generated by FGSM [13], BIM [23] and ITGSM [23] for models with different architectures. "L5", "R6", "R8", "R10", "R18", "R34" and "R50" on the x-axis denote LeNet-5 [25], ResNet-6 [17], ResNet-8 [17], ResNet-10 [17], ResNet-18 [17], ResNet-34 [17] and ResNet-50 [17]. For models trained using Softmax combined with MTER (small fixed margin), the accuracy on clean images and on the three types of adversarial examples increases as the model size grows (from LeNet-5 [25] to ResNet-10 [17]) and remains relatively stable once the model size reaches a certain level (after ResNet-10 [17]).

Furthermore, we fix a relatively small margin m, because we would like to observe the performance of different networks under a relaxed state of our method. We train models with Softmax and with our MTER method, respectively, using different architectures: LeNet-5 [25], ResNet-6 [17], ResNet-8 [17], ResNet-10 [17], ResNet-18 [17], ResNet-34 [17] and ResNet-50 [17]. We again test these trained models under the three aforementioned attacks. The results are shown in Figure 4. For models trained using Softmax, the accuracy on clean images increases as the model size grows (from LeNet-5 [25] to ResNet-50 [17]). For models trained using Softmax combined with MTER, the accuracy on clean images and on the three types of adversarial examples increases as the model size grows (from LeNet-5 [25] to ResNet-10 [17]) and remains relatively stable once the model size reaches a certain level (after ResNet-10 [17]).

We infer that a specific classification task needs a certain amount of model capacity to fit both the clean set and its augmentation set of adversarial examples. It is easy for our method to push a large model to learn both the clean images and the adversarial examples under a relatively relaxed state (small m), while this does not work for a relatively small model with less capacity. Therefore, if a small model is used and robustness is still a concern, we recommend increasing the margin to push the "lazy" model to fight adversarial examples, at the cost of a little performance on the original dataset.

4.4 Black-box Attack Analysis

We also use MNIST [16] to analyze adversarial attacks and defenses under black-box settings. We report the black-box attack accuracy of adversarial examples generated from a source network and tested on another target network. In our experiment, both the source and target networks are trained with adversarial learning methods and have different architectures. Specifically, we use four models whose architectures are LeNet-5 [25] or ResNet-18 [17] and whose training methods are adversarial training [24] or our MTER method. The adversarial examples from the source models are generated by FGSM [13] or BIM [23] with a fixed perturbation bound ε.

The results for FGSM [13] and BIM [23] are shown in Table 2 and Table 3, respectively. The rows and columns denote the source and target models, respectively. LeNet-5 [25] is denoted as "L5" and ResNet-18 [17] as "R18". "ADV" is shorthand for adversarial training [24]. "MTER-R18", for example, means the ResNet-18 [17] model is trained with Softmax combined with our MTER method. The higher the accuracy, the more robust the target model.

As shown in Tables 2 and 3, for the same source model and the same target architecture, the accuracy of target models trained with the MTER method is higher than that of models trained with adversarial training [24]. This indicates that our MTER method shows better robustness under the black-box attack scenario on both FGSM [13] and BIM [23] adversarial examples, even though adversarial training used FGSM [13] examples for augmentation. Besides, we find that the ResNet-18 [17] models are more robust than the LeNet-5 [25] models, while adversarial examples generated from LeNet-5 [25] are more aggressive and transfer better than those from ResNet-18 [17].

Source \ Target ADV-L5 MTER-L5 ADV-R18 MTER-R18
ADV-L5 – 92.0 95.6 97.3
MTER-L5 85.3 – 94.5 97.7
ADV-R18 89.8 94.9 – 97.4
MTER-R18 45.2 94.2 94.1 –
Table 2: MNIST [16] test results (%) on FGSM [13] adversarial examples under black-box settings. Rows and columns denote the source and target models, respectively. LeNet-5 [25] is denoted as "L5" and ResNet-18 [17] as "R18"; "ADV" is shorthand for adversarial training [24]. The higher the accuracy, the more robust the target model.
Source \ Target ADV-L5 MTER-L5 ADV-R18 MTER-R18
ADV-L5 – 90.1 89.1 97.3
MTER-L5 79.3 – 92.8 97.5
ADV-R18 88.2 94.9 – 97.3
MTER-R18 84.0 94.6 95.6 –
Table 3: MNIST [16] test results (%) on BIM [23] adversarial examples under black-box settings. Rows and columns denote the source and target models, respectively. LeNet-5 [25] is denoted as "L5" and ResNet-18 [17] as "R18"; "ADV" is shorthand for adversarial training [24]. The higher the accuracy, the more robust the target model.

4.5 Experiment on Deep Face Recognition

In a deep face recognition system, an adversary may try to disguise a face as that of an authorized user. We simulate this scenario using state-of-the-art face recognition models and test our MTER method. Deep face recognition is an open-set problem: the training identities and the test identities are usually different. We therefore do not classify an identity directly by an end-to-end classification probability, but instead use a DNN as a deep feature extractor and compare deep features to distinguish faces.

Training datasets. In the experiment, the training datasets are CASIA-WebFace [54], VGGFace2 [4] and MS1M-IBUG [15]. The CASIA-WebFace [54] dataset is the first widely used large-scale training dataset in deep face recognition, containing 0.49M images of 10,575 celebrities. VGGFace2 [4] is a large-scale dataset containing 3.31M images of 9,131 celebrities; its images are diverse and abundant, with large variations in pose, age, illumination, ethnicity and profession. MS1M-IBUG [15] (referred to as MS1M [15]) is a refined version of the MS-Celeb-1M dataset [15], which is publicly available and widely used. The original MS-Celeb-1M dataset [15] contains about 100k celebrities with 10M images. MS1M [15] was refined by Deng et al. [9] to reduce the noise and finally contains 3.8M images of 85,164 celebrities.

Network settings. For all the embedding networks, we adopt ResNet-50 [17], but make changes as in [9], applying the "BN [21]-Dropout [42]-FC-BN" structure to obtain the final 512-dimensional embedding feature. For image preprocessing, the images are cropped and aligned to the normalized face following [9]. In the training process, the original models are supervised by ArcFace [9], an effective loss function that has been widely adopted in industry. The feature scale is set to 64, and the angular margin is set to 0.5 for CASIA-WebFace [54] and MS1M [15] and to 0.3 for VGGFace2 [4], following the original paper [9]. Different from the object classification experiment, the face models are fine-tuned with ArcFace combined with our MTER method. The face models fine-tuned with the MTER method converge quickly, usually in no more than 3 epochs. We set the margin m to 0.2, 1.2 and 1.4 for CASIA-WebFace [54], VGGFace2 [4] and MS1M [15], respectively.
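For concreteness, a sketch of such an embedding head is shown below. The input dimensionality, dropout rate and the use of a flattened backbone feature are assumptions for illustration, following the general recipe of [9] rather than the exact configuration.

```python
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingHead(nn.Module):
    """BN-Dropout-FC-BN head mapping a flattened backbone feature to a 512-d embedding."""
    def __init__(self, in_features, embed_dim=512, p=0.4):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(in_features)
        self.dropout = nn.Dropout(p)
        self.fc = nn.Linear(in_features, embed_dim)
        self.bn2 = nn.BatchNorm1d(embed_dim)

    def forward(self, x):                          # x: [N, in_features]
        x = self.bn2(self.fc(self.dropout(self.bn1(x))))
        return F.normalize(x, dim=1)               # unit-length embedding, as in Sec. 3.1
```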

Figure 5: The first row shows the five target identities; the second row shows five attackers randomly selected from all 13,233 attackers.
Figure 6: ROC curves of different models on LFW [20]. We define the distance threshold of a network for attacking (or for distinguishing a positive from a negative pair) so as to have a low false accept rate (FAR) on LFW.
Training Method Training Set FFTGSM (10) IFTGSM (5) IFTGSM (10) ITGSM [23] (10) LFW YTF
ArcFace [9] CASIA-WebFace [54] 99.3 100.0 100.0 98.3 99.5 95.6
ArcFace [9]+adv. CASIA-WebFace [54] 5.1 ( 94.2) 5.7 ( 94.3) 49.5 ( 50.5) 0.8 ( 97.5) 99.4 94.7
ArcFace [9]+MTER CASIA-WebFace [54] 2.0 ( 97.3) 3.5 ( 96.5) 27.4 ( 72.6) 0.1 ( 98.2) 99.5 94.8
ArcFace [9] VGGFace2 [4] 98.3 100.0 100.0 100.0 99.7 97.7
ArcFace [9]+adv. VGGFace2 [4] 3.1 ( 95.2) 5.5 ( 94.5) 63.8 ( 36.2) 0.1 ( 99.9) 99.5 97.2
ArcFace [9]+MTER VGGFace2 [4] 4.6 ( 93.7) 6.1 ( 93.9) 35.6 ( 64.4) 0.1 ( 99.9) 99.6 97.5
ArcFace [9] MS1M [15] 99.8 100.0 100.0 69.2 99.7 97.0
ArcFace [9]+adv. MS1M [15] 45.4 ( 54.4) 20.1 ( 79.9) 62.6 ( 37.4) 0.1 ( 69.1) 99.6 96.2
ArcFace [9]+MTER MS1M [15] 4.1 ( 95.7) 7.9 ( 92.1) 61.4 ( 38.6) 0.0 ( 69.2) 99.8 96.9
ArcFace [9]+MTER MS1M [15]+VGGFace2 [4] 9.6 ( 90.2) 6.2 ( 93.8) 19.5 ( 80.5) 0.1 ( 69.1) 99.5 96.8
Table 4: The average hit rate of models trained on CASIA-WebFace [54], VGGFace2 [4] and MS1M [15], supervised by the ArcFace loss [9], ArcFace [9]+adv., and ArcFace [9]+MTER, respectively. Attacks are launched from attackers to disguise targets at the feature level using FFTGSM and IFTGSM, and at the label level using ITGSM [23]. Numbers in parentheses denote the reduction in hit rate relative to the corresponding original model. The lower the hit rate, the stronger the robustness of the model.

Recognition performance. We test the recognition performance of all the models on LFW [20] and YTF [50]. LFW [20] contains 13,233 face images of 5,749 different identities. We follow the unrestricted with labeled outside data protocol and test on 3,000 positive (same-identity) and 3,000 negative (different-identity) pairs. YTF [50] is a database of face videos collected from YouTube, consisting of 3,425 videos of 1,595 different people. Each video varies from 48 to 6,070 frames, with an average length of 181.3 frames. We follow the unrestricted with labeled outside data protocol on all the test datasets.

Robustness performance. To simulate the face disguise scenario, we select five people as target identities, as shown in the first row of Figure 5. We then use the 13,233 face images in LFW [20] as attackers to disguise each of the five targets, which constructs a 13,233 × 5 attack matrix to simulate random attacks. We test the robustness under two attack settings: (1) feature attacks and (2) label attacks. The feature attack is more practical in face recognition, while we use the label attack to demonstrate the effectiveness of our method for label-level defense in the closed-set setting, which is rare in deep face recognition.

First we define the feature attack setting. The attacks are launched from attackers to disguise targets: the attack goal is to bring the face embedding representations of attackers closer to those of the targets than the distance threshold of a face recognition system, while the defense goal is to keep this distance larger than the threshold. Next, we define the threshold of a DNN in our simulation. Using the positive and negative pairs of LFW [20], we compute the Euclidean distances of normalized deep features to obtain ROC curves, as shown in Figure 6, and identify distance thresholds for judging whether a pair is positive or negative. Since we would like to compare the adversarial robustness of the trained models as in real-world applications, we define the distance threshold for attacking (or for distinguishing a positive from a negative pair) so as to have a low false accept rate (FAR). We generate attacks (3) using IFTGSM (5) and FFTGSM (7). Finally, we define the evaluation criterion: an attack is counted as a hit if the embedding distance between the attacker and the target is lower than the threshold. We use the average hit rate over the five targets to report the robustness of the trained models; the lower the average hit rate, the stronger the robustness of the model.
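A sketch of this hit-rate evaluation is given below; `embed`, `attackers_adv`, `target` and `threshold` are placeholder names for the quantities defined above.

```python
import torch

def hit_rate(embed, attackers_adv, target, threshold):
    # An attack "hits" when the Euclidean distance between the normalized
    # embeddings of the perturbed attacker and the target falls below the
    # verification threshold chosen at a low FAR on LFW.
    with torch.no_grad():
        f_adv = embed(attackers_adv)                   # [N, 512], unit norm
        f_tgt = embed(target.unsqueeze(0))             # [1, 512], unit norm
        dists = (f_adv - f_tgt).norm(dim=1)            # Euclidean distances
    return (dists < threshold).float().mean().item()
```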

Finally, we introduce the label attack setting. Although the training identities differ from the test identities, we can use the predicted identities of the targets as their labels and let the attackers launch attacks toward these predicted labels. A hit is then defined as the case where the predicted label of an attacker is the same as that of the target. We again use the average hit rate over the five targets to report robustness. ITGSM [23] is used to generate the label attacks. We do not report results for FTGSM [23] because we find this method often fails to attack face models.

Results. The defense performance against feature- and label-level attacks is listed in Table 4. The hit rates of the original models are close to 100 percent in the setting where 13,233 different attackers attempt to disguise themselves as the targets. This indicates that state-of-the-art face models are indeed highly vulnerable to adversarial attacks, and an arbitrary attacker has a high probability of disguising as another identity. With our MTER method, the hit rates decrease significantly, which indicates that our method improves the robustness of state-of-the-art face models in both open-set and closed-set settings and prevents face disguise feature attacks to a certain degree. Besides, we find that the robustness improvement of our method on MS1M [15] is less significant than that on CASIA-WebFace [54] and VGGFace2 [4]. Therefore, to obtain better robustness, we recommend fine-tuning a model trained on a dataset with many identities using a large dataset with fewer identities, e.g., fine-tuning the original model trained on MS1M [15] using VGGFace2 [4]. To further evaluate our method, we also compare with a strong baseline obtained by fine-tuning the original models while incorporating adversarial examples generated using IFTGSM (5). The results show that our method further benefits from the additional embedding regularization, which indicates that while incorporating adversarial examples in training improves robustness, how to optimize with them is also crucial.

The face recognition performance of the original and robust models is shown in Table 5. We also list state-of-the-art models from the face recognition community. We observe that the accuracy of the robust models on LFW [20] and YTF [50] decreases slightly, which indicates that we may sacrifice a certain degree of recognition performance for the improvement in adversarial robustness.

Training Method Training Set LFW YTF
DeepFace [45] 4M 97.35 91.4
FaceNet [38] 200M 99.63 95.1
VGG Face [35] 2.6M 98.95 97.3
DeepID2+ [43] 0.3M 99.47 93.2
Center Face [49] 0.7M 99.28 94.9
Noisy Softmax [6] WebFace+ 99.18 94.88
Triplet Loss [38] WebFace [54] 98.70 93.4
L-Softmax Loss [28] WebFace [54] 99.10 94.0
Softmax+Center Loss [49] WebFace [54] 99.05 94.4
SphereFace [27] WebFace [54] 99.42 95.0
CosFace [47] WebFace [54] 99.33 96.1
ArcFace [9] MS1MV2 (5.8M) 99.83 98.02
ArcFace [9] WebFace [54] 99.5 95.6
ArcFace [9]+adv. WebFace [54] 99.4 94.7
ArcFace [9]+MTER WebFace [54] 99.5 94.8
ArcFace [9] VGGFace2 [4] 99.7 97.7
ArcFace [9]+adv. VGGFace2 [4] 99.5 97.2
ArcFace [9]+MTER VGGFace2 [4] 99.6 97.5
ArcFace [9] MS1M [15] 99.7 97.0
ArcFace [9]+adv. MS1M [15] 99.6 96.2
ArcFace [9]+MTER MS1M [15] 99.8 96.9
ArcFace [9]+MTER MS1M [15]+VGGFace2 [4] 99.5 96.8
Table 5: Accuracy on LFW [20] and YTF [50]. State-of-the-art models from the face recognition community are listed in the upper part of the table; the remaining rows are our models used in the face disguise experiment.

5 Conclusion

We have proposed a margin-based triplet embedding regularization (MTER) method to improve the robustness of DNNs. Experiments on MNIST [16], CASIA-WebFace [54], VGGFace2 [4] and MS1M [15] have demonstrated the effectiveness of our method in simple object classification and deep face recognition.

6 Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61573068 and 61871052.

References

  • [1] J. Ba and R. Caruana (2014) Do deep nets really need to be deep?. In NIPS, Cited by: §2.2.
  • [2] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba (2016) End to end learning for self-driving cars. arXiv:1604.07316. Cited by: §1.
  • [3] A. Rozsa, M. Günther, and T. E. Boult (2017) LOTS about attacking deep features. In IJCB, Cited by: §1.
  • [4] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman (2018) VGGFace2: a dataset for recognising faces across pose and age. In International Conference on Automatic Face and Gesture Recognition, Cited by: §1, §4.5, §4.5, §4.5, Table 4, Table 5, §5.
  • [5] N. Carlini and D. Wagner (2017) Adversarial examples are not easily detected: bypassing ten detection methods. In ACM Workshop, Cited by: §1.
  • [6] B. Chen, W. Deng, and J. Du (2017) Noisy softmax: improving the generalization ability of dcnn via postponing the early softmax saturation. In CVPR, Cited by: Table 5.
  • [7] Y. Sun, Y. Chen, X. Wang, and X. Tang (2014) Deep learning face representation by joint identification-verification. In NIPS, Cited by: §1.
  • [8] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy (2018) End-to-end driving via conditional imitation learning. In ICRA, Cited by: §1.
  • [9] J. Deng, J. Guo, and S. Zafeiriou (2019) ArcFace: additive angular margin loss for deep face recognition. CVPR. Cited by: §1, §4.1, §4.1, §4.5, §4.5, Table 4, Table 5.
  • [10] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. In CVPR, pp. 9185–9193. Cited by: §1, §2.1.
  • [11] Y. Dong, H. Su, J. Zhu, and F. Bao (2017) Towards interpretable deep neural networks by leveraging adversarial examples. arXiv:1708.05493. Cited by: §2.2.
  • [12] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song (2018) Robust physical-world attacks on deep learning visual classification. In CVPR, Cited by: §1, §2.1.
  • [13] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In ICLR, Cited by: §2.1, §2.1, §2.2, Figure 4, §4.1, §4.1, §4.1, §4.1, §4.1, §4.3, §4.3, §4.4, §4.4, §4.4, Table 1, Table 2.
  • [14] G. Goswami (2018) Unravelling robustness of deep learning based face recognition against adversarial attacks. In AAAI, Cited by: §2.1.
  • [15] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao (2016) MS-celeb-1m: a dataset and benchmark for large-scale face recognition. In ECCV, Cited by: §1, §4.5, §4.5, §4.5, Table 4, Table 5, §5.
  • [16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. In Proceedings of the IEEE, Cited by: §1, Figure 2, Figure 3, §4.1, §4.1, §4.1, §4.2, §4.2, §4.3, §4.3, §4.4, Table 1, Table 2, Table 3, §5.
  • [17] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, Cited by: §1, Figure 2, Figure 3, Figure 4, §4.1, §4.1, §4.2, §4.3, §4.3, §4.4, §4.4, §4.4, §4.5, Table 1, Table 2, Table 3.
  • [18] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. arXiv:1503.02531. Cited by: §2.2.
  • [19] J. Hu, L. Shen, and G. Sun (2017) Squeeze-and-excitation networks. arXiv:1709.01507. Cited by: §1.
  • [20] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller (2007-10) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical report Technical Report 07-49, University of Massachusetts, Amherst. Cited by: Figure 6, §4.5, §4.5, §4.5, §4.5, Table 5.
  • [21] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167. Cited by: §4.5.
  • [22] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In NIPS, Cited by: §1.
  • [23] A. Kurakin, I. J. Goodfellow, and S. Bengio (2017) Adversarial examples in the physical world. In ICLR Workshop, Cited by: §1, §2.1, §2.1, Figure 2, Figure 4, §4.1, §4.1, §4.1, §4.1, §4.1, §4.2, §4.2, §4.3, §4.4, §4.4, §4.4, §4.5, Table 1, Table 3, Table 4.
  • [24] A. Kurakin, I. J. Goodfellow, and S. Bengio (2017) Adversarial machine learning at scale. In ICLR, Cited by: §1, §2.2, §4.1, §4.1, §4.1, §4.4, §4.4, §4.4, Table 2, Table 3.
  • [25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998-11) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: Figure 3, Figure 4, §4.3, §4.3, §4.4, §4.4, §4.4, Table 2, Table 3.
  • [26] S. Li and W. Deng (2018) Deep facial expression recognition: A survey. arXiv:1804.08348. Cited by: §1.
  • [27] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song (2017) SphereFace: deep hypersphere embedding for face recognition. In CVPR, Cited by: §1, Table 5.
  • [28] W. Liu, Y. Wen, Z. Yu, and M. Yang (2016) Large-margin softmax loss for convolutional neural networks.. In ICML, Cited by: Table 5.
  • [29] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff (2017) On detecting adversarial perturbations. In ICLR, Cited by: §1, §2.2.
  • [30] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard (2017) Universal adversarial perturbations. In CVPR, Cited by: §2.1.
  • [31] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) DeepFool: a simple and accurate method to fool deep neural networks. In CVPR, Cited by: §2.1.
  • [32] T. Na, J. H. Ko, and S. Mukhopadhyay (2018) Cascade adversarial machine learning regularized with a unified embedding. In ICLR, Cited by: §2.2, §2.2, §4.1, §4.3.
  • [33] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2017) Practical black-box attacks against deep learning systems using adversarial examples. In ASIACCS, Cited by: §1, §2.1.
  • [34] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In SP, Cited by: §2.2.
  • [35] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al. (2015) Deep face recognition.. In BMVC, Cited by: Table 5.
  • [36] A. S. Ross and F. Doshi-Velez (2018) Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI, Cited by: §4.1.
  • [37] S. Sabour, Y. Cao, F. Faghri, and D. J. Fleet (2016) Adversarial manipulation of deep representations. In ICLR, Cited by: §1, §3.1.
  • [38] F. Schroff, D. Kalenichenko, and J. Philbin (2015) FaceNet: a unified embedding for face recognition and clustering. In CVPR, Cited by: §1, Table 5.
  • [39] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In CCS, Cited by: §2.1.
  • [40] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. Cited by: §1.
  • [41] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman (2018) PixelDefend: leveraging generative models to understand and defend against adversarial examples. In ICLR, Cited by: §1, §1, §2.2.
  • [42] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §4.5.
  • [43] Y. Sun, X. Wang, and X. Tang (2015) Deeply learned face representations are sparse, selective, and robust. In CVPR, Cited by: Table 5.
  • [44] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In ICLR, Cited by: §1, §1, §2.1, §4.3.
  • [45] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf (2014) DeepFace: closing the gap to human-level performance in face verification. In CVPR, Cited by: Table 5.
  • [46] F. Tramer, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel (2018) Ensemble adversarial training: attacks and defenses. In ICLR, Cited by: §1, §2.2.
  • [47] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu (2018) CosFace: large margin cosine loss for deep face recognition. In CVPR, Cited by: §1, Table 5.
  • [48] M. Wang and W. Deng (2018) Deep face recognition: A survey. arXiv:1804.06655. Cited by: §1.
  • [49] Y. Wen, K. Zhang, Z. Li, and Y. Qiao (2016) A discriminative feature learning approach for deep face recognition. In ECCV, Cited by: §1, Table 5.
  • [50] L. Wolf, T. Hassner, and I. Maoz (2011) Face recognition in unconstrained videos with matched background similarity. In CVPR, Cited by: §4.5, §4.5, Table 5.
  • [51] L. Wu, Z. Zhu, C. Tai, and W. E (2018) Understanding and enhancing the transferability of adversarial examples. arXiv:1802.09707. Cited by: §2.1.
  • [52] W. Xu, D. Evans, and Y. Qi (2018) Feature squeezing: detecting adversarial examples in deep neural networks. In NDSS, Cited by: §1, §2.2.
  • [53] Z. Yan, Y. Guo, and C. Zhang (2018) Deep defense: training dnns with improved adversarial robustness. In NIPS, Cited by: §4.3.
  • [54] D. Yi, Z. Lei, S. Liao, and S. Z. Li (2014) Learning face representation from scratch. arXiv:1411.7923. Cited by: §1, §4.5, §4.5, §4.5, Table 4, Table 5, §5.