Meta Anti-spoofing: Learning to Learn in Face Anti-spoofing

04/29/2019 · Chenxu Zhao et al.

Face anti-spoofing is crucial to the security of face recognition systems. Previously, most methods formulate face anti-spoofing as a supervised learning problem to detect various predefined presentation attacks (PA). However, new attack methods keep evolving and producing new forms of spoofing faces that compromise existing detectors. This forces researchers to collect large numbers of samples to train classifiers for each new attack, which is costly and leaves newly evolved attacks represented by only small-scale samples. Alternatively, we define face anti-spoofing as a few-shot learning problem with evolving new attacks and propose a novel face anti-spoofing approach via meta-learning named Meta Face Anti-spoofing (Meta-FAS). Meta-FAS addresses the above-mentioned problems by training classifiers how to learn to detect spoofing faces from few examples. To assess the effectiveness of the proposed approach, we propose a series of evaluation benchmarks based on public datasets (e.g., OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, MSU-MFSD, 3D-MAD, and CASIA-SURF), and the proposed approach shows superior performance over the compared methods.


1 Introduction

Figure 1: The few-shot face anti-spoofing issue (best viewed in color). A deep anti-spoofing model detects newly evolved attacks (left block) by fine-tuning the original model (red pipeline) on few-shot examples (central yellow block). In contrast, the meta face anti-spoofing model learns how to detect newly evolved attacks via a meta-learner (green pipeline).

Face anti-spoofing (FAS) has become an indispensable component in many face recognition systems. A great number of FAS methods [5, 18, 28, 31] have been proposed to discriminate between living and spoofing faces. Considering only texture features, previous approaches fall into two categories. The first is traditional methods that train shallow detection classifiers on hand-crafted features, e.g., LBP [10], SIFT [34], and SURF [6]. The second is deep learning methods with binary or depth supervision [28, 31, 18, 48, 49, 39, 27], which provide an alternative way to learn discriminative representations for anti-spoofing in an end-to-end fashion. However, the above methods detect spoofing faces mainly based on prior experience with specific forms of presentation attacks (PAs), while new attack methods develop much faster than high-cost data collection can keep up. Therefore, a FAS model should be able to learn quickly and to accurately detect new forms of presentation scene and attack instruments (PSAI) (compared with "presentation attack", we add "scene" to denote the real face) with limited samples. From this viewpoint, the community faces the following issues:

  • A variety of application scenarios and unpredictable novel PAs keep evolving, most of which do not exist in the predefined classes.

  • To detect new attacks, existing methods need to collect sufficient samples to train detectors. However, the forms of attacks evolve very rapidly, making it infeasible to collect labeled data for every new attack.

  • Learning deep anti-spoofing models leads to limited performance in cross-testing configurations, since the models are well trained on existing attacks but prone to fail on new attacks with only few samples collected.

To overcome these challenges, we approach face anti-spoofing from a few-shot learning perspective. In contrast to conventional face anti-spoofing methods that focus on detecting existing forms of spoofing faces, we aim to train the face anti-spoofing model to acquire, via meta-learning, the capability of quickly learning the discrimination between living and spoofing faces. Such capability can be generalized to handle new attacks, as shown in Fig. 1. We propose evaluations along several dimensions to validate the superiority of our approach for the few-shot issue.

Single-domain Dimension. To assess the learning capability of detecting different PSAIs from a single domain (scenario) with few-shot samples.

Cross-domain Dimension. To assess the learning capability of detecting the PSAIs across different domains with few-shot samples.

Cross-modal Dimension. To assess the learning capability of detecting the PSAIs when multiple modalities are involved with few-shot samples.

Zero-shot Capability Dimension. We consider the zero-shot scenario a boundary condition of the few-shot learning issue in presentation attack detection (PAD): the anti-spoofing model should be capable not only of learning spoofing clues from few data, but also of detecting spoofing faces of never-seen PSAIs.

To validate the effectiveness of our approach, we propose benchmarks with meta-learning-style standard protocols on several conventional datasets. For the single-domain dimension, we propose MiniOULU as a few-shot benchmark derived from Oulu-NPU [7]. For the cross-domain dimension, we merge Oulu-NPU [7], SiW [27], Replay-Attack [8], CASIA-MFSD [51], MSU-MFSD [47], 3DMAD [14] and MSU-USSA [34] into MiniCross. For the cross-modal dimension, MiniSURF is derived from CASIA-SURF [40]. For the zero-shot capability dimension, we utilize SiW [27] in our experiments to evaluate the performance of spoofing models. The main contributions of this paper can be summarized as follows:

  • To the best of our knowledge, we are the first to define face anti-spoofing as a few-shot learning problem with evolving new attacks. We further consider the zero-shot scenario as a boundary condition of this issue.

  • We propose a novel meta-learning based approach – Meta-FAS – by imitating the process of few-shot scenarios to learn the capability of learning the discrimination between living and spoofing faces.

  • Three novel few-shot face anti-spoofing benchmarks are developed: MiniOULU for the single-domain, MiniCross for the cross-domain, and MiniSURF for the cross-modal. These benchmarks provide protocols for evaluation of both supervised learning based models and meta-learning based models.

  • Extensive experiments demonstrate that Meta-FAS achieves state-of-the-art results on all three few-shot anti-spoofing benchmarks as well as one general anti-spoofing benchmark.

2 Background

2.1 Few-shot Learning and Meta-learning

The few-shot learning problem [44, 41] has been widely studied. In this field, a few-shot learning task is commonly defined as an N-way K-shot task, which contains a Support set with K examples for each of the N ways (a way is equivalent to a class or category) and a Query set with several test examples. The few-shot learning process can be described as follows: the model first learns from the Support set, and is then tested on the Query set to evaluate its ability to learn from the Support set.
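The N-way K-shot episode construction described above can be sketched as follows (a minimal illustration; `images_by_class` and the function name are our own, not from the paper):

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, query_size=15, seed=None):
    """Build one N-way K-shot episode: a Support set with K examples per way
    and a Query set with `query_size` examples per way."""
    rng = random.Random(seed)
    ways = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(ways):
        picks = rng.sample(images_by_class[cls], k_shot + query_size)
        support += [(x, label) for x in picks[:k_shot]]   # learn from these
        query += [(x, label) for x in picks[k_shot:]]     # evaluate on these
    return support, query
```

The model first adapts on `support` and is then scored on `query`, which is exactly the learn-then-test cycle described above.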

Recently, a large body of work has shown that meta-learning [4, 3, 38, 17, 32, 13, 30, 26, 19, 36] is a successful approach to the few-shot learning problem. Meta-learning approaches generally train a meta-learner on a distribution of few-shot learning tasks so that the meta-learner generalizes and performs well on unseen few-shot learning tasks.

MAML [17] and Reptile [32] train the meta-learner to learn a suitable weight initialization; starting from the initialized weights and the Support set, the meta-learner updates itself accurately and then performs well on the Query set. Considering that the meta-learner should update itself more thoroughly, Meta-SGD [26] forces the meta-learner to learn not only the weight initialization but also a weight updater. The meta-learner can also perform few-shot learning well by memorizing the information of the Support set [30]. Besides, LLAML [19], which builds on MAML, uses a local Laplace approximation to model the task parameters. AML and RAML [36] utilize attention mechanisms [45, 43] and past knowledge to boost the meta-learner's ability.

2.2 Face Anti-spoofing

With the development of face recognition systems, face anti-spoofing has drawn increasing attention. In past research, there are two main kinds of methods to detect presentation attacks.

The first is traditional face anti-spoofing methods [10, 11, 29, 34, 6, 22, 35]. These methods usually extract hand-crafted features from the original images and train a binary classifier to discriminate between living and spoofing faces. In prior works, LBP [10, 11, 29], SIFT [34], SURF [6], HoG [22, 50], and DoG [35, 42] are commonly fed into classical classifiers, such as SVM and Random Forest. Since approaches with manually engineered features can only grasp human-defined spoofing clues, they are usually sensitive to various conditions, such as lighting, blur pattern, input camera devices, and presentation attack instruments, damaging the generality of face anti-spoofing.

The second is deep learning based face anti-spoofing methods [28, 31, 18, 15, 25, 33, 49]. As CNNs make great progress in many computer vision tasks, researchers in PAD have started to leverage CNNs to improve the performance of face anti-spoofing. Single-frame based methods [25, 33] fine-tune models pre-trained on ImageNet [12] and treat face anti-spoofing as a common binary classification problem. Multi-frame based methods take advantage of richer information to cover more spoofing clues in the temporal domain; for example, Shao et al. [39] focus on the movement of eyes. LSTM [48], 3D convolution [18], and optical flow [15] are also exploited in temporal-based face anti-spoofing. Most of these supervised learning methods consider face anti-spoofing as a binary classification problem and use a softmax loss during training. Recently, depth-based methods [2, 27, 46] have arisen and improved the generality of PAD. Atoum et al. [2] propose a two-stream CNN-based approach for face anti-spoofing, extracting patch-based features and using facial depth to learn the 3D reconstruction of faces. Liu et al. [27] propose face anti-spoofing with the help of auxiliary supervision, consisting of a depth map and rPPG signals. Wang et al. [46] exploit depth information in the temporal domain, integrating OFFB [46] and ConvGRU [9] modules to encode spatiotemporal information and detect abnormality in facial depth. However, these deep learning based methods, whether binary-supervised or depth-supervised, depend on a large amount of training data and lose effectiveness on newly evolved attacks.

3 Proposed Approach and Benchmarks

In this work, in contrast to traditional face anti-spoofing strategies that simply train deep models on predefined living and spoofing faces, we prefer a learning-to-learn strategy via meta-learning. In this section, we successively introduce the details of our approach and the few-shot anti-spoofing benchmarks built on existing public datasets.

Figure 2: Workflow of Meta-FAS. We organize general face anti-spoofing datasets into tasks. In the Train Phase, the meta-learner inner-updates itself on the Support of the train set and outer-updates itself on the Query of the train set. In the Test Phase, it becomes a FAS model after inner-updating itself on the Support of the test set, and is then evaluated on the Query of the test set.

3.1 Face Anti-spoofing via Meta-learning

As mentioned above, the face anti-spoofing model should have the capability to learn quickly and to detect new attacks accurately with limited samples. In other words, the face anti-spoofing model should own the ability of learning from few data. In this paper, we propose a novel framework: Meta Face Anti-spoofing (Meta-FAS). The overall workflow of Meta-FAS is shown in Fig. 2. Given the few data of the Support, the meta-learner of Meta-FAS inner-updates itself accurately and becomes a FAS model that performs well on the Query. Based on this framework, we propose two methods: Meta Face Anti-spoofing for Classification Supervision (Meta-FAS-CS) and Meta Face Anti-spoofing for Depth Regression (Meta-FAS-DR). Our experimental results reveal that both methods perform well on the few-shot PAD problem.

3.1.1 Meta-FAS-CS

Figure 3: (a) Network structure of Meta-FAS-CS which aims to train a meta-learner through classification label. (b) Network structure of Meta-FAS-DR which aims to train a meta-learner through depth label.

In practice, face anti-spoofing is commonly treated as a binary classification problem that aims to discriminate spoofing faces from living faces. Therefore, we propose Meta-FAS-CS to solve the few-shot face anti-spoofing problem in the manner of classification via meta-learning.

In order to adapt to the meta-learning based approach, we organize the face anti-spoofing datasets into a hierarchical structure. Initially, we divide the dataset into coarse ways (CW), which contain two ways: spoofing and living faces. Secondly, we divide spoofing faces and living faces into fine ways (FW) in a more fine-grained pattern, i.e., by attack type, input camera device, and session. Accordingly, the dataset can be presented as $\mathcal{D}=\{(x_i, y_i^{fw}, y_i^{cw})\}$, where each sample $x_i$ is associated with two labels: a FW label $y_i^{fw} \in \mathcal{C}_{fw}$ and a CW label $y_i^{cw} \in \mathcal{C}_{cw}$. Here, $\mathcal{C}_{fw}$ denotes the set of all FW, and $\mathcal{C}_{cw}$ denotes the set of all CW. It should be noted that we take the FW label as strong supervision to train the meta-learner, while the CW label is mainly used to calculate performance.
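The two-level labelling can be illustrated with a toy mapping (the fine-way names below are hypothetical, for illustration only):

```python
# Hypothetical fine-way (FW) labels mapped to coarse-way (CW) labels.
# FW labels act as strong supervision for the meta-learner; CW labels
# (living vs. spoofing) are used to compute the final performance.
FW_TO_CW = {
    "real_session1": "living",
    "real_session2": "living",
    "print_phone1": "spoofing",
    "replay_phone1": "spoofing",
}

def cw_label(fw_label):
    """Collapse a fine-way label into its coarse-way label."""
    return FW_TO_CW[fw_label]
```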

Other general meta-learning methods, which are tested on MiniImagenet [37] or Omniglot [24], focus on the problem of general image classification and are indifferent to the choice of specific classes in each task. However, Meta-FAS-CS is a meta-learning based face anti-spoofing method, so both spoofing and living classes are necessary in the training/testing stage. In other words, an N-way K-shot task should comprise both living and spoofing ways.

In our method, we utilize ResNet-10 [20] as the backbone, and all facial images are resized to 256×256 resolution. The framework of Meta-FAS-CS is shown in Fig. 3(a). The training process of Meta-FAS-CS is shown in Algorithm 1, where $\ell$ and $y$ denote the softmax loss function and the FW label, respectively. In Algorithm 1, the meta-learner initializes itself with the Xavier initialization method (Line 7), inner-updates itself based on the Support (Lines 11-12), and outer-updates itself by minimizing its loss on the Query (Line 15).

input: face anti-spoofing dataset $\mathcal{D}$, few-shot face anti-spoofing task list, learning rate $\beta$, inner-update learning rate $\alpha$, number of positive ways per task n, number of ways per task N.
output: meta-learner's weights $\theta$
2: for each task $T_i$ in the task list do
3:   sample n positive classes from $\mathcal{D}$
4:   sample N-n negative classes from $\mathcal{D}$
5:   sample a Support set and a Query set for the task
6: end
7: initialize $\theta$ (Xavier)
8: while not done do
9:   sample a batch of tasks $\{T_i\}$
10:  for each $T_i$ do
11:    $L_S(\theta) \leftarrow \sum_{(x,y)\in \text{Support}} \ell(f_\theta(x), y)$
12:    $\theta'_i \leftarrow \theta - \alpha \nabla_\theta L_S(\theta)$
13:    $L_Q(\theta'_i) \leftarrow \sum_{(x,y)\in \text{Query}} \ell(f_{\theta'_i}(x), y)$
14:  end
15:  $\theta \leftarrow \theta - \beta \nabla_\theta \sum_i L_Q(\theta'_i)$
16: end

Algorithm 1 Meta Face Anti-Spoofing learning algorithm
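A first-order sketch of the inner/outer updates in Algorithm 1, on a toy linear model with squared loss (this is our own simplification for illustration; the paper uses ResNet-10 with a softmax or depth loss, and full MAML differentiates through the inner step):

```python
import numpy as np

def inner_update(theta, sx, sy, alpha):
    """Lines 11-12: one gradient step on the Support set (squared loss)."""
    grad = 2 * sx.T @ (sx @ theta - sy) / len(sy)
    return theta - alpha * grad

def meta_step(theta, tasks, alpha, beta):
    """Lines 10-15: adapt on each task's Support, then move the meta-weights
    against the Query-loss gradient evaluated at the adapted weights
    (a first-order approximation of the MAML outer update)."""
    meta_grad = np.zeros_like(theta)
    for sx, sy, qx, qy in tasks:
        theta_i = inner_update(theta, sx, sy, alpha)
        meta_grad += 2 * qx.T @ (qx @ theta_i - qy) / len(qy)
    return theta - beta * meta_grad / len(tasks)
```

Repeating `meta_step` over a batch of tasks drives the meta-weights toward an initialization from which one inner step already performs well on each task's Query.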

3.1.2 Meta-FAS-DR

Depth-supervised face anti-spoofing methods take advantage of the 3D-shape discrimination between spoofing and living faces, and provide more detailed information for the model, performing better than classification-supervised methods on planar attacks. Motivated by this, we integrate the anti-spoofing power of depth regression into Meta-FAS and propose Meta Face Anti-spoofing for Depth Regression (Meta-FAS-DR).

The network structure of Meta-FAS-DR is shown in Fig. 3(b). Three blocks are cascaded, all their features are concatenated into one tensor, and the tensor then goes through a fourth block to predict the facial depth. We can formulate the facial depth prediction process as Eq. 1, where $I$ is the RGB facial image, $\hat{D}$ is the predicted facial depth, and $\theta$ denotes the network's weights.

$$\hat{D} = f_\theta(I) \qquad (1)$$

We set $D$ as the "ground truth" facial depth. To distinguish spoofing faces from living faces, we set $D$ to zero for spoofing faces, and to the facial depth normalized to [0, 1] for living faces. In this paper, the facial depth of a living face is generated by PRNet [16]. We densify the estimated sparse 3D facial shape through interpolation, and then map and normalize the estimated 3D shape onto a 2D plane image. The generated facial depth is shown in Fig. 4. In our experiments, all facial depth maps are resized to 32×32 resolution and, as in Meta-FAS-CS, all input facial images are resized to 256×256 resolution.
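The depth-label construction (zero map for spoofing, normalized depth for living) can be sketched as follows; the nearest-neighbour resize here stands in for whatever interpolation the authors actually use:

```python
import numpy as np

def depth_label(depth_map, living, size=32):
    """Build the 'ground truth' depth label: an all-zero map for a spoofing
    face, and a [0, 1]-normalized, resized map for a living face."""
    if not living:
        return np.zeros((size, size), dtype=np.float32)
    d = depth_map.astype(np.float64)
    d -= d.min()
    d /= max(d.max(), 1e-8)                      # normalize to [0, 1]
    rows = np.linspace(0, d.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, d.shape[1] - 1, size).round().astype(int)
    return d[np.ix_(rows, cols)].astype(np.float32)  # nearest-neighbour resize
```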

The loss function we use in Meta-FAS-DR is the Contrastive Depth Loss (CDL) [46], which constrains the contrast between each pixel and its neighbors and pushes the anti-spoofing model to learn the topography of each pixel. The CDL can be formulated as

$$L_{CDL} = \sum_{i} \left\| K_i^{contrast} \ast \hat{D} - K_i^{contrast} \ast D \right\|_2^2 \qquad (2)$$

where $K_i^{contrast}$ is the $i$-th kernel of CDL, $i \in \{0,1,2,3,4,5,6,7\}$, and $\ast$ denotes convolution.
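A minimal NumPy rendering of Eq. 2 (our own loop-based convolution for clarity; in practice this is a depthwise convolution with eight fixed 3×3 kernels):

```python
import numpy as np

def conv2d_valid(img, k):
    """Plain 'valid' 2D cross-correlation with a 3x3 kernel."""
    h, w = img.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def contrastive_kernels():
    """Eight 3x3 kernels, each with +1 at the centre and -1 at one of the
    eight neighbours, measuring centre-vs-neighbour depth contrast."""
    ks = []
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue
            k = np.zeros((3, 3))
            k[1, 1], k[r, c] = 1.0, -1.0
            ks.append(k)
    return ks

def cdl(pred, gt):
    """Contrastive Depth Loss: squared difference of the contrast maps of the
    predicted and ground-truth depth, summed over the eight kernels."""
    return sum(np.sum((conv2d_valid(pred, k) - conv2d_valid(gt, k)) ** 2)
               for k in contrastive_kernels())
```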

Moreover, according to [36], prior knowledge is helpful to the meta-learner. Therefore, different from Meta-FAS-CS, in Meta-FAS-DR we initialize the meta-learner's weights by pre-training it on the train set in the depth regression manner, so that it acquires prior knowledge about the face anti-spoofing problem. In the subsequent meta-training stage, as in Meta-FAS-CS, we train the meta-learner with Algorithm 1. Benefiting from the pre-learned prior knowledge, the meta-learner learns more quickly and effectively. Note that, in Meta-FAS-DR, $\ell$ and $y$ in Algorithm 1 denote the CDL loss function and the "ground truth" facial depth label, respectively.

Figure 4: Living and spoofing faces, and the corresponding depth label used in our experiment. In CW, we adopt living vs. spoofing binary label. In FW, there are several subclasses of living face and spoofing face.

3.2 Few-shot Benchmarks for Face Anti-spoofing

In order to validate the effectiveness of our approach, we reorganize the existing face anti-spoofing datasets and propose benchmarks with corresponding standard protocols in meta-learning fashion. We propose three benchmarks, one for each of the first three dimensions above: MiniOULU, MiniCross, and MiniSURF, and utilize one general face anti-spoofing benchmark, SiW [27], for the zero-shot dimension, as shown in Tab. 1. The proposed benchmarks provide protocols for both traditional supervised learning based methods and meta-learning based methods. In addition, the proposed few-shot anti-spoofing benchmarks (MiniOULU, MiniCross, and MiniSURF) will be made publicly available.

| DataSet | Domain | Modal | SL Train (CW; FW) | SL Val (CW; FW) | SL Test (CW; FW) | ML Train (CW; FW) | ML Val (CW; FW) | ML Test Support (CW; FW) | ML Test Query (CW; FW) | # of vid. (V) / ima. (I) |
|---|---|---|---|---|---|---|---|---|---|---|
| SiW [27] | Single | RGB | 2; 7/4/3 | - | 2; 7/2/6 | - | - | - | - | 4620V |
| MiniOULU | Single | RGB | 2; 50/40/45 | 2; 15/15/15 | 2; 25/35/30 | 2; 50/40/45 | 2; 15/15/15 | 2; 25/35/30 | 2; 25/35/30 | 50400I |
| MiniCross | Cross | RGB | 2; 26/26 | 2; 9/9 | 2; 28/18 | 2; 26/26 | 2; 9/9 | 2; 28/18 | 2; 28/18 | 60803I |
| MiniSURF | Single | RGB;NIR;Depth | 2; 14 | 2; 7 | 2; 7 | 2; 14 | 2; 7 | 2; 7 | 2; 7 | 116047I |

Table 1: The few-shot benchmarks we propose and SiW [27]. Our benchmarks provide protocols for evaluating both supervised learning (SL) based methods and meta-learning (ML) based methods. CW and FW denote the number of coarse ways and fine ways, respectively. "-" means not applicable. "/" separates different protocols.

3.2.1 Single-domain Benchmark

The single-domain benchmark aims to validate the performance of face anti-spoofing models in front of new PSAIs, such as new lighting, devices, and presentation attacks. Oulu-NPU [7] is a widely used large-scale dataset for face anti-spoofing and contains a variety of PAs. Considering this, we develop a benchmark named MiniOULU based on Oulu-NPU. As shown in Tab. 2, MiniOULU contains three protocols, and its FW is composed of different devices, sessions, and PAs. Protocols 1, 2, and 3 evaluate the performance of face anti-spoofing models across different sessions, PAs, and devices, respectively.

Evaluation protocol The process of evaluating the performance on all protocols of this benchmark can be summarized as five steps: a) train a model/meta-learner in the way of supervised learning or meta-learning on the train set; b) select the model/meta-learner with the highest performance on the validation set; c) generate 100 few-shot learning tasks from test set; d) fine-tune/inner-update the model/meta-learner for each task by using the Support set, and calculate its performance on the Query set; e) average the performances of these 100 tasks to get the final evaluation.
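Steps (c) to (e) of this protocol amount to the following loop (a schematic sketch; `adapt` and `acer` are placeholder names for fine-tuning/inner-updating and Query-set scoring, not an API from the paper):

```python
def evaluate_protocol(learner, task_generator, n_tasks=100):
    """Generate `n_tasks` few-shot tasks, adapt the model/meta-learner on
    each Support set, score it on the Query set, and average the ACER."""
    scores = []
    for _ in range(n_tasks):
        support, query = task_generator()
        adapted = learner.adapt(support)   # fine-tune or inner-update
        scores.append(adapted.acer(query))
    return sum(scores) / n_tasks
```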

For one subject (person), we usually obtain more kinds of images/videos of presentation attacks than of the living scene. Therefore, to imitate the real scenario, we take 4 spoofing ways and 1 living way in each task. For each way, we randomly sample K samples as the Support set and 15 samples as the Query set.

| Protocol | Set | Device | Session | Subjects | PSAI | FW | # Real/Spoofing Images |
|---|---|---|---|---|---|---|---|
| Prot.1 | Train | Phone1,2,4,5,6 | Session1,2 | 1-20 | Real1-3;Print1,2;Replay1,2 | 50 | 3000/12000 |
| Prot.1 | Val | Phone3 | Session1-3 | 21-35 | Real1-3;Print1,2;Replay1,2 | 15 | 180/720 |
| Prot.1 | Test Support | Phone1,2,4,5,6 | Session3 | 36-40 | Real1-3;Print1,2;Replay1,2 | 25 | 100/400 |
| Prot.1 | Test Query | Phone1,2,4,5,6 | Session3 | 41-55 | Real1-3;Print1,2;Replay1,2 | 25 | 300/1200 |
| Prot.2 | Train | Phone1,2,4,5,6 | Session1-3 | 1-20 | Real2,3;Print1;Replay1 | 40 | 3000/9000 |
| Prot.2 | Val | Phone3 | Session1-3 | 21-35 | Real1-3;Print1,2;Replay1,2 | 15 | 180/720 |
| Prot.2 | Test Support | Phone1,2,4,5,6 | Session1-3 | 36-40 | Real1;Print2;Replay2 | 35 | 100/600 |
| Prot.2 | Test Query | Phone1,2,4,5,6 | Session1-3 | 41-55 | Real1;Print2;Replay2 | 35 | 300/1800 |
| Prot.3 | Train | Phone4,5,6 | Session1-3 | 1-20 | Real1-3;Print1,2;Replay1,2 | 45 | 2700/10800 |
| Prot.3 | Val | Phone3 | Session1-3 | 21-35 | Real1-3;Print1,2;Replay1,2 | 15 | 180/720 |
| Prot.3 | Test Support | Phone1,2 | Session1-3 | 36-40 | Real1-3;Print1,2;Replay1,2 | 30 | 120/480 |
| Prot.3 | Test Query | Phone1,2 | Session1-3 | 41-55 | Real1-3;Print1,2;Replay1,2 | 30 | 360/1440 |

Table 2: The few-shot face anti-spoofing benchmark for the single-domain dimension: MiniOULU.

3.2.2 Cross-domain Benchmark

The cross-domain benchmark is more challenging than the single-domain one. In this benchmark, we test the performance of the face anti-spoofing model when it encounters new domains. Here, we treat different databases as different domains, including CASIA-MFSD [51], MSU-MFSD [47], SiW [27], MSU-USSA [34], 3DMAD [14], Oulu-NPU [7] and Replay-Attack [8]. We develop two protocols: one for few-way tasks, in the normal form of few-shot learning protocols; the other for many-way tasks, which provides a large number of samples that intuitively favor the supervised learning methods. We name this benchmark MiniCross, whose FW is composed of different PSAIs from different domains. More details are given in Tab. 3.

Evaluation protocol The process of evaluating the performance on protocol 1 of MiniCross is similar to that of MiniOULU. In protocol 2 of MiniCross, we also train the model on the train set, but test it on a single 18-way task. In this task, we randomly sample K images per way as the Support and all remaining images as the Query; the evaluation process is otherwise the same as in protocol 1.

| Protocol | Set | Domains | PSAI | FW | # Real/Spoofing Images |
|---|---|---|---|---|---|
| Prot.1 | Train | CASIA-MFSD, MSU-MFSD, SiW | Real1-7;Print1-4;Replay1-7 | 26 | 5151/13798 |
| Prot.1 | Val | MSU-USSA | Real1;Print1-2;Replay1-6 | 9 | 1040/8066 |
| Prot.1 | Test Support | 3DMAD, Oulu-NPU, Replay-Attack | Real1-9;3D Mask;Print1-3;Replay1-4 | 28 | 304/584 |
| Prot.1 | Test Query | 3DMAD, Oulu-NPU, Replay-Attack | Real1-9;3D Mask;Print1-3;Replay1-4 | 28 | 593/1271 |
| Prot.2 | Train | CASIA-MFSD, MSU-MFSD, SiW | Real1-7;Print1-4;Replay1-7 | 26 | 5151/13798 |
| Prot.2 | Val | MSU-USSA | Real1;Print1-2;Replay1-6 | 9 | 1040/8066 |
| Prot.2 | Test Support | Oulu-NPU | Real1-6;Print1-2;Replay1-2 | 18 | 30/60 |
| Prot.2 | Test Query | Oulu-NPU | Real1-6;Print1-2;Replay1-2 | 18 | 480/960 |

Table 3: The few-shot face anti-spoofing benchmark for the cross-domain dimension: MiniCross.

3.2.3 Cross-modal Benchmark

The third benchmark is the most challenging one in our study of few-shot face anti-spoofing: the model is confronted with the emergence of a new modality. CASIA-SURF is a large-scale multi-modal dataset in which each sample contains 3 modalities (i.e., RGB, Depth, and IR), making it suitable for this dimension. We extract samples from CASIA-SURF and develop a cross-modal benchmark named MiniSURF, as shown in Tab. 4. In this benchmark, the train set contains the RGB and Depth modalities, and the test/validation set contains the IR modality. In the training stage, the model is trained on the train set; in the test/validation stage, it is tested on a never-seen modality: IR. Based on MiniSURF, we can test the model's ability to learn quickly from a new modality. The FW of MiniSURF consists of different modalities and PSAIs, as shown in Tab. 4.

Evaluation protocol The process of evaluating the performance on MiniSURF is the same as on MiniOULU.

| Protocol | Set | Modals | PSAI | FW | # Real/Spoofing Images |
|---|---|---|---|---|---|
| Prot.1 | Train | RGB;Depth | Real1;Print1;Cut1-5 | 14 | 17902/81106 |
| Prot.1 | Val | IR | Real1;Print1;Cut1-5 | 7 | 3000/13341 |
| Prot.1 | Test Support | IR | Real1;Print1;Cut1-5 | 7 | 20/120 |
| Prot.1 | Test Query | IR | Real1;Print1;Cut1-5 | 7 | 78/480 |

Table 4: The few-shot face anti-spoofing benchmark for the cross-modal dimension: MiniSURF.

3.2.4 Zero-shot Benchmark

We consider that face anti-spoofing models should also be able to solve the zero-shot face anti-spoofing problem, and the meta-learner of Meta-FAS should achieve comparable performance on examples of never-seen PSAIs. In other words, Meta-FAS should be capable not only of learning spoofing clues from few data, but also of detecting spoofing faces of never-seen PSAIs. Therefore, we utilize the SiW [27] dataset as the zero-shot benchmark.

Evaluation protocol for meta-learning The process of evaluating the performance of a meta-learning based method on SiW is as follows: 1) pre-train the model on the train set of each protocol; 2) select the best pre-trained weights to initialize the meta-learner; 3) train the meta-learner on the training tasks generated from the train set with Algorithm 1; 4) update the meta-learner on the last task of the train set to obtain an anti-spoofing model; 5) test the performance of the model on each protocol of SiW.

4 Experiments

4.1 Experimental Setup

Benchmarks In our experiment, we evaluate our Meta-FAS’s few-shot learning performance on multiple benchmarks. We utilize MiniOULU, MiniCross and MiniSURF as few-shot learning benchmarks. Also, we select SiW [27] as the zero-shot benchmark.

Metrics We follow the standard face anti-spoofing metrics: 1) Attack Presentation Classification Error Rate (APCER), which evaluates the highest error among all PAs (e.g., print or display); 2) Bona Fide Presentation Classification Error Rate (BPCER), which evaluates the error on real-access data; 3) ACER [21], the mean of APCER and BPCER. The model updates itself based on the Support and is then tested on the Query to obtain the ACER. The average ACER over the testing tasks is defined as:

$$\overline{ACER} = \frac{1}{T}\sum_{t=1}^{T} ACER_t \qquad (3)$$

where $T$ denotes the number of testing tasks. The final result, reported with a 95% confidence interval over all testing tasks, is:

$$\overline{ACER} \pm 1.96\,\frac{\sigma}{\sqrt{T}} \qquad (4)$$

where $\sigma$ is the standard deviation of the ACER over the test tasks. We follow this setting in all protocols except protocol 2 of MiniCross, where the Query contains far more than 15 samples.

Comparison methods We use supervised learning approaches as baselines, consisting of binary classification and depth regression methods. In the binary classification category, both AlexNet [23] and ResNet-10 [20] are used as backbones in our protocols. Considering that FAS-TD [46] is currently the state-of-the-art depth regression based face anti-spoofing method, we use its single-frame part, named FAS-TD-SF, as the corresponding baseline in our experiments.

4.2 Implementation Details

Training setting In the training phase, we train the supervised learning baselines ResNet-10 and AlexNet with the binary softmax loss, and FAS-TD-SF with CDL.

For the Meta-FAS methods, we train the meta-learner on training tasks generated from the train set. The details of the meta-training process are shown in Algorithm 1. For each N-way K-shot task, we randomly sample K images as the Support and 15 images as the Query.

Testing setting In the testing phase, for both the supervised learning methods and Meta-FAS, 100 5-way 1-shot/5-shot testing tasks are generated from the test set; for each task, we randomly sample 15 images as the Query in all protocols except protocol 2 of MiniCross.

For each method, we select the trained model/meta-learner that performs best on the validation set. For each task, the supervised learning methods fine-tune the model using the Support and calculate the ACER on the Query as in Eq. 3. In contrast, the meta-learner trained with Meta-FAS inner-updates itself based on the Support, turns into a FAS model, and obtains its ACER on the Query. It should be noted that, in all our experiments, the meta-learner is trained/tested on 5-way tasks. However, since the goal of a FAS model is to detect spoofing faces, we report the ACER in a 2-way manner with the CW label.

Hyperparameter setting We implement Meta-FAS based on the Tensorflow [1] library. We set the learning rate $\beta$, the batch size, and the inner-update learning rate $\alpha$ to 1e-3, 5, and 1e-2 for Meta-FAS-CS, and to 1e-4, 4, and 1e-3 for Meta-FAS-DR. Further, for both Meta-FAS-CS and Meta-FAS-DR, we set the number of positive ways per task n to 1, the numbers of training and testing tasks to 20,000 and 100, and the number of inner-updates to 5 in the meta-training stage and 20 in the meta-testing stage.

4.3 Ablation Study

We conduct ablation experiments in two main aspects: 1) the effect of each individual module; 2) the effect of the number of way and shot.

Tab. 5 demonstrates that the pre-trained module and CDL are both effective in our framework, and their combination greatly boosts model performance.

In Fig. 5(a), when we fix the number of ways and increase the number of shots, the curves show that ACER decreases as the number of shots increases for almost all models. Meta-FAS-DR remains the best performer from 1 shot through 7 shots. Fig. 5(b) shows that when we fix the number of training samples per task (15 here), the number of ways does not affect the performance much.

| Model | Pre-trained Module | Contrastive Depth Loss | ACER(%) |
|---|---|---|---|
| Model 1 |  |  | 5.26±1.46 |
| Model 2 | ✓ |  | 4.14±1.30 |
| Model 3 |  | ✓ | 4.32±1.52 |
| Model 4 | ✓ | ✓ | 1.1±0.47 |

Table 5: The 5-shot results of Meta-FAS-DR for the ablation study on protocol 1 of MiniCross.
Figure 5: The results of the ablation study on the influence of different settings of shot and way on protocol 1 of MiniCross.

4.4 Single-domain Experiments

Tab. 6 shows the results of the three protocols of MiniOULU. Meta-FAS-DR outperforms the state-of-the-art supervised learning methods in all protocols. Compared with FAS-TD-SF, the ACER of Meta-FAS-DR decreases by 8%, 34%, and 45% in the 1-shot tasks of protocols 1, 2, and 3, respectively, and by 63%, 50%, and 71% in the 5-shot tasks. Among classification based methods, although Meta-FAS-CS is inferior to FAS-TD-SF, it improves over its baseline ResNet-10, decreasing ACER by roughly 50% or more across all task types and protocols. This demonstrates the superiority of our meta-learner on the few-shot face anti-spoofing problem in a single domain.

| Prot. | Method | ACER(%) 1-shot | ACER(%) 5-shot |
|---|---|---|---|
| 1 | AlexNet [23] | 19.93±0.8 | 16.21±1.04 |
| 1 | ResNet-10 [20] | 18.68±0.92 | 14.38±0.88 |
| 1 | FAS-TD-SF [46] | 2.19±0.74 | 1.31±0.45 |
| 1 | Meta-FAS-CS (Ours) | 9.38±2.21 | 6.75±1.35 |
| 1 | Meta-FAS-DR (Ours) | 2.01±0.63 | 0.49±0.28 |
| 2 | AlexNet [23] | 15.32±1.21 | 13.27±0.93 |
| 2 | ResNet-10 [20] | 13.51±1.19 | 10.24±0.87 |
| 2 | FAS-TD-SF [46] | 3.57±0.79 | 2.07±0.48 |
| 2 | Meta-FAS-CS (Ours) | 4.54±0.89 | 4.37±0.85 |
| 2 | Meta-FAS-DR (Ours) | 2.35±0.76 | 1.03±0.42 |
| 3 | AlexNet [23] | 10.13±1.04 | 8.54±0.82 |
| 3 | ResNet-10 [20] | 10.32±1.04 | 7.46±0.68 |
| 3 | FAS-TD-SF [46] | 4.22±1.23 | 1.68±0.53 |
| 3 | Meta-FAS-CS (Ours) | 4.63±0.93 | 3.41±0.93 |
| 3 | Meta-FAS-DR (Ours) | 2.30±0.59 | 0.49±0.32 |

Table 6: The results of the single-domain experiment on the three protocols of MiniOULU.

4.5 Cross-domain Experiments

Results on the two protocols of MiniCross are shown in Tab. 7. Protocol 1 evaluates models in the standard few-shot, few-way manner. Compared with FAS-TD-SF, Meta-FAS-DR reduces ACER by 55% on 1-shot tasks and by 87% on 5-shot tasks. Meta-FAS-CS performs much worse than Meta-FAS-DR, improving on the classification baseline by only 2% on 1-shot tasks and 10% on 5-shot tasks, which indicates that Meta-FAS-DR is far better suited to the cross-domain scenario. Protocol 2 uses many ways per task and many samples for fine-tuning, which favors supervised learning based methods; accordingly, FAS-TD-SF performs better here because more images are available in the fine-tuning phase. Nevertheless, Meta-FAS-DR still outperforms FAS-TD-SF in this setting. Results on both protocols demonstrate the effectiveness of Meta-FAS in the few-shot cross-domain scenario.

Prot. Method ACER(%)
1-shot 5-shot
1 AlexNet [23] 18.4±1.57 14.04±1.34
ResNet-10 [20] 18.02±1.08 14.24±1.67
FAS-TD-SF [46] 9.49±2.53 8.41±1.09
Meta-FAS-CS(Ours) 15.30±2.72 10.85±1.39
Meta-FAS-DR(Ours) 4.23±0.86 1.1±0.47
2 AlexNet [23] 26.41 25.62
ResNet-10 [20] 21.09 17.94
FAS-TD-SF [46] 12.55 12.18
Meta-FAS-CS(Ours) 17.83 16.06
Meta-FAS-DR(Ours) 12.23 11.71
Table 7: The results of cross-domain experiment on two protocols of MiniCross.

4.6 Cross-modal Experiments

MiniSURF is the most challenging benchmark in our experiments, since spoofing clues in different modalities (RGB, depth, and IR) appear exceedingly different. In Tab. 8, Meta-FAS-DR lowers ACER(%) by 10.5 and 6.0 on the 1-shot and 5-shot tasks, respectively, compared with the state-of-the-art method, and Meta-FAS-CS obtains the best ACER on 5-shot tasks among all methods. These results show that Meta-FAS-CS and Meta-FAS-DR outperform the supervised learning based methods by a large margin, and reveal that our methods enable models to learn how to learn quickly to detect spoofing clues rather than merely learn to discriminate living from spoofing faces.

Prot. Method ACER(%)
1-shot 5-shot
1 AlexNet [23] 45.28±0.84 44.26±0.85
ResNet-10 [20] 45.62±1.43 41.96±0.86
FAS-TD-SF [46] 31.37±1.15 31.38±1.06
Meta-FAS-CS(Ours) 32.49±2.38 22.53±2.35
Meta-FAS-DR(Ours) 20.85±1.08 25.35±1.11
Table 8: The results of cross-modal experiment on MiniSURF.

4.7 Zero-shot Experiments

Prot. Method APCER(%) BPCER(%) ACER(%)
1 FAS-BAS [27] 3.58 3.58 3.58
FAS-TD-SF [46] 1.27 0.83 1.05
FAS-TD-SF-CASIA-SURF [40] 1.27 0.33 0.80
Meta-FAS-DR(Ours) 0.52 0.50 0.51
2 FAS-BAS [27] 0.57±0.69 0.57±0.69 0.57±0.69
FAS-TD-SF [46] 0.33±0.27 0.29±0.39 0.31±0.28
FAS-TD-SF-CASIA-SURF [40] 0.08±0.17 0.25±0.22 0.17±0.16
Meta-FAS-DR(Ours) 0.25±0.32 0.33±0.27 0.29±0.28
3 FAS-BAS [27] 8.31±3.81 8.31±3.80 8.31±3.81
FAS-TD-SF [46] 7.70±3.88 7.76±4.09 7.73±3.99
FAS-TD-SF-CASIA-SURF [40] 6.27±4.36 6.43±4.42 6.35±4.39
Meta-FAS-DR(Ours) 7.98±4.98 7.35±5.67 7.66±5.32
Table 9: The results of the zero-shot experiment on three protocols of SiW [27].

We utilize the SiW [27] dataset as the zero-shot benchmark. Tab. 9 compares the zero-shot capability of our method with that of other methods [27, 46, 40]. The performance of our method on these three protocols demonstrates its zero-shot capability.

5 Conclusion

In this paper, we redefine face anti-spoofing as a few-shot learning problem with evolving new attacks. To address this problem, we develop Meta-FAS to train models via meta-learning, and our models perform well on the few-shot face anti-spoofing benchmarks we propose. The more dissimilar the training and test sets are, the larger the improvement our method achieves. We further propose zero-shot PAD as a boundary condition of the few-shot face anti-spoofing formulation and validate that Meta-FAS still works under this condition. In future research, we will examine more boundary conditions of this problem, such as the many-way setting, and explore more effective frameworks to overcome the few-shot challenges in face anti-spoofing.

References

  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: a system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
  • [2] Y. Atoum, Y. Liu, A. Jourabloo, and X. Liu. Face anti-spoofing using patch and depth-based cnns. In IJCB, pages 319–328, 2017.
  • [3] S. Bengio, Y. Bengio, J. Cloutier, and J. Gecsei. On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, pages 6–8. Univ. of Texas, 1992.
  • [4] Y. Bengio, S. Bengio, and J. Cloutier. Learning a synaptic learning rule. Université de Montréal, Département d’informatique et de recherche opérationnelle, 1990.
  • [5] Z. Boulkenafet, J. Komulainen, and A. Hadid. Face spoofing detection using colour texture analysis. IEEE Transactions on Information Forensics and Security, 11(8):1818–1830, 2016.
  • [6] Z. Boulkenafet, J. Komulainen, and A. Hadid. Face antispoofing using speeded-up robust features and fisher vector encoding. IEEE Signal Processing Letters, 24(2):141–145, 2017.
  • [7] Z. Boulkenafet, J. Komulainen, L. Li, X. Feng, and A. Hadid. Oulu-npu: A mobile face presentation attack database with real-world variations. In FGR, pages 612–618, 2017.
  • [8] I. Chingovska, A. Anjos, and S. Marcel. On the effectiveness of local binary patterns in face anti-spoofing.
  • [9] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. Computer Science, 2014.
  • [10] T. de Freitas Pereira, A. Anjos, J. M. De Martino, and S. Marcel. Lbp- top based countermeasure against face spoofing attacks. In ACCV, pages 121–132, 2012.
  • [11] T. de Freitas Pereira, A. Anjos, J. M. De Martino, and S. Marcel. Can face anti-spoofing countermeasures work in a real world scenario? In ICB, pages 1–8, 2013.
  • [12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, pages 248–255. IEEE, 2009.
  • [13] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. RL2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
  • [14] N. Erdogmus and S. Marcel. Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect.
  • [15] L. Feng, L.-M. Po, Y. Li, X. Xu, F. Yuan, T. C.-H. Cheung, and K.-W. Cheung. Integration of image quality and motion cues for face anti-spoofing: A neural network approach. Journal of Visual Communication and Image Representation, 38:451–460, 2016.
  • [16] Y. Feng, F. Wu, X. Shao, Y. Wang, and X. Zhou. Joint 3d face reconstruction and dense alignment with position map regression network. In CVPR, 2017.
  • [17] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
  • [18] J. Gan, S. Li, Y. Zhai, and C. Liu. 3d convolutional neural network based on face anti-spoofing. In ICMIP, pages 1–5, 2017.
  • [19] E. Grant, C. Finn, S. Levine, T. Darrell, and T. Griffiths. Recasting gradient-based meta-learning as hierarchical bayes. arXiv preprint arXiv:1801.08930, 2018.
  • [20] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • [21] International Organization for Standardization. ISO/IEC JTC 1/SC 37 Biometrics: Information technology, biometric presentation attack detection, part 1: Framework. In https://www.iso.org/obp/ui/iso, 2016.
  • [22] J. Komulainen, A. Hadid, and M. Pietikainen. Context based face anti-spoofing. In BTAS, pages 1–8, 2013.
  • [23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. pages 1097–1105, 2012.
  • [24] B. Lake, R. Salakhutdinov, J. Gross, and J. Tenenbaum. One shot learning of simple visual concepts. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 33, 2011.
  • [25] L. Li, X. Feng, Z. Boulkenafet, Z. Xia, M. Li, and A. Hadid. An original face anti-spoofing approach using partial convolutional neural network. In IPTA, pages 1–6, 2016.
  • [26] Z. Li, F. Zhou, F. Chen, and H. Li. Meta-sgd: Learning to learn quickly for few shot learning. arXiv preprint arXiv:1707.09835, 2017.
  • [27] Y. Liu, A. Jourabloo, and X. Liu. Learning deep models for face anti-spoofing: Binary or auxiliary supervision. In CVPR, pages 389–398, 2018.
  • [28] O. Lucena, A. Junior, V. Moia, R. Souza, E. Valle, and R. Lotufo. Transfer learning using convolutional neural networks for face anti-spoofing. In International Conference Image Analysis and Recognition, pages 27–34, 2017.
  • [29] J. Määttä, A. Hadid, and M. Pietikäinen. Face spoofing detection from single images using micro-texture analysis. In IJCB, pages 1–7, 2011.
  • [30] N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. A simple neural attentive meta-learner. 2018.
  • [31] C. Nagpal and S. R. Dubey. A performance evaluation of convolutional neural networks for face anti spoofing. arXiv preprint arXiv:1805.04176, 2018.
  • [32] A. Nichol, J. Achiam, and J. Schulman. On first-order meta-learning algorithms. 2018.
  • [33] K. Patel, H. Han, and A. K. Jain. Cross-database face antispoofing with robust feature representation. In Chinese Conference on Biometric Recognition, pages 611–619, 2016.
  • [34] K. Patel, H. Han, and A. K. Jain. Secure face unlock: Spoof detection on smartphones. IEEE transactions on information forensics and security, 11(10):2268–2283, 2016.
  • [35] B. Peixoto, C. Michelassi, and A. Rocha. Face liveness detection under bad illumination conditions. In ICIP, pages 3557–3560. IEEE, 2011.
  • [36] Y. Qin, W. Zhang, C. Zhao, Z. Wang, H. Shi, G. Qi, J. Shi, and Z. Lei. Rethink and redesign meta learning. CoRR, abs/1812.04955, 2018.
  • [37] S. Ravi and H. Larochelle. Optimization as a model for few-shot learning. 2016.
  • [38] J. Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1):131–139, 1992.
  • [39] R. Shao, X. Lan, and P. C. Yuen. Deep convolutional dynamic texture learning with adaptive channel-discriminability for 3d mask face anti-spoofing. In IJCB, pages 748–755, 2017.
  • [40] S. Zhang, X. Wang, A. Liu, C. Zhao, J. Wan, S. Escalera, H. Shi, Z. Wang, and S. Z. Li. Casia-surf: A dataset and benchmark for large-scale multi-modal face anti-spoofing. arXiv preprint arXiv:1812.00408, 2018.
  • [41] J. Snell, K. Swersky, and R. Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pages 4077–4087, 2017.
  • [42] X. Tan, Y. Li, J. Liu, and L. Jiang. Face liveness detection from a single image with sparse low rank bilinear discriminative model. In ECCV, pages 504–517, 2010.
  • [43] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
  • [44] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
  • [45] F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang. Residual attention network for image classification. arXiv preprint arXiv:1704.06904, 2017.
  • [46] Z. Wang, C. Zhao, Y. Qin, Q. Zhou, and Z. Lei. Exploiting temporal and depth information for multi-frame face anti-spoofing. arXiv preprint arXiv:1811.05118, 2018.
  • [47] D. Wen, H. Han, and A. K. Jain. Face spoof detection with image distortion analysis.
  • [48] Z. Xu, S. Li, and W. Deng. Learning temporal features using lstm-cnn architecture for face anti-spoofing. In ACPR, pages 141–145, 2015.
  • [49] J. Yang, Z. Lei, and S. Z. Li. Learn convolutional neural network for face anti-spoofing. Computer Science, 9218:373–384, 2014.
  • [50] J. Yang, Z. Lei, S. Liao, and S. Z. Li. Face liveness detection with component dependent descriptor. In ICB, page 2, 2013.
  • [51] Z. Zhang, J. Yan, S. Liu, Z. Lei, D. Yi, and S. Z. Li. A face antispoofing database with diverse attacks. In ICB, pages 26–31, 2012.

6 Supplementary Ablation Study

We supplement the ablation experiments in three main aspects: 1) the effect of each individual meta-learning algorithm (MAML [17] and Meta-SGD [26]); 2) the effect of the number of tasks and queries; 3) the effect of different backbones for supervised learning methods.

Tab. 10 demonstrates that MAML and Meta-SGD are both effective in our framework, and the MAML based models perform better.
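The difference the MAML/Meta-SGD toggle makes is confined to the inner update: Meta-SGD meta-learns a per-parameter learning-rate vector instead of MAML's fixed scalar. A minimal NumPy sketch, illustrative only:

```python
import numpy as np

def meta_sgd_inner_update(w, alpha, grad_fn, steps=5):
    """Meta-SGD variant of the inner update: the per-parameter learning
    rates `alpha` are themselves meta-learned (here just given), unlike
    MAML's single fixed scalar learning rate."""
    w = w.copy()
    for _ in range(steps):
        w -= alpha * grad_fn(w)  # element-wise learning rates
    return w
```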

According to Fig. 6 and Fig. 7, when the other parameters are fixed, the numbers of queries and tasks have little impact on performance, and Meta-FAS-DR remains the best across the different settings of both.

Tab. 11 shows that performance degrades on our benchmarks as the network deepens, which is why we select ResNet-10 as the baseline method for classification supervision.

Model Meta-learner ACER(%)
Meta-FAS-DR MAML 1.1±0.47
Meta-FAS-DR Meta-SGD 1.74±0.74
Meta-FAS-CS MAML 10.85±1.39
Meta-FAS-CS Meta-SGD 12.5±1.35
Table 10: The 5-shot results of Meta-FAS for the ablation study on protocol 1 of MiniCross.
Figure 6: The results of the ablation study on the influence of different numbers of queries on protocol 1 of MiniCross.
Figure 7: The results of the ablation study on the influence of different numbers of tasks on protocol 1 of MiniCross.
Prot. Method ACER(%)
1-shot 5-shot
1 ResNet-50 [20] 24.96±1.42 16.87±1.21
ResNet-18 [20] 20.72±1.14 15.55±1.02
ResNet-10 [20] 18.02±1.08 14.24±1.67
Table 11: The results of our baseline method for ablation study on protocol 1 of MiniCross.

7 Qualitative Analysis

Figure 8: The living/spoofing depth map generated by Meta-FAS-DR after different update steps.

Figure 8 presents the depth maps generated by Meta-FAS-DR after different numbers of update steps. We include this part to show the capability, acquired by meta-learning, of quickly learning to discriminate between living and spoofing faces. In Figure 8, the facial depth under the number 0 is the meta-learner's prediction before inner-updating, and the facial depth above/under a number n (n > 0) is the prediction after n inner-update steps.

It is clear that the facial depths predicted by the meta-learner are not distinguishable enough before the inner-updates; the more inner-update steps the meta-learner takes, the more distinguishable the predicted facial depths become.
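Since Meta-FAS-DR supervises spoofing faces toward a flat (zero) depth map, the final live/spoof decision can be read off the predicted depth. The decision rule below is a hypothetical illustration; the mean-depth statistic and the threshold value are our choices, not the paper's.

```python
import numpy as np

def liveness_from_depth(depth_map, threshold=0.1):
    """Score a face from its predicted facial depth map: spoofing faces are
    supervised toward all-zero depth, so a near-flat map suggests an attack
    (illustrative rule; statistic and threshold are assumptions)."""
    score = float(np.mean(depth_map))
    return score, score > threshold  # (liveness score, is_live decision)
```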