FaceSpoof Buster: a Presentation Attack Detector Based on Intrinsic Image Properties and Deep Learning

02/07/2019 · by Rodrigo Bresan et al.

Nowadays, the adoption of face recognition for biometric authentication systems is common, mainly because this is one of the most accessible biometric modalities. Techniques that attempt to trespass this kind of system by presenting a forged biometric sample, such as a printed paper or a recorded video of a genuine access, are known as presentation attacks, but may also be referred to in the literature as face spoofing. Presentation attack detection is a crucial step for preventing this kind of unauthorized access to restricted areas and/or devices. In this paper, we propose a novel approach that relies on a combination of intrinsic image properties and deep neural networks to detect presentation attack attempts. Our method explores depth, saliency, and illumination maps, associated with a pre-trained Convolutional Neural Network, to produce robust and discriminative features. Each of these properties is classified individually and, at the end of the process, they are combined by a meta-learning classifier, which achieves outstanding results on the most popular datasets for PAD. Results show that the proposed method is able to surpass state-of-the-art results in an inter-dataset protocol, which is regarded as the most challenging in the literature.


1 Introduction

The task of identifying a given individual by their physiological traits (e.g., face, iris, or fingerprint) or behavioral patterns (e.g., keystroke dynamics, gait) is known as biometrics. With the widespread adoption of devices that rely on this kind of access, the development of techniques that seek to impersonate a legitimate user has increased significantly, posing major challenges for security authentication systems. The process of attacking a biometric system is known in the literature as a presentation attack, but may also be referred to as a spoofing attack; it consists in presenting to the acquisition sensor a synthetic biometric sample, containing the biometric pattern of a valid user, in order to be authenticated as that legitimate user.

In this work, we present a new tool for detecting presentation attacks, named FaceSpoof Buster, which does not require any extra hardware components (e.g., depth sensor, infrared sensor). Using different intrinsic properties of a given biometric sample, the presented method achieves strong results in comparison to previous works in the literature, particularly on the task of classifying samples from an unseen dataset, commonly known as inter-dataset evaluation.

Our hypothesis is that, by extracting intrinsic properties such as depth, illumination, and saliency, we can obtain telltales that reveal additional information about the authenticity of a given biometric sample.

Combining a Convolutional Neural Network (CNN) with a transfer learning process, we are able to extract robust and discriminative features, which are then fed to SVM classifiers in a two-step classification process to detect attack samples. Our method outperforms many existing approaches proposed for the face presentation attack detection (PAD) problem, with particular emphasis on challenging tasks such as inter-dataset evaluation.

We can summarize our main contributions as follows: (1) the proposition of a new method for face Presentation Attack Detection (PAD), named FaceSpoof Buster, based on a combination of intrinsic image properties and deep neural networks; (2) the evaluation of different intrinsic properties (e.g., saliency, depth, and illumination maps) for the PAD problem, which, to the best of our knowledge, have never been evaluated in this context; (3) HTER results that surpass the literature in both inter- and intra-dataset protocols for different public datasets; and (4) the effective application of a previously trained CNN in a PAD context.

2 Related Works

According to Pan et al. (2008), the techniques for presentation attack detection can be grouped into four major categories: user behavior modeling, data-driven characterization, user cooperation, and hardware-based.

Techniques in the first category aim to recognize presentation attacks by modeling the user's behavior, such as head movements and eye blinking. Data-driven techniques are based on finding artifacts of attempted attacks by exploiting the data coming from a standard acquisition sensor. User-cooperation techniques seek to detect presentation attacks through interaction between the user and the authentication system, such as asking the user to execute some movement. Finally, there are techniques that use extra hardware, such as depth sensors and infrared cameras, to obtain more information about the scene and thus find cues that may reveal an attempted attack. Since this work focuses on data-driven techniques, the rest of this section covers this kind of method.

Schwartz et al. (2011) presented an anti-spoofing method that explores several visual descriptors to characterize faces in terms of their color, texture, and shape properties. To deal with the high dimensionality of the final representation, the authors proposed the use of the Partial Least Squares (PLS) classifier, a statistical approach for dimensionality reduction and classification, designed to distinguish a genuine biometric sample from a fraudulent one.

d. S. Pinto et al. (2012) proposed a data-driven method for video PAD based on Fourier analysis of the residual noise signature extracted from the input videos. The use of well-known texture descriptors, such as Local Binary Patterns, was also considered in the literature by Määttä et al. (2011), focusing on the micro-texture patterns that are added to fake biometric samples during the acquisition process. Approaches based on Difference of Gaussians (DoG) (Peixoto et al., 2011a; Tan et al., 2010) and Histogram of Oriented Gradients (HOG) (Komulainen et al., 2013; Yang et al., 2013) were also proposed, but, by their nature, their results are affected by illumination conditions and the capture sensor.

Yeh and Chang (2018) proposed an effective approach against face presentation attacks based on perceptual image quality assessment, adopting a Blind Image Quality Evaluator (BIQE) along with an Effective Pixel Similarity Deviation (EPSD) to generate new features for a multi-scale descriptor, showing its efficacy when compared to previous works.

3 Proposed Method

This section provides details about the proposed method and each step of the proposed framework. The first step performs frame extraction from the videos, followed by the extraction of intrinsic property maps for each frame. Together, these maps represent specific properties (depth, illumination, and saliency) of a video over time.

Then, we use a Convolutional Neural Network (CNN), specifically ResNet-50 (He et al. (2015)), as a feature extractor to encode properties from the previously computed maps. This step also takes advantage of a transfer learning process (Yosinski et al. (2014)), reusing ImageNet weights, since the number of samples available to train our network from scratch is very limited. The features extracted at this step are named bottleneck features.

Once encoded, the bottleneck features are classified using an SVM classifier, which provides a confidence degree for each frame of an input video. These confidence scores are used in a final stage, which performs a meta-learning process: a new SVM classifier combines the information from the illumination, depth, and saliency maps, resulting in a new artifact that will be referred to as the probabilities vector. This probabilities vector is then fed to our second classifier, which is responsible for the final prediction for the tested samples. Fig. 1 depicts an overview of the full proposed method.

Figure 1: Overview of the proposed method. First, we represent intrinsic properties using different kinds of maps; next, we extract bottleneck features with the ResNet50 architecture; we then classify the extracted features to obtain per-frame probabilities; lastly, we classify the stacked probabilities to generate the final prediction, deciding whether a given facial biometric sample is authentic or not.

3.1 Frame extraction from videos

Most of the benchmarks available for the face presentation attack detection problem are collections of videos, and for these datasets we first need to extract the frames of each video, since the intrinsic property computation and the classification stages are performed on images. For this step, we subsample each video by extracting 10 frames per second.
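To make this step concrete, the snippet below is a minimal sketch of such a subsampling routine using OpenCV; the `extract_frames` helper and its arguments are our own illustration, and only the 10 frames-per-second rate comes from the text.

```python
import cv2

def extract_frames(video_path, out_fps=10):
    """Subsample a video at roughly `out_fps` frames per second (illustrative helper)."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or out_fps
    step = max(int(round(native_fps / out_fps)), 1)  # keep every `step`-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```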

3.2 Depth Maps

Since presentation attacks are frequently reproduced on a flat surface, such as a sheet of paper with a printed face or a tablet replaying a valid access, we believe that the depth estimated from a given biometric sample can provide relevant information about its authenticity: when the sensor is presented with a flat surface, the estimated depth map should differ from that of a real face.

Our method estimates depth maps using the approach proposed by Godard et al. (2017), which trains a fully convolutional deep neural network on stereo images with a modified loss function to estimate image depth. As in the feature extraction step described in Section 3.5, here we also take advantage of the transfer learning methodology, transferring the weights of the Godard et al. (2017) model to our estimator.
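For readers who want to prototype this stage, the sketch below uses a publicly available monocular depth estimator (MiDaS, loaded through `torch.hub`) as a stand-in for the Godard et al. (2017) network; it is not the model used in the paper, only an illustration of producing a per-frame depth map from a single image.

```python
import cv2
import torch

# Pretrained monocular depth model used here purely as a stand-in estimator.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def estimate_depth(frame_bgr):
    """Return a relative depth map for one BGR frame (illustrative only)."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(rgb))
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    return pred.cpu().numpy()  # inverse-depth style output: larger values = closer surfaces
```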

3.3 Illumination Maps

In the digital forensics field, illumination inconsistencies have long been used to detect image forgeries (Carvalho et al. (2016); d. Carvalho et al. (2013)). The hypothesis of these works is that illumination is an important clue and is very difficult to fake, which also holds in an authentication context.

Inspired by the approach of d. Carvalho et al. (2013), we also use illuminant maps to encode illumination information in the PAD context. Our hypothesis is that the illumination map generated from a real face shows differences in reflection when compared to the illumination map generated from a flat surface. This is similar to our hypothesis for the depth maps, but here it rests on the fact that a real human face reflects light differently than a tablet or a sheet of paper.

Based on the work proposed by Tan et al. (2008), our method takes advantage of the Inverse Intensity-Chromaticity space to estimate illuminant maps from a single image.
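As a rough illustration of the Inverse Intensity-Chromaticity idea (per-channel chromaticity is approximately linear in the inverse of the total intensity, and the intercept of that line approximates the illuminant chromaticity), the sketch below estimates an illuminant color for one image region. This is a simplified, assumption-laden version of the idea, not the full Tan et al. (2008) estimator with specular-pixel selection and voting.

```python
import numpy as np

def illuminant_chromaticity(region_rgb):
    """Rough per-region illuminant estimate in Inverse Intensity-Chromaticity space.

    Simplified sketch: fit chromaticity sigma_c against 1/sum(I) per channel and take
    the intercept as the illuminant chromaticity (not the full Tan et al. method).
    """
    pixels = region_rgb.reshape(-1, 3).astype(np.float64) + 1e-6
    total = pixels.sum(axis=1)
    inv_intensity = 1.0 / total              # x-axis of the IIC space
    chroma = pixels / total[:, None]         # per-channel chromaticity sigma_c
    intercepts = [np.polyfit(inv_intensity, chroma[:, c], 1)[1] for c in range(3)]
    return np.clip(intercepts, 0.0, 1.0)

# An illuminant map can then be built by painting each segmented region
# (e.g., a superpixel) with its estimated illuminant chromaticity.
```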

3.4 Saliency Maps

As in the depth and illumination cases, our method also takes advantage of saliency information to encode valuable cues. Again, our hypothesis is that the flat objects used in presentation attacks degrade the quality of the saliency estimation.

Our saliency map estimation is based on Zhu et al. (2014), which has two major steps: (1) background modeling using boundary connectivity, which characterizes the spatial layout of image regions with respect to image boundaries; and (2) a principled optimization framework that integrates multiple low-level cues, including the proposed background measure.
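The snippet below is a minimal sketch of computing per-frame saliency maps with OpenCV's contrib saliency module (spectral residual); this is only a readily available stand-in, not the Zhu et al. (2014) optimization-based method used in the paper.

```python
import cv2

# Requires opencv-contrib-python; spectral residual is a stand-in for Zhu et al. (2014).
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()

def saliency_map(frame_bgr):
    """Compute a coarse saliency map for one frame (illustrative stand-in)."""
    ok, sal = saliency.computeSaliency(frame_bgr)
    if not ok:
        raise RuntimeError("saliency computation failed")
    return (sal * 255).astype("uint8")  # 8-bit map, saved alongside the original frame
```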

3.5 Bottleneck Features Extraction via ResNet50

Once our intrinsic properties are estimated, we perform an eye-level alignment on all frames and their property maps, followed by a crop of the face region. The purpose of this extra step is to normalize all frames to the same alignment.

Then, our method takes advantage of the transfer learning process (Yosinski et al. (2014)) to avoid laborious handcrafted feature extraction. We chose ResNet50 (He et al. (2015)), a robust, well-known, and effective CNN architecture, loaded with ImageNet weights, to extract features from the previously generated maps. Removing the top layer, we use the ResNet50 architecture as a feature extractor, which provides feature vectors commonly known as bottleneck features.

As the final output of this step, a feature vector of 2,048 dimensions is generated, which will later be referred to as the bottleneck feature vector.
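A minimal sketch of this extraction step with Keras is shown below; the image size and file path are placeholders, and `include_top=False` with global average pooling yields the 2,048-dimensional bottleneck vector described above.

```python
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing import image

# ResNet50 without its classification head, loaded with ImageNet weights,
# used as a fixed feature extractor over the property maps.
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def bottleneck_vector(map_path):
    """Return the 2,048-dim bottleneck feature vector for one property-map image."""
    img = image.load_img(map_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]  # shape: (2048,)
```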

3.6 Classification

We adopt a two-step classification pipeline, in which the first classifier operates on the feature vectors, while the second classifies the probabilities generated by the first. This design is a major benefit compared to previous works: by stacking together several intrinsic properties of a given frame, it is possible to exploit additional information that contributes to the PAD task.

3.6.1 Bottleneck Vectors Classifier

For the task of bottleneck vector classification, we adopted a Support Vector Machine (SVM) (Nasrabadi, 2007) classifier, due to its robustness in binary classification with multiple features. Given a bottleneck feature vector, our classifier returns, for each frame, the probability of that frame being an attack or not.
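A sketch of this per-frame classifier is shown below using scikit-learn, which the paper does not name but is a convenient SVM implementation; the randomly generated arrays are placeholders for bottleneck vectors and real/attack labels, and a plain binary SVC stands in for the One-vs-Rest setup mentioned in Section 4.3.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2048))   # placeholder bottleneck vectors
y_train = rng.integers(0, 2, size=40)   # placeholder labels (1 = attack, 0 = bona fide)

# Per-frame classifier over bottleneck vectors; probability=True exposes confidences.
frame_clf = SVC(probability=True)
frame_clf.fit(X_train, y_train)

X_frames = rng.normal(size=(10, 2048))  # bottleneck vectors of one test video's frames
frame_probs = frame_clf.predict_proba(X_frames)[:, 1]  # per-frame attack probability
```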

3.6.2 Probability Feature Vector Assembly

Consider an input video with intrinsic properties already estimated, composed of frames $f_1, \ldots, f_n$, and let $p \in \{D, I, S\}$ denote the intrinsic property extracted from the video (depth, illumination, or saliency). In the previous step (bottleneck vector classifier), we calculated the probability of each frame belonging to one class or the other, denoted by $\rho_p(f_i)$.

Using a fusion-based approach, we combine the information from all intrinsic image properties, resulting in a probability feature vector ($PV$) defined by

$PV = [\rho_D, \rho_I, \rho_S]$ (1)

where each $\rho_p$ is given by

$\rho_p = [\rho_p(f_1), \rho_p(f_2), \ldots, \rho_p(f_n)]$. (2)

3.6.3 Probabilities Classifier

As the final step of our classification pipeline, we feed these vectors into our second classifier, hereafter referred to as the probabilities classifier. For this task, another SVM classifier was selected. The output of this classification is the final decision for each video.
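Putting Sections 3.6.2 and 3.6.3 together, the sketch below assembles the probability feature vector and trains the second (RBF) SVM; it is an illustration under the assumption of a fixed number of sampled frames per video, with randomly generated placeholders standing in for the real per-frame probabilities and labels.

```python
import numpy as np
from sklearn.svm import SVC

def probability_vector(depth_probs, illum_probs, sal_probs):
    """Concatenate per-frame attack probabilities of the three properties (Eqs. 1-2)."""
    return np.concatenate([depth_probs, illum_probs, sal_probs])

rng = np.random.default_rng(0)
n_videos, n_frames = 30, 20                      # placeholder sizes
# One probability vector per training video (placeholder probabilities).
PV_train = np.stack([
    probability_vector(*rng.random((3, n_frames))) for _ in range(n_videos)
])
y_videos = rng.integers(0, 2, size=n_videos)     # placeholder video labels (1 = attack)

meta_clf = SVC(kernel="rbf")                     # probabilities classifier (Section 3.6.3)
meta_clf.fit(PV_train, y_videos)

# Final decision for one test video.
pv_test = probability_vector(*rng.random((3, n_frames)))
video_label = meta_clf.predict(pv_test.reshape(1, -1))[0]
```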

4 Experiments and Results

To evaluate the proposed method, different rounds of experiments were performed using three public anti-spoofing datasets containing samples from real accesses and from presentation attacks. Intra-dataset evaluation, in which a method is trained and tested within the same dataset, followed the protocols suggested by the datasets' creators. Evaluation across different datasets, commonly known as inter-dataset evaluation, was also conducted in order to assess the performance of our method in unknown scenarios.

It is also important to note that, since we are interested in evaluating the efficiency of each intrinsic property separately, the final results reported for depth, illumination, and saliency reflect a majority vote among all frames classified by the first classifier (without the probability feature classification), as sketched below.
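For clarity, a minimal sketch of that per-video majority vote (assuming label 1 means attack and a 0.5 decision threshold) is:

```python
import numpy as np

def majority_vote(frame_probs, threshold=0.5):
    """Label a video as attack (1) if most of its frames exceed the attack threshold."""
    frame_labels = (np.asarray(frame_probs) >= threshold).astype(int)
    return int(frame_labels.sum() > len(frame_labels) / 2)
```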

4.1 Datasets

To assess the efficiency of the proposed method, three publicly available anti-spoofing datasets were selected. They were chosen among the many available because of their wide adoption in previous works that tackle PAD.

4.1.1 Replay-Attack

Consisting of 1,300 video clips of photo and video attacks from 50 subjects, the Replay-Attack dataset is a reliable benchmark for the evaluation of our method, since it covers different lighting and environment conditions (Chingovska et al. (2012)). Three different types of attack are provided: print attacks, reproduced using a printed paper with the face of a legitimate user; mobile attacks, reproducing a valid user access on a mobile phone; and video attacks, similar to the mobile attacks but reproduced on a tablet. The Replay-Attack dataset is separated into four subsets: a training set (360 videos), a development set (360 videos), a testing set (480 videos), and an enrollment set (100 videos).

4.1.2 CASIA-FASD

The CASIA-FASD dataset, proposed by Zhang et al. (2012), contains a total of 600 videos from 50 different subjects, created with the purpose of providing samples of many of the existing types of presentation attack. For each subject, the videos cover twelve different scenarios, combining three capture resolutions (low, normal, and high) with genuine accesses and three types of attack (warped printed photos, printed photos with the eye region cut out, and video-based attacks).

4.1.3 NUAA Photograph Imposter Dataset

The NUAA Photograph Imposter Dataset (Tan et al. (2010)) comprises 15 subjects, with a total of 5,105 valid-access images and 7,509 presentation attacks, collected with a generic webcam at 20 fps and a resolution of 640 x 480 pixels. The subjects were captured in three sessions, in different places and under different lighting conditions. The attack samples were produced by shooting a high-resolution photograph of each subject with a Canon digital camera.

4.2 Experimental Protocols

To assess the performance of our method, we conducted experiments using two different protocols. The first evaluates the proposed method within the same anti-spoofing dataset, which is commonly known as intra-dataset evaluation. The second addresses the efficacy of our method when tested on another dataset, commonly referred to as inter-dataset or cross-dataset evaluation. The latter is the most challenging protocol in the literature, due to the differences in scenery from one dataset to another.

For the intra-dataset evaluation, the protocols defined by the dataset authors were followed, in order to allow a clear comparison of how our method performs against other works.

For the evaluation of our method on a previously unseen dataset, the inter-dataset protocol, we followed the same procedure defined in previous works, where one dataset is used as the training set and another is used as the test set. For this evaluation, we also combined multiple datasets for training, in order to build a model that encompasses characteristics from diverse scenarios and sensors, allowing it to generalize better to new datasets.

4.3 Experimental Setup

The parameter configurations used in this work are described throughout this section, so that the results obtained by our method can be easily reproduced.

For the illumination maps and their segmentation, the parameters are the same as those presented in the work of d. Carvalho et al. (2013). For the depth maps, the parameters were kept at the defaults proposed by Godard et al. (2017). For the saliency maps, default values were also used without any changes.

We conducted the classification process using two different classifiers. For the first one, the feature vector classifier, a One-vs-Rest scheme was adopted with a Support Vector Machine (SVM) using default parameter values. The second classifier, used for predicting the class of a given set of probabilities, was also an SVM, with a Radial Basis Function (RBF) kernel.

In this work, the major frameworks used were Keras (version 2.2.2, https://keras.io) and TensorFlow (1.5.0, https://www.tensorflow.org). All the source code will be made freely available for usage and replication of the method presented here, upon paper acceptance.

4.4 Intra-Dataset Evaluation

In this section, we present the results obtained for the intra-dataset evaluation, in order to assess how our method performs when tested within the same dataset. We used the same protocols defined by the dataset authors, as well as the metrics proposed by them.
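The tables that follow report HTER, APCER, and BPCER. As a reference for readers, the sketch below computes these quantities under the common convention that HTER is the mean of the two error rates and that label 1 denotes an attack; this is only our illustrative formulation, and the operating thresholds of each protocol are defined by the dataset authors.

```python
import numpy as np

def pad_metrics(y_true, y_pred):
    """APCER, BPCER and HTER, assuming label 1 = attack and 0 = bona fide."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    apcer = np.mean(y_pred[y_true == 1] == 0)   # attacks wrongly accepted as bona fide
    bpcer = np.mean(y_pred[y_true == 0] == 1)   # bona fide wrongly rejected as attacks
    return apcer, bpcer, (apcer + bpcer) / 2.0  # HTER = mean of the two error rates
```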

4.4.1 CASIA

In the evaluation of the CASIA dataset, the best overall result, in which all attacks were evaluated together, was obtained using the illumination maps alone, achieving an HTER of 3.88%. The concatenated maps also performed well on separate attack types, achieving HTER values of 1.66% and 3.88% on the tablet and print attacks, respectively. For attacks performed with a printed sheet of paper with a cut in the eye region, the saliency maps performed best, with an HTER of 6.94%.

Table 1 provides all results obtained for the intra-dataset evaluation on the CASIA-FASD dataset.

Method Tablet Print Cut Overall
Depth 10.27 13.05 17.50 33.33
Illumination 5.00 8.33 10.00 3.88
Saliency 5.00 6.38 6.94 14.81
Concatenated 1.66 3.88 11.38 15.55
Table 1: Performance Results (in %) considering the Intra-Dataset Protocol for the CASIA dataset.

4.4.2 Replay Attack Dataset

As seen in Table 2, the illumination maps performed well in the evaluation of the Replay Attack dataset, with HTER values of 1.56%, 1.25%, and 0.62% for the high-definition, print, and mobile attack types, respectively, and 5.50% overall.

Method Highdef Print Mobile All
Depth 30.93 16.87 22.81 31.62
Illumination 1.56 1.25 0.62 5.50
Saliency 10.62 11.87 5.93 12.62
Concatenated 10.00 7.50 10.00 10.75
Table 2: Performance Results (in %) considering the Intra-Dataset Protocol for the Replay Attack dataset.

4.4.3 NUAA

In Table 3, we display the results obtained with the intra-dataset evaluation protocol on the NUAA dataset. We can observe that the depth maps play a major role in our performance on this dataset, achieving an HTER of 20.75% and a BPCER of 3.15%.

Method HTER APCER BPCER
Depth 20.75 38.36 3.15
Illumination 48.20 29.37 67.04
Saliency 34.20 27.89 41.11
Concatenated 44.36 28.27 60.45
Table 3: Performance Results (in %) considering the Intra-Dataset Protocol for the NUAA dataset.

4.5 Inter-Dataset Evaluation

Building a method that adapts well from one face anti-spoofing database to another, unknown one has been posed as a major challenge in previous works, and it is an essential ability for real-world applications that rely on face recognition for authentication. The task is challenging because of differences in scenery from one database to another, such as illumination and depth, as well as hardware configurations, such as the capture sensor and camera processing.

In this section, we present the results obtained with the inter-dataset evaluation protocol, where one dataset is used for training and another is used for testing.

4.5.1 CASIA

Using a fusion-based approach, we were able to achieve state-of-the-art results on the CASIA dataset. By combining two different anti-spoofing databases (NUAA and Replay Attack) for training, we obtained an HTER of 33.14%, along with good results from the intrinsic properties alone: HTER values of 36.82% and 39.32% for the depth and illumination maps, respectively.

Train Set Method HTER APCER BPCER
NUAA Depth 44.56 62.45 26.66
NUAA Illumination 55.34 84.01 26.66
NUAA Saliency 43.15 27.88 58.42
NUAA Concatenated 50.74 87.00 14.44
Replay Attack Depth 43.33 8.88 77.78
Replay Attack Illumination 50.18 0.30 100.00
Replay Attack Saliency 52.79 8.92 96.67
Replay Attack Concatenated 50.18 1.48 98.88
NUAA + Replay Attack Depth 36.82 31.59 42.04
NUAA + Replay Attack Illumination 39.32 18.65 60.00
NUAA + Replay Attack Saliency 61.84 37.17 86.51
NUAA + Replay Attack Concatenated 33.14 46.29 20.00
Table 4: Performance Results (in %) considering the Inter-Dataset Protocol for the CASIA dataset.

4.5.2 Replay Attack

In the evaluation of the Replay-Attack dataset, the best results were obtained when combining the CASIA-FASD and NUAA databases for training and using the illumination maps, resulting in an HTER of 36.75%. Good results were also achieved using only the NUAA dataset for training along with the illumination maps, resulting in an HTER of 41.64%.

Train Set Method HTER APCER BPCER
NUAA Depth 48.00 43.85 52.14
NUAA Illumination 41.64 3.28 80.00
NUAA Saliency 47.78 62.71 32.85
NUAA Concatenated 43.42 31.14 55.71
CASIA Depth 53.00 22.25 83.75
CASIA Illumination 55.37 88.25 22.50
CASIA Saliency 61.25 91.25 31.25
CASIA Concatenated 60.35 80.00 40.71
NUAA + CASIA Depth 47.62 27.75 67.50
NUAA + CASIA Illumination 36.75 51.00 22.50
NUAA + CASIA Saliency 54.75 74.50 35.00
NUAA + CASIA Concatenated 40.21 37.57 42.85
Table 5: Performance Results (in %) considering the Inter-Dataset Protocol for the Replay Attack dataset.

4.5.3 NUAA

For the task of classifying the NUAA dataset, the best results were obtained with the depth maps, with an HTER of 37.27%, showing that this property, despite not performing as well on the Replay-Attack and CASIA databases, still has great potential for revealing cues that may indicate a presentation attack.

The illumination maps also showed notable results when the Replay-Attack dataset and a combination of Replay-Attack and CASIA were used as training sets, achieving HTER values of 51.13% and 50.51%, respectively.

Train Set Method HTER APCER BPCER
CASIA Depth 37.27 47.56 26.97
CASIA Illumination 50.88 11.66 90.09
CASIA Saliency 47.67 41.07 54.26
CASIA Concatenated 44.29 25.38 63.19
Replay Attack Depth 64.94 52.84 77.03
Replay Attack Illumination 51.13 3.39 98.86
Replay Attack Saliency 56.29 15.30 97.29
Replay Attack Concatenated 52.55 9.80 95.29
CASIA + Replay Attack Depth 51.00 47.18 54.81
CASIA + Replay Attack Illumination 50.51 4.39 96.63
CASIA + Replay Attack Saliency 60.26 33.52 86.99
CASIA + Replay Attack Concatenated 50.61 17.12 84.10
Table 6: Performance Results (in %) considering the Inter-Dataset Protocol for the NUAA dataset.

5 Comparison with State-of-the-art

In Table 7, we display the results obtained by our approach, for the property maps both individually and concatenated. Significant results were achieved in inter-dataset classification, mostly on the CASIA-FASD dataset, where an HTER of 33.14% was obtained when training on the combination of the NUAA and Replay-Attack datasets, overcoming the results of previous works (d. S. Pinto et al. (2012); Yang et al. (2014)). The property maps alone also performed well on the CASIA dataset, with HTER values of 36.82% and 39.32% for the depth and illumination maps, respectively.

Strong results were also achieved for the Replay-Attack dataset when training on the combination of the NUAA and CASIA databases, reaching an HTER of 36.75% with the illumination maps, close to the state of the art for this dataset.

Method CASIA Replay-Attack NUAA
Yeh and Chang (2018) 39.00 38.10 -
Pinto et al. (2018) 47.16 49.72 -
Yang et al. (2014) 42.04 41.36 -
Patel et al. (2016) - 31.60 -
Tan et al. (2010) - - 45.85
Peixoto et al. (2011b) - - 49.85
Depth 36.82 47.62 37.27
Illumination 39.32 36.75 50.51
Saliency 43.15 47.78 47.67
Proposed Method 33.14 40.21 50.61
Table 7: Comparison Among Existing Approaches Considering the Inter-Dataset Evaluation Protocol.

6 Conclusions and Future Works

In this paper, we have proposed a method that, by using a two-step classification model, along with intrinsic image properties, such as depth, illumination and saliency, is able to learn features for the task of presentation attack detection.

Evaluating our method on three different databases, we surpass state-of-the-art results, achieving an HTER of 33.14% on the CASIA-FASD dataset in the inter-dataset evaluation, which attests to the efficacy of our method compared to previous works in the literature. To the best of our knowledge, this is the best result reported for the CASIA dataset, setting our method as the state of the art for this specific dataset.

We believe that the findings provided by this paper, such as the efficacy of using intrinsic image properties, can lead to a better understanding for the development of new anti-spoofing methods, as well as guide the development of new datasets.

For future work, we plan to make use of other PAD datasets, since we were able to achieve good results when combining more than one dataset for training under the inter-dataset protocol; adding more data may lead to even better results and help address how to tackle facial presentation attacks.

7 Acknowledgments

We would like to thank the São Paulo Research Foundation (FAPESP) (#2017/12631-6), the National Council for Scientific and Technological Development (CNPq) (#423797/2016-6), and NVIDIA for the donation of the TITAN Xp GPU used in this research.

References

  • Abrams et al. (2012) Abrams, A., Hawley, C., Pless, R., 2012. Heliometric stereo: Shape from sun position, in: Computer Vision–ECCV 2012. Springer, pp. 357–370.

  • Bao et al. (2009) Bao, W., Li, H., Li, N., Jiang, W., 2009. A liveness detection method for face recognition based on optical flow field, in: Image Analysis and Signal Processing, 2009. IASP 2009. International Conference on, IEEE. pp. 233–236.
  • Buchsbaum (1980) Buchsbaum, G., 1980. A spatial processor model for object colour perception. Journal of the Franklin Institute 310, 1 – 26. URL: http://www.sciencedirect.com/science/article/pii/0016003280900587, doi:https://doi.org/10.1016/0016-0032(80)90058-7.
  • Carvalho et al. (2016) Carvalho, T., Faria, F.A., Pedrini, H., da S. Torres, R., Rocha, A., 2016. Illuminant-based transformed spaces for image forensics. IEEE Transactions on Information Forensics and Security 11, 720–733. doi:10.1109/TIFS.2015.2506548.
  • d. Carvalho et al. (2013) d. Carvalho, T.J., Riess, C., Angelopoulou, E., Pedrini, H., d. R. Rocha, A., 2013. Exposing digital image forgeries by illumination color classification. IEEE Transactions on Information Forensics and Security 8, 1182–1194. doi:10.1109/TIFS.2013.2265677.
  • Chingovska et al. (2012) Chingovska, I., Anjos, A., Marcel, S., 2012. On the effectiveness of local binary patterns in face anti-spoofing, in: 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), pp. 1–7.
  • Chingovska et al. (2013) Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., Glaser, C., Damer, N., Kuijper, A., Nouak, A., et al., 2013. The 2nd competition on counter measures to 2d face spoofing attacks, in: Biometrics (ICB), 2013 International Conference on, IEEE. pp. 1–6.
  • Erdogmus and Marcel (2014) Erdogmus, N., Marcel, S., 2014. Spoofing face recognition with 3d masks. IEEE transactions on information forensics and security 9, 1084–1097.
  • Felzenszwalb and Huttenlocher (2004) Felzenszwalb, P.F., Huttenlocher, D.P., 2004. Efficient graph-based image segmentation. Int. J. Comput. Vision 59, 167–181. URL: https://doi.org/10.1023/B:VISI.0000022288.19776.77, doi:10.1023/B:VISI.0000022288.19776.77.
  • Furukawa et al. (2015) Furukawa, Y., Hernández, C., et al., 2015. Multi-view stereo: A tutorial. Foundations and Trends® in Computer Graphics and Vision 9, 1–148.
  • Godard et al. (2017) Godard, C., Aodha, O.M., Brostow, G.J., 2017. Unsupervised monocular depth estimation with left-right consistency, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6602–6611. doi:10.1109/CVPR.2017.699.
  • He et al. (2015) He, K., Zhang, X., Ren, S., Sun, J., 2015. Deep residual learning for image recognition. CoRR abs/1512.03385.
  • ISO 30107-3:2017 (E) ISO 30107-3:2017(E), 2017. Information technology – Biometric presentation attack detection – Part 3: Testing and reporting.
  • Kollreider et al. (2009) Kollreider, K., Fronthaler, H., Bigun, J., 2009. Non-intrusive liveness detection by face images. Image and Vision Computing 27, 233–244.
  • Komulainen et al. (2013) Komulainen, J., Hadid, A., Pietikäinen, M., 2013. Context based face anti-spoofing, in: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–8. doi:10.1109/BTAS.2013.6712690.
  • Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097–1105.
  • Li (2008) Li, J.W., 2008. Eye blink detection based on multiple gabor response waves, in: Machine Learning and Cybernetics, 2008 International Conference on, IEEE. pp. 2852–2856.

  • Määttä et al. (2011) Määttä, J., Hadid, A., Pietikäinen, M., 2011. Face spoofing detection from single images using micro-texture analysis, in: 2011 International Joint Conference on Biometrics (IJCB), pp. 1–7. doi:10.1109/IJCB.2011.6117510.
  • Nasrabadi (2007) Nasrabadi, N.M., 2007. Pattern recognition and machine learning. Journal of electronic imaging 16, 049901.
  • Pan et al. (2007) Pan, G., Sun, L., Wu, Z., Lao, S., 2007. Eyeblink-based anti-spoofing in face recognition from a generic webcamera, in: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8. doi:10.1109/ICCV.2007.4409068.
  • Pan et al. (2008) Pan, G., Wu, Z., Sun, L., 2008. Liveness detection for face recognition, in: Delac, K., Grgic, M., Bartlett, M.S. (Eds.), Recent Advances in Face Recognition. IntechOpen, Rijeka. chapter 9, pp. 235–252. URL: https://doi.org/10.5772/6397, doi:10.5772/6397.
  • Patel et al. (2016) Patel, K., Han, H., Jain, A.K., 2016. Cross-database face antispoofing with robust feature representation, in: You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., Feng, J., Zhao, Q. (Eds.), Biometric Recognition, Springer International Publishing, Cham. pp. 611–619.
  • Peixoto et al. (2011a) Peixoto, B., Michelassi, C., Rocha, A., 2011a. Face liveness detection under bad illumination conditions, in: 2011 18th IEEE International Conference on Image Processing, pp. 3557–3560. doi:10.1109/ICIP.2011.6116484.
  • Peixoto et al. (2011b) Peixoto, B., Michelassi, C., Rocha, A., 2011b. Face liveness detection under bad illumination conditions, in: Image Processing (ICIP), 2011 18th IEEE International Conference on, IEEE. pp. 3557–3560.
  • Pinto et al. (2018) Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., Rocha, A., 2018. Counteracting presentation attacks in face, fingerprint, and iris recognition. Deep Learning in Biometrics , 245.
  • d. S. Pinto et al. (2012) d. S. Pinto, A., Pedrini, H., Schwartz, W., Rocha, A., 2012. Video-based face spoofing detection through visual rhythm analysis, in: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 221–228. doi:10.1109/SIBGRAPI.2012.38.
  • Schwartz et al. (2011) Schwartz, W.R., Rocha, A., Pedrini, H., 2011. Face spoofing detection through partial least squares and low-level descriptors, in: 2011 International Joint Conference on Biometrics (IJCB), pp. 1–8. doi:10.1109/IJCB.2011.6117592.
  • Tan et al. (2008) Tan, R.T., Ikeuchi, K., Nishino, K., 2008. Color constancy through inverse-intensity chromaticity space, in: Digitally Archiving Cultural Objects. Springer, pp. 323–351.
  • Tan et al. (2010) Tan, X., Li, Y., Liu, J., Jiang, L., 2010. Face liveness detection from a single image with sparse low rank bilinear discriminative model, in: Proceedings of the 11th European Conference on Computer Vision: Part VI, Springer-Verlag, Berlin, Heidelberg. pp. 504–517. URL: http://dl.acm.org/citation.cfm?id=1888212.1888251.
  • Woodham (1980) Woodham, R.J., 1980. Photometric method for determining surface orientation from multiple images. Optical engineering 19, 191139.
  • Yang et al. (2014) Yang, J., Lei, Z., Li, S.Z., 2014. Learn Convolutional Neural Network for Face Anti-Spoofing.
  • Yang et al. (2013) Yang, J., Lei, Z., Liao, S., Li, S.Z., 2013. Face liveness detection with component dependent descriptor, in: 2013 International Conference on Biometrics (ICB), pp. 1–6. doi:10.1109/ICB.2013.6612955.
  • Yeh and Chang (2018) Yeh, C.H., Chang, H.H., 2018. Face liveness detection based on perceptual image quality assessment features with multi-scale analysis, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE. pp. 49–56.
  • Yosinski et al. (2014) Yosinski, J., Clune, J., Bengio, Y., Lipson, H., 2014. How transferable are features in deep neural networks? CoRR abs/1411.1792.
  • Zhang et al. (2012) Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z., 2012. A face antispoofing database with diverse attacks, in: 2012 5th IAPR International Conference on Biometrics (ICB), pp. 26–31. doi:10.1109/ICB.2012.6199754.
  • Zhu et al. (2014) Zhu, W., Liang, S., Wei, Y., Sun, J., 2014. Saliency optimization from robust background detection, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2814–2821. doi:10.1109/CVPR.2014.360.