Facial Behavior Analysis using 4D Curvature Statistics for Presentation Attack Detection

10/14/2019, by Martin Thümmel et al.

The uniqueness, complexity, and diversity of facial shapes and expressions led to success of facial biometric systems. Regardless of the accuracy of current facial recognition methods, most of them are vulnerable against the presentation of sophisticated masks. In the highly monitored application scenario at airports and banks, fraudsters probably do not wear masks. However, a deception will become more probable due to the increase of unsupervised authentication using kiosks, eGates and mobile phones in self-service. To robustly detect elastic 3D masks, one of the ultimate goals is to automatically analyze the plausibility of the facial behavior based on a sequence of 3D face scans. Most importantly, such a method would also detect all less advanced presentation attacks using static 3D masks, bent photographs with eyeholes, and replay attacks using monitors. Our proposed method achieves this goal by comparing the temporal curvature change between presentation attacks and genuine faces. For evaluation purposes, we recorded a challenging database containing replay attacks, static and elastic 3D masks using a high-quality 3D sensor. Based on the proposed representation, we found a clear separation between the low facial expressiveness of presentation attacks and the plausible behavior of genuine faces.


I Introduction

The digitalization of organizational processes to improve their efficiency in terms of time and resources is progressing extremely fast. The demand for robust, unsupervised authentication methods is growing just as quickly. However, current authentication methods are not able to detect advanced spoofing attacks or identity theft. Organizational processes that must be robust against fraud include, for example, automated border control, opening a bank account, financial transfers, and mobile payments using self-service eGates, kiosks, and mobile phones. In the following, a presentation attack refers to all cases in which biometric copies are presented.

Biometric methods focus on the shape of the human body as physiological characteristics, e.g. the fingerprint, palm print, and iris [31]. The promising field of behavioral biometrics comprises the analysis of e.g. the voice [8], walking gait [7], and keystroke dynamics [22]. By definition, these methods are more robust against presentation attacks than biometric methods that rely on static appearance. However, individual characteristics can be either replayed or imitated by another person [1]. Instead of developing an even more robust behavioral biometric method, we focus on the task of Presentation Attack Detection (PAD) as a mandatory security check for face authentication systems.

We analyze the plausibility of the individual facial trait of a person based on temporal sequences of 3D face scans, which we call 4D face scans. At first, we apply an unsupervised preprocessing step for face extraction and pose normalization. Afterward, we account for the huge amount of redundant information in high-resolution 4D face scans by analyzing the plausibility of curvature changes at subsampled radial stripes. Both handcrafted steps are appropriate in this case since it is not possible to create a large and unbiased database with all different presentation attacks to perform classification using, e.g., deep learning.

II Related Work

II-A Presentation Attack Detection

Similarly to the production of counterfeit money, impostors attacking an authentication system will always try to create better biometric copies. Most current 2D face authentication methods can be attacked by presenting photographs or 3D masks of another person’s face [3]. Therefore, several methods like [17] exist for distinguishing an image of a face photograph from an image of a genuine face. Since these methods are not able to detect attacks using high-quality photographs, challenge-response authentication methods were developed. For authentication, a user is asked to read words aloud [19], move their head [10], blink their eyes [28], or show facial expressions. However, these methods can be attacked by presenting a video on a monitor in which the face behaves in the requested way, a so-called replay attack. Again, a solution to detect these attacks is to analyze whether the 3D facial structure is plausible, based on 3D landmark locations [30] or the mean curvature [20]. Such methods are robust against the presentation of bent photographs, but can still be attacked using 3D masks [26]. Even the 3D facial recognition system Face ID [25] analyzes only the static facial appearance in 3D and is thus limited to detecting planar presentation attacks. To detect even elastic 3D masks, we have developed a method that combines both the temporal analysis of 3D face scans and a challenge-response protocol.

Recent deep learning methods [23] can detect many kinds of 3D masks from 2D color videos of subjects with a neutral facial expression. They do not take the depth information into account and focus on subconscious facial movements like blinking. Hence, they are still vulnerable to replay attacks and to unseen or partial 3D masks. Deep learning methods for PAD based on sequences of 2.5D depth images [14] are also able to detect 3D masks. Even though they achieve impressive results, their training databases are biased because they contain only a few different masks and genuine faces. Even if a challenging training database containing all state-of-the-art mask types were available, a novel mask type or an adversarial example [27] could still be used to attack the system. Thus, these methods are not robust against unseen presentation attacks. Furthermore, they are not invariant under Euclidean transformations as they do not extract features directly from the 3D facial surface. In contrast, our method requires no costly training on such biased databases and only obtains the lower limit of the plausible amount of facial expression change from recordings of genuine faces.

Liveness detection methods analyze the differences between genuine facial skin and masks or partial face modifications. One example of such a method is the estimation of the heart pulse rate from a regular camera [21]. However, emitting a green flashing light onto a mask with a plausible frequency can fool these methods since the green channel varies the most with the pulse [9]. Furthermore, these methods can be attacked by presenting thin or partial masks and require almost static facial expression, pose, and illumination.

Alternatively, presentation attacks could be detected as all cases in which the resulting shape and expression parameters of a fitted 3D morphable model (3DMM) [4] exceed their expected range. However, a 3DMM cannot adapt to 3D scans of photographs or monitors and would result in underfitting. Thus, the shape and expression parameters would still remain in the plausible range of genuine faces because of the inherent statistical bounds learned from all faces during training.
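To sketch this idea, the linear 3DMM of Blanz and Vetter [4] expresses a face as a mean shape plus weighted shape and expression bases, and a plausibility check on the fitted coefficients could flag attacks; the bound k below is a hypothetical choice, not part of the cited work or our method.

```latex
% Linear 3DMM with shape coefficients alpha and expression coefficients beta
S(\alpha, \beta) = \bar{S} + \sum_{i} \alpha_i \, \mathbf{s}_i + \sum_{j} \beta_j \, \mathbf{e}_j,
\qquad \alpha_i \sim \mathcal{N}(0, \sigma_i^2)

% Hypothetical range check: flag the scan if any coefficient is implausibly large
\text{flag as attack if } \exists\, i : |\alpha_i| > k\,\sigma_i
```

As argued above, this check fails for scans of photographs or monitors, because the underfitted coefficients stay within the statistical bounds learned from genuine faces.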

II-B Curvature Analysis of Radial Stripes

As emphasized by Katina et al. [15], 3D anatomical curves provide a much richer characterization of faces than landmarks, which are just individual points along anatomical curves. The method of Vittert et al. [29] localizes a complete set of anatomical curves along ridges and valleys of the facial shape. The overall curvature along a curve is iteratively maximized over points with a positive or negative shape index, respectively. A complete facial model is built from these anatomical curves, subsampled intermediate curves, and manually annotated landmarks. However, it was only applied to statistics of the neutral face appearance. In the case of facial expressions, the shape index and 1D curvature can change drastically and would require many heuristics to still obtain robust anatomical curves.

Instead of relying on anatomical curves, the representation of Berretti et al. [2] approximates the facial shape with a set of geodesics. A set of radial curves is created by intersecting the facial surface with planes through the nose tip. Starting with the anatomical midsagittal plane, the plane is repeatedly rotated around the roll axis by a fraction of the full angle. This representation has proven to yield superior performance in 4D facial expression recognition [33], face recognition [12], and even body part analysis in general [5]. However, the method of Zhen et al. [33] would incur too high a computational effort if applied to facial behavior analysis on high-quality 4D scans due to its costly alignment procedure: all consecutive face scans are roughly aligned using the iterative closest point algorithm, followed by a fine-alignment of all consecutive radial curves using dynamic programming.

For feature extraction, all mentioned methods rely on the first derivative or curvature of curves along the surface w.r.t. arc-length. Instead, we encourage the use of the 3D surface curvature [24], which takes the neighborhood on the 2D surface into account and is more robust against overfitting caused by noise, outliers, and holes. For registration purposes, the mentioned methods fit 1D curves and enforce a homogeneous geodesic distance by equidistant sampling along the arc-length. However, since the degrees of freedom of the curves are fixed to a certain number, it is implicitly assumed that the complexity of the underlying surface is similar for all curves and all expected shapes. Due to the huge shape variety of faces, 3D masks, and planar presentation attacks, this assumption does not hold in the case of presentation attacks. Curves along the nasal ridge and through the cheeks have a lower curvature than curves through the eyes and mouth. In the extreme case of presentation attacks using photographs and monitors, all curvature values should be small. Depending on whether the degrees of freedom are adjusted to more or less complex surfaces, this results in over- or underfitting in the other cases, respectively.

III Methodology

Fig. 1: Our suggested representation of 4D face scans reduces the huge amount of biometric characteristics in color and depth image sequences to the graph shown on the right. This graph measures the facial expression change at certain radial stripes over time.

The calculation of our representation is illustrated in the pipeline in Fig. 1. Given two synchronized sequences of color and depth images, two preprocessing steps are performed. First, the anatomical landmarks are localized in the color images, transformed to the depth images and 3D reconstructed (Section III-A1). Second, the face is extracted and the pose is normalized for each frame, to obtain a local coordinate system which is centered at the nose tip (Section III-A2).

Afterward, we propose a representation of 4D face scans based on the following two steps: equidistant surface curvatures are extracted at equiangular radial stripes to subsample the point cloud (Section III-B1), and the curvatures of consecutive frames are correlated over time to locally measure the curvature change for each radial stripe (Section III-B2). The graph in Fig. 1 shows the maximum cross-correlation for each radial stripe over time. For example, the temporal changes of the two inner peaks relate to the eye movements and the two outer peaks to the mouth movements. We found that the standard deviation of the temporal changes in the mouth region allows for PAD (Section IV-A).

III-A Preprocessing Steps

III-A1 3D Landmark Localization

The starting point for many methods that analyze human faces is the extraction of anthropometric landmarks around the eyebrows, eyes, mouth, and nose (see Fig. 1, left). Since 2D facial landmark localization methods usually achieve better performance than 3D methods due to larger training databases, we use the method of Kazemi et al. [16]. The even higher robustness of more recent deep learning methods [6] is not required as we focus on the cooperative self-service scenario with an almost frontal pose and little occlusion.

For stereoscopic and multi-view camera systems, the 2D landmarks can be 3D reconstructed using the registered depth image. However, in the case of the denser and more accurate 3D scans of a structured-light 3D sensor, the depth and color images are not registered. In this case, texture coordinates contain the pixel-wise mapping from the depth image to the color image. We store the inverse mapping from color to depth pixels during the calculation of the texture coordinates and call them depth coordinates. Finally, the closest non-zero depth coordinate for a given color coordinate points to its corresponding depth value. Since the texture coordinates are always calculated, the computational overhead of this approach is negligible. Fig. 1 shows the corresponding landmark locations in the color and depth image.

Alternatively, the 2D landmarks could be 3D reconstructed from lossy RGB-D images in two ways. First, a registered depth image can be created by extracting the color value for each pixel in the depth image using the texture coordinates. Second, a depth value can be assigned to each color pixel after 3D reconstructing and projecting all depth pixels onto the color camera. As the resolution of the depth images is usually smaller than that of the color images, the first approach results in low-resolution color images and the second approach results in depth images with holes. Furthermore, our approach avoids the high computational demand of both common alternatives.
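To make the depth-coordinate lookup described above concrete, the following Python sketch assumes that the texture coordinates are given as a per-depth-pixel array of color-pixel positions; the array layout, function names, and search radius are our assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_depth_coordinates(tex_coords, color_shape):
    """Invert the depth-to-color mapping given by the texture coordinates.

    tex_coords: (H_d, W_d, 2) array mapping each depth pixel to a color pixel.
    Returns an (H_c, W_c, 2) array mapping color pixels back to depth pixels
    (zero where no depth pixel projects onto that color pixel).
    """
    depth_coords = np.zeros((*color_shape, 2), dtype=np.int32)
    h_d, w_d = tex_coords.shape[:2]
    for v in range(h_d):
        for u in range(w_d):
            cu, cv = tex_coords[v, u]          # color pixel hit by depth pixel (u, v)
            depth_coords[int(cv), int(cu)] = (u, v)
    return depth_coords

def landmark_depth_pixel(depth_coords, landmark_uv, search_radius=5):
    """Return the depth pixel of the closest non-zero depth coordinate
    around a 2D landmark detected in the color image."""
    cu, cv = landmark_uv
    best, best_dist = None, np.inf
    for dv in range(-search_radius, search_radius + 1):
        for du in range(-search_radius, search_radius + 1):
            v_i, u_i = cv + dv, cu + du
            if not (0 <= v_i < depth_coords.shape[0] and 0 <= u_i < depth_coords.shape[1]):
                continue
            cand = depth_coords[v_i, u_i]
            if np.any(cand) and du * du + dv * dv < best_dist:
                best, best_dist = tuple(cand), du * du + dv * dv
    return best
```

Since the inverse map is filled as a by-product of the texture-coordinate computation, only the small local search around each landmark adds runtime.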

III-A2 Face Extraction and Pose Normalization

In the application scenario of a static 3D sensor in an eGate or kiosk system, the field of view must be large enough to capture faces of people with different sizes and positions. As such 3D face scans mostly contain background, the first preprocessing step centers the point cloud at the nose tip landmark and extracts all points inside a sphere of fixed radius around it.

The second preprocessing step normalizes the head pose to an upright and frontal view. For head pose estimation, we adopt the approach from Derkach et al. [11] based on facial landmarks (see Fig. 2).

Fig. 2: Left: According to [15], anthropometric landmarks are located at positions of maximum curvature along ridges and ruts of the facial surface and where these anatomical curves intersect (image from Klare and Jain [18] © 2010, IEEE). Right: The red landmarks are used for estimating the roll angle (red arrow) of the head pose. A plane is fitted to the red and blue landmarks and the x- and y-components of its normal vector (green) are used for estimating the yaw and pitch angles. For visualization purposes, a triangulated mesh is shown instead of the point cloud (best viewed in color).

Finally, the corresponding transposed rotation matrix is applied for pose normalization. This allows us to achieve robust results even in case of deviations of the head pose from the frontal view.
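A minimal Python sketch of both preprocessing steps follows. The crop radius, the choice of landmark indices, the orientation of the plane normal, and the rotation composition order are assumptions for illustration, not the values or conventions of the paper.

```python
import numpy as np

def crop_face(points, nose_tip, radius=100.0):
    """Center the point cloud at the nose tip and keep points inside a sphere
    (radius is an assumed value in the unit of the point cloud)."""
    centered = points - nose_tip
    return centered[np.linalg.norm(centered, axis=1) < radius]

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def estimate_head_pose(landmarks_3d, roll_ids, plane_ids):
    """Estimate roll from landmarks along a roughly horizontal line and
    yaw/pitch from the normal of a plane fitted to another landmark subset."""
    # Roll: in-plane angle of the line through the first and last roll landmark.
    p = landmarks_3d[roll_ids]
    direction = p[-1] - p[0]
    roll = np.arctan2(direction[1], direction[0])

    # Yaw/pitch: plane fit via SVD; the last right-singular vector is the normal.
    q = landmarks_3d[plane_ids]
    _, _, vt = np.linalg.svd(q - q.mean(axis=0))
    normal = vt[-1]
    if normal[2] < 0:                      # assumed: normal points toward the camera (+z)
        normal = -normal
    yaw = np.arctan2(normal[0], normal[2])
    pitch = np.arctan2(normal[1], normal[2])
    return roll, yaw, pitch

def normalize_pose(points, roll, yaw, pitch):
    """Apply the transposed rotation to obtain an upright, frontal view."""
    R = rot_z(roll) @ rot_y(yaw) @ rot_x(pitch)   # assumed composition order
    return points @ R                             # equivalent to (R.T @ points.T).T
```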

III-B Spatiotemporal Curvature Analysis

The most important subtask of a method for 4D facial behavior analysis is to deal with the extremely high spatiotemporal redundancy of the 4D face scans. The temporal difference between consecutive face scans and the spatial difference between a point and its immediate neighbors are very small. Thus, a feature representation should only extract the facial expression changes, which constitute the overall facial behavior. Our new representation achieves this goal by calculating the curvature of point subsets in Section III-B1 and a correlation-based approach in Section III-B2.

III-B1 Curvature Analysis of Radial Stripes

After applying the preprocessing steps, we account for the common issue of holes and peaks in 3D scans by calculating the mean depth image over three consecutive 3D scans. To extract radial stripes in the next step, a reference coordinate system given by the three unit vectors in x-, y-, and z-direction is centered at the nose tip landmark. First, we intersect the face with the yz-plane as shown in Fig. 3 (exemplary view, without cropping).
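A possible way to compute the hole-aware mean depth image over three consecutive frames is sketched below; treating a depth value of zero as a hole is our assumption about the sensor output.

```python
import numpy as np

def mean_depth_image(d_prev, d_curr, d_next):
    """Average three consecutive depth images while ignoring holes (depth == 0)."""
    stack = np.stack([d_prev, d_curr, d_next]).astype(np.float64)
    valid = stack > 0
    counts = np.maximum(valid.sum(axis=0), 1)   # avoid division by zero at full holes
    return (stack * valid).sum(axis=0) / counts
```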

Fig. 3: After normalizing the pose to an upright and frontal view, we extract a radial stripe through the face (left). We then rotate the plane repeatedly by a fraction of the full angle until we obtain all radial stripes (right, after 4 rotations).

A radial stripe is then extracted by taking all points for which the magnitude of the projection onto the x-axis is lower than a certain threshold and the projection onto the y-axis is positive,

S = \{\, \mathbf{p} \in P : |\mathbf{p} \cdot \mathbf{e}_x| < \epsilon \ \wedge\ \mathbf{p} \cdot \mathbf{e}_y > 0 \,\}    (1)

among all points P of a single 3D face scan. Afterward, the x- and y-axes are rotated around the z-axis by a fixed angular increment and the process is repeated until all radial stripes are extracted.
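The stripe extraction can be sketched directly from Eq. (1); the number of stripes and the stripe half-width are illustrative parameters, not the values used in the paper.

```python
import numpy as np

def extract_radial_stripes(points, n_stripes=16, eps=2.0):
    """Extract equiangular radial stripes from a pose-normalized point cloud.

    points: (N, 3) array centered at the nose tip, upright and frontal.
    n_stripes and eps (stripe half-width, same unit as the points) are assumed.
    """
    stripes = []
    e_x = np.array([1.0, 0.0, 0.0])
    e_y = np.array([0.0, 1.0, 0.0])
    angle = 2.0 * np.pi / n_stripes
    c, s = np.cos(angle), np.sin(angle)
    R_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    for _ in range(n_stripes):
        # Select points close to the current plane and on its positive side (Eq. 1).
        mask = (np.abs(points @ e_x) < eps) & (points @ e_y > 0)
        stripes.append(points[mask])
        # Rotate the reference axes around the z-axis for the next stripe.
        e_x, e_y = R_z @ e_x, R_z @ e_y
    return stripes
```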

To reduce the feature dimension, we calculate the curvature for each sampled point based on its local neighborhood in the original point cloud. The use of the curvature is motivated by its invariance under 3D Euclidean transformations. The curvature can be calculated from the 1D parametric arc-length representation \mathbf{c}(s) of each curve as

\kappa(s) = \lVert \ddot{\mathbf{c}}(s) \rVert    (2)

However, the resulting 1D curvatures in the case of a 3D scan of a flat surface in Fig. 4 are in the range of the curvatures of genuine face scans due to overfitting to the noisy point cloud.

Fig. 4: Due to overfitting of radial curves, the 1D curvature becomes large (middle) in case of noisy, planar 3D scans of monitors (left) and photographs. Instead, the 3D surface curvature (right) is more robust against noise and is almost zero everywhere in this case.

As a consequence, we approximate the 3D surface curvature [24] as the surface variation computed from the statistics of the neighborhood of each point as

\sigma = \frac{\lambda_0}{\lambda_0 + \lambda_1 + \lambda_2}    (3)

where \lambda_0 \le \lambda_1 \le \lambda_2 are the eigenvalues of a principal component analysis applied to a local point subset.

Fig. 5: The 3D curvature of the face is large in the important facial areas of eyes, eyebrows, and the mouth. The first radial stripe (black) is vertically aligned and points downward.

In Fig. 5, it is shown that this approach is more robust against overfitting as a larger neighborhood on the 2D surface is taken into account. This approximation also avoids the major impact of the unbounded, unstable second derivative in Eq. (2). The surface variation is bounded by 1/3 for isotropically distributed points.
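A possible implementation of the surface variation in Eq. (3) is sketched below with a brute-force nearest-neighbor search; the neighborhood size k is an assumed parameter, and a spatial index (e.g. a k-d tree) would be used in practice.

```python
import numpy as np

def surface_variation(points, query_points, k=30):
    """Approximate the 3D surface curvature (Eq. 3) as the surface variation
    lambda_0 / (lambda_0 + lambda_1 + lambda_2) of the k nearest neighbors.

    points:       (N, 3) original point cloud used as neighborhood source.
    query_points: (M, 3) points (e.g. subsampled stripe points) to evaluate.
    """
    curvatures = np.empty(len(query_points))
    for i, q in enumerate(query_points):
        # Brute-force k-nearest neighbors in the original point cloud.
        dists = np.linalg.norm(points - q, axis=1)
        neighbors = points[np.argsort(dists)[:k]]
        # PCA of the local neighborhood: eigenvalues of the scatter matrix.
        centered = neighbors - neighbors.mean(axis=0)
        eigvals = np.linalg.eigvalsh(centered.T @ centered)   # ascending order
        curvatures[i] = eigvals[0] / max(eigvals.sum(), 1e-12)
    return curvatures
```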

To compare consecutive radial stripes, we subsample points along each radial stripe. We obtain an equidistant spacing between the points by sampling along their projection onto the y-axis. An important advantage of the resulting representation is that the degree of linear and angular subsampling can be varied depending on the desired accuracy and runtime.
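The equidistant subsampling along the y-projection can be sketched as a simple interpolation of the curvature values; the number of samples is an assumed parameter.

```python
import numpy as np

def subsample_stripe(stripe_points, curvatures, n_samples=50):
    """Resample curvature values at equidistant positions along the y-projection
    of a radial stripe; n_samples is illustrative, not taken from the paper."""
    order = np.argsort(stripe_points[:, 1])            # sort by y-projection
    y_sorted = stripe_points[order, 1]
    kappa_sorted = curvatures[order]
    y_equal = np.linspace(y_sorted[0], y_sorted[-1], n_samples)
    return np.interp(y_equal, y_sorted, kappa_sorted)
```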

III-B2 Measuring the Facial Expression Change

The resulting curvatures of genuine faces would still be indistinguishable from those of well-shaped copies like the 3D masks from REAL-f Co. Therefore, we measure the temporal curvature change between consecutive face scans and perform PAD on the resulting time series representation.

After extracting the curvature of points along radial stripes, the point-wise product of the curvature values of consecutive radial stripes is calculated. Since the detected position of the nose tip varies slightly, it is necessary to align the radial stripes to each other. This is done by taking the corresponding shift at the point of maximum cross-correlation

c_{\max}(t) = \max_{\Delta} \sum_{j} \kappa_t(j)\, \kappa_{t+1}(j + \Delta)    (4)

between consecutive radial stripes at time steps t and t+1, where \kappa_t(j) denotes the curvature of the j-th subsampled point at time step t. The maximum cross-correlation measures how similar the consecutive curvature values are. The resulting multivariate time series containing the values of the maximum cross-correlation for each radial stripe measures the individual change of the facial expression.

To compare the amount of change between presentation attacks and genuine faces, we calculate the standard deviation of each time series. The standard deviation measures the overall facial expression intensity and is comparable between all recordings if the number of facial expressions is similar. For PAD, all subjects were asked to answer the same number of questions to make the number of induced facial expressions and visemes comparable (in general, visemes comprise all mouth appearances during speech).
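The following sketch turns the per-frame curvature profiles into the maximum cross-correlation time series of Eq. (4) and the per-stripe standard deviations used for PAD; the shift bound is an assumed parameter.

```python
import numpy as np

def max_cross_correlation(kappa_t, kappa_t1, max_shift=5):
    """Maximum cross-correlation (Eq. 4) between the curvature profiles of the
    same radial stripe at consecutive time steps; max_shift is assumed."""
    best, n = -np.inf, len(kappa_t)
    for shift in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -shift), min(n, n - shift)
        value = np.dot(kappa_t[lo:hi], kappa_t1[lo + shift:hi + shift])
        best = max(best, value)
    return best

def expression_change_std(curvature_seq):
    """curvature_seq: (T, n_stripes, n_points) array with the subsampled curvature
    per frame and stripe. Returns one standard deviation per radial stripe,
    i.e. the facial expression intensity used for PAD."""
    T, n_stripes, _ = curvature_seq.shape
    series = np.array([[max_cross_correlation(curvature_seq[t, s],
                                              curvature_seq[t + 1, s])
                        for s in range(n_stripes)]
                       for t in range(T - 1)])          # shape (T-1, n_stripes)
    return series.std(axis=0)
```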

IV Experiments

IV-A 4D Presentation Attack Detection

Many published databases for PAD contain at most 2D presentation attacks using bent photographs with eyeholes [32]. The recently published 3D-MAD [13], CS-MAD [3], and WMCA [14] databases also contain static, elastic, and partial 3D masks as presentation attacks. However, they were recorded using low-quality 3D sensors and capture only the neutral face appearance without any facial expression. Since our method analyzes facial behavior, it is not possible to apply it to these databases.

Therefore, we collected our own database containing 48 4D face scans of 16 subjects and 9 presentation attacks using replay attacks as well as static and elastic 3D masks (see Fig. 6).

Fig. 6: Our new database contains 4D scans of presentation attacks using monitors (top middle/right), a static 3D mask (bottom left), and elastic 3D masks (bottom right). The implemented challenge-response protocol results in the first database that allows for detecting presentation attacks using 3D behavioral analysis.

Furthermore, some subjects have beards or wear glasses and make-up, which makes the database more challenging as these cases can only be reconstructed very inaccurately using active 3D sensors. We used a high-accuracy structured-light 3D sensor with a depth resolution of 1 MP at 30 fps over a duration of 18 s (540 frames). The computation takes only 162 ms per face scan on a Core-i7 CPU despite the large number of extracted points. We will publish the source code and our new database.

In general, recent behavioral biometric and PAD methods use challenge-response protocols to become robust against static presentation attacks. We adopted the common practice of asking familiar small-talk questions, as used at a bank or concerning the entry regulations in cross-border traffic. We instructed the users to answer by speaking in order to induce visemes and facial expressions. In our case, the captured face scans serve as the response and are analyzed for irregularities in facial appearance and expression.

Fig. 7: For most recordings of genuine faces, such as the average recording number 22, the facial expression change is much larger than for the 3D mask recording number 7. Even though the facial expression change is similar in some cases, such as number 11 (elastic 3D mask) and number 5 (minimal facial expression), there is still a large difference in the standard deviation.

After calculating the maximum cross-correlation for all radial stripes and time steps using Eq. (4), we found that the overall facial expression change of genuine faces is much larger than that of presentation attacks (see Fig. 7).

Fig. 8: The blue and orange curves show the sample distribution of the standard deviation for each radial stripe and both classes of recordings. Some standard deviations are small for both presentation attacks and genuine faces, but the radial stripes which correspond to the mouth and eye regions show substantial differences.

Fig. 8 shows the distribution of the standard deviations among all radial stripes and recordings. The standard deviations of the first and last few radial stripes, which pass through the mouth, differ between genuine faces and presentation attacks by a large margin. For the eye regions, the standard deviation is also different, but not well-separated. Active 3D sensors are not able to accurately measure the eye region, which either reflects or absorbs the projected stripe pattern.

Fig. 9: We used the radial stripe which comprises the maximum overall curvature around the mouth region for PAD. A threshold at the lower limit of the plausible amount of facial expression change finally allows separating genuine faces from presentation attacks.

As shown in Fig. 9, a single radial stripe through the mouth region is finally sufficient to perfectly classify between genuine faces and presentation attacks. The chosen threshold also leaves enough margin for even more elastic 3D masks.
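The final decision rule can be sketched as thresholding the standard deviation of the selected mouth stripe; the stripe-candidate list and the threshold value are placeholders, not the paper's actual values.

```python
import numpy as np

def detect_presentation_attack(stripe_stds, mean_stripe_curvatures,
                               mouth_stripe_ids, threshold):
    """Classify a recording as genuine if the expression-change standard deviation
    of the selected mouth stripe exceeds the threshold (both inputs per stripe)."""
    # Among the stripes passing through the mouth region, pick the one with
    # the maximum overall curvature, as described for Fig. 9.
    candidates = np.asarray(mouth_stripe_ids)
    mouth_stripe = candidates[np.argmax(mean_stripe_curvatures[candidates])]
    return stripe_stds[mouth_stripe] > threshold
```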

The key idea behind our approach is that presentation attacks show less facial expression change than genuine faces in terms of the standard deviation of our representation. Hence, it would also be possible to track the position of any patch of the facial surface, extract local features, and measure their standard deviation. Since it is not trivial to select a suitable surface patch and to track its position in the presence of facial expressions and pose changes, we implemented a similar method based on the mouth openings as a baseline. After pose normalization, the Euclidean y- and z-distances between the labrale superior (ls) and labrale inferior (li) 3D landmarks are calculated as the mouth openings.
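A minimal sketch of this baseline follows; the landmark indices are placeholders that depend on the landmark model used.

```python
import numpy as np

def mouth_opening_baseline(landmarks_seq, ls_idx, li_idx):
    """Baseline facial expression intensity from the mouth openings.

    landmarks_seq: (T, L, 3) array of pose-normalized 3D landmarks per frame.
    ls_idx, li_idx: indices of the labrale superior and labrale inferior landmarks.
    Returns the standard deviations of the y- and z-openings over time.
    """
    ls = landmarks_seq[:, ls_idx]
    li = landmarks_seq[:, li_idx]
    opening_y = np.abs(ls[:, 1] - li[:, 1])   # vertical mouth opening
    opening_z = np.abs(ls[:, 2] - li[:, 2])   # mouth opening in depth direction
    return opening_y.std(), opening_z.std()
```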

Fig. 10: As a baseline, we calculated the standard deviation of the mouth openings in y- and z-direction over time as a simple measurement of the facial expression intensity. Even though the standard deviation is also larger in case of genuine faces, both classes are not well separated.

However, the resulting standard deviations of these time series in Fig. 10 are too similar between genuine faces and elastic 3D masks.

For comparison purposes, we implemented the simple, yet powerful method from Lagorio et al. [20]. They found that the mean 3D surface curvature of genuine 3D face scans is larger than the mean curvature of (bent) photographs. To improve the robustness of this method, we also applied our preprocessing steps from Section III-A2.

Fig. 11: The mean curvature of the four rightmost presentation attacks using monitors is smaller than the mean curvature of genuine faces. However, this representation does not allow distinguishing between 3D masks and genuine faces.

Fig. 11 shows that this also holds for the four rightmost presentation attacks using monitors. However, since a 3D scan of a reflecting monitor is noisy, the differences are much smaller than the order-of-magnitude difference they report between photographs and genuine faces. As expected, their method cannot detect the presentation attacks using 3D masks since their mean curvature is in the same range as that of genuine faces.
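For this comparison, the mean 3D surface curvature of a preprocessed scan can be computed by reusing the surface_variation sketch from Section III-B1; this reflects our reading of the reimplementation, not the original authors' code, and a spatial index would be used instead of the brute-force search for large scans.

```python
def mean_surface_curvature(points, k=30):
    """Mean 3D surface curvature of a cropped, pose-normalized face scan,
    reusing the surface_variation helper sketched above (k is assumed)."""
    return float(surface_variation(points, points, k=k).mean())
```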

IV-B Paresis Treatment Analysis

In many cases of treatment analysis, the individual progress of a patient matters and is too complex to be deduced from the average or from a similar patient. As an outlook, we also applied our representation to the individual treatment analysis of a patient with facial paresis. We recorded the patient every month over one year using the same 3D sensor as before. At the beginning of the study, the facial nerve of the right facial half was cut, resulting in a complete paralysis. During the treatment, the nerve grew back together (reinnervation), showed first signs of its restored functionality on 09/04/2018, and allowed for substantial voluntary muscle movements at the end of the treatment (see Fig. 12).

For each recording, we calculated the mean of the cross-correlation over the snarl facial exercise. Fig. 13 shows that this mean improved continuously over time during the reinnervation. The fluctuations on the healthy half of the face are caused by the varying motivation of the patient. Due to mass movements, i.e. the stretching of the paretic muscles toward the healthy half of the face, the mean cross-correlation is also high for the paretic half of the mouth. Since the eyes cannot be accurately reconstructed with 3D sensors in general, fluctuations occur in both eye regions.

V Conclusions

Fig. 12: After one year of home-based training, the facial symmetry of this patient improved. Especially the nasolabial fold (blue ellipses) became more pronounced, and the muscles on the right facial half allowed for voluntary movements.

Fig. 13: Besides natural fluctuations, the mean cross-correlation improved over time for the curves through the left cheek (arrow) during the snarl exercise. Thus, the treatment method was successful as the nerve reinnervated and allowed for voluntary muscle movements.

Even though current face authentication methods achieve impressive accuracies, most of them can be fooled by presenting a facial photograph of someone else instead of one's own face. The ultimate goal for also detecting advanced presentation attacks like bent photographs, replay attacks on monitors, and elastic 3D masks is to analyze whether the behavior and the 3D shape of a face scan are plausible. To the best of our knowledge, we developed the first method that is able to robustly detect all of these presentation attacks directly based on 4D face scans. We subsampled the 3D surface curvature at equiangular radial stripes and calculated the standard deviation of the cross-correlation between consecutive stripes over time. Our proposed representation also allows for varying the degree of subsampling depending on the desired accuracy and runtime.

Many published databases for PAD contain at most 2D presentation attacks using bent photographs with eyeholes, or only the neutral face appearance in the case of 3D presentation attacks. Since our method focuses on facial behavior, we collected a challenging database containing three different types of sophisticated masks and monitor replay attacks. To induce facial expressions, we implemented a challenge-response protocol and asked the users to verbally answer familiar questions, such as those concerning entry regulations in cross-border traffic. Our evaluation results for PAD showed the potential of our representation, as a single radial stripe through the mouth was sufficient to perfectly distinguish between 2D/3D presentation attacks and genuine faces. For future work, it remains a difficult task to distinguish elastic 3D masks from genuine faces which show minimal facial expression while speaking.

To show the potential of our representation of 3D face scans for other research topics, we applied it to individual treatment analysis of patients with facial paresis. In this case, multiple radial stripes of our representation were required to highlight and localize the individual improvement in facial symmetry.

References

  • [1] S. Arik, J. Chen, K. Peng, W. Ping, and Y. Zhou (2018) Neural voice cloning with a few samples. In Advances in Neural Information Processing Systems (NIPS), S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 10019–10029. External Links: Link Cited by: §I.
  • [2] S. Berretti, A. Del Bimbo, P. Pala, and F. Mata (2008) Face recognition by svms classification of 2d and 3d radial geodesics. In IEEE International Conference on Multimedia and Expo (ICME), pp. 93 – 96. External Links: Document, ISBN 978-1-4244-2570-9 Cited by: §II-B.
  • [3] S. Bhattacharjee, A. Mohammadi, and S. Marcel (2018) Spoofing deep face recognition with custom silicone masks. In IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–7. Cited by: §II-A, §IV-A.
  • [4] V. Blanz, T. Vetter, et al. (1999) A morphable model for the synthesis of 3d faces.. In SIGGRAPH, Vol. 99, pp. 187–194. Cited by: §II-A.
  • [5] A. W. Bowman, S. Katina, J. Smith, and D. Brown (2015) Anatomical curve identification. Computational Statistics and Data Analysis (CSDA) 86, pp. 52–64. Cited by: §II-B.
  • [6] A. Bulat and G. Tzimiropoulos (2017-03-21) How far are we from solving the 2d and 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks). IEEE International Conference on Computer Vision (ICCV). External Links: 1703.07332v3 Cited by: §III-A1.
  • [7] P. Cattin (2002) Biometric authentication system using human gait. Ph.D. Thesis, ETH Zurich. Cited by: §I.
  • [8] A. J. Cheyer (2016) Device access using voice authentication. Note: US Patent 9,262,612 Cited by: §I.
  • [9] J. A. Crowe and D. Damianou (1992) The wavelength dependence of the photoplethysmogram and its implication to pulse oximetry. In IEEE Engineering in Medicine and Biology Society (EMBC), Vol. 6, pp. 2423–2424. External Links: Document Cited by: §II-A.
  • [10] M. De Marsico, M. Nappi, D. Riccio, and J. Dugelay (2012) Moving face spoofing detection via 3d projective invariants. In IAPR/IEEE International Conference on Biometrics (ICB), pp. 73–78. Cited by: §II-A.
  • [11] D. Derkach, A. Ruiz, and F. M. Sukno (2017) Head pose estimation based on 3-d facial landmarks localization and regression. In IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 820–827. Cited by: §III-A2.
  • [12] H. Drira, B. B. Amor, A. Srivastava, M. Daoudi, and R. Slama (2013) 3D face recognition under expressions, occlusions, and pose variations. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 35 (9), pp. 2270–2283. Cited by: §II-B.
  • [13] N. Erdogmus and S. Marcel (2013) Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect. In IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–6. Cited by: §IV-A.
  • [14] A. George, Z. Mostaani, D. Geissenbuhler, O. Nikisins, A. Anjos, and S. Marcel (2019) Biometric face presentation attack detection with multi-channel convolutional neural network. IEEE Transactions on Information Forensics and Security. Cited by: §II-A, §IV-A.
  • [15] S. Katina, K. McNeil, A. Ayoub, B. Guilfoyle, B. Khambay, P. Siebert, F. Sukno, M. Rojas, L. Vittert, J. Waddington, et al. (2016) The definitions of three-dimensional landmarks on the human face: an interdisciplinary view. Journal of Anatomy 228 (3), pp. 355–365. Cited by: §II-B, Fig. 2.
  • [16] V. Kazemi and J. Sullivan (2014) One millisecond face alignment with an ensemble of regression trees. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1867–1874. Cited by: §III-A1.
  • [17] G. Kim, S. Eum, J. K. Suhr, D. I. Kim, K. R. Park, and J. Kim (2012) Face liveness detection based on texture and frequency analyses. In IAPR/IEEE International Conference on Biometrics (ICB), pp. 67–72. Cited by: §II-A.
  • [18] B. Klare and A. K. Jain (2010) On a taxonomy of facial features. In IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–8. Cited by: Fig. 2.
  • [19] K. Kollreider, H. Fronthaler, M. I. Faraj, and J. Bigun (2007) Real-time face detection and motion analysis with application in “liveness” assessment. IEEE Transactions on Information Forensics and Security 2 (3), pp. 548–558. Cited by: §II-A.
  • [20] A. Lagorio, M. Tistarelli, M. Cadoni, C. Fookes, and S. Sridharan (2013) Liveness detection based on 3d face shape analysis. In IEEE International Workshop on Biometrics and Forensics (IWBF), pp. 1–4. Cited by: §II-A, §IV-A.
  • [21] S. Liu, P. C. Yuen, S. Zhang, and G. Zhao (2016) 3d mask face anti-spoofing with remote photoplethysmography. In European Conference on Computer Vision (ECCV), pp. 85–100. Cited by: §II-A.
  • [22] F. Monrose and A. Rubin (1997) Authentication via keystroke dynamics. In ACM Conference on Computer and Communications Security (CCS), pp. 48–56. Cited by: §I.
  • [23] O. Nikisins, A. George, and S. Marcel (2019) Domain adaptation in multi-channel autoencoder based features for robust face anti-spoofing. In IAPR/IEEE International Conference on Biometrics (ICB). Cited by: §II-A.
  • [24] M. Pauly, M. Gross, and L. P. Kobbelt (2002) Efficient simplification of point-sampled surfaces. In IEEE Visualization Conference, pp. 163–170. Cited by: §II-B, §III-B1.
  • [25] D. S. Prakash, L. E. Ballard, J. V. Hauck, F. Tang, E. Littwin, P. K. A. Vasu, G. Littwin, T. Gernoth, L. Kucerova, P. Kostka, et al. (2019) Biometric authentication techniques. Note: US Patent App. 16/049,933 Cited by: §II-A.
  • [26] R. Raghavendra, S. Venkatesh, K. B. Raja, S. Bhattacharjee, P. Wasnik, S. Marcel, and C. Busch (2019) Custom silicone face masks - vulnerability of commercial face recognition systems and presentation attack detection. In IAPR/IEEE International Workshop on Biometrics and Forensics (IWBF), Cited by: §II-A.
  • [27] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540. Cited by: §II-A.
  • [28] L. Sun, G. Pan, Z. Wu, and S. Lao (2007) Blinking-based live face detection using conditional random fields. In IAPR/IEEE International Conference on Biometrics (ICB), pp. 252–260. Cited by: §II-A.
  • [29] L. Vittert, A. Bowman, and S. Katina (2017) Statistical models for manifold data with applications to the human face. Annals of Applied Statistics (AOAS). Cited by: §II-B.
  • [30] T. Wang, J. Yang, Z. Lei, S. Liao, and S. Z. Li (2013) Face liveness detection using 3d structure recovered from a single camera. In IAPR/IEEE International Conference on Biometrics (ICB), pp. 1–6. Cited by: §II-A.
  • [31] D. Zhang and W. K. Kong (2005) Palm print identification using palm line orientation. Note: US Patent App. 10/872,878 Cited by: §I.
  • [32] S. Zhang, X. Wang, A. Liu, C. Zhao, J. Wan, S. Escalera, H. Shi, Z. Wang, and S. Z. Li (2019) A dataset and benchmark for large-scale multi-modal face anti-spoofing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 919–928. Cited by: §IV-A.
  • [33] Q. Zhen, D. Huang, H. Drira, B. B. Amor, Y. Wang, and M. Daoudi (2017) Magnifying subtle facial motions for effective 4d expression recognition. IEEE Transactions on Affective Computing (TAC). Cited by: §II-B.