In recent years, biometric authentication has been widely used as a reliable and convenient means of user identification and access control. Among all types of biometric features (e.g., fingerprint, voice, retina, and palm veins), facial characteristics are gaining significance, as digital images and videos can be easily captured by the cameras readily available on smartphones and other mobile devices. Face authentication has thus become popular in a wide range of application scenarios. Examples include SmartGate, developed by the Australian Border Force and the New Zealand Customs Service for automated border passing; HSBC’s online banking, which allows customers to open a new account using a selfie; and Windows Hello face authentication in Windows 10 for logging in or unlocking one’s Microsoft Passport. The popularity of face authentication is also evidenced by the predicted global market growth at a compound annual growth rate (CAGR) of 9.5% from 2015 to 2022.
However, a large body of research has demonstrated the vulnerability of face authentication systems to spoofing attacks, where an adversary attempts to deceive the authentication system by mimicking the facial features of a legitimate user. Based on the object used, existing methods for spoofing a face authentication system can be roughly classified into four categories, namely, picture-based attacks, video-based attacks, mask-based attacks, and 2D/3D model attacks. For instance, an adversary in a picture-based attack can feed a photo of a specific face to a recognition system, while in a video-based attack, a video can be presented to provide more sequential information, e.g., environmental changes and transformations of facial components.
To defend against spoofing attacks, face liveness detection has been proposed to distinguish between the image or video samples of a legitimate on-site user and imitated ones. For instance, when applying for a new bank account, the applicant may be required to perform specific actions as evidence of liveness. The face authentication system is thus decomposed into two logically independent processes: face liveness detection and face recognition. Usually, the former is launched to ensure that the image or video samples are provided live by a genuine user, while the latter leverages these samples to determine whether the user is authorized. In this paper, we focus on the liveness detection process and aim at designing efficient solutions.
Face liveness detection has been studied over the past decade. Existing methods can be divided into two main categories according to the features used for drawing conclusions. The first category mainly focuses on extracting static features from single images to derive differences of environmental features (e.g., textures and light) between the image displaying surfaces and real faces [10, 11, 12, 13, 14, 15, 16]. These methods directly capture images and use them as input, which simplifies the procedure of collecting the necessary input data. However, the simplicity of input data makes them sensitive to environmental factors (e.g., illumination and image quality), which can have a severe impact on detection accuracy. The second category resorts to sequential images or videos to detect changes in environmental features or facial motions so as to match those changes with real situations [17, 18, 19, 20, 21, 22, 23, 24, 2, 25, 26]. These approaches can better defend against spoofing attacks with a high detection accuracy. However, they usually suffer from high computational and storage complexity as they introduce cumbersome operations, e.g., applying deep learning algorithms on consecutive images.
Inspired by existing studies which demonstrate the effectiveness of performing analyses over eye movements [27, 28, 26, 1], we explore the feasibility of detecting face liveness using the iris trajectory caused by intentional eye movements. Although eye movement is an important sign of liveness, the following observations make it extremely challenging to precisely track the iris for face liveness detection. First, eye movements stimulated by user-device interactions usually contain significant noise, e.g., an unconscious change of gaze can lead to frequent and unexpected eye movements. Second, hardware-defined image adjustment strategies vary greatly among cameras and lead to different transformations of captured images, setting barriers to exact comparisons between actual and expected eye trajectories. For instance, a horizontal flip is usually applied to front cameras, which reverses the captured eye trajectories. Finally, complex interaction patterns improve security against spoofing attacks, but also reduce efficiency due to the longer detection duration. Therefore, the trade-offs among detection accuracy, efficiency, and system security should be carefully studied.
To address these problems, we propose IriTrack, an efficient system to perform liveness detection by tracking iris changes of users. IriTrack collects iris positions and uses the derived trajectories to draw a conclusion. It requires no special hardware, and can therefore be used on any device equipped with a camera and a display. The main idea of the proposed system is to trade data acquisition complexity for computation complexity, which can be suitable for many applications.
We conduct experiments to test how sensitively eyes can track different angles under various parameter combinations, and use the results to balance the trade-off between detection efficiency and accuracy. Experimental results demonstrate that IriTrack outperforms the state-of-the-art in terms of detection accuracy, with a moderate time overhead. IriTrack is also robust to changes in environmental conditions, such as light intensity and face-camera distance.
The main contributions of this paper are two-fold:
We propose IriTrack, a liveness detection system based on eye movement tracking, which works on commercial devices capable of image capturing and data processing. IriTrack achieves computational simplicity and efficiency, without the need for training complex detection models.
We introduce a probability-based random pattern generation method to strengthen the defense against potential attacks and to balance system performance. To eliminate the influence of unconscious eye movements on similarity evaluation, we propose a method to compare the skeleton of displayed patterns against collected eye trajectories.
The remainder of this paper is organized as follows: We present potential spoofing attacks and briefly summarize the existing literature in Section II. We describe the basic idea of IriTrack and highlight the challenges in Section III. Then, we present the design details of IriTrack in Section IV, followed by a security analysis in Section V. The implementation of IriTrack on a commercial device is discussed in Section VI and evaluated for efficiency and security in Section VII. Finally, we discuss the limitations of the proposed system in Section VIII and conclude the paper in Section IX.
II Related Work
In this section, we first present typical spoofing attacks that circumvent face recognition systems, and then briefly review existing methods for face liveness detection.
II-A Facial Spoofing Attacks
Generally, face recognition systems extract the identity of a face from one or multiple consecutive images. A common idea to deceive face recognition systems is to present facial image samples obtained from the intended target user [18, 30]. According to the sources from which facial image samples are obtained, facial spoofing attacks can be categorized as follows:
Picture-based attacks. Displaying face images such as photos or paintings is a convenient way to spoof face recognition systems. An adversary can present pictures of the target user to a face recognition system so that the required facial features are detected.
Video-based attacks. Similar to pictures, videos are able to expose specific face features. More importantly, videos usually have the ability to provide face recognition systems with necessary sequential information about environmental changes and transformations of facial components.
2D/3D model attacks. An adversary can build 2D or 3D models of a valid user, which enables transformations of facial components as well as environmental conditions. By adjusting animations of each element, these models can be highly customizable.
Mask-based attacks. To impersonate face features while preserving environmental conditions, another straightforward idea is to equip an adversary with a face mask.
II-B Summary of Typical Face Liveness Detection Methods
Recently, many face liveness detection methods have been proposed to determine whether image samples are captured from a real user. According to the features they use, we can classify them into two main categories, each of which can further be classified into sub-categories, as shown in Table I.
Static features. Static features refer to features that contain no transformations, or whose alterations can all be regarded as extraneous. They can be divided into three types: the first two types are texture features and structure features, which in most cases can be obtained from single images, while the third type is human physical characteristics, which can be directly sensed by special hardware.
Texture features describe the appearance of specific objects and environmental conditions, e.g., the complexity of colour components within faces [10, 11, 12], while structure features describe the composition of captured images [13, 14]. For instance, the size of captured faces can be used as a clue for face liveness detection. Methods based on static features ignore transformation information in images and therefore usually take single images as input. Analyses over single images draw conclusions by contrasting differences in shapes and details between real and fake faces [10, 31, 32, 11, 33], as the displayed surface of a fake face usually exhibits detectable characteristics, e.g., colour differences and variations in image quality. Kim et al. proposed an approach to distinguish a real live face from a masked face by differentiating both frequency and texture features. Dong et al. proposed a liveness detection system that utilizes the gradient of each colour channel in static images to distinguish between real and fake faces.
These methods are generally computationally inexpensive since they perform analysis only on single images, rather than on videos or sequential images. Moreover, using single images as input reduces the duration of capturing images, ensuring quick response times. However, they might be sensitive to illumination and image quality, as features extracted from single images contain limited information and are easily affected by noise. Thus, they can be error-prone and unstable in varying environmental conditions.
Human physical characteristics describe properties that only a real person possesses, e.g., skin temperature and skin resistance [15, 16]. Reading features of this kind usually requires special hardware to sense the data of interest. Such detection can be both accurate and efficient, since sensors respond instantly with high precision. However, the hardware requirements are an obstacle, as such sensors bring extra implementation and maintenance costs. In addition, such special hardware may not be available on legacy devices.
Dynamic features. Generally, methods based on dynamic feature analyses take videos or sequential images as input, which provide transformation information of environmental and facial components in time series. Methods of this category try to make a judgement by matching environmental and facial changes with real situations [17, 18, 19, 20, 21, 22, 23, 24, 2, 25, 26].
Czajka et al. proposed a solution based on analyses over changes of human irises. The method is based on the fact that human irises change in size under different light intensity levels, while printed irises do not react to such changes. Chan et al. presented a method that computes changes of both facial and environmental textures with and without an extra light source (e.g., a camera’s flash). They extracted 534 features based on 4 descriptors of faces and background, which are fed to an SVM classifier for liveness detection. The method requires strong stimulation (e.g., flash light) applied directly to users’ faces, which may degrade the user experience.
Compared with the methods based on single image analyses, methods in this category employ facial and environmental changes, which can better defend against spoofing attacks, but also lengthen the detection duration. The requirement of input data can increase storage overhead for capturing and saving images. The computational complexity is relatively high, as they perform analysis over a series of frames.
We pay special attention to solutions which take eye actions (e.g., movements and blinking) as a sign of liveness. Several methods need to extract eye positions precisely and require special helmet-like hardware or cameras [2, 25]. Czajka proposed a solution that uses the reaction of pupils to light changes for liveness detection. To capture pupil dynamics (i.e., size), it requires changes in environmental light intensity, starting from complete darkness, which may be infeasible in practice. Moreover, pupil size varies with psychological state (e.g., stress, relaxation, and drowsiness), which degrades detection accuracy. Liu et al. use simple and unaltered patterns, making their method less reliable in defending against spoofing attacks.
The system proposed in this paper captures and analyses motions of human eyes for liveness detection. Compared with existing methods, IriTrack needs neither pre-computation nor storage of additional data for training classifiers. It is also robust to environmental changes, such as light intensity and face-camera distance.
In this section, we present the basic idea of IriTrack. Based on it, we explain why similarity is measured by comparing angles, and we examine how feasible it is for eyes to track typical angles.
III-A Basic Idea
The idea of IriTrack is inspired by the widely used screen lock pattern systems in smartphones, where lines are drawn by a user over 9 or more dots displayed on the screen and then compared with a pattern pre-defined by an authorized user. The screen is unlocked if the two patterns are exactly the same. Similarly, IriTrack can make a decision by comparing the trajectory of a user’s eye movements with a pre-defined pattern consisting of a certain number of dots and lines.
The setting of pre-defined patterns is crucial to the security of a liveness detection system. A straightforward way is to be consistent with the screen lock pattern systems, where an authorized user can set a customized pattern in advance. Although simple, it may result in vulnerability as the pre-defined patterns could be leaked to potential attackers. Additionally, it also imposes the burden of pattern management to users, especially those of different liveness detection applications. Therefore, we offload the pattern setting operation to the liveness detection system, where a randomly generated pattern is displayed on the screen for a user.
As a user has no prior knowledge about the pattern, it is difficult for him or her to determine when to shift attention. To help users shift their gaze accurately, IriTrack uses lines to guide their attention. More specifically, in IriTrack, a poly-line with dots inside is generated and displayed on the screen. A user has to draw the poly-line by moving his/her eyes. The trajectory of his/her iris positions is recorded and compared with the given line to reach a conclusion.
For clarity, we make several definitions as illustrated in Fig. 1. In each detection procedure, a pattern, which takes the form of an acyclic poly-line composed of connected line segments, is randomly generated. We defer the generation strategy to the next section. Each pair of adjacent line segments forms an angle at their joint. Endpoints of each line segment are referred to as dots. Correspondingly, eye positions in captured images are called points.
IriTrack differs from the widely used pattern-based screen-lock systems. Patterns in screen lock systems are pre-defined by users and used as a way of authentication. However, patterns in IriTrack are randomly generated and used only for liveness detection, which is launched before the authentication process performed by face recognition. The randomness is employed in IriTrack to greatly reduce the possibility of forecasting a pattern by a spoofing attacker.
It is a non-trivial task to instantiate the above-mentioned idea, due to the following challenges:
Unconscious movement of eyes. Since IriTrack aims at tracking changes in iris positions caused by users’ attention shifts, the fundamental factor affecting the detection result is whether one’s eye movements follow the expected path. Existing studies indicate that one’s gaze can exhibit unconscious rapid changes, which lead to unexpected eye movements. Worse still, eye blinking also introduces noise in the observed iris trajectory.
Transformations of captured images. The cameras, operating systems, and hardware in devices vary greatly among manufacturers, which causes the obtained images to be rendered in different representations. For instance, a surprising observation in our experiments is that some cameras record images in a horizontally flipped way while others do not. These uncertain transformations make an exact comparison between eye trajectories and patterns meaningless. Therefore, we should try to eliminate the impact of such uncertainty.
Trade-offs between efficiency and accuracy. As described above, adjusting the number and length of line segments as well as the degree of angles results in various patterns. Obviously, a longer poly-line with more line segments and angles will prolong the duration of detection, but also help remove noise in trajectory extraction and thereby improve detection accuracy. Thus, it is desirable to strike a balance between efficiency and accuracy.
III-C Sensitivity of Tracking Angles
In our design, users are required to shift their gaze along a randomly generated pattern, and the recorded trajectories are then compared with the given poly-lines to reach a detection conclusion. However, unconscious eye movements along with eye blinking result in unpredictable iris positions, which cause non-deterministic deviations from the poly-lines. Besides, transformations due to hardware diversity influence the similarity comparison. Thus, it is extremely difficult to achieve an exact match between the poly-lines and collected trajectories.
To address this challenge, we instead track eye movements at the critical endpoints in the poly-line. More specifically, we view the angles between each pair of adjacent line segments as the skeleton of a pattern, and attempt to measure the similarity between the skeleton and the eye movements where angles occur.
To validate the feasibility of the above-mentioned idea, we conduct experiments to evaluate the sensitivity of tracking eye movements at angles (cf. Section VII-B for more details). In the experiments, two lines forming an angle at their shared endpoint are displayed on the screen, and the positions of pupils are recorded when testers shift their gaze along the given poly-line. We measure the angle from the tracked iris positions, and then calculate the deviation of the measured angle from its real value. Methods for locating iris positions and measuring angles are deferred to the next section.
For the sake of simplicity, we assume that angles on a poly-line are restricted to 6 typical degrees, i.e., . Experimental results are shown in Fig. 2, where the two numbers at each point indicate the real value of an angle and its deviation, respectively. The results show that it is possible to track eye movements for typical angles. However, the sensitivity varies among different angles, e.g., angles of and are more difficult to track, which should be carefully considered in pattern design.
IV Design of IriTrack
In this section, we present the workflow and design details of IriTrack.
IV-A System Overview
We build our system based on two primary facts. First, one can keep staring at a specific object for a relatively short time (e.g., 5 seconds). Second, provided that the head is kept still, tracking a specific object with the eyes causes detectable changes in the relative distances between the eye regions and the centers of the irises.
IriTrack’s system architecture is illustrated in Fig. 3, and is mainly composed of three components, namely pattern generation, iris tracking, and similarity measurement. The design details of each module will be described in the following subsections.
The workflow of liveness detection can be described as follows: IriTrack randomly generates and displays a pattern on the screen as requested by a user. Then, the user is required to follow the pattern with his or her eyes, and the trajectories of the irises are recorded by a camera. During this process, the user is asked to keep his or her head as still as possible. Finally, the collected trajectories and the given pattern are fed to the similarity measurement module for drawing a conclusion.
The samples collected for liveness detection must also serve as input to face recognition. The face images from which IriTrack retrieves iris trajectories do not affect recognition when they are used as input to certain cutting-edge recognition algorithms. As face recognition is logically independent of liveness detection, we focus on the design details of IriTrack hereafter.
IV-B Pattern Generation
As stated, a pattern is a poly-line consisting of line segments. To help users concentrate and balance their tracking speed, a slipper which moves along the pattern at a constant speed is also displayed. All patterns need to be randomly generated to avoid potential spoofing attacks. In our design, we consider the following two factors in pattern generation: 1) the capability of defending against spoofing attacks, and 2) the time efficiency of tracking iris positions.
A pattern is denoted by , where and are angle set and line segment set in the pattern, respectively. The generated patterns should be random enough; otherwise, attackers may prepare in advance if a pattern can be easily predicted. To ensure the randomness of patterns, we apply probabilities when generating angles and lines. Recall that contains typical angles from which an angle in a pattern can be selected. For each angle , we associate a weight , which indicates the probability of accurately following such an angle by eye movements. The notations used in the rest of this paper are summarized in Table II.
We denote as the probability of setting the current angle to be (). can be calculated as follows:
Note that the more dots are used, the harder it is for a spoofing attack to succeed. Let be the total number of dots in a generated pattern. Here, we assume (i.e., at least two angles in a pattern) for security considerations. There would be possible combinations of angles. Meanwhile, line segments appearing in a pattern are randomly selected from a pre-defined set , thus there would be possible combinations of line segments. It should be noted that increasing the number of dots also increases the time spent gathering iris tracks and can make users impatient, which can affect the accuracy of tracking. As the slipper moves at a constant speed , the time cost should be directly proportional to .
To achieve a balance between time cost, security against spoofing attacks, and tracking accuracy, we resort to a probability-based model of pattern generation, where we start from a pattern with only one line segment (i.e., two dots) and iteratively determine whether a new line (and thus a new angle) should be added to the current pattern, as stated in Algorithm II.
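As a rough illustration of how the pattern space grows with the number of dots, the counts can be written out under an assumed notation. The symbols $n$, $\mathcal{A}$, and $\mathcal{L}$ below are placeholders for the number of dots, the set of typical angles, and the set of typical segment lengths; they are not necessarily the symbols used in the original equations. A poly-line with $n$ dots has $n-1$ line segments and $n-2$ angles, so if every angle and every length is drawn independently:

```latex
N_{\text{angles}} = |\mathcal{A}|^{\,n-2}, \qquad
N_{\text{segments}} = |\mathcal{L}|^{\,n-1}, \qquad
N_{\text{patterns}} \le |\mathcal{A}|^{\,n-2}\,|\mathcal{L}|^{\,n-1}.
```

With 6 typical angles and $n = 4$ dots, for example, there are $6^{2} = 36$ angle combinations. The bound is an inequality because some combinations may be rejected, e.g., patterns that do not fit on the screen.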
In each iteration, assume that () dots already exist in pattern , we use to denote the probability of adding the -th dot in , as shown in Eq. eq:pl. Algorithm II shows how to determine whether a dot should be added.
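The iterative, probability-based generation described above can be sketched as follows. This is only an illustration under stated assumptions: the angle values and weights in `ANGLE_WEIGHTS` are hypothetical, the selection probability is assumed to be proportional to the weight, and the geometric decay in `add_dot_probability` is an assumed stand-in for the dot-addition probability, whose exact form is not reproduced here.

```python
import random

# Hypothetical typical angles (degrees) and tracking weights; illustrative only.
ANGLE_WEIGHTS = {30: 0.5, 60: 0.8, 90: 1.0, 120: 0.8, 150: 0.5}


def angle_probabilities(weights):
    """Assumed rule: selection probability proportional to the angle's weight."""
    total = sum(weights.values())
    return {angle: w / total for angle, w in weights.items()}


def add_dot_probability(k, decay=0.5):
    """Assumed rule for the probability of adding the k-th dot.

    Dots up to the 4th are always added (at least two angles); every further
    dot is `decay` times less likely than the previous one.
    """
    return 1.0 if k <= 4 else decay ** (k - 4)


def generate_angles(weights, rng, max_dots=8):
    """Return the angles of a randomly grown poly-line."""
    probs = angle_probabilities(weights)
    pool, p = zip(*probs.items())
    angles = []
    k = 3  # the pattern starts with 2 dots; the 3rd dot creates the first angle
    while k <= max_dots and rng.random() < add_dot_probability(k):
        angles.append(rng.choices(pool, weights=p, k=1)[0])
        k += 1
    return angles
```

A seeded `random.Random` keeps the sketch reproducible; since unpredictability is the point of the pattern, `random.SystemRandom()` could be passed as `rng` instead.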
|The set containing typical angle degrees|
|The set containing typical lengths of line segments|
|A generated pattern|
|The set containing the angles in the generated pattern|
|The set containing the line segments in the generated pattern|
|The total length of the line segments in the generated pattern|
|The total number of dots in the generated pattern|
|The constant moving speed of the slipper in the generated pattern|
|The set containing sequentially recorded eye positions|
|The probability of setting the next angle to a given typical angle|
|The probability of adding the next dot into a pattern|
Algorithm (dot-addition check). Input: the index of the next dot to be generated. Output: whether a new dot should be added to the pattern.
Next, considering the two key factors mentioned at the beginning of this section, the goodness of the generated pattern should be measured to ensure that the pattern is secure enough to resist spoofing attacks and requires moderate tracking time. Given a generated pattern , we use to describe its goodness, which is calculated in Eq. eq:goodness.
The coefficient is a measure of the randomness of which is directly associated with the strength in defending against spoofing attacks. The denominator ensures that a pattern with fewer line segments is more likely to be accepted, as the time overhead for tracking is reduced. Additionally, we employ an exponential function to introduce a rapid drop in goodness when the time overhead increases. The rest signifies the efficiency for eyes to track angles. A better pattern should have a higher value of . A pre-defined constant is introduced and each valid pattern must satisfy the condition . The setting of will be described in Section VII.
Moreover, as the pattern is displayed on the screen, it should be guaranteed that all dots are placed within the bounds of the screen. Meanwhile, to reduce confusion for users when tracking, we additionally stipulate that lines and dots in a pattern are not allowed to overlap.
Finally, the above conditions are considered together to determine whether to return the current pattern or generate a new pattern (Line 9 in Algorithm II).
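The two geometric constraints above (dots inside the screen, no overlapping lines or dots) can be checked with a standard orientation-based segment intersection test. The sketch below is one possible way to do this, not the paper's implementation; the function names and the representation of dots as `(x, y)` tuples are assumptions made for illustration.

```python
def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def _on_segment(p, q, r):
    """True if r lies on segment pq, given that p, q, r are collinear."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """True if the closed segments ab and cd share at least one point."""
    o1, o2 = _orientation(a, b, c), _orientation(a, b, d)
    o3, o4 = _orientation(c, d, a), _orientation(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))


def is_displayable(dots, width, height):
    """Check that a poly-line fits on the screen and never crosses itself."""
    if any(not (0 <= x <= width and 0 <= y <= height) for x, y in dots):
        return False
    if len(set(dots)) != len(dots):  # two dots placed on the same spot
        return False
    segments = list(zip(dots, dots[1:]))
    for i in range(len(segments)):
        # Adjacent segments legitimately share a joint, so start at i + 2.
        for j in range(i + 2, len(segments)):
            if segments_intersect(*segments[i], *segments[j]):
                return False
    return True
```

Adjacent segments that fold back onto each other are not caught by this check; a restricted set of typical angles would exclude that case anyway.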
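Algorithm II (pattern generation). The pattern is initialized with 2 dots; further dots are then added iteratively, and the conditions above determine whether the current pattern is returned or a new one is generated.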
IV-C Iris Tracking
The tracking module utilizes the embedded camera to grab facial images, which are used to identify the center of each iris and track the movements of irises.
As the module starts working, the camera acquires images at a fixed frequency. Given a facial image, Daugman’s integro-differential operator is employed to detect the center of each iris. To find the circular path that fits the contour of an iris, the algorithm searches over combinations of center position and radius for the path with the maximum change in pixel values. It can be expressed by the following equation:
\[ \max_{(r,\, x_0,\, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r,\, x_0,\, y_0} \frac{I(x, y)}{2\pi r}\, ds \right| \]
where $I$ is the input image, $I(x, y)$ is the pixel value at position $(x, y)$, $r$ is the radius of the detected area, and $G_\sigma(r)$ is the Gaussian smoothing function.
As we need only the changes in iris positions, the center coordinates $x_0$ and $y_0$ are recorded, while the detected radius of each iris is simply ignored.
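A coarse discrete version of the search performed by Daugman’s operator can be sketched as follows. It is a simplified illustration rather than the paper’s tracking code: the image is assumed to be a grayscale list of rows, the contour integral is approximated by sampling a fixed number of points on each circle, and the Gaussian smoothing uses a three-tap kernel. All function names and parameters are hypothetical.

```python
import math


def circle_mean(image, cx, cy, r, samples=32):
    """Mean pixel value along the circle of radius r centred at (cx, cy)."""
    height, width = len(image), len(image[0])
    total = 0.0
    for s in range(samples):
        theta = 2 * math.pi * s / samples
        # Clamp sample positions so circles near the border stay inside the image.
        x = min(max(int(round(cx + r * math.cos(theta))), 0), width - 1)
        y = min(max(int(round(cy + r * math.sin(theta))), 0), height - 1)
        total += image[y][x]
    return total / samples


def locate_iris(image, centers, radii, sigma=1.0):
    """Return (cx, cy, r) maximising the smoothed radial intensity change."""
    # Three-tap Gaussian kernel standing in for the smoothing function.
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in (-1, 0, 1)]
    norm = sum(kernel)
    kernel = [k / norm for k in kernel]
    best, best_score = None, -1.0
    for cx, cy in centers:
        means = [circle_mean(image, cx, cy, r) for r in radii]
        # Finite differences approximate the partial derivative w.r.t. r.
        diffs = [means[i + 1] - means[i] for i in range(len(means) - 1)]
        for i in range(1, len(diffs) - 1):
            score = abs(sum(w * diffs[i + k - 1] for k, w in enumerate(kernel)))
            if score > best_score:
                best, best_score = (cx, cy, radii[i]), score
    return best
```

Restricting `centers` to the detected eye region keeps the exhaustive search small; only the returned center would be kept for the trajectory.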
IV-D Similarity Measurement
As shown in Fig. 3, a randomly generated pattern along with the collected eye trajectories is passed to our measurement module. The main task of this stage is to recover the skeleton from eye movements and measure the similarity between the skeleton and the given pattern.
Based on the assumption that the gaze of eyes moves at a uniform speed, the coordinates of tracked dots can be proportionally divided according to the length of each line segment. Given a pattern , let , where () is the length of the -th line in . The total length of the poly-line can be denoted by . Denoting as the set of recorded dots where , the position of the -th dot in the pattern can be recovered as follows:
where . Sequentially taking three dots recovered from the tracked points, the angles in degrees can be easily obtained using the law of cosines. According to Eq. eq:divide_track, the distances between adjacent dots are used to recover the relative positions among dots, while the consecutively calculated angles are the evidence for judging whether the movement of the irises is similar to the given pattern.
In IriTrack, the PC screen and the camera-captured images usually have two different coordinate systems. Since we use angles for similarity measurement, the calculation involved is independent of the coordinate systems.
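The recovery step can be illustrated with a short sketch. It assumes, as the text does, that the gaze moves at uniform speed and that points are sampled at a fixed rate, so the fraction of the poly-line travelled maps linearly onto an index into the recorded points. The index rule (rounding to the nearest sample) and the function names are assumptions made for this example.

```python
import math


def recover_dots(points, segment_lengths):
    """Pick one recorded point per pattern dot by proportional division."""
    total = sum(segment_lengths)
    last = len(points) - 1
    dots, travelled = [points[0]], 0.0
    for length in segment_lengths:
        travelled += length
        # Fraction of the path covered so far -> index into the samples.
        dots.append(points[round(travelled / total * last)])
    return dots


def angle_at(a, b, c):
    """Angle ABC in degrees, computed with the law of cosines."""
    ab, bc, ac = math.dist(a, b), math.dist(b, c), math.dist(a, c)
    cos_b = (ab ** 2 + bc ** 2 - ac ** 2) / (2 * ab * bc)
    # Clamp to [-1, 1] to absorb floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))


def recovered_angles(points, segment_lengths):
    """Angles at the interior dots of the recovered skeleton."""
    dots = recover_dots(points, segment_lengths)
    return [angle_at(*dots[i:i + 3]) for i in range(len(dots) - 2)]
```

Because only side lengths enter the law of cosines, mirroring or uniformly scaling the recorded points leaves the recovered angles unchanged, which is why a horizontally flipped camera image does not affect the comparison.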
As mentioned above, we assign weights to different angles. An angle with a higher weight can be followed with less disparity, and the difference between the angle and its measured value can be more credible. We introduce the matching cost to describe the dissimilarity between the original pattern and the tracked trajectory, as shown in Eq. eq:cost:
where is the actual value of an angle in the given pattern, and represents its measured result. A pre-defined constant threshold is involved. If , we consider that the face in front of the camera comes from a live person.
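To make the decision rule concrete, the sketch below uses a weighted mean absolute angular deviation as the matching cost. This particular form is an assumption chosen for illustration and may differ from the cost defined in the paper; the weights, the threshold value, and the strict inequality are likewise hypothetical.

```python
def matching_cost(pattern_angles, measured_angles, weights):
    """Weighted mean absolute angular deviation (an assumed form of the cost)."""
    if len(pattern_angles) != len(measured_angles):
        raise ValueError("pattern and trajectory must contain the same number of angles")
    weighted = [weights[a] * abs(a - m) for a, m in zip(pattern_angles, measured_angles)]
    return sum(weighted) / sum(weights[a] for a in pattern_angles)


def is_live(pattern_angles, measured_angles, weights, threshold):
    """Accept the face as live when the dissimilarity stays below the threshold."""
    return matching_cost(pattern_angles, measured_angles, weights) < threshold
```

Weighting by how reliably an angle can be followed means that a deviation on an easy-to-track angle counts more heavily than the same deviation on a hard one.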
V Security Analysis
As described above, IriTrack uses eye movements as the evidence for determining the liveness of a presented face. In this section, we discuss the security guarantees provided by IriTrack against the potential attacks presented in Section II.
Picture-based attacks. Faces recorded in pictures (e.g., photos) are inherently different from real faces, because the irises in pictures are static. As a result, to cheat IriTrack, an attacker must move the picture along the same path as the displayed poly-line. However, this would result in a relatively large range of face movement. By analysing the region of face movements during this process, IriTrack can easily figure out that the trajectory is derived mainly from face movements, rather than iris movements.
Video-based attacks. Videos recording eye movements may be used to deceive IriTrack. To pass the verification, an attacker must present a video displaying a series of eye movements which match the generated poly-line. As the poly-line is generated with a high degree of randomness (e.g., the length of segments and the degree of angles), it is difficult to spoof IriTrack without prior knowledge of the displayed poly-line. Experimental results will be presented in Section VII.
2D/3D model attacks. Although a model can have moveable facial components, changing the movements of facial parts usually needs time-consuming reprogramming. Thus, a time-out rule can be applied to prevent programming operations. That is, IriTrack can trigger a time-out rule and terminate the detection process with a rejection once the tracking module fails to record eye movement within a certain period.
Mask-based attacks. Masks of faces expose specific facial features to IriTrack. Similar to pictures, masks cannot provide iris movements, as the eyes within masks are not moveable. Thus, the same idea used for detecting picture-based attacks can be applied. As a variation of mask-based attacks, an adversary may use a mask which has some level of transparency around the eyes such that a camera can still detect iris movements. We will discuss this special case in Section VIII.
We have implemented a prototype of IriTrack on a PC with Windows 10. This section presents the implementation details.
During pattern generation, we use a pseudo-random number generator to simulate probabilities. An alternative is to obtain random numbers from RANDOM.ORG.
The tracking module utilizes OpenCV to invoke image-related functions, e.g., recognizing regions of faces and eyes. We use pre-trained Haar classifiers to search for regions of the largest face as well as both the left and right eyes. With the help of the eye classifier, IriTrack can successfully detect regions of eyes either with or without glasses. By limiting search within the regions of eyes, the locating algorithm is greatly accelerated. Fig. 4 demonstrates the result of recognizing regions of interest within a captured face.
As stated earlier, IriTrack is supposed to capture points at a fixed frequency. When extracting angles from the tracked points, the captured points can be proportionally divided based on the lengths of line segments. In other words, the position of a given dot can be derived by the point in the corresponding index from the obtained point sequence as illustrated by Eq. eq:divide_track. In our implementation, for each dot in a pattern, we select the corresponding point as well as the 2 nearby points. That is, we take 3 points for each dot as its candidates. We maintain those selected candidate sets in a list, from which we sequentially take 3 adjacent sets to calculate angles. For 27 combinations of coordinates respectively selected from the 3 sets, we can get the containing angle by applying the arc-cosine function. Finally, we can simply select the most frequent value from the 27 candidates as the final result.
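The voting step described above can be sketched as follows. This is a minimal illustration rather than the prototype's code: the function names are hypothetical, and rounding each angle to a 5-degree bin before voting is an assumption introduced here so that "most frequent value" is well defined for floating-point angles.

```python
import math
from collections import Counter
from itertools import product


def angle_at(a, b, c):
    """Angle in degrees at vertex b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None  # coincident points: the angle is undefined
    cos_v = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_v = max(-1.0, min(1.0, cos_v))  # guard against rounding error
    return math.degrees(math.acos(cos_v))


def estimate_angle(prev_set, vertex_set, next_set, bin_deg=5):
    """Vote over all 3 * 3 * 3 = 27 candidate combinations.

    Each set holds the 3 candidate points of one dot. bin_deg is an
    assumed discretization step, not a value taken from the paper.
    """
    votes = Counter()
    for a, b, c in product(prev_set, vertex_set, next_set):
        ang = angle_at(a, b, c)
        if ang is not None:
            votes[bin_deg * round(ang / bin_deg)] += 1
    return votes.most_common(1)[0][0] if votes else None
```

For example, `estimate_angle(s0, s1, s2)` takes the candidate sets of three consecutive dots and returns the binned angle at the middle dot.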
However, in our experiments, we notice that the positions of the irises may not be recorded strictly periodically, as the processing time may differ for each frame, especially when irrelevant background tasks are executed concurrently on the host device. The difference between sampling intervals may significantly affect the positioning of the dots of interest as well as the measurement of angles. As a result, some corrections must be applied to fix the inaccuracy caused by the uneven distribution of samples over time. In our system, we use the capture timestamps of two adjacent points to predict the position of the user’s gaze at a specific moment. When tracking the position of the user’s irises, the tracking module records the x and y coordinates as well as the timestamp at which the currently analysed frame was captured. For each center point, the extracted information thus consists of its coordinates and the time when its position is concluded. Given the corresponding data of two adjacent points, we can predict the position of a point that is supposed to be recorded at a specific moment between them.
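One simple way to realize such a prediction is linear interpolation between the two adjacent samples. The sketch below assumes that form (the exact expression used by the prototype is not reproduced here), and the function name and tuple layout are hypothetical.

```python
def interpolate_gaze(p0, p1, t):
    """Estimate the gaze position at time t from two adjacent samples.

    p0 and p1 are (x, y, timestamp) tuples. Linear interpolation is an
    assumption of this sketch, not necessarily the paper's formula.
    """
    x0, y0, t0 = p0
    x1, y1, t1 = p1
    if t1 == t0:
        return (x0, y0)  # identical timestamps: nothing to interpolate
    r = (t - t0) / (t1 - t0)  # fraction of the interval elapsed at t
    return (x0 + r * (x1 - x0), y0 + r * (y1 - y0))
```

With samples at 0 ms and 100 ms, a query at 25 ms returns the point one quarter of the way from the first sample to the second.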
By introducing the timestamp-based correction, we can recover the turning points more precisely. We subtract the timestamp of the first recorded point from the timestamp of the last point to obtain the duration of the whole process; we then divide this duration in proportion to the lengths of the generated line segments to get the recording moments of the turning points.
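The proportional division of the tracking duration can be sketched as below. The function name is hypothetical, and the sketch assumes the slipper moves at constant speed, so that the time spent on each segment is proportional to its length.

```python
from itertools import accumulate


def turning_point_times(t_first, t_last, segment_lengths):
    """Moments at which each dot of the poly-line should have been reached.

    Assumes constant slipper speed, so elapsed time is proportional to
    the cumulative length travelled along the poly-line.
    """
    total = sum(segment_lengths)
    if total <= 0:
        raise ValueError("segment lengths must sum to a positive value")
    duration = t_last - t_first
    cumulative = [0, *accumulate(segment_lengths)]
    return [t_first + duration * c / total for c in cumulative]
```

For a two-segment pattern with lengths 100 and 300 tracked between 1000 ms and 5000 ms, the three dots map to 1000 ms, 2000 ms, and 5000 ms; the gaze position at each of these moments can then be estimated from the neighbouring samples.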
The goals of our evaluation are: 1) exploring parameters that achieve a balance between time overhead and accuracy of the detection process, 2) showing the efficiency and security of our system by comparing with state-of-the-art methods, 3) demonstrating the system performance with various pattern scales, and 4) estimating the reliability of the proposed system under varying environmental conditions.
Methods to Compare. We select several representative liveness detection systems for performance comparison, which are listed as follows:
IriTrack, which is the main work of this paper. The timestamp-based optimization is applied in similarity measurement.
ncIriTrack, which is the same as IriTrack except that the timestamp-based optimization is disabled.
FlashSys, which is the flash-related face liveness detection system proposed by Chan et al.
OptFlowSys, which is proposed by Bao et al. to detect face liveness based on the optical flow field.
Testbed. The system is deployed on a PC with 16 GB RAM and one Intel Dual-Core i7-6600U CPU. The main camera carries an OV5693 sensor and captures images with a size of in pixels. 18 volunteers participated in evaluating the accuracy of the selected methods. The volunteers were asked to keep their heads as still as possible during the detection process.
As described in Section V, video-based attacks are capable of imitating the iris movements of real users. Thus, we mainly examine the possibility for video attacks to spoof IriTrack. We assume that a potential adversary can learn typical parameters of IriTrack, such as angle type and segment length. To simulate these attacks, we record 50 different video clips (with random combinations of these parameters) for each of the 18 volunteers (i.e., 900 clips in total) under a consistent indoor light intensity of 350 lux. We also test several scenarios with varying environmental factors to evaluate the robustness of the proposed system against environmental changes.
Summary of experimental results.
Among all potential combinations of parameter values, we find candidate values of the line length and moving speed that achieve a better balance between time overhead and detection accuracy. Angles of 45° and 90° are hard to follow, so the weights for these two kinds are relatively low.
The average time overhead of liveness detection with IriTrack is roughly 3,845 ms, which is dominated by the tracking module. IriTrack achieves higher detection accuracy in detecting 2D spoofing attacks, with an F1 score of 95.4%.
The probability-based random pattern generation model can reach a balance between processing time and detection accuracy.
The performance of IriTrack remains at a relatively stable and high level when environmental conditions change. Lowering the ambient brightness can help increase detection accuracy.
VII-B Evaluation of Impacts of Parameters on Time and Accuracy
Now, we investigate how the time cost and accuracy vary according to different values of the parameters in IriTrack.
As stated above, a slipper moving along the poly-line displayed on the screen is employed to help users to focus on the path and adjust the movement speed of their eyes. Thus, the time spent on iris tracking is positively correlated with the ratio of the total length of the given poly-line to the speed of the slipper. Intuitively, a shorter path with a faster slipper would significantly reduce the time interval for collecting trajectories. However, a fast-moving slipper may make users feel uncomfortable and also reduce the number of captured points, leading to a significant decrease of measurement accuracy. Therefore, we focus on trade-offs between time overhead and accuracy with varying parameter settings.
|per pattern||# total|
Dataset. In order to clearly understand the impact of different parameters, the generated pattern is fixed and simplified into a poly-line consisting of only 3 dots (i.e., 2 segments with a single angle). We assign the two segments the same length; thus, the total length of the line segments in a pattern is twice the length of each segment. As summarized in Table III, all combinations of the angle, line length, and moving speed parameters result in 180 unique patterns. For each pattern, 4 volunteers are involved and each completes the test 10 times. The following figures show the average results of each pattern.
The average time cost of the 6 typical kinds of angles with varying line lengths and moving speeds is plotted in Fig. 5. We can find that at each fixed moving speed, the time spent on tracking grows as the line length increases. Thus, shorter lines contribute to a reduction in tracking time. When fixing the line length, speeding up the slipper’s movement also reduces the time overhead for tracking. Thus, to achieve more efficiency in terms of time cost, combinations of shorter lines and higher speed are preferred.
With the same settings as in Fig. 5, we exhibit the average matching deviation of angles with varying speeds and line lengths in Fig. 6. We observe two typical combinations leading to higher deviation, which are referred to as underspeed and overspeed cases. The underspeed cases happen when setting a low speed with relatively longer lines, e.g., the rightmost two bars at the speed of 100, as users would unconsciously try to predict the position of the slipper, making the tracking speed vary during the verification process. The overspeed cases happen when setting a high speed with relatively shorter lines, e.g., the length of 100 at a speed larger than 100. This is because shorter lines restrict eye movements within a rather small area on the screen, making it more difficult for IriTrack to recover the trajectory accurately.
Parameter selection result. An appropriate combination of line length and moving speed leads to a better balance between accuracy and time efficiency. According to the results depicted in Figs. 5 and 6, we fix the line length and the moving speed to the selected values hereafter.
Recall that each angle is associated with a weight, indicating the probability of it being selected when generating a pattern. Now, we describe the rationale for weight assignment. Among the 6 angles, the average of the disparities between the measured and actual angles reaches a value of 20°. However, since angles of 45° and 90° are harder to track, to ensure the performance for these 2 kinds, we set the disparity threshold to their average deviation.
For each of the 6 selected kinds of angles, we calculate the frequency with which a corresponding test case has a disparity no larger than the disparity threshold, and assign these frequencies as weights, as shown in Table IV. In Fig. 7, we show the matching deviation for each angle using the parameters recommended above. For instance, angles of 45° and 90° are relatively difficult to track, and thereby their weights are lower than those of the other angles.
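The weight assignment can be illustrated with a short sketch. The function name is hypothetical and the sample disparities in the usage note are made up for illustration; the actual weights are those reported in Table IV.

```python
def angle_weights(disparities_by_angle, threshold):
    """Weight of each angle = fraction of test cases within the threshold.

    disparities_by_angle maps an angle (in degrees) to the list of
    absolute disparities (measured vs. actual) observed for that angle.
    Angles without any test case are skipped.
    """
    return {
        angle: sum(d <= threshold for d in ds) / len(ds)
        for angle, ds in disparities_by_angle.items()
        if ds
    }
```

With made-up disparities `{45: [10, 30, 25, 5], 120: [5, 8, 12, 40]}` and a threshold of 20, the 45° angle receives a weight of 0.5 and the 120° angle a weight of 0.75, reflecting that harder-to-track angles are selected less often.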
Based on the probabilities for generating different angles, we utilize the highest, lowest, and average probabilities to estimate the goodness of a pattern consisting of a certain number of angles. In general, a pattern should contain at least 2 angles, with a corresponding goodness of 1.4. We take this value as the baseline of goodness.
VII-C Evaluation of Performance of IriTrack
Using the parameters determined above, we now evaluate the performance of IriTrack against its counterparts. We employ the well-known accuracy criteria, i.e., precision (the percentage of real faces among all instances detected as real) and recall (the percentage of real faces in the ground truth that are detected as real). Moreover, the F1 score is calculated as $F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.
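As a small worked illustration of these criteria, the sketch below computes precision, recall, and their harmonic mean, assuming the combined score is the standard F1 measure and treating real faces as the positive class. The function name and the counts in the usage note are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """tp: real faces accepted as real; fp: attacks accepted as real;
    fn: real faces rejected. Returns (precision, recall, F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, with 90 real faces accepted, 10 attacks wrongly accepted, and 30 real faces rejected, precision is 0.9, recall is 0.75, and the F1 score is about 0.818.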
Dataset. Each of the 18 volunteers is tested 40 times, which leads to a total of 720 genuine cases. We also simulate 720 attack cases, which are conducted as follows: considering that the pattern in IriTrack is generated randomly, we replay a clip selected randomly from the 900 clips to spoof the targeted detection systems at each round of detection.
Time efficiency. For the 4 selected face liveness detection systems, we record their average time costs for detection. OptFlowSys spends the most time, as it requires the tester’s head to swing slowly several times while detecting the directional changes of optical flow. In contrast, FlashSys needs the least time, as it captures and compares only two images in each round of detection, i.e., one without an external light source and the other with the flash turned on. However, the flash light is directly applied to the face of testers during each procedure, making the system less user-friendly. IriTrack has a tolerable time cost, i.e., less than 4 seconds, which is comparable to that achieved by ncIriTrack.
Accuracy and security. In order to reduce the influence of environmental factors, the selected methods are tested simultaneously. Besides tests with real persons, we present several instances of video attacks. The video attacks are conducted as follows: A series of video clips recording random iris movements are prepared in advance, and one video clip is randomly selected and displayed in front of the camera, attempting to cheat the liveness detection system.
The detection accuracy of each system is presented in Table V. IriTrack achieves the best performance in distinguishing between live real faces and fake faces. In IriTrack, patterns are generated with a random number of angles and lines, where the degree of each angle and the length of each line are also randomly selected from given sets. This greatly reduces the probability that a video attack successfully predicts a pattern.
The fundamental goal of liveness detection is to be accurate, e.g., identifying more spoofing attacks in the ground truth, and avoiding false alarms. Thus, compared with FlashSys, one may prefer to use IriTrack for achieving higher accuracy with a slight increase of detection delay.
From the results collected at this stage, we pick 30 subsets, each of which contains the detection results of 50 randomly selected cases (i.e., half genuine cases and half attack cases). Then, the F1 score of each subset is calculated. We find that for both FlashSys and OptFlowSys, there is a statistically significant difference with 95% confidence in comparison with IriTrack using the Student’s t-test.
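The comparison can be sketched with a two-sample Student’s t statistic over the per-subset scores. The text does not state whether the test is paired or unpaired, so the sketch assumes an unpaired test with pooled (equal) variance; the function name is hypothetical.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes independent samples with
    equal variances, which is an assumption of this sketch.
    """
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)  # unbiased sample variance
    vb = statistics.variance(sample_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    if pooled == 0:
        raise ValueError("both samples are constant; t is undefined")
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2
```

With 30 subset scores per system, the statistic has 58 degrees of freedom, for which the two-sided critical value at the 95% confidence level is roughly 2.00; a larger absolute value of t indicates a significant difference.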
The effectiveness of the timestamp-based optimization in IriTrack can be demonstrated by the comparison between ncIriTrack and IriTrack, as shown in Table VI. With the help of timestamps, the moving angles of irises can be recovered more precisely. Therefore, with IriTrack, more legitimate testers pass (i.e., 95.6% vs. 80.4%) and more video attacks are successfully recognized (i.e., 95.2% vs. 86.4%).
Summary of performances. The experimental results show that with IriTrack, the detection process takes less than 4 seconds and the F1 score reaches 95.4%. Thus, IriTrack achieves the highest detection accuracy with a moderate time overhead.
VII-D Investigation of Pattern Scales
The performance of IriTrack is largely determined by the generated patterns. This subsection investigates how the performance varies with patterns in different scales. We classify all the generated patterns according to the number of angles they contain.
As stated in the last subsection, time and accuracy are crucial indicators of performance. Table VII reports the results of the experiments; the last column indicates the detection accuracy when a pattern is tested by video spoofing attacks. The most complex pattern has the highest security and also the highest time overhead. Generally, a pattern with more angles contains more line segments as well as break points, which results in a growth of time consumption. Note that over all kinds of patterns, the security remains at a relatively high level.
Table VII also demonstrates that the probability-based model for pattern generation provides a flexible way to balance the tradeoffs between time efficiency and security.
|Angle Count||Tracking Time||Pattern Frequency|
VII-E Evaluation of Environmental Impacts
This subsection evaluates the effect of the environment on the performance of IriTrack, including the light conditions and face-camera distances.
|Intensity (lux)||Intuitive description||F1 score|
|1||Indoor, evening, screen light only||96.6%|
|25||Indoor, evening, with daylight lamp||95.6%|
|150||Indoor, afternoon, curtain closed||95.2%|
|350||Indoor, afternoon, natural light||95.4%|
|830||Indoor, afternoon, near a window||94.2%|
|2700||Outdoor, afternoon, cloudy||91.7%|
|10000||Outdoor, afternoon, sunny||91.6%|
Light Intensity. In liveness detection systems, images of users are taken by cameras for further analysis. All previous experiments are conducted in a general indoor condition with a light intensity of 350 lux. Next, we keep the brightness of the display screen at the same level (i.e., 250 lux) and evaluate the performance of IriTrack by varying the environmental light intensity. For video attacks, we keep using the same video dataset and replay strategy as mentioned earlier. Note that the device for replaying attack clips has a screen, which increases the environmental light intensity by 200 lux on average.
The results are summarized in Table VIII. We can find that the detection accuracy in terms of F1 score remains at a relatively high level as the environmental light intensity changes. Intense sunlight slightly reduces the accuracy of detecting face regions, because the screen in such a condition appears comparatively darker, making it harder for the testers to stay focused.
Face-Camera Distance. The distance between the face and the camera influences the size of faces in the obtained images, e.g., a shorter distance yields a larger face with more details of iris movements. Daugman’s algorithm used in IriTrack searches for irises with radii in a pre-defined range. That is, for irises to be successfully and accurately detected, testers have to keep their heads at a proper distance from the camera so that each of the captured irises has an appropriate size for further detection.
The results are exhibited in Table IX. We can find that the detection accuracy in terms of F1 score reaches a steady level, as long as the face-camera distance is appropriate so that the irises are traceable.
Unlike most existing liveness detection methods, IriTrack does not rely on direct analysis of images acquired by cameras; thus, it needs no online or offline training of image classifiers for liveness detection. We have shown its effectiveness in the previous section. This section mainly discusses issues that might affect its performance in practice.
Compatibility on different devices. Screens displaying the generated patterns may differ in their physical sizes (in terms of inches) and effective rendering sizes (in terms of pixels). A physically small screen may have a larger pixel density, which makes a rendered line visually shorter. To get a consistent display effect on different devices, the pixels per inch (PPI) parameter can be used: a length in device-independent inches is multiplied by the PPI to obtain the length in pixels, and conversely a pixel length is divided by the PPI to obtain inches.
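A minimal sketch of this conversion is given below. The function names are hypothetical, and how the PPI value is obtained from a given device is platform-specific and left out.

```python
def inches_to_pixels(length_in, ppi):
    """Pixel length that renders as length_in inches on a screen with
    the given pixel density (pixels per inch)."""
    return round(length_in * ppi)


def pixels_to_inches(length_px, ppi):
    """Physical length in inches of a line that is length_px pixels long."""
    return length_px / ppi
```

For example, a 2-inch segment corresponds to 192 pixels on a 96 PPI display and to 600 pixels on a 300 PPI display, so specifying pattern lengths in inches keeps the displayed size consistent across devices.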
Defense against advanced mask-based attacks. As mentioned earlier in Section V, an adversary may use a mask which enables camera-detectable eye movement to spoof IriTrack. As a liveness detection system, IriTrack is only responsible for verifying whether a user is alive, irrespective of whether the user is authorized. In general, existing liveness detection systems which take eye reactions (e.g., movement and blinking) as a sign of liveness are vulnerable to such advanced attacks. To defend against these attacks, static feature analysis approaches [10, 12] can be incorporated into IriTrack, since masks differ from real faces in texture.
Assumption on user concentration. It is worth noting that the heads of users should be kept as still as possible, as intensive jitter occurs when recognizing face regions with OpenCV even though the position of the head changes negligibly. Currently, IriTrack records the global positions of the irises for each frame. In order to improve the steadiness of the algorithms that locate face regions, the iris tracking module can use the relative positions between a face and the irises to identify the movement of the irises. In this case, such an assumption will no longer be necessary.
We leave these improvement attempts as future work.
In this paper, we proposed a face liveness detection system named IriTrack, which performs detection by comparing iris trajectories with randomly generated patterns. None of the modules in IriTrack requires special hardware, and each is easy to implement on commercial devices. Extensive experimental results demonstrated the effectiveness of IriTrack in defending against video-based spoofing attacks. In future work, we will further improve the time efficiency and compatibility of the proposed system.
-  Y. Zhang, W. Hu, W. Xu, C. T. Chou, and J. Hu, “Continuous authentication using eye movement response of implicit visual stimuli,” Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 1, pp. 177:1–177:22, Jan. 2018.
-  S. Thavalengal, T. Nedelcu, P. Bigioi, and P. Corcoran, “Iris liveness detection for next generation smartphones,” IEEE Transactions on Consumer Electronics, vol. 62, pp. 95–102, May 2016.
-  “Arrivals smartgate.” https://www.homeaffairs.gov.au/trav/ente/goin/arrival/smartgateor-epassport. Accessed May 24, 2018.
-  “Hsbc customers can open new bank accounts using a selfie.” https://www.cnbc.com/2016/09/05/hsbc-customers-can-open-new-bank-accounts-using-a-selfie.html. Accessed May 22, 2018.
-  “Windows hello — windows 10 — microsoft.” https://www.microsoft.com/en-us/windows/windows-hello. Accessed May 24, 2018.
-  “Facial recognition market- global industry analysis and forecast 2015-2022.” https://www.transparencymarketresearch.com/facial-recognition-market.html. Accessed May 24, 2018.
-  I. Chingovska, A. R. d. Anjos, and S. Marcel, “Biometrics evaluation under spoofing attacks,” IEEE Transactions on Information Forensics and Security, vol. 9, pp. 2264–2276, Dec 2014.
-  Z. Akhtar, C. Micheloni, and G. L. Foresti, “Biometric liveness detection: Challenges and research opportunities,” IEEE Security Privacy, vol. 13, pp. 63–72, Sept 2015.
-  R. Karunya and S. Kumaresan, “A study of liveness detection in fingerprint and iris recognition systems using image quality assessment,” in 2015 International Conference on Advanced Computing and Communication Systems, pp. 1–5, Jan 2015.
-  G. Kim, S. Eum, J. K. Suhr, D. I. Kim, K. R. Park, and J. Kim, “Face liveness detection based on texture and frequency analyses,” in 2012 5th IAPR International Conference on Biometrics (ICB), pp. 67–72, March 2012.
-  J. Dong, C. Tian, and Y. Xu, “Face liveness detection using color gradient features,” in 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), pp. 377–382, Dec 2017.
-  Z. Boulkenafet, J. Komulainen, and A. Hadid, “Face spoofing detection using colour texture analysis,” IEEE Transactions on Information Forensics and Security, vol. 11, pp. 1818–1830, Aug 2016.
-  W. Kim, S. Suh, and J. J. Han, “Face liveness detection from a single image via diffusion speed model,” IEEE Transactions on Image Processing, vol. 24, pp. 2456–2465, Aug 2015.
-  J. Galbally, S. Marcel, and J. Fierrez, “Image quality assessment for fake biometric detection: Application to iris, fingerprint, and face recognition,” IEEE Transactions on Image Processing, vol. 23, pp. 710–724, Feb 2014.
-  A. Czajka and P. Bulwan, “Biometric verification based on hand thermal images,” in 2013 International Conference on Biometrics (ICB), pp. 1–6, June 2013.
-  A. Lagorio, M. Tistarelli, M. Cadoni, C. Fookes, and S. Sridharan, “Liveness detection based on 3d face shape analysis,” in 2013 International Workshop on Biometrics and Forensics (IWBF), pp. 1–4, April 2013.
-  P. P. K. Chan, W. Liu, D. Chen, D. S. Yeung, F. Zhang, X. Wang, and C. C. Hsu, “Face liveness detection using a flash against 2d spoofing attack,” IEEE Transactions on Information Forensics and Security, vol. 13, pp. 521–534, Feb 2018.
-  S. Tirunagari, N. Poh, D. Windridge, A. Iorliam, N. Suki, and A. T. S. Ho, “Detection of face spoofing using visual dynamics,” IEEE Transactions on Information Forensics and Security, vol. 10, pp. 762–777, April 2015.
-  L. Yang, “Face liveness detection by focusing on frontal faces and image backgrounds,” in 2014 International Conference on Wavelet Analysis and Pattern Recognition, pp. 93–97, July 2014.
-  T. W. Lee, G. H. Ju, H. S. Liu, and Y. S. Wu, “Liveness detection using frequency entropy of image sequences,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2367–2370, May 2013.
-  W. Bao, H. Li, N. Li, and W. Jiang, “A liveness detection method for face recognition based on optical flow field,” in 2009 International Conference on Image Analysis and Signal Processing, pp. 233–236, April 2009.
-  M. Smiatacz, “Liveness measurements using optical flow for biometric person authentication,” Metrology and Measurement Systems, vol. 19, no. 2, pp. 257–268, 2012.
-  A. Czajka, “Pupil dynamics for iris liveness detection,” IEEE Transactions on Information Forensics and Security, vol. 10, pp. 726–735, April 2015.
-  G. Pan, L. Sun, Z. Wu, and S. Lao, “Eyeblink-based anti-spoofing in face recognition from a generic webcamera,” in 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8, Oct 2007.
-  O. V. Komogortsev, A. Karpov, and C. D. Holland, “Attack of mechanical replicas: Liveness detection with eye movements,” IEEE Transactions on Information Forensics and Security, vol. 10, pp. 716–725, April 2015.
-  D. Liu, B. Dong, X. Gao, and H. Wang, “Exploiting eye tracking for smartphone authentication,” in Applied Cryptography and Network Security (T. Malkin, V. Kolesnikov, A. B. Lewko, and M. Polychronakis, eds.), (Cham), pp. 457–477, Springer International Publishing, 2015.
-  L. Wu, X. Du, and X. Fu, “Security threats to mobile multimedia applications: Camera-based attacks on mobile phones,” IEEE Communications Magazine, vol. 52, pp. 80–87, March 2014.
-  I. Rigas and O. V. Komogortsev, “Gaze estimation as a framework for iris liveness detection,” in IEEE International Joint Conference on Biometrics, pp. 1–8, Sept 2014.
-  C. Galdi, M. Nappi, D. Riccio, and H. Wechsler, “Eye movement analysis for human authentication: a critical survey,” Pattern Recognition Letters, vol. 84, pp. 272 – 283, 2016.
-  N. Kose and J. L. Dugelay, “On the vulnerability of face recognition systems to spoofing mask attacks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2357–2361, May 2013.
-  F. Pala and B. Bhanu, “Iris liveness detection by relative distance comparisons,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 664–671, July 2017.
-  M. Kumar and N. B. Puhan, “Iris liveness detection using texture segmentation,” in 2015 Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), pp. 1–4, Dec 2015.
-  C. H. Yeh and H. H. Chang, “Face liveness detection with feature discrimination between sharpness and blurriness,” in 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), pp. 398–401, May 2017.
-  R. B. Dubey and A. Madan, “Iris localization using daugman’s intero-differential operator,” International Journal of Computer Applications, vol. 93, pp. 6–12, May 2014.
-  M. Haahr, “Random.org - statistical analysis.” https://www.random.org/analysis/. Accessed June 20, 2017.