Statistical Meta-Analysis of Presentation Attacks for Secure Multibiometric Systems

09/06/2016 ∙ by Battista Biggio, et al. ∙ Universita Cagliari

Prior work has shown that multibiometric systems are vulnerable to presentation attacks, assuming that their matching score distribution is identical to that of genuine users, without fabricating any fake trait. We have recently shown that this assumption is not representative of current fingerprint and face presentation attacks, leading one to overestimate the vulnerability of multibiometric systems, and to design less effective fusion rules. In this paper, we overcome these limitations by proposing a statistical meta-model of face and fingerprint presentation attacks that characterizes a wider family of fake score distributions, including distributions of known and, potentially, unknown attacks. This allows us to perform a thorough security evaluation of multibiometric systems against presentation attacks, quantifying how their vulnerability may vary also under attacks that are different from those considered during design, through an uncertainty analysis. We empirically show that our approach can reliably predict the performance of multibiometric systems even under never-before-seen face and fingerprint presentation attacks, and that the secure fusion rules designed using our approach can exhibit an improved trade-off between the performance in the absence and in the presence of attack. We finally argue that our method can be extended to other biometrics besides faces and fingerprints.




1 Introduction

The widespread use of biometric identity recognition systems is severely limited by security threats arising from several kinds of potential attacks [1, 2, 3]. In particular, the use of fake biometric traits (e.g., gummy fingers [4]), also known as “direct”, “spoofing” or “presentation” attacks [1, 2, 5, 6, 7], has the greatest practical relevance as it does not require advanced technical skills. Accordingly, the potential number of attackers is very large. Vulnerability to presentation attacks has been studied mostly for unimodal fingerprint and face recognition systems [8, 4, 9, 10]. Multibiometric systems have instead been considered intrinsically more secure, based on the intuition that an attacker should spoof all the biometric traits to successfully impersonate the targeted client [11, 12].

This claim has been questioned in subsequent work, showing that an attacker may successfully spoof a multibiometric system even if only one (or a subset) of the combined traits is counterfeited, if the fusion rule is not designed by taking that into account [13, 14, 15]. From a computer security perspective, this is not surprising: it is well-known that the security of a system relies the most on the security of its weakest link [16]. This vulnerability of multibiometric fusion has been shown under the assumption that the fake score distribution at the output of the attacked matcher is identical to that of genuine users, without fabricating any fake trait. Under the same assumption, more secure fusion rules against presentation attacks have also been proposed.

In our recent work, through an extensive experimental analysis, we have shown that the aforementioned assumption is not representative of current face and fingerprint presentation attacks [17, 18, 19]. In fact, not only do their fake score distributions rarely match those of genuine users, but they can also differ considerably depending on the technique, materials, and source images used to fabricate the presentation attack; i.e., presentation attacks can have a different impact on the output of the targeted matcher. For these reasons, the methodology proposed in [13, 14, 15] may not only provide an overly pessimistic security evaluation of multibiometric systems against presentation attacks, but also lead one to design secure fusion rules that exhibit a too pessimistic trade-off between the performance in the absence of attack and that under attack.

To perform a more complete analysis of the security of multibiometric systems, as shown in [17, 18, 19, 20, 21], considering only one or a few known attacks is not sufficient. One should thus face the cumbersome and time-consuming task of collecting or fabricating a large, representative set of presentation attacks (with different impact on the attacked matchers) for each biometric characteristic, and evaluate their subsequent effect on the fusion rule and on system security. However, even in this case, one may not be able to understand how multibiometric systems may behave under scenarios that are different from those considered during design, including presentation attacks fabricated with novel materials or techniques, that may produce different, never-before-seen fake score distributions at the matchers’ output.

In this paper, we propose a methodology for evaluating the security of multibiometric systems to presentation attacks, and for designing secure fusion rules, that overcomes the aforementioned limitations. In particular, we start from the underlying idea of [13, 14, 15], based on simulating fake score distributions at the matchers’ output (without fabricating fake traits), and on evaluating the output of the fusion rule. However, instead of considering a single fake score distribution equal to that of genuine users, we develop a statistical meta-model that enables simulating a continuous family of fake score distributions, with a twofold goal: (i) to better characterize the distributions of known, state-of-the-art presentation attacks, as empirically observed in our previous work, through a statistical meta-analysis; and (ii) to simulate distributions that may correspond to never-before-seen (unknown) attacks incurred during system operation, as perturbations of the known distributions, through an uncertainty analysis on the input-output behavior of the fusion rule. Accordingly, our approach provides both a point estimate of the vulnerability of multibiometric systems against (a set of) known presentation attacks, and an evaluation of how the estimated probability of a successful presentation attack may vary under unknown attacks, giving confidence intervals for this prediction. To validate the soundness of our approach, we will empirically show that it allows one to model and correctly predict the impact of unknown attacks on system security. For these reasons, our approach provides a more complete evaluation of multibiometric security. As a consequence, it also allows us to design fusion rules that exhibit a better trade-off between the performance in the absence and in the presence of attack.

The remainder of the paper is structured as follows. We discuss current approaches for evaluating multibiometric security, and their limitations, in Sect. 2. In Sect. 3, we present our statistical meta-model of presentation attacks. In Sect. 4, we discuss a data model for multibiometric systems that accounts for the presence of zero-effort and spoof impostors (i.e., impostors that do not attempt any presentation attack, and impostors that use at least one fake trait). In Sect. 5 and Sect. 6, we exploit these models to define a procedure for the security evaluation of multibiometric systems, and to design novel secure fusion rules. We empirically validate our approach in Sect. 7, using publicly-available datasets of faces and fingerprints that include more recent, never-before-seen presentation attacks, with respect to those considered to develop our meta-model. Contributions and limitations of our work are finally discussed in Sect. 8.

2 Security of Multibiometric Systems

Before reviewing current approaches for evaluating multibiometric security against spoofing, it is worth remarking that different terminologies have been used in the literature when referring to biometric systems, and only recently have they been systematized in the development of a novel standard and harmonized vocabulary by the International Organization for Standardization (ISO) [5, 6, 7]. We summarize some of the most common names in Table I, and use them interchangeably in the remainder of this paper.

In this work, we focus on multibiometric systems exploiting score-level fusion to combine the matching scores coming from distinct biometric traits. An example with two traits (face and fingerprint) is shown in Fig. 1. During the design phase, authorized clients are enrolled by storing their biometric traits (i.e., templates) and identities in a database. During online operation, each user provides the requested biometrics, and claims the identity of an authorized client. The corresponding templates are retrieved from the database and matched against the submitted traits. The matching scores are combined through a fusion rule, which outputs an aggregated score. The aggregated score is finally compared with a decision threshold to decide whether the identity claim is made by a genuine user (if the score is not lower than the threshold) or an impostor. Performance is evaluated, as for unimodal systems, by estimating the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) from the genuine and impostor distributions of the aggregated score [22].¹

¹ Note that, according to the ISO standard [23], FAR and FRR refer to the overall system performance, including errors like failure to capture and failure to extract. If one only considers the algorithmic performance, disregarding these aspects, then the False Match Rate (FMR) and the False Non-Match Rate (FNMR) should be used.
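As a minimal sketch of the decision step described above, the following Python snippet estimates FAR and FRR at a fixed threshold from fused scores. The mean fusion rule, the function names and the toy scores are illustrative assumptions, not taken from the paper; accepting a claim when the fused score is at least the threshold is also an assumed convention.

```python
from typing import Callable, Sequence, Tuple

Scores = Sequence[float]


def mean_fusion(scores: Scores) -> float:
    """Illustrative fusion rule (an assumption): average of the matching scores."""
    return sum(scores) / len(scores)


def far_frr(
    genuine: Sequence[Scores],
    impostor: Sequence[Scores],
    threshold: float,
    fusion: Callable[[Scores], float] = mean_fusion,
) -> Tuple[float, float]:
    """Return (FAR, FRR); a claim is accepted when the fused score >= threshold."""
    genuine_fused = [fusion(s) for s in genuine]
    impostor_fused = [fusion(s) for s in impostor]
    # FAR: fraction of zero-effort impostor claims that are wrongly accepted.
    far = sum(f >= threshold for f in impostor_fused) / len(impostor_fused)
    # FRR: fraction of genuine claims that are wrongly rejected.
    frr = sum(f < threshold for f in genuine_fused) / len(genuine_fused)
    return far, frr


# Toy (face, fingerprint) score pairs, made up for illustration only.
genuine = [(0.9, 0.8), (0.7, 0.9), (0.4, 0.5), (0.8, 0.6)]
impostor = [(0.1, 0.2), (0.3, 0.1), (0.6, 0.7), (0.2, 0.4)]
print(far_frr(genuine, impostor, threshold=0.5))
```

Sweeping the threshold over the observed fused scores would trace the usual FAR/FRR trade-off curve.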

ISO Standard [5, 6] | Commonly-used alternatives
Artefact(s) | Spoof / Fake Trait(s)
Biometric Characteristic(s) | Biometric(s) / Biometric Trait(s)
Comparison Subsystem | Matcher / Matching Algorithm
Comparison Score(s) | Matching Score(s)
Presentation Attack(s) | Spoof / Direct Attack(s)
Presentation Attack Detection | Liveness / Spoof Detection
TABLE I: ISO standard nomenclature (under development) for biometric systems and presentation attacks [5, 6], and commonly-used alternatives.

Presentation attacks can target any subset of the biometrics; e.g., a fake face (e.g., a 3D mask) and/or a fake fingerprint can be submitted to the corresponding sensor (see Fig. 1). The other impostor’s biometrics are submitted to the remaining sensors (if any): such biometrics are said to be subject to a zero-effort attack [13, 18, 15, 19, 20]. In multibiometric systems, the FAR is evaluated when all the biometrics are subject to a zero-effort attack, i.e., against zero-effort impostors [15, 20, 21]. As spoofing attacks affect only the FAR of a given system (and not the FRR), the corresponding performance is evaluated in terms of the so-called Spoof FAR (SFAR) [15, 20, 21]. Impostors attempting at least one presentation attack against one of the matchers are referred to as spoof impostors [15, 20]. Different SFAR values can clearly be estimated depending on the combination of attacked matchers, and on the kind of spoofing attacks involved (e.g., one may either use a face mask or a photograph for the purpose of face spoofing). Furthermore, the FAR evaluated against an impostor distribution including both zero-effort and spoof impostors is referred to as Global FAR (GFAR) [20, 21]. These measures will be formally defined in Sect. 4.²

² Note that SFAR and GFAR should not be confused with standard metrics used for presentation attack detection, like the Attack Presentation Classification Error Rate (APCER) and Normal Presentation Classification Error Rate (NPCER) [5, 6, 7], as the latter aim to evaluate the performance of a system that discriminates between alive and fake traits, and not between genuine and impostor users.

In the following, to keep the notation uncluttered, we will respectively denote the score distributions of genuine users, zero-effort impostors and spoof impostors at the output of an individual matcher as $p(S^G)$, $p(S^I)$ and $p(S^F)$.

Fig. 1: A bimodal system combining face and fingerprint, that can potentially incur presentation attacks against either biometric, or both.

2.1 Where We Stand Today

In [13], Rodrigues et al. showed that multibiometric systems can be evaded by spoofing a single biometric characteristic, under the assumption that the corresponding score distribution is identical to that of genuine users:

$$p(S^F) = p(S^G). \qquad (1)$$

This result was obtained without fabricating any fake trait: the matching scores of presentation attacks were simulated by resampling fictitious scores from $p(S^G)$. Similar results were obtained under the same assumption in [14, 15].
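The worst-case simulation described above can be sketched as follows: the score of the attacked matcher is resampled from the genuine scores, while the remaining matchers receive zero-effort impostor scores, and the fraction of accepted attempts estimates the SFAR. This is only an illustration under stated assumptions: the mean fusion rule, the acceptance convention (fused score at least the threshold) and all names are hypothetical and not taken from [13].

```python
import random
from typing import Sequence, Tuple


def worst_case_sfar(
    genuine: Sequence[Tuple[float, ...]],
    impostor: Sequence[Tuple[float, ...]],
    attacked: int,
    threshold: float,
    n: int = 10000,
    seed: int = 0,
) -> float:
    """Estimate the SFAR when the fake score distribution equals the genuine one.

    genuine / impostor: per-claim tuples of matching scores, one per matcher.
    attacked: index of the matcher assumed to be spoofed.
    """
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n):
        # Zero-effort impostor scores for every matcher...
        scores = list(rng.choice(impostor))
        # ...except the attacked one, whose score is resampled from genuine users.
        scores[attacked] = rng.choice(genuine)[attacked]
        fused = sum(scores) / len(scores)  # assumed mean fusion rule
        accepted += fused >= threshold
    return accepted / n


# Toy (face, fingerprint) score pairs, made up for illustration only.
genuine = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.6)]
impostor = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4)]
print(worst_case_sfar(genuine, impostor, attacked=0, threshold=0.5))
```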

Subsequently, we carried out an extensive experimental analysis on multibiometric systems combining face and fingerprint, to evaluate whether and to what extent this assumption was representative of current techniques for fabricating fake traits (from now on, fake fabrication techniques) [18, 17, 19].³ To this end, we considered different acquisition sensors and matchers, and several presentation attacks fabricated with different techniques. We used five state-of-the-art fake fabrication techniques for fingerprints, as listed in Table II, taken from the first two editions of the Fingerprint Liveness Detection Competition (LivDet) [24, 25]. They include fake traits fabricated with five different materials, in a worst-case setting commonly accepted for security evaluation of fingerprint recognition systems, i.e., with the cooperation of the targeted client (consensual method). Fake faces (see Table II) were obtained by displaying a photo of the targeted client on a laptop screen, or by printing it on paper, and then showing it to the camera [26, 27]. Pictures were taken with cooperation (Print [28, 26] and Photo Attack datasets [17]) and without cooperation of the targeted client (Personal Photo Attack [17]). As a fourth fake fabrication technique, wearable masks were also fabricated to impersonate the targeted clients (3D Mask Attack [29]).

³ For example, fake fingerprints can be fabricated adopting a mold of plasticine-like material where the targeted client has put his own finger, and a cast of gelatin dripped over the mold; fake faces can be fabricated by printing a picture of the targeted client on paper.

Our analysis showed that most of the above fake fabrication techniques produced distributions that did not match the overly pessimistic assumption of [13]. Representative examples are reported in Fig. 2 for several unimodal fingerprint and face systems, where it can be seen that $p(S^F)$ is located in between $p(S^I)$ and $p(S^G)$, that it can be very different from $p(S^G)$, and that its shape depends on the fake fabrication technique (see the three rightmost plots in the top row of Fig. 2, which refer to presentation attacks against the same fingerprint system with three different fake fabrication techniques). Notably, this is also true for more recent work on presentation attacks, even against different biometrics, like palm vein and iris [30, 31]. Accordingly, the SFAR estimated under the assumption in [13] can be much higher than the actual SFAR [18, 17, 19].

From the viewpoint of security evaluation, the approach proposed in [13, 14, 15] has the advantage of avoiding the fabrication of fake traits. However, it can give one an overly pessimistic and incomplete analysis of multibiometric security, as $p(S^F)$ only rarely matches $p(S^G)$, and different presentation attacks produce different $p(S^F)$.

Dataset | # clients | # spoofs | # live
LivDet09-Silicone (catalyst) [24] | 142 | 20 | 20
LivDet11-Alginate [25] | 80 | 3 | 5
LivDet11-Gelatin [25] | 80 | 3 | 5
LivDet11-Silicone [25] | 80 | 3 | 5
LivDet11-Latex [25] | 80 | 3 | 5
Photo Attack [17] | 40 | 60 | 60
Personal Photo Attack [17] | 25 | 5 (avg.) | 60
Print Attack [28, 26] | 50 | 12 | 16
3D Mask Attack [29] | 17 | 5 (video) | 10 (video)
TABLE II: Dataset characteristics: number of clients (# clients), and number of spoof (# spoofs) and live (# live) images per client.

2.2 Limitations and Open Issues

The empirical evidence summarized in Sect. 2.1 highlights that, even if the assumption in [13, 14, 15] allows us to assess the SFAR of a multibiometric system under different combinations of attacked matchers without fabricating any fake trait, the corresponding estimates may be too pessimistic. For the same reason, secure fusion rules based on the same assumption may even worsen the trade-off between the performance in the absence and in the presence of spoofing.

In addition, these evaluation techniques, as well as more recent ones [20, 21], do not specifically account for never-before-seen presentation attacks that may be incurred during system operation. System security is often evaluated on a set of attacks that may not be representative enough of future ones, i.e., considering fake score distributions that may be very different from those exhibited by unknown attacks. Accordingly, such evaluations only provide point estimates for the corresponding SFAR, without giving any information on how it may vary under presentation attacks that are different from those considered during design.

This raises two main open issues: (i) how to perform a more complete security evaluation of multibiometric systems, accounting for both known and unknown presentation attacks; and (ii) how to design improved secure fusion rules, while avoiding the cumbersome task of collecting or fabricating a large set of representative fake biometric traits.

In the next section, we present a statistical meta-model of presentation attacks that overcomes the aforementioned limitations by simulating a continuous family of fake score distributions $p(S^F)$, including distributions that correspond to known presentation attacks, and potential variations to account for never-before-seen attacks. We will then discuss how to exploit our meta-model for the purpose of security evaluation and to design secure fusion rules (Sects. 5-6).

3 Statistical Meta-Analysis of Presentation Attacks

To address the issues discussed in Sect. 2.2, in this section we develop a statistical meta-model that allows one to simulate a continuous family of fake score distributions (at the output of a matcher), with a twofold goal: (i) to characterize the score distributions of known, state-of-the-art presentation attacks; and (ii) to simulate continuous perturbations of such distributions that may correspond to the effect of unknown presentation attacks. In both cases, no fabrication of presentation attacks is required, since the corresponding fake score distributions are simulated.

To this aim, we exploit two well-known techniques of statistical data analysis: (i) statistical meta-analysis, which aims to find common patterns from results collected from previous studies, to avoid repeating time-consuming and costly experiments (see, e.g., [32]); and (ii) uncertainty analysis, to evaluate the output of a system under unexpected input perturbations (see, e.g., [33, 34, 35]). In particular, the latter allows us to investigate the input-output behavior of fusion rules, accounting for perturbations of the fake score distributions at the matchers’ output (i.e., inputs to the fusion rule) that may be caused by unknown attacks. This completes our evaluation by providing information on how the system may behave under presentation attacks that are different from those considered during design.

Accordingly, our approach may provide a point estimate of the vulnerability of multibiometric systems against known presentation attacks (e.g., a given SFAR value), as current evaluation techniques do [13, 14, 15, 20, 21], and also an estimate of how this vulnerability may vary under unknown attacks, giving confidence intervals for this prediction. As we will show experimentally, this provides a more thorough security evaluation of multibiometric systems, capable of correctly predicting even the impact of never-before-seen presentation attacks.

In the following we present our meta-model, and show how it characterizes known presentation attacks.

3.1 A Meta-model of Presentation Attacks

Fig. 2: Matching score distributions of fake fingerprints (top row) and faces (bottom row) for LivDet09-Silicone (Sensor: Biometrika, Matcher: Veryfinger), LivDet11-Alginate (S: Biometrika, M: Bozorth3), LivDet11-Gelatin (S: Italdata, M: Bozorth3), LivDet11-Latex (S: Italdata, M: Bozorth3), Personal Photo Attack (S: commercial webcam, M: EBGM), 3D Mask Attack (S: Microsoft Kinect, M: ISV), Print Attack (S: commercial webcam, M: EBGM), Photo Attack (S: commercial webcam, M: PCA). Simulated fake distributions according to our model are also shown for comparison, at low risk (LivDet09-Silicone, LivDet11-Alginate, 3D Mask Attack, and Personal Photo Attack), medium risk (LivDet11-Gelatin, and Print Attack), and high risk (LivDet11-Latex, and Photo Attack). The values of $(\mu_\alpha, \sigma_\alpha)$ used to simulate each attack scenario are reported in Table III.

We first carry out a statistical meta-analysis of the fake score distributions of known presentation attacks described in Sect. 2.1. We already observed that they exhibit very different shapes across the different multibiometric systems and fake fabrication techniques considered. Nevertheless, under closer scrutiny some general patterns emerge (see Fig. 2). First, $p(S^F)$ always lies in between $p(S^I)$ and $p(S^G)$ [18, 17, 29, 19]. Second, it exhibits a clear similarity either to $p(S^I)$ or to $p(S^G)$, or an intermediate shape between them. This is reasonable, since presentation attacks attempt to mimic the clients’ traits: accurate reproductions are likely to result in a score distribution more similar to $p(S^G)$, whereas inaccurate ones (provided that they still “resemble” a real trait to the sensor/matcher) are likely to yield a score distribution more similar to $p(S^I)$. In particular we observed that, for fingerprint spoofing, $p(S^F)$ resembles $p(S^I)$ for lower scores, and $p(S^G)$ for higher scores, exhibiting a significant heavy-tail behavior (see Fig. 2, top row); in the case of face spoofing, it exhibits a shape intermediate between $p(S^I)$ and $p(S^G)$, without significant heavy tails (see Fig. 2, bottom row).

To reproduce the above patterns, we propose a meta-model of $p(S^F)$ as a mixing of $p(S^G)$ and $p(S^I)$, by defining a fictitious r.v. $S^F$ as:

$$S^F = \alpha S^G + (1 - \alpha) S^I, \qquad (2)$$

where we define in turn $S^G$, $S^I$ and $\alpha$ as independent r.v. drawn respectively from the empirical distributions $p(S^G)$ and $p(S^I)$,⁴ and a distribution $\mathrm{Beta}(\mu_\alpha, \sigma_\alpha)$, with mean $\mu_\alpha$ and standard deviation $\sigma_\alpha$.⁵

⁴ Note that assuming independence between $S^G$ and $S^I$ here is only used to generate the fictitious scores $S^F$, and neither requires nor assumes independence between the genuine and impostor distributions.

⁵ For the sake of interpretability, we consider the mean and standard deviation of a Beta distribution, instead of its two shape parameters.

Note that the meta-model (2) is more flexible than a standard mixture, as $\alpha$ itself is a r.v.
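A minimal sketch of how fictitious fake scores could be drawn from such a mixing model is given below, assuming each fake score is alpha * genuine + (1 - alpha) * impostor with alpha Beta-distributed and independent of the two resampled scores. The conversion from mean and standard deviation to the two Beta shape parameters uses the standard moment relations; function names and toy scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def beta_shapes(mu: float, sigma: float) -> Tuple[float, float]:
    """Convert the mean and standard deviation of a Beta r.v. to its shape parameters."""
    var = sigma ** 2
    if not 0.0 < mu < 1.0 or not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("(mu, sigma) is not admissible for a Beta distribution")
    nu = mu * (1.0 - mu) / var - 1.0
    return mu * nu, (1.0 - mu) * nu


def simulate_fake_scores(
    genuine: Sequence[float],
    impostor: Sequence[float],
    mu: float,
    sigma: float,
    n: int,
    seed: int = 0,
) -> List[float]:
    """Draw n fictitious fake scores as alpha * s_G + (1 - alpha) * s_I."""
    rng = random.Random(seed)
    a, b = beta_shapes(mu, sigma)
    fake = []
    for _ in range(n):
        alpha = rng.betavariate(a, b)   # mixing coefficient, itself random
        s_g = rng.choice(genuine)       # resampled from empirical genuine scores
        s_i = rng.choice(impostor)      # resampled from empirical impostor scores
        fake.append(alpha * s_g + (1.0 - alpha) * s_i)
    return fake


# Toy scores, made up for illustration; (0.23, 0.20) is the medium-risk fingerprint
# scenario reported in Table III.
fake = simulate_fake_scores([0.8, 0.9, 0.7], [0.1, 0.2, 0.3], mu=0.23, sigma=0.20, n=5)
print(fake)
```

Because alpha is redrawn for every score, the simulated distribution can show a heavier upper tail than a mixture with a fixed coefficient.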

The choice of a Beta distribution is motivated by the fact that a Beta r.v. ranges in $[0, 1]$, which allows the resulting $p(S^F)$ to exhibit the pattern mentioned above, i.e., an intermediate shape between $p(S^I)$ and $p(S^G)$. As limit cases, when $\alpha = 1$, one obtains $p(S^F) = p(S^G)$, and when $\alpha = 0$ one obtains $p(S^F) = p(S^I)$. In all the other cases, the achievable SFAR values, for any fixed decision threshold, range between those obtained for $p(S^F) = p(S^I)$ and for $p(S^F) = p(S^G)$. The above limit distributions correspond therefore to the worst- and best-case presentation attack for a biometric system, in terms of the corresponding SFAR, that our meta-model can represent. These are reasonable choices, since $p(S^F) = p(S^G)$ can be seen as a worst-case attack, and corresponds to the attack scenario of [13, 14, 15] (see Eq. 1); and $p(S^F) = p(S^I)$ corresponds to the least harmful spoofing attack of interest for security evaluation, since attacks leading to a lower SFAR are by definition less effective than a zero-effort attack, which is already included in the standard evaluation of biometric systems.

Note that, if we set $\sigma_\alpha = 0$, the meta-model (2) becomes equivalent to the one we proposed in [36], where $\alpha$ was defined as a constant parameter (not a r.v.) ranging in $[0, 1]$. Although this is a simpler model, it turned out to be not flexible enough to properly fit all the fake score distributions empirically observed in our subsequent work [18, 17, 19].

Inferring the meta-parameters. Given a set of genuine and impostor scores obtained, for a given trait, in a laboratory setting, and a set of scores obtained from fake traits fabricated with a given technique, fitting the empirical distribution $p(S^F)$ with our meta-model (2) amounts to inferring the value of the meta-parameters $(\mu_\alpha, \sigma_\alpha)$. To this aim, the method of moments can be exploited.⁶ In our case it simply amounts to setting $\mu_\alpha$ and $\sigma_\alpha$ to the values that can be analytically obtained from Eq. (2), as a function of the means and variances of the distributions of genuine, impostor and fake scores, estimated from the available data:

$$\mu_\alpha = \frac{\mu_{S^F} - \mu_{S^I}}{\mu_{S^G} - \mu_{S^I}}, \qquad (3)$$

$$\sigma_\alpha = \sqrt{\frac{\sigma^2_{S^F} - \mu_\alpha^2 \sigma^2_{S^G} - (1 - \mu_\alpha)^2 \sigma^2_{S^I}}{(\mu_{S^G} - \mu_{S^I})^2 + \sigma^2_{S^G} + \sigma^2_{S^I}}}. \qquad (4)$$

Note however that the estimated values of $\mu_\alpha$ and $\sigma_\alpha$ may not satisfy the constraints $\mu_\alpha \in [0, 1]$ and $\sigma_\alpha \le \sqrt{\mu_\alpha (1 - \mu_\alpha)}$ that must hold for a Beta distribution. In this case, one should correct the corresponding estimate(s) to the closest (minimum or maximum) admissible value.

⁶ An alternative is a log-likelihood maximization approach, which is however more computationally demanding.
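The method-of-moments fit can be sketched as below. The formulas follow from taking the mean and variance of a fake score written as alpha * S_G + (1 - alpha) * S_I with alpha, S_G and S_I mutually independent, which is the assumption made here; the function name and the toy data are hypothetical, and inadmissible estimates are clipped to the nearest admissible value as the text prescribes.

```python
from math import sqrt
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def fit_meta_parameters(
    genuine: Sequence[float],
    impostor: Sequence[float],
    fake: Sequence[float],
) -> Tuple[float, float]:
    """Estimate (mu_alpha, sigma_alpha) from genuine, impostor and fake scores."""
    mu_g, mu_i, mu_f = fmean(genuine), fmean(impostor), fmean(fake)
    var_g, var_i, var_f = pvariance(genuine), pvariance(impostor), pvariance(fake)
    if mu_g == mu_i:
        raise ValueError("genuine and impostor means coincide; mu_alpha is undefined")

    # First moment: E[S_F] = mu_alpha * mu_G + (1 - mu_alpha) * mu_I.
    mu_alpha = (mu_f - mu_i) / (mu_g - mu_i)
    mu_alpha = min(max(mu_alpha, 0.0), 1.0)  # clip to [0, 1]

    # Second moment: Var[S_F] = var_alpha * ((mu_G - mu_I)^2 + var_G + var_I)
    #                           + mu_alpha^2 * var_G + (1 - mu_alpha)^2 * var_I.
    num = var_f - mu_alpha ** 2 * var_g - (1.0 - mu_alpha) ** 2 * var_i
    den = (mu_g - mu_i) ** 2 + var_g + var_i
    var_alpha = min(max(num / den, 0.0), mu_alpha * (1.0 - mu_alpha))  # Beta bound
    return mu_alpha, sqrt(var_alpha)


# Toy scores, made up for illustration only.
genuine = [0.82, 0.91, 0.78, 0.88]
impostor = [0.12, 0.20, 0.15, 0.09]
fake = [0.25, 0.40, 0.18, 0.65]
print(fit_meta_parameters(genuine, impostor, fake))
```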

To sum up, our meta-model characterizes a known fake score distribution with given values of the meta-parameters $\mu_\alpha$ and $\sigma_\alpha$. The associated presentation attack can then be simulated on any multibiometric system that uses the same biometric trait, by simulating the fake score distribution using Eq. (2), with the given values of $\mu_\alpha$ and $\sigma_\alpha$, and the corresponding empirical distributions $p(S^G)$ and $p(S^I)$.

Finally, our meta-model (2) can be exploited also to carry out an uncertainty analysis. This can be attained by considering simulated fake score distributions obtained by perturbing the values of the $\mu_\alpha$ and $\sigma_\alpha$ parameters associated to the known presentation attacks. Note that this produces a continuous family of simulated distributions.

The exact security evaluation procedure we propose is described in Sect. 5. In the following we analyze how our meta-model characterizes known presentation attacks.

3.2 Attack Scenarios for Fingerprints and Faces

Fig. 2 shows some representative examples of the distributions $p(S^G)$, $p(S^I)$ and $p(S^F)$ collected for our statistical meta-analysis, together with the fitting distributions of $p(S^F)$ obtained with our meta-model as explained above. In all cases the fitting accuracy turned out to be very good. The values of $(\mu_\alpha, \sigma_\alpha)$ obtained for all datasets, including those of Fig. 2, are shown in Fig. 3, where each point corresponds to a specific Beta distribution of the r.v. $\alpha$. Note that the closer the points in the $(\mu_\alpha, \sigma_\alpha)$ plane, the closer the corresponding Beta distributions, and thus the distributions $p(S^F)$, for a given pair $p(S^G)$, $p(S^I)$. The different shapes exhibited by the fake score distributions (see, e.g., Fig. 2) are mirrored by the different values of the corresponding meta-parameters, which are spread over a considerable region of the $(\mu_\alpha, \sigma_\alpha)$ plane (see Fig. 3). Note also that, often, $\sigma_\alpha$ is not negligible: this explains why the simpler model we proposed in [36], corresponding to $\sigma_\alpha = 0$, does not provide a good fit in such cases, as mentioned above. In particular, it cannot properly characterize the heavy tails exhibited by the score distributions of fake fingerprints (see Fig. 2).

In Sect. 2, we observed that some attacks produce fake score distributions that are “closer” to the score distribution of genuine users, rather than to that of impostors (e.g., our fingerprint replicas made with latex are much more similar to the “alive” ones than replicas made with silicone). A useful by-product of our meta-model is that it allows us to formalize the above qualitative observation, by quantifying the impact of each presentation attack in terms of the probability that it is successful. To this aim, it is not possible to directly use the SFAR value computed for some given decision threshold, as the fake score distribution simulated by our meta-model (2) depends also on the distributions $p(S^G)$ and $p(S^I)$. Nevertheless, for any given $p(S^G)$ and $p(S^I)$, the more the distribution of $\alpha$ is concentrated towards higher values, the closer the corresponding $p(S^F)$ is to $p(S^G)$, and thus the higher the SFAR for any given decision threshold. Accordingly, we propose to quantify the impact of an attack associated to given values of $(\mu_\alpha, \sigma_\alpha)$ as:

$$\text{attack impact} = P(\alpha > 0.5 \mid \mu_\alpha, \sigma_\alpha), \qquad (5)$$

where $0.5$ is a reasonable value to evaluate how much the Beta distribution is concentrated towards high values, as $\alpha \in [0, 1]$.⁷ Fig. 4 shows the values of Eq. (5) for each point of the $(\mu_\alpha, \sigma_\alpha)$ plane. Note that known fake score distributions exhibit very different values of the attack impact.

⁷ We argue that this measure of the attack impact may be also useful to quantitatively evaluate some aspects of the attack potential, a metric under definition in [6]. This can be investigated in future work.
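As an illustration, the sketch below computes the probability that a Beta-distributed mixing coefficient exceeds 0.5, given its mean and standard deviation. Reading the attack impact as this upper-tail probability is an assumption, though it is consistent with the percentages listed in Table III. The regularized incomplete Beta function is evaluated with the classical continued-fraction expansion (modified Lentz method); all function names are hypothetical.

```python
from math import exp, lgamma, log


def beta_shapes(mu, sigma):
    """Mean / standard deviation of a Beta r.v. -> shape parameters (a, b)."""
    if not 0.0 < mu < 1.0 or not 0.0 < sigma ** 2 < mu * (1.0 - mu):
        raise ValueError("(mu, sigma) is not admissible for a Beta distribution")
    nu = mu * (1.0 - mu) / sigma ** 2 - 1.0
    return mu * nu, (1.0 - mu) * nu


def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete Beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete Beta function I_x(a, b), i.e. the Beta CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def attack_impact(mu, sigma, level=0.5):
    """P(alpha > level) for alpha ~ Beta with the given mean and standard deviation."""
    a, b = beta_shapes(mu, sigma)
    return 1.0 - reg_inc_beta(a, b, level)


# (mu, sigma) pairs of the fingerprint scenarios in Table III.
for mu, sigma in [(0.08, 0.09), (0.23, 0.20), (0.40, 0.26)]:
    print(mu, sigma, attack_impact(mu, sigma))
```

Since the table reports rounded meta-parameters, small tail probabilities computed this way may differ from the tabulated percentages.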

Fig. 3: Results of fitting our model to the considered datasets [18, 17, 29]. Each real fake distribution is represented as a point with coordinates $(\mu_\alpha, \sigma_\alpha)$. A red (green) ‘x’ (‘+’) denotes a fake fingerprint distribution obtained by the Bozorth3 (Veryfinger) matcher and the Biometrika (Italdata) sensor. A blue (black) ‘x’ denotes a fake face distribution obtained by the EBGM (PCA) matcher and a commercial webcam. The black ‘+’ denotes the distribution of fake faces for the 3D Mask Attack database, obtained by the ISV matcher and the Microsoft Kinect sensor. The area under the dashed black curve corresponds to $\sigma_\alpha \le \sqrt{\mu_\alpha (1 - \mu_\alpha)}$, and delimits the family of possible fake distributions generated by our meta-model. Low-, medium-, and high-impact presentation attacks clustered to form the corresponding attack scenarios are highlighted respectively as blue, orange, and red shaded areas.
Fig. 4: Attack impact for each attack scenario of our meta-model (Eq. 5).
Fig. 5: Beta distributions and attack impact (in parentheses) for the three fingerprint and face presentation attack scenarios identified in Fig. 3.

Another interesting by-product emerging from our statistical meta-analysis is that score distributions produced by the same fake fabrication technique (e.g., fake fingerprints fabricated with latex) are fitted by our meta-model with very similar $(\mu_\alpha, \sigma_\alpha)$ values, across different sensors, matchers, and user populations (i.e., different $p(S^G)$ and $p(S^I)$); on the other hand, the same sensor, matcher, and user population can result in considerably different $(\mu_\alpha, \sigma_\alpha)$ values for different fake fabrication techniques (e.g., fake fingerprints fabricated with alginate, gelatin and latex). This implies that the attack impact measure of Eq. 5 mainly depends on the kind of attack, and is almost independent of the specific multibiometric system. Accordingly, our meta-model allows one to quantitatively compare the impact of different kinds of presentation attacks, either against the same or different multibiometric systems. The above result also means that, for both considered traits, our meta-model produces compact clusters of $(\mu_\alpha, \sigma_\alpha)$ values, each representing fake score distributions associated to one or more different fake fabrication techniques. These clusters are highlighted in Fig. 3. This result allows one to use a single instantiation of our meta-model for approximating all the distributions corresponding to the fake fabrication technique(s) lying in the same cluster, encompassing all the underlying, different multibiometric systems. For instance, the corresponding pair of $(\mu_\alpha, \sigma_\alpha)$ values can be defined as the cluster centroid, with no appreciable loss in fitting accuracy. In particular, we can identify in the considered data the three clusters for fingerprint spoofing, and the three for face spoofing, highlighted in Fig. 3. The meta-model associated to each cluster, corresponding to a point in the $(\mu_\alpha, \sigma_\alpha)$ plane, can then be used to simulate a given presentation attack scenario, involving the corresponding trait and fake fabrication technique(s), as explained in the next sections.

Fing. risk | $\mu_\alpha$ | $\sigma_\alpha$ | Attack impact | Dataset(s)
Low risk | 0.08 | 0.09 | 0.28% | LivDet09-Silicone; LivDet11-Alginate
Med. risk | 0.23 | 0.20 | 12.33% | LivDet11-Gelatin
High risk | 0.40 | 0.26 | 35.67% | LivDet11-Silicone; LivDet11-Latex
Face risk | $\mu_\alpha$ | $\sigma_\alpha$ | Attack impact | Dataset(s)
Low risk | 0.38 | 0.03 | 0.01% | Personal Photo Attack; Mask Attack
Med. risk | 0.78 | 0.19 | 89.83% | Print Attack
High risk | 0.91 | 0.11 | 98.75% | Photo Attack
TABLE III: Attack scenarios for fingerprint and faces, and their parameters.

Fig. 5 shows the Beta distributions associated to the attack scenarios of Fig. 3 (using the cluster centroids), and the corresponding values of Eq. (5). It can be seen that, for each considered trait, the Beta distributions of the different attack scenarios are characterized by considerably different values of the attack impact. Accordingly, we can label the above scenarios as “low”, “medium” and “high” impact. In Table III we report the corresponding values of $(\mu_\alpha, \sigma_\alpha)$, together with the attack techniques associated to each scenario. This taxonomy may clearly be revised in the future, if novel attack scenarios emerge from new empirical evidence.

To sum up, our meta-analysis provides not only a clear picture of current fingerprint and face spoofing attacks, but also the first quantitative characterization of their impact.

4 Data Modeling for Multibiometric Systems under Presentation Attacks

We define here a data model for multibiometric systems to account for different presentation attacks against each matcher, and revise the metrics used for evaluating the performance of such systems accordingly. This model will be exploited in the rest of the paper to define our security evaluation procedure, and to design secure fusion rules.

In the following, uppercase and lowercase letters respectively denote random variables (r.v.) and their values. We denote with

the r.v. representing an identity claim made by either a genuine user (G) or an impostor (I), and with , the r.v. denoting whether the matcher is under attack () or not (), assuming that different presentation attacks are possible against the matcher.

For instance, in Sect. 3.2, we found three representative attack scenarios for fingerprint and face. To model them, the corresponding should take values in (i.e., ), respectively denoting the no-spoof scenario (), and the low-, medium- and high-impact scenarios ().

Assuming that the matching scores are independent from each other, given , and that each only influences the corresponding , we can write the class-conditional score distributions by marginalizing over all possible values of , as:
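For reference, a factorization consistent with the two assumptions just stated can be written as follows. The notation is assumed for this sketch and may differ from the original symbols: $Y \in \{G, I\}$ is the identity claim, $\mathbf{s} = (s_1, \ldots, s_N)$ collects the scores of the $N$ matchers, and $\mathbf{a} = (a_1, \ldots, a_N)$ collects the attack variables.

```latex
p(\mathbf{s} \mid Y) \;=\; \sum_{\mathbf{a}} P(\mathbf{a} \mid Y) \prod_{i=1}^{N} p(s_i \mid a_i, Y)
```

The prior $P(\mathbf{a} \mid Y)$ is kept as a joint term, since the attack variables are not assumed to be mutually independent.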


Note that the attack variables are not assumed to be independent of each other, given the identity claim. This model corresponds to the Bayesian network of Fig. 6.


Since it is unlikely that genuine users use fake traits to access the system, we can reasonably set . Thus, the genuine distribution consists of a single component, i.e., . The impostor distribution is instead modeled as a mixture of different components, including the distribution of zero-effort impostors, and the distributions , for , associated to different combinations of attacked matchers and presentation attacks.

Accordingly, for a given fusion rule and acceptance threshold , the metrics used to evaluate the performance of a multibiometric system can be defined as:


where denotes the SFAR associated with a specific combination of attacked matchers and spoofing attacks (which are in total). Further, it is not difficult to see that the so-called Global FAR (GFAR) [20, 21], attained in the presence of a mixture of zero-effort and spoof impostors, can be directly computed as a convex linear combination of the FAR and the SFAR values using Eq. (6), as:


where, for notational convenience, we set and , for . To our knowledge, this is the first model highlighting a clear connection between the aforementioned performance metrics and the distribution of zero-effort and spoof impostors.
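The convex combination described above can be sketched as a small helper. The function name, argument names and example numbers are illustrative assumptions; only the structure (a prior-weighted sum of the zero-effort FAR and of the SFAR values of the attack combinations) follows the text.

```python
def global_far(far, sfars, prior_zero_effort, prior_attacks):
    """Prior-weighted (convex) combination of the zero-effort FAR and the SFARs."""
    weights = [prior_zero_effort, *prior_attacks]
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("priors must be non-negative and sum to one")
    if len(sfars) != len(prior_attacks):
        raise ValueError("one prior is needed per attack combination")
    return prior_zero_effort * far + sum(p * s for p, s in zip(prior_attacks, sfars))

# Illustrative numbers only (not results from the paper).
gfar = global_far(far=0.001, sfars=[0.20, 0.05],
                  prior_zero_effort=0.5, prior_attacks=[0.25, 0.25])
print(f"Global FAR = {gfar:.4f}")
```

Because the weights are probabilities, the result always lies between the smallest and the largest of the combined error rates.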

5 Security Evaluation of Multibiometric Systems under Presentation Attacks

In this section we describe our security evaluation procedure, and show how it can be exploited also for selecting the fusion rule and/or its parameters.

Fig. 6: A Bayesian network equivalent to Eq. (6).

Security evaluation. The procedure we propose is empirical, as in [13, 14, 15]: the SFAR under a simulated presentation attack is evaluated by replacing the available zero-effort impostor scores coming from the attacked matchers with fictitious fake scores sampled from our meta-model. More precisely, consider a multibiometric system made up of matchers, and an available set of matching scores , with . For a point estimate of the SFAR under a single attack against a subset of matchers (e.g., one of the known attacks), one should first define the combination of attacked matchers and attack scenarios, i.e., the values of the meta-parameters for each such matcher. Then:

  • for each matcher , if , set according to the desired scenario;

  • set equal to ;

  • for each , if and , set according to Eq. (2), with drawn from , and and sampled from and , i.e., the genuine and impostor scores of the matcher;

  • evaluate empirically using .

The resulting can then be evaluated by Eq. (10) (where the summation reduces to the single term ), after estimating the through the standard procedure and hypothesizing the values of and .

To evaluate the of Eq. (10) under different combinations of attacked matchers and scenarios, the above steps can be repeated for each of them.
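The steps above can be sketched as follows, for a single attacked matcher. This is a minimal illustration under stated assumptions, not the authors' implementation: the fake score is assumed to be a convex combination of a genuine and an impostor score with a Beta-distributed mixing coefficient, the scenario is given as a (mean, std) pair, and the scores are synthetic. The names `beta_shape` and `simulate_sfar` are hypothetical.

```python
import random

def beta_shape(mean, std):
    """(mean, std) -> Beta shape parameters (a, b)."""
    k = mean * (1.0 - mean) / std**2 - 1.0
    return mean * k, (1.0 - mean) * k

def simulate_sfar(impostor_vectors, genuine_scores, attacked, scenario,
                  fuse, threshold, seed=0):
    """Empirical SFAR when the matcher with index `attacked` is spoofed."""
    rng = random.Random(seed)
    a, b = beta_shape(*scenario)
    accepted = 0
    for scores in impostor_vectors:
        alpha = rng.betavariate(a, b)        # mixing coefficient for this attempt
        s_gen = rng.choice(genuine_scores)   # genuine score of the attacked matcher
        s_imp = scores[attacked]             # impostor score being replaced
        spoofed = list(scores)
        # Assumed form of the meta-model: convex combination of the two scores.
        spoofed[attacked] = alpha * s_gen + (1.0 - alpha) * s_imp
        accepted += fuse(spoofed) >= threshold
    return accepted / len(impostor_vectors)

# Synthetic scores in [0, 1], for illustration only.
rng = random.Random(1)
clip = lambda x: min(1.0, max(0.0, x))
genuine_face = [clip(rng.gauss(0.8, 0.05)) for _ in range(500)]
impostors = [(clip(rng.gauss(0.2, 0.05)), clip(rng.gauss(0.2, 0.05)))
             for _ in range(500)]

# Index 1 is taken to be the face matcher; scenarios use the face values of Table III.
sfar_low = simulate_sfar(impostors, genuine_face, 1, (0.38, 0.03), sum, 0.8)
sfar_high = simulate_sfar(impostors, genuine_face, 1, (0.91, 0.11), sum, 0.8)
print(f"SFAR low-impact: {sfar_low:.3f}, high-impact: {sfar_high:.3f}")
```

Repeating the call for each attacked matcher and scenario yields the set of SFAR values needed for the overall evaluation.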

Uncertainty analysis for security evaluation. To account for both known and unknown presentation attacks, through an uncertainty analysis, the above procedure has to be carried out by sampling a large number of attack scenarios from the feasible region of Fig. 3, besides the known attacks of interest. For instance, a uniform sampling can be used, as we will show in Sect. 7. In particular, it is convenient to sort the attack scenarios of each matcher for increasing values of attack impact (Eq. 5), such that higher values correspond to a higher attack impact. This allows the to be evaluated as a function of the attack impact on each matcher, highlighting its variability around the point estimates corresponding to known attacks, and which matchers the fusion rule is most sensitive to. Confidence bands can be used to represent the variability of the as the attack impact varies, as commonly done in statistical data analysis to represent the uncertainty on the estimate of a curve or function based on limited or noisy data. Examples are given in the plots of Fig. 7, where the yellow and purple bands represent the uncertainty on the impact of different attacks on the . As we will show in Sect. 7, even the associated to never-before-seen attacks can fall within these confidence bands, highlighting that our approach can also predict the impact of unknown attacks on the system.

Fusion rule and parameter selection. Our security evaluation procedure can also be exploited to help system designers select an appropriate fusion rule (or tune its parameters), taking into account a set of potential attack scenarios. Assume that the designer has identified a number of relevant attack scenarios of interest, and would like to choose among fusion rules, and/or tune their parameter vectors, to attain a suitable trade-off between the performance in the absence of attacks (defined by the FRR and FAR values) and the one under the relevant attack scenarios, defined by the corresponding SFAR values, estimated using the above procedure. While in the absence of attack application requirements can be expressed in terms of a trade-off between FRR and FAR, in the presence of spoofing the desired trade-off should also account for the SFAR under distinct, potential attack scenarios [13, 14, 15, 19, 20, 21]. This can be expressed, for instance, by minimizing a desired GFAR expression (Eq. 10), while keeping the FRR below a maximum admissible value:


The value of and the priors and have to be carefully chosen by the system designer, depending on the application at hand, and on the attack scenarios that are considered more relevant. In Sect. 7 we will show an example of how to exploit the proposed procedures to assess the security of a bimodal system against fingerprint and face presentation attacks, and to select a suitable fusion rule.
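A constrained selection of this kind can be sketched as below. The sketch assumes that the objective is the expected false acceptance rate (GFAR) and that the constraint is placed on the FRR; the rule names and the numbers are hypothetical placeholders, not results.

```python
def select_rule(candidates, frr_max):
    """Return the admissible candidate with the lowest GFAR, or None.

    candidates: dict mapping a rule name to {"frr": ..., "gfar": ...},
                both measured at the operating point chosen for that rule.
    frr_max:    maximum admissible FRR (assumed form of the constraint).
    """
    admissible = {name: m for name, m in candidates.items() if m["frr"] <= frr_max}
    if not admissible:
        return None
    return min(admissible, key=lambda name: admissible[name]["gfar"])

# Hypothetical measurements for three candidate rules.
candidates = {
    "rule_a": {"frr": 0.015, "gfar": 0.08},
    "rule_b": {"frr": 0.030, "gfar": 0.02},   # violates the FRR constraint
    "rule_c": {"frr": 0.018, "gfar": 0.05},
}
print(select_rule(candidates, frr_max=0.02))
```

The same function can be applied to a grid of parameter vectors for a single rule by treating each parameter setting as a separate candidate.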

6 Design of Secure Fusion Rules

The secure score-level fusion rules proposed so far are based on explicitly modeling presentation attacks against each matcher as part of the impostor distribution, using the scenario defined by Eq. (1) [13, 14, 15]. However, as discussed in Sect. 2, this may cause such rules to exhibit an overly pessimistic trade-off between the performance in the absence of spoofing and that under attacks that are not properly represented by Eq. (1). In this section, we discuss how to overcome these limitations using our meta-model (Sect. 3).

We first show how previously-proposed secure fusion rules can be interpreted according to the data model of Sect. 4 (Sect. 6.1), and how this model can also be exploited to train secure fusion rules based on discriminative classifiers (Sect. 6.2). Then, we discuss how our meta-model of fake score distributions and the attack scenarios defined in Sect. 3.2 can be exploited to design spoofing-aware score-level fusion rules that can achieve a better trade-off in terms of FRR, FAR and SFAR, on a wider set of attack scenarios characterized by different levels of attack impact. In particular, two secure score-level fusion rules are considered, respectively relying on a generative and a discriminative approach (Sect. 6.3).

6.1 Previously-proposed Secure Fusion Rules

In [37, 13] spoofing-aware score-level fusion rules were proposed, as variants of the well-known LLR rule [38]:


Both exploit an estimate of incorporating knowledge of potential presentation attacks that may be incurred during operation, and that are not included in the training data (as our model of Eq. 6). Therefore, while the genuine and the zero-effort impostor distributions can be estimated from the corresponding matching scores in the training data, specific assumptions have to be made on the remaining components of the mixture of zero-effort and spoof impostors. In our model, they include the priors and the fake score distributions , for , and . Both rules assume only a possible kind of attack against each matcher. This can be easily accounted for in our model of Eq. (6) by setting , i.e., for .

Extended LLR [13]. This rule is based on a seemingly more complex expression of than Eq.  (6), as it includes the probability of attempting a presentation attack against each matcher (represented by the r.v. ), and the probability of each attempt being successful (represented by the r.v. ). For each matcher, only if an attack is attempted and successful (i.e., and ), then the corresponding score follows a distribution different from that of zero-effort impostors. The expression of becomes however equivalent to Eq. (6), if we set , and marginalize over (cf. Eq. 6 with Eq. 5 in [13]). The prior distribution can be indeed written as:


while the fake score distributions in [13] are clearly equivalent to our .

In [13] the probability of attempting a presentation attack against any of the combinations of matchers was set to . Thus, the probability of zero-effort impostor attempts was set to , being a parameter. (To avoid confusion, we use instead of as in [13].) The probability of an attempted spoof failing, , was denoted as , and referred to as the “level of security” of the matcher. Clearly, as an attack cannot be successful if it has not been attempted, . The resulting expression of therefore depends on the parameters and . Setting their values amounts to defining the distribution in Eq. (6) (see Eqs. 6 and 14); e.g., for a bimodal system (), one obtains:


Notably, if we assume and the scenario of Eq. (1), the distribution described by the models of Eqs. (6) and (14) becomes identical, as well as the corresponding LLR-based secure fusion rules given by Eq. (13).

Uniform LLR [37]. We proposed this robust version of the LLR in previous work, for a broader class of applications in computer security. It is based on modeling the impostor distribution according to Eq. (6), with . However, we considered the case when no specific knowledge on the distribution of potential attacks is available to the designer; accordingly, we agnostically assumed a uniform distribution for modeling the fake score distributions.

6.2 Discriminative Approaches

The aforementioned secure fusion rules are based on a generative model of the data distribution. However, Eq. (6) can also be exploited to develop secure rules based on discriminative classifiers such as Support Vector Machines (SVMs) and Neural Networks (NNs) [39]. To this end, one may train the fusion rule after resampling the available zero-effort impostor scores according to Eq. (6) as follows (see also [39, Sect. 8.1.2]). First, define according to Eq. (6), i.e., define and (for , and ). As for the security evaluation procedure defined in Sect. 5, let us denote the available set of scores as , with . For each , draw a value of from . For each , if , replace the corresponding with a sample from the hypothesized fake score distribution , otherwise leave unmodified (i.e., sample from the empirical distribution of zero-effort impostors).
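The resampling step can be sketched as follows. This is a minimal illustration: the data structures, the function name `resample_impostors`, and the placeholder fake-score sampler are assumptions made for the example, and any hypothesized fake score distribution can be plugged in instead.

```python
import random

def resample_impostors(impostor_vectors, attack_prior, fake_samplers, seed=0):
    """Build a modified impostor training set that includes simulated attacks.

    impostor_vectors: list of score tuples from zero-effort impostors
    attack_prior:     dict mapping an attack configuration (one entry per
                      matcher, 0 = not attacked) to its prior probability
    fake_samplers:    fake_samplers[i][k](rng, s) returns a fake score for
                      matcher i under attack type k >= 1, given the original s
    """
    rng = random.Random(seed)
    configs = list(attack_prior)
    weights = [attack_prior[c] for c in configs]
    resampled = []
    for scores in impostor_vectors:
        config = rng.choices(configs, weights=weights, k=1)[0]
        new_scores = tuple(
            s if a == 0 else fake_samplers[i][a](rng, s)
            for i, (s, a) in enumerate(zip(scores, config))
        )
        resampled.append((new_scores, config))
    return resampled

# Toy setup: two matchers, one attack type each, single-matcher attacks only.
prior = {(0, 0): 0.5, (1, 0): 0.25, (0, 1): 0.25}
toy_fake = lambda rng, s: rng.uniform(0.6, 1.0)   # placeholder fake-score sampler
samplers = {0: {1: toy_fake}, 1: {1: toy_fake}}
impostors = [(0.10, 0.20), (0.15, 0.05), (0.30, 0.25), (0.05, 0.12)] * 50
training_impostors = resample_impostors(impostors, prior, samplers)
```

The resampled impostor vectors, together with the unmodified genuine vectors, can then be passed to any standard classifier training routine.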

Despite their simplicity, discriminative approaches have not been widely considered to design secure fusion rules. We will show how to exploit them to this end in Sect. 7.

6.3 Secure Fusion Rules based on our Meta-model

We showed that our meta-model (Eq. 2) can be exploited in the design of secure fusion rules to simulate the fake score distribution at the output of each attacked matcher, both in generative and discriminative approaches. To overcome the limitations of secure fusion rules based on Eq. (1), the aforementioned distribution can be hypothesized by selecting a suitable combination of attack scenarios, depending on the given application and desired level of security. The corresponding parameters can be selected either among those defined in Sect. 3.2 (Table III) for faces and fingerprints, or through cross validation, to properly tune the trade-off among , and (or ) under a wider class of presentation attacks.

In the following, we give two examples of how novel secure fusion rules can be defined. Different choices are possible, depending on the selected fusion scheme (e.g., LLR, SVM, NN), and on the trade-off between the performance in the absence of attack and the security level that one aims to achieve. In our data model (Eq. 6), this influences the choice of each prior and of the corresponding attack scenarios (i.e., the parameters and ). We consider here an application setting demanding a high level of security, e.g., an access control system for banking, and we thus only consider the worst-case available scenarios involving high-impact presentation attacks (see Table III).

-LLR. This is another variant of the generative LLR rule (Eq. 13), in which and the distributions are simulated according to the high-impact attack scenario. The prior distribution should be hypothesized based on the specific application setting, as suggested for the Extended LLR [13].


-SVM-RBF. This rule is instead based on a discriminative approach. It consists of learning an SVM with the Radial Basis Function (RBF) kernel on a modified training set that includes simulated presentation attacks, as discussed in Sect. 6.2: the available impostor scores are replaced with a number of matching scores sampled from the hypothesized . As for the -LLR, is a parameter, and is simulated according to the high-impact attack scenario.

Besides , the other parameters to be tuned are the SVM regularization parameter and the parameter of the RBF kernel, given by , where and denote the input and the -th training score vectors (see, e.g., [39]).
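For concreteness, the RBF kernel in its common parameterization can be computed as in the sketch below. The parameter name `gamma` and the exact parameterization are assumptions of this example, since the kernel expression is not reproduced in the text above.

```python
import math

def rbf_kernel(s, s_i, gamma):
    """RBF kernel exp(-gamma * ||s - s_i||^2) between two score vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(s, s_i))
    return math.exp(-gamma * sq_dist)

# Identical score vectors give the maximum kernel value of 1.
print(rbf_kernel((0.2, 0.8), (0.2, 0.8), gamma=5.0))
```

Larger values of the kernel parameter make the kernel decay faster with distance, which yields more flexible, and potentially more overfitted, decision functions.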

In conclusion, it is worth remarking that each secure fusion rule makes specific assumptions on the mixture of zero-effort and spoof impostors (Eq. 6), in terms of the prior and of the fake score distributions , for and . It is thus clear that each rule will achieve an optimal trade-off between and (Eq. 10) only when the is obtained under the same hypothesized model of (Eq. 6). This also holds for the secure fusion rules proposed in this section. Nevertheless, the data model of Sect. 4 along with the meta-model of Sect. 3 can be exploited to implement secure fusion rules that are optimal according to any other choice of , giving us much clearer guidelines to design secure score-level fusion rules, especially if compared to previous work [13, 14, 15].

7 Experimental Analysis

We report here a case study on a bimodal system combining fingerprint and face, to show how to thoroughly assess its security against presentation attacks, and to select a suitable fusion rule, according to the procedures defined in Sect. 5.

For our experiments, we consider a scenario in which the designer (i) believes that only presentation attacks against one matcher (either the face or fingerprint one) are likely, and (ii) would like to select a fusion rule and/or its parameters to protect the multibiometric system against worst-case attacks, assuming that attacks against face and fingerprint are equiprobable, and accepting a maximum FRR of 2%. According to our approach, and to point (i) above, all the available attack scenarios encompassed by our meta-model should be considered against each matcher, to thoroughly evaluate system security. With regard to point (ii) above, and according to Sect. 5, the system designer should also encode application-specific requirements using Eqs. (11)-(12) to define a proper trade-off among FRR, FAR and SFAR. Then, the goal defined above can be formalized as:


being and the attained by the first and the second matcher against the corresponding high-impact attacks, while the other matcher is not under attack.

The above setting can be easily generalized to any other combination of attacks, also targeting multiple biometrics at the same time, or choice of parameters , and .

7.1 Experimental Setup

For these experiments, to validate the predictions of our approach under never-before-seen presentation attacks, we have considered two very recent databases of fake fingerprints and faces that have not been used in the design of our meta-model. They are concisely described below.

LivDet15 [40]. This database consists of about 16,000 fingerprint images captured by performing multiple acquisitions of all fingers of 50 different subjects, with four different optical devices (Biometrika, Green Bit, Digital Persona and Crossmatch). Fingerprint images were acquired in a variety of ways (e.g., wet and dry fingers, high and low pressure) to simulate different operating conditions. Fake fingerprints were fabricated using the cooperative method, with different materials, including Ecoflex, Playdoh, Body Double, silicone and gelatin. In our experiments, we use the images acquired with the Crossmatch sensor, and consider each separate finger as a client, yielding 500 distinct clients. We use Bozorth3 as the matching algorithm.

CASIA [41]. This database consists of 600 videos of alive and fake faces belonging to 50 distinct subjects, captured at high and low resolution, respectively, with a Sony NEX-5 and a standard USB camera. Three different kinds of presentation attacks are considered: warped photo, in which face images are printed on copper paper, and warped to simulate motion; cut photo, in which the warped face photo has also eye cuts to simulate blinking; and video, in which face images are displayed using a mobile device. We extract four frames from each video, and rotate and scale face images to have eyes in the same positions. We use the same matcher described in [42]. It accounts for illumination variations as in [43], and then computes a BSIF descriptor [44]. Matching scores are finally computed using the cosine distance.

We exploit these unimodal matching scores to create a chimerical dataset, by randomly associating face and fingerprint images of different clients from the two databases. This is a common practice in biometrics to obtain multimodal datasets [22]. The chimerical dataset is then randomly subdivided into five pairs of training and test sets, respectively including 40% and 60% of the “virtual” clients. (The clients of a chimerical dataset are usually referred to as “virtual” clients, as they do not correspond to a real person or identity.) The matching scores are normalized in using the - technique [45, 22]. Its parameters, and those of the trained fusion rules, are estimated on the training set. This procedure is repeated five times, each time creating a different set of “virtual” clients. The results thus refer to the average test set performance on the corresponding twenty-five runs.

We consider the following state-of-the-art score-level fusion rules, including the spoofing-aware fusion rules discussed in Sect. 6.1, and the two secure fusion rules based on our meta-model, as described in Sect. 6.3.

Sum. Given matching scores to be combined , the sum rule is defined as .

Product. The product rule is defined as .

Minimum. This rule is defined as . (Note that this rule is equivalent to an “AND” fusion rule that classifies a claim as genuine only if all the combined matchers output a genuine decision, assuming the same threshold for all matchers.)

Linear Discriminant Analysis (LDA). This is a trained rule, in which the matching scores are linearly combined as . The parameters and are estimated from the training set by maximizing the Fisher distance between genuine and impostor score distributions [39].

Likelihood ratio (LLR). This is the trained rule given by Eq. (13). To estimate the likelihoods of genuine and impostor users, it is often realistically assumed that the are independent given , i.e., . Here we make the same assumption, and estimate each component by fitting a Gamma distribution on the corresponding training data, as in [13, 14, 15].
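The fixed rules listed above (Sum, Product, Minimum) and a linear rule of the kind learned by LDA can be sketched as follows. The function names are illustrative, and the weights and bias of the linear rule are hypothetical inputs that would be estimated on the training set.

```python
import math

def sum_rule(scores):
    return sum(scores)

def product_rule(scores):
    return math.prod(scores)

def minimum_rule(scores):
    return min(scores)

def linear_rule(scores, weights, bias=0.0):
    """Linear combination of the scores, as used by LDA once trained."""
    return bias + sum(w * s for w, s in zip(weights, scores))

def accept(fused_score, threshold):
    """A claim is accepted as genuine if the fused score reaches the threshold."""
    return fused_score >= threshold

scores = (0.9, 0.4)   # e.g., fingerprint and face scores normalized to [0, 1]
for rule in (sum_rule, product_rule, minimum_rule):
    print(rule.__name__, rule(scores))
```

With a common threshold, accepting on the minimum score gives the same decision as requiring every individual matcher to accept, which is the “AND” behavior noted for the Minimum rule.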

Fig. 7: Results for the considered bimodal system. Plots in the first and third column report the average DET curves attained by each fusion rule, when no attack is considered (‘no spoof’), and under presentation attacks from LivDet15 (‘fingerprint spoofing’) and CASIA (‘face spoofing’). The yellow and purple shaded areas represent the confidence bands for the predicted by our approach, over the family of fake score distributions represented by our meta-model, for face and fingerprint, respectively. The background color of plots in the second and fourth column represents the value of the fused score for each rule, in the space of matching scores. The black solid line represents its decision function at . We also report points corresponding to genuine users, impostors and presentation attacks, to compare the different decision functions.
Rule Sum LDA Product Minimum LLR SVM-RBF Ext. LLR Unif. LLR -LLR -SVM-RBF
TABLE IV: Average (%) performance (and standard deviation) attained by each rule at , in terms of , under the LivDet15 ( fing.) and CASIA ( face) presentation attacks, under the fingerprint () and face () high-impact simulated attacks, and the corresponding values (denoted with for the LivDet15 and CASIA attacks, and for the simulated attacks).

SVM-RBF. This rule consists of learning an SVM with the RBF kernel on the available training matching scores, to discriminate between genuine and impostor users. We set the parameters and by minimizing the at through a 5-fold cross-validation on the training data.

Extended LLR. This is the modified LLR proposed in [13], as described in Sect. 6.1. To minimize the according to Eqs. (19)-(20), we set the priors as , , and , although this choice does not correspond to any specific choice of , and for this rule. The fused matching score is given by Eq. (13).

Uniform LLR. This is the other LLR modification proposed in [37], and described in Sect. 6.1. We set as for the Extended LLR, coherently with the given selection criterion. The combined score is given again by Eq. (13).

-LLR. For this rule too, we set the prior distribution as described for the Extended LLR and the Uniform LLR, in agreement with the selection criterion. The distribution of attack samples is instead based on the high-risk attack scenarios defined by our spoof simulation model. We therefore set and for simulating attacks against the fingerprint matcher (i.e., when ‘RI’ is used), and and for the face matcher (i.e., when ‘G’ is used), according to Table III. The fused score is given by Eq. (13).

-SVM-RBF. We train this fusion rule using a modified training set sampled from the same distribution hypothesized for the -LLR, i.e., assuming the same prior and fake score distributions . Such a training set can be obtained as explained in Sect. 6.2. The values of and are optimized using a 5-fold cross validation on the training data, as for the SVM-RBF, although here this amounts to minimizing the at , as the training data is modified according to the desired .

Fig. 8: Matching score distributions for CASIA and LivDet15. Fake score distributions fitted with our meta-model are shown for comparison. The values of found to simulate them are for CASIA (attack impact=), and for LivDet15 (attack impact=).

7.2 Experimental Results

To show that our meta-model is capable of reliably modeling also the score distributions of never-before-seen presentation attacks, we first report in Fig. 8 its fitting on the score distributions of the attacks included in LivDet15 and CASIA. Note that the corresponding parameters do not exactly match any of the attack scenarios defined in Table III and Fig. 3. In fact, face and fingerprint spoofs from CASIA and LivDet15 exhibit an intermediate behavior respectively between the low- and med-impact attack scenarios for faces, and between the med- and high-impact attack scenarios for fingerprints. This further highlights that such attacks are very different from those considered in the design of our meta-model and of known attack scenarios.

The results for the given bimodal system are reported in terms of average Detection Error Trade-off (DET) curves in Fig. 7. These curves report FRR vs FAR (or SFAR) for all operating points on a convenient axis scaling [46, 15]. Using our meta-model, we construct a family of curves, each obtained by simulating an attack scenario (i.e., a fake score distribution) against a single matcher. We then average curves corresponding to attack scenarios that exhibit a similar attack impact, yielding 20 distinct curves corresponding to the attack impact values . The area covered by such curves, for each matcher, is highlighted using a shaded area, as described in Fig. 7. We also report the curves corresponding to presentation attacks from the LivDet15 and CASIA databases.

To correctly understand our evaluation, recall that spoofing does not affect the matching score distribution of genuine users, i.e., the FRR under attack does not change. Accordingly, for any operating point on the DET curve computed without attacks (corresponding to the ‘no spoof’ scenario, which reports FRR vs FAR), the SFAR values predicted by our model can be found by intersecting the shaded areas in Fig. 7 with the horizontal line corresponding to the same FRR value (see, e.g., ‘op. point’ in Fig. 7).
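Reading the error rates at a common operating point can be sketched as below: the threshold is fixed on the zero-effort impostor scores, and the same threshold is then applied to genuine and spoof scores. The function names and the toy score lists are assumptions of this example; the simple threshold search is quadratic in the number of scores and is meant for illustration only.

```python
import math

def rate_at_or_above(scores, threshold):
    """Fraction of scores that would be accepted at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def threshold_at_far(impostor_fused, target_far):
    """Smallest observed threshold whose empirical FAR does not exceed target_far."""
    for t in sorted(set(impostor_fused)):   # FAR decreases as t grows
        if rate_at_or_above(impostor_fused, t) <= target_far:
            return t
    return math.inf                         # reject every observed impostor

def operating_point(genuine_fused, impostor_fused, spoof_fused, target_far):
    t = threshold_at_far(impostor_fused, target_far)
    frr = sum(s < t for s in genuine_fused) / len(genuine_fused)
    far = rate_at_or_above(impostor_fused, t)
    sfar = rate_at_or_above(spoof_fused, t)   # same threshold, spoof scores
    return t, frr, far, sfar

# Toy fused scores, for illustration only.
impostor = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
genuine = [0.45, 0.60, 0.70, 0.80, 0.90]
spoof = [0.30, 0.55, 0.60, 0.70]
t, frr, far, sfar = operating_point(genuine, impostor, spoof, target_far=0.1)
print(t, frr, far, sfar)
```

Since the threshold and the genuine scores are unchanged by the attack, the FRR is the same in both settings, while the acceptance rate of impostors moves from the FAR to the SFAR.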

To provide a clearer discussion of our results, in Table IV, we also report the average performance attained by each rule at the operating point corresponding to , including , and and (Eqs. 19-20) attained under the LivDet15 and CASIA attacks, and under the high-impact attacks simulated with our meta-model (see Table III).

Let us first compare the predictions provided by our analysis against those corresponding to the LivDet15 and CASIA presentation attacks. As one may note from Fig. 7, the confidence bands denoting the variability of the curves obtained under the attacks simulated with our meta-model almost always correctly represent and follow the behavior of the curves corresponding to the LivDet15 and CASIA attacks. This shows that our approach may be able to reliably predict the performance of a multibiometric system even under never-before-seen presentation attacks.

Furthermore, our analysis also highlights whether the fusion rule is more sensitive to variations in the output of a given matcher. This can be noted by comparing the confidence bands corresponding to attacks against the fingerprint and the face matcher in Fig. 7. In our case, fusion rules are generally more vulnerable to attacks targeting the fingerprint matcher (except for Sum and LDA), as the confidence bands corresponding to fingerprint presentation attacks are typically more shifted towards higher error rates. The reason is simply that the fingerprint matcher is more accurate than the face one in this case, and, thus, when the former is under attack, the matching scores of spoof impostors and genuine users tend to overlap more (cf. the scatter plots in the second and fourth column of Fig. 7).

From the curves in Fig. 7, one may also note that standard rules are generally more accurate in the absence of attack than secure fusion rules, confirming the trade-off between the performance in the absence and in the presence of spoofing. The Minimum is an exception, as it exhibits a higher . The reason is that this rule only accepts a genuine claim if all the combined scores are sufficiently high. Thus, to keep an acceptable, low , one has to trade for a higher , and this may also worsen security against spoofing, contrary to intuition.

From Table IV, one may also appreciate that secure fusion rules designed under the overly pessimistic assumption given by Eq. (1) perform worse than the -LLR and the -SVM-RBF under the LivDet15 and CASIA presentation attacks. This shows that our meta-model can also lead one to design fusion rules with an improved trade-off between the performance in the absence of attack and system security.

Besides giving a general overview of the kind of analysis enabled by our security evaluation procedure, the goal of the proposed case study is to show how a system designer can select a suitable fusion rule. According to the criterion given by Eqs. (19)-(20), it is clear from Table IV that the rule that attains the minimum expected (according to our model, the value) is the -SVM-RBF, which can be thus selected as the fusion rule for this task. In particular, this rule attains also the lowest under the (simulated) fingerprint presentation attack (). Notably, also Sum, LDA, Product and -LLR may be exploited to the same end, as they achieve only a slightly higher .

When considering the attained under the LivDet15 and CASIA spoofing attacks, the best rule turns out to be the Product rule, immediately followed by the -LLR and the -SVM-RBF. This is somewhat reasonable to expect, as our analysis did not exploit any specific knowledge of such attacks, and, in particular, as we tuned our fusion rules using slightly overly-pessimistic attack settings (the fingerprint and face attack scenarios considered to design the -LLR and the -SVM-RBF have a higher attack impact than that exhibited by the LivDet15 and CASIA attacks). Despite this, selecting the -SVM-RBF instead of the Product would not raise any severe security issue in this case. Conversely, it may even be beneficial if fingerprint spoofing is deemed more likely during system operation than face spoofing. This should be clearly noted by the system designer before taking the final decision.

Why Secure Fusion Works. The fact that the standard fusion rules like the Product can be competitive with secure fusion rules in the presence of spoofing can be easily understood by looking at the shape of the decision functions in the scatter plots of Fig. 7. In fact, the decision functions of such rules tend to better enclose the genuine class rather than the impostor class. Whereas untrained rules like the Product may perform well only under specific data distributions (like those shown in the depicted cases), trained secure fusion rules are expected to perform better on a wider variety of cases, due to their flexibility in learning and shaping the decision function depending on the given set of scores. However, providing a better enclosing of the genuine class turns out to clearly increase the at the same value (or vice versa), underlining again the trade-off between the performance in the absence of attacks and that under attack. For this reason, it is especially important to be able to tune this trade-off properly, and not in an overly-pessimistic manner, as demonstrated in the design of the secure fusion rules based on our meta-model.

8 Conclusions and Future Work

We proposed an approach to thoroughly assess the security of multibiometric systems against presentation attacks, and to improve their security by design, overcoming the limitations of previous work [13, 14, 15]. Our approach is grounded on a statistical meta-model that incorporates knowledge of state-of-the-art fingerprint and face presentation attacks, by simulating their matching score distributions at the output of the attacked matchers, avoiding the cumbersome task of fabricating a large, representative set of attacks during system design. It also allows us to simulate perturbations of such distributions that may correspond to unknown attacks of different impact, through an uncertainty analysis. This aspect is especially important, as attackers constantly aim to find novel evasion techniques [47]. In the case of biometric systems, this means that novel, unexpected attacks may be encountered in the near future. For instance, in [48], it has been claimed that it is not possible to forecast all potential face spoofing attacks and fake fabrication techniques, as humans can always find very creative ways to cheat a system. Our uncertainty analysis thus aims to overcome this issue. We showed empirically that our approach provides a much more informative security evaluation of multibiometric systems, characterizing the behavior of the system also under never-before-seen attacks, and enabling the design of improved secure fusion rules.

We argue that our statistical meta-model can be applied to presentation attacks targeting other biometric traits, like palm vein and iris, as preliminary empirical evidence shows that their score distributions exhibit characteristics similar to those observed for face and fingerprint in Sect. 3.1 (see, e.g., [30, 31]). This, however, requires further investigation, and can be addressed in future work.

To conclude, it is also worth remarking that secure fusion may provide a complementary approach to liveness detection techniques that protect the combined matchers against spoofing. Accordingly, another interesting future extension of this work may be to exploit our meta-model in the context of recent approaches that combine liveness detection and matching algorithms, instead of using them as independent modules [49, 50, 51, 52]. We believe that this may significantly improve multibiometric security against spoofing.


References

  • [1] N. K. Ratha, J. H. Connell, and R. M. Bolle, “An analysis of minutiae matching strength,” in AVBPA, ser. LNCS, J. Bigün and F. Smeraldi, Eds., vol. 2091. Springer, 2001, pp. 223–228.
  • [2] A. K. Jain, K. Nandakumar, and A. Nagar, “Biometric template security,” EURASIP J. Adv. Signal Process, vol. 2008, pp. 1–17, 2008.
  • [3] B. Biggio, G. Fumera, P. Russu, L. Didaci, and F. Roli, “Adversarial biometric recognition: A review on biometric system security from the adversarial machine-learning perspective,” IEEE Sig. Proc. Mag., vol. 32, no. 5, pp. 31–41, Sept 2015.
  • [4] T. Matsumoto, H. Matsumoto, K. Yamada, and S. Hoshino, “Impact of artificial “gummy” fingers on fingerprint systems,” Datenschutz und Datensicherheit, vol. 26, no. 8, 2002.
  • [5] ISO/IEC DIS 30107-1, “Information Technology - Biometric presentation attack detection - Part 1: Framework,” 2015.
  • [6] ISO/IEC DIS 30107-3, “Information Technology - Biometric presentation attack detection - Part 3: Testing and reporting,” 2015.
  • [7] ISO/IEC 2382-37:2012, “Information technology - Vocabulary - Part 37: Biometrics,” 2012.
  • [8] B. Geller, J. Almog, P. Margot, and E. Springer, “A chronological review of fingerprint forgery,” J. Forensic Sc., vol. 44, no. 5, pp. 963–968, 1999.
  • [9] Y. Kim, J. Na, S. Yoon, and J. Yi, “Masked fake face detection using radiance measurements,” J. Opt. Soc. Am. A, vol. 26, no. 4, pp. 760–766, 2009.
  • [10] J. Yang, Z. Lei, S. Liao, and S. Z. Li, “Face liveness detection with component dependent descriptor,” in Int’l Conf. Biometrics, 2013.
  • [11] A. K. Jain and A. Ross, “Multibiometric systems,” Commun. ACM, vol. 47, no. 1, pp. 34–40, 2004.
  • [12] A. K. Jain, A. Ross, and S. Pankanti, “Biometrics: A tool for information security,” IEEE Trans. Information Forensics and Security, vol. 1, pp. 125–143, 2006.
  • [13] R. N. Rodrigues, L. L. Ling, and V. Govindaraju, “Robustness of multimodal biometric fusion methods against spoofing attacks,” J. Vis. Lang. Comput., vol. 20, no. 3, pp. 169–179, 2009.
  • [14] R. N. Rodrigues, N. Kamat, and V. Govindaraju, “Evaluation of biometric spoofing in a multimodal system,” in Int’l Conf. Biom.: Theory, Applications, and Systems, 2010, pp. 1–5.
  • [15] P. Johnson, B. Tan, and S. Schuckers, “Multimodal fusion vulnerability to non-zero effort (spoof) imposters,” in IEEE Int’l Workshop Information Forensics and Security, 2010, pp. 1–5.
  • [16] I. Arce, “The weakest link revisited [information security],” IEEE Security & Privacy, vol. 1, no. 2, pp. 72–76, Mar 2003.
  • [17] B. Biggio, Z. Akhtar, G. Fumera, G. L. Marcialis, and F. Roli, “Robustness of multi-modal biometric verification systems under realistic spoofing attacks,” in Int’l J. Conf. Biometrics, 2011, pp. 1–6.
  • [18] B. Biggio, Z. Akhtar, G. Fumera, G. L. Marcialis, and F. Roli, “Security evaluation of biometric authentication systems under real spoofing attacks,” IET Biometrics, vol. 1, no. 1, pp. 11–24, 2012.
  • [19] G. Fumera, G. L. Marcialis, B. Biggio, F. Roli, and S. C. Schuckers, “Multimodal anti-spoofing in biometric recognition systems,” in Handbook of Biometric Anti-Spoofing, ser. Advances in Computer Vision and Pattern Recognition, S. Marcel, M. Nixon, and S. Z. Li, Eds.   Springer, 2014, pp. 165–184.
  • [20] I. Chingovska, A. Anjos, and S. Marcel, “Evaluation methodologies,” in Handbook of Biometric Anti-Spoofing, ser. Advances in Computer Vision and Pattern Recognition, S. Marcel, M. Nixon, and S. Z. Li, Eds.   Springer, 2014, pp. 185–204.
  • [21] A. Hadid, N. Evans, S. Marcel, and J. Fierrez, “Biometrics systems under spoofing attack: An evaluation methodology and lessons learned,” IEEE Sig. Proc. Mag., vol. 32, no. 5, pp. 20–30, Sept 2015.
  • [22] A. A. Ross, K. Nandakumar, and A. K. Jain, Handbook of Multibiometrics.   Springer Publishers, 2006.
  • [23] ISO/IEC 19795-1, “Information technology - Biometric performance testing and reporting - Part 1: Principles and framework,” 2006.
  • [24] G. L. Marcialis, A. Lewicke, B. Tan, P. Coli, D. Grimberg, A. Congiu, A. Tidu, F. Roli, and S. A. C. Schuckers, “First Int’l Fingerprint Liveness Detection Comp. - LivDet2009,” in ICIAP, ser. LNCS, P. Foggia et al., Eds., vol. 5716.   Springer, 2009, pp. 12–23.
  • [25] D. Yambay, L. Ghiani, P. Denti, G. L. Marcialis, F. Roli, and S. Schuckers, “LivDet2011 - Fingerprint Liveness Detection Competition,” in 5th Int’l Conf. Biometrics, 2012.
  • [26] M. Chakka, A. Anjos, S. Marcel, R. Tronci, D. Muntoni, G. Fadda, M. Pili, N. Sirena, G. Murgia, M. Ristori, F. Roli, J. Yan, D. Yi, Z. Lei, Z. Zhang, S. Li, W. Schwartz, A. Rocha, H. Pedrini, J. Lorenzo-Navarro, M. Castrillon-Santana, J. Maatta, A. Hadid, and M. Pietikainen, “Competition on counter measures to 2-D facial spoofing attacks,” in Int’l J. Conf. Biometrics, 2011, pp. 1–6.
  • [27] Z. Zhang, D. Yi, Z. Lei, and S. Z. Li, “Face liveness detection by learning multispectral reflectance distributions,” in Int’l Conf. on Automatic Face and Gesture Recognition, 2011, pp. 436–441.
  • [28] A. Anjos and S. Marcel, “Counter-measures to photo attacks in face recognition: A public database and a baseline,” in Int’l Joint Conf. on Biometrics, 2011, pp. 1–7.
  • [29] N. Erdogmus and S. Marcel, “Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect,” in 6th IEEE Int’l Conf. on Biometrics: Theory, Applications and Systems, 2013, pp. 1–6.
  • [30] P. Tome and S. Marcel, “On the vulnerability of palm vein recognition to spoofing attacks,” in 8th IAPR Int’l Conf. Biom. (ICB), 2015.
  • [31] R. Raghavendra and C. Busch, “Robust scheme for iris presentation attack detection using multiscale binarized statistical image features,” IEEE Trans. Information Forensics and Security, vol. 10, no. 4, pp. 703–715, April 2015.
  • [32] S. Y. Sohn, “Meta analysis of classification algorithms for pattern recognition,” IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 11, pp. 1137–1144, 1999.
  • [33] A. Saltelli, S. Tarantola, and F. Campolongo, “Sensitivity analysis as an ingredient of modeling,” Statistical Science, pp. 377–395, 2000.
  • [34] A. Saltelli, K. Chan, and E. Scott, Sensitivity analysis.   Wiley Series in Probability and Statistics, Wiley, New York, 2000.
  • [35] A. Kann and J. P. Weyant, “Approaches for performing uncertainty analysis in large-scale energy/economic policy models,” Environmental Modeling & Assessment, vol. 5, no. 1, pp. 29–46, 2000.
  • [36] Z. Akhtar, G. Fumera, G. L. Marcialis, and F. Roli, “Evaluation of multimodal biometric score fusion rules under spoofing attacks,” in 5th IAPR Int’l Conf. Biometrics, 2012, pp. 402–407.
  • [37] B. Biggio, G. Fumera, and F. Roli, “Design of robust classifiers for adversarial environments,” in IEEE Int’l Conf. on Systems, Man, and Cybernetics (SMC), 2011, pp. 977–982.
  • [38] K. Nandakumar, Y. Chen, S. C. Dass, and A. Jain, “Likelihood ratio-based biometric score fusion,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, pp. 342–347, 2008.
  • [39] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 1st ed.   Springer, 2007.
  • [40] V. Mura, D. Yambay, L. Ghiani, G. Marcialis, S. Schuckers, and F. Roli, “LivDet 2015 - fingerprint liveness detection competition 2015,” in 7th IEEE Int’l Conf. Biometrics: Technology, Applications and Systems (BTAS 2015), September 2015.
  • [41] Z. Zhang, J. Yan, S. Liu, Z. Lei, D. Yi, and S. Li, “A face antispoofing database with diverse attacks,” in 5th IAPR Int’l Conf. Biometrics (ICB), 2012, pp. 26–31.
  • [42] P. Tuveri, V. Mura, G. Marcialis, and F. Roli, “A classification-selection approach for self updating of face verification systems under stringent storage and computational requirements,” in Image Analysis and Processing (ICIAP), ser. LNCS, V. Murino and E. Puppo, Eds.   Springer, 2015, vol. 9280, pp. 540–550.
  • [43] X. Tan and B. Triggs, “Enhanced local texture feature sets for face recognition under difficult lighting conditions,” IEEE Trans. Image Processing, vol. 19, no. 6, pp. 1635–1650, June 2010.
  • [44] J. Kannala and E. Rahtu, “BSIF: Binarized statistical image features,” in 21st Int’l Conf. Patt. Rec. (ICPR), Nov 2012, pp. 1363–1366.
  • [45] A. Jain, K. Nandakumar, and A. Ross, “Score normalization in multimodal biometric systems,” Pattern Recognition, vol. 38, no. 12, pp. 2270–2285, 2005.
  • [46] A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki, “The DET curve in assessment of detection task performance,” 1997, pp. 1895–1898.
  • [47] B. Biggio, G. Fumera, and F. Roli, “Security evaluation of pattern classifiers under attack,” IEEE Trans. Knowledge and Data Engineering, vol. 26, no. 4, pp. 984–996, 2014.
  • [48] A. Hadid, “Face biometrics under spoofing attacks: Vulnerabilities, countermeasures, open issues, and research directions,” in IEEE Conf. on Computer Vision and Pattern Rec. Workshops, 2014.
  • [49] E. Marasco, P. Johnson, C. Sansone, and S. Schuckers, “Increase the security of multibiometric systems by incorporating a spoofing detection algorithm in the fusion mechanism,” in Multiple Classifier Systems, ser. LNCS, C. Sansone, J. Kittler, and F. Roli, Eds., vol. 6713.   Springer Berlin Heidelberg, 2011, pp. 309–318.
  • [50] E. Marasco, Y. Ding, and A. Ross, “Combining matching scores with liveness values in a fingerprint verification system,” in IEEE 5th Int’l Conf. Biom.: Theory, App. and Systems, 2012, pp. 418–425.
  • [51] I. Chingovska, A. Anjos, and S. Marcel, “Anti-spoofing in action: Joint operation with a verification system,” in IEEE Conf. on Computer Vision and Pattern Rec. Workshops, 2013, pp. 98–104.
  • [52] P. Wild, P. Radu, L. Chen, and J. Ferryman, “Robust multimodal face and fingerprint fusion in the presence of spoofing attacks,” Pattern Recognition, 2015, In Press.