False-Accept/False-Reject Trade-offs in Biometric Authentication Systems

05/31/2018 ∙ by Neri Merhav, et al. ∙ Technion

Biometric authentication systems, based on secret key generation, work as follows. In the enrollment stage, an individual provides a biometric signal that is mapped into a secret key and a helper message, the former being prepared to become available to the system at a later time (for authentication), and the latter is stored in a public database. When an authorized user requests authentication, claiming his/her identity as one of the subscribers, he/she has to provide a biometric signal again, and then the system, which retrieves also the helper message of the claimed subscriber, produces an estimate of the secret key, that is finally compared to the secret key of the claimed user. In case of a match, the authentication request is approved, otherwise, it is rejected. Evidently, there is an inherent tension between two desired, but conflicting, properties of the helper message encoder: on the one hand, the encoding should be informative enough concerning the identity of the real subscriber, in order to approve him/her in the authentication stage, but on the other hand, it should not be too informative, as otherwise, unauthorized imposters could easily fool the system and gain access. A good encoder should then trade off the two kinds of errors: the false reject (FR) error and the false accept (FA) error. In this work, we investigate trade-offs between the random coding FR error exponent and the best achievable FA error exponent. We compare two types of ensembles of codes: fixed-rate codes and variable-rate codes, and we show that the latter class of codes offers considerable improvement compared to the former. In doing this, we characterize the optimal rate functions for both types of codes. We also examine the effect of privacy leakage constraints on these trade-offs, for both fixed-rate codes and variable-rate codes.


I. Introduction

We consider a biometric authentication system which is based on the one described in [6, Sections 2.2–2.6], and on the notion of secret key generation and sharing of Maurer [7] and Ahlswede and Csiszár [1], [2]. In particular, this system works in the following manner. In the enrollment phase, a person that subscribes to the system, feeds it with a biometric signal, . The system then responds by generating (using its encoder) two outputs. The first is a secret key, , at rate and the second is a helper message, , at rate . The secret key will be used by the system later, at the authentication stage and the helper message is saved in a database. When an authorized user (a subscriber) wishes to sign in, claiming to be one of the subscribers that have already enrolled, he/she is requested to provide again his/her biometric signal, (correlated to , if indeed from the same person, or independent, otherwise). The system then retrieves the helper message of the claimed subscriber from the database, and responds (using its decoder) by estimating the secret key, (based on ), and comparing it to that of the claimed user, . If matches , the access to the system is approved, otherwise, it is denied.

In [6, Sect. 2.3], the achievable region of pairs of rates was established for the existence of authentication systems where the following four quantities need to be arbitrarily small for large : (i) the false–reject (FR) probability, (ii) the false–accept (FA) probability, (iii) the privacy leakage, , and (iv) the secrecy leakage, . Specifically, Theorem 2.1 of [6] asserts that when are drawn from a joint discrete memoryless source (DMS), emitting independent copies of a pair of dependent random variables, , the largest achievable key rate, , under the above constraints, is given by . It then follows that must lie between and , where the lower limit is needed for good identification of the legitimate subscriber (small FR probability) as well as for achieving the minimum possible privacy leakage, whereas the upper limit is due to the secrecy leakage requirement. These limitations already assure that , which in turn is necessary for keeping the FA probability vanishingly small for large . The achievability parts of the corresponding coding theorems were proved in [6] using random binning, similarly as in classical Slepian–Wolf coding.

More recently, in [9], these results have been refined by characterizing achievable exponential error bounds for the above performance metrics. In particular, for a given rate pair , random coding error exponents and expurgated error exponents were found for the FR probability, as well as a sphere–packing bound, which is tight in a certain region of the plane of . For the FA probability, the exact best achievable error exponent was characterized, and finally, more refined upper bounds for the privacy leakage and the secrecy leakage were derived.

This paper is a further development of [9], where the focus is on the trade–off between the FA error exponent and the FR error exponent. In the design of the helper message encoder, the following conflict arises: on the one hand, it is desirable that the helper message would be informative enough about , such that in the presence of , the identity of the legitimate subscriber will be approved with high probability. But on the other hand, it is also desired that in the absence of , the helper message would tell as little as possible about , in order to make it difficult for imposters to access the system.

Indeed, the converse theorem in [9, Theorem 5] is based on the assumption that every type class of source sequences , is mapped, by the helper–message encoder, to as many different helper messages as possible, thus making it as close as possible to a one–to–one mapping, or in other words, making the helper message “as informative as possible” about the source vector , and hence also about the secret key , generated from . This is good for achieving a small FR probability (or, equivalently, a large FR error exponent), at the expense of a limitation on the achievable FA exponent. In particular, by relaxing the above described assumption, and allowing smaller numbers of various helper messages for each source type class, one may achieve better FA exponents, at the expense of worse FR exponents.

This raises the interesting question of achievable trade-offs between the FA exponent and the FR exponent, which is similar, in spirit, to the trade–off between the false alarm probability and the mis–detection probability in the Neyman–Pearson scenario, where this trade–off is traditionally encapsulated by the notion of receiver operating characteristics (ROC). The difference, however, is that while in the Neyman–Pearson setting, this trade–off is controlled by the choice of a detector (or more precisely, by the choice of the threshold of the likelihood ratio detector), here we control the trade-off via the choice of an encoder, in this case, the helper–message encoder.

To this end, we first derive an expression of the FR random coding error exponent as a function of the desired FA error exponent for fixed–rate binning. This is a relatively straightforward manipulation of the results of [9], but it will serve as a reference result. The more interesting part, however, is about extending the scope to the ensemble of variable–rate random binning codes, whose rate function depends on the source vector only via its type (similarly as in [12] and [3]). There are two questions that arise in this context. The first is: what are the optimal rate functions of the secret–message and the helper–message for maximizing the achievable FR exponent for a given FA error exponent? Upon finding such rate functions, the second question is: what is the achievable FR error exponent as a function of the FA error exponent, and to what extent does it improve relative to fixed–rate binning? We find an exact formula for this function and demonstrate that the improvement may be rather significant compared to fixed–rate binning. Finally, we examine the influence of adding a constraint on the privacy leakage, in addition to the above mentioned constraint on the FA error exponent, for both fixed–rate codes and variable–rate codes.

On a technical note, it should be pointed out that while in [9], the error exponent expressions are provided in the Csiszár–style formulation (i.e., minimizations of certain functionals of information measures over probability distributions), here we pass to Gallager–style formulations (i.e., maximizations of functions of relatively few parameters). The reasons for our interest in Gallager–style expressions are that they lend themselves more conveniently to numerical calculations (see the discussion after Theorem 1 below, for more details), and that they may be of independent interest in their own right.

The outline of the remaining part of this paper is as follows. Section II establishes the notation conventions. Section III provides a formal definition of the problem setting, then it gives some background (preliminaries), and finally, it describes the objectives. Section IV provides a preparatory step of deriving the optimal rate functions that maximize an expression of the achievable FR error exponent for a given FA error exponent, in both fixed–rate and variable–rate regimes. In Section V, we derive the FR error exponents of both fixed– and variable–rate codes as functions of the prescribed FA error exponent. Finally, in Section VI, we examine the effect of a constraint on the privacy leakage.

II. Notation Conventions

Throughout the paper, random variables will be denoted by capital letters, specific values they may take will be denoted by the corresponding lower case letters, and their alphabets will be denoted by calligraphic letters. Random vectors and their realizations will be denoted, respectively, by capital letters and the corresponding lower case letters, both in the bold face font. Their alphabets will be superscripted by their dimensions. For example, the random vector , ( – positive integer) may take a specific vector value in , the –th order Cartesian power of , which is the alphabet of each component of this vector. Sources and channels will be denoted by the letter or , subscripted by the names of the relevant random variables/vectors and their conditionings, if applicable, following the standard notation conventions, e.g., , , and so on. When there is no room for ambiguity, these subscripts will be omitted. The probability of an event will be denoted by , and the expectation operator with respect to (w.r.t.) a probability distribution will be denoted by . Again, the subscript will be omitted if the underlying probability distribution is clear from the context. The entropy of a generic distribution on will be denoted by . For two positive sequences and , the notation will stand for equality in the exponential scale, that is, . Similarly, means that , and so on. The indicator function of an event will be denoted by . The notation will stand for .

The empirical distribution of a sequence , which will be denoted by , is the vector of relative frequencies of each symbol in . The type class of , denoted , is the set of all vectors with . Information measures associated with empirical distributions will be denoted with ‘hats’ and will be subscripted by the sequences from which they are induced. For example, the entropy associated with , which is the empirical entropy of , will be denoted by . Similar conventions will apply to the joint empirical distribution, the joint type class, the conditional empirical distributions and the conditional type classes associated with pairs (and multiples) of sequences of length . Accordingly, will be the joint empirical distribution of , and will denote the joint type class of . Similarly, will stand for the conditional type class of given , will designate the empirical joint entropy of and , will be the empirical conditional entropy, will denote empirical mutual information, and so on. We will also use similar rules of notation in the context of a generic distribution, (or , for short): we use for the type class of sequences with empirical distribution , – for the corresponding empirical entropy, – for the joint type class x, – for the conditional type class of given , – for the joint empirical entropy, – for the conditional empirical entropy, – for the empirical mutual information, and so on. We will also use the customary notation for the weighted divergence,

(1)
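The empirical quantities defined in this section are straightforward to compute for concrete sequences. The following is a minimal Python sketch, not taken from the paper: the function names are hypothetical, sequences are assumed to be given as tuples of symbols, and entropies are assumed to be measured in nats.

```python
from collections import Counter
from math import log


def empirical_distribution(seq):
    """Relative frequency of each symbol in seq, i.e., the type of seq."""
    n = len(seq)
    return {symbol: count / n for symbol, count in Counter(seq).items()}


def entropy(dist):
    """Entropy (in nats) of a distribution given as {outcome: probability}."""
    return -sum(p * log(p) for p in dist.values() if p > 0)


def empirical_entropy(x):
    """Entropy of the empirical distribution of x."""
    return entropy(empirical_distribution(x))


def empirical_joint_entropy(x, y):
    """Entropy of the joint empirical distribution of the pair (x, y)."""
    return entropy(empirical_distribution(list(zip(x, y))))


def empirical_conditional_entropy(x, y):
    """Empirical conditional entropy of x given y: H(X,Y) - H(Y)."""
    return empirical_joint_entropy(x, y) - empirical_entropy(y)


def empirical_mutual_information(x, y):
    """Empirical mutual information: H(X) - H(X|Y)."""
    return empirical_entropy(x) - empirical_conditional_entropy(x, y)


def same_type_class(x1, x2):
    """True if x1 and x2 have the same empirical distribution and length."""
    return Counter(x1) == Counter(x2)
```

For instance, two sequences that are permutations of each other belong to the same type class and therefore share the same empirical entropy.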

III. Problem Setting, Preliminaries and Objectives

A. Problem Setting

The problem setting is similar to the one in [9], but with a few small differences, mainly related to the fact that here, in contrast to [9], we allow variable–rate binning codes.

Consider the following system model for biometric identification. An enrollment source sequence, , that is a realization of the random vector , that emerges from a discrete memoryless source (DMS), , with a finite alphabet , is fed into an enrollment encoder, , that generates two outputs: a secret key, (a realization of a random variable ), and a helper message, (a realization of ), both taking values in finite alphabets, and , respectively. In the fixed–rate regime, and (assuming that and are integers), where is the secret–key rate, and is the helper–message rate. In the variable–rate regime, we allow both rates to depend on the type of the given input vector . In particular, in the variable rate regime, each is mapped, by the secret–key encoder and by the helper–message encoder, into and , respectively, where and , henceforth referred to as rate functions, are given continuous functions of . These encodings designate the enrollment stage.

Since the fixed–rate case is obviously a special case of the variable–rate case, our description will henceforth relate to the variable–rate case, with the understanding that in the fixed–rate case, and are just constants, denoted and , independent of .

As in [6], we consider the ensemble of enrollment encoders, , generated by random binning, where for each source vector , one selects independently at random, both a secret key and a helper message, under the uniform distributions across and , respectively. We denote by and , the randomly selected bin assignments for both outputs.
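The random binning ensemble described above can be sketched for a toy alphabet and a short block length. This is only an illustration under stated assumptions: the names `random_binning_encoder`, `key_bins` and `helper_bins` are hypothetical, the bin counts are passed as functions of the type rather than derived from rate functions, and the exhaustive enumeration of all source vectors is feasible only for very small examples.

```python
import itertools
import random
from collections import Counter


def type_of(x):
    """Hashable representation of the empirical distribution (type) of x."""
    return tuple(sorted(Counter(x).items()))


def random_binning_encoder(alphabet, n, key_bins, helper_bins, seed=0):
    """Draw one encoder from the random binning ensemble.

    key_bins(t) and helper_bins(t) return the number of secret keys and
    helper messages available to source vectors of type t (hypothetical
    stand-ins for the alphabet sizes induced by the rate functions).
    Each source vector receives an independent, uniformly drawn key index
    and helper index.
    """
    rng = random.Random(seed)
    key_of, helper_of = {}, {}
    for x in itertools.product(alphabet, repeat=n):
        t = type_of(x)
        key_of[x] = rng.randrange(key_bins(t))
        helper_of[x] = rng.randrange(helper_bins(t))
    return key_of, helper_of
```

In this sketch, the fixed–rate case corresponds to constant functions, e.g., `random_binning_encoder((0, 1), 4, lambda t: 2, lambda t: 4)`, whereas a variable–rate code would pass functions that actually depend on the type.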

The authentication decoder, , which is aware of the randomly selected encoder, , is fed by two inputs: the helper message and an authentication source sequence, (a realization of ), that is produced at the output of a discrete memoryless channel (DMC), , with a finite output alphabet , that is fed by . The output of the authentication decoder is (a realization of ), which is an estimate (possibly, randomized) of the secret key, . If , access to the system is granted, otherwise, it is denied. This decoding operation stands for the authentication stage.

The optimal estimator of , based on , in the sense of minimum FR probability, , is the maximum a posteriori probability (MAP) estimator, given by

(2)

where (shorthand notation for ) is the posterior probability of given , that is induced by the product distribution, (and the subscript will sometimes be suppressed for simplicity, when there is no risk of compromising clarity).

As in [9], here too, we consider the framework of generalized stochastic likelihood decoders (GLDs) [8], [10], [11], [14], where the decoder randomly selects its output according to the posterior distribution

(3)

where the function , which will be referred to as the decoding metric, is any continuous function of the joint empirical distribution . As explained in [9], as well as in earlier studies, the motivation for considering GLDs is that they provide a unified framework for examining a large variety of decoders. For example, with

(4)

we have the ordinary likelihood decoder [10], [11], [14]. For

(5)

being a parameter, we extend this to a parametric family of decoders. In particular, leads to the ordinary MAP decoder, . Other choices of are associated with mismatched metrics,

(6)

being different from , and

(7)

which for , tends to the universal minimum entropy decoder. When , being an arbitrary function and , we end up with Csiszár’s –decoder [4].
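To make the notion of a stochastic likelihood decoder more concrete, the following Python sketch computes a generalized posterior over secret keys and samples from it. It rests on assumptions that are not spelled out in the text above and are borrowed from the usual GLD formulation: the weight of a key is assumed to be the sum, over all source vectors in the bin of the received helper message that map to that key, of the exponential of the block length times the decoding metric; and the metric in the example is assumed to be a scale parameter `beta` times the empirical average log–likelihood. All function names are hypothetical.

```python
import random
from math import exp, log


def gld_posterior(y, w, key_of, helper_of, n_times_metric):
    """Generalized posterior over keys, given y and the helper message w.

    key_of / helper_of map each source vector x to its key and helper index.
    n_times_metric(x, y) returns (block length) * (decoding metric).
    """
    exponents = {}
    for x, s in key_of.items():
        if helper_of[x] == w:
            exponents.setdefault(s, []).append(n_times_metric(x, y))
    if not exponents:
        raise ValueError("no source vector is mapped to this helper message")
    # Subtract the largest exponent before exponentiating, for stability.
    peak = max(max(values) for values in exponents.values())
    weights = {s: sum(exp(e - peak) for e in values)
               for s, values in exponents.items()}
    total = sum(weights.values())
    return {s: wt / total for s, wt in weights.items()}


def scaled_log_likelihood(p_xy, beta):
    """Assumed metric: beta times the log-likelihood of the pair (x, y)."""
    def n_times_metric(x, y):
        return beta * sum(log(p_xy[(a, b)]) for a, b in zip(x, y))
    return n_times_metric


def gld_decode(y, w, key_of, helper_of, n_times_metric, rng=None):
    """Draw a key estimate at random according to the generalized posterior."""
    rng = rng or random.Random()
    posterior = gld_posterior(y, w, key_of, helper_of, n_times_metric)
    keys = sorted(posterior)
    return rng.choices(keys, weights=[posterior[k] for k in keys])[0]
```

Under these assumptions, `beta = 1` gives a decoder that samples from the true posterior of the key, while a very large `beta` concentrates the posterior on the key of the single most likely source vector in the bin, so that the sampled output is essentially deterministic.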

An illegal user (imposter), who claims for a given legal identity, does not have the correlated biometric data , and so, the best he/she can do is to estimate based on , and then forge any fake biometric data , which together with , would cause the decoder to output this estimate of . More precisely, the imposter first estimates according to

(8)

and then generates any such that , and uses it as the biometric signal for authentication.

B. Preliminaries

In [9, Theorems 1, 4, and 5], the following two results (among others) were derived for fixed–rate binning at rates and : the best achievable FA exponent is given by

(9)

and the random coding FR exponent is given by,

(10)

where

(11)

As shown in [9, eq. (12)], for the decoding metric , eq. (11) simplifies to

(12)

which is equivalent to the error exponent expression corresponding to optimal MAP decoding, (i.e., eq. (5) with ).

C. Objectives

As described in the Introduction, our first objective is to derive the FR error exponent as a function of the prescribed FA error exponent, henceforth referred to as the FR–FA trade-off function, for fixed–rate codes with optimal rate functions and optimal decoding metrics. This FR–FA trade-off function will be derived from eqs. (9)–(11). The more interesting goal would then be to extend the scope to variable–rate codes, derive optimal rate functions, and , then use them to obtain the FR–FA trade–off function for variable–rate codes together with their own optimal decoding metrics, and finally, compare to the trade-off function of fixed–rate codes.

Another objective is to examine the effect of imposing a privacy leakage constraint in addition to the FA error exponent constraint. This will be carried out in both the fixed–rate regime and the variable–rate regime.

IV. Optimal Rate Functions and Decoding Metrics

We begin by deriving optimal rate functions for both fixed–rate codes and variable–rate codes.

A. Fixed–Rate Codes

For fixed–rate codes, the following lemma establishes the optimal helper–message rate, , and secret key rate, , for a given value, , of the FA error exponent, .

Lemma 1

Necessary and sufficient conditions for the existence of fixed–rate codes that achieve are:

(13)
(14)

Note that the requirement is quite intuitive, because even a blind guess of may succeed with probability of . It was shown in [13] that the best achievable FA exponent is given in turn by . This is coherent with the result [6, Theorem 2.1] that is also an achievable upper bound on .

Proof of Lemma 1. From eq. (9), it is immediately seen that the statement, , is equivalent to the statement

(15)

which in turn is equivalent to the two simultaneous statements,

(16)
(17)

The former happens if and only if , which is eq. (13). As for the latter, for , the r.h.s. is non–positive, whereas the l.h.s. is non–negative, and so, there is no limitation on , which is associated with the region . For , on the other hand, we must have , or equivalently,

(18)

for every such that . This, in turn, is equivalent to the requirement given in the first two lines of eq. (14). The third line of (14) is obtained as follows:

(19)

where the third equality follows from convexity in and concavity (in fact, affinity) in .

B. Variable–Rate Codes

For variable–rate codes, we have the following lemma, which sets the stage for optimal rate functions.

Lemma 2

Necessary and sufficient conditions for the existence of variable–rate codes that achieve FA error exponent at least as large as are:

(20)
(21)

Observe that for all with , which roughly speaking, means that the mapping from to is one–to–one within each type, .

Proof of Lemma 2. Eq. (9) easily extends to the variable–rate case, by simply substituting and instead of and , respectively. Therefore, the same reasoning as in the proof of Lemma 1 applies in the variable–rate setting considered here as well, except that now, there is no need for optimization (maximization, in the case of , and minimization, in the case of ), as the binning rates are allowed to depend on the type, .

V. FR–FA Trade-off Functions

In this section, we characterize FR–FA trade-off functions for the random coding ensembles of both fixed–rate and variable–rate codes.

A. Fixed–Rate Codes

According to eq. (10), the random coding FR exponent depends on only, and it is a monotonically non–decreasing function of this variable. Thus, the best one can do with fixed–rate codes is to use the highest allowable binning rate, which is . Therefore, if we denote the fixed–rate FR–FA trade-off function by (where the superscript “f” stands for “fixed–rate”), we have the following expression for the optimal decoding metric, (see eq. (12)),

(22)

which is also well known to be the Csiszár–style formula for the error exponent associated with ordinary MAP decoding (of the full source vector , rather than just the secret key ) for a random Slepian–Wolf code (see, e.g., [3, eqs. (7), (19)]).

As mentioned already in the Introduction, in this paper, we are also interested in Gallager–style forms, since they are more convenient to work with when it comes to numerical calculations. The Gallager–style form of eq. (22) is well known [5], [3, p. 9] to be

(23)

Upon substituting the expression of (see eq. (14)), we finally obtain

(24)

B. Variable–Rate Codes

We now derive a FR–FA trade-off function for the variable–rate case, which will be denoted by .

The analysis associated with variable–rate codes is somewhat more complicated than with fixed–rate codes. Note that since the alphabets, and , of and , respectively, depend on the type class, , of the source sequence, a given can be generated only by types for which is at least as large as the numerical index of (by “numerical index”, we mean the integer corresponding to the location of within , which is a number between and ), and a similar comment applies to . This means that in the generalized posterior, defined in (3), the summations over the source vectors, , at both the numerator and the denominator, should now be limited only to members of the type classes that support the given and . Consequently, some modifications in the analysis of [9, Proof of Theorem 1] should be carried out in the variable–rate case. (In particular, the maximizations over , in eqs. (19), (22) of [9], and the minimizations in eq. (23) therein, should be limited only to types that pertain to that support the given .) It is easy to see, however, that a valid lower bound to the resulting FR error exponent is obtained if one simply substitutes instead of the fixed of eqs. (10)–(12). (While the exact FR random coding error exponent can be derived in principle, it is much more complicated to use than this lower bound. Nonetheless, if even this lower bound yields a significant improvement relative to fixed–rate codes, as we demonstrate in the sequel, then a fortiori, this is also the case with the exact FR error exponent. Also, this lower bound is tight for all whose numerical index is a sub–exponential function of (and then supported by essentially all types ).) In other words, we will use the expression,

(25)

Once again, since the FR exponent of eqs. (10)–(12) is a monotonically non–decreasing function of the helper–message rate, the best we can do in terms of this expression, is to let saturate its maximum allowed value, , as given in Lemma 2.

Having adopted eq. (25) as our figure of merit, the following point is important as well: the universal decoding metric , that we have used above for fixed–rate codes, is no longer equivalent to that of MAP decoding (and hence no longer optimal) for variable–rate codes. For a given rate function, , the following decoding metric should be used instead in order to obtain the same random coding FR exponent as in MAP decoding:

(26)

In this case, referring to eqs. (10) and (12) of [9], we have

(27)

where the last line corresponds to (pairwise) errors pertaining to the MAP decoder (for ordinary Slepian–Wolf decoding), and hence the optimality. Now, for , we can present the above bound to the FR error exponent (omitting the subscript of , which is no longer needed):

(28)

This is the Csiszár–style formula of the corresponding FR–FA trade-off function for variable–rate codes. The following theorem provides the Gallager–style form of the same function.

Theorem 1

The variable–rate FR–FA trade-off function (28) can also be presented as

(29)

where the maximum over is taken over the simplex of probability distributions over , i.e., for all and .

Before turning to the proof of Theorem 1, we pause to demonstrate it and to discuss some aspects, consequences and extensions of this theorem.

Optimization issues. First, note that the formula (29) involves optimization over a probability distribution in addition to the parameters and , namely, a total of parameters. This number is never larger (and in most cases, considerably smaller) than the parameters that are associated with the minimization over in the Csiszár–style formula of eq. (28). Moreover, since the Gallager–style formula (29) involves only maximization, any arbitrary choice of , and in their allowed ranges, would yield a valid lower bound (a guarantee) on the achievable FR error exponent. This is different from the situation with the Csiszár–style formula, which involves minimization, and hence allows no such privilege: one must carry out the minimization in order to obtain the achievable FR error exponent. One drawback of the Gallager–style formula is that the range of maximization over the parameter is infinite. In practical numerical calculations, however, one can initially limit the range to an interval of the form , and then gradually enlarge up to the point where no further increase in improves on the resulting maximum.
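The range–enlargement procedure described above for the unbounded parameter can be sketched as follows. This is a generic illustration, not the authors' implementation: the function name, the grid step, and the doubling schedule are assumptions, and `objective` is a placeholder for whatever function of the parameter is being maximized (the other variables being held fixed or already optimized). Stopping after the first enlargement that brings no improvement is only safe when the objective is concave, or at least unimodal, in the parameter.

```python
def maximize_with_growing_range(objective, initial_limit=1.0, step=0.01,
                                tol=1e-12, max_doublings=40):
    """Grid-maximize objective over [0, limit], doubling limit until the
    best value found stops improving. Returns (best_arg, best_value)."""
    best_arg, best_value = 0.0, objective(0.0)
    next_index = 1  # index of the next grid point still to be evaluated

    def scan_up_to(limit):
        """Evaluate the not-yet-visited grid points up to limit."""
        nonlocal best_arg, best_value, next_index
        improved = False
        while next_index * step <= limit:
            arg = next_index * step
            value = objective(arg)
            if value > best_value + tol:
                best_arg, best_value, improved = arg, value, True
            next_index += 1
        return improved

    limit = initial_limit
    scan_up_to(limit)
    for _ in range(max_doublings):
        limit *= 2.0
        if not scan_up_to(limit):
            break  # enlarging the range no longer helps
    return best_arg, best_value
```

Since every evaluated point yields a valid lower bound in a maximization–only formula, the value returned by such a search is a guaranteed (if possibly slightly loose) bound even when the grid is coarse.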

A few words are in order concerning the maximization over , which is a relatively computationally demanding step, especially for a large source alphabet. First, note that in situations with a sufficient degree of symmetry, the optimal turns out to be the uniform distribution, a fact that saves the numerical optimization altogether. This turns out to be the case if is uniform and the probability vectors , are all permutations, , of one such vector, that form a group (w.r.t. compositions of permutations), such that , where designates the uniform distribution over . This happens, for instance, when and is a modulo–additive channel. To see why this is true, we define

(30)

which we show (see the proof of Theorem 1 in the sequel) to be equal to

(31)

From the first representation of , it is easy to see that it is concave in . From the second representation and the assumed symmetry, it is easy to see that for every and every , . Thus,

(32)

which means that the optimal is uniform. In the general case, the optimization over is a convex program, and so, there are standard solvers that can handle this problem more efficiently than a brute–force exhaustive search.
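As an illustration of how a concave maximization over the probability simplex might be carried out without exhaustive search, the following Python sketch uses exponentiated–gradient (mirror) ascent. This is one simple choice among many standard methods and is not claimed to be the one used for the numerical results of this paper. The objective in the example, the sum of square roots of the probabilities, is merely a concave, permutation–symmetric stand–in and not the function defined in eq. (30); the function names, step size and iteration count are likewise assumptions.

```python
from math import exp, sqrt


def maximize_over_simplex(gradient, start, step_size=0.5, iterations=500):
    """Exponentiated-gradient ascent for a concave function on the simplex.

    gradient(q) returns the list of partial derivatives at q.
    start must be a strictly positive probability vector.
    """
    q = list(start)
    for _ in range(iterations):
        g = gradient(q)
        shift = max(g)  # subtracting a constant does not change the update
        unnormalized = [qi * exp(step_size * (gi - shift))
                        for qi, gi in zip(q, g)]
        total = sum(unnormalized)
        q = [u / total for u in unnormalized]
    return q


def sqrt_sum_gradient(q):
    """Gradient of the stand-in objective sum_i sqrt(q_i)."""
    return [0.5 / sqrt(qi) for qi in q]
```

For the symmetric stand–in objective, the iterations started at, say, `[0.7, 0.2, 0.1]` converge to the uniform distribution, in agreement with the symmetry argument given above.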

Example. In order to demonstrate the advantage of variable–rate codes relative to fixed–rate codes in terms of the FR–FA trade-off, we now provide a numerical example. Consider the case of a double binary source with alphabets , and joint probabilities given by , , , and . Fig. 1 displays the two FR–FA trade-off functions, and , for this source. As can be seen, the gap between these two functions is rather considerable, which means that variable–rate codes with optimal rate functions are significantly better than fixed–rate codes in terms of these trade–offs. This example is quite representative in the sense that other examples (with different source probabilities) yielded qualitatively similar results.

Figure 1: Graphs of (solid red curve) and (dashed blue curve) for the double binary source, defined by , , , and .

Variable–rate codes do not always improve on fixed–rate codes. It should be kept in mind, however, that there are situations where variable–rate codes offer no improvement over fixed–rate codes, i.e., they might have exactly the same FR–FA trade-off function in some cases. One such example is the case where the source has a uniform distribution. In this case, the optimal rate function for variable–rate codes turns out to be (whenever ), which is independent of , and hence is a fixed–rate anyway. So unless the dominant type happens to fall in the region , when the source is uniform, variable–rate codes cannot offer any improvement beyond the performance of fixed–rate codes.

Another aspect is associated with the decoding metric. Even for a general source, , if one uses a random variable–rate code, but decodes it using the decoding metric function of fixed–rate codes, , instead of the optimal decoding metric for variable–rate codes, , then the resulting FR–FA trade–off function turns out to be exactly the same as with fixed–rate codes.

Mismatched decoding. Our results can be extended to apply to a mismatched decoding metric, (see eq. (6)), for an arbitrary , using the same techniques. The resulting fixed–rate Gallager–style FR–FA trade-off function would then be