1 Introduction
This document is currently incomplete, and has been uploaded primarily as a supporting document for [2]. The completed version will be uploaded shortly.
Consider the following problem: we are given a “gallery” of items and a single “probe” entry, which we expect “matches” some of the gallery entries, in some sense. Our task is to retrieve the gallery entries that match the probe.
A typical problem, for instance, is when we are given a gallery of biometric identifiers, such as faces, and a probe instance, which is also a biometric instance (e.g. another face image, or even any other modality such as fingerprint or voice). We must retrieve the appropriate gallery entries that are from the same person as the probe entry. Alternately, we may be given a gallery of documents by a number of authors, and a probe document of an unknown author. We must find the gallery entries that match the probe. Many other such problems can be found.
In these problems, the general solution is to find statistical dependencies between entries that relate the two types of data (from the probe and the gallery) and to recover the matching entries based on these. Typically, the solution involves computing, for each of the gallery entries, some variant of the probability that it matches the probe, and determining the match based on this value [1]. This probability itself may utilize any kind of underlying statistical model. These joint models must often be learned from joint presentation of the types of data present in the probe and the gallery.
Often, however, we can find common covariates to the probe and gallery data, which can be independently determined. For instance, in biometric identification, the gender, ethnicity, nationality, and even characteristics such as body size, affect both the probe and gallery entries. In the document case, the gender and nationality of the author, the writing style, etc., affect the probe and gallery entries.
We expect the covariate values for the probe and its matching gallery entries to be identical.
The key feature here is that these covariates, being known entities, may be independently determined from both probe and gallery entries. For instance, in the biometric problem the gender or ethnicity of a subject may be independently determined for both of them. In the document problem, the gender, nationality, and writing style of the author can be independently determined for the probe and gallery entries. To learn to identify these covariate characteristics, the joint distribution of probe and gallery data need not be considered at all.
The question we address here is: how accurate can retrieval be when it is based only on matching the covariate information of probe and gallery entries, i.e., when the only information used for the retrieval is the estimated covariate values of the probe and gallery data? For example, in recovering the correct face from a gallery, how accurate would we be if all we did was match the estimated gender of the probe and gallery entries?
We analyze this problem in a number of different settings.

Retrieval of a unique match from a gallery of $N$. Here we assume a gallery of $N$ entries, where exactly one of the gallery entries matches the probe.

Verification. The gallery comprises only one entry, i.e., given two data instances, one nominally the probe and the other the gallery, we must determine whether or not they match.
In all of these settings we will derive an optimal policy for identifying gallery matches to the probe, and the error to be expected in following this policy.
We will make some simplifying assumptions. We assume that the gallery entries are all independently drawn and do not inform about one another. We assume only a single probe entry. Although we also assume that only a single covariate is considered at any time, this is not a real restriction: groups of covariates fall under the same analysis when the group is treated as a single extended covariate.
We will assume that the “imposter” entries in the gallery, i.e. gallery entries that do not match the probe, are drawn independently of the probe. We will also assume that the errors in determining the covariate values of gallery entries are independent of the errors made on the probe.
2 Retrieval of a Unique Match from a Gallery of $N$
We first consider the problem of retrieval from a gallery of $N$ entries, where it is known that exactly one of the entries is a match to the probe.
Consider a covariate that can take values in a given set. We will assume for this document that the covariate is discrete, although this is not necessary.
Figure 1 displays our model. We model the automatic classification of the covariate values for probe instances as one noisy channel (the probe channel), and the automatic classification of the covariate values for gallery instances as a second noisy channel (the gallery channel).
The overall model has the following statistical components:

A probability distribution which specifies the probability that one of the two noisy channels will output a given value in response to a given input.

A probability distribution from which “imposter” gallery entries are drawn.

A probability distribution which specifies the probability that the other noisy channel will output a given value in response to a given input.
We assume all of these distributions are known.
The generative model for the process is as follows:

The true covariate value of the probe entry is drawn according to its probability distribution.

The probe entry is passed through the probe channel, which outputs a noisy covariate value in response.

The probe entry is also passed through the gallery channel, which outputs a second noisy covariate value in response. This value is added to the gallery as the matching entry to the probe.

To fill the rest of the gallery of size $N$, $N-1$ additional entries are drawn independently according to the imposter distribution. Each of these is passed through the gallery channel to obtain a noisy covariate value, which is included in the gallery.
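The generative steps above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: the names `sample`, `generate_trial`, `prior`, `imposter_prior`, `probe_channel` and `gallery_channel` are hypothetical, covariate values are encoded as integer indices, and each channel is represented as a row-stochastic table where `channel[c][y]` is the probability of outputting `y` for true value `c`.

```python
import random

def sample(dist, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]

def generate_trial(prior, imposter_prior, probe_channel, gallery_channel, n_gallery, rng):
    """Simulate one probe and its gallery (all argument names are illustrative).

    Returns (noisy_probe, noisy_gallery, match_index).
    """
    c = sample(prior, rng)                        # true covariate of the probe
    noisy_probe = sample(probe_channel[c], rng)   # probe-side noisy covariate
    gallery = [sample(gallery_channel[c], rng)]   # matching gallery entry
    for _ in range(n_gallery - 1):                # imposters, drawn independently of the probe
        c_imp = sample(imposter_prior, rng)
        gallery.append(sample(gallery_channel[c_imp], rng))
    # Shuffle so that the position of the match carries no information.
    order = list(range(n_gallery))
    rng.shuffle(order)
    shuffled = [gallery[i] for i in order]
    match_index = order.index(0)                  # where the matching entry ended up
    return noisy_probe, shuffled, match_index
```

With noiseless (identity) channels, the noisy value of the matching gallery entry always equals the noisy probe value, which gives a quick sanity check of the simulation.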
Note that using the known distributions, we can also compute the following terms:

The overall probability of observing a noisy probe value :

The a posteriori probability of the true probe covariate , given the noisy value

The overall probability that any particular gallery item (other than the entry matching the probe) will take a specific value . From the above formulation, we have
and
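As an illustration, the three derived quantities listed above follow from the known distributions by marginalization and Bayes' rule. The sketch below assumes a discrete covariate encoded as integer indices and channels stored as row-stochastic tables; the function and argument names are hypothetical and are not taken from the original.

```python
def probe_marginal(prior, probe_channel):
    """P(noisy probe = y) = sum over c of P(c) * P_probe(y | c)."""
    n_out = len(probe_channel[0])
    return [sum(prior[c] * probe_channel[c][y] for c in range(len(prior)))
            for y in range(n_out)]

def probe_posterior(prior, probe_channel, y):
    """P(c | noisy probe = y), by Bayes' rule. Assumes P(noisy probe = y) > 0."""
    joint = [prior[c] * probe_channel[c][y] for c in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]

def imposter_marginal(imposter_prior, gallery_channel):
    """P(noisy gallery value = z) for a gallery entry that does not match the probe."""
    n_out = len(gallery_channel[0])
    return [sum(imposter_prior[c] * gallery_channel[c][z]
                for c in range(len(imposter_prior)))
            for z in range(n_out)]
```

For example, with a uniform prior over two values and a probe channel `[[0.9, 0.1], [0.2, 0.8]]`, the marginal over noisy probe values is `[0.55, 0.45]`.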
2.1 Defining a Policy for the Matching
We will consider a stochastic policy where, given the noisy covariate value output by the probe channel, we select a covariate value according to a probability distribution conditioned on that output, and subsequently select one of the gallery entries whose noisy covariate equals the selected value. Note that this is a generalization of the more conventional deterministic policy, which would return a unique covariate value in response to each noisy probe value. As we shall see, however, the optimal strategy is indeed deterministic.
2.2 Probability of Correctness as a Function of Policy
If a given number of gallery entries have a noisy covariate equal to the selected value, then the probability of correctly matching the probe, given the true covariate value of the original probe entry, is given by
(1) 
This factors in both the probability that the noisy channel outputs the selected value in response to the true covariate value, and the probability that we have chosen the correct instance from among the gallery items that take the selected value.
The probability that exactly a given number of the gallery items will take the selected value, given that the matching entry also takes that value, is given by
(2) 
where the right-hand side is the binomial probability of choosing the stated number of entries out of the remaining gallery entries, for the corresponding probability parameter.
Equation 2 reflects the fact that, given that one of the entries with the selected value is the matching entry, we need only account for the ways in which the remaining gallery entries can also take that value.
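The counting argument can be checked numerically. The sketch below is an illustration under assumed notation: `n_gallery` is the gallery size, `q` is the probability that a single imposter entry takes the selected value, and `K` is the number of gallery entries taking that value, so that `K - 1` is binomial over the remaining `n_gallery - 1` entries. The function names are hypothetical.

```python
from math import comb

def count_distribution(n_gallery, q):
    """P(K = k | the matching entry takes the selected value), for k = 1..n_gallery.

    The other n_gallery - 1 entries each take the selected value independently
    with probability q, so K - 1 follows Binomial(n_gallery - 1, q).
    """
    n = n_gallery - 1
    return {k: comb(n, k - 1) * q ** (k - 1) * (1 - q) ** (n - (k - 1))
            for k in range(1, n_gallery + 1)}

def expected_inverse_count(n_gallery, q):
    """E[1/K]: chance of picking the match among the K entries with the selected value."""
    return sum(p / k for k, p in count_distribution(n_gallery, q).items())

def expected_inverse_count_closed_form(n_gallery, q):
    """Standard identity for E[1/(1+X)] with X binomial; valid for q > 0."""
    return (1.0 - (1.0 - q) ** n_gallery) / (n_gallery * q)
```

The closed form avoids the explicit sum over counts, which is convenient when the gallery is large.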
The overall probability of correctness of the response when we choose a particular covariate value is
(3) 
Using the law of iterated expectations, we can now write the overall probability of correctness of the response, given a noisy probe value, as
(4) 
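As a sketch, the four relations described in this subsection can be written out under the following assumed notation, which is illustrative and not necessarily that of the original: $c$ is the true covariate value of the probe, $\hat{c}_P$ its noisy probe value, $\hat{c}$ the value selected by the policy $\pi(\hat{c} \mid \hat{c}_P)$, $K$ the number of gallery entries whose noisy value equals $\hat{c}$, $P_G(\hat{c} \mid c)$ the gallery channel, $Q(\hat{c})$ the probability that an imposter entry takes the value $\hat{c}$, and $B(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$.

```latex
\begin{align}
P(\text{correct} \mid c, \hat{c}, K) &= \frac{P_G(\hat{c} \mid c)}{K} \tag{1} \\
P(K = k \mid \hat{c}) &= B\bigl(k-1;\, N-1,\, Q(\hat{c})\bigr) \tag{2} \\
P(\text{correct} \mid c, \hat{c}) &= P_G(\hat{c} \mid c) \sum_{k=1}^{N} \frac{1}{k}\, B\bigl(k-1;\, N-1,\, Q(\hat{c})\bigr) \tag{3} \\
P(\text{correct} \mid \hat{c}_P) &= \sum_{c} P(c \mid \hat{c}_P) \sum_{\hat{c}} \pi(\hat{c} \mid \hat{c}_P)\, P(\text{correct} \mid c, \hat{c}) \tag{4}
\end{align}
```

Under this notation, the sum over $k$ in the third relation has the closed form $\bigl(1 - (1 - Q(\hat{c}))^{N}\bigr) / \bigl(N\, Q(\hat{c})\bigr)$ whenever $Q(\hat{c}) > 0$.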
2.3 Optimal Policy
Our objective is to find the policy that maximizes the probability of correctness for any probe :
Define as
From inspection of Equation 4 we obtain the following optimal policy.
(5) 
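Because the probability of correctness is linear in the policy, it is maximized by placing all of the policy's mass on the best-scoring covariate value for each noisy probe value. The following is a minimal sketch of that selection, assuming integer-coded covariate values and row-stochastic channel tables; `optimal_policy` and its argument names are hypothetical, and the score is the posterior-weighted gallery-channel probability multiplied by the expected inverse count of gallery entries sharing the selected value.

```python
def expected_inverse_count(n_gallery, q):
    """E[1/K] when K - 1 ~ Binomial(n_gallery - 1, q)."""
    if q <= 0.0:
        return 1.0  # no imposter can share the value, so K = 1
    return (1.0 - (1.0 - q) ** n_gallery) / (n_gallery * q)

def optimal_policy(prior, imposter_prior, probe_channel, gallery_channel, n_gallery):
    """Return (policy, p_correct), both indexed by the noisy probe value y.

    policy[y] is the covariate value to look for in the gallery;
    p_correct[y] is the resulting probability of a correct retrieval given y.
    """
    n_c = len(prior)
    n_y = len(probe_channel[0])
    n_z = len(gallery_channel[0])
    # Probability that an imposter entry shows each noisy gallery value.
    q = [sum(imposter_prior[c] * gallery_channel[c][z] for c in range(n_c))
         for z in range(n_z)]
    policy, p_correct = [], []
    for y in range(n_y):
        joint = [prior[c] * probe_channel[c][y] for c in range(n_c)]  # P(c, y)
        p_y = sum(joint)
        scores = [sum(joint[c] * gallery_channel[c][z] for c in range(n_c))
                  * expected_inverse_count(n_gallery, q[z])
                  for z in range(n_z)]
        best = max(range(n_z), key=lambda z: scores[z])
        policy.append(best)
        p_correct.append(scores[best] / p_y if p_y > 0 else 0.0)
    return policy, p_correct
```

With noiseless channels, two equally likely covariate values and a gallery of two, this selects the probe's own value and is correct with probability 0.75: the single imposter shares the value half of the time, in which case the choice between the two entries is a coin flip.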
2.4 Optimal Error
Given any noisy probe value, the probability of error under the optimal policy is given by
(6) 
The overall probability of error is given by
(7) 
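The overall error can be sketched as one minus the best achievable score, accumulated over noisy probe values. This is an illustrative computation under the same assumptions as the model description (integer-coded values, row-stochastic channel tables); `overall_error` and its argument names are hypothetical.

```python
def overall_error(prior, imposter_prior, probe_channel, gallery_channel, n_gallery):
    """Overall probability of error when the best value is chosen for every noisy probe."""
    n_c = len(prior)
    n_y = len(probe_channel[0])
    n_z = len(gallery_channel[0])
    # Probability that an imposter entry shows each noisy gallery value.
    q = [sum(imposter_prior[c] * gallery_channel[c][z] for c in range(n_c))
         for z in range(n_z)]

    def inverse_count(qz):
        # E[1/K] with K - 1 ~ Binomial(n_gallery - 1, qz).
        if qz <= 0.0:
            return 1.0
        return (1.0 - (1.0 - qz) ** n_gallery) / (n_gallery * qz)

    p_correct = 0.0
    for y in range(n_y):
        # Using the joint P(c, y) instead of the posterior folds in the weight P(y).
        joint = [prior[c] * probe_channel[c][y] for c in range(n_c)]
        p_correct += max(
            sum(joint[c] * gallery_channel[c][z] for c in range(n_c)) * inverse_count(q[z])
            for z in range(n_z)
        )
    return 1.0 - p_correct
```

Even with perfectly recognized covariates the error does not vanish for a gallery of more than one entry, since imposters can share the probe's covariate value; for two equally likely values and a gallery of two it is 0.25.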
3 The Verification Problem
Figure 2 shows our model for the verification problem. We have two conditions: “match” and “mismatch”. Under match, a single covariate value is drawn from its distribution and passed through the two noisy channels to produce the noisy probe entry and the noisy gallery entry. Under mismatch, two covariate values are drawn independently from their respective distributions and passed through the probe and gallery channels, respectively, to produce the noisy probe and gallery entries.
From observing the noisy probe and gallery entries, we must determine which of the two conditions, match or mismatch, produced them.
3.1 Defining The Error
To analyze the problem we must first define the error of matching appropriately.
When we wrongly identify a case of match as a mismatch (i.e. we “reject” a match), we have an instance of a false rejection. When a mismatch is misidentified as a match (i.e. we “accept” a mismatch), we have a false acceptance.
We refer to the probability that a match will be wrongly rejected as the “false rejection rate”, and to the probability that a mismatch will be wrongly accepted as the “false acceptance rate”. Any classifier can generally be optimized to trade off the false rejection rate against the false acceptance rate. The “Equal Error Rate” (EER) is the common value of the two rates at the operating point where they are equal. We will choose as our objective the minimization of the EER. Note that if an operating point other than the EER is chosen to quantify performance (e.g. the false rejection rate at some fixed false acceptance rate, or vice versa), the analysis below can generally be modified to accommodate it, provided a feasible solution exists.
3.2 Defining a Policy
We will use the following stochastic policy: for any pair of noisy probe and gallery values, we will accept the pair as a match with a probability that depends on the pair. We must find the acceptance probabilities that minimize the EER.
3.3 Error as a Function of Policy
We first define the probabilities of observing any given pair of noisy probe and gallery values under conditions of match and mismatch. From the model of Figure 2 we get the following probability under match:
(8) 
Above we are utilizing the fact that the noisy probe and gallery values are conditionally independent, given the true covariate value.
Similarly, from Figure 2, the probability of any such pair under mismatch is given by
(9) 
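The two pair probabilities can be tabulated directly from the model. The sketch below is illustrative: it assumes integer-coded covariate values and row-stochastic channel tables, and the names `match_table`, `mismatch_table`, `probe_prior` and `gallery_prior` are hypothetical. Under match, both noisy values share one true covariate and are conditionally independent given it; under mismatch, the pair probability factors into the two marginals.

```python
def match_table(prior, probe_channel, gallery_channel):
    """table[y][z] = sum over c of P(c) * P_probe(y | c) * P_gallery(z | c)."""
    n_c = len(prior)
    n_y, n_z = len(probe_channel[0]), len(gallery_channel[0])
    return [[sum(prior[c] * probe_channel[c][y] * gallery_channel[c][z]
                 for c in range(n_c))
             for z in range(n_z)]
            for y in range(n_y)]

def mismatch_table(probe_prior, gallery_prior, probe_channel, gallery_channel):
    """table[y][z] = P(noisy probe = y) * P(noisy gallery = z), drawn independently."""
    n_y, n_z = len(probe_channel[0]), len(gallery_channel[0])
    p_y = [sum(probe_prior[c] * probe_channel[c][y] for c in range(len(probe_prior)))
           for y in range(n_y)]
    p_z = [sum(gallery_prior[c] * gallery_channel[c][z] for c in range(len(gallery_prior)))
           for z in range(n_z)]
    return [[p_y[y] * p_z[z] for z in range(n_z)] for y in range(n_y)]
```

For two equally likely covariate values and symmetric channels that are correct 90% of the time, the match table places 0.41 on each agreeing pair and 0.09 on each disagreeing pair, while the mismatch table is uniform at 0.25.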
The probability of a false acceptance is given by
(10) 
The probability of a false rejection is given by
(11) 
We obtain the EER when the probabilities of false acceptance and false rejection are equal, i.e.
(12) 
Thus, optimizing the policy requires solving the following
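One way to approach this optimization numerically is sketched below. It is an assumption of this sketch, not a method stated in the text: pairs are accepted in decreasing order of the ratio of their match probability to their mismatch probability (a Neyman-Pearson style ordering), and the acceptance probability of the pair at which the two error rates cross is set fractionally so that they become equal. The names `far_frr`, `eer_policy`, `p_match` and `p_mismatch` are hypothetical; the two tables hold the pair probabilities under match and mismatch.

```python
def far_frr(policy, p_match, p_mismatch):
    """False acceptance and false rejection probabilities of an acceptance policy."""
    cells = [(y, z) for y in range(len(policy)) for z in range(len(policy[0]))]
    far = sum(policy[y][z] * p_mismatch[y][z] for y, z in cells)
    frr = sum((1.0 - policy[y][z]) * p_match[y][z] for y, z in cells)
    return far, frr

def eer_policy(p_match, p_mismatch):
    """Return (policy, eer) with policy[y][z] the probability of accepting pair (y, z)."""
    n_y, n_z = len(p_match), len(p_match[0])

    def ratio(cell):
        y, z = cell
        m, u = p_match[y][z], p_mismatch[y][z]
        if u == 0.0:
            return float("inf") if m > 0.0 else 0.0
        return m / u

    cells = sorted(((y, z) for y in range(n_y) for z in range(n_z)),
                   key=ratio, reverse=True)
    policy = [[0.0] * n_z for _ in range(n_y)]
    far, frr = 0.0, 1.0                      # reject everything to start
    for y, z in cells:
        m, u = p_match[y][z], p_mismatch[y][z]
        if far + u >= frr - m:
            # The rates cross inside this cell: accept it with probability t.
            t = (frr - far) / (u + m)
            policy[y][z] = t
            far += t * u
            frr -= t * m
            break
        policy[y][z] = 1.0                   # accept this pair outright
        far += u
        frr -= m
    return policy, far
```

For a match table with 0.41 on agreeing pairs and 0.09 on disagreeing pairs, and a uniform mismatch table, this accepts one agreeing pair outright and the other with probability 17/33, giving an equal error rate of 25/66, roughly 0.379.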
References

[1] A. R. Webb. Statistical Pattern Recognition. John Wiley & Sons, 2003.
[2] Y. Wen, M. Al Ismail, W. Liu, B. Raj, and R. Singh. Disjoint mapping network for crossmodal matching of voices and faces. arXiv preprint arXiv:1807.04836, 2018.