Learning a Fixed-Length Fingerprint Representation

09/21/2019 ∙ by Joshua J. Engelsma, et al. ∙ Michigan State University

We present DeepPrint, a deep network that learns to extract fixed-length fingerprint representations of only 200 bytes. DeepPrint incorporates fingerprint domain knowledge, including alignment and minutiae detection, into the deep network architecture to maximize the discriminative power of its representation. The compact DeepPrint representation has several advantages over the prevailing variable-length minutiae representation, which (i) requires computationally expensive graph matching techniques, (ii) is difficult to secure using strong encryption schemes (e.g., homomorphic encryption), and (iii) has low discriminative power in poor quality fingerprints where minutiae extraction is unreliable. We benchmark DeepPrint against two top performing COTS SDKs (Verifinger and Innovatrics) from the NIST and FVC evaluations. Coupled with a re-ranking scheme, the DeepPrint rank-1 search accuracy on the NIST SD4 dataset against a gallery of 1.1 million fingerprints is comparable to that of the top COTS matcher (DeepPrint: 98.80% vs. COTS A: 98.85%), but DeepPrint is significantly faster. The DeepPrint representation is the most compact and discriminative fixed-length fingerprint representation reported in the academic literature.



1 Introduction

Over 100 years ago, the pioneering giant of modern day fingerprint recognition, Sir Francis Galton, astutely commented on fingerprints in his 1892 book titled “Finger Prints”:

“They have the unique merit of retaining all their peculiarities unchanged throughout life, and afford in consequence an incomparably surer criterion of identity than any other bodily feature.” [galton]

Galton went on to describe fingerprint minutiae, the small details woven throughout the papillary ridges on each of our fingers, which Galton believed provided uniqueness and permanence properties for accurately identifying individuals. Over the 100 years since Galton’s groundbreaking scientific observations, fingerprint recognition systems have become ubiquitous and can be found in a plethora of different domains [handbook] such as forensics [ngi], healthcare, mobile device security [touchid], mobile payments [touchid], border crossing [obim], and national ID [india1]. To date, virtually all of these systems continue to rely upon the location and orientation of minutiae within fingerprint images for recognition (Fig. 1).

Although automated fingerprint recognition systems based on minutiae representations (i.e. handcrafted features) have seen tremendous success over the years, they have several limitations.

Figure 1: The most popular fingerprint representation consists of (a) global level-1 features (ridge flow, core, and delta) and (b) local level-2 features, called minutiae points, together with their descriptors (e.g., texture in local minutiae neighborhoods). The fingerprint image illustrated here is a rolled impression from the NIST SD4 database [sd4]. The number of minutiae in NIST SD4 rolled fingerprint images ranges from 12 to 196.
Figure 2: Failures of the COTS A minutiae-based matcher (minutiae annotated with COTS A). The genuine pair (two impressions from the same finger) in (a) was falsely rejected at 0.1% FAR (score of 9) due to heavy non-linear distortion and moist fingers. The imposter pair (impressions from two different fingers) in (b) was falsely accepted at 0.1% FAR (score of 38) due to the similar minutiae distribution in these two fingerprint images (the score threshold for COTS A @ FAR = 0.1% is 34). In contrast, DeepPrint is able to correctly match the genuine pair in (a) and reject the imposter pair in (b). These slap fingerprint impressions come from the public domain FVC 2004 DB1 A database [fvc2004]. The number of minutiae in FVC 2004 DB1 A images ranges from 11 to 87.
  • Minutiae-based representations are of variable length, since the number of extracted minutiae (Table I) varies amongst different fingerprint images even of the same finger (Fig. 2 (a)). Variations in the number of minutiae originate from a user’s interaction with the fingerprint reader (placement position and applied pressure) and the condition of the finger (dry, wet, cuts, bruises, etc.). This variation in the number of minutiae causes two main problems: (i) pairwise fingerprint comparison is computationally demanding, with a cost that varies with the number of minutiae, and (ii) matching in the encrypted domain, a necessity for user privacy protection, is computationally expensive and results in a loss of accuracy [encryption].

  • In the context of global population registration, fingerprint recognition can be viewed as a 75 billion class problem (≈7.5 billion living persons, assuming nearly all have 10 fingers) with large intra-class variability and large inter-class similarity (Fig. 2). This necessitates extremely discriminative yet compact representations that are complementary to, and at least as discriminative as, the traditional minutiae-based representation. For example, India’s civil registration system, Aadhaar, now has a database of more than one billion residents who are enrolled based on their 10 fingerprints, 2 irises, and face image [india1].

  • Reliable minutiae extraction in low quality fingerprints (due to noise, distortion, finger condition) is problematic, causing false rejects in the recognition system (Fig. 2 (a)). See also NIST fingerprint evaluation FpVTE 2012 [nist].

Representation | # of Minutiae¹ (Min, Max) | Template Size in kB (Min, Max)
Minutiae-based (variable length) | (12, 196) | (1.5, 23.7)
Minutiae-based (variable length) | (12, 225) | (0.6, 5.3)
DeepPrint (fixed length) | N.A.² | 0.2³

¹ Statistics from NIST SD4 and FVC 2004 DB1.
² Template is not explicitly comprised of minutiae.
³ Template size is fixed at 200 bytes, irrespective of the number of minutiae (192 bytes for the features and 8 bytes for 2 decompression scalars).

Table I: Comparison of variable length minutiae representation with fixed-length DeepPrint representation
Figure 3: Fixed-length, 192-dimensional fingerprint representations extracted by DeepPrint (shown as feature maps) from the same four fingerprints shown in Figure 2. Unlike COTS A, DeepPrint correctly classifies the pair in (a) as a genuine pair and the pair in (b) as an imposter pair. The score threshold of DeepPrint @ FAR = 0.1% is 0.76.

Figure 4: Flow diagram of DeepPrint: (i) a query fingerprint is aligned via a Localization Network which has been trained end-to-end with the Base-Network and Feature Extraction Networks (no reference points are needed for alignment); (ii) the aligned fingerprint proceeds to the Base-Network, which is followed by two branches; (iii) the first branch extracts a 96-dimensional texture-based representation; (iv) the second branch extracts a 96-dimensional minutiae-based representation, guided by a side-task of minutiae detection (via a minutiae map which does not have to be extracted during testing); (v) the texture-based representation and minutiae-based representation are concatenated into a 192-dimensional representation of 768 bytes (192 features and 4 bytes per float). The 768 byte template is compressed into a 200 byte fixed-length representation by truncating floating point value features into integer value features, and saving the scaling and shifting values (8 bytes) used to truncate from floating point values to integers. The 200 byte DeepPrint representations can be used both for authentication and large-scale fingerprint search. The minutiae-map can be used to further improve system accuracy and interpretability by re-ranking candidates retrieved by the fixed-length representation.

To overcome the limitations of minutiae-based matchers, we present a reformulation of the fingerprint recognition problem. In particular, rather than extracting varying length minutiae-sets for matching (i.e. handcrafted features), we design a deep network embedded with fingerprint domain knowledge, called DeepPrint, to learn a fixed-length representation of 200 bytes which discriminates between fingerprint images from different fingers (Fig. 4). Our work follows the trajectory of state-of-the-art automated face recognition systems, which have almost entirely abandoned traditional handcrafted features in favor of deep features extracted by deep networks, with remarkable success [face1, face3, face5]. However, unlike deep network based face recognition systems, we do not completely abandon handcrafted features. Instead, we aim to integrate handcrafted fingerprint features (minutiae¹) into the deep network architecture to exploit the benefits of both deep networks and traditional, domain knowledge inspired features.

¹ Note that we do not require explicitly storing minutiae in our final template. Rather, we aim to guide DeepPrint to extract features related to minutiae during training of the network.

While prevailing minutiae-matchers require expensive graph matching algorithms for fingerprint comparison, the 200 byte representations extracted by DeepPrint can be compared using simple distance metrics such as the cosine similarity, requiring only $d$ multiplications and $d - 1$ additions, where $d$ is the dimensionality of the representation (for DeepPrint, $d = 192$)². Another significant advantage of this fixed-length representation is that it can be matched in the encrypted domain using fully homomorphic encryption [homomorphic, homo1, homo2, homo3]. Finally, since DeepPrint is able to encode features that go beyond fingerprint minutiae, it is able to match poor quality fingerprints when reliable minutiae extraction is not possible (Figs. 2 and 3).

² The DeepPrint representation is originally 768 bytes (192 features and 4 bytes per float value). We compress the 768 bytes to 200 by scaling the floats to integer values in [0, 255] and saving the two compression parameters with the features. This loss in precision (which saves significant disk storage space) does not affect matching accuracy.
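To make the cost of this comparison concrete, the following minimal sketch compares two fixed-length vectors with cosine similarity. It is an illustration under stated assumptions, not the authors' code: the helper names (`l2_normalize`, `cosine_similarity`, `same_finger`) and the 4-dimensional toy vectors are hypothetical, and the 0.76 threshold simply echoes the operating point quoted in the Figure 3 caption.

```python
import math


def l2_normalize(vec):
    """Return a copy of vec scaled to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v / norm for v in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two fixed-length representations."""
    if len(a) != len(b):
        raise ValueError("representations must have the same dimensionality")
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    # For unit-length vectors the cosine similarity is a plain dot product:
    # d multiplications and d - 1 additions for d-dimensional inputs.
    return sum(x * y for x, y in zip(a_unit, b_unit))


def same_finger(a, b, threshold):
    """Accept the pair when the similarity reaches the operating threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy 4-dimensional vectors standing in for 192-dimensional representations.
enrolled = [0.9, 0.1, 0.3, 0.2]
probe = [0.8, 0.2, 0.35, 0.15]
other = [0.1, 0.9, 0.05, 0.7]
print(round(cosine_similarity(enrolled, probe), 3))
print(same_finger(enrolled, probe, threshold=0.76))
print(same_finger(enrolled, other, threshold=0.76))
```

If templates are stored already normalized to unit length, the normalization step can be skipped at match time and only the dot product remains.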

To arrive at a compact and discriminative representation of only 200 bytes, the DeepPrint architecture is embedded with fingerprint domain knowledge via an automatic alignment module and a multi-task learning objective which requires minutiae-detection (in the form of a minutiae-map) as a side task to representation learning. More specifically, DeepPrint automatically aligns an input fingerprint and subsequently extracts both a texture representation and a minutiae-based representation (both with 96 features). The 192-dimensional concatenation of these two representations, followed by compression from floating point features to integer value features, comprises a 200 byte fixed-length representation (192 bytes for the feature vector and 8 bytes for storing the 2 compression parameters). As a final step, we utilize Product Quantization [product_quant] to further compress the DeepPrint representations stored in the gallery, significantly reducing the computational requirements and time for large-scale fingerprint search.

Detecting minutiae (in the form of a minutiae-map) as a side-task to representation learning has several key benefits:

  • We guide our representation to incorporate domain inspired features pertaining to minutiae by sharing parameters between the minutiae-map output task and the representation learning task in the multi-task learning framework.

  • Since minutiae representations are the most popular for fingerprint recognition, we posit that our method for guiding the DeepPrint feature extraction via its minutiae-map side-task falls in line with the goal of “Explainable AI” [EAI].

  • Given a probe fingerprint, we first use its DeepPrint representation to find the top candidates and then re-rank the top candidates using the minutiae-map provided by DeepPrint³. This optional re-ranking add-on further improves both accuracy and interpretability.

³ The DeepPrint minutiae-map can be easily converted into a minutiae-set of $n$ minutiae, $\{(x_1, y_1, \theta_1), \ldots, (x_n, y_n, \theta_n)\}$, and passed to any minutiae-matcher (e.g., COTS A, COTS B, or [cao]).
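The two-stage idea in the last bullet can be sketched as follows. This is only an illustration: the function `search_with_rerank`, the equal-weight score fusion, and the `minutiae_score` callable (a stand-in for whatever minutiae matcher is available) are assumptions introduced here, not details taken from the paper.

```python
def dot(a, b):
    """Dot product, equal to cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def search_with_rerank(probe, gallery, minutiae_score, k=2, weight=0.5):
    """Two-stage search: fixed-length retrieval, then minutiae-based re-ranking.

    probe          -- unit-length representation of the probe fingerprint
    gallery        -- dict mapping identity -> unit-length representation
    minutiae_score -- hypothetical callable: identity -> minutiae-matching
                      score of the probe against that identity, in [0, 1]
    """
    # Stage 1: rank the whole gallery with the cheap fixed-length comparison.
    stage1 = sorted(gallery, key=lambda gid: dot(probe, gallery[gid]), reverse=True)
    candidates = stage1[:k]
    # Stage 2: fuse both scores, but only for the shortlisted candidates.
    fused = {
        gid: (1 - weight) * dot(probe, gallery[gid]) + weight * minutiae_score(gid)
        for gid in candidates
    }
    return sorted(candidates, key=fused.get, reverse=True)


gallery = {"A": [1.0, 0.0], "B": [0.8, 0.6], "C": [0.0, 1.0]}
probe = [1.0, 0.0]
toy_minutiae_scores = {"A": 0.1, "B": 0.9, "C": 0.5}  # made-up matcher outputs
ranked = search_with_rerank(probe, gallery, toy_minutiae_scores.get, k=2)
print(ranked)
```

The expensive matcher is invoked only k times per probe, so its cost stays constant as the gallery grows.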

The primary benefit of the 200 byte representation extracted by DeepPrint comes into play when performing mega-scale search against millions or even billions of identities (e.g., India’s Aadhaar [india1] and the FBI’s Next Generation Identification (NGI) databases [ngi]). To highlight the significance of this benefit, we benchmark the search performance of DeepPrint against the latest version SDKs (as of July, 2019) of two top performers in the NIST FpVTE 2012 (Innovatrics⁴ v7.2.1.40 and Verifinger⁵ v10.0⁶) on the NIST SD4 [sd4] and NIST SD14 [sd14] databases augmented with a gallery of nearly 1.1 million rolled fingerprints. Our empirical results demonstrate that DeepPrint is competitive with these two state-of-the-art COTS matchers in accuracy while requiring only a fraction of the search time. Furthermore, a given DeepPrint fixed-length representation can also be matched in the encrypted domain via homomorphic encryption with minor loss in recognition accuracy, as shown in [homomorphic] for face recognition.

⁴ https://www.innovatrics.com/
⁵ https://www.neurotechnology.com/
⁶ We note that Verifinger v10.0 performs significantly better than earlier versions of the SDK often used in the literature.

Algorithm | Description | HR @ PR = 1.0%¹ (NIST SD4)² | HR @ PR = 1.0% (NIST SD14)³ | Template Size (bytes) | Gallery Size⁴
Jain et al. [fingercode, fingercode2] | Fingercode: Global representation extracted using Gabor Filters | N.A. | N.A. | 640 | N.A.
Cappelli et al. [mcc] | MCC: Local descriptors via 3D cylindrical structures comprised of the minutiae-set representation | 93.2% | 91.0% | 1,913 | 2,700
Cao and Jain [index1] | Inception v3: Global deep representation extracted via Alignment and Inception v3 | 98.65% | 98.93% | 8,192 | 250,000
Song and Feng [index2] | PDC: Deep representations extracted at different resolutions and aggregated into global representation | 93.3% | N.A. | N.A. | 2,000
Song et al. [index3] | MDC: Deep representations extracted from minutiae and aggregated into global representation | 99.2% | 99.6% | 1,200 | 2,700
Li et al. [index4] | Finger Patches: Local deep representations aggregated into global representation via global average pooling | 99.83% | 99.89% | 1,024 | 2,700
This work | DeepPrint: Global deep representation extracted via multi-task CNN with built-in fingerprint alignment | 99.75% | 99.93% | 200⁵ | 1,100,000

¹ In some baselines we estimated the data points from a Figure (specific data points were not reported in the paper).
² Only 2,000 fingerprints are included in the gallery to enable comparison with previous works. (HR = Hit Rate, PR = Penetration Rate)
³ Only last 2,700 pairs (2,700 probes; 2,700 gallery) are used to enable comparison with previous works.
⁴ Largest gallery size used in the paper.
⁵ The DeepPrint representation can be further compressed to only 64 bytes using product quantization with minor loss in accuracy.

Table II: Published Studies on Fixed-Length Fingerprint Representations
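For readers unfamiliar with the indexing metric used in Table II, the sketch below computes a hit rate at a given penetration rate from the rank at which each probe's true mate is retrieved. It assumes the common reading of the metric (a hit means the mate lies within the top PR fraction of the ranked gallery); the function name and the toy ranks are hypothetical.

```python
import math


def hit_rate_at_penetration(mate_ranks, gallery_size, penetration_rate):
    """Fraction of probes whose true mate lies within the searched portion.

    mate_ranks       -- 1-based rank of each probe's true mate in its candidate list
    gallery_size     -- number of fingerprints in the gallery
    penetration_rate -- fraction of the gallery that is examined (0.01 = 1.0%)
    """
    # round() guards against floating point noise such as 7.000000000000001
    cutoff = max(1, math.ceil(round(penetration_rate * gallery_size, 9)))
    hits = sum(1 for rank in mate_ranks if rank <= cutoff)
    return hits / len(mate_ranks)


ranks = [1, 5, 10, 11, 500]  # toy mate ranks for five probes
print(hit_rate_at_penetration(ranks, gallery_size=1000, penetration_rate=0.01))
```

With a 1,000-entry gallery and PR = 1.0%, only the top 10 candidates count, so three of the five toy probes are hits.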

More concisely, the primary contributions of this work are:

  • A customized deep network (Fig. 4), called DeepPrint, which utilizes fingerprint domain knowledge (alignment and minutiae detection) to learn and extract a discriminative fixed-length fingerprint representation.

  • The use of Product Quantization to compress DeepPrint representations, enabling even faster mega-scale search (51 ms search time against a gallery of 1.1 million fingerprints vs. 27,000 ms for a COTS with comparable accuracy).

  • A two-stage fingerprint search scheme whereby candidates retrieved by DeepPrint representations are re-ranked using a minutiae-matcher in conjunction with the DeepPrint minutiae-map. This further improves system interpretability and accuracy.

  • Benchmarking DeepPrint against two state-of-the-art COTS matchers (Innovatrics and Verifinger) on NIST SD4 and NIST SD14 against a gallery of 1.1 million fingerprints. Empirical results demonstrate that DeepPrint is comparable to COTS matchers in accuracy at a significantly faster search speed.

  • Benchmarking the authentication performance of DeepPrint on the NIST SD4 and NIST SD14 rolled-fingerprints databases and the FVC 2004 DB1 A slap fingerprint database [fvc2004]. Again, DeepPrint shows comparable performance against the two COTS matchers, demonstrating the generalization ability of DeepPrint to both rolled and slap fingerprint databases.

  • Demonstrating that homomorphic encryption can be used to match DeepPrint templates in the encrypted domain, in real time (1.26 ms), with minimal loss to matching accuracy as shown for fixed-length face representations [homomorphic].

  • An interpretability visualization which demonstrates our ability to guide DeepPrint to look at minutiae-related features.

Figure 5: Fingerprint impressions from one subject in the DeepPrint training dataset [longitudinal]. Impressions were captured longitudinally, resulting in the variability across impressions (contrast and intensity from environmental conditions; distortion and alignment from user placement). Importantly, training with longitudinal data enables learning compact representations which are invariant to the typical noise observed across fingerprint impressions over time, a necessity in any fingerprint recognition system.

2 Prior Work

Several early works [fingercode, fingercode2, mcc] presented fixed-length fingerprint representations using traditional image processing techniques. In [fingercode, fingercode2], Jain et al. extracted a global fixed-length representation of 640 bytes, called Fingercode, using a set of Gabor Filters. Cappelli et al. introduced a fixed-length minutiae descriptor, called Minutiae Cylinder Code (MCC), using 3D cylindrical structures computed with minutiae points [mcc]. While both of these representations demonstrated success at the time they were proposed, their accuracy is now significantly inferior to state-of-the-art COTS matchers.

Following the seminal contributions of [fingercode, fingercode2] and [mcc], the past 10 years of research on fixed-length fingerprint representations has been quite stagnant. However, recent studies [index1, index2, index3, index4] have utilized deep networks to extract highly discriminative fixed-length fingerprint representations. More specifically, (i) Cao and Jain [index1] used global alignment and Inception v3 to learn fixed-length fingerprint representations. (ii) Song and Feng [index2] used deep networks to extract representations at various resolutions which were then aggregated into a global fixed-length representation. (iii) Song et al. [index3] further learned fixed-length minutiae descriptors which were aggregated into a global fixed-length representation via an aggregation network. Finally, (iv) Li et al. [index4] extracted local descriptors from predefined “fingerprint classes” which were then aggregated into a global fixed-length representation through global average pooling.

While these efforts show tremendous promise, each method has some limitations. In particular, (i) the algorithms proposed in [index1] and [index2] both required computationally demanding global alignment as a preprocessing step, and their accuracy is inferior to state-of-the-art COTS matchers. (ii) The representations extracted in [index3] require the arduous process of minutiae-detection, patch extraction, patch-level inference, and an aggregation network to build a single global feature representation. (iii) While the algorithm in [index4] obtains high performance on rolled fingerprints (with small gallery size), the accuracy was not reported for slap fingerprints. Since [index4] aggregates local descriptors by averaging them together, it is unlikely that the approach would work well when areas of the fingerprint are occluded or missing (often the case in slap fingerprint databases like FVC 2004 DB1 A). (iv) All of these algorithms suffer from a lack of interpretability compared to traditional minutiae representations.

In addition, existing studies targeting deep, fixed-length fingerprint representations all lack an extensive, large-scale evaluation of the deep features. Indeed, one of the primary motivations for fixed-length fingerprint representations is to perform orders of magnitude faster large scale search. However, with the exception of Cao and Jain [index1], who evaluate against a database of 250K fingerprints, the next largest gallery size used in any of the aforementioned studies is only 2,700.

As an addendum, deep networks have also been used to improve specific sub-modules of fingerprint recognition systems such as segmentation [seg1, seg2, seg3, seg4], orientation field estimation [orien1, orien2, orien3], minutiae extraction [minut1, minut2, minut3], and minutiae descriptor extraction [descript1]. However, these works all still operate within the conventional paradigm of extracting an unordered, variable length set of minutiae for fingerprint matching.

3 DeepPrint

In the following section, we (i) provide a high-level overview and intuition of DeepPrint, (ii) present how we incorporate automatic alignment into DeepPrint, and (iii) demonstrate how the accuracy and interpretability of DeepPrint is improved through the injection of fingerprint domain knowledge.

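Algorithm 1 Extract DeepPrint Representation
$L(I_f)$: Shallow localization network, outputs $x$, $y$, $\theta$
$A_\theta$: Affine matrix composed with parameters $x$, $y$, $\theta$
$G(I_f, A_\theta)$: Bilinear grid sampler, outputs aligned fingerprint $I_t$
$S(\cdot)$: Inception v4 stem
$E(\cdot)$: Shared minutiae parameters
$M(\cdot)$: Minutiae representation branch
$D(\cdot)$: Minutiae map estimation
$T(\cdot)$: Texture representation branch

Input: Unaligned fingerprint image $I_f$
$(x, y, \theta) \leftarrow L(I_f)$
$I_t \leftarrow G(I_f, A_\theta)$
$F \leftarrow S(I_t)$
$F_m \leftarrow E(F)$
$R_1 \leftarrow M(F_m)$
$H \leftarrow D(F_m)$
$R_2 \leftarrow T(F)$
$R \leftarrow R_1 \oplus R_2$
Output: Fingerprint representation $R$ and minutiae-map $H$. ($H$ can be optionally utilized for (i) visualization and (ii) fusion of DeepPrint scores obtained via $R$ with minutiae-matching scores.)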
Figure 6: Unaligned fingerprint images from NIST SD4 (top row) and corresponding DeepPrint aligned fingerprint images (bottom row).

3.1 Overview

A high level overview of DeepPrint is provided in Figure 4 with pseudocode in Algorithm 1. DeepPrint is trained with a longitudinal database (Fig. 5) comprised of 455K rolled fingerprint images stemming from 38,291 unique fingers [longitudinal]. Longitudinal fingerprint databases consist of fingerprints from distinct subjects captured over time (Fig. 5) [longitudinal]. It is necessary to train DeepPrint with a large, longitudinal database so that it can learn compact, fixed-length representations which are invariant to the differences introduced during fingerprint image acquisition at different times and in different environments (humidity, temperature, user interaction with the reader, and finger injuries). The primary task during training is to predict the finger identity label (encoded as a one-hot vector) of each of the 455K training fingerprint images (≈12 fingerprint impressions per finger). The last fully connected layer is taken as the representation for fingerprint comparison during authentication and search.

The input to DeepPrint is a fixed-size⁷ grayscale fingerprint image, $I_f$, which is first passed through the alignment module (Fig. 4). The alignment module consists of a localization network, $L$, and a grid sampler, $G$ [spatial]. After applying the localization network and grid sampler to $I_f$, an aligned fingerprint $I_t$ is passed to the base-network, $S$.

⁷ Fingerprint images in our training dataset vary in size. As a pre-processing step, we center-crop all images (using Gaussian filtering, dilation and erosion, and thresholding) to a common size. This size is sufficient to cover most of the rolled fingerprint area without extraneous background pixels.

The base-network is the stem of the Inception v4 architecture (Inception v4 minus Inception modules). Following the base-network are two different branches (Fig. 4) comprised primarily of the three Inception modules (A, B, and C) described in [inceptionv4]. The first branch, $T$, completes the Inception v4 architecture⁸ as $T(S(I_t))$ and performs the primary learning task of predicting a finger identity label directly from the cropped, aligned fingerprint $I_t$. It is included in order to learn the textural cues in the fingerprint image. The second branch (Figs. 4 and 8), $M(E(S(I_t)))$, again predicts the finger identity label from the aligned fingerprint $I_t$, but it also has a related side task (Fig. 8) of detecting the minutiae locations and orientations in $I_t$ via $D(E(S(I_t)))$. Here $E$ denotes the shared minutiae parameters, $M$ the minutiae representation branch, and $D$ the minutiae map estimation (Algorithm 1). In this manner, we guide this branch of the network to extract representations influenced by fingerprint minutiae (since parameters between the minutiae detection task and representation learning task are shared in $E$). The textural cues act as complementary discriminative information to the minutiae-guided representation. The two 96-dimensional representations (each dimension is a float, consuming 4 bytes of space) are concatenated into a 192-dimensional representation (768 total bytes). Finally, the floats are truncated from 32 bits to 8 bit integer values, compressing the template size to 200 bytes (192 bytes for features and 8 bytes for 2 decompression parameters). Note that the minutiae set is not explicitly used in the final representation. Rather, we use the minutiae-map to guide our network training. However, for improved accuracy and interpretability, we can optionally store the minutiae set for use in a re-ranking scheme during large-scale search operations.

⁸ We selected Inception v4 after evaluating numerous other architectures such as: ResNet, Inception v3, Inception ResNet, and MobileNet.
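The byte arithmetic above (192 floats at 4 bytes each, reduced to 192 single-byte codes plus 8 bytes of parameters) can be illustrated with a small sketch. The byte layout, the use of rounding rather than strict truncation, and the function names are assumptions made for illustration; the text only states that floats are scaled to integers in [0, 255] and that two parameters are stored alongside them.

```python
import math
import struct


def compress_template(features):
    """Pack float features into 8 header bytes plus one byte per feature."""
    lo, hi = min(features), max(features)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for a constant vector
    codes = bytes(
        min(255, max(0, round((f - lo) / scale))) for f in features
    )
    # Assumed layout: two little-endian float32 scalars (scale, shift),
    # followed by the 8-bit feature codes.
    return struct.pack("<ff", scale, lo) + codes


def decompress_template(blob):
    """Recover approximate float features from a compressed template."""
    scale, lo = struct.unpack("<ff", blob[:8])
    return [lo + scale * code for code in blob[8:]]


features = [math.sin(0.1 * i) for i in range(192)]  # stand-in feature vector
blob = compress_template(features)
restored = decompress_template(blob)
max_error = max(abs(a - b) for a, b in zip(features, restored))
print(len(blob), max_error)
```

The reconstruction error per feature is bounded by roughly half a quantization step, which is why the similarity between two decompressed templates stays close to the similarity of the original floats.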

In the following subsections, we provide details of the major sub-components of the proposed network architecture.

3.2 Alignment

In nearly all fingerprint recognition systems, the first step is to perform alignment based on some reference points (such as the core point). However, this alignment is computationally expensive. This motivated us to adopt attention mechanisms such as the spatial transformers in [spatial].

The advantages of using the spatial transformer module in place of reference point based alignment algorithms are two-fold: (i) it requires only one forward pass through a shallow localization network (Table III), followed by bilinear grid sampling. This reduces the computational complexity of alignment (we resize the fingerprints before localization to further speed up the localization estimation⁹); (ii) the parameters of the localization network are tuned to minimize the loss (Eq. 9) of the base-network and representation extraction networks. In other words, rather than supervising the transformation via reference points (such as the core point), we let the base-network and representation extraction networks tell the localization network what a “good” transformation is, so that it can learn a more discriminative representation for the input fingerprint.

⁹ We also tried another resolution; however, we could not obtain consistent alignment at that resolution.

Figure 7: Minutiae Map Extraction. The minutiae locations and orientations of an input fingerprint (a) are encoded as a 6-channel minutiae map (b). The “hot spots” in each channel indicate the spatial location of the minutiae points. The channel in which a hot spot appears indicates the orientation of the corresponding minutia.

Layers of the localization network, in order from input to output:
1. Convolution
2. Max Pooling
3. Convolution
4. Max Pooling
5. Convolution
6. Max Pooling
7. Convolution
8. Max Pooling
9. Fully Connected
10. Fully Connected¹

¹ These three outputs correspond to $x$, $y$, $\theta$ shown in Fig. 4.

Table III: Localization Network Architecture

Given an unaligned fingerprint image $I_f$, a shallow localization network first hypothesizes the translation and rotation parameters ($x$, $y$, and $\theta$) of an affine transformation matrix $A_\theta$ (Fig. 4). A user specified scaling parameter $s$ is used to complete $A_\theta$ (Fig. 4). This scaling parameter stipulates the area of the input fingerprint image which will be cropped. We train two DeepPrint models, one for rolled fingerprints and one for slap fingerprints, each with its own value of $s$, so that a fingerprint area window of the corresponding size is cropped from the input fingerprint image. Given $A_\theta$, a grid sampler samples the input image pixels $(x_i^f, y_i^f)$ for every target grid location $(x_i^t, y_i^t)$ to output the aligned fingerprint image $I_t$ in accordance with Equation 1.

$$\begin{pmatrix} x_i^f \\ y_i^f \end{pmatrix} = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \tag{1}$$

Once $I_t$ has been computed, it is passed on to the base-network for classification. Finally, the parameters for the localization network are updated based upon the loss in Equation 9.
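The sampling step can be sketched in a few lines of plain Python. This is a simplified illustration rather than the network's actual layer: it works in pixel coordinates measured from the image centre (spatial transformer implementations usually use normalized coordinates), treats the scale as a plain multiplier, and all function names are hypothetical.

```python
import math


def affine_matrix(tx, ty, theta, scale):
    """2x3 affine matrix combining rotation, isotropic scale, and translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty]]


def bilinear(image, x, y):
    """Bilinearly interpolate image (list of rows) at (x, y); 0 outside."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xx < w and 0 <= yy < h:
                value += wx * wy * image[yy][xx]
    return value


def grid_sample(image, matrix, out_h, out_w):
    """For every target pixel, look up the source location given by matrix."""
    h, w = len(image), len(image[0])
    cx_src, cy_src = (w - 1) / 2, (h - 1) / 2
    cx_out, cy_out = (out_w - 1) / 2, (out_h - 1) / 2
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            u, v = j - cx_out, i - cy_out  # target coords relative to centre
            xs = matrix[0][0] * u + matrix[0][1] * v + matrix[0][2] + cx_src
            ys = matrix[1][0] * u + matrix[1][1] * v + matrix[1][2] + cy_src
            row.append(bilinear(image, xs, ys))
        out.append(row)
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = grid_sample(image, affine_matrix(0, 0, 0.0, 1.0), 3, 3)
shifted = grid_sample(image, affine_matrix(1, 0, 0.0, 1.0), 3, 3)
print(identity)
print(shifted)
```

Because every output pixel is a weighted sum of input pixels, the operation is differentiable with respect to the transformation parameters, which is what lets the downstream loss train the localization network.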

The architecture used for our localization network is shown in Table III and images from before and after the alignment module are shown in Figure 6. In order to get the localization network to properly converge, (i) the learning rate was scaled by a constant factor and (ii) upper bounds were placed on the estimated affine matrix translation (in pixels) and rotation (in degrees) parameters. These constraints are based on our domain knowledge of the maximum extent to which a user would rotate or translate their fingers during placement on the reader platen.

3.3 Minutiae Map Domain Knowledge

To prevent overfitting the network to the training data and to extract interpretable deep features, we incorporate fingerprint domain knowledge into DeepPrint. The specific domain knowledge we incorporate into our network architecture is hereafter referred to as the minutiae map [cao]. Note that the minutiae map is not explicitly used in the fixed-length fingerprint representation, but the information contained in the map is indirectly embedded in the network during training.

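A minutiae map is essentially a 6-channel heatmap quantizing the locations and orientations of the minutiae within a fingerprint image. More formally, let $h$ and $w$ be the height and width of an input fingerprint image and $T = \{m_1, m_2, \ldots, m_n\}$ be its minutiae template with $n$ minutiae points, where $m_t = (x_t, y_t, \theta_t)$ and $t = 1, \ldots, n$. Then, the minutiae map $H \in \mathbb{R}^{h \times w \times 6}$ at $(i, j, k)$ can be computed by summing the location and orientation contributions of each of the minutiae in $T$ to obtain the heat map (Fig. 7 (b)).

$$H(i, j, k) = \sum_{t=1}^{n} C_s\big((x_t, y_t), (i, j)\big) \cdot C_o\big(\theta_t, 2k\pi/6\big) \tag{2}$$

where $C_s(\cdot)$ and $C_o(\cdot)$ calculate the spatial and orientation contribution of minutiae $m_t$ to the minutiae map at $(i, j, k)$ based upon the euclidean distance of $(x_t, y_t)$ to $(i, j)$ and the orientation difference between $\theta_t$ and $2k\pi/6$ as follows:

$$C_s\big((x_t, y_t), (i, j)\big) = \exp\left(-\frac{\lVert (x_t, y_t) - (i, j) \rVert_2^2}{2\sigma_s^2}\right) \tag{3}$$

$$C_o\big(\theta_t, 2k\pi/6\big) = \exp\left(-\frac{d\phi(\theta_t, 2k\pi/6)}{2\sigma_s^2}\right) \tag{4}$$

where $\sigma_s$ is the parameter which controls the width of the gaussian, and $d\phi(\theta_1, \theta_2)$ is the orientation difference between angles $\theta_1$ and $\theta_2$:

$$d\phi(\theta_1, \theta_2) = \begin{cases} |\theta_1 - \theta_2| & -\pi \le \theta_1 - \theta_2 \le \pi \\ 2\pi - |\theta_1 - \theta_2| & \text{otherwise} \end{cases} \tag{5}$$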
An example fingerprint image and its corresponding minutiae map are shown in Figure 7.
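As a concrete illustration of the description above, the following sketch builds a small 6-channel minutiae map from a list of (x, y, θ) minutiae. It is one plausible reading of the text, written under assumptions: channels are indexed from 0, the map is indexed as `heat[row][col][channel]`, a single `sigma` controls both contributions, and the function names are hypothetical.

```python
import math


def orientation_difference(theta1, theta2):
    """Smallest absolute difference between two angles in [0, 2*pi)."""
    d = abs(theta1 - theta2)
    return d if d <= math.pi else 2 * math.pi - d


def minutiae_map(minutiae, height, width, channels=6, sigma=1.0):
    """Build a height x width x channels heat map from (x, y, theta) minutiae."""
    heat = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for x, y, theta in minutiae:
        for row in range(height):
            for col in range(width):
                # Spatial contribution: Gaussian of the distance to the minutia.
                dist_sq = (x - col) ** 2 + (y - row) ** 2
                spatial = math.exp(-dist_sq / (2 * sigma ** 2))
                for k in range(channels):
                    # Orientation contribution: channel k represents the
                    # angle 2*k*pi/channels.
                    diff = orientation_difference(theta, 2 * k * math.pi / channels)
                    orient = math.exp(-diff / (2 * sigma ** 2))
                    heat[row][col][k] += spatial * orient
    return heat


# One minutia at column 2, row 1, pointing along angle 0.
heat = minutiae_map([(2, 1, 0.0)], height=4, width=5)
peak = max(v for row in heat for cell in row for v in cell)
print(peak, heat[1][2][0])
```

The hot spot sits at the minutia location in the channel whose angle is closest to the minutia orientation, and it fades smoothly in neighbouring pixels and channels.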

3.4 Multi-Task Architecture

The minutiae-map domain knowledge is injected into DeepPrint via multitask learning. Multitask learning improves generalizability of a model since domain knowledge within the training signals of related tasks acts as an inductive bias [multi, multi0]. The multi-task branch of the DeepPrint architecture is shown in Figures 4 and 8. The primary task of the branch is to extract a representation and subsequently classify a given fingerprint image into its “finger identity”. The secondary task is to estimate the minutiae-map. Since parameters are shared between the representation learning task and the minutiae-map extraction task, we guide the minutiae-branch of our network to extract fingerprint representations that are influenced by minutiae locations and orientations. At the same time, a separate branch in DeepPrint aims to extract a complementary texture-based representation by directly predicting the identity of an input fingerprint without any domain knowledge (Fig. 4). DeepPrint extracts minutiae maps of a fixed size¹⁰ to encode the minutiae locations and orientations of an input fingerprint image. The ground truth minutiae maps for training DeepPrint are estimated using the open source minutiae extractor proposed in [cao].

¹⁰ We extract maps of this size to save GPU memory during training (enabling a larger batch size), and to reduce disk space requirements for storage of the maps.

Figure 8: The custom multi-task minutiae branch of DeepPrint. The dimensions inside each box represent the input dimensions.

Note, we combine the texture branch with the minutiae branch in the DeepPrint architecture (rather than two separate networks) for the following reasons: (i) the minutiae branch and the texture branch share a number of parameters (the Inception v4 stem), reducing the model complexity that two separate models would necessitate, and (ii) the spatial transformer (alignment module) is optimized based on both branches (i.e. learned alignment benefits both the texture-based and minutiae-based representations) avoiding two separate spatial transformer modules and alignments.

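More formally, we incorporate domain knowledge into the DeepPrint representation by computing the network’s loss in the following manner. First, given $R_1$ and $R_2$ as computed in Algorithm 1, fully connected layers are applied for identity classification logits, outputting $y_1 \in \mathbb{R}^{1 \times c}$ and $y_2 \in \mathbb{R}^{1 \times c}$, where $c$ is the number of identities in the training set. Next, $y_1$ and $y_2$ are both passed to a softmax layer to compute the probabilities $\hat{y}_1$ and $\hat{y}_2$ of $R_1$ and $R_2$ belonging to each identity. Finally, $\hat{y}_1$ and $\hat{y}_2$, the ground truth label $y$, and the network’s parameters $w$ can be used to compute the combined cross-entropy loss of the two branches for an image $I$:

$$\mathcal{L}_1(I, y) = -\log \hat{y}_{1,y}(I; w) - \log \hat{y}_{2,y}(I; w) \tag{6}$$

where $\hat{y}_{b,y}$ is the probability that branch $b$ assigns to the ground truth identity $y$. To further reduce the intra-class variance of the learned features, we also employ the widely used center loss first proposed in [centerloss] for face recognition. In particular, we compute two center-loss terms, one for each branch in our multi-task architecture, as:

$$\mathcal{L}_2(I, y) = \lVert R_1 - \mu_y^1 \rVert_2^2 + \lVert R_2 - \mu_y^2 \rVert_2^2 \tag{7}$$

where $\mu_y^1$ and $\mu_y^2$ are the centers specific to branch $i \in \{1, 2\}$ and subject $y$ for a fingerprint image $I$.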

For computing the loss of the minutiae map estimation side task, we employ the mean squared error loss between the estimated minutiae map $H$ and the ground truth minutiae map $H^{*}$ (estimated using the open-source minutiae extractor in [cao]) as follows:

$$\mathcal{L}_3(I, H^{*}) = \frac{1}{n} \sum_{i=1}^{n} \left(H_i - H^{*}_i\right)^2 \tag{8}$$

where $n$ is the number of entries in the minutiae map.

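Finally, using the addition of all these loss terms, and a dataset comprised of $N$ training images, our model parameters $w$ are trained in accordance with:

$$\arg\min_{w} \sum_{i=1}^{N} \lambda_1 \mathcal{L}_1(I^i, y^i) + \lambda_2 \mathcal{L}_2(I^i, y^i) + \lambda_3 \mathcal{L}_3(I^i, H^{*i}) \tag{9}$$

where $\mathcal{L}_1$, $\mathcal{L}_2$, and $\mathcal{L}_3$ are the cross-entropy, center, and minutiae-map losses described above, $H^{*i}$ is the ground truth minutiae map of image $I^i$, and the loss weights $\lambda_1$, $\lambda_2$, and $\lambda_3$ are empirically set to obtain convergence. Note, during training, we augment our dataset with random rotations, translations, brightness changes, and cropping. We use the RMSProp optimizer with a batch size of 30. Weights are initialized with the variance scaling initializer. Regularization included dropout (before the embedding fully connected layer) and weight decay.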
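The loss terms described above can be sketched in a few lines of NumPy. This is a minimal illustration for a single image, not the training code used for DeepPrint: the function names are hypothetical, and the default weights in `total_loss` are placeholders rather than the empirically chosen values.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy_loss(logits_1, logits_2, label):
    """Combined cross-entropy of the two branches for one image."""
    return -np.log(softmax(logits_1)[label]) - np.log(softmax(logits_2)[label])

def center_loss(rep_1, rep_2, center_1, center_2):
    """Squared distance of each branch representation to its subject center."""
    return np.sum((rep_1 - center_1) ** 2) + np.sum((rep_2 - center_2) ** 2)

def minutiae_map_loss(map_estimated, map_ground_truth):
    """Mean squared error between estimated and ground truth minutiae maps."""
    return np.mean((map_estimated - map_ground_truth) ** 2)

def total_loss(l_ce, l_center, l_map, weights=(1.0, 0.01, 0.1)):
    """Weighted sum of the three terms; the default weights are placeholders."""
    w1, w2, w3 = weights
    return w1 * l_ce + w2 * l_center + w3 * l_map
```

During training, the per-image totals would be summed over the dataset and minimized with respect to the network parameters by the optimizer.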

After the multi-task architecture has converged, a fixed-length feature representation can be acquired by extracting the fully connected layer before the softmax layers in both of the network’s branches. Let $R_1$ be the unit-length minutiae representation and $R_2$ be the unit-length texture representation. Then, a final feature representation is obtained by concatenation of $R_1$ and $R_2$ into $R$, followed by normalization of $R$ to unit length.

3.5 Template Compression

The final step in the DeepPrint representation extraction is template compression. In particular, the 192-dimensional DeepPrint representation consumes a total of 768 bytes. We can compress this size to 200 bytes by truncating the 32-bit floating point feature values to 8-bit integer values in the range of [0,255] using min-max normalization. In particular, given a DeepPrint representation $R \in \mathbb{R}^{192}$, we transfer the domain of $R$ to $\mathbb{N}^{192}$ and output $R'$, where we restrict the set of the natural numbers to the range of [0,255]. More formally:

$$R' = \left\lfloor 255 \cdot \frac{R - \min(R)}{\max(R) - \min(R)} \right\rfloor \tag{10}$$

where $\min(R)$ and $\max(R)$ output the minimum and maximum feature values of the vector $R$, respectively. In order to decompress the features back to float values for matching, we need to save the minimum and maximum values for each representation. Thus, our final representation is 200 bytes: 192 bytes for the features, 4 bytes for the minimum value, and 4 bytes for the maximum value. To decompress the representations (when loading them into RAM), we simply reverse the min-max normalization using the saved minimum and maximum values. Note, this does not affect the search speed since the decompressed gallery representations will already be in RAM when performing a search.
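The compression and decompression steps can be illustrated with the following sketch. The byte layout (192 feature bytes followed by the minimum and then the maximum as 32-bit floats) and the function names are assumptions made for illustration; the text only fixes the sizes of the three parts.

```python
import numpy as np

DIM = 192  # dimensionality of the DeepPrint representation

def compress_template(rep):
    """Min-max normalize to [0, 255] and pack into a 200-byte template."""
    rep = np.asarray(rep, dtype=np.float32)
    lo, hi = rep.min(), rep.max()
    span = hi - lo if hi > lo else np.float32(1.0)  # guard against a constant vector
    codes = np.floor((rep - lo) / span * 255).astype(np.uint8)
    # 192 bytes of features + 4 bytes minimum + 4 bytes maximum
    return codes.tobytes() + np.float32(lo).tobytes() + np.float32(hi).tobytes()

def decompress_template(blob):
    """Reverse the min-max normalization using the saved minimum and maximum."""
    codes = np.frombuffer(blob[:DIM], dtype=np.uint8).astype(np.float32)
    lo, hi = np.frombuffer(blob[DIM:], dtype=np.float32)
    return codes / 255 * (hi - lo) + lo
```

Because the codes are obtained with a floor, each decompressed feature differs from the original by less than one quantization step, (max - min) / 255.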

4 DeepPrint Matching

Two unit-length DeepPrint representations $R_p$ and $R_g$ can be easily matched using the cosine similarity between the two representations. In particular:

$$s(R_p, R_g) = R_p^{\top} R_g \tag{11}$$

Thus, DeepPrint authentication (1:1 matching) requires only 192 multiplications and 191 additions. Note, we also experimented with Euclidean distance as a scoring function, but consistently obtained higher performance with cosine similarity.
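As a small illustration of the matching step, the sketch below builds a unit-length 192-dimensional representation from two 96-dimensional branch outputs and scores a pair with a dot product. The helper names are hypothetical, and random vectors stand in for real network outputs.

```python
import numpy as np

def fuse_branches(minutiae_rep, texture_rep):
    """Concatenate the two branch representations and normalize to unit length."""
    rep = np.concatenate([minutiae_rep, texture_rep])
    return rep / np.linalg.norm(rep)

def match_score(rep_a, rep_b):
    """Cosine similarity of two unit-length representations (a single dot product)."""
    return float(np.dot(rep_a, rep_b))

rng = np.random.default_rng(0)
probe = fuse_branches(rng.normal(size=96), rng.normal(size=96))
print(match_score(probe, probe))  # identical templates score (approximately) 1.0
```

For unit-length inputs the score lies in [-1, 1], so a single threshold on it yields the accept or reject decision.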

4.1 Fusion of DeepPrint Score with Minutiae Score

Given the speed of matching two DeepPrint representations, the minutiae-based match scores of any existing AFIS can also be fused together with the DeepPrint scores with minimal loss to the overall AFIS authentication speed (i.e. DeepPrint can be easily used as an add-on to existing minutiae-based AFIS to improve recognition accuracy). In our experimental analysis, we demonstrate this by fusing DeepPrint scores together with the scores of minutiae-based matchers COTS A, COTS B, and [cao] and subsequently improving authentication accuracy. This indicates that the information contained in the compact DeepPrint representation is complementary to that of minutiae representations. Note, since DeepPrint already extracts minutiae as a side task, fusion with a minutiae-based matcher requires little extra computational overhead (simply feed the minutiae extracted by DeepPrint directly to the minutiae matcher, eliminating the need to extract minutiae a second time).

5 DeepPrint Search

Fingerprint search entails finding the top $k$ candidates, in a database (gallery or background) of $N$ fingerprints, for an input probe fingerprint. The simplest algorithm for obtaining the top $k$ candidates is to (i) compute a similarity measure between the probe template and every enrolled template in the database, (ii) sort the enrolled templates by their similarity to the probe, and (iii) select the top $k$ most similar enrollees. (In our search experiments, we reduce the typical sorting time from $O(N \log N)$ to $O(N \log k)$, where $k \ll N$, by maintaining a priority queue of size $k$, since we only care about the scores of the top $k$ candidates. In our experiments, this trick reduces the sorting time from 23 seconds to 8 seconds.) More formally, finding the top $k$ candidates $C_k$ in a gallery $G$ for a probe fingerprint $R_p$ is formulated as:

$$C_k(R_p) = \mathrm{Rank}_k\left(\{ s(R_p, R_g) : R_g \in G \}\right) \tag{12}$$

where $\mathrm{Rank}_k(\cdot)$ returns the $k$ most similar candidates from an input set of candidates and $s(\cdot)$ is a similarity function such as defined in Equation 11.
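The exhaustive search with a bounded priority queue can be sketched as follows. This is an illustrative implementation using Python's `heapq`, assuming the gallery is held as an N x 192 array of unit-length rows; the function name is hypothetical.

```python
import heapq
import numpy as np

def top_k_search(probe, gallery, k):
    """Return the k most similar gallery entries as (index, score), best first."""
    scores = gallery @ probe  # one dot product per enrolled template
    heap = []                 # min-heap holding the k best (score, index) pairs so far
    for idx, score in enumerate(scores):
        item = (float(score), idx)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            # the new score beats the weakest retained candidate
            heapq.heapreplace(heap, item)
    return [(idx, score) for score, idx in sorted(heap, reverse=True)]
```

Each of the N scores triggers at most one heap operation of cost O(log k), which gives the O(N log k) bound instead of the O(N log N) of a full sort.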

Since minutiae-based matching is computationally expensive, comparing the probe to every template enrolled in the database in a timely manner is not feasible with minutiae matchers. This has led to a number of schemes to either significantly reduce the search space, or utilize high-level features to quickly index top candidates [old_index1, old_index2, old_index3, old_index4, old_index5]. However, such methods have not achieved high levels of accuracy on public benchmark datasets such as NIST SD4 or NIST SD14.

In contrast to minutiae-matchers, the fixed-length, 200 byte DeepPrint representations can be matched extremely quickly using Equation 11. Therefore, large scale search with DeepPrint can be performed by exhaustive comparison of the probe template to every gallery template in accordance with Equation 12. The complexity of exhaustive search is linear with respect to both the gallery size $N$ and the dimensionality $d$ of the DeepPrint representation ($d = 192$ in this case).

Figure 9: Examples of poor quality fingerprint images from benchmark datasets. Row 1: Rolled fingerprint impressions from NIST SD4. Row 2: Slap fingerprint images from FVC 2004 DB1 A. Rolled fingerprints are often heavily smudged, making them challenging to accurately recognize. FVC 2004 DB1 A also has several distinct challenges such as small overlapping fingerprint area between two fingerprint images, heavy non-linear distortions, and extreme finger conditions (wet or dry). Minutiae annotated with COTS A.

5.1 Faster Search

Although exhaustive search can be effectively utilized with DeepPrint representations in conjunction with Equation 12, it may be desirable to decrease the search time even further. For example, when searching against 100 million fingerprints, the DeepPrint search time is still 11 seconds on an i9 processor with 64 GB of RAM. (The search time for the 100 million gallery was simulated by generating 100 million random representations, where each feature was a 32-bit float value drawn from a uniform distribution from 0 to 1.) A natural way to reduce the search time further with minimal loss to accuracy is to utilize an effective approximate nearest neighbor (ANN) algorithm.

Product Quantization is one such ANN algorithm which has been successfully utilized in large-scale face search [face6]. Product quantization is still an exhaustive search algorithm; however, representations are first compressed via keys to a lookup table, which significantly reduces the comparison time between two representations. In other words, product quantization reformulates the comparison function in Equation 11 to a series of lookup operations in a table stored in RAM. More formally, a given DeepPrint representation $R_g$ of dimensionality $d$ is first decomposed into $m$ sub-vectors as:

$$R_g = \{R_g^1, R_g^2, \ldots, R_g^m\} \tag{13}$$

Next, each sub-vector $R_g^i$ is mapped to a codeword $c_j^i$ in a codebook $C^i = \{c_j^i : j = 1, \ldots, z\}$, where $z$ is the size of the codebook. The index of each codeword can be represented as a binary code of $\log_2 z$ bits. Therefore, after mapping each sub-vector to its codeword, the original $d$-dimensional representation ($d = 192$ for DeepPrint) can be compressed to only $m \log_2 z$ bits.

The codewords for each codebook $C^i$ are computed offline (before search time) using k-means clustering for each sub-vector. Thus each codebook $C^i$ contains $z$ centroids computed from the corresponding sub-vectors $R_g^i$. Given all codebooks $\{C^1, \ldots, C^m\}$, the product quantizer of $R_g$ is computed as:

$$q(R_g) = \{q^1(R_g^1), q^2(R_g^2), \ldots, q^m(R_g^m)\} \tag{14}$$

where $q^i(R_g^i)$ is the index of the nearest centroid in the codebook $C^i$.

Finally, given a DeepPrint probe representation $R_p$ and the now quantized gallery template $q(R_g)$, a match score can be obtained in accordance with Equation 15:

$$s(R_p, q(R_g)) = \sum_{i=1}^{m} \left\langle R_p^i, \, c^i_{q^i(R_g^i)} \right\rangle \tag{15}$$

where $R_p^i$ is the $i$-th sub-vector of the probe and $c^i_{q^i(R_g^i)}$ is the codeword selected for the $i$-th gallery sub-vector. Thus matching a probe template to each quantized template in the gallery requires a one-time build up of an $m \times z$ table which is stored in RAM, followed by $m$ lookups and additions for each quantized template in the gallery. In our experiments, we set $m = 64$ and $z = 256$. A quantized template in the gallery is compressed to 64 bytes, and search is reduced from 192 additions and multiplications ($N$ times, where $N$ is the gallery size) to a one-time $m \times z$ table build up, followed by 64 lookups and additions for each gallery template (a significant savings on memory and search time). We used the Facebook Faiss PQ implementation: https://github.com/facebookresearch/faiss.
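To make the table-lookup idea concrete, here is a small self-contained NumPy sketch of product quantization. It is not the Faiss implementation used in the experiments: the k-means routine is a bare-bones Lloyd's iteration, all function names are hypothetical, and scoring by inner product (so that the result approximates Equation 11) is an assumption of this sketch.

```python
import numpy as np

def train_codebooks(train, m, z, iters=10, seed=0):
    """Learn one codebook of z centroids per sub-vector with plain k-means."""
    rng = np.random.default_rng(seed)
    n, d = train.shape
    sub = d // m
    codebooks = np.empty((m, z, sub))
    for i in range(m):
        x = train[:, i * sub:(i + 1) * sub]
        centroids = x[rng.choice(n, size=z, replace=False)].copy()
        for _ in range(iters):
            dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(axis=1)
            for j in range(z):
                members = x[assign == j]
                if len(members):  # keep the old centroid if a cluster empties
                    centroids[j] = members.mean(axis=0)
        codebooks[i] = centroids
    return codebooks

def encode(reps, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    m, z, sub = codebooks.shape
    assert z <= 256, "indices are stored as one byte each"
    codes = np.empty((reps.shape[0], m), dtype=np.uint8)
    for i in range(m):
        x = reps[:, i * sub:(i + 1) * sub]
        dist = ((x[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(-1)
        codes[:, i] = dist.argmin(axis=1)
    return codes

def pq_scores(probe, codes, codebooks):
    """Score every quantized gallery template with m table lookups and additions."""
    m, z, sub = codebooks.shape
    # one-time m x z table: inner product of each probe sub-vector with each centroid
    table = np.einsum('mzs,ms->mz', codebooks, probe.reshape(m, sub))
    return table[np.arange(m), codes].sum(axis=1)
```

With `m = 64` and `z = 256`, each row of `codes` occupies 64 bytes, matching the quantized template size stated above.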

5.2 Two-stage DeepPrint Search

In addition to increasing the speed of large-scale fingerprint search using DeepPrint with product quantization, we also propose a method whereby a negligible amount of search speed can be sacrificed in order to further improve the search accuracy. In particular, we first use the DeepPrint representations to find the top-$k$ candidates $C_k$ for a probe $R_p$ in a gallery $G$ (the value of $k$ depends on the gallery size $N$; for the gallery size of 1.1 million, we empirically selected $k = 500$). Then, the top-$k$ candidates are re-ranked using the scores of a minutiae-matcher fused together with the DeepPrint similarity scores. More formally, given a minutiae-matcher function $f(M_p, M_g)$, the re-ranked candidates can be computed by:

$$\mathrm{Sort}\left(\{ s(R_p, R_g) + f(M_p, M_g) : g \in C_k \}\right) \tag{16}$$

where $M_p$ and $M_g$ are two varying length minutiae templates, $R_p$ and $R_g$ are the two fixed-length DeepPrint templates, $s(\cdot)$ is the DeepPrint similarity score (either Equation 11 or Equation 15), and $\mathrm{Sort}(\cdot)$ returns a list of candidates sorted in descending order by similarity score.

We note that since DeepPrint already outputs a minutiae-map, which can easily be converted to a minutiae-set, fusing DeepPrint with a minutiae matcher is quite seamless. We simply convert the DeepPrint minutiae-maps to minutiae-sets, and subsequently input the minutiae-sets to a minutiae-matcher such as the open-source minutiae matcher in [cao].
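The two-stage procedure can be sketched as below. The `minutiae_score` callable is a hypothetical stand-in for any minutiae matcher (it takes a gallery index and returns a score on a scale comparable to the DeepPrint score), and fusing the two scores by a plain sum is an assumption of this sketch.

```python
import numpy as np

def two_stage_search(probe_rep, gallery_reps, k, minutiae_score):
    """Shortlist k candidates with DeepPrint scores, then re-rank by fused score."""
    scores = gallery_reps @ probe_rep    # stage 1: fast fixed-length matching
    shortlist = np.argsort(-scores)[:k]  # indices of the k best candidates
    fused = [(float(scores[g]) + minutiae_score(int(g)), int(g)) for g in shortlist]
    fused.sort(reverse=True)             # stage 2: descending fused score
    return [g for _, g in fused]
```

Only the k shortlisted templates are passed to the slower minutiae matcher, so its cost is independent of the gallery size.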

6 Secure DeepPrint Matching

One of the primary benefits of the fixed-length, 192-dimensional DeepPrint representation is that it can be encrypted and matched in the encrypted domain (with 192 bits of security [homomorphic]) with fully homomorphic encryption (FHE). In particular, FHE enables performing any number of both addition and multiplication operations in the encrypted domain. Since DeepPrint representations can be matched using only multiplication and addition operations (Eq. 11), they can be matched in the encrypted domain with minimal loss to system accuracy (the only loss in accuracy comes from converting floating point features to integer valued features, resulting in a loss of precision).

In contrast, minutiae-based representations cannot be matched under FHE, since the matching function cannot be reduced to simple addition and multiplication operations. Furthermore, existing encryption schemes for minutiae-based templates, such as the fuzzy vault, result in a loss of matching accuracy and are very sensitive to fingerprint pre-alignment [fuzzy]. We demonstrate in our experiments that the DeepPrint authentication performance remains almost unaltered following FHE matching. We utilize the Fan-Vercauteren FHE Scheme [homomorphic2] with improvements from [homomorphic] for improved speed and efficiency. We use the following open-source implementation: https://github.com/human-analysis/secure-face-matching.
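The precision loss mentioned above can be illustrated without any encryption library. The sketch below performs no encryption at all: it only converts two unit-length representations to integers and computes their match score with integer multiplications and additions, which are the operations an FHE scheme would evaluate on ciphertexts. The scale factor of 1000 and the function names are hypothetical.

```python
import numpy as np

SCALE = 1000  # hypothetical fixed-point scale factor

def to_fixed_point(rep, scale=SCALE):
    """Round each floating point feature to an integer multiple of 1/scale."""
    return np.rint(np.asarray(rep) * scale).astype(np.int64)

def integer_match_score(int_a, int_b, scale=SCALE):
    """Inner product using only integer multiplications and additions."""
    acc = 0
    for x, y in zip(int_a.tolist(), int_b.tolist()):
        acc += x * y
    return acc / (scale * scale)  # rescaling would happen after decryption
```

A larger scale factor reduces the rounding error at the cost of larger integers to encrypt.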

Algorithm | Template | NIST SD4¹ Rank-1 Accuracy (%) | NIST SD14² Rank-1 Accuracy (%) | Template Size Range (kB) | Search Time³
Inception v3 + COTS [index1] | Fixed-Length | 97.80 | N.A. | 8 | 175
Finger Patches [index4] | Fixed-Length | 99.27 | 99.04 | 1.0 | 16
DeepPrint | Fixed-Length | 98.70 | 99.22 | 0.2 | 11
COTS A | Minutiae-based⁴ | 99.55 | 99.92 | (1.5, 23.7) | 72
COTS B | Minutiae-based⁴ | 92.9 | 92.6 | (0.6, 5.3) | 20

  • ¹ Only 2,000 fingerprints are included in the gallery to enable comparison with previous works.
  • ² Last 2,700 pairs are used to enable comparison with previous works.
  • ³ Search times for all algorithms benchmarked on NIST SD4 with an Intel Core i9-7900X CPU @ 3.30GHz.
  • ⁴ We use the proprietary COTS templates which are comprised of minutiae together with other proprietary features.
  • These results primarily show that (i) DeepPrint is competitive with the best fixed-length representation in the literature [index4] (with a smaller template size) and state-of-the-art COTS, but also (ii) the benchmark dataset performances are saturated due to small gallery sizes. Therefore, in subsequent experiments we compare with state-of-the-art COTS against a background of 1.1 million.

Table IV: Benchmarking DeepPrint Search Accuracy against Fixed-Length Representations in the Literature and COTS

7 Datasets

We use four sources of data in our experiments. Our training data is a longitudinal dataset comprised of 455K rolled fingerprint images from 38,291 unique fingers taken from [longitudinal]. Our testing data is comprised of both large area rolled fingerprint images taken from NIST SD4 and NIST SD14 (similar to the training data) and small area slap fingerprint images from FVC 2004 DB1 A.

7.1 NIST SD4 & NIST SD14

The NIST SD4 and NIST SD14 databases are both comprised of rolled fingerprint images (Fig. 9). Due to the number of challenging fingerprint images contained in both datasets (even for commercial matchers), they continue to be popular benchmark datasets for automated fingerprint recognition algorithms. NIST SD4 is comprised of 2,000 unique fingerprint pairs (total of 4,000 images), evenly distributed across the 5 fingerprint types (arch, left loop, right loop, tented arch, and whorl). NIST SD14 is a much larger dataset comprised of 27,000 unique fingerprint pairs. However, in most papers published on fingerprint search, only the last 2,700 pairs from NIST SD14 are utilized for evaluation. To fairly compare DeepPrint with previous approaches, we also use the last 2,700 pairs of NIST SD14 for evaluation.

7.2 FVC 2004 DB1 A

The FVC 2004 DB1 A dataset is an extremely challenging benchmark dataset (even for commercial matchers) for several reasons: (i) small overlapping fingerprint area between fingerprint images from the same subject, (ii) heavy non-linear distortion, and (iii) extremely wet and dry fingers (Fig. 9). Another major motivation for selecting FVC 2004 DB1 A as a benchmark dataset is that it is comprised of slap fingerprint images. Because of this, we are able to demonstrate that even though DeepPrint was trained on rolled fingerprint images similar to NIST SD4 and NIST SD14, our incorporation of domain knowledge into the network architecture enables it to generalize well to slap fingerprint datasets.

8 COTS Matchers

In almost all of our experiments, we benchmark DeepPrint against COTS A and COTS B (Verifinger 10.0 or Innovatrics v7.2.1.40, the latest version of the SDK as of July, 2019). Due to our non-disclosure agreement, we cannot provide a link between the aliases COTS A and COTS B and Verifinger or Innovatrics. Both of these SDKs provide an ISO minutia-only template as well as a proprietary template comprised of minutiae and other features unknown to us. To obtain the best performance from each SDK, we extracted the more discriminative proprietary templates. We note that both Verifinger and Innovatrics are top performers in the NIST and FVC evaluations [nist, fvc].

9 Benchmark Evaluations

We begin our experiments by comparing the DeepPrint search performance to the state-of-the-art fixed-length representations reported in the literature. Then, we show that the DeepPrint representation can also be used for state-of-the-art authentication by benchmarking against two of the top COTS fingerprint matchers in the market. We further show that this authentication can be performed in the encrypted domain using fully homomorphic encryption. Finally, we conclude our experiments by benchmarking the large-scale search accuracy of the DeepPrint representation against the same two COTS search algorithms.

9.1 Search (1:N Comparison)

Our first experimental objective is to demonstrate that the fixed-length DeepPrint representation can compete with the best fixed-length representations reported in the academic literature [index1, index4] in terms of its search accuracy on popular benchmark datasets and protocols. In particular, we compute the Rank-1 search accuracy of the DeepPrint representation on both NIST SD4 and the last 2,700 pairs of NIST SD14 to follow the protocol of the earlier studies.

The results, reported in Table IV, indicate that the DeepPrint representation is competitive with the most accurate search algorithm previously published in [index4] (slightly lower performance on NIST SD4 and slightly higher on NIST SD14). However, we also note that the existing benchmarks (NIST SD4 and NIST SD14) for fingerprint search have now become saturated, making it difficult to showcase the differences between published approaches. Therefore, in subsequent experiments, we better demonstrate the efficacy of the DeepPrint representation by evaluating against a background of 1.1 million fingerprints (instead of the few thousand in existing benchmarks).

We highlight once again that DeepPrint has the smallest template among state-of-the-art fixed length representations (200 bytes vs 1,024 bytes for the next smallest).

The search performance on FVC 2004 DB1 A is not reported, since the background is not of sufficient size (only 700 slap prints) to provide any meaningful search results.

9.2 Authentication

We benchmark the authentication performance of DeepPrint against two state-of-the-art COTS minutiae-based matchers, namely COTS A and COTS B. We note that none of the more recent works on fixed-length fingerprint representation [index1, index2, index3, index4] have considered authentication performance, making it difficult for us to compare with these approaches (to the best of our knowledge, the code for these methods is not open-sourced).

  • Sum score fusion is used.

  • TAR @ FAR of 0.1% is reported since there are only 4,950 imposter pairs in the FVC protocol.

Table V: Authentication Accuracy (FVC 2004 DB1 A)
Algorithm | NIST SD4 (TAR @ FAR = 0.01%) | NIST SD14 (TAR @ FAR = 0.01%)
COTS A | 99.70 | 99.89
COTS B | 97.80 | 97.85
Cao et al. [cao]¹ | 96.75 | 95.96
DeepPrint | 97.90 | 98.55
DeepPrint + | 98.70 | 99.0

  • ¹ Minutiae extracted from the DeepPrint minutiae-map ($H$) and fed directly into the minutiae matcher proposed in [cao].

Table VI: Authentication Accuracy (Rolled-Fingerprints)

From the experimental results (Tables V and VI), we note that DeepPrint outperforms COTS B on all benchmark testing protocols. We further note that DeepPrint outperforms both COTS A and COTS B on the very challenging FVC 2004 DB1 A (Fig. 9). The ability of DeepPrint to surpass COTS A and COTS B on the FVC slap fingerprint dataset is a very exciting finding, given that the DeepPrint network was trained on rolled fingerprint images, which have very different textural characteristics than slap fingerprint impressions (Fig. 9). In comparison to rolled fingerprints, slap fingerprints often (i) require more severe alignment, (ii) can contain heavier non-linear distortion, and (iii) are much smaller with respect to impression area. We posit that our injection of domain knowledge (both alignment and minutiae detection) into the DeepPrint architecture helps it to generalize well from the rolled fingerprints it was trained on to the slap fingerprints comprising FVC 2004 DB1 A. We demonstrate this further in a later ablation study.
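For readers unfamiliar with the TAR @ FAR metric used in these tables, the sketch below shows one common way to compute it from genuine and impostor score lists. The function name and the thresholding convention (accept scores strictly above the threshold) are assumptions of this sketch, not a description of the evaluation code used here. It also shows why a FAR of 0.1% is reported for the FVC protocol: with 4,950 impostor pairs, a FAR of 0.01% would permit 0.495 false accepts, i.e. fewer than one.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far):
    """True accept rate at the threshold that permits at most far * n false accepts."""
    impostor = np.sort(np.asarray(impostor_scores))[::-1]  # descending
    allowed = int(np.floor(far * len(impostor)))           # false accepts permitted
    threshold = impostor[allowed]  # accept only scores strictly above this value
    return float(np.mean(np.asarray(genuine_scores) > threshold))

n_impostor = 4950
print(np.floor(0.001 * n_impostor))   # false accepts permitted at FAR = 0.1%
print(np.floor(0.0001 * n_impostor))  # false accepts permitted at FAR = 0.01%
```

When the permitted count rounds down to zero, the threshold is pinned to the single highest impostor score, which makes the operating point unreliable on such a small protocol.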

9.2.1 Fusion with Minutiae-Matchers

Another interesting result with respect to the DeepPrint authentication performance is that of the score distributions. In particular, we found that minutiae-based matchers COTS A and COTS B have very peaked imposter distributions near 0. Indeed, this is very typical of minutiae-matchers. In contrast, DeepPrint has a peaked genuine distribution around 1.0, and a much flatter imposter distribution. In other words, COTS is generally stronger at true rejects, while DeepPrint is stronger at true accepts. This complementary phenomenon motivated us to fuse DeepPrint with minutiae-based matchers to further improve their authentication performance (Table V). Indeed, our results (Table V) indicate that the DeepPrint representation does contain features complementary to minutiae-based matchers, given the improvement in authentication performance under score level fusion. We note that since a DeepPrint score can be computed with only 192 multiplications and 191 additions, it requires very little overhead for existing COTS matchers to integrate the DeepPrint representation into their matcher.

9.2.2 Secure Authentication

In addition to being competitive in authentication accuracy with state-of-the-art minutiae matchers, the fixed-length DeepPrint representation also offers the distinct advantage of matching in the encrypted domain (using FHE). Here we verify that the DeepPrint authentication accuracy remains intact following encryption. We also benchmark the authentication speed in the encrypted domain. Our empirical results (Table VII) demonstrate that the authentication accuracy remains nearly the same following FHE, and that authentication between a pair of templates takes only 1.26 milliseconds in the encrypted domain.

Algorithm | NIST SD4² | NIST SD14² | FVC 2004 DB1 A³
+ FHE¹

  • ¹ Fully homomorphic encryption is utilized (match time: 1.26 ms).
  • ² TAR @ FAR = 0.01%.
  • ³ TAR @ FAR = 0.1%.

Table VII: Encrypted¹ Authentication using DeepPrint Representation

10 Large Scale Search

Perhaps the most important attribute of the compact DeepPrint representation is its ability to perform extremely fast fingerprint search against large galleries. To adequately showcase this feature, we benchmark the DeepPrint search accuracy against COTS A and COTS B on a gallery of over 1.1 million rolled fingerprint images. The experimental results show that DeepPrint is able to obtain competitive search accuracy with the top COTS algorithm, at orders of magnitude faster speeds. Note, we are unable to benchmark other recent fixed-length representations in the literature against the large scale background, since code for these algorithms has not been open-sourced.

Figure 10: Closed-Set Identification Accuracy of DeepPrint (with and without Product Quantization (PQ)) on NIST SD4 and NIST SD14 (last 2,700 pairs) supplemented with a gallery of 1.1 Million. Rank-1 Identification accuracies are 95.15% and 94.44%, respectively. Search time is only 160 milliseconds. After adding product quantization, the search time is reduced to 51 milliseconds and the Rank-1 accuracies only drop to 94.8% and 94.2%, respectively.
Column headers: Search Accuracy (two columns), Search Time.

  • Search times benchmarked on an Intel Core i9-7900X CPU @ 3.30GHz
  • COTS only used for re-ranking the top 500 DeepPrint candidates.
  • COTS used to perform search against the entire 1.1 million gallery.

Table VIII: DeepPrint + Minutiae Re-ranking Search Accuracy (1.1 million background)
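Algorithm | NIST SD4 Rank-1 Search Accuracy (%) | NIST SD14 Rank-1 Search Accuracy (%) | Search Time (ms)
DeepPrint | 95.15 | 94.44 | 160
DeepPrint + PQ | 94.8 | 94.2 | 51

  • Search times benchmarked on an Intel Core i9-7900X CPU @ 3.30GHz

Table IX: DeepPrint + PQ: Search Accuracy (1.1 million background)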
Column headers: FVC 2004 DB1 A (FAR = 0.1%), FAR = 0.01%, FAR = 0.01%.

  • Each representation (96 bytes) is extracted from one branch in the DeepPrint architecture.
  • Scores from the minutiae representation are fused with the texture representation using sum score fusion.

Table X: DeepPrint Representation Comparison
Column headers: Metric, w/o all, with alignment, with alignment + domain knowledge. Row headers: FVC 2004 DB1 A (FAR = 0.1%), FAR = 0.1%, FAR = 0.1%.

Table XI: DeepPrint Ablation Study

10.1 DeepPrint Search

First, we show the search performance of DeepPrint using the simple exhaustive search technique previously described. In particular, we match a probe template to every template in the gallery, and select the $k$ candidates with the highest similarity scores. We use the NIST SD4 and NIST SD14 databases in conjunction with a gallery of 1.1 million rolled fingerprints. Under this exhaustive search scheme, the DeepPrint representation obtains Rank-1 identification accuracies of 95.15% and 94.44%, respectively (Table IX and Fig. 10). Notably, the search time is only 160 milliseconds. At Rank-100, the search accuracies for both datasets exceed 99%. In our subsequent experiments, we demonstrate how we can re-rank the top candidates to further improve the Rank-1 accuracy with minimal cost to the search time.

10.2 Minutiae Re-ranking

Using the open-source minutiae matcher proposed in [cao], COTS A, and COTS B, we re-rank the top-$k$ candidates retrieved by the DeepPrint representation to further improve the Rank-1 identification accuracy. Following this re-ranking, we obtain search accuracy competitive with the top COTS SDK, but at significantly faster speeds (Table VIII).

10.3 Product Quantization

We further improve the already fast search speed enabled by the DeepPrint representation by performing product quantization on the templates stored in the gallery. This reduces the DeepPrint template size to only 64 bytes and reduces the search time from 160 milliseconds to 51 milliseconds with only a marginal loss in search accuracy (Table IX and Fig. 10).

Figure 11: Illustration of DeepPrint interpretability. The first row shows three example fingerprints from NIST SD4 which act as inputs to DeepPrint. The second row shows which pixels the texture branch is focusing on as it extracts its feature representation. Singularity points are overlaid to show that the texture branch fixates primarily on regions surrounding the singularity points. The last row shows pixels which the minutiae branch focuses on as it extracts its feature representation. We overlay minutiae to show how the minutiae branch focuses primarily on regions surrounding minutiae points. Thus, each branch of DeepPrint extracts complementary features which comprise more accurate and interpretable fixed-length fingerprint representations than previously reported in the literature.

11 Ablation Study

Finally, we perform an ablation study to highlight the importance of (i) the automatic alignment module in the DeepPrint architecture and (ii) the minutiae-map domain knowledge added during training of the network. In our ablation study, we report the authentication performance of DeepPrint with/without the constituent modules.

We note that in all scenarios, the addition of domain knowledge improves authentication performance (Tables X and XI). This is especially true for the FVC 2004 DB1 A database, which is comprised of slap fingerprints with different characteristics (size, distortion, conditions) than the rolled fingerprints used for training DeepPrint. Thus we show how adding minutiae domain knowledge enables better generalization of DeepPrint to datasets which are very disparate from its training dataset. We note that alignment does not help in the case of NIST SD4 and NIST SD14 (since rolled fingerprints are already mostly aligned); however, it significantly improves the performance on FVC 2004 DB1 A, where fingerprint images are likely to be severely unaligned.

We also note that the minutiae-based representation and the texture-based representation from DeepPrint are indeed complementary, as evidenced by the improvement in accuracy when fusing the scores from both representations (Table X).

12 Interpretability

As a final experiment, we demonstrate the interpretability of the DeepPrint representation using the deconvolutional network proposed in [visualizing]. In particular, we show in Fig. 11 which pixels in an input fingerprint image are fixated upon by the DeepPrint network as it extracts a representation. From this figure, we make some interesting observations. In particular, we note that while the texture branch of the DeepPrint network seems to only focus on texture surrounding singularity points in the fingerprint (core points, deltas), the minutiae branch focuses on a larger portion of the fingerprint in areas where the density of minutiae points is high. This indicates to us that guiding the DeepPrint network with minutiae domain knowledge does indeed draw the attention of the network to minutiae points. Since both branches focus on complementary areas and features, the fusion of the representations improves the overall matching performance (Table X).

13 Conclusion

We have presented the design of a custom deep network architecture, called DeepPrint, capable of extracting highly discriminative fixed-length fingerprint representations (200 bytes) for both authentication (1:1 fingerprint comparison) and search (1:N fingerprint comparison). We showed how alignment and fingerprint domain knowledge could be added to the DeepPrint network architecture to significantly improve the discriminative power of its representations. Then, we benchmarked DeepPrint against two state-of-the-art COTS matchers on a gallery of 1.1 million fingerprints, and showed competitive search accuracy at significantly faster speeds (300 ms vs. 27,000 ms against a gallery of 1.1 million). We also showed how the DeepPrint representation could be used for matching in the encrypted domain via fully homomorphic encryption. We posit that the compact, fixed-length DeepPrint representation will significantly aid in large-scale fingerprint search. Among the three most popular biometric traits (face, fingerprint, and iris), fingerprint is the only modality for which no state-of-the-art fixed-length representation is available. This work aims to fill this void.