Minutiae are the premier features in most fingerprint matching systems. Minutiae extraction from rolled/slap fingerprints has been studied for many years and yields reliable results. However, accuracy degrades significantly on latent fingerprints because of fuzzy ridges and complex background noise. Latent fingerprints are obtained directly from crime scenes, and their minutiae are manually marked by experts. It is of great significance to obtain valuable information from the fingerprints left at the scene. Extensive research has been undertaken on latent fingerprints in various fields. Table 1 summarizes some recent studies on latent fingerprints.
Table 1. Recent studies on latent fingerprints.
|Task|Authors|Method|Database|
|---|---|---|---|
|Segmentation|Ruangsakul et al.|Rearranged Fourier subbands|NIST SD27|
|Segmentation|Choi et al.|Combined local ridge frequency and orientation|NIST SD27 and WVU|
|Orientation|Cao et al.|Convolutional neural network|NIST SD27|
|Enhancement|Cao et al.|Ridge structure dictionary|NIST SD27 and WVU|
|Extraction|Sankaran et al.|Stacked denoising sparse autoencoders| |
|Extraction|Tang et al.|Fully convolutional network|NIST SD27|
|Matching|Jain et al.|Local and global matching with extended features|NIST SD27|
|Matching|Paulino et al.|Descriptor-based Hough transform|NIST SD27 and WVU|
So far, minutiae extraction methods can be divided into two categories. The first is the traditional approach, which uses handcrafted features designed from domain knowledge. Ratha et al. followed the simple pipeline of ridge extraction, thinning and minutiae extraction. Gao et al. extracted minutiae from the Gabor phase, that is, after the fingerprints had been enhanced with Gabor filters to reduce the influence of creases and noise. In latent fingerprints, however, handcrafted features adapt poorly to complex background variation. The second is the deep learning approach, which learns features from data automatically. Sankaran et al. used stacked denoising sparse autoencoders to learn features that classify minutiae and non-minutiae patches. Tang et al. regarded minutiae extraction as an object detection task and extracted minutiae with a learned fully convolutional network. However, these methods do not consider fingerprint domain knowledge, such as the basic hypothesis that a fingerprint is a 2D amplitude- and frequency-modulated (AM-FM) signal. Transferring a network learned on natural images to fingerprints appears to limit their performance.
Our basic idea is to combine domain knowledge with the representation ability of deep learning. Some researchers have designed special structures based on domain knowledge in specific areas such as smoothing, denoising, inpainting and color interpolation. Liu et al. transformed infinite impulse response filters into recurrent neural networks and learned the weights with a deep convolutional neural network, achieving promising results with a simpler and faster network. Ren et al. demonstrated that translation-variant interpolation cannot be modeled by a single kernel because of its inherent spatially varying property, so they designed a Shepard interpolation layer as a translation-variant operation for inpainting.
In this paper, we propose a new way to guide a network's structure design and weight initialization by combining traditional methods with deep convolutional networks. We demonstrate that the minutiae extraction pipeline consisting of orientation estimation, segmentation, Gabor enhancement and extraction is equivalent to a simple network with fixed weights. Its representation ability is therefore limited, and it cannot learn the complex background noise of latent fingerprints. Naturally, the simple network is then expanded with additional convolutional layers to enhance its representation ability, and the weights are released so that complex backgrounds can be learned from data. The network designed specifically for fingerprints is called FingerNet. Thanks to this design, the mechanism of FingerNet can be understood, and typical fingerprint representations including orientation field, segmentation and enhancement can be acquired during minutiae extraction.
Because training labels for orientation and segmentation are lacking, weak labels are generated by matching latent fingerprints with their corresponding rolled/slap fingerprints. Fig. 1 shows a sample of orientation estimation, segmentation, enhancement and extraction on a latent fingerprint. We also obtain promising performance on good-quality fingerprints such as the FVC 2004 database.
The key contributions of this paper are as follows:
A new way to guide the deep network’s structure design and weight initialization to combine domain knowledge and the representation ability of deep learning, while preserving end-to-end differentiability.
A novel network for fingerprints called FingerNet is proposed. Typical fingerprint representations including orientation field, segmentation, enhancement and minutiae can be acquired from the unified network.
Reliable minutiae have been extracted on both rolled/slap and latent fingerprints automatically without any fine tuning.
A way to generate weak labels for latent fingerprints from the matched rolled/slap fingerprints, which helps to achieve modular training.
2 Proposed FingerNet for Minutiae Extraction
The basic idea is to build a specific network for fingerprints, which integrates the essence of traditional handcrafted methods (domain knowledge) and the representation ability of deep learning. We first transform some traditional methods to convolutional kernels and integrate them as a shallow network. This network is shallow and has fixed weights. The entire procedure integrating normalization, orientation estimation, segmentation, Gabor enhancement and minutiae extraction is visible in Fig. 2. Next, we discuss how to expand the plain network to a complete trainable network for fingerprints. We then describe weak label, loss definition and training procedure in detail.
2.1 Traditional Methods to Equivalent ConvNets
The traditional fingerprint minutiae extraction pipeline can be summarized as normalization, orientation estimation, segmentation, enhancement and minutiae extraction. Here, we transform several classical methods and construct a ConvNet, called plain FingerNet, as an example. The plain FingerNet pipeline and its connections can be seen in Fig. 2.
It should be noted that all the operators in this article are pixel-wise operators and differentiable.
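2.1.1 Normalization
One pixel-wise method adjusts the intensity value of each pixel to a common scale as
$$I'(x,y) = \begin{cases} M_0 + \sqrt{\dfrac{V_0\,(I(x,y)-M)^2}{V}}, & I(x,y) > M, \\ M_0 - \sqrt{\dfrac{V_0\,(I(x,y)-M)^2}{V}}, & \text{otherwise,} \end{cases}$$
where $I(x,y)$ is the intensity value at pixel $(x,y)$ in input image $I$, $M$ and $V$ are the image mean and variance, and $M_0$ and $V_0$ are the desired mean and variance after the normalization.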
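The pixel-wise normalization described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation; the function name `normalize` and the default target mean and variance are assumptions chosen for the example.

```python
import numpy as np

def normalize(image, desired_mean=0.0, desired_var=1.0):
    """Rescale every pixel so the image has the desired mean and variance."""
    image = np.asarray(image, dtype=np.float64)
    mean, var = image.mean(), image.var()
    if var == 0:
        # A flat image carries no contrast; map it to the desired mean.
        return np.full_like(image, desired_mean)
    # Deviation term sqrt(V0 * (I - M)^2 / V): added for pixels above the
    # image mean and subtracted for the others.
    deviation = np.sqrt(desired_var * (image - mean) ** 2 / var)
    return np.where(image > mean, desired_mean + deviation, desired_mean - deviation)
```

Because every step is an element-wise operation on the input and two global statistics, the operation is differentiable almost everywhere and can be placed at the front of a network.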
2.1.2 Orientation Estimation
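By replacing the gradient computation and the windowed summation with convolution operations, the gradient-based orientation estimation method for computing ridge orientation can be written as
$$\nabla_x = I * S_x, \qquad \nabla_y = I * S_y,$$
$$G_{xx} = \nabla_x^2 * J_w, \qquad G_{yy} = \nabla_y^2 * J_w, \qquad G_{xy} = (\nabla_x \nabla_y) * J_w,$$
$$\theta = 90^\circ + \tfrac{1}{2}\,\mathrm{atan2}\left(2G_{xy},\; G_{xx} - G_{yy}\right),$$
where $\nabla_x$ and $\nabla_y$ are the $x$ and $y$ gradients computed through Sobel masks $S_x$ and $S_y$, $*$ indicates a convolutional operator, products and squares are taken pixel-wise, $J_w$ is an all-ones matrix with size of $w \times w$, $\mathrm{atan2}(y, x)$ calculates the arc tangent of the two variables $y$ and $x$ with consideration of their quadrant, and $\theta$ is the output orientation field.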
It is actually a shallow ConvNet with 3 handcrafted kernels, a few merge layers and complex activation layers.
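To make the "three handcrafted kernels" concrete, the sketch below implements gradient-based orientation estimation with two Sobel kernels and one all-ones window, using only convolutions, pixel-wise products and an arc tangent. It is an illustrative NumPy version under stated assumptions: the helper `conv2d_same`, the window size of 5 and zero padding at the border are choices made for this example, not values taken from the paper.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv2d_same(image, kernel):
    """Zero-padded 2-D cross-correlation with 'same' output size (ConvNet convention)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh - 1 - kh // 2), (kw // 2, kw - 1 - kw // 2)))
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def ridge_orientation(image, window=5):
    """Ridge orientation in degrees, in [0, 180), from windowed gradient moments."""
    image = np.asarray(image, dtype=np.float64)
    gx = conv2d_same(image, SOBEL_X)          # kernel 1: x gradient
    gy = conv2d_same(image, SOBEL_Y)          # kernel 2: y gradient
    box = np.ones((window, window))           # kernel 3: all-ones window
    gxx = conv2d_same(gx * gx, box)
    gyy = conv2d_same(gy * gy, box)
    gxy = conv2d_same(gx * gy, box)
    # Doubled-angle average gives the dominant gradient direction; ridges run
    # perpendicular to it, hence the 90 degree shift.
    theta = 90.0 + 0.5 * np.degrees(np.arctan2(2.0 * gxy, gxx - gyy))
    return theta % 180.0
```

For an image whose intensity varies only along $x$ (vertical ridges), the estimate at interior pixels is 90 degrees. The sign convention for other angles depends on whether the image $y$ axis points up or down.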
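2.1.3 Segmentation
One learning-based segmentation method trains a linear classifier on handcrafted features such as gradient coherence, local mean and local variance. This method can be computed as
$$\mathrm{Coh} = \frac{\sqrt{(G_{xx}-G_{yy})^2 + 4G_{xy}^2}}{G_{xx}+G_{yy}}, \qquad \mathrm{Mean} = \frac{I * J_w}{w^2}, \qquad \mathrm{Var} = \frac{(I-\mathrm{Mean})^2 * J_w}{w^2},$$
$$S = \mathbf{k}^{\top}\,[\mathrm{Coh}, \mathrm{Mean}, \mathrm{Var}] + b,$$
where $w$ is the length of the local window, $\mathbf{k}$ and $b$ are the classifier's parameters and $[\cdot]$ indicates concatenation on the channel dimension.
It is also a shallow ConvNet, and it shares $G_{xx}$, $G_{yy}$ and $G_{xy}$ with the orientation estimation part as defined in Eq. 2.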
2.1.4 Gabor Enhancement
Gabor enhancement is widely used in fingerprint recognition systems because of its frequency-selection characteristic. The complex Gabor filter $g_{\omega,\theta}$ is generated from the local ridge frequency $\omega$ and the local ridge orientation $\theta$, and convolution is then conducted on the local fingerprint block. The enhanced complex block can be described as follows.
For each pixel $(x,y)$ in block $B$,
$$E(x,y) = (B * g_{\omega,\theta})(x,y) = A(x,y)\,e^{\mathrm{j}\phi(x,y)},$$
where $A$ and $\phi$ are the amplitude and phase of the enhanced complex block $E$. The phase $\phi$ is taken as the final enhanced result.
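The hardest part of transforming these operations is that Gabor filters do not share weights over the whole image, but only over image blocks with the same $\omega$ and $\theta$. To solve this problem, we propose a selective convolution method.
Firstly, the parameters $(\omega, \theta)$ are discretized into $N$ different intervals and the corresponding Gabor filters $g_{\omega_i,\theta_i}$ are generated. Then a group of filtered complex images can be obtained by convolving the image with these Gabor filters,
$$E_i(x,y) = (I * g_{\omega_i,\theta_i})(x,y) = A_i(x,y)\,e^{\mathrm{j}\phi_i(x,y)}, \qquad i = 1, \dots, N,$$
where $E_i(x,y)$ denotes the value at pixel $(x,y)$ in the $i$-th filtered complex image. The grouped phases $\phi_i$ are the arguments of the grouped filtered images.
A mask $M$ is generated to select the proper enhanced blocks from the grouped phases. The $i$-th value at pixel $(x,y)$ in the mask is defined as
$$M_i(x,y) = \begin{cases} 1, & \text{if } (\omega(x,y), \theta(x,y)) \text{ falls in the } i\text{-th interval}, \\ 0, & \text{otherwise.} \end{cases}$$
Finally, the enhanced map can be calculated as
$$\mathrm{Enh}(x,y) = \sum_{i=1}^{N} M_i(x,y)\,\phi_i(x,y).$$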
This selective convolution can still be classified as a kind of ConvNet, since all operations are differentiable.
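The selection step of selective convolution can be sketched as a one-hot mask multiplied with the group of phase maps and summed over the group axis. The example below is a simplified NumPy illustration that assumes the Gabor filtering has already produced the grouped phases and that only the orientation is discretized (a fixed ridge frequency); the function names are hypothetical.

```python
import numpy as np

def selection_mask(orientation_deg, n_bins):
    """One-hot mask of shape (n_bins, H, W): channel i is 1 where the local
    orientation falls into the i-th interval of [0, 180)."""
    orientation_deg = np.asarray(orientation_deg, dtype=np.float64) % 180.0
    bin_index = np.minimum((orientation_deg / (180.0 / n_bins)).astype(int), n_bins - 1)
    return (np.arange(n_bins)[:, None, None] == bin_index[None]).astype(np.float64)

def selective_enhance(grouped_phases, orientation_deg):
    """Pick, at every pixel, the phase of the filter matching the local orientation."""
    grouped_phases = np.asarray(grouped_phases, dtype=np.float64)
    mask = selection_mask(orientation_deg, grouped_phases.shape[0])
    # Multiply-and-sum keeps the selection expressible as ordinary layer operations.
    return (mask * grouped_phases).sum(axis=0)
```

All filters are applied to the whole image with shared weights, and the per-pixel choice is deferred to the mask, which is what allows the block-wise Gabor filtering to be written as a network.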
2.2 Expand to FingerNet
The plain FingerNet described in Section 2.1 achieves fair results on rolled/slap fingerprints, since it is designed for them. However, it fails on latent fingerprints. This is not because the properties of fingerprints change in latent images, but because the algorithms used to obtain those properties fail. The failure is caused by the mismatch between complex background noise and shallow ConvNet structures with poor expressive power.
Naturally, the simple network is then expanded with additional convolutional layers to enhance its representation ability, and the weights are released so that complex background variation can be learned from data. Because the weights are initialized from the simple network, the complete FingerNet should not perform worse than it.
The detailed architecture of FingerNet is shown in Fig. 3. Next we discuss how to expand the plain network.
2.2.1 Normalization
We directly adopt the pixel-wise normalization described in Section 2.1.1 as the first layer after image input.
2.2.2 Orientation Estimation
The deeper version of orientation estimation includes multi-scale feature extraction and orientation regression.
The basic feature extraction part has 3 conv-pooling blocks. Each conv-pooling block contains a few conv blocks followed by a pooling layer, and each conv block consists of a conv layer followed by a BatchNorm layer and a PReLU layer.
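After that, orientation regression is carried out in parallel on the feature maps of each scale, and the results are fused into the final estimate. Inspired by prior work, we let FingerNet directly predict the probabilities of $N$ discrete angles for each input pixel. The predicted angles at position $(x,y)$ can be represented as an $N$-dimensional vector $\mathbf{p}_{ori}$, where the $i$-th element $p_{ori}(i)$ indicates the probability that the ridge orientation at this position is $\theta_i = \lfloor 180/N \rfloor \cdot i$ degrees.
By doing so, we may get the final orientation output by either selecting the maximum response or averaging to a more robust estimate as
$$\theta_{\mathrm{max}} = \theta_{i^*}, \quad i^* = \arg\max_i\, p_{ori}(i), \qquad \theta_{\mathrm{ave}} = \tfrac{1}{2}\,\mathrm{atan2}\left(\bar{d}_{\sin},\, \bar{d}_{\cos}\right),$$
where $\bar{\mathbf{d}} = (\bar{d}_{\cos}, \bar{d}_{\sin})$ is the averaging ridge orientation vector and can be computed as
$$\bar{\mathbf{d}} = \frac{1}{N}\sum_{i} p_{ori}(i)\,\left(\cos 2\theta_i,\; \sin 2\theta_i\right).$$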
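The two ways of turning a per-pixel angle distribution into a single orientation can be compared with a short NumPy sketch. It assumes evenly spaced angle bins over [0, 180) and an input array of shape `(N, H, W)`; the function name and the bin layout are assumptions for illustration.

```python
import numpy as np

def decode_orientation(prob, average=True):
    """prob: (N, H, W) probabilities over N discrete ridge angles in [0, 180).
    Returns an (H, W) orientation map in degrees."""
    prob = np.asarray(prob, dtype=np.float64)
    n = prob.shape[0]
    angles = np.arange(n) * (180.0 / n)       # assumed bin angles in degrees
    if not average:
        return angles[prob.argmax(axis=0)]    # maximum response
    # Average in the doubled-angle domain so that 0 and 180 degrees coincide.
    doubled = np.radians(2.0 * angles)[:, None, None]
    d_cos = (prob * np.cos(doubled)).mean(axis=0)
    d_sin = (prob * np.sin(doubled)).mean(axis=0)
    return (0.5 * np.degrees(np.arctan2(d_sin, d_cos))) % 180.0
```

With four bins (0, 45, 90, 135 degrees) and equal mass on 0 and 135 degrees, the averaged estimate is 157.5 degrees rather than a naive arithmetic mean of 67.5 degrees, which shows why the doubled-angle vector is used.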
2.2.3 Segmentation
As mentioned in Section 2.1.3, learning-based segmentation shares some features with orientation estimation. Hence, for the deeper version, we let it share the entire multi-scale feature maps with the orientation estimation part.
As for the classifier, we use a multilayer perceptron to predict the probability that each input pixel belongs to the region of interest, and output a segmentation score map with size of $\frac{H}{8} \times \frac{W}{8}$ for an $H \times W$ input.
2.2.4 Gabor Enhancement
We directly adopt the Gabor enhancement method described in Section 2.1.4 as the enhancement part of FingerNet. Considering that ridge frequency in fingerprints is usually stable, we set the ridge frequency to a fixed value and discretize the ridge orientation into $N$ intervals.
Unlike in plain FingerNet, the orientation distribution map already acts as an orientation mask, so it is multiplied directly by the grouped phases. We simply upsample the orientation distribution map by a factor of 8 to fit the size of the enhanced map.
2.2.5 Minutiae Extraction
The minutiae extraction part takes the enhancement output together with the segmentation score map as input and applies 3 conv-pooling blocks for feature extraction. Then we generate 4 different maps for minutiae extraction.
The first map is the minutiae score map, which represents the probability that each position contains a minutia. Its size is $\frac{H}{8} \times \frac{W}{8}$ for an $H \times W$ input.
The second and third maps are the $X$ and $Y$ probability maps. Since we only predict the minutiae score map every 8 pixels, this position regression is essential for precise minutiae extraction. Inspired by prior work, we conduct an 8-way discrete location prediction for $X$ and $Y$ respectively at each input feature point.
The fourth map is the minutiae angle distribution map. It has exactly the same form as the orientation distribution map, but the max angle value is changed from 180° to 360°.
A minutiae list can be easily obtained by thresholding the minutiae score map with a proper value. The precise location is acquired by adding the offset, which is the argument of the maximum $X$ and $Y$ probability. The angle of minutiae is calculated using Eq. 11 or Eq. 12.
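The decoding of coarse score-map cells into pixel coordinates can be sketched as below. This is an illustrative NumPy version under stated assumptions: a stride of 8 pixels per cell, offset maps of shape `(8, H/8, W/8)`, and a hypothetical threshold of 0.5; angle decoding is omitted for brevity.

```python
import numpy as np

def decode_minutiae(score, x_prob, y_prob, threshold=0.5, stride=8):
    """Return (x, y, score) tuples for every cell whose score exceeds the threshold.

    score:  (H/8, W/8) minutiae score map
    x_prob: (8, H/8, W/8) probabilities of the 8 horizontal offsets inside a cell
    y_prob: (8, H/8, W/8) probabilities of the 8 vertical offsets inside a cell
    """
    score = np.asarray(score, dtype=np.float64)
    x_offset = np.asarray(x_prob).argmax(axis=0)   # most likely offset per cell
    y_offset = np.asarray(y_prob).argmax(axis=0)
    rows, cols = np.nonzero(score > threshold)
    return [(int(c * stride + x_offset[r, c]),
             int(r * stride + y_offset[r, c]),
             float(score[r, c]))
            for r, c in zip(rows, cols)]
```

For example, a cell at row 1, column 0 with most likely offsets 3 (horizontal) and 5 (vertical) decodes to pixel position (3, 13).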
Since predicted minutiae may cluster together, we use non-maximum suppression (NMS) to remove redundant minutiae.
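A simple greedy form of non-maximum suppression for point detections is sketched below. The text does not state the suppression criterion, so the distance-only rule and the `min_distance` parameter are assumptions made for illustration.

```python
import math

def non_max_suppression(minutiae, min_distance):
    """Greedy NMS on (x, y, score) tuples: keep the highest-scoring minutia
    and drop any later candidate closer than min_distance to a kept one."""
    kept = []
    for x, y, score in sorted(minutiae, key=lambda m: m[2], reverse=True):
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept
```

A practical variant might also compare minutiae angles before suppressing, so that two nearby minutiae with clearly different directions are both retained.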
2.3 Label, Loss and Training
2.3.1 Weak, Strong and Ground Truth Label
Little labeled data is available for fingerprint orientation fields or segmentation. Considering that most latent databases are matched with rolled/slap fingerprints, we form 3 kinds of labels with different confidence.
Weak orientation labels are generated from matched rolled/slap fingerprints. The matched pairs are from the same finger and share the same ridge structure. The aligned rolled/slap fingerprints’ orientation fields are fairly good estimations for corresponding latent fingerprints. We use minutiae to align fingerprint pairs and plain FingerNet to obtain rolled/slap fingerprints’ orientation fields.
Weak segmentation labels are generated from minutiae convex hulls. The dilated and smoothed minutiae convex hulls are used as weak segmentation labels.
Since unoriented minutiae directions are the same as corresponding orientation fields, we take unoriented minutiae directions manually marked as our strong orientation labels.
Ground truth labels are the 4 minutiae maps described in Section 2.2.5, which are transformed from the manually marked minutiae list.
To measure the distance between angles and handle the discontinuity around $0^\circ$, we use an inverted gaussian angle as the label. The label for position $(x,y)$ with angle $\theta$ can be computed as
$$p_l(i \mid x,y) = G_\sigma\left(\min\left(|\theta_i - \theta|,\; \Theta - |\theta_i - \theta|\right)\right),$$
where $G_\sigma(\cdot)$ is the probability value of a gaussian distribution with mean 0 and variance $\sigma^2$ at $\cdot$, and $\Theta$ is the max angle value, which is 180 for orientation and 360 for minutiae direction. With these labels, closer angles have a smaller cross entropy loss, and angles at the same directional distance have the same cross entropy loss.
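A soft angle label of this kind can be sketched as a Gaussian evaluated on the circular distance between each bin angle and the true angle. The function name, the evenly spaced bins and the default `sigma` below are assumptions for illustration.

```python
import numpy as np

def angle_label(theta, n_bins, max_angle=180.0, sigma=5.0):
    """Gaussian label over n_bins discrete angles, using wrap-around distance.

    max_angle is 180 for ridge orientation and 360 for minutiae direction.
    """
    bin_angles = np.arange(n_bins) * (max_angle / n_bins)
    diff = np.abs(bin_angles - theta % max_angle)
    circular = np.minimum(diff, max_angle - diff)   # distance across the wrap
    return np.exp(-0.5 * (circular / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
```

With 90 bins of 2 degrees and a true angle of 0 degrees, the bins at 2 degrees and 178 degrees receive the same label value, which is the wrap-around behaviour the label is designed to provide.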
2.3.2 Loss Definition and Training Procedure
The loss cluster is shown in Fig. 4. The total loss is a weighted sum of 9 different losses.
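As shown in Fig. 4, there are only 3 different types of loss. The weighted cross entropy loss is defined as
$$L_{ce} = -\frac{1}{|ROI|}\sum_{(x,y)\in ROI}\sum_{i}\Big[\lambda^{+}\,p_l(i \mid x,y)\log p(i \mid x,y) + \lambda^{-}\,\big(1-p_l(i \mid x,y)\big)\log\big(1-p(i \mid x,y)\big)\Big],$$
where $ROI$ is the region of interest, $\lambda^{+}$ and $\lambda^{-}$ are weights for positive and negative samples, and $p_l(i \mid x,y)$ and $p(i \mid x,y)$ are the probability values at $(x,y)$ in the label map and the predicted map respectively. Since positive and negative labels are always unbalanced, we use $\lambda^{+}$ and $\lambda^{-}$ to balance their loss contributions.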
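A class-balanced cross entropy restricted to a region of interest can be sketched in NumPy as follows. The array layout `(C, H, W)`, the function name and the clipping constant `eps` are assumptions for this example, and the default weights of 1.0 are placeholders rather than values from the paper.

```python
import numpy as np

def weighted_cross_entropy(label, pred, roi, w_pos=1.0, w_neg=1.0, eps=1e-7):
    """Mean weighted cross entropy over the pixels inside roi.

    label, pred: (C, H, W) label and predicted probabilities
    roi:         (H, W) boolean region of interest
    """
    label = np.asarray(label, dtype=np.float64)
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    roi = np.asarray(roi, dtype=bool)
    # Positive and negative terms get separate weights to offset class imbalance.
    per_pixel = -(w_pos * label * np.log(pred)
                  + w_neg * (1.0 - label) * np.log(1.0 - pred)).sum(axis=0)
    return per_pixel[roi].sum() / max(int(roi.sum()), 1)
```

Raising `w_pos` increases the penalty for missing rare positive cells, such as minutiae locations, relative to the many negative cells.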
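Orientation coherence is a strong domain prior, so we turn it into a loss function to constrain the orientation distribution map. It can be calculated as
$$\mathrm{Coh} = \frac{\sqrt{(\bar{d}_{\cos} * J_w)^2 + (\bar{d}_{\sin} * J_w)^2}}{|\bar{\mathbf{d}}| * J_w}, \qquad L_{coh} = \frac{|ROI|}{\sum_{ROI}\mathrm{Coh}} - 1,$$
where $J_w$ is an all-ones matrix with size of $w \times w$ and $\bar{\mathbf{d}} = (\bar{d}_{\cos}, \bar{d}_{\sin})$ is the orientation vector mentioned in Eq.15.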
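The idea of penalizing low orientation coherence can be illustrated with the NumPy sketch below. It computes local coherence as the length of the summed orientation vectors divided by the sum of their lengths, and uses one possible loss that is zero when coherence is perfect. The window size, the exact loss form, the `eps` constant and the function names are assumptions for illustration, not the authors' confirmed definition.

```python
import numpy as np

def box_sum(x, size=3):
    """Zero-padded sum over a size x size window (convolution with an all-ones kernel)."""
    pad = size // 2
    padded = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(size):
        for j in range(size):
            out += padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def coherence_loss(prob, roi, window=3, eps=1e-8):
    """prob: (N, H, W) orientation distribution; roi: (H, W) boolean mask."""
    prob = np.asarray(prob, dtype=np.float64)
    roi = np.asarray(roi, dtype=bool)
    n = prob.shape[0]
    doubled = np.radians(2.0 * np.arange(n) * (180.0 / n))[:, None, None]
    d_cos = (prob * np.cos(doubled)).mean(axis=0)
    d_sin = (prob * np.sin(doubled)).mean(axis=0)
    norm = np.hypot(d_cos, d_sin)
    # Coherence is 1 when all vectors in the window point the same way.
    coh = np.hypot(box_sum(d_cos, window), box_sum(d_sin, window)) / (box_sum(norm, window) + eps)
    return roi.sum() / (coh[roi].sum() + eps) - 1.0
```

A uniform orientation field gives a loss close to zero, while a checkerboard of perpendicular orientations gives a large loss, so minimizing it pushes the predicted field toward locally smooth orientations.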
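To make the segmentation smoother, with less noise and fewer outliers, we simply suppress its edge responses. The loss can be calculated as
$$L_{smooth} = \frac{1}{|I|}\sum_{(x,y)\in I}\left|\left(M_{ss} * K_{lap}\right)(x,y)\right|,$$
where $M_{ss}$ is the segmentation score map, $K_{lap}$ is a Laplace edge detection kernel and $I$ is the region of the whole image.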
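Suppressing edge responses in a segmentation score map can be sketched as the mean absolute response to a Laplace kernel. The 4-neighbour kernel and the replicated-border padding below are assumptions for this example, since the text does not specify them.

```python
import numpy as np

# Assumed 4-neighbour Laplace kernel; the text only says "a laplace edge detection kernel".
LAPLACE = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])

def smoothness_loss(seg_score):
    """Mean absolute Laplace response of a 2-D segmentation score map."""
    seg = np.asarray(seg_score, dtype=np.float64)
    padded = np.pad(seg, 1, mode="edge")   # replicate borders to avoid artificial edges
    h, w = seg.shape
    response = np.zeros_like(seg)
    for i in range(3):
        for j in range(3):
            response += LAPLACE[i, j] * padded[i:i + h, j:j + w]
    return np.abs(response).mean()
```

A constant map has zero loss, a clean step boundary has a small loss, and an isolated outlier pixel adds a strong response, so the term discourages speckled segmentations.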
After model construction and data preparation, we conduct a two-step training procedure. First, FingerNet learns ridge properties by training with the orientation and segmentation losses. After a few epochs, we add the minutiae losses. The idea is to let FingerNet learn step by step. The Adam optimizer is adopted, and other detailed parameter settings can be found in our open source FingerNet code.
3 Experiments
We compare minutiae extraction performance with other algorithms on fingerprint databases of different quality to test FingerNet's generalization ability. As can be seen from the following experiments, our unified FingerNet can compute a reliable orientation field, segmentation, enhanced fingerprint and minutiae without any fine-tuning.
3.1 Datasets
The training data was collected from crime scenes, including about 8000 pairs of matched rolled fingerprints and latent fingerprints. Each latent fingerprint is pixels in size and 500 pixels per inch (ppi) with expert marked minutiae. FingerNet is trained on this database and remains the same in the following experiments.
Our test experiments are conducted on NIST SD27 and FVC 2004 database set A. NIST SD27 contains 258 latent fingerprints with minutiae marked by experts. Each fingerprint is $800 \times 768$ pixels in size and 500 ppi. FVC 2004 database contains 3600 rolled fingerprints. These fingerprints are also 500 ppi but differ in image size.
3.2 Minutiae Extraction Performance
The performance of minutiae extraction is evaluated with Precision-Recall curve. Precision is defined as positive predictive value and recall is defined as true positive rate. An extracted minutia is assigned to be true if its distance to a manually labeled minutia is less than 15 pixels, and the angle between the two is less than . Furthermore, this is one to one match.
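The evaluation protocol described above (distance threshold, angle threshold and one-to-one assignment) can be sketched in plain Python. The greedy nearest-first assignment and the function names are assumptions for illustration; the angle threshold is left as a required argument because its value is not given here.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two directions in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def precision_recall(predicted, ground_truth, max_dist, max_angle):
    """predicted, ground_truth: lists of (x, y, angle_deg). Returns (precision, recall)."""
    candidates = []
    for i, (px, py, pa) in enumerate(predicted):
        for j, (gx, gy, ga) in enumerate(ground_truth):
            dist = math.hypot(px - gx, py - gy)
            if dist < max_dist and angle_diff(pa, ga) < max_angle:
                candidates.append((dist, i, j))
    # Greedy one-to-one assignment: closest pairs are matched first.
    used_pred, used_gt = set(), set()
    for _, i, j in sorted(candidates):
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
    true_pos = len(used_gt)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Sweeping the minutiae score threshold and recomputing these two numbers at each setting traces out a Precision-Recall curve.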
Fig. 5 compares the minutiae extraction performance with other methods on NIST SD27. MINDTCT is an open source minutiae extractor from the NIST Biometric Image Software. The Gabor-based algorithm extracts minutiae from the Gabor phase. The AutoEncoder-based algorithm extracts minutiae with a learned stacked denoising sparse autoencoder. The FCN-based algorithm extracts minutiae with a learned fully convolutional network. VeriFinger is a well-known commercial system used for minutiae extraction and fingerprint matching.
The mean errors of location and angle are 4.4 pixels and 5.0° respectively. For segmentation results compared with weak labels, the true positive rate is 0.88 and the true negative rate is 0.92. For orientation results compared with weak labels, the accuracy is 0.87 within . The recall cannot reach 1 because of segmentation and non-maximum suppression. About 0.6 seconds is needed on average to extract minutiae from a NIST SD27 image.
Fig. 6 compares the minutiae extraction performance on FVC 2004 database. The mean errors of location and angle are 3.4 pixels and 6.4° respectively.
3.3 Identification Performance
Fig. 7 shows Cumulative Match Characteristic curves on NIST SD27 to test whether fingerprint matching can benefit from FingerNet. The matching algorithm is based on extended clique models. Only minutiae are used in this method. The gallery contains about 40K fingerprints, including NIST SD27, NIST SD4 and our in-house database. The results show that FingerNet outperforms the other methods.
4 Conclusion and Future Work
We propose a new way to guide the deep network’s structure design and weight initialization for combining domain knowledge and deep learning representation ability. We demonstrate the pipeline consisting of several typical traditional methods is equivalent to a simple network with fixed weights. The network is then expanded and the weights are released while preserving end-to-end differentiability. Following this idea, FingerNet is proposed for efficient and reliable minutiae extraction on both rolled/slap and latent fingerprints. This algorithm has combined domain knowledge and deep learning method to outperform other minutiae extraction algorithms.
Future work will include (1) integrating ridge frequency into the pipeline, (2) exploring more accurate segmentation algorithms, and (3) extending FingerNet to matching.
This work was supported by NSFC(61333015).
-  A. M. Bazen and S. H. Gerez. Segmentation of fingerprint images. In ProRISC 2001 Workshop on Circuits, Systems and Signal Processing, pages 276–280. Citeseer, 2001.
-  S. Bernard, N. Boujemaa, D. Vitale, and C. Bricot. Fingerprint segmentation using the phase of multiscale gabor wavelets. In The 5th Asian Conference on Computer Vision, Melbourne, Australia. Citeseer, 2002.
-  K. Cao and A. K. Jain. Latent orientation field estimation via convolutional neural network. In Biometrics (ICB), 2015 International Conference on, pages 349–356. IEEE, 2015.
-  K. Cao, E. Liu, and A. K. Jain. Segmentation and enhancement of latent fingerprints: A coarse to fine ridgestructure dictionary. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 36(9):1847–1859, 2014.
-  L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915, 2016.
-  H. Choi, M. Boaventura, I. A. Boaventura, and A. K. Jain. Automatic segmentation of latent fingerprints. In Biometrics: Theory, Applications and Systems (BTAS), 2012 IEEE Fifth International Conference on, pages 303–310. IEEE, 2012.
-  X. Fu, C. Liu, J. Bian, J. Feng, H. Wang, and Z. Mao. Extended clique models: A new matching strategy for fingerprint recognition. In Biometrics (ICB), 2013 International Conference on, pages 1–6. IEEE, 2013.
-  X. Gao, X. Chen, J. Cao, Z. Deng, C. Liu, and J. Feng. A novel method of fingerprint minutiae extraction based on gabor phase. In Image Processing (ICIP), 2010 17th IEEE International Conference on, pages 3077–3080. IEEE, 2010.
-  M. D. Garris and R. M. McCabe. Nist special database 27: Fingerprint minutiae from latent and matching tenprint images. National Institute of Standards and Technology, Technical Report NISTIR, 6534, 2000.
-  S. Gidaris and N. Komodakis. Locnet: Improving localization accuracy for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 789–798, 2016.
-  I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
-  K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
-  L. Hong, Y. Wan, and A. Jain. Fingerprint image enhancement: Algorithm and performance evaluation. IEEE transactions on pattern analysis and machine intelligence, 20(8):777–789, 1998.
-  S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of The 32nd International Conference on Machine Learning, pages 448–456, 2015.
-  A. Jain, L. Hong, and R. Bolle. On-line fingerprint verification. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(4):302–314, 1997.
-  A. K. Jain and J. Feng. Latent fingerprint matching. IEEE Transactions on pattern analysis and machine intelligence, 33(1):88–100, 2011.
-  M. Kass and A. Witkin. Analyzing oriented patterns. Computer vision, graphics, and image processing, 37(3):362–385, 1987.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
-  K. G. Larkin and P. A. Fletcher. A coherent framework for fingerprint analysis: are fingerprints holograms? Optics Express, 15(14):8667–8677, 2007.
-  S. Liu, J. Pan, and M.-H. Yang. Learning recursive filters for low-level vision via a hybrid neural network. In European Conference on Computer Vision, pages 560–576. Springer, 2016.
-  D. Maio, D. Maltoni, R. Cappelli, J. Wayman, and A. Jain. Fvc2004: Third fingerprint verification competition. Biometric Authentication, pages 31–35, 2004.
-  D. Maltoni, D. Maio, A. Jain, and S. Prabhakar. Handbook of fingerprint recognition. Springer Science & Business Media, 2009.
-  M. Oliveira and N. J. Leite. A multiscale directional operator and morphological tools for reconnecting broken ridges in fingerprint images. Pattern Recognition, 41(1):367–377, 2008.
-  A. A. Paulino, J. Feng, and A. K. Jain. Latent fingerprint matching using descriptor-based hough transform. IEEE Transactions on Information Forensics and Security, 8(1):31–45, 2013.
-  N. K. Ratha, S. Chen, and A. K. Jain. Adaptive flow orientation-based feature extraction in fingerprint images. Pattern Recognition, 28(11):1657–1672, 1995.
-  J. S. Ren, L. Xu, Q. Yan, and W. Sun. Shepard convolutional neural networks. In Advances in Neural Information Processing Systems, pages 901–909, 2015.
-  P. Ruangsakul, V. Areekul, K. Phromsuthirak, and A. Rungchokanun. Latent fingerprints segmentation based on rearranged fourier subbands. In Biometrics (ICB), 2015 International Conference on, pages 371–378. IEEE, 2015.
-  A. Sankaran, P. Pandey, M. Vatsa, and R. Singh. On latent fingerprint minutiae extraction using stacked denoising sparse autoencoders. In Biometrics (IJCB), 2014 IEEE International Joint Conference on, pages 1–7. IEEE, 2014.
-  Y. Tang, F. Gao, and J. Feng. Latent fingerprint minutia extraction using fully convolutional network. arXiv preprint arXiv:1609.09850, 2016.
-  S. VeriFinger. Neuro technology (2010).
-  C. Watson and C. Wilson. Nist special database 4. Fingerprint Database, National Institute of Standards and Technology, 17:77, 1992.
-  C. I. Watson, M. D. Garris, E. Tabassi, C. L. Wilson, R. M. Mccabe, S. Janet, and K. Ko. User’s guide to nist biometric image software (nbis). 2007.