A Deep Learning based Framework to Detect and Recognize Humans using Contactless Palmprints in the Wild

by   Yang Liu, et al.

Contactless and online palmprint identification offers improved user convenience, hygiene and user security, and is highly desirable in a range of applications. This technical report details an accurate and generalizable deep learning-based framework to detect and recognize humans using contactless palmprint images in the wild. Our network is based on a fully convolutional network that generates deeply learned residual features. We design a soft-shifted triplet loss function to more effectively learn discriminative palmprint features. Online palmprint identification also requires a contactless palm detector, which is adapted and trained from the Faster R-CNN architecture, to detect the palmprint region under varying backgrounds. Our reproducible experimental results on publicly available contactless palmprint databases suggest that the proposed framework consistently outperforms several classical and state-of-the-art palmprint recognition methods. More importantly, the model presented in this report offers superior generalization capability, unlike other popular methods in the literature, as it does not require database-specific parameter tuning.




1 Introduction

Automated personal identification using palmprint images has been widely studied and employed for a range of law-enforcement and e-security applications. However, contactless palmprint identification is a relatively new area of research and offers a more attractive solution for deployments, as it can address serious concerns relating to hygiene while offering higher convenience and user security. In addition, contactless palmprint imaging also enables deformation-free acquisition of palmprint features, or the ground truth information, which can enable higher matching accuracy than contact-based imaging.

Despite the strong motivation and desire to develop contactless palmprint identification solutions, there are several challenges that need to be addressed by researchers. Firstly, the palmprint matching accuracy degrades for contactless images, as such palmprint images generally present higher imaging variations. Therefore more advanced matching techniques need to be developed to improve the matching accuracy for contactless palmprint images. Secondly, the detection of contactless palmprint images (region of interest) from the presented hands is quite challenging, as the background during such imaging is expected to be dynamic or less stable. Available research on contactless palmprint images addresses such challenges by acquiring the images with a fixed background, which enables key point detection using pixel-wise operators to segment the palmprint images. Deep learning capabilities offer enormous potential to address these two challenges and are considered in this report.

In recent years, deep learning has emerged as the dominant approach for a range of computer vision related problems and has delivered state-of-the-art performance for object detection [1, 2], face recognition [3, 4], iris recognition [5] and image classification. However, unlike for face recognition, almost no attention has been paid to incorporating the remarkable capabilities of deep learning into palmprint identification to achieve performance superior to that of popular or state-of-the-art palmprint recognition methods.

This report proposes a new, deep learning based, contactless palmprint identification framework which not only offers accurate matching capabilities but also exhibits outstanding generalization capabilities on different public databases. With the design of an effective residual feature network, our model can enlarge the receptive field [6] for matching contactless palmprint images and learn comprehensive palmprint features which generalize very well to other databases. We develop a soft-shifted triplet loss function to accommodate frequent contactless palmprint imaging variations and offer meaningful supervision for learning effective palmprint features from a limited number of training samples. We also introduce a contactless palm detector to automatically detect palm images from the presented hands under complex backgrounds; the design of such palm detectors is critical for the success of contactless palmprint identification during deployments.

The main contributions in this technical report can be summarized as follows: (a) We develop a new deep learning based contactless palmprint identification framework with high generalization capability for operating on different contactless palmprint databases that can represent diverse deployment scenarios. A new Soft-Shifted Triplet Loss (SSTL) function has been developed to successfully address the nature of contactless palmprint patterns for learning comprehensive palm features (please see more details in section 2.3). Our work therefore presents significant advances to bridge the gap between deep learning and the contactless palmprint matching techniques available today; (b) Under fair comparison, our approach consistently outperforms several state-of-the-art methods on publicly available contactless palmprint databases. Even under the challenging scenario without any parameter tuning on the target dataset, our model can still achieve superior or competitive performance over the state-of-the-art methods that have been subjected to extensive parameter tuning. This report also demonstrates how the Faster R-CNN [1] architecture can be adapted to build an online palm detector, which can robustly detect palm images from the presented hands under complex backgrounds. Such an advancement is highly desirable, given the current literature, for the success of online and contactless palmprint identification applications.

1.1 Related Work

Completely automated matching of contactless palmprint images has received a lot of attention and a range of palmprint matchers have been introduced in the literature. Detected or segmented palm images can be characterized by major/minor curved lines and creases that can be observed from low resolution ( dpi) images and additional flexion ridges [7] that are observed from high resolution ( dpi, not the focus of this work, as it is for [8]) images. Therefore a range of texture matching methods have been introduced in the literature [9, 10, 11, 12, 13, 14]. Encoding palmprint features using the dominant orientation of lines/creases, as in [15, 16], is one of the most effective methods for matching palmprint images. More recent work on matching contactless palmprint images appears in T-PAMI16 [12], where an ordinal measurement based descriptor, i.e., difference of normal (DoN), has been shown to outperform a range of methods introduced for matching contactless palmprint images using publicly available databases. This approach benefits from modeling the contactless palm image acquisition and introduces specialized masks to encode projective ordinal measurements. Therefore, this method has also been used to ascertain the effectiveness of the approach developed in this technical report; it serves as a reasonable choice, as to the best of our knowledge no other method has yet been shown to offer performance superior to [12].

Automated detection of palm images, or the region of interest from the hands presented by users, is inherently required for the success of automated palmprint identification systems. The most popular methods for palmprint detection extract key-points representing finger joints and then extract a fixed region of interest relative to the orientation and/or the distance [17] between the key points. This approach works very well for contact-based imaging setups but poses a range of problems for contactless palmprint images, as it is very difficult to robustly detect these key-points under the background changes which are inherent in contactless imaging, even with cooperative users attempting access. Therefore the contactless palmprint databases developed so far [18, 19, 20] (in the public domain) have been acquired using a relatively fixed or stable background, primarily to sidestep the open problem of detecting palm images under a user-friendly contactless imaging setup. More work on detecting palmprints under contactless imaging is highly desirable and is also considered in our work.

1.2 Open Problems and Challenges

Despite the promising performance indicated in the literature for matching palmprint images, conventional palmprint descriptors have several limitations. The summary of earlier work presented in [21] indicates that existing methods offer quite accurate performance, but this performance needs to be further improved (especially on large contactless databases, e.g. [20]) to meet expectations for a wide range of deployments. Conventional palmprint descriptors, such as CompCode [15], DoN [12], RLOC [16] or Ordinal [13], are based on empirical models which apply hand-crafted filters for the generation of features. Therefore these models rely heavily on parameter selection when they are applied to other/different databases or to those acquired under different imaging environments. This situation can also be observed from [12], where eight different combinations of parameters or 4 different databases are employed by extensive tuning. Commonly employed techniques in the palmprint literature [17, 22] for the automated detection of palm images, or the region of interest from the hands presented by users interested in accessing the system, often fail when the hand images are acquired under complex backgrounds. Such failure can be attributed to the nature of the algorithms, which rely on the detection of key-points using pixel-based operators that depend on differentiating the gray-levels of skin from those of the background.

Deep learning based approaches have the potential to address the limitations outlined above, since the parameters in deep neural networks are not empirically set but instead self-learned from the data, and deep learning architectures are known to offer high generalization capabilities. However, any direct application of such architectures, e.g. [23], is expected to deliver limited performance or cannot match the performance offered by state-of-the-art techniques such as those from [12]. This is due to the fact that new challenges emerge while incorporating typical deep learning architectures (e.g. CNN) for palmprint recognition, which can primarily be attributed to the nature of palmprint patterns. Unlike the face, palmprint patterns are known to reveal little structured information or meaningful hierarchies. Palmprint texture based methods are widely considered to be the more accurate methods in the literature [12, 13, 14, 15, 16, 21], and these mainly employ small-sized filters or block-based operators to extract palmprint features. Therefore, we can infer that the most discriminative information in palmprint patterns is extracted from the local intensity distributions in region of interest (palm) images rather than from global features (if any). CNNs are known to be effective in recovering features from the low level to the high level, and from local to global, due to the combination of convolutional and fully connected layers [24]. However, as outlined earlier, the high level and global features extracted from such networks may not be optimal for the accurate matching of palmprint patterns.

This report attempts to develop a more accurate and robust deep learning based palmprint feature representation framework and makes significant contributions towards fully discovering the potential of deep learning for contactless palmprint identification. Such objectives are yet to be pursued in the literature. Different from [4, 5, 23], this technical report develops a novel deep network and a customized loss function, which are highly optimized to extract discriminative palmprint features and have been comparatively evaluated against several state-of-the-art methods using multiple public contactless palmprint databases.

2 Matching Contactless Palmprint Images

2.1 Network Architecture

We develop a highly optimized deep learning architecture, referred to as residual feature network (RFN) in this report, for accurately matching contactless palmprint images. Different from the residual network [4], RFN does not have fully connected layers, which results in pure feature map outputs (Figure 1.(a)) that can preserve spatial correspondences with palmprint images. As illustrated in Figure 1.(b), we replace all of the batch normalization layers [25] with instance normalization [26]. Our key motivation is to enhance the robustness of RFN in learning low/mid/high level features [27], as contactless palmprint images present significant variations not just due to deformations [28] but also due to pose and illumination changes [20].

Figure 1: Left: Detailed architecture of residual feature network for contactless palmprint matching. The RFN generates a single-channel feature map for each of the input images. The first and the second convolutional layer down-sample the input which results in the feature map that is of one quarter the size of input; Right: Training the RFN using a triplet-based network configuration.
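To make the motivation for replacing batch normalization with instance normalization more concrete, the following is a minimal NumPy sketch and not the RFN implementation. It assumes feature maps of shape (batch, channels, height, width), omits the learnable scale and offset that normalization layers usually carry, and uses hypothetical function names. It illustrates that instance normalization removes a per-image gain and offset (a crude stand-in for an illumination change), whereas batch statistics pooled across images do not.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) map over its own spatial extent.

    x has shape (batch, channels, height, width).
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Normalize each channel with statistics pooled over the whole batch."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
maps = rng.normal(size=(2, 3, 8, 8))
# Assumed perturbation: a gain and offset applied to the second image only.
maps[1] = 3.0 * maps[1] + 5.0

inst = instance_norm(maps)  # each map ends up zero-mean, unit-variance
bat = batch_norm(maps)      # the second image keeps a visible offset
```

In this toy setting the instance-normalized second image is essentially identical to what it would have been without the gain and offset, which is the kind of per-image robustness the text above is after.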

2.2 Network Training

The convolutional kernels of RFN were trained using a triplet network [3]. As shown in Figure 1, this triplet network consists of three identical RFNs whose weights are kept identical during the training. These RFNs are connected in parallel to enable the forward and backward propagation of the data and gradients for the anchor, positive and negative samples respectively. The triplet loss function in such an architecture is expected to help the network learn to generate feature maps that reduce the anchor-positive distances while increasing the anchor-negative distances. Learning feature maps for the accurate matching of contactless palmprint images requires the network loss to accommodate frequent intra-class changes in the contactless palmprint images. We therefore soften the matching loss and improve the original loss function to accommodate frequent translational changes in the contactless palmprint images from the same class/subject. This new loss function is referred to as Soft-Shifted Triplet Loss (SSTL) and is detailed in section 2.3.

2.3 Soft-Shifted Triplet Loss Function

The triplet networks [3] have been conventionally trained using the original loss function which can be written as follows:

$$L = \frac{1}{N}\sum_{i=1}^{N}\Big[\,\|f(x_i^A) - f(x_i^P)\|^2 - \|f(x_i^A) - f(x_i^N)\|^2 + \alpha\,\Big]_+ \tag{1}$$

where the function $f(\cdot)$ represents the embedding of the input image into a high dimensional feature space, $N$ is the number of triplet samples in a mini-batch, and $f(x_i^A)$, $f(x_i^P)$ and $f(x_i^N)$ are the feature representations of the anchor, positive and negative image samples in the $i$-th triplet respectively. The symbol $[\cdot]_+$ is equivalent to $\max(\cdot, 0)$. $\alpha$ is a preset parameter to control the desired margin between the anchor-positive and anchor-negative distances. For simplicity, we denote the three feature maps from the inputs $x_i^A$, $x_i^P$, $x_i^N$ as $f^A$, $f^P$, $f^N$ respectively, omitting the triplet index $i$.

Accurate matching of contactless palmprints requires us to match the segmented palmprint images, which generally depict high translational changes along the two axes. In order to accommodate such translations, we alter the original triplet loss function, and this new loss function is referred to as the Soft-Shifted Triplet Loss (SSTL):

$$L_{SSTL} = \frac{1}{N}\sum_{i=1}^{N}\Big[\,D(f^A, f^P) - D(f^A, f^N) + \alpha\,\Big]_+ \tag{2}$$

where $D(\cdot,\cdot)$ represents the Minimum Shifted Loss (MSL). Any loss function used to train the network should be differentiable along the shift directions, and this is detailed in the following.
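As a point of reference for the original triplet loss discussed above, here is a minimal NumPy sketch. It is an illustration under stated assumptions rather than the training code: the function name is hypothetical, the loss is averaged over the mini-batch (averaging versus summing is an assumption), and the default margin of 0.2 is the value reported later in this report.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Batch-averaged hinge triplet loss on squared Euclidean distances.

    f_a, f_p, f_n: arrays of shape (N, ...) holding the anchor, positive
    and negative features of N triplets. alpha is the margin.
    """
    n = f_a.shape[0]
    d_ap = ((f_a - f_p) ** 2).reshape(n, -1).sum(axis=1)
    d_an = ((f_a - f_n) ** 2).reshape(n, -1).sum(axis=1)
    # Hinge: a triplet contributes only while d_ap + alpha exceeds d_an.
    return np.maximum(d_ap - d_an + alpha, 0.0).mean()

anchor = np.zeros((1, 2, 2))
close = anchor + 0.1   # squared distance to the anchor: 4 * 0.01 = 0.04
far = anchor + 1.0     # squared distance to the anchor: 4 * 1.0 = 4.0

satisfied = triplet_loss(anchor, close, far)  # margin already met
violated = triplet_loss(anchor, far, close)   # positive farther than negative
```

A triplet whose positive is already closer than its negative by more than the margin contributes nothing, which is why triplet selection matters during training.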

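Let us denote the width and height of the feature map from RFN by $W$ and $H$ respectively. We use $w$ and $h$ to define the extent of the maximum expected spatial shifts along the horizontal and vertical directions. The MSL is defined to accommodate frequent translational shifts in the input or segmented contactless palmprint images, as follows:

$$D(f_1, f_2) = \min_{-w \le x \le w,\; -h \le y \le h} S_{x,y}(f_1, f_2) \tag{3}$$

$$S_{x,y}(f_1, f_2) = \frac{1}{|C_{x,y}|}\sum_{(i,j) \in C_{x,y}} \big(f_1^{(x,y)}[i,j] - f_2[i,j]\big)^2 \tag{4}$$

where $C_{x,y}$ represents the common region between the two matched feature maps with valid (non-zero) values for each of the $(x, y)$ combinations, while $i$ and $j$ denote the spatial coordinates. The MSL in (3) attempts to compute the minimum distance between the two feature maps that can be achieved after translation by $x$ and $y$ pixels along the horizontal and vertical directions respectively. The superscript $(x,y)$ in (4) denotes such a translational operation on the feature map, and the resulting shifted feature map has the following spatial correspondence with the original one:

$$f^{(x,y)}[i,j] = f[i+x,\; j+y], \quad 1 \le i+x \le W,\; 1 \le j+y \le H \tag{5}$$

$$f^{(x,y)}[i,j] = 0, \quad \text{otherwise} \tag{6}$$

That is, $f^{(x,y)}$ is obtained by shifting the feature values to the left (horizontal translation) in a step of $x$ and upward (vertical translation) in a step of $y$. As illustrated in (6), the void generated by the translation of the feature map values is automatically assigned zeros. The training of RFN requires us to compute the gradients (or partial derivatives) of the soft-shifted triplet loss $L_{SSTL}$ with respect to the anchor, positive and negative feature maps $f^A$, $f^P$ and $f^N$. The resulting loss is back-propagated iteratively during the network training. Let us firstly consider the loss between the feature map from the anchor and its respective positive feature map and compute its derivative for one sample triplet in the batch:

$$\frac{\partial L_{SSTL}}{\partial f^P} = \frac{1}{N}\,\frac{\partial \big[D(f^A, f^P) - D(f^A, f^N) + \alpha\big]_+}{\partial f^P} \tag{7}$$

Since $\partial D(f^A, f^N) / \partial f^P = 0$, the above equation can be further simplified as

$$\frac{\partial L_{SSTL}}{\partial f^P} = \begin{cases} 0, & D(f^A, f^P) - D(f^A, f^N) + \alpha \le 0 \\[4pt] \dfrac{1}{N}\,\dfrac{\partial D(f^A, f^P)}{\partial f^P}, & \text{otherwise} \end{cases} \tag{8}$$

Let us firstly define the shifting offsets for the anchor-positive and anchor-negative image pairs that meet the requirements for MSL as follows:

$$(x_{AP}, y_{AP}) = \arg\min_{-w \le x \le w,\; -h \le y \le h} S_{x,y}(f^A, f^P), \qquad (x_{AN}, y_{AN}) = \arg\min_{-w \le x \le w,\; -h \le y \le h} S_{x,y}(f^A, f^N) \tag{9}$$

The gradient of the distance in (8) can be computed from the following pixel-wise derivatives using (3) and (4):

$$\frac{\partial D(f^A, f^P)}{\partial f^P[i,j]} = \begin{cases} -\dfrac{2}{|C_{x_{AP},y_{AP}}|}\Big((f^A)^{(x_{AP},y_{AP})}[i,j] - f^P[i,j]\Big), & (i,j) \in C_{x_{AP},y_{AP}} \\[4pt] 0, & \text{otherwise} \end{cases} \tag{10}$$

The partial derivative of SSTL with respect to the positive feature map can be computed as follows:

$$\frac{\partial L_{SSTL}}{\partial f^P[i,j]} = \begin{cases} -\dfrac{2}{N\,|C_{x_{AP},y_{AP}}|}\Big((f^A)^{(x_{AP},y_{AP})}[i,j] - f^P[i,j]\Big), & (i,j) \in C_{x_{AP},y_{AP}} \text{ and } D(f^A, f^P) - D(f^A, f^N) + \alpha > 0 \\[4pt] 0, & \text{otherwise} \end{cases} \tag{11}$$

We can similarly compute the required partial derivatives with respect to the negative feature map:

$$\frac{\partial L_{SSTL}}{\partial f^N[i,j]} = \begin{cases} \dfrac{2}{N\,|C_{x_{AN},y_{AN}}|}\Big((f^A)^{(x_{AN},y_{AN})}[i,j] - f^N[i,j]\Big), & (i,j) \in C_{x_{AN},y_{AN}} \text{ and } D(f^A, f^P) - D(f^A, f^N) + \alpha > 0 \\[4pt] 0, & \text{otherwise} \end{cases} \tag{12}$$

Our final requirement is to compute the partial derivatives for the feature map from the anchor. It can be observed from (3)-(6) that the shifting or translation of the first map towards the left by $x$ pixels and towards the top by $y$ pixels is equivalent to shifting the second map towards the right by $x$ pixels and towards the bottom by $y$ pixels. We can therefore rewrite (4) as follows:

$$S_{x,y}(f_1, f_2) = \frac{1}{|C_{-x,-y}|}\sum_{(i,j) \in C_{-x,-y}} \big(f_1[i,j] - f_2^{(-x,-y)}[i,j]\big)^2 \tag{13}$$

It is now quite straightforward to compute the partial derivative for the anchor feature map using (7)-(10) and (13). When $D(f^A, f^P) - D(f^A, f^N) + \alpha > 0$,

$$\frac{\partial L_{SSTL}}{\partial f^A[i,j]} = \frac{1}{N}\left(\frac{\partial D(f^A, f^P)}{\partial f^A[i,j]} - \frac{\partial D(f^A, f^N)}{\partial f^A[i,j]}\right), \qquad \frac{\partial D(f^A, f^P)}{\partial f^A[i,j]} = \begin{cases} \dfrac{2}{|C_{-x_{AP},-y_{AP}}|}\Big(f^A[i,j] - (f^P)^{(-x_{AP},-y_{AP})}[i,j]\Big), & (i,j) \in C_{-x_{AP},-y_{AP}} \\[4pt] 0, & \text{otherwise} \end{cases} \tag{14}$$

with the anchor-negative term obtained by replacing $P$ with $N$; the derivative is zero when $D(f^A, f^P) - D(f^A, f^N) + \alpha \le 0$.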

The rest of the back-propagation process is the same as for a common end-to-end convolutional network. The above derivation shows that, during the matching of feature maps from translated palmprint images, only the gradients that lie in the overlapped regions are back-propagated. This allows more accurate matching of feature maps from contactless palmprint images that are not strictly aligned. The network is trained using SSTL, while the MSL is used during the test or evaluation phase.
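The forward computation of the MSL and SSTL described in this section can be sketched in NumPy as follows. This is a hedged illustration and not the released implementation: the function names are hypothetical, the distance is assumed to be the mean squared difference over the overlapping region, the "valid" region is taken to be the positions not zero-filled by the shift, and the maximum shift of 5 and margin of 0.2 are the values reported later in this report.

```python
import numpy as np

def shift(f, x, y):
    """Shift a 2-D map left by x and up by y pixels, zero-filling the void.

    Negative x or y shift right or down. Also returns a boolean mask of
    the positions that received a real (non-padded) value.
    """
    h, w = f.shape
    out = np.zeros_like(f)
    valid = np.zeros((h, w), dtype=bool)
    # Destination rows/columns whose source pixel lies inside the map.
    r0 = max(0, -y)
    r1 = max(r0, min(h, h - y))
    c0 = max(0, -x)
    c1 = max(c0, min(w, w - x))
    out[r0:r1, c0:c1] = f[r0 + y:r1 + y, c0 + x:c1 + x]
    valid[r0:r1, c0:c1] = True
    return out, valid

def min_shifted_loss(f1, f2, max_x=5, max_y=5):
    """Smallest mean squared difference over the overlap, across shifts."""
    best, best_shift = np.inf, (0, 0)
    for y in range(-max_y, max_y + 1):
        for x in range(-max_x, max_x + 1):
            shifted, valid = shift(f1, x, y)
            if not valid.any():
                continue
            d = np.mean((shifted[valid] - f2[valid]) ** 2)
            if d < best:
                best, best_shift = d, (x, y)
    return best, best_shift

def sstl(f_a, f_p, f_n, alpha=0.2, max_x=5, max_y=5):
    """Soft-shifted triplet loss for one triplet of single-channel maps."""
    d_ap, _ = min_shifted_loss(f_a, f_p, max_x, max_y)
    d_an, _ = min_shifted_loss(f_a, f_n, max_x, max_y)
    return max(d_ap - d_an + alpha, 0.0)

rng = np.random.default_rng(0)
anchor = rng.normal(size=(32, 32))
positive, _ = shift(anchor, 3, 2)      # same pattern, translated
negative = rng.normal(size=(32, 32))   # unrelated pattern

d_ap, offset = min_shifted_loss(anchor, positive)
plain_mse = np.mean((anchor - positive) ** 2)
loss = sstl(anchor, positive, negative)
```

In this synthetic example the plain pixel-wise distance between the anchor and its translated copy is large, while the minimum shifted distance recovers the translation and drops to zero, so the triplet no longer incurs any loss.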

3 Experiments and Results

We performed thorough experiments using publicly available databases to ascertain various aspects of the performance of our approach. In the following sections, we detail the experimental protocols, along with the reproducible results [29], employed for the extensive evaluation of the model proposed in this report.

Figure 2: The ROC curves (Left) and CMC curves (Right) of different methods from the IITD Right contactless palmprint database.

Our experiments are firstly organized to ascertain the within-database performance (WithinDB), which uses one part of a database for training and another, independent part of the same database for the performance evaluation. However, cross-database performance evaluation (CrossDB) is highly desirable to address the limitations of currently available palmprint recognition methods in the literature. Therefore CrossDB performance evaluation results are also presented in this report. These use a network that is trained on some part of one publicly available database, while the test performance is reported on another, independent publicly available database with the respective protocols that have been used in the literature (to ensure fairness in the performance comparison). It should be noted that for both WithinDB and CrossDB configurations, the training set and test set are totally separated, i.e., no palmprint images overlap between the training set and the test set. Since our focus is more on extensive CrossDB performance evaluation, we incorporated the largest database, acquired from 600 different subjects, for this task as detailed in the next section.

| Method | IITD Accuracy (%) | IITD EER (%) | PolyU-IITD (600 subjects) Accuracy (%) | PolyU-IITD (600 subjects) EER (%) | CASIA EER (%) |
|---|---|---|---|---|---|
| DoN (TPAMI16) | 99.15 | 0.68 | 98.3 | 0.329 | 0.53 |
| RLOC | 99.00 | 0.88 | 98.45 | 0.557 | 1.0 |
| Competitive Code | 99.85 | 1.0 | 98.45 | 0.435 | 0.76 |
| Ordinal Code | 98.92 | 1.25 | 98.48 | 0.451 | 0.79 |
| Ours-CrossDB | / | / | 98.6 | 0.267 | 0.51 |
| Ours-WithinDB | 99.20 | 0.60 | 98.7 | 0.153 | / |

Table 1: Summary of accuracy (average rank-one recognition rate) and equal error rate (EER) on three different contactless palmprint databases.

During our CrossDB performance evaluation, all of the test configurations use IITD Left [18] (all left hand palmprint images in this dataset) as the training set. The same trained model is also evaluated on IITD Right (all the right hand palmprint images), which indicates the WithinDB performance. For this WithinDB configuration, we used the left palmprint images for training and the right palmprint images for the test or performance evaluation, as this allows a fair comparison with the respective results from the more recent DoN approach in TPAMI16 [12], which has been shown to outperform several state-of-the-art methods. The model trained using the IITD left hand palmprint images is used for the CrossDB performance evaluation on the largest database made available from [20] and also on the CASIA contactless palmprint image database from [19]. Thus the CrossDB performance evaluation can illustrate the generalization capability of the proposed model when the training samples are drawn from other databases. The WithinDB performance evaluation using [20], in addition to the results from IITD [18], is also presented for comparative performance evaluation.

Figure 3: (a) Comparative ROC and (b) corresponding CMC for WithinDB and CrossDB tests (600 subjects). (c) The ROC for CrossDB evaluation for CASIA palmprint database [19] and other methods using respective/best parameters.

3.1 Databases and Protocols

IITD Palmprint Database. The IITD touchless palmprint database [18] provides contactless palmprint images from the right and left hands of 230 subjects. There are 5 samples for each right hand or left hand. This database also provides pixels segmented palmprint images. In our experiments, the left hand palmprint images are used to train our model detailed in section 2 and all the 1300 right hand palmprint images are used for the performance evaluation. This protocol for test performance evaluation is exactly the same as in [12] and results in 1150 genuine matches and 263,350 imposter matches. The comparative performance using ROC, CMC (Figure 2) and EER (Table 1) is presented to ascertain the performance. The ROC, EER and average rank-one recognition rate achieved by our approach indicate outperforming results.
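For readers less familiar with the EER figures reported in Table 1, the sketch below shows one common way to estimate an equal error rate from genuine and imposter match scores. It is an illustration only, with a hypothetical function name; it assumes distance-like scores (smaller means more similar, as with the MSL), and the synthetic score distributions are invented purely for demonstration and are not results from this report.

```python
import numpy as np

def equal_error_rate(genuine, imposter):
    """Approximate EER for distance scores (smaller means more similar)."""
    genuine = np.asarray(genuine, dtype=float)
    imposter = np.asarray(imposter, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, imposter]))
    # FRR: genuine pairs rejected because their distance exceeds t.
    frr = np.array([(genuine > t).mean() for t in thresholds])
    # FAR: imposter pairs accepted because their distance is at most t.
    far = np.array([(imposter <= t).mean() for t in thresholds])
    k = np.argmin(np.abs(far - frr))
    return (far[k] + frr[k]) / 2.0, thresholds[k]

# Synthetic, made-up scores for illustration only.
rng = np.random.default_rng(0)
genuine_scores = rng.normal(0.3, 0.05, size=1000)
imposter_scores = rng.normal(0.6, 0.05, size=1000)
eer, threshold = equal_error_rate(genuine_scores, imposter_scores)

# Perfectly separated scores give an EER of zero.
eer_separated, _ = equal_error_rate([0.1, 0.2], [0.8, 0.9])
```

The threshold sweep here is quadratic in the number of scores, which is fine for a sketch but would normally be replaced by a sorted cumulative count for millions of imposter scores.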

3.2 Cross-Database Performance Evaluation

Our CrossDB performance evaluation was firstly focused on the more recent database made available from [22], as this contactless palmprint database [20] is acquired from 600 individuals, which is the largest to the best of our knowledge. In our experiments, all the 6,000 palmprint images from the left hands were used for the test performance evaluation, and the protocol is exactly the same as used for Figure 2 or the protocol used in [12]. Therefore, the test set for this CrossDB performance evaluation generated 6,000 genuine and 3,594,000 imposter matches. Figure 3 illustrates the comparative ROC and CMC, and the respective EER is presented in Table 1. This figure also illustrates the WithinDB performance, which is achieved by training our model using all the right hand images of the same database. The results in Figure 3 (a)-(b) indicate that our model can achieve outperforming results, and the performance is further improved in the WithinDB case, i.e. when the model trained on the right hand images of the same database is used for the performance evaluation.
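The genuine and imposter match counts quoted for the IITD and 600-subject test sets are consistent with a protocol in which every test image yields one genuine score and one imposter score per other subject. The short sketch below checks this arithmetic; the protocol reading and the figure of 10 images per subject (inferred from 6,000 images over 600 subjects) are assumptions, and the function name is hypothetical.

```python
def match_counts(num_subjects, samples_per_subject):
    """Genuine/imposter counts, assuming one genuine score per probe image
    and one imposter score per probe image and other subject."""
    probes = num_subjects * samples_per_subject
    genuine = probes
    imposter = probes * (num_subjects - 1)
    return genuine, imposter

iitd_counts = match_counts(230, 5)     # IITD right hand protocol
large_counts = match_counts(600, 10)   # 600-subject database
print(iitd_counts, large_counts)
```

Under these assumptions the function reproduces the 1150 / 263,350 and 6,000 / 3,594,000 counts stated in the text.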

Another contactless palmprint database available in the public domain is from [19]. This CASIA palmprint database contains 5239 palmprint images from 301 individuals. We also employed this database for the CrossDB performance evaluation and used the model trained on the IITD database (same as for the results in Figure 2 or CrossDB in Figure 3) for the performance evaluation. All our experiments on this CASIA database used the same matching protocol as used in [12] to ensure fairness in the comparison. Therefore, as in [12], we also generated 13,692,466 match scores, which consisted of 20,567 genuine and 13,689,899 imposter match scores. Figure 3 (c) illustrates the comparative ROC performance and Table 1 provides the respective EER from the CrossDB performance. It is worth underlining that the comparison here is with the same results as in [12], which use heavy tuning of parameters, while our results are obtained under the unseen or CrossDB evaluation protocol. Due to the small number of images (only three) per subject, WithinDB evaluation was not performed and is of least interest. It can be observed from these results that, in terms of EER, our model performs better than the best of the results in [12], while the ROC in Figure 3 indicates otherwise but remains quite competitive.

3.3 Discussion

We also comparatively evaluated our method against other popular deep learning architectures that are widely used for various recognition tasks. The details of the configurations considered for this performance evaluation are provided in the following.

  • CNN+Triplet Loss

    Pre-trained CNN based methods are the most widely employed deep learning configurations for recognition tasks [3, 30] and are therefore also interesting and worth evaluating. We select VGG-16 as our baseline test architecture, which has achieved superior performance for many recognition problems and is widely used in other tasks. We replace the last fully connected class layer with another fully connected feature layer for matching the features. We froze the basic feature extraction layers in VGG-16 during the training phase and just fine-tuned the newly added fully connected layer using the given training dataset, i.e. the IITD Left palmprint images in our experiment.

  • Fully convolutional network+Extended triplet loss

    The fully convolutional network (FCN) was originally developed for the semantic segmentation [31]. Recently [5] combines FCN and extended triplet loss (ETL) to achieve the state-of-art performance for the iris recognition task. Since this work also employs bit-shifting in the original triplet loss function, it is important to comparatively ascertain the performance from our model over this method.

  • Residual feature network(RFN)+Triplet loss

    Comparative evaluation has also been performed using the RFN used in our model and the original triplet loss function instead of the soft shifted triplet loss introduced in section 2.3. Such comparison is performed to ascertain the merit of SSTL for the problem considered in this report.

    Figure 4: The ROC curves for typical deep learning architecture in the literature using contactless palmprint database in [18].
  • DenseNet+Soft shifted triplet loss/triplet loss

    We also compared our method against a very popular deep learning architecture, the densely connected convolutional network (DenseNet), which has been shown to offer significant performance improvement over the state-of-the-art on many/most recognition tasks. In our experiments on palmprint image datasets, we use a basic DenseNet-BC structure with three dense blocks on input images and replace the last fully connected layer with one convolutional layer to perform SSTL. The initial convolution layer uses convolutions with a stride of two.

The comparison with the other deep learning based methods was performed on the IITD dataset, which we employed for the WithinDB configuration with the same protocol as in Figure 2 or in [12]. All the models discussed above were trained on the IITD Left palmprint images and evaluated using the IITD Right palmprint images. The test set generated 1150 genuine match scores and 263,350 imposter match scores, which were consistent across the comparisons. The hyper-parameters for the training processes have been carefully investigated to achieve the best performance. The comparative performance using ROC is presented in Figure 4, while the comparative storage and matching complexity for these methods is summarized in Table 2.

It can be observed from Figure 4 that our newly developed architecture together with newly developed soft shifted loss outperforms other deep learning configurations. The CNN based configurations suggest that global and high level features extracted by CNN may not be suitable for the palmprint recognition problem. Relatively poor performance from (RFN + Triplet) illustrates that a soft matching term introduced by SSTL offers great benefit in addressing inherent variations in the feature map for the contactless palmprint identification. Comparative performance between (DenseNet + SSTL) and (DenseNet + Triplet) also supports such observation.

In all of our experiments, we train our network using Stochastic Gradient Descent (SGD) with standard backprop [32, 33] and Adam [34]. We start with a learning rate of 0.001 and the models are randomly initialized. The pre-defined margin is set to 0.2. The maximum vertical shift size and horizontal shift size are both fixed as 5. More details on triplet selection and our network (RFN), along with additional results on CrossDB performance, are provided in the supplementary file.

| Method | Number of parameters | Feature extraction time | Matching time | Template size |
|---|---|---|---|---|
| CNN+Triplet loss | ~449M | 0.00745s | 0.00140s | 4096-d vector |
| DenseNet+SSTL | ~3.1M | 0.0235s | 0.049s | map |
| DenseNet+Triplet loss | ~3.1M | 0.0235s | 0.00040s | map |
| FCN+ETL | ~568K | 0.00142s | 0.0710s | map |
| RFN+Triplet loss | ~5.2M | 0.0062s | 0.00040s | map |
| Proposed | ~5.2M | 0.0062s | 0.049s | map |

Table 2: Comparison of the time and space complexity of different contactless palmprint matching methods (evaluated on Linux Ubuntu 14.04 x86_64 with a Quadro M6000 GPU, averaged over 10K runs; the default shift size was set to 5 for SSTL).

4 Online Palmprint Identification

In earlier sections, we discussed our approach for developing a trained model to match contactless palmprint images. The performance evaluation presented in section 3 used the automatically segmented contactless palmprint images in the respective public databases. The success of this matcher for online palmprint identification also requires a deep learning based palm detector that can automatically detect the palmprint, or the region of interest, from the presented hands. Therefore we also developed a palm detector that can detect palmprints under complex backgrounds, which is considered a challenge for the conventional methods of palmprint detection available in the literature.

4.1 Palmprint Detection

The online palmprint detector developed in our work is based on Faster R-CNN, introduced in [1]. It is composed of two modules. The first module is a deep fully convolutional network that proposes possible object regions. The second module is a Fast R-CNN detector [2] that uses these proposed regions to classify the palmprint ones. The entire system is a single, unified network for palm detection. In the popular terminology of neural networks with "attention" mechanisms, the region proposal network (RPN) module directs the Fast R-CNN module to the specific regions of interest in which to detect the palm region. A TensorFlow-based [35] implementation was used for the palmprint data and augmentation. Please see the demo in [36] for online palmprint detection under complex indoor and outdoor backgrounds.

Figure 5: Sample images from our online system, running on a mobile laptop, depicting palmprint detection from hand images under complex backgrounds.

4.2 Palmprint Dataset for Training and Detection

We first acquired a set of videos under indoor and outdoor environments for developing the detector. These videos were acquired under 11 different environments with various postures and illuminations. The videos were then sampled at intervals of 10 frames, which resulted in a dataset of 3K raw palmprint images under varying backgrounds. This raw data was then augmented 10 times, which resulted in a total of 30K palmprint images that were employed to train the palmprint detector.
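The data preparation described above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the specific augmentations (brightness gain and additive noise) are assumptions, since the report does not list which transformations were used. Photometric changes are chosen here because they leave the bounding-box annotations unchanged.

```python
import numpy as np

def sample_frame_indices(total_frames, step=10):
    """Indices of the frames kept when sampling a video every `step` frames."""
    return list(range(0, total_frames, step))

def augment_photometric(image, copies=10, seed=0):
    """Return `copies` augmented versions of a uint8 image.
    Assumed augmentations: random brightness gain and Gaussian noise."""
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(copies):
        gain = rng.uniform(0.7, 1.3)                     # brightness change
        noise = rng.normal(0.0, 5.0, size=image.shape)   # mild sensor-like noise
        out = np.clip(image.astype(np.float32) * gain + noise, 0, 255)
        augmented.append(out.astype(np.uint8))
    return augmented

# Dataset size implied by the text: 3K raw images, 10 augmented copies each.
raw_images = 3000
total_images = raw_images * 10
```

For example, `sample_frame_indices(95)` keeps frames 0, 10, ..., 90, and ten augmented copies of each of the 3K raw images give the 30K training images reported above.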

4.3 Performance Evaluation

We trained the palmprint detection model for 20K epochs, which required about 7 hours for convergence on a single NVIDIA Quadro M6000. In the test phase, the trained model requires an average of 0.101 seconds to generate the proposal bounding box with 300 RPN outputs.

| Experiments | mAP (IOU 0.35) | mAP (IOU 0.5) | mAP (IOU 0.6) | Recall (IOU 0.35) | Recall (IOU 0.5) | Recall (IOU 0.6) |
|---|---|---|---|---|---|---|
| strategy (a) | 100.0 | 99.89 | 98.20 | 100.0 | 99.84 | 98.97 |
| strategy (b) | 100.0 | 98.44 | 86.45 | 100.0 | 98.78 | 90.50 |

Table 3: The mAP and recall values at different overlap IOU thresholds.
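The overlap thresholds in Table 3 refer to the intersection over union (IOU) between a detected box and the ground-truth box. The sketch below shows the standard IOU computation and a simplified recall at a given threshold. It assumes one ground-truth palm and at most one detection per image, and boxes given as `(x1, y1, x2, y2)`; the function names are illustrative, and mAP, which also depends on ranking detections by confidence, is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_at(pairs, threshold):
    """Fraction of ground-truth boxes matched by a detection whose IOU
    is at least `threshold`. `pairs` holds (predicted_box or None, gt_box)."""
    hits = sum(1 for pred, gt in pairs
               if pred is not None and iou(pred, gt) >= threshold)
    return hits / len(pairs)
```

Raising the threshold makes the criterion stricter, since a detection must overlap the ground truth more tightly to count, which is why the values in Table 3 can only stay equal or decrease from left to right.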

We also performed experiments to ascertain the performance during the test phase. These experiments are organized in two categories according to the following strategies: (a) separate the dataset randomly in a 0.9:0.1 ratio, where 0.9 represents the fraction of data used for training and 0.1 represents the remaining data used for test/evaluation; (b) separate the dataset by background, where the images from 10 different backgrounds are mixed together to form the training data and the images from the remaining background are used for test/evaluation. Table 3 shows the mean average precision (mAP) and recall obtained for all the experiments performed. One can observe that the network maintains high accuracy up to an overlap IOU [36] threshold of 0.5 to 0.6. A slight degradation in accuracy is observed when the overlap IOU threshold is more than 0.6. The exact sample sizes for the tests using strategies (a) and (b) are 3517 and 4770, respectively.
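The two evaluation strategies can be sketched as below. This is a minimal illustration that assumes each sample is a dictionary carrying a `background` label; the field name, the function names, and the fixed random seed are hypothetical and not taken from the report.

```python
import random

def random_split(samples, train_fraction=0.9, seed=0):
    """Strategy (a): shuffle all samples and split them 0.9:0.1."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(round(train_fraction * len(items)))
    return items[:cut], items[cut:]

def background_split(samples, held_out_background):
    """Strategy (b): train on all backgrounds except one, test on that one."""
    train = [s for s in samples if s["background"] != held_out_background]
    test = [s for s in samples if s["background"] == held_out_background]
    return train, test
```

Strategy (b) is the stricter of the two, because no image from the test background is ever seen during training, so it measures how well the detector transfers to an unseen environment.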

5 Conclusions and Further Work

This report has developed a novel deep learning based contactless palmprint feature representation model, which can offer superior matching accuracy and high generalization capability for matching contactless palmprint images. We designed a soft-shifted triplet loss function to enable effective supervision in learning comprehensive and spatially corresponding residual features using a fully convolutional network. Further extension of this work should focus on jointly evaluating the contactless palmprint detection and identification performance. Such an evaluation requires a new public video dataset of hands under complex backgrounds from a large number of subjects, and is also part of our further work.


  • [1] Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence 39(6) (2017) 1137
  • [2] Girshick, R.: Fast r-cnn. In: IEEE International Conference on Computer Vision. (2015) 1440–1448
  • [3] Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: IEEE Conference on Computer Vision and Pattern Recognition. (2015) 815–823
  • [4] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2016) 770–778
  • [5] Zhao, Z., Kumar, A.: Towards more accurate iris recognition using deeply learned spatially corresponding features. In: Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy. (2017) 22–29
  • [6] Li, Y., Zhang, X., Chen, D.: Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. arXiv preprint arXiv:1802.10062 (2018)
  • [7] Ashbaugh, D.R.: Palmar flexion crease identification. J. For. Ident 41(4) (1991) 255–273
  • [8] Dai, J., Zhou, J.: Multifeature-based high-resolution palmprint recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(5) (2011) 945–957
  • [9] Michael, G.K.O., Connie, T., Teoh, A.B.J.: Touch-less palm print biometrics: Novel design and implementation. Image and Vision Computing 26(12) (2008) 1551–1560
  • [10] Ribaric, S., Fratric, I.: A biometric identification system based on eigenpalm and eigenfinger features. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(11) (2005) 1698–1709
  • [11] Morales, A., Ferrer, M.A., Kumar, A.: Towards contactless palmprint authentication. IET Computer Vision 5(6) (2011) 407–416
  • [12] Zheng, Q., Kumar, A., Pan, G.: A 3d feature descriptor recovered from a single 2d palmprint image. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(6) (2016) 1272–1279
  • [13] Sun, Z., Tan, T., Wang, Y., Li, S.: Ordinal palmprint representation for personal identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2005)
  • [14] Zhang, L., Li, L., Yang, A., Shen, Y., Yang, M.: Towards contactless palmprint recognition: A novel device, a new benchmark, and a collaborative representation based identification approach. Pattern Recognition 69 (2017) 199–212
  • [15] Kong, A.K., Zhang, D.: Competitive coding scheme for palmprint verification. In: Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Volume 1., IEEE (2004) 520–523
  • [16] Jia, W., Huang, D.S., Zhang, D.: Palmprint verification based on robust line orientation code. Pattern Recognition 41(5) (2008) 1504–1513
  • [17] Zhang, D., Kong, W.K., You, J., Wong, M.: Online palmprint identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(9) (2003) 1041–1050
  • [18] IITD Touchless Palmprint Database (ver 1.0). http://www.comp.polyu.edu.hk/~csajaykr/IITD/Database_Palm.htm Jan. 2014.
  • [19] Casia palmprint database. http://biometrics.idealtest.org/ 2016.
  • [20] PolyU-IITD Contactless Palmprint Images Database (version 3.0). http://www4.comp.polyu.edu.hk/~csajaykr/palmprint3.htm 2018.
  • [21] Zhang, D., Zuo, W., Yue, F.: A comparative study of palmprint recognition algorithms. ACM computing surveys (CSUR) 44(1) (2012) 2
  • [22] Wang, Y., Peng, L., Wang, S., Ding, X.: Contactless palm landmark detection and localization on mobile devices. Electronic Imaging 2016(7) (2016) 1–6
  • [23] Kumar, A., Wang, K.: Identifying humans by matching their left palmprint with right palmprint images using convolutional neural network. Proc. DLPR (2016)
  • [24] Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2014) 1891–1898
  • [25] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. (2015) 448–456
  • [26] Huang, X., Belongie, S.: Arbitrary style transfer in real-time with adaptive instance normalization. CoRR, abs/1703.06868 (2017)
  • [27] Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, Springer (2014) 818–833
  • [28] Wu, X., Zhao, Q.: Deformed palmprint matching based on stable regions. IEEE Transactions on Image Processing 24(12) (2015) 4978–4989
  • [29] Web link to download codes for reproducibility for the approach detailed in this report. http://www.comp.polyu.edu.hk/~csajaykr/RFN.rar 2018.
  • [30] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., et al.: Going deeper with convolutions, CVPR (2015)
  • [31] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2015) 3431–3440
  • [32] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural computation 1(4) (1989) 541–551
  • [33] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088) (1986) 533
  • [34] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [35] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
  • [36] Online palmprint detection under complex backgrounds. https://www.youtube.com/watch?v=003YHoX0llg 2018.