Segmentation-Aware and Adaptive Iris Recognition

by Kuo Wang, et al.

Iris recognition has emerged as one of the most accurate and convenient biometrics for human identification and has been increasingly employed in a wide range of e-security applications. The quality of iris images acquired at a distance or under less-constrained imaging environments is known to degrade iris matching accuracy. Periocular information is inherently embedded in such iris images and can be exploited to assist iris recognition in these non-ideal scenarios. Our analysis of such iris templates also indicates significant degradation and reduction in the region of interest. In this setting, iris recognition can benefit from a similarity measure that considers the importance of different binary bits, instead of the direct use of the Hamming distance as in the literature. Periocular information can also be dynamically reinforced, by incorporating the differences in the effective area of the available iris regions, for more accurate iris recognition. This paper presents such a segmentation-assisted adaptive framework for more accurate less-constrained iris recognition. The effectiveness of this framework is evaluated on three publicly available iris databases using within-dataset and cross-dataset performance evaluation, and the results validate the merit of the proposed iris recognition framework.




I Introduction

Iris recognition has emerged as one of the most accurate, convenient and low-cost biometric modalities for verifying the identity of an individual. It is well established [9][6] that iris patterns are unique among different subjects, even among identical twins, and can be easily acquired using low-cost cameras. Iris recognition has therefore been widely incorporated in national ID programs for the benefit of citizens and effective e-governance. However, the constrained imaging requirements of such widely deployed conventional iris recognition systems, i.e., the requirement that subjects stop, stand and stare at an iris sensor in close proximity, severely limit the use of iris recognition for surveillance and forensics. Iris recognition using less-constrained or distantly acquired images has therefore gained importance in recent years. Iris image acquisition modules widely use near-infrared (NIR) illumination, typically in the wavelength range of 700-900 nm, which reveals iris texture with enhanced quality under constrained imaging environments. However, as the standoff distance increases, the quality of the acquired iris images degrades significantly. In such imaging scenarios, periocular information can play an increasingly important role in accurate personal identification. In recent years, periocular recognition has received increasing attention for its promising performance under such less-constrained imaging conditions [36][3]. The periocular region usually refers to the region around the eye, which preferably includes the eyebrow [33]. The periocular regions in such near-infrared iris images, in particular, present highly discriminative features for person identification.
Earlier work [36][3][25][2][43] in this area has validated that the periocular region is highly discriminative among different persons and can be considered an effective alternative or supplement to face or iris recognition, especially when the entire face or clear iris images are not available. This work aims to further such advances in less-constrained iris recognition and introduces a new framework to more accurately and adaptively match less-constrained iris images.

I-A Related Work

This section presents a brief summary of earlier and related work. We first review related work on iris recognition, followed by periocular recognition in Section I-A2, which also includes promising references on less-constrained iris recognition.

I-A1 Iris Recognition

Daugman [9] proposed one of the most classic and popular approaches for automated iris recognition, which applies band-pass Gabor filters to the segmented and normalized iris images for feature encoding. The filter responses, including the real and imaginary parts, are then binarized to generate the IrisCode, which offers a compact and robust feature representation. The Hamming distance between two IrisCodes is used as the dissimilarity score for verification. Building on [9], a 1D log-Gabor filter was incorporated in [19] to replace the 2D Gabor filter for more efficient iris feature extraction. In 2007, a different approach [23] used the discrete cosine transform (DCT) to analyze frequency information from fixed-size image blocks and encode binarized iris features. Miyazawa et al. [21] proposed another spatial-frequency domain approach using the 2D discrete Fourier transform (DFT), which offered promising results. In 2009, Sun and Tan [44] employed multi-lobe differential filters (MLDF), also referred to as ordinal filters, which offered an alternative to Gabor/log-Gabor filters for generating rich iris feature templates.
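The masked Hamming comparison with bit shifting that underlies IrisCode-style matching can be sketched as follows. This is a minimal NumPy sketch, not the implementation of [9] or [42]: templates and masks are assumed to be boolean arrays whose columns index the angular direction, `max_shift=8` is an illustrative choice, and returning 1.0 when no bits overlap is our own assumption.

```python
import numpy as np


def masked_hamming(code_a, code_b, mask_a, mask_b):
    """Fractional Hamming distance over bits that are valid in both masks."""
    valid = mask_a & mask_b
    n_valid = np.count_nonzero(valid)
    if n_valid == 0:
        return 1.0  # no usable bits: treat as maximally dissimilar (assumption)
    disagree = (code_a ^ code_b) & valid
    return np.count_nonzero(disagree) / n_valid


def shifted_hamming(code_a, code_b, mask_a, mask_b, max_shift=8):
    """Minimum masked Hamming distance over circular column shifts.

    Columns of a normalized (polar) iris image correspond to angle, so a
    circular column shift compensates for in-plane rotation of the eye.
    """
    best = 1.0
    for shift in range(-max_shift, max_shift + 1):
        hd = masked_hamming(np.roll(code_a, shift, axis=1), code_b,
                            np.roll(mask_a, shift, axis=1), mask_b)
        best = min(best, hd)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    code = rng.random((16, 64)) < 0.5        # hypothetical 16x64 binary template
    mask = np.ones_like(code, dtype=bool)    # all bits valid
    rotated = np.roll(code, 3, axis=1)       # same eye, rotated by 3 columns
    print(shifted_hamming(code, rotated, mask, mask))  # 0.0
```

Taking the minimum over shifts makes the score rotation tolerant at the cost of slightly lowering impostor distances, which is why the shift range is usually kept small.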

Iris recognition research has also attracted a variety of approaches to enhance segmentation accuracy, and accurate segmentation is critical to the reliability of iris recognition. Some of the most widely employed iris segmentation algorithms are based on the integrodifferential operator [9] and the circular Hough transform [19], which are adapted to detect the iris and pupil circles in near-infrared eye images. These methods perform well on high-quality iris images but are known to be much less reliable for noisy images acquired under relaxed environments. Tan [37] proposed an iterative approach that coarsely clusters iris and non-iris pixels before applying the integrodifferential operator, and achieved more reliable segmentation of iris pixels in noisy iris images. Following a similar coarse-to-fine strategy, a competitive approach detailed in [36] uses the Random Walker algorithm [13] to coarsely locate the iris region, followed by several gray-level statistics based operators to refine the boundaries. These operators have been shown to achieve pixel-level precision in the final output, i.e., the iris masks. More recent approaches include [41], which utilizes an improved total variation model to address the noise and artifacts accompanying less-constrained iris images, and [10], which relies on color/illumination correction along with the watershed transform to segment noisy iris images acquired under visible wavelengths.

There has been quite limited work exploiting deep neural networks for iris recognition, especially considering the tremendous popularity of deep learning for various computer vision tasks, including face recognition. An earlier attempt at a deep representation of the iris appeared in [20] in 2015, but that proposal addressed presentation attack detection, a two-class classification problem, rather than iris recognition. A later approach, DeepIrisNet [11], used a deep learning-based framework for general iris recognition. This work is essentially a direct application of typical convolutional neural networks (CNN) without many optimizations for iris patterns. Another more recent work [14] has attempted to exploit a deep belief net (DBN) for iris recognition. Its core component, however, is the optimal Gabor filter selection, while the DBN is again applied to the IrisCode without iris-specific optimization. More recent work in [42] proposes UniNet, which employs deep fully convolutional networks (FCN) [18] to generate binary iris templates and masks for Hamming distance calculation, and explores substantial connections between iris recognition and deep learning. This work introduces a new loss function that incorporates conventional bit-shifting operations and masks in the matching score computation, and achieves state-of-the-art accuracy on several publicly available datasets. Another related and promising work appears in [28], which uses a deep learning architecture to infer the misalignment between a pair of iris images represented in a segmentation-less polar domain.

I-A2 Periocular Recognition

In recent years, researchers have devoted consistent efforts to new periocular recognition algorithms for images acquired under less-constrained environments [2][30]. An early feasibility study on using periocular regions for human recognition under varying imaging conditions was undertaken by Park et al. [25] in 2009, and its promising results supported subsequent research. Bharadwaj et al. [3] further explored the effectiveness of periocular recognition in situations where iris recognition fails. Part of the later research has focused on cross-spectral periocular matching [32] using neural networks. These exploratory works have further motivated researchers to continuously improve the matching accuracy of periocular images. In 2013, another promising approach appeared in [36], which exploited key-point features and spatial filter banks, i.e., Dense-SIFT and LMF features, followed by K-means clustering for dictionary learning and representation. However, this approach did not investigate periocular-specific feature representations, and it relies on computationally demanding Dense-SIFT feature matching. Smereka and Kumar [33] proposed the Periocular Probabilistic Deformation Model (PPDM) in 2015, which provides sound modeling of the potential deformations between two matched periocular images. The captured deformation is inferred using correlation filters to match periocular image pairs. Later, in 2016, the same group improved their basic model by selecting discriminative patch regions for more reliable matching [34]. These two methods achieved promising performance on multiple datasets. Nevertheless, both rely on patch-based matching schemes and are therefore susceptible to scale variations and misalignment, which often violate patch correspondences and are more likely to occur in real deployments. Deep learning techniques, especially convolutional neural networks (CNN), have gained immense popularity for computer vision and pattern analysis tasks in recent years.

Recent surveys on periocular recognition methods [2][30] suggest that few studies have considered the potential of deep learning techniques to boost periocular matching accuracy. Reference [15] provides insightful observations on periocular features and compares machine and human matching performance. In [5], Bowyer and Burge present a systematic summary of related ocular recognition systems and algorithms. More recently, Proença and Neves [27] argued that the iris and sclera regions might be less reliable for periocular recognition and proposed Deep-PRWIS. In their work, periocular images are augmented with inconsistent iris and sclera regions to train a deep CNN, so that the network implicitly degrades the contribution of iris and sclera features during learning. Promising results were reported for Deep-PRWIS on two public databases. A further promising effort appears in [43], which uses a deep learning-based architecture for robust and accurate periocular recognition, incorporating an attention model to emphasize regions with higher discriminative information. This algorithm achieves state-of-the-art accuracy on six publicly available databases and can serve as a reasonable baseline for further research in this area.

Ref. Features Recognition Performance evaluation
Comparative performance
for less-constrained iris databases
Iris Periocular Recognition Databases Recognition rates at FAR=0.0001 EER
[42] Yes No No
[43] No Yes No
[36] Yes Yes Yes
[40] Yes Yes No
Ours Yes Yes Yes
(a) Q-FIRE [16], (b) CASIA.v4 Distance [4], (c) CASIA-Mobile-V1-S3 [40]
TABLE I: Comparative summary of related and recent work on less-constrained iris recognition.

I-A3 Iris and Periocular Feature Fusion

Periocular information is simultaneously accessible from iris images, and therefore using it to improve iris recognition performance is a feasible strategy. Several prior works have explored this direction and proposed fusion approaches for non-ideal scenarios. In 2010, Woodard et al. [39] combined iris and periocular features using score-level combination, i.e., the weighted sum rule, to improve recognition performance on non-ideal iris imagery. Optimal weights for the two modalities were obtained empirically. Another promising attempt appears in [36], which simultaneously extracts iris features using log-Gabor filters and periocular features using Dense-SIFT and LMF to enhance iris recognition accuracy under relaxed imaging constraints. Raja et al. [29] propose a framework that combines face, iris and periocular biometric modalities for user authentication on smartphones. Various score-level combination schemes are explored, including the min rule, max rule, product rule and weighted-score fusion rule, where the weight of each modality is determined according to its contribution to recognition performance. In addition, some approaches adopt learning-based score-level fusion strategies. Santos et al. [31] present an artificial neural network with two hidden layers to fuse iris and periocular information at the score level for mobile cross-sensor applications. Verma et al. [38] utilize the random decision forest (RDF), an ensemble learning method, to combine the match scores of iris and periocular biometrics, and show noticeable performance improvement for at-a-distance person recognition. Ahuja et al. [1] extract periocular features using deep learning and iris features using root SIFT, and then combine the match scores from these two modalities using the mean rule and linear regression. Other promising attempts in the literature integrate information from these two biometrics at the decision level or the feature level. Santos and Hoyle fuse the iris and periocular modalities at the decision level to increase reliability in unconstrained iris recognition; they train a logistic regression model to predict the weight of each classifier and obtain a final response. Joshi et al. investigate iris and periocular biometric performance under feature-level combination: they first concatenate iris and periocular features and then employ Direct Linear Discriminant Analysis (DLDA) to obtain discriminative, low-dimensional feature vectors for the final classification. More recently, Zhang et al. [40] provide a promising framework that combines iris and periocular features extracted from a maxout CNN to enhance the performance of mobile personal identification.
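The fixed score-level rules mentioned above (weighted sum, min, max and product) can be illustrated with a short NumPy sketch. It assumes both inputs are similarity scores (a distance would first be converted, e.g. as one minus its normalized value), that the min-max bounds `lo` and `hi` are estimated on training data, and that the weight `w_iris` is hypothetical; none of these values comes from the cited works.

```python
import numpy as np


def min_max_normalize(scores, lo, hi):
    """Map raw similarity scores to [0, 1] using bounds from training data."""
    scores = np.asarray(scores, dtype=float)
    return np.clip((scores - lo) / (hi - lo), 0.0, 1.0)


def fuse(iris, peri, rule="weighted_sum", w_iris=0.5):
    """Combine normalized iris and periocular similarities with a fixed rule."""
    iris = np.asarray(iris, dtype=float)
    peri = np.asarray(peri, dtype=float)
    if rule == "weighted_sum":
        return w_iris * iris + (1.0 - w_iris) * peri
    if rule == "min":
        return np.minimum(iris, peri)
    if rule == "max":
        return np.maximum(iris, peri)
    if rule == "product":
        return iris * peri
    raise ValueError(f"unknown fusion rule: {rule}")


if __name__ == "__main__":
    iris = min_max_normalize([0.62, 0.48], lo=0.4, hi=0.8)  # hypothetical scores
    peri = np.array([0.9, 0.3])
    for rule in ("weighted_sum", "min", "max", "product"):
        print(rule, fuse(iris, peri, rule, w_iris=0.7))
```

Each rule applies the same combination to every pair, regardless of how much of the iris is actually visible in that pair; an adaptive fusion, such as the one described in Section II-D, is intended to relax this limitation.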

I-B Our work

The accuracy of iris recognition in less-constrained environments is known to degrade significantly compared with that of conventional or standoff iris recognition systems. Such iris images are generally acquired at greater standoff distances, for surveillance, or with mobile devices from less-cooperative individuals. This research is motivated to address these challenges and to evaluate iris recognition capabilities under more realistic scenarios. Iris images acquired under less-constrained imaging environments often present varying regions of effective iris pixels [9][7]. In the context of such images, we revisit the conventional Hamming distance used to match binarized iris templates. Such iris images also present significant variations in occlusion, which should be carefully considered while simultaneously utilizing the available periocular features. Since iris information is inherently embedded in periocular images, iris matching can benefit from the relative attention given to the eye area, as in the human visual system [22]. Table I summarizes related work in comparison with the work in this paper for less-constrained iris recognition. The key contributions of this paper can be summarized as follows:

  • This paper introduces a new framework for periocular-assisted iris recognition. Iris images acquired under a less-constrained imaging environment often present varying regions of effective iris pixels for matching. Such differences in the effective number of available iris pixels can be used to dynamically reinforce the periocular information that is simultaneously available from these iris images. This dynamic reinforcement should also consider the effective regions of discriminative features that receive varying attention during periocular matching. Our framework therefore incorporates such discriminative information using a multilayer perceptron network for less-constrained iris recognition. The within-database matching results presented in Section III, using receiver operating characteristic (ROC) curves on three publicly available databases, indicate superior performance over state-of-the-art methods. The ROC results presented in Section III-C also show that our algorithm outperforms others in cross-dataset matching. Together, the within-dataset and cross-dataset results validate the effectiveness and generalization ability of the presented framework for less-constrained iris recognition.

  • The importance of black (0) and white (1) pixels in binarized iris templates may differ for templates acquired under less-constrained imaging. This paper therefore presents a new approach to match such templates using a similarity measure, instead of the Hamming distance used in the literature, that can accommodate the different importance of bits in iris templates. The experimental results on three publicly available iris databases consistently indicate superior performance and validate the effectiveness of this approach for less-constrained iris recognition.

The comparative performance of our approach and other competing methods on three common public iris image datasets is also summarized in Table I. The rest of this paper is organized as follows. Section II provides details of our unified framework for less-constrained iris recognition, including the architectures for iris and periocular recognition and the formulation of the dynamic fusion approach introduced in this work. Comparative experimental results from within-dataset and cross-dataset matching on three public databases are presented in Section III. Section IV discusses the theoretical reasons for the effectiveness of the proposed approaches. The key conclusions of this paper are summarized in Section V.

Fig. 1: The framework for the deep dynamic fusion using iris and periocular information.

II Periocular-Assisted Iris Recognition Framework

The framework for periocular-assisted, multi-feature collaboration to achieve dynamic iris recognition is illustrated in Figure 1. The blocks in this diagram are described in the following subsections. This framework adopts UniNet [42] for accurate iris matching, while AttenNet and FCN-Peri [43] are embedded to simultaneously match the periocular regions of the acquired eye or iris images. The network is trained in two separate offline training phases. We first preprocess each acquired eye image to independently recover the normalized iris image and the respective periocular image. The corresponding region-of-interest images are fed to the respective subnets, which are trained independently during the first training phase. During the second training phase, all parameters of the two subnets are frozen, and the subnets are used to recover several cues that indicate the similarity between the iris and periocular templates, including the effective region of the matched iris templates and the corresponding periocular region components of the matched templates. These cues from the two subnets are then employed to train a multilayer perceptron (MLP) network that makes a binary prediction using the softmax cross-entropy loss. During performance evaluation, i.e., the test phase, a pair of eye images is fed into the trained models, and the prediction is recovered from the last softmax layer. These softmax outputs are considered the consolidated match scores between the input (unknown) eye image pair, and are used to make the binary or classification decisions for different applications. The following sections provide further details on the different components of the framework.

II-A Iris Template Generation and Comparisons

Each acquired eye image is first subjected to region-of-interest localization, i.e., iris segmentation, and image normalization. These preprocessing steps produce the normalized iris images and are the same as those employed in earlier work [41]. For all databases employed in this work, the segmented and normalized iris images generated by the preprocessing steps have a dimension of 512×64 pixels. These images are also subjected to contrast enhancement that saturates 5% of the iris-region pixels at high and low intensities.
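The contrast enhancement step can be sketched as a percentile-based linear stretch. The sketch assumes that 5% of the pixels are saturated at each end (dark and bright tails) and that the percentile limits are computed only over iris-region pixels given by an optional boolean `mask`; the function name and the floating-point output range [0, 1] are illustrative choices, not details taken from [41].

```python
import numpy as np


def saturate_contrast(img, mask=None, tail=0.05):
    """Linear contrast stretch that saturates `tail` of the pixels at each end.

    Percentile limits are estimated on iris-region pixels (where `mask` is
    True) when a mask is given; the whole image is then stretched to [0, 1].
    """
    img = np.asarray(img, dtype=float)
    pixels = img[np.asarray(mask, dtype=bool)] if mask is not None else img.ravel()
    lo, hi = np.percentile(pixels, [100 * tail, 100 * (1 - tail)])
    if hi <= lo:  # flat region: nothing to stretch
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    iris = rng.normal(120.0, 10.0, size=(64, 512))  # hypothetical 512x64 normalized iris
    out = saturate_contrast(iris)
    print(out.min(), out.max())
```

Estimating the limits only on valid iris pixels keeps eyelids, eyelashes and reflections from dominating the stretch.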

The normalized rectangular iris images are then used to recover the respective feature templates and the masks depicting valid iris pixels or regions. The UniNet architecture introduced in [42] has been shown to offer state-of-the-art iris matching capability and is also adopted in this work. UniNet includes two fully convolutional sub-networks, FeatNet and MaskNet, as specified in Table II. MaskNet generates a binary mask that distinguishes valid regions from invalid or less reliable regions in the iris templates, which often degrade iris matching accuracy. The network uses a triplet architecture for training, and we generate triplets with a 1:3 ratio between genuine and impostor match pairs from the respective training sets. MaskNet is pre-trained on the ND-IRIS-0405 Iris Image Dataset [26], and all its parameters are frozen in this work. FeatNet, pre-trained on the ND-IRIS-0405 Iris Image Dataset and made publicly available by [42], is fine-tuned using triplets generated from the respective training sets. FeatNet is essentially a fully convolutional neural network that learns a more robust pseudo-binary representation of the input iris image at the same size. The loss function used for FeatNet training is the extended triplet loss, which aims to enlarge the margin of the pseudo-Hamming distance between intra-class and inter-class matches. The extended triplet loss can be defined as follows.

Layer Name Layer Type Kernel Size Output Channel
FeatNet:
Conv1 Convolution 3×7 16
Tanh1 TanH Activation - 16
Pool1 Average Pooling 2×2 16
Conv2 Convolution 3×5 32
Tanh2 TanH Activation - 32
Pool2 Average Pooling 2×2 32
Res1 Deconvolution 4×4 32
Conv3 Convolution 3×3 64
Tanh3 TanH Activation - 64
Pool3 Average Pooling 2×2 64
Res2 Deconvolution 8×8 64
Concat Concatenation - 112
Conv4 Convolution 3×3 1
MaskNet:
m_Conv1 Convolution 3×3 16
m_ReLU1 ReLU Activation - 16
m_Pool1 Max Pooling 2×2 16
m_Conv2 Convolution 3×5 32
m_ReLU2 ReLU Activation - 32
m_Pool2 Max Pooling 2×2 32
m_Score2 Convolution 1×1 2
m_Conv3 Convolution 3×5 64
m_ReLU3 ReLU Activation - 64
m_Pool3 Max Pooling 2×2 64
m_Score3 Convolution 1×1 2
m_Conv4 Convolution 3×5 128
m_ReLU4 ReLU Activation - 128
m_Pool4 Max Pooling 2×2 128
m_Score4 Convolution 1×1 2
m_Upscore4 Deconvolution 8×8 2
m_Score34 Elementwise Sum - 2
m_Upscore34 Deconvolution 4×4 2
m_Score234 Elementwise Sum - 2
m_Fuse Deconvolution 4×4 2
TABLE II: The specification of the incorporated UniNet.

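$$L_{ETL} = \frac{1}{N}\sum_{i=1}^{N}\max\left(D(f_i^{A}, f_i^{P}) - D(f_i^{A}, f_i^{N}) + m,\; 0\right)$$

where $N$ is the batch size, $f_i^{A}$, $f_i^{P}$ and $f_i^{N}$ are the corresponding masked feature maps generated by the FeatNet for the anchor, positive and negative samples, $D(\cdot,\cdot)$ is the pseudo-Hamming distance between two masked feature maps, and $m$ is the hyperparameter controlling the margin between anchor-positive and anchor-negative distances.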
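For illustration, the forward value of the extended triplet loss can be computed in NumPy. Following our reading of UniNet [42], the sketch takes the distance between two feature maps as the mean squared difference over positions valid in both masks, minimized over circular column shifts; the shift range, the margin value and the triplet data layout (lists of `(feature_map, mask)` tuples) are assumptions. It computes only the loss value; training would require an automatic differentiation framework.

```python
import numpy as np


def masked_distance(f1, f2, m1, m2):
    """Mean squared difference over positions valid in both masks."""
    valid = (m1 > 0) & (m2 > 0)
    if not valid.any():
        return np.inf  # no overlap at this shift (assumption)
    return float(np.mean((f1[valid] - f2[valid]) ** 2))


def shifted_masked_distance(f1, f2, m1, m2, max_shift=4):
    """Minimum masked distance over circular column (angular) shifts of f1."""
    return min(
        masked_distance(np.roll(f1, s, axis=1), f2, np.roll(m1, s, axis=1), m2)
        for s in range(-max_shift, max_shift + 1)
    )


def extended_triplet_loss(anchors, positives, negatives, margin=0.2, max_shift=4):
    """Batch mean of max(D(a, p) - D(a, n) + margin, 0) over triplets.

    Each list holds (feature_map, mask) tuples of equal shape.
    """
    losses = []
    for (fa, ma), (fp, mp), (fn, mn) in zip(anchors, positives, negatives):
        d_ap = shifted_masked_distance(fa, fp, ma, mp, max_shift)
        d_an = shifted_masked_distance(fa, fn, ma, mn, max_shift)
        losses.append(max(d_ap - d_an + margin, 0.0))
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (8, 64)
    ones = np.ones(shape)
    anchor = rng.random(shape)
    positive = anchor + 0.05 * rng.normal(size=shape)  # hypothetical genuine sample
    negative = rng.random(shape)                       # hypothetical impostor sample
    print(extended_triplet_loss([(anchor, ones)], [(positive, ones)], [(negative, ones)]))
```

Triplets that already satisfy the margin contribute zero, so only hard or semi-hard triplets drive the fine-tuning.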

II-B Comparisons using Similarity Score

The Hamming distance is widely employed to compute the dissimilarity between two binary feature templates in a range of biometric identification problems, such as iris or palmprint recognition. It assumes that the information content of all template values in the coding space is equally important for distinguishing user identity. However, the choice of feature extraction and binarization methods, along with the nature of the input images, can determine the relative importance of the white (ones) and black (zeros) areas in the encoded images. Therefore, a more flexible measure that can account for such asymmetric importance is proposed for matching less-constrained iris images. This measure, referred to as the weighted similarity score (WS), is based on the azzoo similarity measure [8] and is incorporated here for matching iris templates.

The effectiveness of white pixel matching and black pixel matching in feature templates can also be evaluated experimentally. Let $N_w^{A}$ and $N_b^{A}$ denote the numbers of white and black pixels in a feature template A. When comparing templates A and B, we can count only the white pixel matches, $N_w^{AB}$, and only the black pixel matches, $N_b^{AB}$, and compute the white pixel matching rate $R_w$ and the black pixel matching rate $R_b$ using the following two equations:

$$R_w = \frac{N_w^{AB}}{N_w^{A}}$$

$$R_b = \frac{N_b^{AB}}{N_b^{A}}$$
The different contributions of white and black pixel matching, i.e., the average white pixel matching rate $R_w$ and black pixel matching rate $R_b$ for genuine and impostor pairs, can also be observed empirically using templates generated from the databases. We select 1,000 genuine and 2,000 impostor matches from the test set of the CASIA-Mobile-V1-S3 dataset for this evaluation. The average $R_w$ and $R_b$ are 0.5733 and 0.6138 for the genuine matches, and 0.4159 and 0.4563 for the impostor matches.
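The white and black pixel matching rates can be computed directly from two binarized templates. This sketch assumes that the rates are normalized by the white and black pixel counts of template A, which is our reading of the definitions in this subsection; masks are ignored for simplicity.

```python
import numpy as np


def bit_matching_rates(a, b):
    """White (1-1) and black (0-0) matching rates of template b against a."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    n_white_a = np.count_nonzero(a)
    n_black_a = a.size - n_white_a
    n_white_match = np.count_nonzero(a & b)    # both templates are 1
    n_black_match = np.count_nonzero(~a & ~b)  # both templates are 0
    r_white = n_white_match / n_white_a if n_white_a else 0.0
    r_black = n_black_match / n_black_a if n_black_a else 0.0
    return r_white, r_black


if __name__ == "__main__":
    a = np.array([1, 1, 0, 0, 0])
    b = np.array([1, 1, 0, 1, 1])
    print(bit_matching_rates(a, b))  # (1.0, 0.3333333333333333)
```

Averaging `r_white` and `r_black` separately over genuine and impostor pairs yields statistics of the kind reported above for CASIA-Mobile-V1-S3.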

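To accommodate the different discriminative information carried by white pixel pairs and black pixel pairs, we weight them differently and define the pixel-wise similarity as follows:

$$s(a_{ij}, b_{ij}) = \begin{cases} 1, & a_{ij} = b_{ij} = 1 \\ \alpha, & a_{ij} = b_{ij} = 0 \\ 0, & a_{ij} \neq b_{ij} \end{cases}$$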
where $a_{ij}$ and $b_{ij}$ are the pixels in row $i$ and column $j$ of the two matched iris templates, and $\alpha$ is a hyperparameter controlling the significance of the coding pairs. In all our experiments, $\alpha$ is empirically set to 0.3. Assuming the iris templates are of size $H \times W$, we generate the match score using the weighted similarity as follows:

$$WS(A, B) = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} s(a_{ij}, b_{ij})$$
It can be observed that when $\alpha$ is unity, the value of $WS$ is essentially the difference between unity and the normalized Hamming distance. The weighted similarity can therefore be considered a more flexible alternative for template matching.
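The weighted similarity can be sketched in a few lines of NumPy. The sketch assumes that 1-1 agreements score 1, 0-0 agreements score `alpha`, and disagreements score 0 (our reading of the azzoo-style weighting), and it normalizes by the template size without masks or bit shifts. The usage line checks that `alpha = 1` reduces to one minus the normalized Hamming distance.

```python
import numpy as np


def weighted_similarity(a, b, alpha=0.3):
    """azzoo-style weighted similarity between two binary templates.

    1-1 agreements score 1, 0-0 agreements score alpha, disagreements 0,
    and the sum is normalized by the template size H*W.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    ones = np.count_nonzero(a & b)
    zeros = np.count_nonzero(~a & ~b)
    return (ones + alpha * zeros) / a.size


def hamming_distance(a, b):
    """Normalized Hamming distance (fraction of disagreeing bits)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return np.count_nonzero(a ^ b) / a.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((64, 512)) < 0.5  # hypothetical binarized templates
    b = rng.random((64, 512)) < 0.5
    print(weighted_similarity(a, b, alpha=0.3))
    print(weighted_similarity(a, b, alpha=1.0), 1.0 - hamming_distance(a, b))
```

With `alpha` below 1, agreements on zeros count less than agreements on ones, which is the asymmetry the measure is meant to capture.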

II-C Periocular Template Generation and Comparisons

Layer Name Layer Type Kernel Size Output Channel
AttenNet:
Conv1_1 Convolution 5×5 32
ReLU1_1 ReLU Activation - 32
Conv1_2 Convolution 5×5 32
ReLU1_2 ReLU Activation - 32
Pool1 Max Pooling 2×2 32
Slice_roi Slice - 32
Conv2_1, A_Conv2_1 Convolution 3×3 32
ReLU2_1, A_ReLU2_1 ReLU Activation - 32
Conv2_2, A_Conv2_2 Convolution 3×3 32
ReLU2_2, A_ReLU2_2 ReLU Activation - 32
Pool2, A_Pool2 Max Pooling 2×2 32
Att2 Attention - -
Conv3_1, A_Conv3_1 Convolution 3×3 64
ReLU3_1, A_ReLU3_1 ReLU Activation - 64
Conv3_2, A_Conv3_2 Convolution 3×3 64
ReLU3_2, A_ReLU3_2 ReLU Activation - 64
Pool3, A_Pool3 Max Pooling 2×2 64
Conv4_1, A_Conv4_1 Convolution 3×3 64
ReLU4_1, A_ReLU4_1 ReLU Activation - 64
Conv4_2, A_Conv4_2 Convolution 3×3 64
ReLU4_2, A_ReLU4_2 ReLU Activation - 64
Pool4, A_Pool4 Max Pooling 2×2 64
Att4 Attention - -
Feat, A_Feat Fully Connected - 64
FCN-Peri:
Conv1 Convolution 5×5 16
ReLU1 ReLU Activation - 16
Pool1 Max Pooling 2×2 16
Conv2 Convolution 3×3 32
ReLU2 ReLU Activation - 32
Conv2_s Convolution 1×1 3
Pool2 Max Pooling 2×2 32
Conv3 Convolution 3×3 64
ReLU3 ReLU Activation - 64
Conv3_s Convolution 1×1 3
Pool3 Max Pooling 2×2 64
Conv4 Convolution 3×3 128
ReLU4 ReLU Activation - 128
Conv4_s Convolution 1×1 3
Upscore4 Deconvolution 8×8 3
Score34 Elementwise Sum - 3
Upscore34 Deconvolution 4×4 3
Score234 Elementwise Sum - 3
Fuse Deconvolution 4×4 3
TABLE III: Details on the architecture for the AttenNet and FCN-Peri.

The periocular preprocessing is simpler and consists of image normalization with a bilinear filter. All normalized periocular images have empirically fixed dimensions of 300×240. Earlier research has shown that periocular recognition with attention models can offer state-of-the-art performance [43], and such a model is also employed here to generate periocular templates for matching. The periocular recognition model therefore also includes two components, FCN-Peri and AttenNet, whose architectures are detailed in Table III. FCN-Peri is a fully convolutional network that detects the eye and eyebrow regions in the presented periocular images. We use the FCN-Peri model for near-infrared (NIR) images made publicly available in [43], without any further tuning. With the automatically detected eye and eyebrow regions, AttenNet is given the pixel locations of these particular regions so that explicit attention is paid to them when generating more discriminative periocular features. The output of AttenNet is a feature vector with 512 elements. We compute the distance-driven sigmoid cross-entropy (DSC) loss between siamese pairs generated from the corresponding training set during the training phase. The ratio of genuine to impostor pairs is empirically set to 1:2 for all our experiments. The DSC loss [43] can be defined as follows.

$$L_{DSC} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \sigma(x_i) + (1 - y_i)\log\left(1 - \sigma(x_i)\right)\right], \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

where $N$ is the batch size, $y_i \in \{0, 1\}$ is the ground-truth label of the $i$-th genuine or impostor pair, and $x_i$ is a transformed Euclidean distance between the feature vectors of that pair.
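A numerically stable sketch of a sigmoid cross-entropy loss on transformed Euclidean distances is given below. The exact distance transform used in [43] is not reproduced here: `example_transform` (a margin minus the distance) is a hypothetical stand-in, the margin values are illustrative, and labels are assumed to be 1 for genuine and 0 for impostor pairs.

```python
import numpy as np


def dsc_loss(distances, labels, transform):
    """Sigmoid cross-entropy on transformed Euclidean distances.

    Uses log(sigmoid(x)) = -logaddexp(0, -x) and
    log(1 - sigmoid(x)) = -logaddexp(0, x) for numerical stability.
    """
    x = transform(np.asarray(distances, dtype=float))
    y = np.asarray(labels, dtype=float)
    log_p = -np.logaddexp(0.0, -x)
    log_not_p = -np.logaddexp(0.0, x)
    return float(-np.mean(y * log_p + (1.0 - y) * log_not_p))


def example_transform(d, margin=1.0):
    """Hypothetical transform: small distances map to large positive logits."""
    return margin - d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(4, 512))  # hypothetical 512-element AttenNet outputs
    feats_b = rng.normal(size=(4, 512))
    d = np.linalg.norm(feats_a - feats_b, axis=1)  # Euclidean distances
    y = np.array([1, 0, 1, 0])
    print(dsc_loss(d, y, lambda v: example_transform(v, margin=30.0)))
```

Mapping a distance to a logit lets a standard binary cross-entropy push genuine distances below, and impostor distances above, the chosen margin.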

II-D Segmentation-Aware Dynamic Fusion

Layer Name Layer Type Input Channel Output Channel
FC1 Fully Connected 8 32
Tanh1 TanH Activation 32 32
FC2 Fully Connected 32 16
Tanh2 TanH Activation 16 16
FC3 Fully Connected 16 8
Tanh3 TanH Activation 8 8
FC4 Fully Connected 8 2
TABLE IV: The specification of incorporated MLP.

Any effective dynamic mechanism that simultaneously utilizes iris and periocular information should carefully consider multiple cues, not only from the individual feature similarities but also from the segmentation steps, which can indicate the (dynamic) importance of the individual similarity scores. Iris images acquired under less-constrained imaging often present a varying number of effective iris pixels, which are used to generate the respective iris match scores. The difference in the effective number of available iris pixels between two matched iris images can be used to dynamically reinforce periocular information for a more reliable match score. Such dynamic reinforcement should also consider the effective regions of discriminative features that receive varying attention during periocular matching. We therefore incorporate a multilayer perceptron (MLP) network to dynamically consolidate these multiple pieces of discriminative information and generate a more reliable consolidated match score between two unknown input images.

As illustrated in Figure 1, the UniNet generates pseudo-binary feature maps along with the respective masks, while the AttenNet generates feature vectors from which the Euclidean distance between the respective ROI maps is computed. We can therefore simultaneously generate the iris match scores and, using the Euclidean distance, the periocular match scores. Another important input for the MLP, which effectively represents the importance or quality of the respective iris match scores, is the mask rate: the ratio between the valid pixels and all iris pixels in the two matched iris templates. Similarly, the effectiveness of the periocular match scores is represented using the sums and differences of the eye and eyebrow area ratios, i.e., the sum (and difference) of the eye areas in the matched periocular images and the sum (and difference) of the eyebrow areas in the matched periocular images. It should be noted that these eye and eyebrow areas are automatically predicted by, or available from, AttenNet, as shown in Figure 1. The MLP network therefore receives an eight-element feature vector and is trained offline using genuine and impostor pairs from the training dataset. The architecture of the incorporated MLP is shown in Table IV. The network is trained for binary classification using the softmax cross-entropy loss with genuine and impostor class labels. The trained network then generates the consolidated match score from the softmax output of its last layer, which ranges between 0 and 1.
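The quality cues described above can be sketched as follows. This is an illustrative reading, not the paper's code: it assumes that a "valid" template position is one marked valid in both iris masks, that eye and eyebrow areas are expressed as the fraction of the periocular image covered by the respective binary region map, and that the "difference" is an absolute difference. The exact composition and ordering of the eight-element MLP input are not reproduced here, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mask_rate(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Fraction of template positions that are valid (1) in BOTH iris masks.

    Assumes the two flattened masks are already spatially aligned.
    """
    if not mask_a or len(mask_a) != len(mask_b):
        raise ValueError("masks must be non-empty and of equal length")
    valid = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return valid / len(mask_a)


def area_ratio(region_map: List[List[int]]) -> float:
    """Fraction of a periocular image covered by a binary region map (eye or eyebrow)."""
    total = sum(len(row) for row in region_map)
    return sum(sum(row) for row in region_map) / total


def sum_and_diff(ratio_a: float, ratio_b: float) -> Tuple[float, float]:
    """Sum and absolute difference of one region's area ratio in the two images."""
    return ratio_a + ratio_b, abs(ratio_a - ratio_b)


# Toy 2x2 eye maps from two periocular images.
eye_sum, eye_diff = sum_and_diff(area_ratio([[0, 1], [1, 1]]), area_ratio([[0, 0], [1, 1]]))
```

A low mask rate signals that few bits support the iris score, while a large area difference signals that the two periocular images expose different amounts of the eye or eyebrow; these are the cues that let the MLP shift weight between modalities.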

III Experiments and Results

We perform a series of experiments on three publicly available datasets to ascertain the effectiveness of the proposed framework for less-constrained iris recognition. This section first provides brief information on the three public datasets used in this work and then explains the experimental protocols. It also provides a comparative analysis of the results from our method against other state-of-the-art methods.

Fig. 2: Sample eye images from the employed datasets: (a) CASIA-Mobile-V1-S3 dataset, (b) CASIA iris image v.4 distance dataset, (c) Q-FIRE-05-middle-illumination dataset.

III-A Datasets and Protocols

The experimental results presented in this section utilized the following three near-infrared eye image datasets in the public domain. Figure 2 illustrates the sample eye images from these different datasets.

III-A1 Q-FIRE-05-middle-illumination Dataset

The Quality in Face and Iris Research Ensemble (Q-FIRE) dataset [16] is a publicly available dataset with at-a-distance iris images. Our experiments use the Q-FIRE-05-middle-illumination subset, which was acquired at a distance of five feet under middle-level near-infrared illumination. We automatically extract the periocular region images with a trained Fast R-CNN detector. The processed dataset includes images of both eyes from 159 different subjects. The first 15 right-eye images of each subject are used to train the network, while the first ten left-eye images are used for the test evaluation. This set of experiments therefore generates 7,155 (45 × 159) genuine and 1,256,100 (159 × 158 × 50) impostor match scores.
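The reported score counts follow from all-versus-all matching of the ten test images per subject: each subject contributes C(10, 2) = 45 genuine pairs, and each of the C(159, 2) subject pairs contributes 10 × 10 impostor pairs (equivalently 159 × 158 × 50). The small helper below, a hypothetical name used only for illustration, reproduces the arithmetic.

```python
from math import comb


def match_counts(num_subjects: int, images_per_subject: int) -> tuple:
    """Genuine/impostor counts for all-versus-all matching of a balanced test set."""
    genuine = num_subjects * comb(images_per_subject, 2)        # pairs within each subject
    impostor = comb(num_subjects, 2) * images_per_subject ** 2  # pairs across subjects
    return genuine, impostor


print(match_counts(159, 10))  # (7155, 1256100)
```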

III-A2 CASIA-Mobile-V1-S3 Dataset

CASIA-Mobile-V1-S3 [40] is another publicly available dataset; it includes 3,600 face images from 360 different subjects, acquired using a mobile device under near-infrared illumination. A Fast R-CNN detector [12] is trained with 100 manually labeled samples to detect the periocular region. We follow the same matching protocols, for both iris and periocular matching, as described in [40]. Accordingly, the training set includes 3,600 samples from 360 classes (eyes) of the first 180 subjects, and the test set includes the other 3,600 samples from 360 classes (eyes) of the remaining 180 subjects. Left-eye images are matched only against left-eye images, and right-eye images only against right-eye images. The left-eye and right-eye match scores are then combined using the sum rule, which generates 8,100 genuine and 1,611,000 impostor match scores.
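A minimal sketch of the sum-rule step and of the resulting counts is shown below. It assumes that each test subject contributes ten face images (3,600 eye samples over 360 eye classes), so that after fusing the left- and right-eye scores of each face-image pair, matching is all-versus-all over 180 subjects. The dictionary keying by face-image pair is a hypothetical representation chosen for illustration.

```python
from math import comb
from typing import Dict, Tuple

Pair = Tuple[int, int]  # hypothetical key: (face image A index, face image B index)


def sum_rule(left: Dict[Pair, float], right: Dict[Pair, float]) -> Dict[Pair, float]:
    """Fuse left- and right-eye scores of the same face-image pair by addition."""
    if left.keys() != right.keys():
        raise ValueError("left and right scores must cover the same image pairs")
    return {pair: left[pair] + right[pair] for pair in left}


# Test split: 180 subjects, 3,600 eye samples over 360 eye classes -> 10 face images per subject.
subjects, images_per_subject = 180, 10
genuine = subjects * comb(images_per_subject, 2)
impostor = comb(subjects, 2) * images_per_subject ** 2
print(genuine, impostor)  # 8100 1611000
```

Fusing per face-image pair halves the number of scores relative to matching each eye separately, which is why the counts are computed over subjects rather than over eye classes.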

III-A3 CASIA Iris Image v.4 Distance Dataset

This subset of the CASIA.v4 database [4] contains upper-face images from 142 subjects. We detect the eye region images with an OpenCV-based eye detector [24], as in earlier work, and generate an eye dataset with 2,446 instances. As in [42], the training set comprises all the right-eye samples, and the test set comprises all the left-eye samples. The test set therefore generates 20,702 genuine and 2,969,533 impostor match scores.
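As a consistency check (our reading of the reported figures, not a statement from the protocol description), the two counts together equal the number of unordered pairs among 2,446 left-eye test images, which indicates exhaustive pairwise matching of the test set.

```python
from math import comb

reported_genuine, reported_impostor = 20_702, 2_969_533
n_test_images = 2_446  # assumption: each of the 2,446 instances contributes one left-eye image

all_pairs = comb(n_test_images, 2)
print(all_pairs, reported_genuine + reported_impostor)  # 2990235 2990235
```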

III-B Iris and Periocular Recognition

We first present comparative experimental results from the iris and periocular features simultaneously recovered by the framework presented in Section II. In this set of experiments, all models were trained on their respective training sets, and verification performance is evaluated on the respective test sets, as detailed in the earlier sections. We use the iris recognition results generated by UniNet [42] and the periocular recognition results generated by AttenNet [43] as the baselines for the comparative performance evaluation. We also compare with static score-level fusion, which combines the iris match scores from our similarity measure with the periocular match scores using a weighted sum. These comparative results from our algorithms and the respective benchmarks are presented in Figure 3 and summarized in Table V.
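The static fusion baseline can be sketched as a weighted sum of normalized scores. The sketch assumes min-max normalization and treats both inputs as distances (lower means more similar); the section does not state the normalization scheme or the weight, so `w` is a hypothetical placeholder. In practice the normalization bounds would be fixed on training scores rather than on the test batch.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Map scores to [0, 1]; in practice the min/max would be fixed on training scores."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def static_fusion(iris_dist: Sequence[float], peri_dist: Sequence[float], w: float = 0.5) -> List[float]:
    """Weighted sum of normalized iris and periocular distances (lower = more similar).

    The weight w is a hypothetical placeholder; the section does not report its value.
    """
    if len(iris_dist) != len(peri_dist):
        raise ValueError("score lists must be aligned pair by pair")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(min_max(iris_dist), min_max(peri_dist))]


fused = static_fusion([0.30, 0.45, 0.38], [0.9, 2.1, 1.4], w=0.6)
```

Unlike the dynamic MLP fusion, this weight is the same for every pair, so it cannot react to a low mask rate or a partially occluded eye region.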

Fig. 3: Comparative receiver operating characteristic (ROC) curves from within-dataset matching: (a) CASIA-Mobile-V1-S3, (b) CASIA Iris Image v.4 Distance, (c) Q-FIRE-05-middle-illumination.
Method                                  CASIA-Mobile-V1-S3   CASIA.v4 Distance   Q-FIRE
                                        GAR      EER         GAR      EER        GAR      EER
TIP13 [36] (Periocular and Iris)        85.4%    2.43%       59.2%    9.93%      68.0%    13.86%
Maxout CNN [40] (Periocular and Iris)   75.4%    7.15%       21.0%    17.99%     44.5%    16.74%
AttenNet [43] (Periocular)              64.6%    3.93%       61.6%    14.27%     53.9%    10.55%
UniNet [42] (Iris)                      75.5%    3.94%       75.3%    5.54%      38.7%    9.72%
Weighted Similarity (Ours, Iris)        85.3%    2.57%       76.0%    6.12%      44.9%    9.85%
Static Fusion (Ours)                    92.5%    1.85%       81.5%    5.23%      69.8%    4.95%
Dynamic Fusion (Ours)                   94.3%    0.73%       86.3%    2.29%      83.6%    3.97%
TABLE V: Summary of recognition rates (GAR) and equal error rates (EER) from within-dataset matching.

The receiver operating characteristic (ROC) curves in Figure 3, along with the GAR and equal error rate (EER) values summarized in Table V, show that our framework outperforms the compared methods in these within-dataset experiments. Iris recognition alone, using the proposed similarity measure, achieves a higher GAR than the state-of-the-art iris recognition approach in [42] on all three datasets, most notably on CASIA-Mobile-V1-S3 (85.3% vs. 75.5%). Combining the iris and periocular match scores with static fusion offers a significant further improvement, while the dynamic fusion framework achieves the best GAR and EER on all three datasets. Our approach also outperforms the framework proposed in TIP13 [36]. The limited performance of [36] can be attributed to its lack of any specialized periocular matching algorithm, and our analysis indicates that this is the main factor limiting its overall performance. We implemented the Maxout CNN ourselves based on the parameters provided in [40], since no code is publicly available for its DCNN model and segmentation algorithm. Moreover, the Bath dataset used to pre-train that model is no longer publicly available.
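The GAR and EER values in Table V are standard summaries of the genuine and impostor score distributions. The sketch below shows one common way to compute them from distance scores, under the assumption that a pair is accepted when its distance does not exceed the threshold; the EER is approximated at the observed threshold where FAR and FRR are closest. The target FAR for the GAR column is left as a parameter because this section does not state the operating point, and the function names are illustrative.

```python
from bisect import bisect_right
from typing import Iterator, Sequence, Tuple


def roc_points(genuine: Sequence[float], impostor: Sequence[float]) -> Iterator[Tuple[float, float, float]]:
    """Yield (threshold, FAR, FRR) for distance scores (accept when distance <= threshold)."""
    g, imp = sorted(genuine), sorted(impostor)
    for t in sorted(set(g) | set(imp)):
        far = bisect_right(imp, t) / len(imp)    # impostor pairs wrongly accepted
        frr = 1.0 - bisect_right(g, t) / len(g)  # genuine pairs wrongly rejected
        yield t, far, frr


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """EER approximated at the observed threshold where |FAR - FRR| is smallest."""
    _, far, frr = min(roc_points(genuine, impostor), key=lambda p: abs(p[1] - p[2]))
    return (far + frr) / 2


def gar_at_far(genuine: Sequence[float], impostor: Sequence[float], target_far: float) -> float:
    """Best GAR = 1 - FRR among thresholds whose FAR does not exceed target_far."""
    return max((1.0 - frr for _, far, frr in roc_points(genuine, impostor) if far <= target_far),
               default=0.0)
```

Sorting once and using binary search keeps the sweep at O(n log n), which matters for impostor sets with millions of scores such as those in Section III-A.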

III-C Cross-Database Performance Evaluation

Trained on                   CASIA.v4 Distance                CASIA-Mobile-V1-S3
Tested on                    CASIA-Mobile-V1-S3   Q-FIRE      CASIA.v4 Distance   Q-FIRE
AttenNet [43] (Periocular)   9.78%                10.15%      13.69%              12.79%
UniNet [42] (Iris)           4.11%                9.72%       7.06%               10.01%
Dynamic Fusion (Ours)        1.62%                6.49%       6.28%               6.43%
TABLE VI: Comparative summary of equal error rates (EER) from cross-dataset matching.
Fig. 4: Comparative ROC results from cross-dataset performance evaluation: (a) CASIA-Mobile-V1-S3, (b) Q-FIRE, (c) CASIA.v4 Distance, (d) Q-FIRE. The results in (a)-(b) use the model trained on the CASIA.v4 Distance dataset, while the results in (c)-(d) use the model trained on the CASIA-Mobile-V1-S3 dataset.

In the cross-database configuration, we use the model trained on CASIA.v4 Distance to match CASIA-Mobile-V1-S3 and Q-FIRE images directly, without any fine-tuning. We also present cross-database results with the model trained on CASIA-Mobile-V1-S3 and tested on CASIA.v4 Distance and Q-FIRE images. This set of experiments aims to validate the generalization capability of our framework, especially when the image samples available for training are quite limited. The EER values are summarized in Table VI, and the respective ROCs are shown in Figure 4.

The results of this set of experiments show that our framework consistently achieves the lowest EER in cross-database matching, which supports its generality in matching less-constrained iris images.

IV Discussion

Fig. 5: Distribution of matching scores from iris and periocular features: (a) CASIA-Mobile-V1-S3, (b) CASIA.v4 Distance, (c) Q-FIRE.

The complementary nature of the match scores generated from the deep features in our experiments can be visualized using two-dimensional plots of the iris and periocular scores. Figure 5 illustrates such plots for the distribution of (normalized) genuine and impostor scores from iris and periocular matching on the respective databases. The marginal subplots along each axis show kernel density estimates of the respective score distributions. These plots from less-constrained images indicate that the joint use of the individual match scores can more effectively separate genuine and impostor match scores, as pursued in this work.
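The marginal densities in these plots are kernel density estimates. A minimal Gaussian KDE is sketched below; the bandwidth uses the normal-reference rule h = 1.06 · σ · n^(-1/5), which is an assumed choice since the section does not state how the bandwidth was selected. The function name and sample scores are illustrative.

```python
import math
import statistics
from typing import List, Sequence


def gaussian_kde(samples: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`.

    Bandwidth: normal-reference rule h = 1.06 * std * n^(-1/5) (an assumption).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    std = statistics.stdev(samples)
    h = 1.06 * std * n ** (-0.2) if std > 0 else 1e-3
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) for x in grid]


# Hypothetical normalized genuine scores, evaluated on a grid with 0.01 spacing.
scores = [0.1, 0.2, 0.35, 0.5, 0.8]
grid = [i * 0.01 for i in range(-500, 601)]
density = gaussian_kde(scores, grid)
```

Plotting the genuine and impostor densities for each axis separately reproduces the marginal panels, while the scatter of (iris, periocular) score pairs shows how much separation the joint use of both scores adds.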

Fig. 6: Degraded-quality image samples: (a) defocus-blurred sample in the Q-FIRE dataset, (b) poorly illuminated sample in the CASIA-Mobile-V1-S3 dataset, (c) severely occluded sample in the CASIA v.4 Distance dataset.

V Conclusions and Future Work

This work has introduced a new framework for periocular-assisted, less-constrained iris recognition. Our approach uses more accurate periocular matching and introduces a similarity measure for more accurate iris recognition. The fusion mechanism can dynamically consider the relative importance of each modality and the effective region of interest to generate more reliable consolidated match scores. The experimental results presented in Section III on three publicly available datasets demonstrate the merit of the proposed approach, with superior ROC results in both within-dataset and cross-dataset scenarios. To ensure the reproducibility of all our results, we will provide all code along with the ground-truth labels. Building an end-to-end framework for periocular and iris recognition is one possible direction to further improve this work, since iris recognition itself can be considered a form of attention within periocular recognition. An end-to-end framework that can perform segmentation and simultaneously learn robust features is expected to be more attractive and elegant, and is part of our future work in this area.


  • [1] K. Ahuja, R. Islam, F. A. Barbhuiya, and K. Dey (2016) A preliminary study of CNNs for iris and periocular verification in the visible spectrum. In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 181–186.
  • [2] F. Alonso-Fernandez and J. Bigun (2016) A survey on periocular biometrics research. Pattern Recognition Letters 82, pp. 92–105.
  • [3] S. Bharadwaj, H. S. Bhatt, M. Vatsa, and R. Singh (2010) Periocular biometrics: when iris recognition fails. In 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–6.
  • [4] Biometrics Ideal Test, CASIA.v4 database.
  • [5] K. W. Bowyer and M. J. Burge (2016) Handbook of iris recognition. Springer.
  • [6] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn (2008) Image understanding for iris biometrics: a survey. Computer Vision and Image Understanding 110 (2), pp. 281–307.
  • [7] J. Cambier (2011) Biometric data interchange formats–part 6: iris image data. ISO/IEC 19794.
  • [8] H. M. Cheng and A. Kumar (2018) Advancing Surface Feature Encoding and Matching for More Accurate 3D Biometric Recognition. In Proceedings - International Conference on Pattern Recognition, Vol. 2018-Augus, pp. 3501–3506.
  • [9] J. Daugman (2004) How Iris Recognition Works. IEEE Transactions on Circuits and Systems for Video Technology 14 (1).
  • [10] M. Frucci, M. Nappi, D. Riccio, and G. Sanniti di Baja (2016) WIRE: Watershed based iris recognition. Pattern Recognition 52, pp. 148–159.
  • [11] A. Gangwar and A. Joshi (2016) DeepIrisNet: Deep iris representation with applications in iris recognition and cross-sensor iris recognition. In Proceedings - International Conference on Image Processing, ICIP, Vol. 2016-Augus, pp. 2301–2305.
  • [12] R. Girshick (2015) Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision 2015 Inter, pp. 1440–1448.
  • [13] L. Grady (2006) Random Walks for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (11), pp. 1768–1783.
  • [14] F. He, Y. Han, H. Wang, J. Ji, Y. Liu, and Z. Ma (2017) Deep learning architecture for iris recognition based on optimal Gabor filters and deep belief network. Journal of Electronic Imaging 26 (2), pp. 023005.
  • [15] K. P. Hollingsworth, S. S. Darnell, P. E. Miller, D. L. Woodard, K. W. Bowyer, and P. J. Flynn (2011) Human and machine performance on periocular biometrics under near-infrared light and visible light. IEEE Transactions on Information Forensics and Security 7 (2), pp. 588–601.
  • [16] P. A. Johnson, P. Lopez-Meyer, N. Sazonova, F. Hua, and S. Schuckers (2010) Quality in face and iris research ensemble (Q-FIRE). In 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–6.
  • [17] A. Joshi, A. K. Gangwar, and Z. Saquib (2012) Person recognition based on fusion of iris and periocular biometrics. In 2012 12th International Conference on Hybrid Intelligent Systems (HIS), pp. 57–62.
  • [18] J. Long, E. Shelhamer, and T. Darrell (2015) Fully Convolutional Networks for Semantic Segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440.
  • [19] L. Masek (2003) Recognition of human iris patterns for biometric identification. Ph.D. Thesis, University of Western Australia.
  • [20] D. Menotti, G. Chiachia, A. Pinto, W. Robson Schwartz, H. Pedrini, A. Xavier Falcao, and A. Rocha (2015) Deep Representations for Iris, Face, and Fingerprint Spoofing Detection. IEEE Transactions on Information Forensics and Security 10 (4), pp. 864–879.
  • [21] K. Miyazawa, K. Ito, T. Aoki, K. Kobayashi, and H. Nakajima (2008) An Effective Approach for Iris Recognition Using Phase-Based Image Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (10), pp. 1741–1756.
  • [22] V. Mnih, N. Heess, A. Graves, et al. (2014) Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pp. 2204–2212.
  • [23] D. M. Monro, S. Rakshit, and D. Zhang (2007) DCT-Based Iris Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (4), pp. 586–595.
  • [24] OpenCV based face and eye detector.
  • [25] U. Park, A. Ross, and A. K. Jain (2009) Periocular biometrics in the visible spectrum: A feasibility study. In 2009 IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, pp. 1–6.
  • [26] P. J. Phillips, W. T. Scruggs, A. J. O'Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, and M. Sharpe (2010) FRVT 2006 and ICE 2006 large-scale experimental results. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (5), pp. 831–846.
  • [27] H. Proenca and J. C. Neves (2018) Deep-PRWIS: Periocular Recognition Without the Iris and Sclera Using Deep Learning Frameworks. IEEE Transactions on Information Forensics and Security 13 (4), pp. 888–896.
  • [28] H. Proenca and J. C. Neves (2019) Segmentation-less and non-holistic deep-learning frameworks for iris recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  • [29] K. B. Raja, R. Raghavendra, M. Stokkenes, and C. Busch (2015) Multi-modal authentication system for smartphones using face, iris and periocular. In 2015 International Conference on Biometrics (ICB), pp. 143–150.
  • [30] A. Rattani and R. Derakhshani (2017) Ocular biometrics in the visible spectrum: A survey. Image and Vision Computing 59, pp. 1–16.
  • [31] G. Santos, E. Grancho, M. V. Bernardo, and P. T. Fiadeiro (2015) Fusing iris and periocular information for cross-sensor recognition. Pattern Recognition Letters 57, pp. 52–59.
  • [32] A. Sharma, S. Verma, M. Vatsa, and R. Singh (2014) On cross spectral periocular recognition. In 2014 IEEE International Conference on Image Processing (ICIP), pp. 5007–5011.
  • [33] J. M. Smereka, V. N. Boddeti, and B. V. K. Vijaya Kumar (2015) Probabilistic Deformation Models for Challenging Periocular Image Verification. IEEE Transactions on Information Forensics and Security 10 (9), pp. 1875–1890.
  • [34] J. M. Smereka, B. V. K. V. Kumar, and A. Rodriguez (2016) Selecting discriminative regions for periocular verification. In 2016 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), pp. 1–8.
  • [35] V. Talreja, M. C. Valenti, and N. M. Nasrabadi (2017) Multibiometric secure system based on deep learning. In 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 298–302.
  • [36] C. W. Tan and A. Kumar (2013) Towards online iris and periocular recognition under relaxed imaging constraints. IEEE Transactions on Image Processing 22 (10), pp. 3751–3765.
  • [37] T. Tan, Z. He, and Z. Sun (2010) Efficient and robust segmentation of noisy iris images for non-cooperative iris recognition. Image and Vision Computing 28 (2), pp. 223–230.
  • [38] S. Verma, P. Mittal, M. Vatsa, and R. Singh (2016) At-a-distance person recognition via combining ocular features. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 3131–3135.
  • [39] D. L. Woodard, S. Pundlik, P. Miller, R. Jillela, and A. Ross (2010) On the Fusion of Periocular and Iris Biometrics in Non-ideal Imagery. In 2010 20th International Conference on Pattern Recognition, pp. 201–204.
  • [40] Q. Zhang, H. Li, Z. Sun, and T. Tan (2018) Deep feature fusion for iris and periocular biometrics on mobile devices. IEEE Transactions on Information Forensics and Security 13 (11), pp. 2897–2912.
  • [41] Z. Zhao and A. Kumar (2015) An Accurate Iris Segmentation Framework Under Relaxed Imaging Constraints Using Total Variation Model. In 2015 IEEE International Conference on Computer Vision (ICCV), pp. 3828–3836.
  • [42] Z. Zhao and A. Kumar (2017) Towards More Accurate Iris Recognition Using Deeply Learned Spatially Corresponding Features. In Proceedings of the IEEE International Conference on Computer Vision, Vol. 2017-Octob, pp. 3829–3838.
  • [43] Z. Zhao and A. Kumar (2018) Improving periocular recognition by explicit attention to critical regions in deep neural network. IEEE Transactions on Information Forensics and Security 13 (12), pp. 2937–2952.
  • [44] Z. Sun and T. Tan (2008) Ordinal Measures for Iris Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (12), pp. 2211–2226.