Automated systems (including biometrics) are increasingly used in decision-making processes within various domains, some of which have traditionally enjoyed strong protection under anti-discrimination legislation.
In recent years, systemic biases inherent to several such systems have received substantial media coverage and have been hotly debated. In this context, a biased algorithm produces statistically different outcomes (decisions) for different groups of individuals, e.g. based on sex, age, and ethnicity. For biometric recognition specifically, this means that the score distributions, and therefore the chances of false positives and/or false negatives, may vary across the groups. This, in turn, may affect the outcomes at the system/application level: for example, one recent study claimed disproportionately high arrest and search rates for certain groups based on decisions made by automatic facial recognition software.
Although some studies have approached bias measurement and ensuring fairness in various machine learning contexts, for computer vision and biometrics in particular this remains a nascent field of research. It has been reported that facial recognition algorithms tend to exhibit higher biometric performance with individuals from ethnic groups corresponding to the area of development of the algorithm, presumably due to training data availability. Some facial biometrics algorithms were shown to exhibit lower recognition and classification performance for certain groups of individuals (in particular women and non-white people), and a large-scale benchmark of commercial and academic algorithms controlling for various demographic (and other) attributes has also been conducted. Proof-of-concept studies into bias mitigation for facial soft biometric classification using neural networks have likewise been presented.
While the face is certainly the most widely covered and discussed biometric characteristic recently, with some preliminary studies existing in the context of bias and fairness, the topic remains even less explored for other biometric characteristics. In general, algorithmic bias is considered (by some influential researchers) to be one of the as yet unresolved challenges in biometric systems. For fingervein specifically, a small study evaluating the biometric performance w.r.t. the sex and age of the subjects has been conducted as part of a paper presenting a new dataset. Intuitively, the demographic covariates could be proxies for certain anatomical features which might influence the performance of fingervein recognition systems (e.g. the thickness of the finger). Otherwise, no studies benchmarking the potential biases in fingervein algorithms have so far been reported in the scientific literature. In this paper, a benchmark is conducted to address the following two questions (see figure 1):
Do score distributions computed by fingervein recognition algorithms on disjoint groups of data instances (i.e. based on metadata attributes, such as subject sex, age, and others) exhibit statistically significant differences?
Do these results persist across fundamentally different types of fingervein recognition algorithms?
II Experimental Setup
Four publicly available fingervein databases with metadata labels were used. They are listed in table I, while example images are shown in figure 2. The datasets were chosen based on the presence of the metadata information (sex and age), as well as the presence of samples from both hands and different fingers.
|PLUS [10, 23]||78||468||2340|
II-B Feature Types
All experiments are executed using four fundamentally different types of vein recognition schemes:
Keypoints: In contrast to the vein pattern techniques, keypoint-based techniques extract keypoints, i.e. the most discriminative points in the image, and assign each of them a descriptor that captures its neighbourhood and context information. We used a Scale-Invariant Feature Transform (SIFT) based technique with additional keypoint filtering. All details of this method are described in the corresponding reference.
The LCNN is a small neural network trained using the triplet loss, a loss function that enables the identification of subjects that were not included in the training set. By using a more advanced selection of the triplets (input images) for training (hard triplet online selection) and by omitting the supervised discrete hashing of the CNN outputs, the results could be improved relative to the original approach. It should be noted that the LCNN was trained according to its actual purpose, the recognition of individuals based on the captured vein images. If the network were trained according to the chosen groups, such as sex or age, the results might look different.
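The idea of a triplet loss with online selection of hard triplets can be sketched as follows. This is a minimal illustration only, assuming the common "batch-hard" variant (hardest positive and hardest negative per anchor within a batch), Euclidean distances, and a hypothetical margin of 0.2; the function names are illustrative and the exact selection strategy and hyperparameters used for the LCNN are not specified here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean triplet loss over a batch using batch-hard selection.

    For each anchor, the farthest sample of the same identity (hardest
    positive) and the closest sample of another identity (hardest negative)
    are selected. The margin value is an assumption for illustration.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [
            euclidean(anchor, other)
            for j, (other, other_label) in enumerate(zip(embeddings, labels))
            if other_label == label and j != i
        ]
        negatives = [
            euclidean(anchor, other)
            for other, other_label in zip(embeddings, labels)
            if other_label != label
        ]
        if not positives or not negatives:
            continue  # no valid triplet can be formed for this anchor
        losses.append(max(0.0, max(positives) - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

In an actual training loop the embeddings would be the network outputs for one mini-batch and the loss would be back-propagated; the pure-Python version above only shows which triplets contribute.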
|Dataset||EER (in %)|
II-C Data Processing Pipelines
The finger vein recognition tool-chain consists of the following components: (1) For finger region detection, finger alignment and ROI extraction, an implementation based on a previously published method is used. (2) To improve the visibility of the vein pattern, High Frequency Emphasis Filtering (HFE), Circular Gabor Filter (CGF), and simple CLAHE (local histogram equalisation) are used during pre-processing. (3a) For the simple vein pattern based feature methods, MC and PC, the binary feature images are compared using a correlation measure, calculated between the input image and versions of the reference image that are shifted in x- and y-direction and rotated. (3b) The SIFT based method applies feature extraction and comparison as previously proposed, and (3c) the LBP based approach follows an existing LDP based method in which the LDP features are replaced with LBP features. (3d) For LCNN, the ROIs extracted in (1) have been resized to the required input size of 256×256. To separate the training data from the evaluation data, 2-fold cross-validation has been applied. Due to the limited number of samples, training on the VERA dataset was not possible. The biometric performance of the methods used is summarised in table II. Supplementary files (e.g. comparison scores) are available for download at http://wavelab.at/sources/Drozdowski20a/.
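The correlation-based comparison of binary vein images in step (3a) can be illustrated with a simplified sketch. It assumes binary images given as nested lists of 0/1 values, considers only x/y shifts (rotation is omitted for brevity), and normalises by the total number of vein pixels in both images; the function names, the normalisation, and the `max_shift` parameter are assumptions for illustration and not the exact measure used in the tool-chain.

```python
def overlap_score(probe, reference, dx, dy):
    """Normalised overlap of vein pixels with the reference shifted by (dx, dy)."""
    height, width = len(probe), len(probe[0])
    matches = 0
    for y in range(height):
        for x in range(width):
            ry, rx = y - dy, x - dx  # position in the unshifted reference
            if 0 <= ry < height and 0 <= rx < width:
                matches += probe[y][x] & reference[ry][rx]
    total = sum(map(sum, probe)) + sum(map(sum, reference))
    return 2.0 * matches / total if total else 0.0


def compare(probe, reference, max_shift=2):
    """Best overlap score over all shifts within +/- max_shift pixels."""
    shifts = range(-max_shift, max_shift + 1)
    return max(overlap_score(probe, reference, dx, dy)
               for dx in shifts for dy in shifts)
```

Taking the maximum over the shifted versions makes the score tolerant to small finger displacements between captures, which is the purpose of the shifted and rotated reference versions mentioned above.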
The conceptual overview of the conducted experiments is shown in figure 1. The descriptive statistics (mean μ and standard deviation σ) of the score distributions w.r.t. the metadata-based groupings are computed in subsections III-A to III-C. Additionally, for comparison, table III shows the same statistics for all template comparisons without any metadata-based grouping. The algorithms have different ranges of scores. In the context of the conducted bias benchmark, only intra-algorithm comparisons are meaningful; therefore, score normalisation is not required. The results are only listed where available (e.g. the VERA dataset only contains two samples per finger and therefore it is not possible to train a CNN using the triplet loss function).
In this experiment, the data instances are grouped by the subject sex (male and female) with the genuine and impostor score distributions computed within the groups. Table IV shows the descriptive statistics of those score distributions.
|Attribute||Z-score median||Z-score maximum|
In this experiment, the data instances are grouped by the subject age (into buckets) with the genuine and impostor score distributions computed within the groups. Table VI shows the descriptive statistics of those score distributions.
In this experiment, the data instances are grouped by the finger (index, middle, ring) or hand (left, right) with the genuine and impostor score distributions computed within the groups. Tables VII and VIII show the descriptive statistics of those score distributions.
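The grouping used in these experiments can be sketched as follows. The sketch assumes a hypothetical record layout in which each comparison carries the group label and the biometric instance identifier of both samples together with the score; comparisons across different groups are skipped, and the population standard deviation is used. The field names are illustrative and not taken from the supplementary files.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grouped_statistics(comparisons):
    """Return {(group, kind): (mean, std)} for within-group comparisons.

    Each comparison is a dict with the (assumed) keys group_a, group_b,
    instance_a, instance_b and score. "kind" is "genuine" when both samples
    stem from the same instance, otherwise "impostor".
    """
    buckets = defaultdict(list)
    for comparison in comparisons:
        if comparison["group_a"] != comparison["group_b"]:
            continue  # only comparisons within one group are considered
        same_instance = comparison["instance_a"] == comparison["instance_b"]
        kind = "genuine" if same_instance else "impostor"
        buckets[(comparison["group_a"], kind)].append(comparison["score"])
    return {key: (mean(scores), pstdev(scores))
            for key, scores in buckets.items()}
```

The same function applies to any of the metadata attributes (sex, age bucket, finger, hand) by changing what is stored in the group fields.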
In order to check whether the small differences reported in the previous tables are statistically significant, the standard score (z-score) is computed as shown in equation 1 (the absolute value of the means is used, since only the magnitude of the difference is of interest in this case). Using the z-score is possible since all-against-all comparisons were conducted and, as such, the whole population of the comparison scores for each database/algorithm is known.
This computation is done for all the relevant pairs of score distributions for all the experiments. In other words, for each database and recognition algorithm, all pairings of the genuine and of the impostor distributions are considered within the respective metadata attribute. Table V shows the medians and maxima of the computed z-scores. Overall, the z-scores (medians) are very low and those of the outliers (maxima) are relatively low. In other words, statistically significant differences are not present for any of the score distribution pairs within their respective experiments (database, algorithm, and metadata attribute). The biometric performance evaluations (in verification and closed-set identification modes) have also not revealed any statistically significant differences.
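The pairwise procedure can be illustrated with a short sketch. It assumes a standardised absolute difference of means, |μ1 − μ2| / sqrt(σ1² + σ2²), computed from population statistics; this form is an assumption for illustration and need not coincide exactly with equation 1. Because the absolute difference is symmetric, unordered pairs are sufficient. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median, pvariance


def z_score(scores_a, scores_b):
    """Standardised absolute difference of means (assumed form)."""
    difference = abs(mean(scores_a) - mean(scores_b))
    spread = sqrt(pvariance(scores_a) + pvariance(scores_b))
    if spread == 0.0:
        return 0.0 if difference == 0.0 else float("inf")
    return difference / spread


def summarise(distributions):
    """Median and maximum z-score over all pairs of group distributions.

    distributions maps a group label to its list of comparison scores of
    one kind (all genuine or all impostor) for one database and algorithm.
    """
    values = [z_score(a, b)
              for a, b in combinations(distributions.values(), 2)]
    return median(values), max(values)
```

For example, `summarise({"index": [...], "middle": [...], "ring": [...]})` would yield one median/maximum pair for the finger attribute of a given database and algorithm.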
As shown in the evaluation in section III, statistically significant biases in score distributions w.r.t. the sex and age of the data subjects, as well as the chosen finger/hand, have not been detected for the five fingervein recognition algorithms tested on the four datasets. Accordingly, no impact on the biometric performance has been discovered in either the biometric verification or the biometric identification mode. The results thus indicate that various fundamentally different classes of fingervein recognition algorithms might be suitable for application irrespective of the tested meta-parameters. This also points to a potential advantage, in certain application scenarios, of the vascular characteristics in comparison to others (e.g. face, see section I), for which potential biases have been reported in the literature. An obvious limitation of this work is the size of the fingervein datasets used; unfortunately, no larger ones (with metadata present) are currently publicly available. Hence, an important avenue of future research in this area would be the acquisition of a more sizeable dataset and a validation of the scalability of these preliminary results.
This research work has been funded by the German Federal Ministry of Education and Research and the Hessen State Ministry for Higher Education, Research and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE, the LOEWE-3 BioBiDa Project (594/18-17), the FFG KIRAS project AUTFingerATM under grant No. 864785, and the FWF project "Advanced Methods and Applications for Fingervein Recognition" under grant No. P 32201-NBL.
-  (2018-01) Gender shades: intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency, pp. 77–91. Cited by: §I.
-  (2009) Finger vein extraction using gradient normalization and principal curvature. In Image Processing: Machine Vision Applications II, Vol. 7251. Cited by: item 1.
-  (2018-09) . In Proceedings of the European Conference on Computer Vision (ECCV), pp. 1–13. Cited by: §I.
-  (2016-09) Algorithm and blues. Nature, pp. 1–1. Cited by: §I.
-  (2018-02) Handbook on European non-discrimination law. Publications Office of the European Union. Cited by: §I.
-  Finger vein recognition using log gabor filter and local derivative pattern. International Journal of Signal Processing, Image Processing and Pattern Recognition (IJSIP) 9 (12), pp. 231–242. Cited by: item 3, §II-C.
-  (2016-10) The perpetual line-up: unregulated police face recognition in America. Georgetown Law, Center on Privacy & Technology. Cited by: §I.
-  (2019-12) Ongoing face recognition vendor test (FRVT) part 3: demographic effects. Technical Report NISTIR 8280, NIST, National Institute of Standards and Technology. Cited by: §I.
-  Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (NIPS), pp. 3315–3323. Cited by: §I.
-  (2018-10) Focussing the beam - a new laser illumination based data set providing insights to finger-vein recognition. In International Conference on Biometrics Theory, Applications and Systems (BTAS), pp. 1–9. Cited by: §I, TABLE I.
-  (September 10 - 12) Pre-processing cascades and fusion in finger vein recognition. In International Conference of the Biometrics Special Interest Group (BIOSIG), Darmstadt, Germany. Cited by: item 2, §II-C.
-  (2017-12) Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems (NIPS), pp. 656–666. Cited by: §I.
-  (2012-10) Face recognition performance: role of demographic information. Transactions on Information Forensics and Security (TIFS) 7 (6), pp. 1789–1801. Cited by: §I.
-  (1999) Object recognition from local scale-invariant features. In International Conference on Computer Vision (ICCV), Vol. 2, pp. 1150–1157. Cited by: item 2.
-  (2013-12) An available database for the research of finger vein recognition. In International Congress on Image and Signal Processing (CISP), Vol. 1, pp. 410–415. Cited by: TABLE I.
-  (2013) Robust finger vein ROI localization based on flexible segmentation. Sensors 13 (11), pp. 14339–14366. Cited by: §II-C.
-  (2004) Feature extraction of finger-vein patterns based on repeated line tracking and its application to personal identification. Machine Vision and Applications 15 (4), pp. 194–203. Cited by: §II-C.
-  (2007) Extraction of finger-vein patterns using maximum curvature points in image profiles. Transactions on information and systems 90 (8), pp. 1185–1194. Cited by: item 1.
-  (1996-01) A comparative study of texture measures with classification based on featured distributions. Pattern Recognition 29 (1), pp. 51–59. Cited by: item 3, §II-C.
-  (2011-01) An other-race effect for face recognition algorithms. Transactions on Applied Perception (TAP) 8 (2), pp. 14:1–14:11. Cited by: §I.
-  (2019-06) Some research problems in biometrics: the future beckons. In International Conference on Biometrics (ICB), pp. 1–8. Cited by: §I.
-  (2018-07) Inclusivefacenet: improving face attribute detection with race and gender diversity. In Workshop on Fairness, Accountability, and Transparency in Machine Learning, pp. 1–6. Cited by: §I.
-  (2018) PROTECT multimodal db: a multimodal biometrics dataset envisaging border control. In International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–8. Cited by: TABLE I.
-  (2013-06) A high quality finger vascular pattern dataset collected using a custom designed capturing device. In International Conference on Biometrics (ICB), pp. 1–5. Cited by: TABLE I.
-  (2014-10) Cross-database evaluation with an open finger vein sensor. In Workshop on Biometric Measurements and Systems for Security and Medical Applications (BioMS), pp. 30–35. Cited by: TABLE I.
-  (2019-03) Finger vein identification using convolutional neural network and supervised discrete hashing. Pattern Recognition Letters 119, pp. 148–156. Cited by: item 4.
-  (2009) Finger-vein image enhancement based on combination of gray-level grouping and circular Gabor filter. In International Conference on Information Engineering and Computer Science, pp. 1–4. Cited by: §II-C.
-  (2009) A new approach to hand vein image enhancement. In International Conference on Intelligent Computation Technology and Automation, Vol. 1, pp. 499–501. Cited by: §II-C.
-  (1994) Contrast limited adaptive histogram equalization. In Graphics Gems IV, pp. 474–485. Cited by: §II-C.