A Method for Curation of Web-Scraped Face Image Datasets

04/07/2020
by Kai Zhang, et al.

Web-scraped, in-the-wild datasets have become the norm in face recognition research. Such datasets usually contain very large numbers of subjects and images, with image counts in the millions. A variety of issues arise when collecting a dataset in the wild, including images with the wrong identity label, duplicate images, duplicate subjects, and variation in quality. With millions of images, manual cleaning is not feasible, but the fully automated methods used to date leave datasets less clean than desired. We propose a semi-automated method whose goal is a clean dataset for testing face recognition methods, with similar quality across men and women, to support comparison of accuracy across gender. Our approach removes near-duplicate images, merges duplicate subjects, corrects mislabeled images, and removes images outside a defined range of pose and quality. We conduct the curation on the Asian Face Dataset (AFD) and the VGGFace2 test dataset. The experiments show that a state-of-the-art method achieves a much higher accuracy on the datasets after they are curated. Finally, we release our cleaned versions of both datasets to the research community.
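As an illustration of the near-duplicate removal step only, the following is a minimal sketch and not the authors' implementation. It assumes each image has already been mapped to an embedding vector by some face matcher, and it greedily keeps an image only when its cosine similarity to every already-kept image is below a cutoff. The function names, the 0.95 threshold, and the toy embeddings are all hypothetical; the abstract does not state which similarity measure or threshold is used.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def drop_near_duplicates(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.95,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Greedily keep an image only if it is below the threshold to all kept images."""
    kept: List[str] = []
    for name in sorted(embeddings):  # sorted order makes the result deterministic
        is_new = all(
            cosine_similarity(embeddings[name], embeddings[other]) < threshold
            for other in kept
        )
        if is_new:
            kept.append(name)
    return kept


# Toy embeddings for one subject (made-up values for illustration).
example = {
    "a.jpg": [1.0, 0.0, 0.0],
    "b.jpg": [0.999, 0.01, 0.0],  # nearly identical to a.jpg
    "c.jpg": [0.0, 1.0, 0.0],     # clearly different from a.jpg
}
kept_images = drop_near_duplicates(example)
```

The pairwise comparison is quadratic in the number of images per subject, which is acceptable within a single identity folder but would likely need an approximate nearest-neighbor index at the scale of millions of images.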


