Balanced Face Dataset: Guiding StyleGAN to Generate Labeled Synthetic Face Image Dataset for Underrepresented Group

08/07/2023
by Kidist Amde Mekonnen, et al.

For a machine learning model to generalize effectively to unseen data within a particular problem domain, it is well understood that the training data must be of sufficient size and representative of real-world scenarios. Real-world datasets, however, frequently contain overrepresented and underrepresented groups. One way to mitigate bias in machine learning is to train on a diverse and representative dataset that covers all demographic groups. Collecting and labeling such large-scale datasets is challenging, which has prompted the use of synthetic data generation and active labeling to reduce the cost of manual annotation. This study focuses on generating a robust face image dataset using the StyleGAN model. To achieve a balanced distribution across demographic groups, the synthetic dataset was created by controlling StyleGAN's generation process and was annotated for several downstream tasks.
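The goal of a balanced distribution across demographic groups can be illustrated with a simple quota-filling loop. This is a minimal sketch, not the authors' method: it assumes a hypothetical `sample_fn` standing in for drawing one image from a generator such as StyleGAN, and a hypothetical `label_fn` standing in for an attribute classifier that assigns each sample to a group. Candidates whose group quota is already full are rejected, so the kept set is balanced by construction even when the generator itself is skewed.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")


def fill_balanced_quota(
    sample_fn: Callable[[], T],
    label_fn: Callable[[T], Hashable],
    groups: List[Hashable],
    per_group: int,
    max_draws: int = 100_000,
) -> Dict[Hashable, List[T]]:
    """Draw candidates until every group holds exactly `per_group` items.

    `sample_fn` and `label_fn` are hypothetical stand-ins for a generator
    (e.g. one StyleGAN sample) and a demographic attribute classifier.
    """
    buckets: Dict[Hashable, List[T]] = defaultdict(list)
    wanted = set(groups)
    for _ in range(max_draws):
        # Stop as soon as every quota is full.
        if all(len(buckets[g]) >= per_group for g in groups):
            break
        candidate = sample_fn()
        group = label_fn(candidate)
        # Reject samples from unknown groups or from groups that are already full.
        if group in wanted and len(buckets[group]) < per_group:
            buckets[group].append(candidate)

    missing = {g: per_group - len(buckets[g]) for g in groups if len(buckets[g]) < per_group}
    if missing:
        raise RuntimeError(f"quota not met after {max_draws} draws: {missing}")
    return {g: buckets[g] for g in groups}


def toy_label(u: float) -> str:
    # Hypothetical skewed "generator" output: 80% of samples fall in A, 15% in B, 5% in C.
    if u < 0.80:
        return "A"
    if u < 0.95:
        return "B"
    return "C"


if __name__ == "__main__":
    rng = random.Random(0)
    balanced = fill_balanced_quota(rng.random, toy_label, ["A", "B", "C"], per_group=50)
    print({g: len(items) for g, items in balanced.items()})
```

Pure rejection sampling wastes draws in proportion to how rare a group is (roughly `per_group / p` draws for a group sampled with probability `p`), which is one motivation for steering the generator directly rather than only filtering its output.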

research
09/15/2023

Toward responsible face datasets: modeling the distribution of a disentangled latent space for sampling face images from demographic groups

Recently, it has been exposed that some modern facial recognition system...
research
05/09/2023

Fashion CUT: Unsupervised domain adaptation for visual pattern classification in clothes using synthetic data and pseudo-labels

Accurate product information is critical for e-commerce stores to allow ...
research
05/10/2023

Analyzing Bias in Diffusion-based Face Generation Models

Diffusion models are becoming increasingly popular in synthetic data gen...
research
05/12/2023

Zero-shot racially balanced dataset generation using an existing biased StyleGAN2

Facial recognition systems have made significant strides thanks to data-...
research
06/29/2023

Learning from Synthetic Human Group Activities

The understanding of complex human interactions and group activities has...
research
11/10/2022

Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy

Harnessing the benefits of drones for urban innovation at scale requires...
research
04/12/2023

Generation of artificial facial drug abuse images using Deep De-identified anonymous Dataset augmentation through Genetics Algorithm (3DG-GA)

In biomedical research and artificial intelligence, access to large, wel...
