Introducing a Family of Synthetic Datasets for Research on Bias in Machine Learning

07/19/2021
by William Blanzeisky, et al.

A significant impediment to progress in research on bias in machine learning (ML) is the limited availability of relevant datasets. This situation is unlikely to change much, given the sensitivity of such data. For this reason, synthetic data has a role to play in this research. In this short paper, we present one such family of synthetic datasets. We provide an overview of the data, describe how the level of bias can be varied, and present a simple example of an experiment on the data.

research
12/16/2020

Copula-based synthetic data generation for machine learning emulators in weather and climate: application to a simple radiation model

Can we improve machine learning (ML) emulators with synthetic data? The ...
research
12/08/2020

Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods

Many ground-breaking advancements in machine learning can be attributed ...
research
09/13/2022

Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation

Machine learning applications are becoming increasingly pervasive in our...
research
08/22/2023

A survey on bias in machine learning research

Current research on bias in machine learning often focuses on fairness, ...
research
09/16/2021

Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?

Research in machine learning (ML) has primarily argued that models train...
research
10/08/2020

Towards the Detection of Building Occupancy with Synthetic Environmental Data

Information about room-level occupancy is crucial to many building-relat...
research
05/06/2022

Synthetic Data – what, why and how?

This explainer document aims to provide an overview of the current state...
