Labeling Bias in Galaxy Morphologies

11/08/2018
by Guillermo Cabrera-Vives, et al.

We present a metric to quantify systematic labeling bias in galaxy morphology data sets that stems from the quality of the labeled data. This labeling bias is independent of labeling errors, and quantifying it requires knowledge of how the intrinsic properties of the data relate to the observed properties. We conduct a relative comparison of labeling bias across several low-redshift galaxy morphology data sets. We show that our metric recovers previous de-biasing procedures that use redshift as the biasing parameter. Using image resolution as the biasing parameter instead, we find biases that have not previously been addressed. We find that morphologies from supervised machine learning trained on features such as color, shape, and concentration show significantly less bias than morphologies from expert or citizen-science classifications. This result holds even when the training sets used in the supervised learning process are themselves biased. We use catalog simulations to validate our bias metric and show how to bin the multidimensional intrinsic and observed galaxy properties used in the bias quantification. Our approach is designed to work on any labeled multidimensional data set, and the code is publicly available.
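The abstract describes the metric only at a high level. The general idea is that labels are unbiased if, at fixed intrinsic properties, the fraction of objects carrying a label does not change with an observed property such as redshift or image resolution. The following is a minimal sketch of that idea under stated assumptions. It is not the authors' metric or their released code: the helper names `bin_index` and `labeling_bias`, the binary labels, the one-dimensional intrinsic and observed properties, and the count-weighted standard deviation of label fractions used as the bias score are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import pstdev


def bin_index(value, edges):
    """Return k such that edges[k] <= value < edges[k + 1], or None if out of range."""
    for k in range(len(edges) - 1):
        if edges[k] <= value < edges[k + 1]:
            return k
    return None


def labeling_bias(intrinsic, observed, labels, intrinsic_edges, observed_edges, min_count=1):
    """Hypothetical bias score: count-weighted mean spread of label fractions.

    Within each intrinsic bin, the fraction of positive labels is computed in
    every observed bin holding at least `min_count` objects. If labels do not
    depend on the observed property, these fractions agree, so their
    population standard deviation measures bias in that intrinsic bin.
    """
    counts = defaultdict(lambda: [0, 0])  # (i, j) -> [n_positive, n_total]
    for x, y, lab in zip(intrinsic, observed, labels):
        i = bin_index(x, intrinsic_edges)
        j = bin_index(y, observed_edges)
        if i is None or j is None:
            continue  # skip objects outside the binning range
        counts[(i, j)][0] += int(lab)
        counts[(i, j)][1] += 1

    weighted_spread = 0.0
    total_weight = 0
    for i in range(len(intrinsic_edges) - 1):
        cells = []
        for j in range(len(observed_edges) - 1):
            cell = counts.get((i, j))
            if cell is not None and cell[1] >= min_count:
                cells.append(cell)
        if len(cells) < 2:
            continue  # a spread needs at least two populated observed bins
        fractions = [pos / n for pos, n in cells]
        weight = sum(n for _, n in cells)
        weighted_spread += weight * pstdev(fractions)
        total_weight += weight
    return weighted_spread / total_weight if total_weight else 0.0


# Invented toy data: two intrinsic bins crossed with two observed bins.
intrinsic = [0.2, 0.2, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8]
observed = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
unbiased = [0, 0, 1, 1, 0, 0, 1, 1]  # label depends only on the intrinsic property
biased = [0, 0, 1, 1, 0, 0, 0, 0]    # label is lost at high observed values
edges = [0.0, 0.5, 1.0]
print(labeling_bias(intrinsic, observed, unbiased, edges, edges))  # 0.0
print(labeling_bias(intrinsic, observed, biased, edges, edges))    # 0.25
```

A score of zero means the label fraction is flat across observed bins in every intrinsic bin; larger values indicate that the labels track the observed property. How the bins are chosen matters, which is why the abstract highlights binning of the multidimensional properties as part of the method.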

research
11/22/2017

No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World

Modern machine learning systems such as image classifiers rely heavily o...
research
12/11/2018

SMART: An Open Source Data Labeling Platform for Supervised Learning

SMART is an open source web application designed to help data scientists...
research
04/14/2022

How Gender Debiasing Affects Internal Model Representations, and Why It Matters

Common studies of gender bias in NLP focus either on extrinsic bias meas...
research
03/29/2022

OdontoAI: A human-in-the-loop labeled data set and an online platform to boost research on dental panoramic radiographs

Deep learning has remarkably advanced in the last few years, supported b...
research
11/08/2022

Simulation-Based Parallel Training

Numerical simulations are ubiquitous in science and engineering. Machine...
research
07/13/2020

Imputation procedures in surveys using nonparametric and machine learning methods: an empirical comparison

Nonparametric and machine learning methods are flexible methods for obta...
research
08/02/2023

Using ScrutinAI for Visual Inspection of DNN Performance in a Medical Use Case

Our Visual Analytics (VA) tool ScrutinAI supports human analysts to inve...