Striving for data-model efficiency: Identifying data externalities on group performance

11/11/2022
by Esther Rolf, et al.

Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance. In this work, we seek to better understand how we might characterize, detect, and design for data-model synergies. We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population, a phenomenon we refer to as negative data externalities on group performance. Such externalities can arise in standard learning settings and can manifest differently depending on the relationship between training set size and model size. Data externalities directly imply a lower bound on feasible model improvements, yet improving models efficiently requires understanding the underlying data-model tensions. From a broader perspective, our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
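
To make the idea of a negative data externality on group performance concrete, here is a minimal toy sketch in Python. It is not the authors' experimental setup: the two synthetic groups, their slopes, the sample sizes, and the helper names (`make_group`, `fit_slope`, `group_mse`) are all assumptions chosen for illustration. A single shared one-parameter model is fit first on data from group A alone and then on data pooled from groups A and B, and the error on held-out group A data is compared.

```python
import numpy as np


def make_group(rng, n, slope, noise=0.1):
    """Draw n points from a hypothetical group with y = slope * x + noise."""
    x = rng.normal(size=n)
    y = slope * x + noise * rng.normal(size=n)
    return x, y


def fit_slope(x, y):
    """Least-squares fit of a single shared slope (no intercept, no group feature)."""
    return float(x @ y / (x @ x))


def group_mse(slope, x, y):
    """Mean squared error of the one-parameter model on one group's data."""
    return float(np.mean((y - slope * x) ** 2))


rng = np.random.default_rng(0)

# Assumed toy setting: the two groups follow opposite input-output relationships.
xa_train, ya_train = make_group(rng, 200, slope=2.0)
xb_train, yb_train = make_group(rng, 200, slope=-2.0)
xa_test, ya_test = make_group(rng, 1000, slope=2.0)

# Model trained on group A data only.
slope_a_only = fit_slope(xa_train, ya_train)

# Model trained after adding group B data to the training set.
slope_pooled = fit_slope(
    np.concatenate([xa_train, xb_train]),
    np.concatenate([ya_train, yb_train]),
)

# Performance evaluated on group A in both cases.
mse_a_only = group_mse(slope_a_only, xa_test, ya_test)
mse_pooled = group_mse(slope_pooled, xa_test, ya_test)

# A positive gap means the extra data hurt group A: a negative data externality.
externality = mse_pooled - mse_a_only
print(f"group A MSE, trained on A only: {mse_a_only:.3f}")
print(f"group A MSE, trained on A + B:  {mse_pooled:.3f}")
print(f"increase in group A error:      {externality:.3f}")
```

In this sketch the externality arises because the shared model cannot represent both groups at once, so the added group B data pulls the fitted slope away from the value that suits group A. A model with a group-specific term would avoid the effect here, which is one simple instance of the interaction between data and modeling decisions that the abstract describes.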


