AI Model Disgorgement: Methods and Choices

04/07/2023
by Alessandro Achille, et al.

Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets, document their provenance, respect intellectual property rights, preserve individual privacy, and use data ethically. Over the past few years, ML models have grown significantly in size and complexity. These models require such vast amounts of data and compute to train that defects in the training corpus can no longer be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and significant effort dedicated to composing training corpora properly, their sheer volume makes it impractical to manually inspect every datum they contain. One potential remedy for defects in training data is model disgorgement: the elimination not only of the improperly used data itself, but also of its effects on any component of the ML model. Model disgorgement techniques can address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible use of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods applicable to modern ML systems. In particular, we investigate what it means to "remove the effects" of data from a trained model without retraining it from scratch.
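
The abstract does not spell out the taxonomy, but the core idea of removing a datum's influence without full retraining can be illustrated with one well-known approach from the broader unlearning literature: sharded training in the spirit of SISA (Bourtoule et al., 2021). This is an illustrative stand-in, not necessarily one of this paper's methods. Independent models are trained on disjoint shards of the data and aggregated at prediction time, so disgorging an example only requires retraining the single shard model that saw it. The sketch below is a minimal, hypothetical Python implementation; the class and method names (ShardedEnsemble, forget) are assumptions for illustration.

```python
# Minimal sketch of shard-based data removal (SISA-style), assuming NumPy
# and scikit-learn are available. Illustrative only; not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression


class ShardedEnsemble:
    """Train one model per disjoint data shard; predict by majority vote.

    Each training example influences exactly one shard model, so
    "disgorging" it only requires retraining that single shard rather
    than the whole ensemble: a cheap alternative to full retraining.
    """

    def __init__(self, n_shards=5, seed=0):
        self.n_shards = n_shards
        self.seed = seed
        self.models = []
        self.shards = []  # per-shard (X, y, original example indices)

    def fit(self, X, y):
        # Randomly partition the training set into disjoint shards.
        # Assumes each shard ends up with at least two classes and that
        # labels are non-negative integers.
        rng = np.random.default_rng(self.seed)
        order = rng.permutation(len(X))
        self.models, self.shards = [], []
        for part in np.array_split(order, self.n_shards):
            Xs, ys = X[part], y[part]
            self.shards.append((Xs, ys, part))
            self.models.append(LogisticRegression(max_iter=1000).fit(Xs, ys))
        return self

    def forget(self, example_id):
        """Remove one training example and retrain only its shard."""
        for s, (Xs, ys, ids) in enumerate(self.shards):
            keep = ids != example_id
            if keep.all():
                continue  # this shard never saw the example
            Xs, ys, ids = Xs[keep], ys[keep], ids[keep]
            self.shards[s] = (Xs, ys, ids)
            self.models[s] = LogisticRegression(max_iter=1000).fit(Xs, ys)

    def predict(self, X):
        votes = np.stack([m.predict(X) for m in self.models]).astype(int)
        # Majority vote over shard models, computed per test point.
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

In this scheme a single deletion costs roughly 1/n_shards of a full retrain, which is the kind of trade-off the abstract alludes to when it asks for removal of data effects "without retraining from scratch".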

Related research

08/18/2023 · Attesting Distributional Properties of Training Data for Machine Learning
The success of machine learning (ML) has been accompanied by increased c...

11/15/2017 · Maintaining The Humanity of Our Models
Artificial intelligence and machine learning have been major research in...

04/19/2022 · System Analysis for Responsible Design of Modern AI/ML Systems
The irresponsible use of ML algorithms in practical settings has receive...

10/05/2022 · Addressing contingency in algorithmic misinformation detection: Toward a responsible innovation agenda
Machine learning (ML) enabled classification models are becoming increas...

05/13/2021 · An Interpretable Graph-based Mapping of Trustworthy Machine Learning Research
There is an increasing interest in ensuring machine learning (ML) framew...

10/09/2019 · Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data
A fancy learning algorithm A outperforms a baseline method B when they a...
