Learning to Limit Data Collection via Scaling Laws: Data Minimization Compliance in Practice

07/16/2021
by Divya Shanmugam, et al.

Data minimization is a legal obligation defined in the European Union's General Data Protection Regulation (GDPR) as the responsibility to process an adequate, relevant, and limited amount of personal data in relation to a processing purpose. However, unlike fairness or transparency, the principle has not seen wide adoption in machine learning systems because it lacks a computational interpretation. In this paper, we build on literature in machine learning and law to propose the first learning framework for limiting data collection based on an interpretation that ties the data collection purpose to system performance. We formalize a data minimization criterion based on performance curve derivatives and provide an effective and interpretable piecewise power law technique that models distinct stages of an algorithm's performance throughout data collection. Results from our empirical investigation offer deeper insights into the relevant considerations when designing a data minimization framework, including the choice of feature acquisition algorithm, initialization conditions, and impacts on individuals that hint at tensions between data minimization and fairness.
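To make the idea of a derivative-based stopping rule more concrete, here is a small illustrative sketch. It is not the authors' implementation, and it rests on several assumptions. Error (not accuracy) is modeled as e(n) = a * n^(-b), where n is the amount of data collected. Two power-law segments are fitted by least squares in log-log space, with an exhaustive search for the breakpoint. The minimization criterion is taken to be the point where the magnitude of the last segment's slope, |de/dn|, falls below a user-chosen threshold epsilon. The names PowerLaw, fit_power_law, fit_piecewise, minimal_sample_size, and epsilon are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class PowerLaw:
    """Hypothetical error model e(n) = a * n**(-b), fitted in log-log space."""
    a: float
    b: float

    def error(self, n: float) -> float:
        return self.a * n ** (-self.b)

    def slope(self, n: float) -> float:
        # Derivative of the error curve: de/dn = -a * b * n**(-b - 1)
        return -self.a * self.b * n ** (-self.b - 1)


def fit_power_law(ns: Sequence[float], errs: Sequence[float]) -> Tuple[PowerLaw, float]:
    """Ordinary least squares on log(e) = log(a) - b*log(n).

    Requires positive errors and at least two distinct n values.
    Returns the fitted model and its sum of squared residuals in log space.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errs]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return PowerLaw(a=math.exp(intercept), b=-slope), sse


def fit_piecewise(
    ns: Sequence[float], errs: Sequence[float], min_points: int = 3
) -> Tuple[float, PowerLaw, PowerLaw]:
    """Two-segment piecewise power law: try every split, keep the lowest total SSE.

    Returns (breakpoint, left_model, right_model), where breakpoint is the
    first n value of the right-hand segment.
    """
    best = None
    for k in range(min_points, len(ns) - min_points + 1):
        left, sse_left = fit_power_law(ns[:k], errs[:k])
        right, sse_right = fit_power_law(ns[k:], errs[k:])
        total = sse_left + sse_right
        if best is None or total < best[0]:
            best = (total, ns[k], left, right)
    if best is None:
        raise ValueError("need at least 2 * min_points observations")
    _, breakpoint, left, right = best
    return breakpoint, left, right


def minimal_sample_size(model: PowerLaw, epsilon: float) -> float:
    """Smallest n at which |de/dn| <= epsilon, i.e. solve a*b*n**(-b-1) = epsilon."""
    if model.a <= 0 or model.b <= 0 or epsilon <= 0:
        raise ValueError("expects a decreasing error curve and a positive epsilon")
    return (model.a * model.b / epsilon) ** (1.0 / (model.b + 1.0))


if __name__ == "__main__":
    # Synthetic learning curve with a steep early stage and a flatter late stage.
    ns = [10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120]
    errs = [2.0 * n ** -0.5 for n in ns[:5]] + [0.3 * n ** -0.1 for n in ns[5:]]
    bp, left, right = fit_piecewise(ns, errs)
    print("breakpoint:", bp, "late-stage exponent:", round(right.b, 3))
    print("stop collecting near n =", round(minimal_sample_size(right, 1e-6)))
```

Extrapolating from the last segment rather than from a single global fit reflects the abstract's point that performance passes through distinct stages. An early, steep segment would otherwise overstate the value of further collection. How the threshold epsilon relates to a legal "purpose" is a policy choice the sketch does not address.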


research
05/28/2020

Operationalizing the Legal Principle of Data Minimization for Personalization

Article 5(1)(c) of the European Union's General Data Protection Regulati...
research
03/16/2018

Some HCI Priorities for GDPR-Compliant Machine Learning

In this short paper, we consider the roles of HCI in enabling the better...
research
03/18/2022

Configurable Per-Query Data Minimization for Privacy-Compliant Web APIs

The purpose of regulatory data minimization obligations is to limit pers...
research
11/13/2020

Digital trace data collection through data donation

A potentially powerful method of social-scientific data collection and i...
research
02/05/2021

Applications of Machine Learning in Document Digitisation

Data acquisition forms the primary step in all empirical research. The a...
research
12/19/2018

Preventing Attacks on Anonymous Data Collection

Anonymous data collection systems allow users to contribute the data nec...
research
01/27/2021

Detecting discriminatory risk through data annotation based on Bayesian inferences

Thanks to the increasing growth of computational power and data availabi...
