Survey: Leakage and Privacy at Inference Time

07/04/2021
by Marija Jegorova, et al.

Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance, as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage, which is natural to ML models; potential malevolent leakage, which is caused by privacy attacks; and currently available defence mechanisms. We focus on inference-time leakage as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy spanning involuntary leakage, malevolent leakage, and available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.

