Have you forgotten? A method to assess if machine learning models have forgotten data

04/21/2020
by Xiao Liu, et al.

In the era of deep learning, aggregating data from several sources is a common way to ensure data diversity. Consider a scenario in which several providers contribute data to a consortium for the joint development of a classification model (hereafter the target model), but one of the providers now decides to leave. The provider requests not only that their data (hereafter the query dataset) be removed from the databases, but also that the model "forgets" their data. In this paper we address, for the first time, the challenging question of whether data have been forgotten by a model. We assume knowledge of the query dataset and of the distribution of a model's output activations. We establish statistical methods that compare the outputs of the target model with the outputs of models trained on different datasets. We evaluate our approach on several benchmark datasets (MNIST, CIFAR-10 and SVHN) and on a cardiac pathology diagnosis task using data from the Automated Cardiac Diagnosis Challenge (ACDC). We hope to encourage investigation into what information a model retains and to inspire extensions to more complex settings.
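To make the comparison of output distributions concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the statistic used, so the two-sample Kolmogorov-Smirnov distance is chosen here only as one plausible option, and the function names `ks_statistic` and `resembles_trained_on_query` are hypothetical. The inputs are assumed to be scalar output activations (for example, the softmax score of the predicted class) collected by running each model on the query dataset.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of samples a and b.
    (Assumption: KS is an illustrative choice, not taken from the abstract.)
    """
    a, b = sorted(a), sorted(b)
    # The maximum gap between two step functions is attained at a sample point.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def resembles_trained_on_query(target_out, seen_out, unseen_out):
    """Compare the target model's outputs on the query dataset with two references.

    seen_out:   outputs of a reference model trained WITH the query dataset
    unseen_out: outputs of a reference model trained WITHOUT the query dataset
    All three arguments are hypothetical lists of scalar activations.
    Returns (closer_to_seen, distance_to_seen, distance_to_unseen).
    """
    d_seen = ks_statistic(target_out, seen_out)
    d_unseen = ks_statistic(target_out, unseen_out)
    return d_seen < d_unseen, d_seen, d_unseen
```

If the target model's outputs lie closer to those of the reference model trained with the query dataset, this heuristic suggests the data may not have been forgotten. A real assessment would need many more samples and a proper significance test rather than a bare comparison of two distances.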

