PUMA: Performance Unchanged Model Augmentation for Training Data Removal

03/02/2022
by Ga Wu, et al.

Preserving the performance of a trained model while removing the unique characteristics of marked training data points is challenging. Recent research typically suggests retraining the model from scratch on the remaining training data, or refining the model by reverting its optimization on the marked data points. Unfortunately, aside from being computationally inefficient, these approaches inevitably hurt the resulting model's generalization ability, since they remove not only unique characteristics but also shared (and possibly contributive) information. To address this performance degradation problem, this paper presents a novel approach called Performance Unchanged Model Augmentation (PUMA). The proposed PUMA framework explicitly models the influence of each training data point on the model's generalization ability with respect to various performance criteria, and then compensates for the negative impact of removing the marked data by optimally reweighting the remaining data. To demonstrate the effectiveness of the PUMA framework, we compare it with multiple state-of-the-art data removal techniques in experiments showing that PUMA can effectively and efficiently remove the unique characteristics of marked training data without retraining the model, so that the resulting model can 1) fool a membership inference attack and 2) resist performance degradation. In addition, because PUMA estimates data importance during its operation, we show that it can be used to debug mislabelled data points more efficiently than existing approaches.
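The abstract gives no implementation details, but the compensate-by-reweighting idea it describes can be illustrated with a minimal, self-contained sketch. The sketch below makes several simplifying assumptions that are not from the paper: a toy weighted ridge-regression model, first-order influence functions on the parameters (rather than PUMA's performance criteria), and a least-squares solve for the compensating weights. All names (fit, param_influence, eps, and so on) are illustrative.

```python
# Hedged sketch: influence-based reweighting after data removal.
# Assumptions (not from the paper): weighted ridge regression as the model,
# first-order influence on the parameters as the criterion, and a
# minimum-norm least-squares solve for the compensating weights.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
lam = 1e-2  # ridge term keeps the Hessian invertible

def fit(weights):
    """Weighted ridge regression: argmin_w sum_i weights_i (x_i.w - y_i)^2 + lam ||w||^2."""
    XtW = X.T * weights  # equivalent to X.T @ diag(weights)
    return np.linalg.solve(XtW @ X + lam * np.eye(d), XtW @ y)

w0 = fit(np.ones(n))  # model trained on all data

# Per-point gradient of the squared loss at w0, and the (constant) inverse Hessian.
grads = 2.0 * (X @ w0 - y)[:, None] * X                     # (n, d)
H_inv = np.linalg.inv(2.0 * X.T @ X + 2.0 * lam * np.eye(d))

# First-order influence of upweighting point i on the parameters: -H^{-1} g_i.
param_influence = -grads @ H_inv                            # (n, d)

marked = np.arange(10)        # points whose characteristics must be removed
remaining = np.arange(10, n)

# Parameter shift caused by dropping the marked points (their weights go 1 -> 0).
shift_removal = -param_influence[marked].sum(axis=0)

# Pick extra weights eps on the remaining points so their induced shift cancels
# the removal shift: solve sum_j eps_j * influence_j = -shift_removal.
A = param_influence[remaining]                              # (n - 10, d)
eps, *_ = np.linalg.lstsq(A.T, -shift_removal, rcond=None)

weights_puma = np.ones(n)
weights_puma[marked] = 0.0
weights_puma[remaining] += eps

weights_retrain = np.ones(n)
weights_retrain[marked] = 0.0

w_puma = fit(weights_puma)        # marked points excluded, remaining reweighted
w_retrain = fit(weights_retrain)  # plain retraining without the marked points

print("parameter drift, reweighted:", np.linalg.norm(w_puma - w0))
print("parameter drift, retrained: ", np.linalg.norm(w_retrain - w0))
```

In the paper, the influence is modeled with respect to performance criteria rather than raw parameters, and the compensation is posed as an optimization over those criteria; the sketch above only demonstrates the basic mechanism of offsetting a removal by reweighting the remaining data.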

Related research

10/28/2022  On the Vulnerability of Data Points under Multiple Membership Inference Attacks and Target Models
Membership Inference Attacks (MIAs) infer whether a data point is in the...

08/28/2023  Task-Aware Machine Unlearning and Its Application in Load Forecasting
Data privacy and security have become a non-negligible factor in load fo...

07/13/2021  DIVINE: Diverse Influential Training Points for Data Visualization and Model Refinement
As the complexity of machine learning (ML) models increases, resulting i...

09/02/2022  An Introduction to Machine Unlearning
Removing the influence of a specified subset of training data from a mac...

10/04/2022  Certified Data Removal in Sum-Product Networks
Data protection regulations like the GDPR or the California Consumer Pri...

08/02/2020  Removing Backdoor-Based Watermarks in Neural Networks with Limited Data
Deep neural networks have been widely applied and achieved great success...

08/14/2023  Machine Unlearning: Solutions and Challenges
Machine learning models may inadvertently memorize sensitive, unauthoriz...
