Dataset Distillation Fixes Dataset Reconstruction Attacks

02/02/2023
by Noel Loo, et al.

Modern deep learning requires large volumes of data, which may contain sensitive or private information that must not be leaked. Recent work has shown that, for homogeneous neural networks, a large portion of this training data can be reconstructed with access only to the trained network parameters. While the attack was demonstrated empirically, there is little formal understanding of the regime in which it is effective, or of ways to defend against it. In this work, we first build a stronger version of the dataset reconstruction attack and show how it can provably recover the entire training set in the infinite-width regime. We then empirically study the characteristics of this attack on two-layer networks and reveal that its success depends heavily on deviations from the frozen infinite-width Neural Tangent Kernel limit. More importantly, we formally show for the first time that dataset reconstruction attacks are a variation of dataset distillation. This unification of dataset reconstruction and distillation not only sheds more light on the characteristics of the attack but also enables us to design defense mechanisms against it via distillation algorithms.
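To give a flavor of the reconstruction-as-distillation connection, the following is a minimal toy sketch (not the paper's algorithm): a linear model stands in for the NTK-linearized network, the attacker observes a gradient of the training loss at a known initialization, and then synthesizes a small dataset whose induced gradient matches the observed one, which is exactly a gradient-matching distillation objective. All variable names and the setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Private training set of a linear model (a stand-in for the linearized
# "lazy" NTK regime the paper analyzes; purely illustrative).
n, d = 4, 16
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# The attacker observes the training-loss gradient at the zero init,
# i.e. a linear functional of the private data.
g_target = X.T @ y / n

# Reconstruction as distillation: synthesize (Xs, ys) whose induced
# gradient matches the observed one (a gradient-matching objective).
m = 4
Xs = rng.normal(size=(m, d)) * 0.01
ys = np.ones(m)

lr = 0.1
for _ in range(5000):
    v = Xs.T @ ys / m - g_target          # residual in gradient space
    Xs -= lr * (2 / m) * np.outer(ys, v)  # dF/dXs for F = ||v||^2
    ys -= lr * (2 / m) * (Xs @ v)         # dF/dys

# The synthetic set now reproduces the observed gradient.
print(np.linalg.norm(Xs.T @ ys / m - g_target))
```

Because the synthetic set is overparameterized relative to the d observed constraints, plain gradient descent drives the residual to zero; in the paper's infinite-width setting the analogous matching condition is what lets the attack provably recover the training set.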


