Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models

05/21/2022
by   Shawn Shan, et al.

Server breaches are an unfortunate reality on today's Internet. In the context of deep neural network (DNN) models, they are particularly harmful, because a leaked model gives an attacker "white-box" access to generate adversarial examples, a threat model for which no practical robust defenses exist. For practitioners who have invested years and millions into proprietary DNNs, e.g., medical imaging, this seems like an inevitable disaster looming on the horizon. In this paper, we consider the problem of post-breach recovery for DNN models. We propose Neo, a new system that creates new versions of leaked models, alongside an inference-time filter that detects and removes adversarial examples generated on previously leaked models. The classification surfaces of different model versions are slightly offset (by introducing hidden distributions), and Neo detects an attack's overfitting to the specific leaked model version used to generate it. We show that across a variety of tasks and attack methods, Neo filters out attacks from leaked models with very high accuracy, and provides strong protection (7–10 recoveries) against attackers who repeatedly breach the server. Neo performs well against a variety of strong adaptive attacks, with only a slight drop in the number of recoverable breaches, and demonstrates potential as a complement to DNN defenses in the wild.
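
The abstract describes two components: versioned models whose classification surfaces are slightly offset, and an inference-time filter that rejects adversarial examples overfit to a previously leaked version. The sketch below is a hypothetical illustration of that idea, not the authors' implementation: it assumes the per-version offset comes from a version-specific hidden distribution mixed into the training data, and that the filter compares the current version's predictions against those of leaked versions, rejecting inputs on which they diverge sharply. The helper names, the divergence test, and the threshold `tau` are all assumptions made for illustration.

```python
# Hypothetical sketch of the "versioned models + inference-time filter" idea.
# Not the authors' implementation: version offsets are modeled here by mixing a
# version-specific hidden distribution into training data, and the filter flags
# inputs on which a previously leaked version and the current version disagree.

import torch
import torch.nn.functional as F


def make_versioned_dataset(xs, ys, version_seed, num_classes=10, n_hidden=256):
    """Append hidden, version-specific samples so that each recovered model
    version trains to a slightly offset classification surface."""
    g = torch.Generator().manual_seed(version_seed)
    hidden_x = torch.randn((n_hidden, *xs.shape[1:]), generator=g)
    hidden_y = torch.randint(0, num_classes, (n_hidden,), generator=g)
    return torch.cat([xs, hidden_x]), torch.cat([ys, hidden_y])


def is_attack_on_leaked_version(x, current_model, leaked_models, tau=0.5):
    """Reject inputs whose behavior is 'overfit' to an earlier leaked version,
    i.e. the leaked copy and the current version disagree sharply on them."""
    with torch.no_grad():
        p_cur = F.softmax(current_model(x), dim=-1)
        for leaked in leaked_models:
            p_leak = F.softmax(leaked(x), dim=-1)
            # Total-variation-style distance between the two versions' outputs;
            # a large gap on a single input is treated (by assumption) as the
            # signature of an adversarial example crafted on the leaked copy.
            gap = 0.5 * (p_cur - p_leak).abs().sum(dim=-1)
            if (gap > tau).any():
                return True
    return False
```

In this sketch, recovering from a breach amounts to adding the breached model to `leaked_models` and deploying a freshly trained version; the paper's actual hidden distributions and detection test are more involved than this toy divergence check.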

Related research

06/24/2020 · Blacklight: Defending Black-Box Adversarial Attacks on Deep Neural Networks
The vulnerability of deep neural networks (DNNs) to adversarial examples...

04/04/2017 · Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks
Although deep neural networks (DNNs) have achieved great success in many...

11/22/2021 · Medical Aegis: Robust adversarial protectors for medical images
Deep neural network based medical image systems are vulnerable to advers...

12/04/2020 · Practical No-box Adversarial Attacks against DNNs
The study of adversarial vulnerabilities of deep neural networks (DNNs) ...

10/14/2019 · Man-in-the-Middle Attacks against Machine Learning Classifiers via Malicious Generative Models
Deep Neural Networks (DNNs) are vulnerable to deliberately crafted adver...

02/03/2021 · IWA: Integrated Gradient based White-box Attacks for Fooling Deep Neural Networks
The widespread application of deep neural network (DNN) techniques is be...

10/27/2022 · DICTION: DynamIC robusT whIte bOx watermarkiNg scheme
Deep neural network (DNN) watermarking is a suitable method for protecti...