High-Fidelity Extraction of Neural Network Models

09/03/2019
by Matthew Jagielski, et al.

Model extraction allows an adversary to steal a copy of a remotely deployed machine learning model given access to its predictions. Adversaries are motivated to mount such attacks for a variety of reasons, ranging from reducing their computational costs, to eliminating the need to collect expensive training data, to obtaining a copy of a model in order to find adversarial examples, perform membership inference, or mount model inversion attacks. In this paper, we taxonomize the space of model extraction attacks around two objectives: accuracy, i.e., performing well on the underlying learning task, and fidelity, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack which exploits the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model, i.e., a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. Addressing these limitations, we expand on prior work to develop the first practical functionally-equivalent extraction attack for direct extraction (i.e., without training) of a model's weights. We perform experiments both on academic datasets and a state-of-the-art image classifier trained with 1 billion proprietary images. In addition to broadening the scope of model extraction research, our work demonstrates the practicality of model extraction attacks against production-grade systems.
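The learning-based attack and the two objectives in the taxonomy can be illustrated with a short sketch. The snippet below is a minimal, hypothetical setup and not the paper's experiments: it assumes scikit-learn MLPs stand in for both the victim and the extracted model, a synthetic dataset replaces any proprietary data, and the attacker sees only the victim's predicted labels.

```python
# Sketch of a learning-based extraction attack (illustrative assumptions only):
# the victim is a remotely deployed classifier the adversary can query; the
# adversary trains a surrogate on the victim's predictions, then measures
# accuracy (agreement with true labels) and fidelity (agreement with the victim).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the victim's (unknown) training data.
X, y = make_classification(n_samples=6000, n_features=20, n_informative=10, random_state=0)
X_victim, X_rest, y_victim, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_query, X_test, _, y_test = train_test_split(X_rest, y_rest, test_size=0.4, random_state=0)

# Victim model: trained by its owner, exposed only through predictions.
victim = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
victim.fit(X_victim, y_victim)

# Attacker: label a query set with the victim's predictions (no true labels
# needed) and use them to supervise the extracted model.
victim_labels = victim.predict(X_query)
extracted = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=1)
extracted.fit(X_query, victim_labels)

# The two extraction objectives on held-out inputs.
accuracy = np.mean(extracted.predict(X_test) == y_test)                   # task accuracy
fidelity = np.mean(extracted.predict(X_test) == victim.predict(X_test))   # match victim
print(f"accuracy={accuracy:.3f}  fidelity={fidelity:.3f}")
```

Note that fidelity is measured against the victim's predictions rather than the ground-truth labels; a functionally-equivalent extraction would further require the extracted model to reproduce the victim's outputs on every possible input, which the paper argues no learning-based procedure of this kind can guarantee.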

research · 07/21/2022
Careful What You Wish For: on the Extraction of Adversarially Trained Models
Recent attacks on Machine Learning (ML) models such as evasion attacks w...

research · 01/06/2021
Model Extraction and Defenses on Generative Adversarial Networks
Model extraction attacks aim to duplicate a machine learning model throu...

research · 02/16/2023
Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data
We study black-box model stealing attacks where the attacker can query a...

research · 08/03/2021
DeepFreeze: Cold Boot Attacks and High Fidelity Model Recovery on Commercial EdgeML Device
EdgeML accelerators like Intel Neural Compute Stick 2 (NCS) can enable e...

research · 02/27/2020
Entangled Watermarks as a Defense against Model Extraction
Machine learning involves expensive data collection and training procedu...

research · 01/13/2022
Reconstructing Training Data with Informed Adversaries
Given access to a machine learning model, can an adversary reconstruct t...

research · 01/23/2022
Increasing the Cost of Model Extraction with Calibrated Proof of Work
In model extraction attacks, adversaries can steal a machine learning mo...
