Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data

In the past few years, Convolutional Neural Networks (CNNs) have been achieving state-of-the-art performance on a variety of problems. Many companies invest resources and money to generate these models and provide them as an API, so it is in their best interest to protect them, i.e., to prevent others from copying them. Recent studies revealed that state-of-the-art CNNs are vulnerable to adversarial example attacks, and this weakness indicates that CNNs do not need to operate on inputs from the problem domain (PD). Therefore, we hypothesize that they also do not need to be trained with examples of the PD in order to operate in it. Given these facts, in this paper, we investigate whether a target black-box CNN can be copied by persuading it to confess its knowledge through random non-labeled data. The copy is two-fold: i) the target network is queried with random data and its predictions are used to create a fake dataset carrying the knowledge of the network; and ii) a copycat network is trained with the fake dataset and should be able to achieve a performance similar to that of the target network. This hypothesis was evaluated locally on three problems (facial expression, object, and crosswalk classification) and against a cloud-based API. In the copy attacks, images from both the non-problem domain and the PD were used. All copycat networks achieved at least 93.7% of the performance of the original models using non-problem-domain data, and at least 98.6% using additional data from the problem domain. Additionally, the copycat CNN successfully copied at least 97.3% of the performance of the Microsoft Azure Emotion API. Our results show that it is possible to create a copycat CNN by simply querying a target network as a black box with random non-labeled data.
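To make the two-step copy procedure concrete, below is a minimal PyTorch sketch of the idea: query a black-box classifier with random non-labeled inputs, record its hard-label predictions as a fake dataset, and train a copycat network on it. The local target_model, the random-tensor queries, and the tiny architectures are illustrative stand-ins only (the paper queries a remote black-box model with natural images), not the authors' implementation.

    # Minimal sketch of the copycat attack, assuming PyTorch.
    # Random tensors stand in for non-labeled, non-problem-domain images.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    torch.manual_seed(0)
    NUM_CLASSES = 10
    IMG_SHAPE = (3, 32, 32)

    # Stand-in "black-box" target: only its predicted labels are observed.
    target_model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, NUM_CLASSES),
    )
    target_model.eval()

    # Step 1: query the target with random non-labeled data and keep its
    # predictions, building the "fake" dataset that carries its knowledge.
    with torch.no_grad():
        queries = torch.rand(2048, *IMG_SHAPE)       # random non-labeled inputs
        stolen_labels = target_model(queries).argmax(dim=1)
    fake_dataset = TensorDataset(queries, stolen_labels)

    # Step 2: train a copycat network on the fake dataset so that it
    # imitates the target's input-output behaviour.
    copycat = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, NUM_CLASSES),
    )
    optimizer = torch.optim.Adam(copycat.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(5):
        for images, labels in DataLoader(fake_dataset, batch_size=64, shuffle=True):
            optimizer.zero_grad()
            loss = criterion(copycat(images), labels)
            loss.backward()
            optimizer.step()

    # Agreement between copycat and target on fresh random queries.
    with torch.no_grad():
        probe = torch.rand(512, *IMG_SHAPE)
        agreement = (copycat(probe).argmax(1) == target_model(probe).argmax(1)).float().mean()
    print(f"copycat/target agreement on held-out random queries: {agreement:.2%}")

In practice the queries would be natural images (from outside or inside the problem domain) rather than uniform noise, and the final agreement would be measured on a labeled test set of the problem domain, as reported in the paper.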
