CUDA: Convolution-based Unlearnable Datasets

by Vinu Sankar Sadasivan, et al.

Large-scale training of modern deep learning models relies heavily on publicly available web data. This potentially unauthorized usage of online data raises concerns about data privacy. To tackle this issue, recent works aim to make data unlearnable for deep learning models by adding small, specially designed noise. However, these methods are vulnerable to adversarial training (AT) and/or are computationally heavy. In this work, we propose a novel, model-free, Convolution-based Unlearnable DAtaset (CUDA) generation technique. CUDA is generated using controlled class-wise convolutions with filters that are randomly generated via a private key. CUDA encourages the network to learn the relation between filters and labels rather than the informative features needed to classify the clean data. We develop a theoretical analysis demonstrating that CUDA can successfully poison Gaussian mixture data by reducing the clean data performance of the optimal Bayes classifier. We also empirically demonstrate the effectiveness of CUDA on various datasets (CIFAR-10, CIFAR-100, ImageNet-100, and Tiny-ImageNet) and architectures (ResNet-18, VGG-16, Wide ResNet-34-10, DenseNet-121, DeiT, EfficientNetV2-S, and MobileNetV2). Our experiments show that CUDA is robust to various data augmentations and training approaches such as smoothing, AT with different budgets, transfer learning, and fine-tuning. For instance, training a ResNet-18 on ImageNet-100 CUDA achieves only 8.96%, 40.08%, and 20.58% clean test accuracy with empirical risk minimization (ERM), L_∞ AT, and L_2 AT, respectively, whereas ERM on the clean training data achieves a clean test accuracy of 80.66%. CUDA exhibits the unlearnability effect with ERM even when only a fraction of the training dataset is perturbed. Furthermore, we show that CUDA is robust to adaptive defenses designed specifically to break it.
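The class-wise convolution idea in the abstract can be illustrated with a minimal sketch: a private key (here, a random seed) deterministically generates one filter per class, and each training image is convolved with the filter of its label, creating a filter-label shortcut. The filter size, the uniform filter distribution, and the sum-to-one normalization below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def make_class_filters(num_classes, k=3, seed=0):
    # The seed acts as the private key: it deterministically
    # generates one random k x k filter per class (assumed scheme).
    rng = np.random.default_rng(seed)
    filters = rng.random(size=(num_classes, k, k))
    # Normalize each filter to sum to 1 so image brightness is
    # roughly preserved (an illustrative choice, not the paper's).
    filters /= filters.sum(axis=(1, 2), keepdims=True)
    return filters

def convolve2d(img, kernel):
    # Simple 'same' 2-D cross-correlation with zero padding
    # (kernel not flipped; irrelevant for random filters).
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(img, pad)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def poison(images, labels, filters):
    # Apply each image's class filter, so the filter-label
    # correlation becomes an easy shortcut feature for the network.
    return np.stack([convolve2d(x, filters[y])
                     for x, y in zip(images, labels)])
```

Because generation is a fixed convolution per class, the poisoning is model-free and cheap relative to optimization-based unlearnable-example methods; anyone holding the key can regenerate the same filters.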




Related research

- Learning the Unlearnable: Adversarial Augmentations Suppress Unlearnable Example Attacks. "Unlearnable example attacks are data poisoning techniques that can be us..."
- Learnable Boundary Guided Adversarial Training. "Previous adversarial training raises model robustness under the compromi..."
- How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data. "Since training a large-scale backdoored model from scratch requires a la..."
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free. "Trojan attacks threaten deep neural networks (DNNs) by poisoning them to..."
- One-Pixel Shortcut: on the Learning Preference of Deep Neural Networks. "Unlearnable examples (ULEs) aim to protect data from unauthorized usage ..."
- Intriguing properties of adversarial training. "Adversarial training is one of the main defenses against adversarial att..."
- Core Risk Minimization using Salient ImageNet. "Deep neural networks can be unreliable in the real world especially when..."
