Poisoning the Unlabeled Dataset of Semi-Supervised Learning

05/04/2021
by Nicholas Carlini, et al.

Semi-supervised machine learning models learn from a (small) set of labeled training examples, and a (large) set of unlabeled training examples. State-of-the-art models can reach within a few percentage points of fully-supervised training, while requiring 100x less labeled data. We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset. In order to be useful, unlabeled datasets are given strictly less review than labeled datasets, and adversaries can therefore poison them easily. By inserting maliciously-crafted unlabeled examples totaling just 0.1% of the dataset size, we can manipulate a model trained on this poisoned dataset to misclassify arbitrary examples at test time (as any desired label). Our attacks are highly effective across datasets and semi-supervised learning methods. We find that more accurate methods (thus more likely to be used) are significantly more vulnerable to poisoning attacks, and as such better training methods are unlikely to prevent this attack. To counter this we explore the space of defenses, and propose two methods that mitigate our attack.

research
12/05/2022

Rethinking Backdoor Data Poisoning Attacks in the Context of Semi-Supervised Learning

Semi-supervised learning methods can train high-accuracy machine learnin...
research
01/01/2023

Trojaning semi-supervised learning model via poisoning wild images on the web

Wild images on the web are vulnerable to backdoor (also called trojan) p...
research
11/01/2022

The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning

Semi-supervised machine learning (SSL) is gaining popularity as it reduc...
research
11/28/2019

Lidar-Camera Co-Training for Semi-Supervised Road Detection

Recent advances in the field of machine learning and computer vision hav...
research
03/17/2020

The Value of Nullspace Tuning Using Partial Label Information

In semi-supervised learning, information from unlabeled examples is used...
research
03/23/2020

ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

Optical character recognition (OCR) systems performance have improved si...
research
03/22/2018

Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs

Cryo-electron microscopy (cryoEM) is fast becoming the preferred method ...
