Effective and Efficient Data Poisoning in Semi-Supervised Learning

12/14/2020
by Adriano Franci, et al.

Semi-Supervised Learning (SSL) aims to maximize the benefits of learning from a limited amount of labelled data together with a vast amount of unlabelled data. Because they rely on the known labels to infer the unknown labels, SSL algorithms are sensitive to data quality. This makes it important to study the potential threats related to the labelled data, more specifically, label poisoning. However, data poisoning of SSL remains largely understudied. To fill this gap, we propose a novel data poisoning method which is both effective and efficient. Our method exploits mathematical properties of SSL to approximate the influence of labelled inputs on unlabelled ones, which allows the identification of the inputs that, if poisoned, would produce the highest number of incorrectly inferred labels. We evaluate our approach on three classification problems under 12 different experimental settings each. Compared to the state of the art, our influence-based attack yields an average increase in error rate that is 3 times higher, while being faster by multiple orders of magnitude. Moreover, our method can inform engineers of inputs that deserve investigation (and possible relabelling) before training the learning model. We show that relabelling one-third of the poisoned inputs (selected based on their influence) reduces the poisoning effect by 50%.
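
The abstract does not spell out how influence is computed, so the following is only an illustrative sketch, not the authors' method. It assumes a graph-based SSL algorithm (harmonic label propagation) and binary labels in {-1, +1}. Under that assumption the inferred scores of the unlabelled inputs are a linear function of the known labels, f_u = M y_l, so column j of M measures how strongly labelled input j influences each unlabelled input. The function names, the toy path graph and the flip-count ranking are all hypothetical choices made for this example.

```python
import numpy as np

def influence_matrix(W, labelled_idx, unlabelled_idx):
    """Return M with f_u = M @ y_l for harmonic label propagation.

    W is a symmetric, non-negative affinity matrix of a connected graph
    (an assumption of this sketch).
    """
    P = W / W.sum(axis=1, keepdims=True)  # row-normalised transition matrix
    P_uu = P[np.ix_(unlabelled_idx, unlabelled_idx)]
    P_ul = P[np.ix_(unlabelled_idx, labelled_idx)]
    # Solve (I - P_uu) M = P_ul instead of forming an explicit inverse.
    return np.linalg.solve(np.eye(len(unlabelled_idx)) - P_uu, P_ul)

def flip_counts(M, y_l):
    """Count, per labelled input, how many unlabelled predictions change
    sign if that single label is flipped (labels are -1 or +1)."""
    f_u = M @ y_l
    base = np.sign(f_u)
    counts = np.zeros(len(y_l), dtype=int)
    for j in range(len(y_l)):
        # Flipping y_j changes it by -2 * y_j; f_u shifts along column j.
        f_new = f_u - 2.0 * y_l[j] * M[:, j]
        counts[j] = int(np.sum(np.sign(f_new) != base))
    return counts

# Toy example: a path graph 0-1-2-3-4-5 with the two end nodes labelled.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

labelled_idx = [0, 5]
unlabelled_idx = [1, 2, 3, 4]
y_l = np.array([1.0, -1.0])

M = influence_matrix(W, labelled_idx, unlabelled_idx)
counts = flip_counts(M, y_l)
ranking = np.argsort(-counts, kind="stable")  # most damaging labels first
print("influence matrix:\n", M)
print("flipped predictions per poisoned label:", counts)
```

Because M is computed once and each candidate flip is then a single column update, ranking candidates this way avoids rerunning the SSL algorithm for every possible poisoned label. In this sketch the same ranking could also be read as a list of labelled inputs worth checking before training.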

research
10/03/2016

Semi-supervised Learning with Sparse Autoencoders in Phone Classification

We propose the application of a semi-supervised learning method to impro...
research
11/08/2021

Can semi-supervised learning reduce the amount of manual labelling required for effective radio galaxy morphology classification?

In this work, we examine the robustness of state-of-the-art semi-supervi...
research
06/16/2021

A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

Unlabelled data appear in many domains and are particularly relevant to ...
research
11/27/2022

Impact of Labelled Set Selection and Supervision Policies on Semi-supervised Learning

In semi-supervised representation learning frameworks, when the number o...
research
07/23/2019

GraphX^NET- Chest X-Ray Classification Under Extreme Minimal Supervision

The task of classifying X-ray data is a problem of both theoretical and ...
research
05/22/2018

Semi-supervised learning: When and why it works

Semi-supervised learning deals with the problem of how, if possible, to ...
research
05/27/2018

Adversarial Constraint Learning for Structured Prediction

Constraint-based learning reduces the burden of collecting labels by hav...
