Demystifying Self-supervised Trojan Attacks

10/13/2022
by Changjiang Li, et al.

As an emerging machine learning paradigm, self-supervised learning (SSL) is able to learn high-quality representations for complex data without data labels. Prior work shows that, besides obviating the reliance on labeling, SSL also benefits adversarial robustness by making it more challenging for the adversary to manipulate model prediction. However, whether this robustness benefit generalizes to other types of attacks remains an open question. We explore this question in the context of trojan attacks by showing that SSL is comparably vulnerable as supervised learning to trojan attacks. Specifically, we design and evaluate CTRL, an extremely simple self-supervised trojan attack. By polluting a tiny fraction of training data (less than 1%) with indistinguishable poisoning samples, CTRL causes any trigger-embedded input to be misclassified to the adversary's desired class with a high probability (over 99%). Through the lens of CTRL, we study the mechanisms underlying self-supervised trojan attacks. With both empirical and analytical evidence, we reveal that the representation invariance property of SSL, which benefits adversarial robustness, may also be the very reason making SSL highly vulnerable to trojan attacks. We further discuss the fundamental challenges to defending against self-supervised trojan attacks, pointing to promising directions for future research.
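To make the poisoning setup concrete, the sketch below shows the generic shape of a trojan data-poisoning step: a small trigger pattern is embedded into less than 1% of an unlabeled training set. This is a minimal illustration of the attack surface the abstract describes, not the paper's actual CTRL trigger design; the patch-overlay trigger, function names, and parameters are illustrative assumptions.

```python
import numpy as np

def embed_trigger(image, trigger, corner=(0, 0)):
    """Overlay a small trigger patch onto an image (illustrative trigger only)."""
    poisoned = image.copy()
    h, w = trigger.shape[:2]
    r, c = corner
    poisoned[r:r + h, c:c + w] = trigger
    return poisoned

def poison_dataset(images, trigger, rate=0.01, seed=0):
    """Embed the trigger into a `rate` fraction of an unlabeled training set.

    In the label-free SSL setting, the adversary modifies only the inputs;
    the attack relies on the learned representations mapping any
    trigger-embedded input toward the adversary's desired class.
    """
    rng = np.random.default_rng(seed)
    n_poison = max(1, int(len(images) * rate))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    poisoned = images.copy()
    for i in idx:
        poisoned[i] = embed_trigger(images[i], trigger)
    return poisoned, idx

# Toy example: 1,000 grayscale 32x32 "images", a 3x3 white trigger patch.
images = np.zeros((1000, 32, 32), dtype=np.float32)
trigger = np.ones((3, 3), dtype=np.float32)
poisoned, idx = poison_dataset(images, trigger, rate=0.01)
print(len(idx))  # 10 poisoned inputs, i.e. 1% of the training set
```

Note that a realistic attack additionally requires the poisoning samples to be visually indistinguishable from clean data; a visible patch like the one above would not satisfy that constraint and serves only to show where the poisoning happens in the pipeline.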

