Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings

08/30/2021
by Mazda Moayeri, et al.

Adversarial robustness of deep models is pivotal in ensuring safe deployment in real-world settings, but most modern defenses have narrow scope and expensive costs. In this paper, we propose a self-supervised method to detect adversarial attacks and classify them to their respective threat models, based on a linear model operating on the embeddings from a pre-trained self-supervised encoder. We use a SimCLR encoder in our experiments, since we show the SimCLR embedding distance is a good proxy for human perceptibility, enabling it to encapsulate many threat models at once. We call our method SimCat, since it uses a SimCLR encoder to catch and categorize various types of adversarial attacks, including L_p and non-L_p evasion attacks, as well as data poisonings. The simple nature of a linear classifier makes our method efficient in both time and sample complexity. For example, on SVHN, using only five pairs of clean and adversarial examples computed with a PGD-L_inf attack, SimCat's detection accuracy is over 85%. Given labeled pairs from each threat model, SimCat can classify eight different attack types, such as PGD-L_2, PGD-L_inf, CW-L_2, PPGD, LPA, StAdv, ReColor, and JPEG-L_inf, with over 40% accuracy. SimCat also defends against poisoning attacks, such as BP, CP, FC, CLBD, and HTBD, halving their success rate while using only twenty total poisons for training. We find that the detectors generalize well to unseen threat models. Lastly, we investigate the performance of our detection method under adaptive attacks and further boost its robustness against such attacks via adversarial training.
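To make the core idea concrete, below is a minimal sketch of detection on frozen self-supervised embeddings: embed images with a fixed encoder, then fit a linear classifier on a handful of (clean, adversarial) pairs. This is an illustration, not the authors' code; the encoder here is a plain torchvision ResNet standing in for a SimCLR-pretrained model, and all function and variable names (embed, fit_detector, detect) are hypothetical. Threat-model classification would use the same embeddings with a multi-class linear model instead of a binary one.

```python
# Sketch only: a linear detector on frozen embeddings, assuming a
# self-supervised encoder is available. A torchvision ResNet-18 is used
# here as a stand-in for a SimCLR-pretrained encoder.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in encoder: drop the classification head so the output is an embedding.
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()
encoder = backbone.to(device).eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """Map a batch of images (N, 3, H, W) to frozen embeddings (N, D)."""
    return encoder(images.to(device)).cpu()

def fit_detector(clean: torch.Tensor, adversarial: torch.Tensor) -> LogisticRegression:
    """Fit a linear detector on embeddings of clean vs. adversarial images."""
    feats = torch.cat([embed(clean), embed(adversarial)]).numpy()
    labels = [0] * len(clean) + [1] * len(adversarial)
    return LogisticRegression(max_iter=1000).fit(feats, labels)

def detect(detector: LogisticRegression, images: torch.Tensor):
    """Return 1 where an input is flagged as adversarial, 0 otherwise."""
    return detector.predict(embed(images).numpy())

# Toy usage: random tensors stand in for real SVHN images and PGD outputs.
clean_pairs = torch.rand(5, 3, 32, 32)
adv_pairs = torch.rand(5, 3, 32, 32)
detector = fit_detector(clean_pairs, adv_pairs)
print(detect(detector, torch.rand(8, 3, 32, 32)))
```

Because the encoder stays frozen and only a linear model is trained, fitting the detector from a few pairs takes seconds, which is the sample- and time-efficiency the abstract emphasizes.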

06/22/2020

Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

We present adversarial attacks and defenses for the perceptual adversari...
06/08/2020

A Self-supervised Approach for Adversarial Robustness

Adversarial examples can cause catastrophic mistakes in Deep Neural Netw...
07/15/2021

Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models

This paper improves the robustness of the pretrained language model BERT...
04/08/2022

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

Adversarial attacks pose a severe security threat to the state-of-the-ar...
01/31/2023

Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks

Monocular Depth Estimation (MDE) is a critical component in applications...
05/08/2021

Self-Supervised Adversarial Example Detection by Disentangled Representation

Deep learning models are known to be vulnerable to adversarial examples ...
02/14/2021

Adversarial defense for automatic speaker verification by cascaded self-supervised learning models

Automatic speaker verification (ASV) is one of the core technologies in ...
