Testing Robustness Against Unforeseen Adversaries

by Daniel Kang, et al.
UC Berkeley
Stanford University

Considerable work on adversarial defense has studied robustness to a fixed, known family of adversarial distortions, most frequently L_p-bounded distortions. In reality, the specific form of attack will rarely be known, and adversaries are free to employ distortions outside of any fixed set. The present work advocates measuring robustness against this much broader range of unforeseen attacks: attacks whose precise form is not known when designing a defense. We propose a methodology for evaluating a defense against a diverse range of distortion types, together with a summary metric, UAR, that measures Unforeseen Attack Robustness against a distortion. We construct novel JPEG, Fog, Gabor, and Snow adversarial attacks to simulate unforeseen adversaries and perform a careful study of adversarial robustness against these and existing distortion types. We find that evaluation against existing L_p attacks yields highly correlated information that may not generalize to other attacks, and we identify a set of 4 attacks that yields more diverse information. We further find that adversarial training against either one or multiple distortions, including our novel ones, does not confer robustness to unforeseen distortions. These results underscore the need to study robustness against unforeseen distortions and provide a starting point for doing so.
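The UAR metric described above normalizes a defense's accuracy under a given attack by the accuracy of a model adversarially trained against that same attack, averaged over a range of distortion sizes. A minimal sketch of that computation follows; the function name and the exact aggregation are assumptions for illustration, not the paper's reference implementation.

```python
def uar(defense_accuracies, adv_trained_accuracies):
    """Sketch of Unforeseen Attack Robustness (UAR).

    defense_accuracies: adversarial accuracies of the evaluated defense,
        one per calibrated distortion size (epsilon).
    adv_trained_accuracies: accuracies of a model adversarially trained
        against this same attack, at the same distortion sizes (the
        normalizing baseline).

    Returns a percentage: 100 * (summed defense accuracy) /
    (summed adversarially-trained baseline accuracy), so a defense that
    matches the attack-specific adversarially trained model scores ~100.
    """
    if len(defense_accuracies) != len(adv_trained_accuracies):
        raise ValueError("accuracy lists must cover the same distortion sizes")
    return 100.0 * sum(defense_accuracies) / sum(adv_trained_accuracies)
```

For example, a defense that achieves exactly half the baseline accuracy at every distortion size would receive a UAR of 50 against that attack.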





Code Repositories


Code for "Testing Robustness Against Unforeseen Adversaries"

