Universal Hard-label Black-Box Perturbations: Breaking Security-Through-Obscurity Defenses

11/09/2018
by Thomas A. Hogan, et al.

We study the problem of finding a universal (image-agnostic) perturbation to fool machine learning (ML) classifiers (e.g., neural nets, decision trees) in the hard-label black-box setting. Recent work on adversarial ML in the white-box setting (where model parameters are known) has shown that many state-of-the-art image classifiers are vulnerable to universal adversarial perturbations: a fixed, human-imperceptible perturbation that, when added to any image, causes it to be misclassified with high probability Kurakin et al. [2016], Szegedy et al. [2013], Chen et al. [2017a], Carlini and Wagner [2017]. This paper considers the more practical and challenging problem of finding such universal perturbations in an obscure (or black-box) setting. More specifically, we use zeroth-order optimization algorithms to find a universal adversarial perturbation when no model information is revealed, except that the attacker can make queries to probe the classifier. We further relax the assumption that each query returns continuous-valued confidence scores for all classes, and consider the case where the output is a hard-label decision. Surprisingly, we find that even in this extremely obscure regime, state-of-the-art ML classifiers can be fooled with very high probability just by adding a single human-imperceptible perturbation to any natural image. The existence of universal perturbations in the hard-label black-box setting raises serious security concerns: an adversary can precompute a single noise vector and use it to break a classifier on most natural images.
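To make the setup concrete, the following is a minimal, illustrative sketch (not the paper's exact algorithm) of how a zeroth-order search for a universal perturbation might look when the attacker observes only hard labels. All names and parameters here (`hard_label_query`, `fooling_rate`, `eps`, `sigma`, `n_dirs`) are hypothetical: the sketch applies standard random-direction finite differences to the empirical fooling rate over a batch of images, under an L-infinity budget.

```python
import numpy as np

def hard_label_query(classifier, x):
    """Query the black-box model; only the top-1 label is observable."""
    return classifier(x)  # assumed to return an integer class label

def fooling_rate(classifier, images, labels, delta):
    """Fraction of images whose hard label flips after adding delta."""
    flipped = [hard_label_query(classifier, np.clip(x + delta, 0.0, 1.0)) != y
               for x, y in zip(images, labels)]
    return float(np.mean(flipped))

def universal_zo_attack(classifier, images, labels,
                        eps=0.05, sigma=0.05, lr=0.01,
                        iters=500, n_dirs=20):
    """Hypothetical zeroth-order search for a universal perturbation.

    The fooling rate is piecewise constant, so sigma must be large
    enough for finite differences to register label changes; real
    attacks typically optimize a smoother surrogate objective.
    """
    delta = np.zeros_like(images[0])
    for _ in range(iters):
        base = fooling_rate(classifier, images, labels, delta)
        grad_est = np.zeros_like(delta)
        # Random-direction finite-difference gradient estimate.
        for _ in range(n_dirs):
            u = np.random.randn(*delta.shape)
            plus = fooling_rate(classifier, images, labels, delta + sigma * u)
            grad_est += (plus - base) / sigma * u
        grad_est /= n_dirs
        delta += lr * grad_est              # ascend the estimated fooling rate
        delta = np.clip(delta, -eps, eps)   # keep the perturbation imperceptible
    return delta
```

Each outer iteration costs `(n_dirs + 1) * len(images)` hard-label queries, which is why query efficiency is the central practical concern in this setting.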
