Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems

07/13/2023
by Eldor Abdukhamidov, et al.

Deep learning models are susceptible to adversarial samples in both white-box and black-box environments. Although previous studies have shown high attack success rates, coupling DNN models with interpretation models can offer a sense of security when a human expert is in the loop to judge whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations. In black-box settings, where access to the components of IDLSes is limited, it is more challenging for the adversary to fool the system. In this work, we propose QuScore, a query-efficient score-based black-box attack against IDLSes that requires no knowledge of the target model or its coupled interpretation model. QuScore combines transfer-based and score-based methods, employing an effective microbial genetic algorithm. Our method is designed to reduce the number of queries needed to carry out successful attacks, resulting in a more efficient process. By continuously refining adversarial samples based on feedback scores from the IDLS, our approach effectively navigates the search space to identify perturbations that fool the system. We evaluate the attack's effectiveness on four CNN models (Inception, ResNet, VGG, DenseNet) and two interpretation models (CAM, Grad), using both the ImageNet and CIFAR datasets. Our results show that the proposed approach is query-efficient, with an attack success rate between 95% and 100% and an average transferability success rate of 69%, and that it generates adversarial examples whose attribution maps resemble those of benign samples. We also demonstrate that our attack is resilient against various preprocessing defense techniques and transfers easily to different DNN models.
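To make the search procedure concrete, the sketch below illustrates how a microbial genetic algorithm can drive a score-based black-box search for adversarial perturbations. This is a minimal illustration under stated assumptions, not the authors' implementation: score_fn stands in for whatever scalar feedback the attacker derives from querying the target IDLS (higher meaning closer to evasion), and all function names, hyperparameters, and the L-infinity budget eps are assumptions introduced here. In QuScore's setting, the initial population would plausibly be seeded with transfer-based perturbations crafted on a surrogate model rather than with pure noise.

```python
import numpy as np

def microbial_ga_attack(score_fn, x, pop_size=10, eps=0.05,
                        crossover_rate=0.5, mutation_rate=0.05,
                        mutation_scale=0.01, max_queries=10000):
    """Hedged sketch of a microbial-GA score-based black-box search.

    score_fn(x_adv) -> float is assumed to be the only access to the
    target IDLS; higher scores mean the candidate is closer to fooling
    both the classifier and its interpretation model. All parameters
    here are illustrative, not the paper's settings.
    """
    # Population of perturbations inside an L_inf ball of radius eps.
    # A transfer-based variant would seed these from surrogate-model
    # adversarial examples instead of uniform noise.
    pop = np.random.uniform(-eps, eps, size=(pop_size,) + x.shape)
    best_pert, best_score = pop[0].copy(), -np.inf
    queries = 0

    while queries + 2 <= max_queries:
        # Microbial GA step: sample a two-member tournament.
        i, j = np.random.choice(pop_size, size=2, replace=False)
        s_i = score_fn(np.clip(x + pop[i], 0.0, 1.0))
        s_j = score_fn(np.clip(x + pop[j], 0.0, 1.0))
        queries += 2
        winner, loser = (i, j) if s_i >= s_j else (j, i)
        if max(s_i, s_j) > best_score:
            best_score, best_pert = max(s_i, s_j), pop[winner].copy()

        # The loser copies a random subset of the winner's genes ...
        mask = np.random.rand(*x.shape) < crossover_rate
        pop[loser][mask] = pop[winner][mask]
        # ... then receives small random mutations, clipped back into
        # the eps-ball so the perturbation stays imperceptible.
        mut = np.random.rand(*x.shape) < mutation_rate
        noise = np.random.uniform(-mutation_scale, mutation_scale, x.shape)
        pop[loser][mut] += noise[mut]
        pop[loser] = np.clip(pop[loser], -eps, eps)

    return np.clip(x + best_pert, 0.0, 1.0), best_score
```

A design note on why this fits a tight query budget: the microbial variant mutates only the loser of each tournament, so the best candidate found so far is never overwritten, and each refinement step costs exactly two queries to the black-box system.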
