I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models

06/13/2023
by Raz Lapid, et al.

Modern image-to-text systems typically adopt the encoder-decoder framework, which comprises two main components: an image encoder, responsible for extracting image features, and a transformer-based decoder, used for generating captions. Taking inspiration from the analysis of neural networks' robustness against adversarial perturbations, we propose a novel gray-box algorithm for creating adversarial examples in image-to-text models. Unlike image classification tasks, which have a finite set of class labels, finding visually similar adversarial examples in an image-to-text task poses a greater challenge because the captioning system allows a virtually infinite space of possible captions. In this paper, we present a gray-box adversarial attack on image-to-text models, both untargeted and targeted. We formulate the process of discovering adversarial perturbations as an optimization problem that uses only the image-encoder component, making the proposed attack language-model agnostic. Through experiments conducted on the ViT-GPT2 model, the most-used image-to-text model on Hugging Face, and the Flickr30k dataset, we demonstrate that our attack successfully generates visually similar adversarial examples with both untargeted and targeted captions. Notably, our attack operates in a gray-box manner, requiring no knowledge of the decoder module. We also show that our attacks fool the popular open-source platform Hugging Face.
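To make the encoder-only formulation above concrete, the sketch below shows one plausible way such an attack could be set up: a PGD-style search for a perturbation that pushes the encoder's features away from the clean image's features (untargeted) or pulls them toward the features of a chosen target image (targeted), while an L-infinity budget keeps the adversarial image visually similar. This is not the authors' implementation; the cosine feature loss, the hyper-parameter values, and the generic `image_encoder` interface are illustrative assumptions.

```python
# Hypothetical sketch of a gray-box, encoder-only adversarial attack:
# only the image encoder is queried; the caption decoder is never used.
# Loss choice and hyper-parameters are assumptions for illustration,
# not the paper's exact formulation.
import torch
import torch.nn.functional as F


def gray_box_attack(image_encoder, x, x_target=None,
                    eps=8 / 255, alpha=1 / 255, steps=100):
    """PGD-style perturbation search in the encoder's feature space.

    image_encoder: callable mapping a batch of images to a feature tensor
                   (e.g. the vision encoder of an encoder-decoder captioner).
    x:             clean images, shape (B, C, H, W), values in [0, 1].
    x_target:      optional target images; if given, the attack pulls the
                   adversarial features toward the target's features
                   (targeted), otherwise it pushes them away from the
                   clean features (untargeted).
    """
    with torch.no_grad():
        clean_feats = image_encoder(x)
        target_feats = image_encoder(x_target) if x_target is not None else None

    # Random start inside the L-inf ball; at delta = 0 the untargeted
    # cosine loss has a vanishing gradient.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta = ((x + delta).clamp(0, 1) - x).requires_grad_(True)

    for _ in range(steps):
        feats = image_encoder(x + delta)
        if target_feats is not None:
            # Targeted: make features similar to the target image's features.
            loss = -F.cosine_similarity(feats.flatten(1),
                                        target_feats.flatten(1)).mean()
        else:
            # Untargeted: make features dissimilar to the clean features.
            loss = F.cosine_similarity(feats.flatten(1),
                                       clean_feats.flatten(1)).mean()
        loss.backward()

        with torch.no_grad():
            # Signed-gradient step, then project back into the L-inf ball
            # and the valid pixel range to keep the image visually similar.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.copy_((x + delta).clamp(0, 1) - x)
        delta.grad.zero_()

    return (x + delta).detach()
```

In this setting, `image_encoder` would be something like the ViT encoder of a captioning model such as ViT-GPT2 (returning its feature map as a tensor), while the GPT-2 decoder is never touched, which is what makes the attack gray-box and language-model agnostic.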

