Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

05/10/2019
by   Yan Xu, et al.

In this work, we study the robustness of CNN+RNN based image captioning systems subjected to adversarial noise. We propose to fool an image captioning system into generating targeted partial captions for an image polluted by adversarial noise, even when the targeted captions are totally irrelevant to the image content. A partial caption is one in which the words at some locations are observed, while the words at the other locations are unrestricted. This is the first work to study exact adversarial attacks with targeted partial captions. Due to the sequential dependencies among the words in a caption, we formulate the generation of adversarial noise for targeted partial captions as a structured output learning problem with latent variables. Both the generalized expectation maximization algorithm and structural SVMs with latent variables are then adopted to optimize the problem. The proposed methods generate very successful attacks against three popular CNN+RNN based image captioning models. Furthermore, the proposed attack methods are used to probe the inner mechanism of image captioning systems, providing guidance for further improving automatic image captioning systems toward human-level captioning.
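To make the targeted-partial-caption idea concrete, here is a minimal toy sketch (not the authors' models or code): a linear per-step "captioner" scores words from image features, and gradient descent on an additive noise vector forces the target words at the observed positions while leaving the latent (unobserved) positions unconstrained. The model, loss, learning rate, and dimensions are all illustrative assumptions.

```python
import numpy as np

# Toy illustration of a targeted partial-caption attack (hypothetical model,
# NOT the paper's CNN+RNN captioners): each time step scores words with a
# linear layer over image features; we optimize additive noise so that the
# observed positions decode to target words, latent positions are free.

rng = np.random.default_rng(0)
V, D, T = 5, 8, 4                       # vocab size, feature dim, caption length
W = rng.normal(size=(T, V, D))          # per-step linear scoring weights (toy)
x = rng.normal(size=D)                  # "image" feature vector
target = {0: 2, 2: 4}                   # observed positions -> target word ids

def logits(feat):
    return W @ feat                     # shape (T, V)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attack(feat, target, steps=1000, lr=0.05):
    """Gradient descent on noise delta, minimizing cross-entropy at the
    observed positions only; unobserved positions are latent/unconstrained."""
    delta = np.zeros_like(feat)
    for _ in range(steps):
        p = softmax(logits(feat + delta))     # (T, V) word probabilities
        grad = np.zeros_like(feat)
        for t, w in target.items():
            g = p[t].copy()
            g[w] -= 1.0                       # d(cross-entropy)/d(logits)
            grad += W[t].T @ g
        delta -= lr * grad
    return delta

delta = attack(x, target)
caption = softmax(logits(x + delta)).argmax(axis=-1)
print("adversarial caption word ids:", caption)
```

Because the loss touches only the observed positions, the remaining words are whatever the (perturbed) model prefers, loosely mirroring the latent-variable treatment in the paper; the real method additionally handles the sequential word dependencies of an RNN decoder.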


Related research

12/06/2017  Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning
Modern neural image captioning systems typically adopt the encoder-decod...

07/07/2021  Controlled Caption Generation for Images Through Adversarial Attacks
Deep learning is found to be vulnerable to adversarial examples. However...

07/26/2018  Rethinking the Form of Latent States in Image Captioning
RNNs and their variants have been widely adopted for image captioning. I...

01/04/2020  Understanding Image Captioning Models beyond Visualizing Attention
This paper explains predictions of image captioning models with attentio...

06/11/2019  Mimic and Fool: A Task Agnostic Adversarial Attack
At present, adversarial attacks are designed in a task-specific fashion....

12/24/2020  SubICap: Towards Subword-informed Image Captioning
Existing Image Captioning (IC) systems model words as atomic units in ca...

10/31/2019  Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder
Most RNN-based image captioning models receive supervision on the output...
