Can you fool AI with adversarial examples on a visual Turing test?

09/25/2017
by Xiaojun Xu, et al.

Deep learning has achieved impressive results in many areas of Computer Vision and Natural Language Processing. Among these, Visual Question Answering (VQA), also referred to as a visual Turing test, is considered one of the most compelling problems, and recent deep learning models have reported significant progress in vision and language modeling. Although Artificial Intelligence (AI) is getting closer to passing the visual Turing test, the existence of adversarial examples for deep learning systems may hinder the practical application of such systems. In this work, we conduct the first extensive study of adversarial examples for VQA systems. In particular, we focus on generating targeted adversarial examples for a VQA system, where the target is a question-answer pair. Our evaluation shows that whether a targeted adversarial example can be generated depends mostly on the choice of the target question-answer pair, and less on the choice of the image to which the question refers. We also report the language prior phenomenon of a VQA model, which can explain why targeted adversarial examples are hard to generate for some question-answer targets. Finally, we demonstrate that a compositional VQA architecture is slightly more resilient to adversarial attacks than a non-compositional one. Our study sheds new light on how to build deep vision-and-language models that are robust against adversarial examples.
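The abstract does not say how a targeted example is produced. The following is a minimal sketch under loudly stated assumptions: a toy linear scorer stands in for a real VQA network, and every name and weight (`logits`, `targeted_attack`, `W`, `U`, the answer indices) is hypothetical. With the question held fixed, it perturbs only the image so the model moves toward a chosen answer, using an iterative signed-gradient (I-FGSM/PGD-style) update under an L-infinity budget. That update is a swapped-in, standard technique and not necessarily the authors' optimization. The second case gives the question a strong bias toward one answer, a crude stand-in for a language prior. Under the same budget, the attack then fails to reach the target.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W: Matrix, U: Matrix, b: Sequence[float],
           x: Sequence[float], q: Sequence[float]) -> Vector:
    """Hypothetical toy 'VQA' scorer: one logit per candidate answer,
    linear in image features x and question features q."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * qi for u, qi in zip(U[k], q))
        + b[k]
        for k in range(len(b))
    ]


def predict(W: Matrix, U: Matrix, b: Sequence[float],
            x: Sequence[float], q: Sequence[float]) -> int:
    z = logits(W, U, b, x, q)
    return max(range(len(z)), key=lambda k: z[k])


def targeted_attack(W: Matrix, U: Matrix, b: Sequence[float],
                    x: Sequence[float], q: Sequence[float], target: int,
                    eps: float = 0.5, alpha: float = 0.05, steps: int = 20) -> Vector:
    """Perturb only the image x (the question q stays fixed) to push the
    prediction toward `target`, keeping |x_adv - x|_inf <= eps and x_adv in [0, 1]."""
    x_adv = list(x)
    for _ in range(steps):
        p = softmax(logits(W, U, b, x_adv, q))
        # Gradient of the cross-entropy -log p[target] w.r.t. x_i for a linear scorer:
        # sum_k (p_k - [k == target]) * W[k][i]
        grad = [
            sum((p[k] - (1.0 if k == target else 0.0)) * W[k][i] for k in range(len(p)))
            for i in range(len(x_adv))
        ]
        for i, g in enumerate(grad):
            v = x_adv[i] - alpha * ((g > 0) - (g < 0))  # step against sign(g)
            v = min(max(v, x[i] - eps), x[i] + eps)      # project onto the L-inf ball
            x_adv[i] = min(max(v, 0.0), 1.0)             # keep a valid pixel range
    return x_adv


# Hypothetical example: 3 candidate answers, 2 image features, 1 question feature.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [0.8, 0.2]
q = [1.0]

weak_prior = [[0.2], [0.0], [0.0]]    # question mildly favours answer 0
strong_prior = [[3.0], [0.0], [0.0]]  # question strongly favours answer 0

print(predict(W, weak_prior, b, x, q))  # 0
x_adv = targeted_attack(W, weak_prior, b, x, q, target=1)
print(predict(W, weak_prior, b, x_adv, q))  # 1 (targeted attack succeeds)
x_adv2 = targeted_attack(W, strong_prior, b, x, q, target=1)
print(predict(W, strong_prior, b, x_adv2, q))  # 0 (same budget, attack fails)
```

In this toy setting, the outcome is decided by the target answer and the question's built-in bias rather than by the particular image. This only loosely mirrors the trend the abstract reports and says nothing about real VQA models.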

research
06/04/2021

Human-Adversarial Visual Question Answering

Performance on the most commonly used Visual Question Answering dataset ...
research
07/19/2020

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Visual Question Answering (VQA) has achieved great success thanks to the...
research
03/21/2018

Attention on Attention: Architectures for Visual Question Answering (VQA)

Visual Question Answering (VQA) is an increasingly popular topic in deep...
research
01/24/2018

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

Visual question answering (VQA) is of significant interest due to its po...
research
11/16/2015

Yin and Yang: Balancing and Answering Binary Visual Questions

The complex compositional structure of language makes problems at the in...
research
01/10/2022

COIN: Counterfactual Image Generation for VQA Interpretation

Due to the significant advancement of Natural Language Processing and Co...
research
01/05/2020

The Human Visual System and Adversarial AI

This paper introduces existing research about the Human Visual System in...
