
Can you fool AI with adversarial examples on a visual Turing test?

by Xiaojun Xu, et al.

Deep learning has achieved impressive results in many areas of Computer Vision and Natural Language Processing. Among these, Visual Question Answering (VQA), also referred to as a visual Turing test, is considered one of the most compelling problems, and recent deep learning models have reported significant progress in vision and language modeling. Although Artificial Intelligence (AI) is getting closer to passing the visual Turing test, the existence of adversarial examples for deep learning systems may hinder the practical application of such systems. In this work, we conduct the first extensive study on adversarial examples for VQA systems. In particular, we focus on generating targeted adversarial examples for a VQA system, where the target is a question-answer pair. Our evaluation shows that the success rate of generating a targeted adversarial example depends mostly on the choice of the target question-answer pair, and less on the choice of the image to which the question refers. We also report the language prior phenomenon of a VQA model, which can explain why targeted adversarial examples are hard to generate for some question-answer targets. We also demonstrate that a compositional VQA architecture is slightly more resilient to adversarial attacks than a non-compositional one. Our study sheds new light on how to build vision and language models that are robust against adversarial examples.
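To make the notion of a targeted adversarial example concrete, here is a minimal sketch. It is not the attack used in the paper. It assumes a toy linear softmax classifier standing in for a VQA model with the question held fixed, a four-dimensional feature vector standing in for the image, and an iterative signed-gradient update with an L-infinity bound. The weights `W`, the input `x0`, and the parameters `eps`, `alpha`, and `steps` are all hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_of(W, x):
    # One score per candidate answer: a plain matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(W, x):
    scores = logits_of(W, x)
    return max(range(len(scores)), key=scores.__getitem__)

def targeted_attack(W, x0, target, eps=0.5, alpha=0.05, steps=40):
    """Perturb x0 (within an L-infinity ball of radius eps) until the toy
    model outputs `target`, using signed-gradient steps of size alpha."""
    x = list(x0)
    for _ in range(steps):
        if predict(W, x) == target:
            break  # stop as soon as the target answer is produced
        p = softmax(logits_of(W, x))
        # Gradient of cross-entropy (to the target) w.r.t. each logit.
        dlogits = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        # Chain rule through the linear layer gives the gradient w.r.t. x.
        grad = [sum(dlogits[k] * W[k][j] for k in range(len(W)))
                for j in range(len(x))]
        for j, g in enumerate(grad):
            step = alpha * ((g > 0) - (g < 0))  # alpha * sign(g)
            # Descend on the loss, then clip back into the eps-ball around x0.
            x[j] = min(max(x[j] - step, x0[j] - eps), x0[j] + eps)
    return x

# Hypothetical toy model: 3 candidate answers, 4 input features.
W = [[1.0, -0.5, 0.2, 0.0],
     [-0.3, 0.8, -0.1, 0.4],
     [0.1, 0.1, 0.9, -0.6]]
x0 = [1.0, 0.2, 0.1, 0.3]
target = 2

x_adv = targeted_attack(W, x0, target)
print("original answer:", predict(W, x0))
print("adversarial answer:", predict(W, x_adv))
```

In a real VQA setting the gradient would be taken through a deep network that consumes both the image and the question, and how easily the target is reached would depend on the chosen question-answer pair, as the abstract notes.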




Human-Adversarial Visual Question Answering

Performance on the most commonly used Visual Question Answering dataset ...

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Visual Question Answering (VQA) has achieved great success thanks to the...

Attention on Attention: Architectures for Visual Question Answering (VQA)

Visual Question Answering (VQA) is an increasingly popular topic in deep...

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

Visual question answering (VQA) is of significant interest due to its po...

The Human Visual System and Adversarial AI

This paper introduces existing research about the Human Visual System in...

Blessing in Disguise: Designing Robust Turing Test by Employing Algorithm Unrobustness

Turing test was originally proposed to examine whether machine's behavio...