Can you fool AI with adversarial examples on a visual Turing test?
Deep learning has achieved impressive results in many areas of Computer Vision and Natural Language Processing. Among others, Visual Question Answering (VQA), also referred to as a visual Turing test, is considered one of the most compelling problems, and recent deep learning models have reported significant progress in vision and language modeling. Although Artificial Intelligence (AI) is getting closer to passing the visual Turing test, the existence of adversarial examples for deep learning systems may hinder their practical application. In this work, we conduct the first extensive study of adversarial examples for VQA systems. In particular, we focus on generating targeted adversarial examples for a VQA system, where the target is a question-answer pair. Our evaluation shows that the success rate of generating a targeted adversarial example depends mostly on the choice of the target question-answer pair, and less on the choice of images to which the question refers. We also report a language prior phenomenon of the VQA model, which can explain why targeted adversarial examples are hard to generate for some question-answer targets. We further demonstrate that a compositional VQA architecture is slightly more resilient to adversarial attacks than a non-compositional one. Our study sheds new light on how to build resilient deep vision and language models that are robust against adversarial examples.
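The abstract does not spell out the attack procedure, so the sketch below illustrates one common way such a targeted attack is typically realized: an iterative, PGD-style perturbation of the image that pushes a VQA model toward the target answer for a fixed question. The `vqa_model(image, question)` interface, the hyperparameters, and the cross-entropy objective are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a targeted adversarial attack on a VQA model (PGD-style).
# Assumed interface: vqa_model(image, question) -> answer logits; the paper's
# actual models and attack details are not reproduced here.
import torch
import torch.nn.functional as F

def targeted_vqa_attack(vqa_model, image, question, target_answer_idx,
                        epsilon=8 / 255, step_size=1 / 255, num_steps=50):
    """Perturb `image` so the model answers `question` with the target answer.

    image:    float tensor in [0, 1], shape (1, 3, H, W)
    question: pre-tokenized question tensor expected by `vqa_model`
    """
    adv = image.clone().detach()
    for _ in range(num_steps):
        adv.requires_grad_(True)
        logits = vqa_model(adv, question)              # (1, num_answers)
        # Targeted attack: minimize the loss for the desired (target) answer.
        loss = F.cross_entropy(logits, torch.tensor([target_answer_idx]))
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - step_size * grad.sign()        # step toward the target
            adv = image + (adv - image).clamp(-epsilon, epsilon)  # L_inf ball
            adv = adv.clamp(0.0, 1.0)                  # keep a valid image
    return adv.detach()
```

In this framing, the question-answer pair fixes the attack target while the image is the only quantity being optimized, which mirrors the paper's observation that attack success depends mostly on the chosen question-answer target rather than on the particular image.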