Attention on Attention: Architectures for Visual Question Answering (VQA)

03/21/2018
by Jasdeep Singh, et al.

Visual Question Answering (VQA) is an increasingly popular topic in deep learning research, requiring coordination of natural language processing and computer vision modules into a single architecture. We build upon the model which placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier. We performed 300 GPU hours of extensive hyperparameter and architecture searches and were able to achieve an evaluation score of 64.78, outperforming the existing state-of-the-art single model's validation score of 63.15.
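
The abstract centers on attention mechanisms that let the question representation select relevant image regions before classification. As a concrete illustration, here is a minimal sketch of one such question-guided attention module in PyTorch; the class name, feature dimensions, and the concatenate-then-score function are assumptions for illustration, not any of the paper's thirteen specific variants.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    """Minimal sketch of question-guided attention over image regions.

    The dimensions and the concat-then-MLP scoring function below are
    illustrative assumptions, not the paper's exact architecture.
    """
    def __init__(self, img_dim=2048, q_dim=1024, hidden_dim=512):
        super().__init__()
        # Score each image region by fusing it with the question vector.
        self.score = nn.Sequential(
            nn.Linear(img_dim + q_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, img_feats, q_feat):
        # img_feats: (batch, num_regions, img_dim), e.g. CNN region features
        # q_feat:    (batch, q_dim), e.g. final RNN state of the question
        num_regions = img_feats.size(1)
        q_expanded = q_feat.unsqueeze(1).expand(-1, num_regions, -1)
        logits = self.score(torch.cat([img_feats, q_expanded], dim=-1))  # (B, R, 1)
        weights = F.softmax(logits, dim=1)            # attention over regions
        attended = (weights * img_feats).sum(dim=1)   # (B, img_dim)
        return attended, weights.squeeze(-1)

# Example usage with random tensors (batch of 4, 36 regions):
att = QuestionGuidedAttention()
attended, weights = att(torch.randn(4, 36, 2048), torch.randn(4, 1024))
```

In a full VQA model, the attended image vector would typically be fused with the question vector (for example, by elementwise product) before being passed to the answer classifier.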
