Simple Baseline for Visual Question Answering

12/07/2015
by Bolei Zhou et al.

We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strengths and weaknesses of the trained model, we also provide an interactive web demo and open-source code.
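
Concretely, the described model reduces to one embedding layer and one linear classifier. The sketch below is a minimal PyTorch reconstruction, not the authors' released code; the class name, the 300-d word embedding, the 1024-d image feature, and the 1,000-answer vocabulary are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BowImgBaseline(nn.Module):
    """Bag-of-words question vector + CNN image feature -> answer logits."""

    def __init__(self, vocab_size, num_answers, embed_dim=300, img_feat_dim=1024):
        super().__init__()
        # Question encoding: sum of learned word embeddings (bag of words).
        # Padding handling is omitted for brevity.
        self.word_embed = nn.EmbeddingBag(vocab_size, embed_dim, mode="sum")
        # A single linear layer scores every candidate answer from the
        # concatenated [question; image] feature.
        self.classifier = nn.Linear(embed_dim + img_feat_dim, num_answers)

    def forward(self, question_ids, image_feat):
        # question_ids: (batch, seq_len) word indices for each question
        # image_feat:   (batch, img_feat_dim) CNN features extracted offline
        q = self.word_embed(question_ids)           # (batch, embed_dim)
        joint = torch.cat([q, image_feat], dim=1)   # feature concatenation
        return self.classifier(joint)               # (batch, num_answers)

# Illustrative usage with random inputs:
model = BowImgBaseline(vocab_size=10000, num_answers=1000)
logits = model(torch.randint(0, 10000, (8, 12)), torch.randn(8, 1024))  # (8, 1000)
```

Under this framing, VQA becomes ordinary classification over a fixed set of candidate answers, so training reduces to standard cross-entropy on question-image pairs.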

Related Research

10/09/2016

Open-Ended Visual Question-Answering

This thesis report studies methods to solve Visual Question-Answering (V...
03/23/2017

Recurrent and Contextual Models for Visual Question Answering

We propose a series of recurrent and contextual neural network models fo...
06/10/2022

Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model

Current architectures for multi-modality tasks such as visual question a...
04/08/2019

Revisiting EmbodiedQA: A Simple Baseline and Beyond

In Embodied Question Answering (EmbodiedQA), an agent interacts with an ...
06/19/2021

VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis

Visual Question Answering system integrated with Unmanned Aerial Vehicle...
06/19/2018

Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Visual Question answering is a challenging problem requiring a combinati...
06/16/2016

No Need to Pay Attention: Simple Recurrent Neural Networks Work! (for Answering "Simple" Questions)

First-order factoid question answering assumes that the question can be ...

Code Repositories

VQAbaseline

Simple Baseline for Visual Question Answering