Multi-Image Visual Question Answering

12/27/2021
by Harsh Raj, et al.

While much work has gone into developing models for Visual Question Answering, the ability of these models to relate the question to the image features remains less explored. We present an empirical study of different feature extraction methods with different loss functions. We propose a new dataset for the task of Visual Question Answering with multiple image inputs having only one ground truth, and benchmark our results on it. Our final model, which uses ResNet + RCNN image features and BERT embeddings and is inspired by the stacked attention network, gives 39 accuracy and 99
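As a rough illustration of the stacked-attention idea named in the abstract, the sketch below scores several image feature vectors against a question vector, takes a softmax over the images, and adds the attended image vector back into the query before the next hop. It is not the authors' implementation: the plain Python lists stand in for ResNet/RCNN features and BERT embeddings, image and question vectors are assumed to share one dimension, and the names `attention_hop`, `stacked_attention`, `W_i`, `W_q`, and `w_p` are hypothetical, with hand-picked rather than learned weights.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_hop(image_feats, query, W_i, W_q, w_p):
    """One attention hop: weight each image by the query, then refine the query."""
    q_proj = matvec(W_q, query)
    scores = []
    for v in image_feats:
        v_proj = matvec(W_i, v)
        hidden = [math.tanh(a + b) for a, b in zip(v_proj, q_proj)]
        scores.append(sum(w * h for w, h in zip(w_p, hidden)))
    weights = softmax(scores)
    # Attention-weighted sum of the image vectors.
    attended = [
        sum(p * v[d] for p, v in zip(weights, image_feats))
        for d in range(len(query))
    ]
    # Refined query = attended image vector + previous query.
    refined = [a + q for a, q in zip(attended, query)]
    return weights, refined


def stacked_attention(image_feats, query, layers):
    """Apply several hops in sequence; each hop has its own (W_i, W_q, w_p)."""
    weights = []
    for W_i, W_q, w_p in layers:
        weights, query = attention_hop(image_feats, query, W_i, W_q, w_p)
    return weights, query


# Toy inputs (assumed values, for illustration only).
identity = [[1.0, 0.0], [0.0, 1.0]]
layers = [(identity, identity, [1.0, 1.0])] * 2
images = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
question = [1.0, 0.0]
weights, refined = stacked_attention(images, question, layers)
```

Here `weights` holds one attention weight per input image after the final hop, and `refined` is the query vector that a downstream answer classifier would consume in a full model.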


