Visual Question Answering with Prior Class Semantics

05/04/2020
by Violetta Shevchenko, et al.

We present a novel mechanism to embed prior knowledge in a model for visual question answering. The open-set nature of the task is at odds with the ubiquitous approach of training a fixed classifier. We show how to exploit additional information pertaining to the semantics of candidate answers. We extend the answer prediction process with a regression objective in a semantic space, in which we project candidate answers using prior knowledge derived from word embeddings. We perform an extensive study of learned representations with the GQA dataset, revealing that important semantic information is captured in the relations between embeddings in the answer space. Our method brings improvements in consistency and accuracy over a range of question types. Experiments with novel answers, unseen during training, indicate the method's potential for open-set prediction.
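The idea of scoring candidate answers in a semantic space can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture or the exact loss, so the linear projection `W`, the cosine-similarity scoring, the loss weighting `reg_weight`, and the random vectors standing in for word embeddings are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def score_answers(fused, W, answer_emb):
    """Project a fused image-question feature into the semantic space and
    score every candidate answer by cosine similarity (assumed scoring rule)."""
    pred = l2_normalize(fused @ W)            # shape: (d_sem,)
    return l2_normalize(answer_emb) @ pred    # shape: (num_answers,)

def combined_loss(fused, W, answer_emb, target_idx, reg_weight=1.0):
    """Classification loss plus a regression term in the semantic space.
    The combination and weighting are hypothetical, not taken from the paper."""
    scores = score_answers(fused, W, answer_emb)
    shifted = scores - scores.max()                       # numerically stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    classification = -log_probs[target_idx]               # cross-entropy over candidates
    regression = 1.0 - scores[target_idx]                 # cosine distance to the target
    return classification + reg_weight * regression

rng = np.random.default_rng(0)
answer_emb = rng.normal(size=(4, 8))   # stand-ins for word embeddings of 4 known answers
W = rng.normal(size=(16, 8))           # hypothetical learned projection
fused = rng.normal(size=16)            # hypothetical fused image-question feature

scores = score_answers(fused, W, answer_emb)
loss = combined_loss(fused, W, answer_emb, target_idx=2)

# Open-set use: a novel answer is scored by appending its embedding,
# without changing W or the scores of the existing candidates.
novel_emb = rng.normal(size=(1, 8))
open_scores = score_answers(fused, W, np.vstack([answer_emb, novel_emb]))
```

Because each candidate is represented by an embedding rather than by a fixed output unit, the candidate set can grow at test time, which is the property the abstract points to for open-set prediction.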

research
10/20/2016

Proposing Plausible Answers for Open-ended Visual Question Answering

Answering open-ended questions is an essential capability for any intell...
research
09/30/2019

On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints

The knowledge that humans hold about a problem often extends far beyond ...
research
06/10/2018

Learning Answer Embeddings for Visual Question Answering

We propose a novel probabilistic model for visual question answering (Vi...
research
09/22/2021

Eliciting Thinking Hierarchy without Prior

A key challenge in crowdsourcing is that majority may make systematic mi...
research
11/01/2016

Solving Visual Madlibs with Multiple Cues

This paper presents an approach for answering fill-in-the-blank multiple...
research
10/03/2018

Transfer Learning via Unsupervised Task Discovery for Visual Question Answering

We study how to leverage off-the-shelf visual and linguistic data to cop...
research
08/18/2023

Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

Video Question Answering (VideoQA) is a challenging task that entails co...
