Learning Answer Embeddings for Visual Question Answering

by Hexiang Hu, et al.
University of Southern California

We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly, and the other for the answers. The learning objective is to find the parameterization of those embeddings under which the correct answer has the highest likelihood among all possible answers. In contrast to several existing approaches that treat Visual QA as multi-way classification, the proposed approach takes the semantic relationships among answers (as characterized by the embeddings) into consideration, instead of viewing answers as independent ordinal numbers. Thus, the learned embedding function can be used to embed answers that are unseen in the training dataset. These properties make the approach particularly appealing for transfer learning for open-ended Visual QA, where the source dataset on which the model is learned has limited overlap with the target dataset in the space of answers. We have also developed large-scale optimization techniques for applying the model to datasets with a large number of answers, where the challenge is to properly normalize the proposed probabilistic model. We validate our approach on several Visual QA datasets and investigate its utility for transferring models across datasets. The empirical results show that the approach performs well not only on in-domain learning but also on transfer learning.
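The two-embedding idea can be illustrated with a minimal sketch. The abstract does not state the scoring function or the embedding networks, so everything below is an assumption made for illustration: the inner-product score, the softmax normalization over a small candidate set, the random vectors standing in for learned embeddings, and the hypothetical helper name `answer_log_likelihoods`.

```python
import numpy as np

def answer_log_likelihoods(joint_emb, answer_embs):
    """Log-probability of each candidate answer given one image-question pair.

    Assumption (not from the abstract): the score is an inner product between
    the joint image-question embedding and each answer embedding, normalized
    with a softmax over the candidate answers.
    """
    scores = answer_embs @ joint_emb      # one score per candidate answer
    scores = scores - scores.max()        # shift for numerical stability
    return scores - np.log(np.exp(scores).sum())

# Random stand-ins for the outputs of the two (hypothetical) embedding functions.
rng = np.random.default_rng(0)
dim, num_answers = 8, 5
joint_emb = rng.normal(size=dim)                   # image + question embedding
answer_embs = rng.normal(size=(num_answers, dim))  # one row per candidate answer

log_probs = answer_log_likelihoods(joint_emb, answer_embs)

# Training would push up the log-likelihood of the correct answer,
# i.e. minimize this negative log-likelihood.
correct = 2
loss = -log_probs[correct]
```

Because answers are scored through their embeddings rather than through a fixed output layer, an answer never seen in training can, under these assumptions, be scored by appending its embedding as another row of `answer_embs`. With a very large answer set, the sum inside the normalization becomes the expensive step, which is the challenge the abstract refers to.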


