Compact Trilinear Interaction for Visual Question Answering

09/26/2019
by Tuong Do, et al.

In Visual Question Answering (VQA), answers are strongly correlated with both the meaning of the question and the visual content. Therefore, to selectively utilize image, question and answer information, we propose a novel trilinear interaction model that simultaneously learns high-level associations among these three inputs. To overcome the complexity of this interaction, we introduce a multimodal tensor-based PARALIND decomposition that efficiently parameterizes the trilinear interaction between the three inputs. Moreover, knowledge distillation is applied for the first time to free-form open-ended VQA. It serves not only to reduce computational cost and memory requirements but also to transfer knowledge from the trilinear interaction model to a bilinear interaction model. Extensive experiments on the benchmark datasets TDIUC, VQA-2.0, and Visual7W show that the proposed compact trilinear interaction model achieves state-of-the-art single-model results on all three datasets.
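The abstract combines two mechanisms: a trilinear interaction whose weight tensor is compressed by a PARALIND decomposition, and knowledge distillation from the trilinear model to a bilinear one. The NumPy sketch below illustrates both under simplifying assumptions. Each modality is reduced to a single feature vector. The interaction tensor is written as a sum of R small block terms in the spirit of PARALIND, not necessarily the paper's exact configuration. The distillation loss is the standard Hinton-style temperature-scaled KL term, which may differ from the paper's objective. All function names, dimensions, and hyperparameters (`t`, `alpha`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: illustrates the ideas in the abstract, not the paper's exact model.
rng = np.random.default_rng(0)


def trilinear_full(T, v, q, a):
    """Full trilinear interaction: z_k = sum_{i,j,l} T[i,j,l,k] * v_i * q_j * a_l."""
    return np.einsum("ijlk,i,j,l->k", T, v, q, a)


def rebuild_tensor(cores, Wv, Wq, Wa):
    """Materialise T as a sum of R block terms: sum_r G_r x1 Wv_r x2 Wq_r x3 Wa_r."""
    return sum(np.einsum("abck,ia,jb,lc->ijlk", G, A, B, C)
               for G, A, B, C in zip(cores, Wv, Wq, Wa))


def trilinear_factored(cores, Wv, Wq, Wa, v, q, a):
    """Same result as trilinear_full(rebuild_tensor(...), v, q, a), but never builds T."""
    z = 0.0
    for G, A, B, C in zip(cores, Wv, Wq, Wa):
        # Project each modality into the small block space, then contract with core G_r.
        z = z + np.einsum("abck,a,b,c->k", G, v @ A, q @ B, a @ C)
    return z


def softmax(x, t=1.0):
    e = np.exp((x - x.max()) / t)  # shift by the max for numerical stability
    return e / e.sum()


def distillation_loss(student_logits, teacher_logits, label, t=2.0, alpha=0.5):
    """Hinton-style KD (assumed): alpha * t^2 * KL(teacher_t || student_t) + (1 - alpha) * CE."""
    p_teacher = softmax(teacher_logits, t)
    p_student = softmax(student_logits, t)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    ce = -np.log(softmax(student_logits)[label])
    return alpha * t**2 * kl + (1 - alpha) * ce


# Toy sizes (hypothetical): feature dims, output dim, number of blocks R, block size.
dv, dq, da, dz, R = 6, 5, 4, 3, 2
bv, bq, ba = 2, 2, 2
cores = [rng.standard_normal((bv, bq, ba, dz)) for _ in range(R)]
Wv = [rng.standard_normal((dv, bv)) for _ in range(R)]
Wq = [rng.standard_normal((dq, bq)) for _ in range(R)]
Wa = [rng.standard_normal((da, ba)) for _ in range(R)]
v, q, a = rng.standard_normal(dv), rng.standard_normal(dq), rng.standard_normal(da)

T = rebuild_tensor(cores, Wv, Wq, Wa)
z_full = trilinear_full(T, v, q, a)
z_fact = trilinear_factored(cores, Wv, Wq, Wa, v, q, a)
print(np.allclose(z_full, z_fact))  # True

full_params = dv * dq * da * dz  # 360
fact_params = R * (bv * bq * ba * dz + dv * bv + dq * bq + da * ba)  # 108
print(full_params, fact_params)

# Treat the trilinear output as teacher logits for a (here random) bilinear student.
student_logits = rng.standard_normal(dz)
print(distillation_loss(student_logits, z_full, label=0))
```

The factored path contracts each input with its small projection before touching the core, so neither compute nor memory ever scales with the full product of the three feature dimensions. Even in this toy setting the parameter count drops from 360 to 108; with realistic feature sizes the full tensor quickly becomes impractical, which is the interaction complexity the decomposition is meant to overcome.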

research
05/18/2017

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

Bilinear models provide an appealing framework for mixing and merging in...
research
12/01/2017

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

A number of studies have found that today's Visual Question Answering (V...
research
09/23/2020

Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering

Different approaches have been proposed to Visual Question Answering (VQ...
research
08/01/2022

Generative Bias for Visual Question Answering

The task of Visual Question Answering (VQA) is known to be plagued by th...
research
06/20/2017

Compact Tensor Pooling for Visual Question Answering

Performing high level cognitive tasks requires the integration of featur...
research
06/11/2021

NAAQA: A Neural Architecture for Acoustic Question Answering

The goal of the Acoustic Question Answering (AQA) task is to answer a fr...
research
04/13/2021

Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation

In this work, we address the issues of missing modalities that have aris...
