MUTAN: Multimodal Tucker Fusion for Visual Question Answering

05/18/2017
by Hedi Ben-Younes, et al.

Bilinear models provide an appealing framework for mixing and merging information in Visual Question Answering (VQA) tasks. They help to learn high-level associations between question meaning and visual concepts in the image, but they suffer from huge dimensionality issues. We introduce MUTAN, a multimodal tensor-based Tucker decomposition to efficiently parametrize bilinear interactions between visual and textual representations. In addition to the Tucker framework, we design a low-rank matrix-based decomposition to explicitly constrain the interaction rank. With MUTAN, we control the complexity of the merging scheme while keeping interpretable fusion relations. We show how our MUTAN model generalizes some of the latest VQA architectures, providing state-of-the-art results.
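To make the dimensionality argument concrete, the following is a minimal sketch of a generic Tucker-factorized bilinear fusion, not the authors' implementation. It assumes the standard Tucker form: two projection matrices, a small core tensor, and an output matrix. The function names, the dimensions in the usage note, and the omission of biases, nonlinearities, and the additional low-rank constraint on the core are all assumptions made for illustration.

```python
def full_bilinear_params(d_q, d_v, n_out):
    """Parameter count of an unfactorized bilinear tensor of shape d_q x d_v x n_out."""
    return d_q * d_v * n_out


def tucker_params(d_q, d_v, n_out, t_q, t_v, t_o):
    """Parameter count of a Tucker factorization (biases ignored):
    W_q (d_q x t_q), W_v (d_v x t_v), core (t_q x t_v x t_o), W_o (t_o x n_out)."""
    return d_q * t_q + d_v * t_v + t_q * t_v * t_o + t_o * n_out


def matvec_T(W, x):
    """Return W^T x, with W stored as len(x) rows of equal length."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def tucker_fuse(q, v, W_q, W_v, core, W_o):
    """Fuse a question vector q and an image vector v through a Tucker-factorized tensor."""
    q_t = matvec_T(W_q, q)  # project the question into t_q dimensions
    v_t = matvec_T(W_v, v)  # project the image into t_v dimensions
    t_o = len(core[0][0])
    # Bilinear interaction through the core: z[k] = sum_ij q_t[i] * v_t[j] * core[i][j][k]
    z = [
        sum(
            q_t[i] * v_t[j] * core[i][j][k]
            for i in range(len(q_t))
            for j in range(len(v_t))
        )
        for k in range(t_o)
    ]
    return matvec_T(W_o, z)  # map the fused vector to the output (answer) space
```

With hypothetical sizes d_q = d_v = 2048, n_out = 2000, and t_q = t_v = t_o = 200, `full_bilinear_params` gives 8,388,608,000 parameters while `tucker_params` gives 9,219,200, which illustrates how the factorization keeps the bilinear interaction while controlling its size.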

research
09/26/2019

Compact Trilinear Interaction for Visual Question Answering

In Visual Question Answering (VQA), answers have a great correlation wit...
research
01/31/2019

BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

Multimodal representation learning is gaining more and more interest wit...
research
10/14/2016

Hadamard Product for Low-rank Bilinear Pooling

Bilinear models provide rich representations compared with linear models...
research
05/21/2018

Bilinear Attention Networks

Attention networks in multimodal learning provide an efficient way to ut...
research
03/26/2018

Generalized Hadamard-Product Fusion Operators for Visual Question Answering

We propose a generalized class of multimodal fusion operators for the ta...
research
06/20/2017

Compact Tensor Pooling for Visual Question Answering

Performing high level cognitive tasks requires the integration of featur...
research
06/03/2019

Low-rank Random Tensor for Bilinear Pooling

Bilinear pooling is capable of extracting high-order information from da...
