Bilinear Attention Networks

05/21/2018
by Jin-Hwa Kim, et al.

Attention networks in multimodal learning provide an efficient way to use given visual information selectively. However, the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitively expensive. To address this, co-attention builds a separate attention distribution for each modality, neglecting the interaction between the multimodal inputs. In this paper, we propose bilinear attention networks (BAN), which find bilinear attention distributions to use the given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representation for each pair of channels. Furthermore, we propose a variant of multimodal residual networks that efficiently exploits the eight attention maps of BAN. We evaluate our model quantitatively and qualitatively on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.
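The abstract's central mechanism can be illustrated with a small NumPy sketch. It assumes a low-rank bilinear form in which the attention logit for question channel i and visual channel j is p^T(σ(U^T x_i) ∘ σ(V^T y_j)), with ReLU as σ and a single softmax taken over all channel pairs. The multi-glimpse loop then adds each glimpse's joint feature back to the question features as a residual update. All function names, shapes, and the random weights are hypothetical illustrations, not the authors' released implementation; details such as dropout, weight normalization, and the learned projection after each glimpse are omitted.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax_all(logits):
    """Softmax over every entry, so the map is one distribution over all (i, j) pairs."""
    e = np.exp(logits - logits.max())
    return e / e.sum()


def bilinear_attention_map(X, Y, U, V, p):
    """Assumed low-rank bilinear logits: logit[i, j] = p^T (relu(U^T x_i) * relu(V^T y_j)).

    X: (rho, N) question-token features; Y: (phi, M) image-region features.
    U: (N, K), V: (M, K), p: (K,). All names are hypothetical.
    """
    Xp = relu(X @ U)                 # (rho, K)
    Yp = relu(Y @ V)                 # (phi, K)
    logits = (Xp * p) @ Yp.T         # (rho, phi): one logit per channel pair
    return softmax_all(logits)


def bilinear_joint_feature(X, Y, A, U2, V2):
    """Attention-weighted low-rank pooling: f[k] = sum_ij A[i,j] * relu(X U2)[i,k] * relu(Y V2)[j,k]."""
    Xq = relu(X @ U2)                # (rho, K2)
    Yq = relu(Y @ V2)                # (phi, K2)
    return np.einsum("ik,ij,jk->k", Xq, A, Yq)


def ban_multi_glimpse(X, Y, att_params, joint_params):
    """Compute G attention maps from the inputs, then fuse them with residual updates.

    Assumed simplification: each joint feature (size N) is added back to every
    question token, X <- X + 1 f^T, and the final result is summed over tokens.
    """
    maps = [bilinear_attention_map(X, Y, U, V, p) for U, V, p in att_params]
    for A, (U2, V2) in zip(maps, joint_params):
        f = bilinear_joint_feature(X, Y, A, U2, V2)   # (N,) because U2 is (N, N)
        X = X + f[None, :]                            # broadcast to all tokens
    return X.sum(axis=0), maps


# Usage with random (untrained) weights and two glimpses.
rng = np.random.default_rng(0)
rho, phi, N, M, K, G = 4, 6, 8, 10, 5, 2
X = rng.standard_normal((rho, N))
Y = rng.standard_normal((phi, M))
att = [(rng.standard_normal((N, K)), rng.standard_normal((M, K)), rng.standard_normal(K))
       for _ in range(G)]
joint = [(rng.standard_normal((N, N)), rng.standard_normal((M, N))) for _ in range(G)]
out, maps = ban_multi_glimpse(X, Y, att, joint)
print(out.shape, len(maps), maps[0].shape)  # (8,) 2 (4, 6)
```

The point of the factorized logit is cost: computing relu(X U) and relu(Y V) once and taking a single (rho × K)·(K × phi) product yields a logit for every question-region pair, rather than applying a full N × M bilinear weight matrix separately to each pair.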

Related research
10/14/2016

Hadamard Product for Low-rank Bilinear Pooling

Bilinear models provide rich representations compared with linear models...
06/06/2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Modeling textual or visual information with vector representations train...
05/18/2017

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

Bilinear models provide an appealing framework for mixing and merging in...
09/27/2021

VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering

We present VQA-MHUG - a novel 49-participant dataset of multimodal human...
11/09/2019

Learning Deep Bilinear Transformation for Fine-grained Image Representation

Bilinear feature transformation has shown the state-of-the-art performan...
03/23/2017

Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation

In state-of-the-art Neural Machine Translation, an attention mechanism i...
10/23/2019

A Unifying Framework of Bilinear LSTMs

This paper presents a novel unifying framework of bilinear LSTMs that ca...