High-Order Attention Models for Visual Question Answering

11/12/2017
by Idan Schwartz, et al.

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper, we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that these high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
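To make the idea of high-order correlations concrete, below is a minimal sketch of a pairwise (second-order) attention potential between image regions and question words. This is an illustration, not the authors' exact architecture: the correlation matrix `W2`, the toy feature dimensions, and the max-marginalization over words are all assumptions chosen for brevity.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def pairwise_attention(V, Q, W2):
    """Toy second-order attention.

    V  : (n_regions, d) image-region features
    Q  : (n_words, d)   question-word features
    W2 : (d, d)         hypothetical learned correlation matrix
    Returns a distribution over image regions.
    """
    # pairwise potentials: a score for every (region, word) pair
    P = V @ W2 @ Q.T                 # shape (n_regions, n_words)
    # marginalize over words (here: max), then normalize over regions
    region_scores = P.max(axis=1)
    return softmax(region_scores)

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 8))      # 5 image regions, 8-dim features
Q = rng.standard_normal((3, 8))      # 3 question words
W2 = rng.standard_normal((8, 8))
att = pairwise_attention(V, Q, W2)   # attention weights over the 5 regions
```

The key point the paper generalizes is that the potential `P` couples two modalities jointly rather than scoring each modality on its own; higher-order variants couple three or more inputs in the same spirit.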


Related research

05/24/2018 · R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
Recently, Visual Question Answering (VQA) has emerged as one of the most...

08/18/2020 · Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
Attention models are widely used in Vision-language (V-L) tasks to perfo...

11/19/2018 · High Order Neural Networks for Video Classification
Capturing spatiotemporal correlations is an essential topic in video cla...

08/10/2017 · Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering
Visual question answering (VQA) is challenging because it requires a sim...

04/03/2018 · Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
A key solution to visual question answering (VQA) exists in how to fuse ...

05/11/2021 · Cross-Modal Generative Augmentation for Visual Question Answering
Data augmentation is an approach that can effectively improve the perfor...

06/20/2016 · DualNet: Domain-Invariant Network for Visual Question Answering
Visual question answering (VQA) task not only bridges the gap between im...
