Question-Guided Hybrid Convolution for Visual Question Answering

08/08/2018
by Peng Gao, et al.

In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse high-level textual and visual features from the neural network and abandon visual spatial information when learning multi-modal features. To address this problem, question-guided kernels generated from the input question are convolved with visual features to capture the textual-visual relationship at an early stage. The question-guided convolution tightly couples textual and visual information, but it also introduces more parameters when learning the kernels. We apply group convolution, which consists of question-independent kernels and question-dependent kernels, to reduce the parameter size and alleviate over-fitting. The hybrid convolution can generate discriminative multi-modal features with fewer parameters. The proposed approach is also complementary to existing bilinear-pooling fusion and attention-based VQA methods; integrating it with them can further boost performance. Extensive experiments on public VQA datasets validate the effectiveness of QGHC.
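To make the hybrid design concrete, the sketch below predicts a few convolution kernel groups from a question embedding while the remaining groups use ordinary learned (question-independent) kernels. This is a minimal PyTorch sketch under assumed shapes and hyper-parameters; the `QGHCConv` name, its arguments, and the channel/group counts are all illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QGHCConv(nn.Module):
    """Sketch of one question-guided hybrid convolution block.

    Channels are split into groups: some groups are convolved with
    standard learned kernels (question-independent), the rest with
    kernels predicted from the question embedding (question-dependent).
    All names and default sizes here are illustrative assumptions.
    """

    def __init__(self, in_ch=512, out_ch=512, q_dim=1024,
                 groups=8, q_groups=4, ksize=3):
        super().__init__()
        assert in_ch % groups == 0 and out_ch % groups == 0
        self.groups, self.q_groups, self.ksize = groups, q_groups, ksize
        self.gin, self.gout = in_ch // groups, out_ch // groups
        # Question-independent part: a plain group convolution over
        # the first (groups - q_groups) channel groups.
        self.static_conv = nn.Conv2d(
            self.gin * (groups - q_groups), self.gout * (groups - q_groups),
            ksize, padding=ksize // 2, groups=groups - q_groups)
        # Question-dependent part: a linear layer that predicts the
        # group-convolution kernels from the question feature.
        k_params = q_groups * self.gout * self.gin * ksize * ksize
        self.kernel_fc = nn.Linear(q_dim, k_params)

    def forward(self, v, q):
        # v: (B, in_ch, H, W) visual feature map; q: (B, q_dim) question feature.
        B, _, H, W = v.shape
        split = self.gin * (self.groups - self.q_groups)
        v_static, v_dyn = v[:, :split], v[:, split:]
        out_static = self.static_conv(v_static)
        # Reshape the predicted weights into per-sample group-conv kernels.
        w = self.kernel_fc(q).view(
            B * self.q_groups * self.gout, self.gin, self.ksize, self.ksize)
        # Fold the batch into the group dimension so each sample is
        # convolved with its own question-generated kernels in one call.
        v_dyn = v_dyn.reshape(1, -1, H, W)
        out_dyn = F.conv2d(v_dyn, w, padding=self.ksize // 2,
                           groups=B * self.q_groups)
        out_dyn = out_dyn.view(B, -1, H, W)
        return torch.cat([out_static, out_dyn], dim=1)
```

Folding the batch into the group dimension of `F.conv2d` is a standard trick for applying per-sample dynamic kernels in a single call. The question-dependent groups are what couple the visual convolution to the question at an early stage, while the question-independent groups keep the size of the kernel-predicting linear layer, and hence the risk of over-fitting, in check.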

Related research

10/17/2021
Towards Language-guided Visual Recognition via Dynamic Convolutions
In this paper, we are committed to establishing a unified and end-to-en...

04/17/2019
Question Guided Modular Routing Networks for Visual Question Answering
Visual Question Answering (VQA) faces two major challenges: how to bette...

08/10/2017
Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering
Visual question answering (VQA) is challenging because it requires a sim...

11/11/2022
MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering
There is a key problem in the medical visual question answering task tha...

04/06/2018
Question Type Guided Attention in Visual Question Answering
Visual Question Answering (VQA) requires integration of feature maps wit...

02/22/2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Visual question answering (VQA) has witnessed great progress since May, ...

10/05/2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
Humans explain inter-object relationships with semantic labels that demo...
