Semantic-aware Modular Capsule Routing for Visual Question Answering

07/21/2022
by Yudong Han, et al.
Visual Question Answering (VQA) is fundamentally compositional in nature, and many questions can be answered simply by decomposing them into modular sub-problems. The recently proposed Neural Module Network (NMN) applies this strategy to question answering, but it relies heavily on an off-the-shelf layout parser or an additional expert policy for network architecture design rather than learning the architecture from data. As a result, such models adapt poorly to semantically complex variation in the inputs, which limits their representational capacity and generalizability. To tackle this problem, we propose a Semantic-aware modUlar caPsulE Routing framework, termed SUPER, to better capture instance-specific vision-semantic characteristics and refine the discriminative representations for prediction. Specifically, five powerful specialized modules and dynamic routers are tailored in each layer of the SUPER network, and compact routing spaces are constructed so that a variety of customizable routes can be fully exploited and the vision-semantic representations can be explicitly calibrated. We demonstrate the effectiveness and generalization ability of the proposed SUPER scheme, as well as its parameter efficiency, through comparisons on five benchmark datasets. It is worth emphasizing that this work does not aim to pursue state-of-the-art results in VQA. Instead, we expect our model to provide a novel perspective on architecture learning and representation calibration for VQA.
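
To make the idea of a layer built from several modules plus a dynamic router more concrete, the following is a minimal sketch and not the SUPER implementation. The abstract names neither the five modules nor the routing rule, so everything here is an assumption for illustration: the modules are placeholder random linear maps with a tanh, the router is a softmax over a linear score of the input, and the names `RoutedLayer`, `route`, and `forward` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(dim_in, dim_out, rng):
    """Random weight matrix stored as a list of rows (placeholder parameters)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]


def apply_linear(weights, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class RoutedLayer:
    """One layer holding several modules and a router (illustrative only).

    Assumption: soft routing, i.e. the output is a weighted sum of all
    module outputs, with weights computed from the input itself.
    """

    def __init__(self, dim, num_modules=5, seed=0):
        rng = random.Random(seed)
        # Placeholder modules; the paper's specialized modules are not modeled.
        self.modules = [make_linear(dim, dim, rng) for _ in range(num_modules)]
        # The router maps the input to one score per module.
        self.router = make_linear(dim, num_modules, rng)

    def route(self, x):
        """Instance-specific routing weights that sum to 1."""
        return softmax(apply_linear(self.router, x))

    def forward(self, x):
        weights = self.route(x)
        outputs = [[math.tanh(v) for v in apply_linear(m, x)] for m in self.modules]
        # Combine module outputs according to the routing weights.
        return [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))
        ]


layer = RoutedLayer(dim=4)
x = [0.5, -1.0, 0.25, 2.0]
weights = layer.route(x)
y = layer.forward(x)
```

Because the routing weights depend on the input, two different inputs generally take differently weighted paths through the same layer, which is the sense of "instance-specific" used in this sketch. Stacking several such layers would give a space of routes. How SUPER actually constructs and constrains that space is described in the full paper, not here.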

research
04/17/2019

Question Guided Modular Routing Networks for Visual Question Answering

Visual Question Answering (VQA) faces two major challenges: how to bette...
research
09/19/2019

Learning Sparse Mixture of Experts for Visual Question Answering

There has been a rapid progress in the task of Visual Question Answering...
research
11/09/2015

Neural Module Networks

Visual question answering is fundamentally compositional in nature---a q...
research
04/29/2019

Routing Networks and the Challenges of Modular and Compositional Computation

Compositionality is a key strategy for addressing combinatorial complexi...
research
06/19/2018

Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Visual Question answering is a challenging problem requiring a combinati...
research
06/15/2021

How Modular Should Neural Module Networks Be for Systematic Generalization?

Neural Module Networks (NMNs) aim at Visual Question Answering (VQA) via...
