
Learning Sparse Mixture of Experts for Visual Question Answering

by Vardaan Pahuja, et al.

There has been rapid progress in the task of Visual Question Answering (VQA), driven by improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of VQA. A Convolutional Neural Network (CNN) is an integral part of the visual processing pipeline of a VQA model (assuming the CNN is trained along with the entire VQA model). In this project, we propose an efficient and modular neural architecture for the VQA task, with a focus on the CNN module. Our experiments demonstrate that a sparsely activated CNN-based VQA model achieves performance comparable to that of a standard CNN-based VQA model architecture.
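
To make the idea of sparse activation concrete, the following is a minimal sketch of a generic top-k gated mixture-of-experts layer in NumPy. It is not the authors' architecture: the abstract does not specify the gating mechanism, the number of experts, or how experts are placed inside the CNN. The function names `top_k_gate` and `sparse_moe_layer`, the use of plain linear maps as experts (roughly what a 1x1 convolution does at a single spatial position), and the choice of `k=2` are all assumptions made for illustration.

```python
import numpy as np


def top_k_gate(logits, k):
    """Keep the k largest gate logits per row, softmax over them, zero the rest."""
    top_idx = np.argsort(logits, axis=-1)[:, -k:]
    mask = np.zeros_like(logits, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=-1)
    # Logits outside the top k become -inf, so they get exactly zero weight.
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


def sparse_moe_layer(x, gate_w, expert_ws, k=2):
    """Hypothetical sparse MoE layer (illustrative, not the paper's design).

    x:         (batch, d_in) input features
    gate_w:    (d_in, n_experts) gating weights
    expert_ws: (n_experts, d_in, d_out) one linear expert per slice
    """
    gates = top_k_gate(x @ gate_w, k)
    out = np.zeros((x.shape[0], expert_ws.shape[2]))
    for e in range(expert_ws.shape[0]):
        # Only the inputs routed to expert e are processed by it.
        rows = np.nonzero(gates[:, e])[0]
        if rows.size == 0:
            continue
        out[rows] += gates[rows, e:e + 1] * (x[rows] @ expert_ws[e])
    return out, gates
```

In this sketch each input is processed by only `k` of the experts, so the per-example compute scales with `k` rather than with the total number of experts. That is the general mechanism by which a sparsely activated model can be cheaper to run than a dense one of similar capacity.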



