Towards Efficient and Elastic Visual Question Answering with Doubly Slimmable Transformer

03/24/2022
by   Zhou Yu, et al.

Transformer-based approaches have shown great success in visual question answering (VQA). However, they usually require deep and wide models to guarantee good performance, making them difficult to deploy on capacity-restricted platforms. It is a challenging yet valuable task to design an elastic VQA model that supports adaptive pruning at runtime to meet the efficiency constraints of diverse platforms. In this paper, we present the Doubly Slimmable Transformer (DST), a general framework that can be seamlessly integrated into arbitrary Transformer-based VQA models to train a single model once and obtain various slimmed submodels of different widths and depths. Taking two typical Transformer-based VQA approaches, i.e., MCAN and UNITER, as the reference models, the obtained slimmable MCAN_DST and UNITER_DST models outperform the state-of-the-art methods trained independently on two benchmark datasets. In particular, one slimmed MCAN_DST submodel achieves comparable accuracy on VQA-v2, while being 0.38x smaller in model size and having 0.27x fewer FLOPs than the reference MCAN model. The smallest MCAN_DST submodel has 9M parameters and 0.16G FLOPs in the inference stage, making it feasible to deploy on edge devices.
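
The idea of obtaining submodels of different widths and depths from one trained model can be illustrated with a toy weight-sharing sketch. The code below is not the DST implementation; the class names, the rule of keeping the leading rows and columns of each weight matrix, and the rule of keeping the first few layers are all assumptions made for illustration. It only shows how one set of shared parameters could serve several submodels of different sizes.

```python
# Hypothetical sketch: names and slicing rules are illustrative assumptions,
# not the DST method described in the paper.
import random


class SlimmableLinear:
    """Linear layer whose submodels reuse the leading rows/columns of one shared weight matrix."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        # weight[o][i] connects input unit i to output unit o.
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def forward(self, x, width_ratio=1.0):
        in_active = len(x)  # the input is already sliced by the caller
        out_active = max(1, int(self.out_dim * width_ratio))
        return [
            sum(w * v for w, v in zip(self.weight[o][:in_active], x)) + self.bias[o]
            for o in range(out_active)
        ]


class SlimmableStack:
    """Stack of slimmable layers that can be slimmed in width and in depth."""

    def __init__(self, dim, num_layers):
        self.dim = dim
        self.layers = [SlimmableLinear(dim, dim, seed=i) for i in range(num_layers)]

    def forward(self, x, width_ratio=1.0, depth=None):
        depth = len(self.layers) if depth is None else depth
        active = max(1, int(self.dim * width_ratio))
        h = x[:active]                      # width slimming: keep leading units
        for layer in self.layers[:depth]:   # depth slimming: keep first layers
            h = [max(0.0, v) for v in layer.forward(h, width_ratio)]  # ReLU
        return h

    def num_active_params(self, width_ratio=1.0, depth=None):
        depth = len(self.layers) if depth is None else depth
        active = max(1, int(self.dim * width_ratio))
        return depth * (active * active + active)  # weights + biases per layer


if __name__ == "__main__":
    model = SlimmableStack(dim=8, num_layers=4)
    x = [1.0] * 8
    print(len(model.forward(x)), model.num_active_params())
    print(len(model.forward(x, width_ratio=0.5, depth=2)),
          model.num_active_params(width_ratio=0.5, depth=2))
```

In this toy setting, halving both the width and the depth leaves a small fraction of the active parameters, since the per-layer weight count scales with the square of the width. Training such shared weights so that every submodel performs well is the hard part, and that is what the paper addresses.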
