Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks

04/16/2022
by Gen Luo, et al.

Despite its exciting performance, the Transformer is criticized for its excessive parameter count and computation cost. Compressing the Transformer, however, remains an open problem due to the internal complexity of its layer designs, i.e., Multi-Head Attention (MHA) and the Feed-Forward Network (FFN). To address this issue, we introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer. LW-Transformer applies Group-wise Transformation to reduce both the parameters and the computations of the Transformer, while preserving its two main properties, i.e., the efficient attention modeling on diverse subspaces in MHA and the expanding-scaling feature transformation in FFN. We apply LW-Transformer to a set of Transformer-based networks and quantitatively evaluate them on three vision-and-language tasks and six benchmark datasets. Experimental results show that, while saving a large number of parameters and computations, LW-Transformer achieves highly competitive performance against the original Transformer networks on vision-and-language tasks. To examine its generalization ability, we also apply our optimization strategy to the recently proposed image Transformer, Swin-Transformer, for image classification, where its effectiveness is likewise confirmed.
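To make the core idea concrete, here is a minimal PyTorch sketch of a group-wise projection of the kind the abstract describes: the channel dimension is split into G groups, each transformed by its own smaller linear layer, cutting a dense d x d projection from d^2 to d^2/G parameters while the FFN keeps its expand-then-scale shape. The class names (GroupLinear, GroupFFN), the GELU activation, and the expansion factor of 4 are our own illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class GroupLinear(nn.Module):
    """Group-wise linear projection: split the feature dimension into
    `groups` chunks and transform each with its own smaller linear layer.
    A dense d x d projection costs d^2 weights; this costs d^2 / groups."""

    def __init__(self, dim: int, groups: int):
        super().__init__()
        assert dim % groups == 0, "dim must be divisible by groups"
        self.groups = groups
        self.projs = nn.ModuleList(
            nn.Linear(dim // groups, dim // groups) for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -> split along the channel axis
        chunks = x.chunk(self.groups, dim=-1)
        # transform each group independently, then re-concatenate
        return torch.cat([p(c) for p, c in zip(self.projs, chunks)], dim=-1)


class GroupFFN(nn.Module):
    """FFN that keeps the expand-then-scale shape (dim -> 4*dim -> dim)
    while replacing both dense projections with group-wise ones."""

    def __init__(self, dim: int, groups: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        assert dim % groups == 0 and hidden % groups == 0
        self.groups = groups
        self.up = nn.ModuleList(
            nn.Linear(dim // groups, hidden // groups) for _ in range(groups)
        )
        self.down = nn.ModuleList(
            nn.Linear(hidden // groups, dim // groups) for _ in range(groups)
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(self.groups, dim=-1)
        hidden = [self.act(u(c)) for u, c in zip(self.up, chunks)]
        return torch.cat([d(h) for d, h in zip(self.down, hidden)], dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)
    print(GroupLinear(512, groups=4)(x).shape)  # torch.Size([2, 16, 512])
    print(GroupFFN(512, groups=4)(x).shape)     # torch.Size([2, 16, 512])
```

Applying the same substitution to the query, key, value, and output projections inside MHA yields analogous savings there; a faithful LW-Transformer layer would follow the exact grouping design described in the full paper.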


Related research

- Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling (10/07/2021): While Transformer-based models have shown impressive language modeling p...
- Transformer on a Diet (02/14/2020): Transformer has been widely used thanks to its ability to capture sequen...
- TerViT: An Efficient Ternary Vision Transformer (01/20/2022): Vision transformers (ViTs) have demonstrated great potential in various ...
- MoCoViT: Mobile Convolutional Vision Transformer (05/25/2022): Recently, Transformer networks have achieved impressive results on a var...
- Incorporating Convolution Designs into Visual Transformers (03/22/2021): Motivated by the success of Transformers in natural language processing ...
- One Wide Feedforward is All You Need (09/04/2023): The Transformer architecture has two main non-embedding components: Atte...
- Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition (04/11/2023): In recent years, a great deal of attention has been paid to the Transfor...
