DeLighT: Very Deep and Light-weight Transformer

08/03/2020
by Sachin Mehta, et al.

We introduce DeLighT, a very deep and light-weight transformer that delivers performance similar to or better than transformer-based models with significantly fewer parameters. DeLighT allocates parameters more efficiently both (1) within each block, using DExTra, a deep and light-weight transformation, and (2) across blocks, using block-wise scaling, which allows shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models, yet have fewer parameters and operations. Experiments on machine translation and language modeling tasks show that DeLighT matches the performance of baseline Transformers with significantly fewer parameters. On the high-resource WMT'14 En-Fr dataset, DeLighT requires 1.8 times fewer parameters and 2 times fewer operations while achieving better performance (+0.4 BLEU) than baseline transformers. On the low-resource WMT'16 En-Ro dataset, DeLighT delivers similar performance with 2.8 times fewer parameters than baseline transformers.
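The block-wise scaling idea above can be sketched as a schedule that assigns a small depth to blocks near the input and a larger depth to blocks near the output. The linear interpolation and the function name below are illustrative assumptions, not the paper's exact formula:

```python
def blockwise_depths(num_blocks: int, d_min: int, d_max: int) -> list[int]:
    """Hypothetical sketch of block-wise scaling: linearly interpolate
    per-block depth from d_min (near the input) to d_max (near the output).
    The exact schedule used by DeLighT may differ."""
    if num_blocks == 1:
        return [d_max]
    return [
        round(d_min + (d_max - d_min) * b / (num_blocks - 1))
        for b in range(num_blocks)
    ]

# Example: 4 blocks scaling from depth 4 to depth 8
print(blockwise_depths(4, 4, 8))  # -> [4, 5, 7, 8]
```

Because deeper transformations are concentrated where representations are richest (near the output), the total parameter count stays low even though the deepest blocks are several times deeper than a standard transformer layer.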

