GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures

06/10/2021
by Ivan Chelombiev, et al.

Attention-based language models have become a critical component in state-of-the-art natural language processing systems. However, these models have significant computational requirements, due to long training times, dense operations, and large parameter counts. In this work we demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture. First, we add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions. Second, we rely on grouped transformations to reduce the computational cost of dense feed-forward layers and convolutions, while preserving the expressivity of the model. We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales. We further highlight its improved efficiency, both in terms of floating-point operations (FLOPs) and time-to-train.
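The two structural changes described in the abstract, a convolution module alongside self-attention and grouped transformations in the dense blocks, can be illustrated with a short sketch. The following PyTorch code is a minimal, hypothetical rendering: the module names, group count, kernel size, sublayer ordering, and dimensions are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of a GroupBERT-style encoder layer (PyTorch).
# All hyperparameters and the sublayer ordering are assumptions.
import torch
import torch.nn as nn


class GroupedFFN(nn.Module):
    """Feed-forward block whose dense projections use grouped 1x1
    convolutions; groups > 1 splits channels into independent slices,
    cutting FLOPs roughly by the group count."""
    def __init__(self, d_model: int, d_ff: int, groups: int = 4):
        super().__init__()
        self.up = nn.Conv1d(d_model, d_ff, kernel_size=1, groups=groups)
        self.act = nn.GELU()
        self.down = nn.Conv1d(d_ff, d_model, kernel_size=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects channels first
        y = x.transpose(1, 2)
        y = self.down(self.act(self.up(y)))
        return y.transpose(1, 2)


class LocalConvModule(nn.Module):
    """Depthwise (fully grouped) convolution modeling local token
    interactions, complementing global self-attention."""
    def __init__(self, d_model: int, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x.transpose(1, 2)).transpose(1, 2)


class GroupBERTLayer(nn.Module):
    """One encoder layer: self-attention for global context, a
    convolution module for local context, each with a grouped FFN."""
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, groups=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = LocalConvModule(d_model)
        self.ffn1 = GroupedFFN(d_model, d_ff, groups)
        self.ffn2 = GroupedFFN(d_model, d_ff, groups)
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual blocks (ordering is an assumption)
        h = self.norms[0](x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn1(self.norms[1](x))
        x = x + self.conv(self.norms[2](x))
        x = x + self.ffn2(self.norms[3](x))
        return x


if __name__ == "__main__":
    layer = GroupBERTLayer()
    tokens = torch.randn(2, 128, 768)   # (batch, seq_len, d_model)
    print(layer(tokens).shape)          # torch.Size([2, 128, 768])
```

The key design choice the sketch captures is the division of labor: the depthwise convolution handles short-range dependencies cheaply, the attention module is reserved for long-range interactions, and grouping the feed-forward projections trades a small amount of cross-channel mixing for a proportional reduction in dense-layer FLOPs.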


Related research

12/15/2022  Efficient Long Sequence Modeling via State Space Augmented Transformer
Transformer models have achieved superior performance in various natural...

02/01/2023  Feed-Forward Blocks Control Contextualization in Masked Language Models
Understanding the inner workings of neural network models is a crucial s...

05/22/2023  Parallel Attention and Feed-Forward Net Design for Pre-training and Inference on Transformers
In this paper, we introduce Parallel Attention and Feed-Forward Net Desi...

06/15/2021  PairConnect: A Compute-Efficient MLP Alternative to Attention
Transformer models have demonstrated superior performance in natural lan...

05/29/2023  Brainformers: Trading Simplicity for Efficiency
Transformers are central to recent successes in natural language process...

06/02/2022  Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
The recently proposed Conformer model has become the de facto backbone m...
