Parallel Attention and Feed-Forward Net Design for Pre-training and Inference on Transformers

05/22/2023
by Shashank Sonkar, et al.

In this paper, we introduce the Parallel Attention and Feed-Forward Net Design (PAF) for transformer models. Transformer models are the backbone of virtually all modern Natural Language Processing applications, so any effort to improve their efficiency can have an enormous impact. Transformer models consist of many layers, and each layer has an attention block followed by a feed-forward network (FFN) that processes the input based on the attention block's output. We refer to this standard design as the Series Attention and Feed-Forward Net Design (SAF). In our proposed PAF design, we make the FFN block's computations in each layer independent of the output of that layer's attention block. This decoupling allows the FFN block of each layer to run in parallel with the attention block of the same layer. We evaluate the PAF design by training two large language models (RoBERTa-large and bert-large-uncased) and comparing them to their SAF counterparts on six tasks of the General Language Understanding Evaluation (GLUE) benchmark, which test a multitude of semantic attributes. PAF models achieve nearly identical performance to their SAF counterparts on all six tasks. We also compare the time complexities of attention blocks and FFN blocks and find that running both blocks in parallel can, both theoretically and in practice, yield up to 1.5x to 2x gains in speed. We leave the development of fast and efficient libraries for implementing the PAF design to future work.
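The contrast between the two designs can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it uses pre-LayerNorm residual layers and the standard nn.MultiheadAttention module (RoBERTa/BERT use post-LayerNorm), and the layer names are hypothetical. The point is only that in SAF the FFN consumes the attention output, while in PAF both branches read the same layer input and can, in principle, execute concurrently.

```python
import torch
import torch.nn as nn


class SAFLayer(nn.Module):
    """Standard (series) layer: the FFN reads the attention block's output."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)
        x = x + a                         # attention must finish first ...
        return x + self.ffn(self.ln2(x))  # ... before the FFN can start


class PAFLayer(nn.Module):
    """Parallel layer: the FFN reads the layer input, not the attention output."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln(x)
        a, _ = self.attn(h, h, h)   # depends only on h ...
        f = self.ffn(h)             # ... as does the FFN, so both branches
        return x + a + f            # could be scheduled in parallel


if __name__ == "__main__":
    x = torch.randn(2, 16, 768)              # (batch, seq_len, d_model)
    print(SAFLayer(768, 12, 3072)(x).shape)  # torch.Size([2, 16, 768])
    print(PAFLayer(768, 12, 3072)(x).shape)  # torch.Size([2, 16, 768])
```

Whether the two branches actually overlap at runtime depends on how they are scheduled (for example, separate CUDA streams or a fused kernel); that is the library work the abstract defers to the future.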

Related Research

- Feed-Forward Blocks Control Contextualization in Masked Language Models (02/01/2023)
  Understanding the inner workings of neural network models is a crucial s...

- Brainformers: Trading Simplicity for Efficiency (05/29/2023)
  Transformers are central to recent successes in natural language process...

- Pay Attention when Required (09/09/2020)
  Transformer-based models consist of interleaved feed-forward blocks - th...

- Pruning a BERT-based Question Answering Model (10/14/2019)
  We investigate compressing a BERT-based question answering system by pru...

- Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model (05/23/2023)
  Large and sparse feed-forward networks (S-FFN) such as Mixture-of-Expert...

- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures (06/10/2021)
  Attention based language models have become a critical component in stat...

- Of Non-Linearity and Commutativity in BERT (01/12/2021)
  In this work we provide new insights into the transformer architecture, ...
