Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models

10/07/2020
by Amrit Nagarajan, et al.

Transformer models have garnered a lot of interest in recent years by delivering state-of-the-art performance in a range of Natural Language Processing (NLP) tasks. However, these models can have over a hundred billion parameters, resulting in very high computational and memory requirements. We address this challenge through Approximate Computing, specifically targeting the use of Transformers in NLP tasks. Transformers are typically pre-trained and subsequently specialized for specific tasks through transfer learning. Based on the observation that pre-trained Transformers are often over-parameterized for several downstream NLP tasks, we propose a framework to create smaller, faster and, in some cases, more accurate models. The cornerstones of the framework are a Significance Analysis (SA) method that identifies components of a pre-trained Transformer that are less significant for a given task, and techniques to approximate those components. Our approximations include pruning of blocks, attention heads and weight groups, quantization of less significant weights, and a low-complexity sign-matching based attention mechanism. The framework can be adapted to produce models that are faster, smaller and/or more accurate, depending on the user's constraints. We apply the framework to seven Transformer models, including optimized models such as DistilBERT and Q8BERT, and three downstream tasks. We demonstrate that our framework produces models that are up to 4x faster and up to 14x smaller (with less than 0.5% accuracy degradation), or up to 5.5% more accurate with simultaneous improvements of up to 9.83x in model size or 2.94x in speed.

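The abstract mentions a low-complexity sign-matching based attention mechanism. As a rough illustration of the general idea (an assumed form for this sketch, not the paper's exact formulation), the code below replaces the usual query-key dot product with a score based only on sign agreement between query and key elements; the function name sign_matching_attention and the example shapes are hypothetical.

# Minimal, illustrative sketch of sign-matching attention (assumed form).
# The dot-product score q . k is replaced by the agreement between the
# signs of q and k, avoiding full-precision multiplications in the scores.
import torch

def sign_matching_attention(q, k, v, scale=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    sq = torch.sign(q)                      # elementwise +1 / 0 / -1
    sk = torch.sign(k)
    scores = sq @ sk.transpose(-2, -1)      # sign agreements minus disagreements
    if scale is None:
        scale = q.shape[-1] ** -0.5
    attn = torch.softmax(scores * scale, dim=-1)
    return attn @ v

# Hypothetical usage with one multi-head attention call:
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
out = sign_matching_attention(q, k, v)      # shape (1, 8, 16, 64)

Because signs can be stored as single bits, a score of this kind can in principle be computed with XNOR/popcount-style operations rather than floating-point multiplies, which is where the complexity reduction would come from.
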
Related research

10/06/2021 · BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models
Pre-trained Natural Language Processing (NLP) models can be easily adapt...

10/02/2022 · Wide Attention Is The Way Forward For Transformers
The Transformer is an extremely powerful and prominent deep learning arc...

05/07/2021 · Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality
In human-level NLP tasks, such as predicting mental health, personality,...

03/17/2020 · Rethinking Batch Normalization in Transformers
The standard normalization method for neural network (NN) models used in...

04/09/2021 · Transformers: "The End of History" for NLP?
Recent advances in neural architectures, such as the Transformer, couple...

10/31/2022 · GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Generative Pre-trained Transformer (GPT) models set themselves apart thr...

06/01/2021 · DoT: An efficient Double Transformer for NLP tasks with tables
Transformer-based approaches have been successfully used to obtain state...
