
Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models

by Amrit Nagarajan, et al.

Transformer models have garnered a lot of interest in recent years by delivering state-of-the-art performance in a range of Natural Language Processing (NLP) tasks. However, these models can have over a hundred billion parameters, presenting very high computational and memory requirements. We address this challenge through Approximate Computing, specifically targeting the use of Transformers in NLP tasks. Transformers are typically pre-trained and subsequently specialized for specific tasks through transfer learning. Based on the observation that pre-trained Transformers are often over-parameterized for several downstream NLP tasks, we propose a framework to create smaller, faster and in some cases more accurate models. The cornerstones of the framework are a Significance Analysis (SA) method that identifies components in a pre-trained Transformer that are less significant for a given task, and techniques to approximate the less significant components. Our approximations include pruning of blocks, attention heads and weight groups, quantization of less significant weights and a low-complexity sign-matching based attention mechanism. Our framework can be adapted to produce models that are faster, smaller and/or more accurate, depending on the user's constraints. We apply our framework to seven Transformer models, including optimized models like DistilBERT and Q8BERT, and three downstream tasks. We demonstrate that our framework produces models that are up to 4x faster and up to 14x smaller (with less than 0.5% accuracy degradation), or up to 5.5% more accurate with simultaneous improvements of up to 9.83x in model size or 2.94x in speed.
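To make the sign-matching idea concrete, the sketch below scores each query-key pair by how many of their elements agree in sign, rather than computing a full dot product. This is a hedged illustration of the general technique the abstract names, not the authors' exact formulation; the function name, scaling, and softmax normalization are assumptions for this example.

```python
import numpy as np

def sign_matching_attention(Q, K, V):
    """Approximate attention via sign agreement (illustrative sketch).

    Instead of the usual dot-product score Q @ K.T, each (query, key)
    pair is scored by counting elements whose signs match, which avoids
    multiplications in the scoring step. The paper's actual mechanism
    may differ in details such as scaling and normalization.
    """
    d = Q.shape[-1]
    # Broadcast to shape (num_queries, num_keys, d), count sign matches.
    scores = (np.sign(Q)[:, None, :] == np.sign(K)[None, :, :]).sum(axis=-1)
    scores = scores / np.sqrt(d)  # conventional scaling, assumed here
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example: 4 queries, 5 keys, model dimension 8, value dimension 3.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 3))
out = sign_matching_attention(Q, K, V)
print(out.shape)  # (4, 3)
```

Because the score depends only on element signs, queries and keys could in principle be stored as 1-bit sign vectors, which is consistent with the abstract's goal of smaller, faster models.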
