FastFormers: Highly Efficient Transformer Models for Natural Language Understanding

10/26/2020
by   Young Jin Kim, et al.

Transformer-based models are the state of the art for Natural Language Understanding (NLU) applications. These models keep getting bigger and more accurate across tasks, yet they remain computationally expensive at inference time compared to traditional approaches. In this paper, we present FastFormers, a set of recipes for achieving efficient inference-time performance with Transformer-based models on various NLU tasks. We show how carefully combining knowledge distillation, structured pruning, and numerical optimization can lead to drastic improvements in inference efficiency, and we provide effective recipes that guide practitioners to the best settings for various NLU tasks and pretrained models. Applying the proposed recipes to the SuperGLUE benchmark, we achieve 9.8x to 233.9x speed-up over out-of-the-box models on CPU, and up to 12.4x speed-up on GPU. We show that FastFormers can drastically reduce the cost of serving 100 million requests from 4,223 USD to just 18 USD on an Azure F16s_v2 instance. This translates to a more sustainable runtime, reducing energy consumption by 6.9x to 125.8x according to the metrics used in the SustaiNLP 2020 shared task.
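As a concrete sketch of two of the recipe ingredients named above, the snippet below pairs a temperature-scaled knowledge-distillation loss with post-training dynamic int8 quantization for CPU inference, using PyTorch and the Hugging Face transformers library. This is a minimal illustration under stated assumptions, not the authors' released FastFormers code; the model name, temperature value, and loss helper are illustrative choices.

import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

# Knowledge distillation: KL divergence between temperature-softened
# teacher and student output distributions, scaled by T^2 as in the
# standard soft-target formulation (illustrative helper, not from the paper).
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Numerical optimization: post-training dynamic quantization converts
# all Linear layers to int8 weights, speeding up CPU inference.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased"  # assumed distilled student; not the paper's exact checkpoint
)
model.eval()
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

Dynamic quantization targets CPU serving, which is where the paper reports its largest speed-ups; in the full recipe, structured pruning of attention heads and feed-forward dimensions would be applied to the distilled student before this quantization step.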


Related research

12/13/2019 · WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
Transformer based Very Large Language Models (VLLMs) like BERT, XLNet an...

05/03/2023 · Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Large language models (LLMs) power many state-of-the-art systems in natu...

05/01/2020 · KLEJ: Comprehensive Benchmark for Polish Language Understanding
In recent years, a series of Transformer-based models unlocked major imp...

03/16/2023 · Block-wise Bit-Compression of Transformer-based Models
With the popularity of the recent Transformer-based models represented b...

05/05/2020 · The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Large transformer-based language models have been shown to be very effec...

12/30/2020 · Unnatural Language Inference
Natural Language Understanding has witnessed a watershed moment with the...

04/13/2022 · TangoBERT: Reducing Inference Cost by using Cascaded Architecture
The remarkable success of large transformer-based models such as BERT, R...
