DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

05/02/2020
by Qingqing Cao, et al.

Transformer-based QA models use input-wide self-attention – i.e. across both the question and the input passage – at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer and directly fine-tune it on the target QA dataset. We show that DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.
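To make the decomposition concrete, below is a minimal PyTorch sketch (not the authors' implementation; see the repository linked above for the actual code): the lower layers encode the question and the passage independently, so passage representations can be precomputed and cached offline, while only the upper layers apply full self-attention over the concatenated sequence. The class name `DeFormerSketch`, the 9/3 layer split, and the use of `nn.TransformerEncoderLayer` are illustrative assumptions.

```python
# Minimal sketch of the DeFormer decomposition, assuming plain PyTorch modules.
# Layer split, dimensions, and module choices are illustrative, not the paper's code.
import torch
import torch.nn as nn


def make_layers(n, d_model=768, n_heads=12):
    """Stack of standard encoder layers (self-attention + feed-forward)."""
    return nn.ModuleList(
        [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n)]
    )


class DeFormerSketch(nn.Module):
    """Lower layers attend within the question and the passage separately;
    upper layers apply full self-attention over their concatenation."""

    def __init__(self, n_layers=12, split_at=9, d_model=768):
        super().__init__()
        self.lower = make_layers(split_at, d_model)
        self.upper = make_layers(n_layers - split_at, d_model)

    def encode_passage(self, passage_emb):
        # Question-independent, so it can be run offline and cached per passage.
        for layer in self.lower:
            passage_emb = layer(passage_emb)
        return passage_emb

    def forward(self, question_emb, cached_passage):
        # At query time only the (short) question passes through the lower layers.
        q = question_emb
        for layer in self.lower:
            q = layer(q)
        # Upper layers see the full question+passage sequence, as in the original model.
        x = torch.cat([q, cached_passage], dim=1)
        for layer in self.upper:
            x = layer(x)
        return x  # token representations for a downstream QA span head


# Usage: precompute passage representations once, reuse them for every question.
model = DeFormerSketch()
passage_cache = model.encode_passage(torch.randn(1, 300, 768))   # offline
outputs = model(torch.randn(1, 20, 768), passage_cache)          # per query
```

In this setup the per-query cost of the lower layers scales with the question length alone; the expensive passage encoding is amortized across all questions asked of the same passage, which is the source of the runtime savings described in the abstract.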

