
Pruning a BERT-based Question Answering Model

by J. S. McCarley et al.

We investigate compressing a BERT-based question answering system by pruning parameters from the underlying BERT model. We start from models trained for SQuAD 2.0 and introduce gates that allow selected parts of transformers to be individually eliminated. Specifically, we investigate (1) reducing the number of attention heads in each transformer, (2) reducing the intermediate width of the feed-forward sublayer of each transformer, and (3) reducing the embedding dimension. We compare several approaches for determining the values of these gates. We find that a combination of pruning attention heads and the feed-forward layer almost doubles the decoding speed, with only a 1.5 F1-point loss in accuracy.
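The gating idea described above can be illustrated with a minimal numpy sketch: each attention head's output is scaled by a learned gate value, and heads whose gates fall below a threshold are pruned (zeroed, and eventually removed from the weights). The function name, gate values, and threshold here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gate_attention_heads(head_outputs, gates, threshold=0.5):
    """Scale each head's output by its gate; prune heads below threshold.

    head_outputs: array of shape (num_heads, seq_len, head_dim)
    gates: array of shape (num_heads,), gate values in [0, 1]
    Returns the gated outputs and a boolean mask of surviving heads.
    """
    keep = gates >= threshold                     # heads that survive pruning
    effective = gates * keep                      # pruned heads contribute 0
    gated = head_outputs * effective[:, None, None]  # broadcast over seq, dim
    return gated, keep

# Toy example: 4 heads, sequence length 3, per-head dimension 2.
rng = np.random.default_rng(0)
outputs = rng.standard_normal((4, 3, 2))
gates = np.array([0.9, 0.1, 0.7, 0.05])  # hypothetical learned gate values
gated, keep = gate_attention_heads(outputs, gates)
# keep is [True, False, True, False]; rows 1 and 3 of gated are all zeros,
# so those heads (and their parameters) can be removed at decoding time.
```

In practice the gates are determined by the methods compared in the paper (e.g. trained or estimated per structure), and pruned heads are physically removed from the weight matrices so the speedup is realized at inference.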
