AdapLeR: Speeding up Inference by Adaptive Length Reduction

03/16/2022
by   Ali Modarressi, et al.

Pre-trained language models have shown stellar performance on various downstream tasks, but this usually comes at the cost of high latency and computation, hindering their use in resource-limited settings. In this work, we propose a novel approach for reducing the computational cost of BERT with minimal loss in downstream performance. Our method dynamically eliminates less contributing tokens through layers, resulting in shorter sequence lengths and consequently lower computational cost. To determine the importance of each token representation, we train a Contribution Predictor for each layer using a gradient-based saliency method. Our experiments on several diverse classification tasks show speedups of up to 22x at inference time with little sacrifice in performance. We also validate the quality of the selected tokens in our method using human annotations from the ERASER benchmark. Compared to other widely used strategies for selecting important tokens, such as saliency and attention, our proposed method has a significantly lower false positive rate in generating rationales. Our code is freely available at https://github.com/amodaresi/AdapLeR .
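To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of per-layer adaptive length reduction: after each encoder layer, a small Contribution Predictor scores the token representations and low-scoring tokens are dropped before the next layer. The module names, the threshold value, and the scorer architecture are illustrative assumptions; the paper additionally trains these predictors with supervision from a gradient-based saliency method, which is not shown here.

```python
# Illustrative sketch of adaptive token elimination between transformer layers.
# Names (ContributionPredictor, drop_low_contribution_tokens) and the threshold
# are hypothetical; see the AdapLeR repository for the actual implementation.

import torch
import torch.nn as nn


class ContributionPredictor(nn.Module):
    """Tiny per-layer scorer mapping each token vector to a keep-probability."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.GELU(),
            nn.Linear(hidden_size // 4, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) -> scores: (batch, seq_len)
        return torch.sigmoid(self.scorer(hidden_states)).squeeze(-1)


def drop_low_contribution_tokens(hidden_states, scores, threshold=0.5):
    """Keep [CLS] (position 0) plus every token whose score exceeds the threshold.

    Operates on a single example (batch size 1) to keep the sketch simple; a
    real implementation would instead update an attention/padding mask per batch.
    """
    keep = scores[0] > threshold
    keep[0] = True  # never drop the [CLS] token used for classification
    return hidden_states[:, keep, :]


if __name__ == "__main__":
    hidden = 768
    layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
    predictor = ContributionPredictor(hidden)

    x = torch.randn(1, 128, hidden)   # one sequence of 128 token vectors
    x = layer(x)                      # ordinary transformer layer
    x = drop_low_contribution_tokens(x, predictor(x))
    print(x.shape)                    # sequence is now shorter, e.g. (1, ~64, 768)
```

Because each subsequent layer processes a shorter sequence, the quadratic cost of self-attention shrinks layer by layer, which is where the reported inference speedups come from.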

