Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection

03/27/2022
by Xin Huang, et al.

Transformer-based language models such as BERT achieve state-of-the-art performance on a wide range of NLP tasks, but are computationally prohibitive. A recent line of work uses various heuristics to successively shorten the sequence length while tokens are transformed through the encoder layers, targeting tasks such as classification and ranking that require only a single token embedding for prediction. We present a novel solution to this problem, called Pyramid-BERT, which replaces the previously used heuristics with a core-set based token selection method justified by theoretical results. The core-set based token selection technique avoids expensive pre-training, gives space-efficient fine-tuning, and thereby makes the model suitable for longer sequence lengths. We provide extensive experiments establishing the advantages of Pyramid-BERT over several baselines and existing works on the GLUE benchmark and Long Range Arena datasets.
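The abstract does not spell out the selection procedure itself; one standard way to build a core-set over a layer's token embeddings is greedy k-center (farthest-point) selection, applied between encoder layers so that each successive layer sees a shorter sequence. The sketch below illustrates that idea under explicit assumptions: the function name select_core_set_tokens, the keep_ratio parameter, and the greedy k-center rule are illustrative and are not taken from the paper.

    # A minimal sketch of core-set token selection over one layer's token
    # embeddings, assuming a greedy k-center rule. Names and parameters here
    # are illustrative, not the paper's implementation.
    import torch

    def select_core_set_tokens(hidden_states: torch.Tensor,
                               keep_ratio: float = 0.5) -> torch.Tensor:
        """Keep a core-set of tokens from hidden_states of shape (seq_len, hidden).

        The [CLS] token (index 0) is always kept, since classification and
        ranking heads read a single token embedding. Remaining tokens are added
        greedily, each time picking the token farthest from the already
        selected set, so the kept tokens cover the embedding space.
        """
        seq_len = hidden_states.size(0)
        num_keep = max(1, int(seq_len * keep_ratio))

        selected = [0]  # always keep [CLS]
        # Distance from every token to its nearest selected token so far.
        dist = torch.cdist(hidden_states, hidden_states[selected]).min(dim=1).values

        while len(selected) < num_keep:
            # Farthest-point rule: pick the token that is worst covered.
            next_idx = int(torch.argmax(dist))
            selected.append(next_idx)
            new_dist = torch.cdist(
                hidden_states, hidden_states[next_idx:next_idx + 1]
            ).squeeze(1)
            dist = torch.minimum(dist, new_dist)

        selected = sorted(selected)  # preserve the original token order
        return hidden_states[selected]

    # Usage: shorten the sequence between encoder layers, e.g. keep 50% of tokens.
    # reduced = select_core_set_tokens(layer_output[0], keep_ratio=0.5)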

