Infor-Coef: Information Bottleneck-based Dynamic Token Downsampling for Compact and Efficient language model

05/21/2023
by Wenxi Tan, et al.

The prevalence of Transformer-based pre-trained language models (PLMs) has led to their wide adoption for various natural language processing tasks. However, their excessive overhead leads to high latency and computational costs. Static compression methods allocate the same computation to every sample, resulting in redundant computation, while dynamic token pruning methods selectively shorten the input sequences but cannot change the model size and rarely reach the speedups of static pruning. In this paper, we propose a model acceleration approach for large language models that combines dynamic token downsampling with static pruning, optimized by an information bottleneck loss. Our model, Infor-Coef, achieves an 18x FLOPs speedup with an accuracy degradation of less than 8% compared to BERT. This work provides a promising approach to compress and accelerate Transformer-based models for NLP tasks.
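To make the combination concrete, the sketch below shows one way a dynamic token downsampling layer with an information-bottleneck-style penalty could look in PyTorch. The `TokenDownsampler` module, its linear scorer, the `keep_ratio` parameter, and the `beta`-weighted compression term are illustrative assumptions for exposition, not the authors' exact Infor-Coef formulation.

```python
# Minimal sketch: score tokens, keep a per-sample subset, and trade task fit
# against a compression penalty in the spirit of the information bottleneck.
# All module and parameter names here are assumptions, not the paper's API.
import torch
import torch.nn as nn


class TokenDownsampler(nn.Module):
    """Scores tokens and keeps only the top fraction for later layers."""

    def __init__(self, hidden_size: int, keep_ratio: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # per-token relevance score
        self.keep_ratio = keep_ratio

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size)
        scores = self.scorer(hidden_states).squeeze(-1)        # (batch, seq_len)
        keep_probs = torch.sigmoid(scores)                     # soft "keep" probabilities
        k = max(1, int(hidden_states.size(1) * self.keep_ratio))
        topk = keep_probs.topk(k, dim=-1).indices              # dynamic per-sample selection
        topk, _ = topk.sort(dim=-1)                            # preserve original token order
        idx = topk.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        pruned = hidden_states.gather(1, idx)                  # shortened sequence (batch, k, hidden)
        return pruned, keep_probs


def ib_style_loss(task_loss: torch.Tensor, keep_probs: torch.Tensor, beta: float = 0.1):
    # Information-bottleneck-style trade-off: fit the task while compressing the
    # representation, approximated here by the expected fraction of kept tokens.
    compression = keep_probs.mean()
    return task_loss + beta * compression


if __name__ == "__main__":
    x = torch.randn(2, 128, 768)                  # e.g. BERT-base hidden states
    downsampler = TokenDownsampler(768, keep_ratio=0.25)
    shorter, probs = downsampler(x)               # shorter: (2, 32, 768)
    loss = ib_style_loss(torch.tensor(0.7), probs)
    print(shorter.shape, loss.item())
```

In a full pipeline, the compression term would stand in for the mutual-information penalty of the IB objective, and the downsampling would sit alongside statically pruned (smaller) Transformer layers so that both the sequence length and the model size shrink; the fixed `keep_ratio` used above is a simplification.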

Related research

03/30/2022 - TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
Pre-trained language models have prevailed in natural language proc...

10/30/2021 - Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Pre-training and then fine-tuning large language models is commonly used...

10/28/2021 - Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures
Recent years have seen a growing adoption of Transformer models such as...

06/05/2023 - NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Finetuning large language models inflates the costs of NLU applications...

06/28/2023 - An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs
In recent years, Transformer-based language models have become the stand...

02/10/2023 - Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks
We propose a novel gradient-based attack against transformer-based langu...

05/24/2023 - SmartTrim: Adaptive Tokens and Parameters Pruning for Efficient Vision-Language Models
Despite achieving remarkable performance on various vision-language task...
