Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

04/20/2018
by Liyuan Liu, et al.

Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (PTLMs), bringing significant improvements to various applications. To fully leverage nearly unlimited corpora and capture linguistic information at multiple levels, large LMs are required; yet for a specific task, only part of this information is useful. Such large models, even at the inference stage, incur heavy computational workloads, making them too time-consuming for real-world applications. For a specific task, we aim to retain the useful information while compressing the bulky PTLM. Since layers of different depths capture different information, we conduct the compression via layer selection. By introducing dense connectivity, we can detach any layer without eliminating the others, and stretch shallow and wide LMs into deep and narrow ones. Moreover, the PTLMs are trained with layer-wise dropout for better robustness and are pruned with a sparsity-inducing regularization customized for our goal. Experiments on benchmark datasets demonstrate the effectiveness of the proposed method.
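To make the layer-selection idea concrete, here is a minimal sketch (not the authors' released code) of a densely connected encoder whose layers are scaled by scalar gates and pruned with an L1 penalty. Simple feed-forward layers stand in for the LM's actual layers, and all module and parameter names (DenselyConnectedEncoder, gates, sparsity_penalty) are hypothetical illustrations.

```python
# Sketch: dense connectivity + per-layer gates + L1 sparsity, assuming
# feed-forward stand-ins for the pre-trained LM's layers (hypothetical names).
import torch
import torch.nn as nn

class DenselyConnectedEncoder(nn.Module):
    """Each layer reads the concatenation of the input and all earlier layer
    outputs (dense connectivity), so any single layer can be zeroed out or
    detached without starving the layers that follow it."""

    def __init__(self, input_dim, layer_dim, num_layers, layer_dropout=0.1):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            in_dim = input_dim + i * layer_dim  # input + all previous outputs
            self.layers.append(nn.Sequential(nn.Linear(in_dim, layer_dim),
                                             nn.ReLU()))
        # One scalar gate per layer; an L1 penalty on these drives layer selection.
        self.gates = nn.Parameter(torch.ones(num_layers))
        self.layer_dropout = layer_dropout

    def forward(self, x):
        feats, outputs = [x], []
        for i, layer in enumerate(self.layers):
            h = self.gates[i] * layer(torch.cat(feats, dim=-1))
            # Layer-wise dropout: occasionally zero an entire layer's output
            # during training so the model stays robust to missing layers.
            if self.training and torch.rand(()) < self.layer_dropout:
                h = torch.zeros_like(h)
            feats.append(h)
            outputs.append(h)
        return torch.cat(outputs, dim=-1)

    def sparsity_penalty(self):
        # L1 regularizer pushing layer gates toward zero.
        return self.gates.abs().sum()
```

In training, the penalty is simply added to the task objective, e.g. loss = task_loss + lambda_l1 * encoder.sparsity_penalty(); layers whose gates shrink toward zero contribute nothing and can be skipped at inference (their outputs replaced by zeros), which dense connectivity makes safe.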


Related research

11/10/2022  LERT: A Linguistically-motivated Pre-trained Language Model
Pre-trained Language Model (PLM) has become a representative foundation ...

02/10/2021  Customizing Contextualized Language Models for Legal Document Reviews
Inspired by the inductive transfer learning on computer vision, many eff...

08/30/2021  Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
Large-scale pre-trained language models have contributed significantly t...

12/14/2021  From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression
Pre-trained Language Models (PLMs) have achieved great success in variou...

01/24/2020  Compressing Language Models using Doped Kronecker Products
Kronecker Products (KP) have been used to compress IoT RNN Applications ...

01/11/2020  A Continuous Space Neural Language Model for Bengali Language
Language models are generally employed to estimate the probability distr...

03/07/2023  Gradient-Free Structured Pruning with Unlabeled Data
Large Language Models (LLMs) have achieved great success in solving diff...
