HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression

10/16/2021
by   Chenhe Dong, et al.

On many natural language processing (NLP) tasks, large pre-trained language models (PLMs) have achieved overwhelming performance compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered deployment on resource-limited devices in practice. In this paper, we aim to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. To dynamically select the most representative prototypes for each domain, we further propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.
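To make the two ideas in the abstract more concrete, here is a minimal, illustrative sketch (not the authors' released code) of how a domain-relational graph could reweight per-domain distillation losses and how a compare-aggregate step could score token states against a domain prototype. All names (DomainRelationalGraph, compare_aggregate, hidden_dim, and the toy data) are hypothetical; the actual implementation is at https://github.com/cheneydon/hrkd.

```python
# Hypothetical sketch of the abstract's two components; not the official HRKD code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainRelationalGraph(nn.Module):
    """Weights each domain's distillation loss by its relation to the other domains."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, domain_prototypes: torch.Tensor) -> torch.Tensor:
        # domain_prototypes: (num_domains, hidden_dim), one prototype per domain.
        q, k = self.query(domain_prototypes), self.key(domain_prototypes)
        # Edge weights of a fully-connected domain graph via scaled dot-product attention.
        edges = torch.softmax(q @ k.t() / q.size(-1) ** 0.5, dim=-1)
        # Domains strongly related to the others receive larger loss weights.
        return edges.mean(dim=0)  # (num_domains,)


def compare_aggregate(token_states: torch.Tensor, prototype: torch.Tensor) -> torch.Tensor:
    """Compare token states against a domain prototype, then aggregate the best-matching ones."""
    # token_states: (batch, seq_len, hidden_dim); prototype: (hidden_dim,)
    scores = torch.softmax(token_states @ prototype, dim=-1)   # (batch, seq_len)
    return (scores.unsqueeze(-1) * token_states).sum(dim=1)    # (batch, hidden_dim)


if __name__ == "__main__":
    num_domains, hidden_dim = 4, 16
    graph = DomainRelationalGraph(hidden_dim)

    # Toy teacher/student hidden states for each domain (random stand-ins).
    teacher = [torch.randn(8, 32, hidden_dim) for _ in range(num_domains)]
    student = [torch.randn(8, 32, hidden_dim) for _ in range(num_domains)]
    prototypes = torch.stack([t.mean(dim=(0, 1)) for t in teacher])  # crude per-domain prototypes

    domain_weights = graph(prototypes)
    # Per-domain distillation loss, reweighted by the relational graph.
    losses = [
        F.mse_loss(compare_aggregate(s, prototypes[d]), compare_aggregate(t, prototypes[d]))
        for d, (s, t) in enumerate(zip(student, teacher))
    ]
    total_loss = sum(w * l for w, l in zip(domain_weights, losses))
    print(f"total distillation loss: {total_loss.item():.4f}")
```

In this sketch the domain weights and prototype-based aggregation are learned jointly with the ordinary distillation objective; the paper's hierarchical mechanism and meta-learning setup are richer than shown here.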


Related research

12/02/2020 | Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
Pre-trained language models have been applied to various NLP tasks with ...

06/11/2023 | GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Currently, the reduction in the parameter scale of large-scale pre-train...

09/13/2021 | KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation
The development of over-parameterized pre-trained language models has ma...

09/15/2021 | EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
Pre-trained language models have shown remarkable results on various NLP...

10/20/2021 | Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach
The remarkable performance of the pre-trained language model (LM) using ...

07/17/2023 | Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain
Engineering knowledge-based (or expert) systems require extensive manual...

05/11/2023 | Domain Incremental Lifelong Learning in an Open World
Lifelong learning (LL) is an important ability for NLP models to learn n...
