From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models

10/11/2022
by   Lei Li, et al.
14

Investigating better ways to reuse the released pre-trained language models (PLMs) can significantly reduce the computational cost and the potential environmental side-effects. This paper explores a novel PLM reuse paradigm, Knowledge Integration (KI). Without human annotations available, KI aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model. To achieve this, we first derive the correlation between virtual golden supervision and teacher predictions. We then design a Model Uncertainty–aware Knowledge Integration (MUKI) framework to recover the golden supervision for the student. Specifically, MUKI adopts Monte-Carlo Dropout to estimate model uncertainty for the supervision integration. An instance-wise re-weighting mechanism based on the margin of uncertainty scores is further incorporated, to deal with the potential conflicting supervision from teachers. Experimental results demonstrate that MUKI achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKI can generalize well for merging teacher models with heterogeneous architectures, and even teachers major in cross-lingual datasets.

READ FULL TEXT
research
12/14/2021

Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models

As many fine-tuned pre-trained language models (PLMs) with promising per...
research
06/05/2021

MergeDistill: Merging Pre-trained Language Models using Distillation

Pre-trained multilingual language models (LMs) have achieved state-of-th...
research
08/20/2019

Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation

A massive number of well-trained deep networks have been released by dev...
research
08/05/2021

MS-KD: Multi-Organ Segmentation with Multiple Binary-Labeled Datasets

Annotating multiple organs in 3D medical images is time-consuming and co...
research
05/23/2023

Improving Heterogeneous Model Reuse by Density Estimation

This paper studies multiparty learning, aiming to learn a model using th...
research
08/31/2021

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

Transliteration is very common on social media, but transliterated text ...

Please sign up or login with your details

Forgot password? Click here to reset