AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference

05/13/2023
by   Qianglong Chen, et al.

Knowledge distillation is key to deploying multilingual pre-trained language models in real applications. To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers into a single student. We first introduce an adaptive learning strategy and a teacher importance weight, which enable the student to learn effectively from max-margin teachers and adapt easily to new languages. Moreover, we present a shared student encoder with language-specific projection layers to support multiple languages, which substantially reduces development and machine costs. Experimental results show that AMTSS achieves competitive performance on the public XNLI dataset and on AliExpress (AE), a realistic industrial dataset from the e-commerce scenario.
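The abstract does not give implementation details, so the following is a minimal PyTorch sketch of how the described pieces might fit together: a single student with a shared encoder and per-language projection layers, and a distillation loss that weights each teacher by its prediction margin so the student learns most from max-margin teachers. All module names, dimensions, and the specific margin-based weighting are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the AMTSS idea as described in the abstract.
# Names, dimensions, and the margin-based weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AMTSSStudent(nn.Module):
    """Single student: one shared encoder plus one projection head per language."""

    def __init__(self, input_dim, hidden_dim, num_labels, languages):
        super().__init__()
        # Stand-in for the shared multilingual student encoder.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # One lightweight projection layer per supported language.
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(hidden_dim, num_labels) for lang in languages}
        )

    def forward(self, features, lang):
        return self.heads[lang](self.encoder(features))


def teacher_importance_weights(teacher_logits, temperature=1.0):
    """Weight each teacher by its prediction margin (top-1 minus top-2 probability),
    so confident (max-margin) teachers contribute more to the student."""
    margins = []
    for logits in teacher_logits:
        probs = F.softmax(logits / temperature, dim=-1)
        top2 = probs.topk(2, dim=-1).values            # (batch, 2)
        margins.append(top2[:, 0] - top2[:, 1])        # (batch,)
    margins = torch.stack(margins, dim=0)              # (num_teachers, batch)
    return F.softmax(margins, dim=0)                   # normalize over teachers


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hard-label cross-entropy plus margin-weighted KL to each teacher."""
    weights = teacher_importance_weights(teacher_logits, temperature)
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = 0.0
    for w, logits in zip(weights, teacher_logits):
        p_teacher = F.softmax(logits / temperature, dim=-1)
        per_example = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1)
        kd = kd + (w * per_example).mean() * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce


if __name__ == "__main__":
    student = AMTSSStudent(input_dim=64, hidden_dim=32, num_labels=3,
                           languages=["en", "es", "zh"])
    feats = torch.randn(8, 64)
    labels = torch.randint(0, 3, (8,))
    # Two hypothetical teachers (e.g., one per language group) producing logits.
    teachers = [torch.randn(8, 3), torch.randn(8, 3)]
    loss = distillation_loss(student(feats, "en"), teachers, labels)
    loss.backward()
    print(float(loss))
```

Because only the small projection heads differ across languages, adding a new language under this sketch would mean training one additional head (and distilling from its teacher) while reusing the shared encoder, which is consistent with the cost reduction the abstract claims.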
