Structure-Level Knowledge Distillation For Multilingual Sequence Labeling

04/08/2020
by Xinyu Wang, et al.

Multilingual sequence labeling is the task of predicting label sequences for multiple languages with a single unified model. Compared with maintaining multiple monolingual models, a single multilingual model offers a smaller model size, easier online serving, and better generalizability to low-resource languages. However, current multilingual models still significantly underperform individual monolingual models due to limited model capacity. In this paper, we propose to reduce the gap between the monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) into the unified multilingual model (student). We propose two novel knowledge distillation (KD) methods based on structure-level information: (1) one that approximately minimizes the distance between the student's and the teachers' structure-level probability distributions, and (2) one that aggregates the structure-level knowledge into local distributions and minimizes the distance between the student's and the teachers' local probability distributions. Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and the teacher models.
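The abstract only sketches the two distillation objectives, so here is a minimal, hypothetical PyTorch sketch of how such losses could look. It assumes the student factorizes its label distribution over tokens, that the teacher's structure-level distribution is approximated by its K-best label sequences (method 1), and that the "aggregated" structure-level knowledge is taken to be the teacher's token-level marginal distributions, e.g. from forward-backward over a CRF (method 2). All function names and tensor layouts are illustrative assumptions, not the paper's actual implementation.

```python
import torch


def topk_structure_kd_loss(student_log_probs, teacher_topk_seqs, teacher_topk_weights):
    """Sketch of method (1): structure-level KD via the teacher's K-best sequences.

    student_log_probs:   (batch, seq_len, num_labels) token-wise log-probabilities
                         from the student.
    teacher_topk_seqs:   (batch, K, seq_len) label indices of the teacher's K-best
                         sequences (e.g. from k-best Viterbi decoding of a CRF teacher).
    teacher_topk_weights:(batch, K) probabilities of those sequences, renormalized
                         so each row sums to 1.
    """
    batch, k, seq_len = teacher_topk_seqs.shape
    # Student log-probability of each teacher sequence: sum over tokens.
    expanded = student_log_probs.unsqueeze(1).expand(batch, k, seq_len, -1)
    token_lp = expanded.gather(-1, teacher_topk_seqs.unsqueeze(-1)).squeeze(-1)
    seq_log_probs = token_lp.sum(-1)                      # (batch, K)
    # Cross-entropy against the teacher's (truncated) structure-level distribution.
    return -(teacher_topk_weights * seq_log_probs).sum(-1).mean()


def posterior_kd_loss(student_log_probs, teacher_marginals):
    """Sketch of method (2): distill aggregated structure-level knowledge,
    represented here as the teacher's per-token marginal label distributions,
    into the student's local distributions via cross-entropy.

    teacher_marginals: (batch, seq_len, num_labels), each row sums to 1.
    """
    return -(teacher_marginals * student_log_probs).sum(-1).mean()
```

In a multilingual setup, each batch would be routed to the monolingual teacher matching its language, and either loss would typically be combined with the usual gold-label training loss via a weighting hyperparameter.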


