Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models

10/27/2022
by Harshita Diddee, et al.

Leveraging shared learning through massively multilingual models, state-of-the-art machine translation models can often adapt to the paucity of data for low-resource languages. However, this performance comes at the cost of significantly bloated models that are not practically deployable. Knowledge distillation is one popular technique for developing competitive, lightweight models, and in this work we first evaluate its use for compressing MT models, focusing on languages with extremely limited training data. Across 8 languages, we find that the performance of distilled models varies widely with priors such as the amount of synthetic data used for distillation, the student architecture, the training hyperparameters, and the confidence of the teacher model, making distillation a brittle compression mechanism. To mitigate this, we explore post-training quantization for compressing these models. We find that while distillation yields gains for some low-resource languages, quantization provides more consistent performance across the entire range of languages, especially the lowest-resource languages in our target set.
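To make the two compression routes concrete, here is a minimal sketch of (a) generating synthetic distillation data from a teacher and (b) applying post-training dynamic quantization. The paper does not prescribe a toolkit; the sketch assumes PyTorch and Hugging Face Transformers, and the Helsinki-NLP/opus-mt-en-de checkpoint is purely an illustrative stand-in for a massively multilingual teacher.

```python
# A minimal sketch contrasting the two compression routes discussed above.
# Assumption: PyTorch + Hugging Face Transformers; the checkpoint name is
# illustrative only and not the teacher used in the paper.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

teacher_name = "Helsinki-NLP/opus-mt-en-de"  # placeholder teacher checkpoint
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSeq2SeqLM.from_pretrained(teacher_name).eval()

# --- Route 1: sequence-level knowledge distillation ------------------------
# The teacher decodes monolingual source text to produce synthetic parallel
# data on which a smaller student is later trained; the size of this
# synthetic corpus is one of the priors distillation is sensitive to.
monolingual_sources = ["A short example sentence.", "Another source sentence."]
inputs = tokenizer(monolingual_sources, return_tensors="pt", padding=True)
with torch.no_grad():
    synthetic_ids = teacher.generate(**inputs, num_beams=5, max_length=64)
synthetic_targets = tokenizer.batch_decode(synthetic_ids, skip_special_tokens=True)
# (source, synthetic_target) pairs would then form the student's training set.

# --- Route 2: post-training dynamic quantization ---------------------------
# No retraining or calibration data: Linear weights are stored in int8 and
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    teacher, {torch.nn.Linear}, dtype=torch.qint8
)
with torch.no_grad():
    out = quantized.generate(
        **tokenizer("A short example sentence.", return_tensors="pt"),
        max_length=64,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The contrast mirrors the abstract's point: distillation introduces several sensitive choices (synthetic corpus size, student architecture, decoding settings), whereas post-training dynamic quantization requires essentially no choices beyond the target dtype.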


