Intriguing Properties of Compression on Multilingual Models

11/04/2022
by   Kelechi Ogueji, et al.

Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile growth in model size with real-world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties. In contrast to prior findings, we find that compression may improve model robustness over dense models. We additionally observe that, under certain sparsification regimes, compression may aid, rather than disproportionately harm, the performance of low-resource languages.
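To make the setup concrete, the sketch below shows one way to sparsify mBERT while fine-tuning it for token classification (NER), using PyTorch's built-in pruning utilities. This is a minimal illustration rather than the paper's exact configuration: the 50% sparsity level, the restriction to linear layers, and the single-shot pruning schedule are assumptions made for the example.

```python
# Minimal sketch: magnitude-based sparsification of mBERT during NER fine-tuning.
# Sparsity amount, pruned layers, and schedule are illustrative assumptions,
# not the paper's exact experimental regime.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=9,  # e.g. CoNLL-style NER tag set
)

def sparsify(model, amount=0.5):
    """Globally zero out the smallest-magnitude weights across all linear layers.

    The pruning masks stay attached to the modules, so the induced sparsity
    pattern is preserved while the remaining weights are fine-tuned.
    """
    params = [
        (module, "weight")
        for module in model.modules()
        if isinstance(module, torch.nn.Linear)
    ]
    prune.global_unstructured(
        params, pruning_method=prune.L1Unstructured, amount=amount
    )
    return params

pruned_params = sparsify(model, amount=0.5)

# ... a standard token-classification fine-tuning loop on NER data would go
# here; the attached masks keep the pruned weights at zero throughout training.

# After fine-tuning, fold the masks into the weights to obtain a plain sparse model.
for module, name in pruned_params:
    prune.remove(module, name)
```

Keeping the masks attached during training (and only calling prune.remove afterwards) is what makes this "sparsification during fine-tuning" rather than post-hoc pruning: gradients update the surviving weights while the pruned entries remain zero in every forward pass.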


