
"Diversity and Uncertainty in Moderation" are the Key to Data Selection for Multilingual Few-shot Transfer

by Shanu Kumar, et al.

Few-shot transfer often shows substantial gains over zero-shot transfer <cit.>, making it a practically useful trade-off between fully supervised and unsupervised learning approaches for systems based on multilingual pretrained models. This paper explores various strategies for selecting data for annotation that can result in better few-shot transfer. The proposed approaches rely on multiple measures, such as data entropy computed with an n-gram language model, predictive entropy, and gradient embedding. We propose a loss embedding method for sequence labeling tasks, which induces diversity and uncertainty sampling similar to gradient embedding. The proposed data selection strategies are evaluated and compared on POS tagging, NER, and NLI tasks for up to 20 languages. Our experiments show that the gradient and loss embedding-based strategies consistently outperform random data selection baselines, with gains varying with the initial performance of the zero-shot transfer. Furthermore, the proposed method shows similar trends in improvement even when the model is fine-tuned on a smaller proportion of the original task-specific labeled training data for zero-shot transfer.
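To make one of the measures named above concrete, the following is a minimal sketch of selecting examples by predictive entropy alone. It is not the paper's gradient or loss embedding method, and it ignores the diversity component entirely. The function names, the toy probabilities, and the budget `k` are hypothetical and chosen only for illustration.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_most_uncertain(prob_rows, k):
    """Return indices of the k pool examples with the highest predictive entropy."""
    scores = [predictive_entropy(row) for row in prob_rows]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hypothetical model outputs for four unlabeled target-language examples
# (each row is a distribution over three classes).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]

# Assumed annotation budget of two examples.
selected = select_most_uncertain(pool_probs, k=2)
print(selected)
```

In this toy pool the near-uniform row and the 0.70/0.20/0.10 row score highest, so they would be sent for annotation first. A strategy that also accounts for diversity, as the abstract describes, would need an additional step beyond this ranking.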

