On Efficiently Acquiring Annotations for Multilingual Models

04/03/2022
by Joel Ruben Antony Moniz, et al.

When a given problem must be supported in multiple languages, two approaches are common: training a separate model per language with the annotation budget divided equally among them, and training on a high-resource language followed by zero-shot transfer to the remaining languages. In this work, we show that joint learning across multiple languages with a single model performs substantially better than either alternative. We also demonstrate that active learning provides additional, complementary benefits: it makes the model data efficient by allowing it to arbitrate its annotation budget toward the languages it is less certain about. We illustrate the effectiveness of the proposed method on a diverse set of tasks: a classification task with 4 languages, a sequence tagging task with 4 languages, and a dependency parsing task with 5 languages. The proposed method, whilst simple, substantially outperforms the other viable alternatives for building a model in a multilingual setting under constrained budgets.
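To make the budget-arbitration idea concrete, here is a minimal sketch of pool-based active learning over a multilingual unlabeled pool. It assumes a trained classifier exposing a hypothetical `predict_proba(x)` method and scores examples from all languages jointly by predictive entropy; the names below are illustrative, not the paper's implementation.

```python
import numpy as np

def entropy(probs):
    """Predictive entropy of a class-probability vector (higher = less certain)."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs))

def select_batch(model, unlabeled_pool, budget):
    """Pick the `budget` most uncertain examples from a pool spanning all languages.

    `unlabeled_pool` is a list of (language, example) pairs, and `model.predict_proba`
    is assumed to return a class distribution for one example. Because examples from
    every language compete in a single ranked pool, languages the model is less
    certain about naturally receive a larger share of the annotation budget.
    """
    scores = [entropy(model.predict_proba(x)) for _, x in unlabeled_pool]
    ranked = np.argsort(scores)[::-1]  # most uncertain first
    return [unlabeled_pool[i] for i in ranked[:budget]]
```

Selected examples would then be sent for annotation and folded back into the joint training set, so the per-language split of the budget emerges from model uncertainty rather than being fixed in advance.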


