A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings

12/15/2019
by Niels van der Heijden, et al.

The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP). Therefore, many recent studies focus on zero-shot transfer learning and joint training across languages to overcome data scarcity for low-resource languages. In this work we (i) perform a comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part-of-speech (POS) tagging; and (ii) propose a new method for creating multilingual contextualized word embeddings, compare it to multiple baselines, and show that it performs at or above state-of-the-art level in zero-shot transfer settings. Finally, we show that our method allows for better knowledge sharing across languages in a joint training setting.
