Extending Multilingual BERT to Low-Resource Languages

04/28/2020
by Zihan Wang, et al.

Multilingual BERT (M-BERT) has been a huge success in both supervised and zero-shot cross-lingual transfer learning. However, this success has focused only on the top 104 languages in Wikipedia that it was trained on. In this paper, we propose a simple but effective approach to extend M-BERT (E-MBERT) so that it can benefit any new language, and show that our approach benefits languages that are already in M-BERT as well. We perform an extensive set of experiments with Named Entity Recognition (NER) on 27 languages, only 16 of which are in M-BERT, and show an average increase of about 6% F1 on languages that are already in M-BERT and 23% F1 on new languages.
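Extending a pretrained multilingual model to a new language typically involves adding target-language tokens to the vocabulary and growing the embedding matrix to match before continued pretraining. The toy sketch below illustrates that general idea only; the function names, the example tokens, and the mean-initialization heuristic for new embedding rows are assumptions for illustration, not the paper's exact E-MBERT procedure.

```python
# Illustrative sketch of vocabulary extension for a multilingual model.
# All names and the mean-initialization heuristic are assumptions for
# illustration, not the paper's actual method.

def extend_vocab(vocab, new_tokens):
    """Append unseen target-language tokens to a token->id mapping."""
    vocab = dict(vocab)
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def extend_embeddings(table, new_size):
    """Grow a row-per-token embedding table to new_size rows.

    New rows start as the element-wise mean of the existing rows,
    a common heuristic for initializing embeddings of added tokens.
    """
    dim = len(table[0])
    mean = [sum(row[d] for row in table) / len(table) for d in range(dim)]
    return table + [list(mean) for _ in range(new_size - len(table))]

# Toy base vocabulary and 2-dimensional embedding table.
base_vocab = {"[UNK]": 0, "the": 1, "##ing": 2}
embeddings = [[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]

# Hypothetical new-language tokens; real pipelines would learn these
# from target-language text (e.g. with a subword tokenizer).
new_vocab = extend_vocab(base_vocab, ["saare", "jahan"])
embeddings = extend_embeddings(embeddings, len(new_vocab))
```

After this step, the extended model would be further pretrained on target-language text so the new rows move away from their generic initialization.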

