Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank

09/29/2020
by Ethan C. Chau, et al.

Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively. We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings. Using dependency parsing of four diverse low-resource language varieties as a case study, we show that these methods significantly improve performance over baselines, especially in the lowest-resource cases, and demonstrate the importance of the relationship between such models' pretraining data and target language varieties.
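
To make the proposed adaptation concrete, below is a minimal sketch of the two steps using the HuggingFace Transformers and Datasets libraries. It is not the paper's actual implementation: the added wordpieces, the corpus path corpus.txt, and the training hyperparameters are placeholders, and the generic add-tokens-and-resize-embeddings mechanism stands in for whatever vocabulary-augmentation procedure the paper uses.

```python
# A minimal sketch (not the paper's actual implementation) of the two
# adaptation steps: vocabulary augmentation and continued language-specific
# masked language modeling (MLM) on mBERT. Uses HuggingFace Transformers
# and Datasets; all file paths, tokens, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Step 1: vocabulary augmentation. Add target-language wordpieces that
# mBERT's vocabulary lacks, then resize the embedding matrix to match.
# These tokens are hypothetical; in practice they would be selected from
# the unlabeled target-language corpus.
tokenizer.add_tokens(["##qeej", "##ntsa", "hmoob"])
model.resize_token_embeddings(len(tokenizer))

# Step 2: language-specific pretraining. Continue MLM training on a small
# monolingual corpus ("corpus.txt" is a placeholder path, one line per sentence).
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-adapted", num_train_epochs=20),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=0.15
    ),
)
trainer.train()

model.save_pretrained("mbert-adapted")
tokenizer.save_pretrained("mbert-adapted")
```

The adapted checkpoint and tokenizer would then serve as the encoder for a downstream dependency parser, per the paper's case study.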

Related research

12/23/2022
MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning
Transformer language models (TLMs) are critical for most NLP tasks, but ...

06/16/2021
Specializing Multilingual Language Models: An Empirical Study
Contextualized word representations from pretrained multilingual languag...

09/19/2019
Low-Resource Parsing with Crosslingual Contextualized Representations
Despite advances in dependency parsing, languages with small treebanks s...

12/31/2020
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
Massively multilingual language models such as multilingual BERT (mBERT)...

02/12/2021
A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich Languages
Neural dependency parsing has achieved remarkable performance for many d...

04/16/2023
Sabiá: Portuguese Large Language Models
As the capabilities of language models continue to advance, it is concei...

09/16/2021
Revisiting Tri-training of Dependency Parsers
We compare two orthogonal semi-supervised learning techniques, namely tr...
