Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

07/01/2023
by Pin-Jie Lin, et al.

Developing effective spoken language processing systems for low-resource languages is challenging because parallel data is scarce and resources for fine-tuning models are limited. In this work, we aim to improve both text classification and translation for Nigerian Pidgin (Naija). We collect a large-scale parallel English-Pidgin corpus and propose a cross-lingual adaptive training framework that combines continual and task adaptive training to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with improvements of up to 2.38 BLEU. They also demonstrate that augmenting orthographic data and using task adaptive training with back-translation can have a significant impact on model performance.
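To make the back-translation idea concrete, here is a minimal sketch of how synthetic parallel data is commonly built: monolingual target-side sentences (Pidgin here) are passed through a reverse translation model to obtain synthetic source-side sentences (English), and the resulting pairs are merged with the real parallel corpus. This is not the authors' implementation. The function names `back_translate` and `augment`, the translation direction, and the toy dictionary standing in for a trained Pidgin-to-English model are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (English source, Pidgin target)


def back_translate(
    monolingual_target: List[str],
    target_to_source: Callable[[str], str],
) -> List[Pair]:
    """Pair each monolingual target sentence with a synthetic source sentence.

    `target_to_source` is a hypothetical stand-in for a trained reverse
    (Pidgin-to-English) translation model.
    """
    pairs: List[Pair] = []
    for tgt in monolingual_target:
        synthetic_src = target_to_source(tgt)
        # Skip sentences the reverse model could not translate.
        if synthetic_src.strip():
            pairs.append((synthetic_src, tgt))
    return pairs


def augment(parallel: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Merge real and synthetic pairs, dropping exact duplicates."""
    seen = set(parallel)
    merged = list(parallel)
    for pair in synthetic:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged


# Toy reverse "model": a lookup table used purely for illustration.
toy_pidgin_to_english = {
    "How you dey?": "How are you?",
    "I no know": "I don't know",
}

parallel = [("How are you?", "How you dey?")]
monolingual_pidgin = ["How you dey?", "I no know", "Wetin dey happen?"]

synthetic = back_translate(
    monolingual_pidgin, lambda s: toy_pidgin_to_english.get(s, "")
)
merged = augment(parallel, synthetic)
print(merged)
```

In a real setting the lookup table would be replaced by a trained translation model, and the merged pairs would feed the task adaptive training step; how synthetic and real pairs are weighted is a design choice the sketch leaves open.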


