Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese

04/18/2023
by   Vésteinn Snæbjarnarson, et al.
0

Multilingual language models have pushed state-of-the-art in cross-lingual NLP transfer. The majority of zero-shot cross-lingual transfer, however, use one and the same massively multilingual transformer (e.g., mBERT or XLM-R) to transfer to all target languages, irrespective of their typological, etymological, and phylogenetic relations to other languages. In particular, readily available data and models of resource-rich sibling languages are often ignored. In this work, we empirically show, in a case study for Faroese – a low-resource language from a high-resource language family – that by leveraging the phylogenetic information and departing from the 'one-size-fits-all' paradigm, one can improve cross-lingual transfer to low-resource languages. In particular, we leverage abundant resources of other Scandinavian languages (i.e., Danish, Norwegian, Swedish, and Icelandic) for the benefit of Faroese. Our evaluation results show that we can substantially improve the transfer performance to Faroese by exploiting data and models of closely-related high-resource languages. Further, we release a new web corpus of Faroese and Faroese datasets for named entity recognition (NER), semantic text similarity (STS), and new language models trained on all Scandinavian languages.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/30/2020

MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer

The main goal behind state-of-the-art pretrained multilingual models suc...
research
12/19/2022

Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

Multilingual Pretrained Language Models (MPLMs) have shown their strong ...
research
11/09/2022

Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes

Providing better language tools for low-resource and endangered language...
research
11/02/2021

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Speech processing systems currently do not support the vast majority of ...
research
05/13/2023

The Geometry of Multilingual Language Models: An Equality Lens

Understanding the representations of different languages in multilingual...
research
09/14/2022

Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models

As pre-trained language models become more resource-demanding, the inequ...
research
03/31/2020

Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks

It is now established that modern neural language models can be successf...

Please sign up or login with your details

Forgot password? Click here to reset