Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

12/19/2022
by Ercong Nie, et al.

Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization, and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families, in both unlabeled settings (+5.1%) and labeled settings, and it also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between the high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
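The pipeline described above can be illustrated with a short sketch: retrieve the most similar high-resource-language examples for a low-resource-language input using a multilingual sentence encoder, prepend them as prompt context, and let a multilingual masked language model fill a cloze-style label slot. The models, prompt template, label words, and example pool below are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of cross-lingual retrieval-augmented prompting in the spirit of PARC.
# Model names, the prompt template, and the verbalizer words are assumptions.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Multilingual sentence encoder used to retrieve semantically similar
# high-resource-language (HRL) examples for a low-resource-language (LRL) input.
retriever = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# A tiny English (HRL) pool of labeled sentiment examples (assumed toy data).
hrl_pool = [
    ("The movie was wonderful and moving.", "good"),
    ("A dull, lifeless film that wastes your time.", "bad"),
    ("Fantastic acting and a gripping story.", "good"),
    ("Boring plot and sloppy pacing throughout.", "bad"),
]
hrl_texts = [text for text, _ in hrl_pool]
hrl_embs = retriever.encode(hrl_texts, convert_to_tensor=True)

# Multilingual masked LM used as the zero-shot classifier via a cloze prompt.
mlm = pipeline("fill-mask", model="bert-base-multilingual-cased")

def parc_predict(lrl_sentence: str, k: int = 2) -> str:
    """Retrieve the k most similar HRL examples and prepend them as prompt context."""
    query_emb = retriever.encode(lrl_sentence, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, hrl_embs, top_k=k)[0]

    # Cross-lingual context: retrieved HRL sentences with their label words
    # (labeled setting); in the unlabeled setting the labels would instead be
    # predicted by the masked LM rather than taken from annotations.
    context = " ".join(
        f"{hrl_texts[h['corpus_id']]} It was {hrl_pool[h['corpus_id']][1]}."
        for h in hits
    )
    prompt = f"{context} {lrl_sentence} It was {mlm.tokenizer.mask_token}."

    # Score only the verbalizer words and return the higher-scoring label.
    scores = {cand["token_str"]: cand["score"]
              for cand in mlm(prompt, targets=["good", "bad"])}
    return max(scores, key=scores.get)

print(parc_predict("Dieser Film war langweilig."))  # illustrative LRL-style input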

