JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset

12/07/2022
by   Ruth-Ann Armstrong, et al.
0

JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models. While our work, along with previous work, shows that transfer from these models to low-resource languages that are unrelated to languages in their training set is not very effective, we would expect stronger results from transfer to creoles. Indeed, our experiments show considerably better results from few-shot learning of JamPatoisNLI than for such unrelated languages, and help us begin to understand how the unique relationship between creoles and their high-resource base languages affect cross-lingual transfer. JamPatoisNLI, which consists of naturally-occurring premises and expert-written hypotheses, is a step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/19/2022

Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

Multilingual Pretrained Language Models (MPLMs) have shown their strong ...
research
05/27/2019

XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

While natural language processing systems often focus on a single langua...
research
11/09/2022

Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes

Providing better language tools for low-resource and endangered language...
research
10/08/2018

Zero-Resource Multilingual Model Transfer: Learning What to Share

Modern natural language processing and understanding applications have e...
research
02/22/2021

Bilingual Language Modeling, A transfer learning technique for Roman Urdu

Pretrained language models are now of widespread use in Natural Language...
research
02/16/2019

Exploring Language Similarities with Dimensionality Reduction Technique

In recent years several novel models were developed to process natural l...
research
10/02/2020

Automatic Extraction of Rules Governing Morphological Agreement

Creating a descriptive grammar of a language is an indispensable step fo...

Please sign up or login with your details

Forgot password? Click here to reset