XNLI 2.0: Improving XNLI dataset and performance on Cross Lingual Understanding (XLU)

01/16/2023
by Ankit Kumar Upadhyay, et al.

Natural language processing systems depend heavily on annotated data to train practical models, and most models are trained on English datasets. Recently, the growing need to work in many languages has driven significant advances in multilingual understanding. In particular, the many pre-trained multilingual models now available can be applied to cross-lingual understanding tasks. Combining cross-lingual understanding with Natural Language Inference (NLI) makes it possible to train models whose applications extend beyond the training language. Machine translation lets us avoid the laborious manual translation of datasets from one language to another. In this work, we focus on improving the original XNLI dataset by re-translating the MNLI dataset into all 14 non-English languages present in XNLI, together with the XNLI test and dev sets, using Google Translate. We also train models in all 15 languages and analyze their performance on the natural language inference task. Finally, we investigate whether performance in low-resource languages such as Swahili and Urdu can be improved by training models in languages other than English.
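The re-translation step described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' pipeline: `NLIExample`, `translate_examples`, `build_multilingual_corpus`, and the placeholder `tag_translator` are hypothetical names, and the placeholder only stands in for a call to a real machine translation service such as Google Translate. The point it shows is that premise and hypothesis are translated while the label is carried over unchanged.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List

# ISO 639-1 codes of the 14 non-English XNLI languages.
XNLI_TARGET_LANGS = [
    "ar", "bg", "de", "el", "es", "fr", "hi",
    "ru", "sw", "th", "tr", "ur", "vi", "zh",
]


@dataclass(frozen=True)
class NLIExample:
    """One NLI pair (hypothetical container, not from the paper)."""
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"
    lang: str = "en"


# Assumed translator interface: (text, target_lang) -> translated text.
Translator = Callable[[str, str], str]


def translate_examples(
    examples: Iterable[NLIExample], translate: Translator, target_lang: str
) -> List[NLIExample]:
    """Translate premise and hypothesis; the label is copied as-is."""
    return [
        replace(
            ex,
            premise=translate(ex.premise, target_lang),
            hypothesis=translate(ex.hypothesis, target_lang),
            lang=target_lang,
        )
        for ex in examples
    ]


def build_multilingual_corpus(
    examples: Iterable[NLIExample],
    translate: Translator,
    langs: Iterable[str] = tuple(XNLI_TARGET_LANGS),
) -> Dict[str, List[NLIExample]]:
    """Return one copy of the dataset per language, keyed by language code."""
    source = list(examples)
    corpus = {"en": source}
    for lang in langs:
        corpus[lang] = translate_examples(source, translate, lang)
    return corpus


def tag_translator(text: str, target_lang: str) -> str:
    """Placeholder for a real MT call: it only tags the text with the language."""
    return f"[{target_lang}] {text}"


if __name__ == "__main__":
    seed = [NLIExample("A man is eating.", "Someone eats.", "entailment")]
    corpus = build_multilingual_corpus(seed, tag_translator)
    print(len(corpus), corpus["sw"][0])
```

Keeping the translator behind a plain function argument means the same loop can be reused for the training, dev, and test splits, and a different translation backend can be swapped in without touching the rest of the code.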


