X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

09/16/2023
by Juan Diego Rodriguez, et al.

Understanding when two pieces of text convey the same information is a goal touching many subproblems in NLP, including textual entailment and fact-checking. This problem becomes more complex when those two pieces of text are in different languages. Here, we introduce X-PARADE (Cross-lingual Paragraph-level Analysis of Divergences and Entailments), the first cross-lingual dataset of paragraph-level information divergences. Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language, indicating whether a given piece of information is the same, new, or new but can be inferred. This last notion establishes a link with cross-language NLI. Aligned paragraphs are sourced from Wikipedia pages in different languages, reflecting real information divergences observed in the wild. Armed with our dataset, we investigate a diverse set of approaches for this problem, including classic token alignment from machine translation, textual entailment methods that localize their decisions, and prompting of large language models. Our results show that these methods vary in their capability to handle inferable information, but they all fall short of human performance.
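As a rough illustration of the span-level labeling scheme described above, the following Python sketch shows one way such an annotation could be represented. All names here (`SpanLabel`, `Span`, `ParagraphPair`, `texts_with`) are hypothetical and are not taken from the X-PARADE release. The sketch also assumes that spans are stored as character offsets into the target paragraph, and that any text outside an annotated span counts as "same".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SpanLabel(Enum):
    """The three categories named in the abstract (identifiers are hypothetical)."""
    SAME = "same"
    NEW = "new"
    INFERABLE = "new_but_inferable"


@dataclass
class Span:
    start: int        # character offset into the target paragraph, inclusive
    end: int          # character offset into the target paragraph, exclusive
    label: SpanLabel


@dataclass
class ParagraphPair:
    source: str                       # paragraph in the source language
    target: str                       # paragraph in the target language (the one annotated)
    spans: List[Span] = field(default_factory=list)

    def label_at(self, index: int) -> SpanLabel:
        """Return the label covering a target character position.

        Assumption: positions outside every annotated span are treated as SAME.
        """
        for span in self.spans:
            if span.start <= index < span.end:
                return span.label
        return SpanLabel.SAME

    def texts_with(self, label: SpanLabel) -> List[str]:
        """Return the target substrings of all spans carrying the given label."""
        return [self.target[s.start:s.end] for s in self.spans if s.label == label]
```

For example, with the made-up pair below, the year appears only in the target paragraph and would be marked as new:

```python
target = "Marie Curie was born in Warsaw in 1867."
start = target.index("in 1867")
pair = ParagraphPair(
    source="Marie Curie nació en Varsovia.",
    target=target,
    spans=[Span(start, start + len("in 1867"), SpanLabel.NEW)],
)
print(pair.texts_with(SpanLabel.NEW))
```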


