XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

05/01/2020
by Edoardo Maria Ponti et al.

To simulate human language capacity, natural language processing systems must complement the explicit information derived from raw text with the ability to reason about the possible causes and outcomes of everyday situations. Moreover, the acquired world knowledge should generalise to new languages, modulo cultural differences. Advances in machine commonsense reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages. We benchmark a range of state-of-the-art models on this novel dataset, revealing that current methods based on multilingual pretraining and zero-shot fine-tuning transfer suffer from the curse of multilinguality and fall well short of their performance in monolingual settings. Finally, we propose ways to adapt these models to out-of-sample, resource-lean languages where only a small corpus or a bilingual dictionary is available, and report substantial improvements over the random baseline. XCOPA is available at github.com/cambridgeltl/xcopa.
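Like the original COPA, each XCOPA item pairs a premise with two alternatives and asks which one is the more plausible cause or effect, so a system is scored by accuracy and a random guess lands at about 50%. The sketch below illustrates that evaluation loop under stated assumptions: the field names (premise, choice1, choice2, question, label) are assumed to mirror the released data layout, the scorer is a hypothetical stand-in for whatever model rates an alternative's plausibility, and the two example items are invented for illustration, not taken from the dataset.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical scorer signature: (premise, question, alternative) -> plausibility.
# In practice this would be a model; here it is only an assumed interface.
Scorer = Callable[[str, str, str], float]


@dataclass
class CopaItem:
    # Field names are an assumption modelled on the COPA-style layout.
    premise: str
    choice1: str
    choice2: str
    question: str  # "cause" or "effect"
    label: int     # 0 if choice1 is correct, 1 if choice2 is correct


def predict(item: CopaItem, score: Scorer) -> int:
    """Pick the alternative with the higher score (ties go to choice1)."""
    s1 = score(item.premise, item.question, item.choice1)
    s2 = score(item.premise, item.question, item.choice2)
    return 0 if s1 >= s2 else 1


def accuracy(items: List[CopaItem], score: Scorer) -> float:
    """Fraction of items where the predicted alternative matches the label."""
    if not items:
        raise ValueError("need at least one item")
    correct = sum(predict(it, score) == it.label for it in items)
    return correct / len(items)


def random_accuracy(items: List[CopaItem], seed: int = 0) -> float:
    """Random baseline: guess one of the two alternatives uniformly."""
    if not items:
        raise ValueError("need at least one item")
    rng = random.Random(seed)
    correct = sum(rng.randint(0, 1) == it.label for it in items)
    return correct / len(items)


# Invented toy items, for illustration only.
items = [
    CopaItem("The glass fell off the table.", "It shattered.",
             "It was refilled.", "effect", 0),
    CopaItem("The road was icy.", "The sun came out.",
             "It had snowed overnight.", "cause", 1),
]

PLAUSIBLE = {"It shattered.", "It had snowed overnight."}


def oracle(premise: str, question: str, alternative: str) -> float:
    # Toy "perfect" scorer that simply looks up the known answers.
    return 1.0 if alternative in PLAUSIBLE else 0.0


def constant(premise: str, question: str, alternative: str) -> float:
    # Degenerate scorer: always ties, so it always picks choice1.
    return 0.0


print(accuracy(items, oracle))    # 1.0
print(accuracy(items, constant))  # 0.5
```

With only two alternatives per item, any scorer that ignores the input behaves like a coin flip in expectation, which is why improvements are reported relative to the random baseline.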
