It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

06/22/2021
by Alexey Tikhonov, et al.

Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back progress for languages other than English. Pretrained cross-lingual models are a source of powerful language-agnostic representations, yet their inherent reasoning capabilities remain under active study. In this work, we design a simple approach to commonsense reasoning that trains a linear classifier with the weights of multi-head attention as features. To evaluate this approach, we create a multilingual Winograd Schema corpus by processing several datasets from prior work within a standardized pipeline, and we measure cross-lingual generalization in terms of out-of-sample performance. The method performs competitively with recent supervised and unsupervised approaches to commonsense reasoning, even when applied to other languages in a zero-shot manner. Moreover, we show that most of the performance comes from the same small subset of attention heads across all studied languages, which provides evidence of universal reasoning capabilities in multilingual encoders.
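The core idea can be sketched as follows: extract the per-head attention weights that a multilingual encoder assigns from the ambiguous pronoun to each candidate antecedent, and feed them as features to a linear classifier. The sketch below is an illustrative assumption rather than the authors' released pipeline; the choice of xlm-roberta-base, the pronoun-to-candidate feature definition, the token-matching heuristic, and the toy training pairs are all placeholders.

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Multilingual encoder with attention outputs exposed (model choice is an assumption).
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base", output_attentions=True)
model.eval()

def token_positions(tokens, word):
    # Crude matching of a word to sentencepiece tokens; a real pipeline
    # would align character spans instead.
    word = word.lower()
    return [i for i, t in enumerate(tokens) if t.lstrip("▁").lower() == word]

def attention_features(sentence, pronoun, candidate):
    # One feature per (layer, head): average attention mass flowing from the
    # pronoun tokens to the candidate-antecedent tokens.
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)  # out.attentions: one (1, heads, seq, seq) tensor per layer
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    pron = token_positions(tokens, pronoun)
    cand = token_positions(tokens, candidate)
    n_layers, n_heads = len(out.attentions), out.attentions[0].shape[1]
    if not pron or not cand:
        return np.zeros(n_layers * n_heads)
    feats = []
    for layer_att in out.attentions:
        att = layer_att[0]                    # (heads, seq, seq)
        block = att[:, pron][:, :, cand]      # (heads, |pron|, |cand|)
        feats.append(block.mean(dim=(1, 2)))  # (heads,)
    return torch.cat(feats).numpy()

# Toy Winograd-style pairs (hypothetical data): label 1 if the candidate is
# the correct antecedent of the pronoun, 0 otherwise.
examples = [
    ("The trophy didn't fit in the suitcase because it was too big.", "it", "trophy", 1),
    ("The trophy didn't fit in the suitcase because it was too big.", "it", "suitcase", 0),
]

X = np.stack([attention_features(s, p, c) for s, p, c, _ in examples])
y = np.array([label for *_, label in examples])

# Linear classifier over attention-head features.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))

In the zero-shot setting described in the abstract, a classifier trained on English features of this kind would be applied directly to attention features extracted from sentences in other languages.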


