Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks

08/18/2021
by Weicheng Ma, et al.

This paper studies the relative importance of attention heads in Transformer-based models to aid their interpretability in cross-lingual and multi-lingual tasks. Prior research has found that only a few attention heads are important in each mono-lingual Natural Language Processing (NLP) task, and that pruning the remaining heads leads to comparable or improved model performance. However, the impact of pruning attention heads is not yet clear in cross-lingual and multi-lingual tasks. Through extensive experiments, we show that (1) pruning a number of attention heads in a multi-lingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks, and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments. Our experiments focus on sequence labeling tasks, with potential applicability to other cross-lingual and multi-lingual tasks. For comprehensiveness, we examine two pre-trained multi-lingual models, namely multi-lingual BERT (mBERT) and XLM-R, on three tasks across nine languages each. We also discuss the validity of our findings and their extensibility to truly resource-scarce languages and other task settings.
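As a rough illustration of the gradient-based head ranking mentioned above, the sketch below scores each attention head in mBERT by the magnitude of the loss gradient with respect to a head mask and prunes the lowest-scoring heads. This is a minimal sketch, not the authors' exact procedure: the toy labeled batch, the label count, and the 30% pruning ratio are illustrative assumptions, and the paper additionally selects how many heads to prune via a few trial experiments.

```python
# Minimal sketch: gradient-based attention-head ranking and pruning with mBERT.
# The batch, label count, and pruning ratio below are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "bert-base-multilingual-cased"  # mBERT, one of the two models studied
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=9)

# Toy labeled batch standing in for a sequence-labeling dataset (e.g. NER).
batch = tokenizer(["Paris is the capital of France ."], return_tensors="pt")
labels = torch.zeros_like(batch["input_ids"])  # placeholder tag ids

n_layers = model.config.num_hidden_layers
n_heads = model.config.num_attention_heads

# Head mask of ones; the gradient of the loss w.r.t. each mask entry serves as
# a proxy for that head's importance (in the spirit of Michel et al., 2019).
head_mask = torch.ones(n_layers, n_heads, requires_grad=True)

loss = model(**batch, labels=labels, head_mask=head_mask).loss
loss.backward()
importance = head_mask.grad.abs()  # larger gradient magnitude = more important head

# Rank heads and prune the lowest-scoring fraction (ratio chosen arbitrarily here).
k = int(0.3 * n_layers * n_heads)
prune_idx = importance.flatten().argsort()[:k]
heads_to_prune = {}
for idx in prune_idx.tolist():
    heads_to_prune.setdefault(idx // n_heads, []).append(idx % n_heads)
model.prune_heads(heads_to_prune)  # physically removes the selected heads
```

In practice one would average the gradient scores over a full (dev) set rather than a single batch, and sweep the pruning ratio on held-out data, which corresponds to the "few trial experiments" the abstract refers to.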
