Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers

10/11/2022
by William Held, et al.

Multilingual transformer-based models demonstrate remarkable zero- and few-shot transfer across languages by learning and reusing language-agnostic features. However, as a fixed-size model acquires more languages, its performance across all languages degrades, a phenomenon termed interference. Often attributed to limited model capacity, interference is commonly addressed by adding parameters, despite evidence that transformer-based models are overparameterized. In this work, we show that it is possible to reduce interference by instead identifying and pruning language-specific parameters. First, we use Shapley Values, a credit allocation metric from coalitional game theory, to identify attention heads that introduce interference. Then, we show that removing the identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction, seeing gains as large as 24.7%. Finally, we provide insights into language-agnostic and language-specific attention heads using attention visualization.
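The abstract does not spell out how the Shapley Values are computed, but Shapley values over a set of players (here, attention heads) are typically approximated by Monte Carlo sampling over permutations. The sketch below is a minimal illustration under that assumption: `evaluate` is a hypothetical callback that returns the target-language metric with only the given heads active (with Hugging Face models, such a callback could be implemented via the `head_mask` forward argument), and the payoffs in the toy demo are invented purely for illustration.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def estimate_head_shapley(
    heads: Sequence[Head],
    evaluate: Callable[[FrozenSet[Head]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[Head, float]:
    """Monte Carlo permutation estimate of per-head Shapley values.

    `evaluate(active)` must return the task metric (e.g. target-language
    accuracy) with only the heads in `active` unmasked. Each head's Shapley
    value is its average marginal contribution over random orderings of the
    heads, which is the standard permutation estimator.
    """
    rng = random.Random(seed)
    values: Dict[Head, float] = {h: 0.0 for h in heads}
    for _ in range(num_permutations):
        order = list(heads)
        rng.shuffle(order)
        active: set = set()
        prev = evaluate(frozenset(active))  # metric with every head masked
        for h in order:
            active.add(h)
            score = evaluate(frozenset(active))
            values[h] += (score - prev) / num_permutations
            prev = score
    return values


def heads_to_prune(values: Dict[Head, float]) -> Dict[int, List[int]]:
    """Heads with negative Shapley value hurt the target language, so
    pruning them should help. Returns the {layer: [head, ...]} mapping
    that Hugging Face's `PreTrainedModel.prune_heads` accepts."""
    pruned: Dict[int, List[int]] = {}
    for (layer, head), v in values.items():
        if v < 0.0:
            pruned.setdefault(layer, []).append(head)
    return pruned


if __name__ == "__main__":
    # Toy demo: 2 layers x 2 heads; head (1, 1) interferes (negative payoff).
    contribution = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.25, (1, 1): -0.15}
    toy_eval = lambda active: sum(contribution[h] for h in active)
    vals = estimate_head_shapley(list(contribution), toy_eval, num_permutations=100)
    print(heads_to_prune(vals))  # -> {1: [1]}
```

Because the returned mapping matches the format of Hugging Face's `PreTrainedModel.prune_heads`, the negatively valued heads could then be removed from a fixed model in place, consistent with the paper's approach of pruning rather than adding parameters. The paper's actual evaluation protocol (tasks, languages, number of sampled permutations) is what underlies the reported 24.7% gains, not this sketch.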

Related research

- Inducing Language-Agnostic Multilingual Representations (08/20/2020): Multilingual representations have the potential to make cross-lingual sy...
- Adaptive Sparse Transformer for Multilingual Translation (04/15/2021): Multilingual machine translation has attracted much attention recently d...
- Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling (10/07/2021): While Transformer-based models have shown impressive language modeling p...
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment (10/06/2020): Modern multilingual models are trained on concatenated text from multipl...
- Summarizing Indian Languages using Multilingual Transformers based Models (03/29/2023): With the advent of multilingual models like mBART, mT5, IndicBART etc., ...
- Causes and Cures for Interference in Multilingual Translation (12/14/2022): Multilingual machine translation models can benefit from synergy between...
- Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? (05/19/2021): The automatic detection of humor poses a grand challenge for natural lan...
