Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer

05/12/2021
by Błażej Dolicki, et al.

There is increasing evidence that, when a target language has little or no data, training on a different language can yield surprisingly good results. However, there are currently no established guidelines for choosing the training (source) language. In an attempt to address this, we thoroughly analyze a state-of-the-art multilingual model and try to determine what drives good transfer between languages. Unlike the majority of the multilingual NLP literature, we do not train only on English, but on a group of almost 30 languages. We show that looking at particular syntactic features is 2-4 times more helpful in predicting performance than an aggregated syntactic similarity. We find that the importance of syntactic features differs strongly depending on the downstream task: no single feature is a good performance predictor for all NLP tasks. As a result, one should not expect that for a target language L_1 there is a single language L_2 that is the best choice for every NLP task (for instance, for Bulgarian, the best source language is French on POS tagging, Russian on NER and Thai on NLI). We discuss the most important linguistic features affecting transfer quality using statistical and machine learning methods.
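The contrast between individual syntactic features and an aggregated similarity can be illustrated with a toy sketch. Everything in it is hypothetical: the binary feature vectors, the language names (`src_a` to `src_d`), the transfer scores, and the use of Pearson correlation are assumptions made for illustration, not the authors' data or method.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def feature_matches(source, target):
    """1 where source and target share a feature value, else 0."""
    return [int(s == t) for s, t in zip(source, target)]


def aggregated_similarity(source, target):
    """Fraction of features on which source and target agree."""
    matches = feature_matches(source, target)
    return sum(matches) / len(matches)


# Hypothetical binary syntactic features for one target language
# and four candidate source languages (not real typological data).
target = [1, 0, 1, 1]
sources = {
    "src_a": [1, 0, 0, 0],
    "src_b": [1, 1, 1, 1],
    "src_c": [0, 0, 1, 1],
    "src_d": [0, 1, 0, 0],
}
# Hypothetical transfer scores on a single downstream task.
scores = {"src_a": 0.80, "src_b": 0.78, "src_c": 0.55, "src_d": 0.50}

names = list(sources)
task_scores = [scores[n] for n in names]

# Correlation of the aggregated similarity with the task scores.
agg = [aggregated_similarity(sources[n], target) for n in names]
agg_r = pearson(agg, task_scores)

# Correlation of each individual feature match with the task scores.
per_feature_r = [
    pearson([feature_matches(sources[n], target)[i] for n in names], task_scores)
    for i in range(len(target))
]

print("aggregated similarity r:", round(agg_r, 3))
print("per-feature r:", [round(r, 3) for r in per_feature_r])
```

In this made-up data the first feature tracks the scores more closely than the aggregated similarity does. Repeating the comparison with scores from a different task could single out a different feature, which is the kind of task dependence described above.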


