What does it mean to be language-agnostic? Probing multilingual sentence encoders for typological properties

09/27/2020
by   Rochelle Choenni, et al.

Multilingual sentence encoders have seen much success in cross-lingual model transfer for downstream NLP tasks. Yet, we know relatively little about the properties of individual languages or the general patterns of linguistic variation that they encode. We propose methods for probing sentence representations from state-of-the-art multilingual encoders (LASER, M-BERT, XLM and XLM-R) with respect to a range of typological properties pertaining to lexical, morphological and syntactic structure. In addition, we investigate how this information is distributed across all layers of the models. Our results show interesting differences in encoding linguistic variation associated with different pretraining strategies.
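The core idea of probing can be illustrated with a minimal sketch: train a simple linear classifier on frozen sentence embeddings to predict a typological property, and take above-chance accuracy as evidence that the property is encoded. Everything below is illustrative, not the paper's exact protocol — random vectors with an injected signal stand in for real encoder outputs (LASER, M-BERT, etc.), and the binary label (e.g. SOV vs. SVO word order) is a hypothetical example feature.

```python
# Toy probing classifier in pure Python: a logistic-regression probe
# trained on fixed "sentence embeddings" to predict a binary typological
# property. Real probes would use actual encoder outputs; here we fake them.
import math
import random

random.seed(0)
DIM = 32   # toy embedding size (real encoders use 768+)
N = 200    # number of labelled sentences

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fake_embedding(label):
    """Random vector; the first few dimensions carry a weak label signal."""
    v = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    if label == 1:
        for i in range(4):
            v[i] += 1.0
    return v

labels = [random.randint(0, 1) for _ in range(N)]
embeds = [fake_embedding(y) for y in labels]

# Train the probe (a single linear layer) by stochastic gradient descent
# on the log-loss; the encoder itself stays frozen.
w = [0.0] * DIM
b = 0.0
lr = 0.05
for _ in range(50):
    for x, y in zip(embeds, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        g = sigmoid(z) - y            # gradient of log-loss w.r.t. z
        for i in range(DIM):
            w[i] -= lr * g * x[i]
        b -= lr * g

# If the probe beats chance, the property is linearly decodable
# from the representations.
correct = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5) == (y == 1)
    for x, y in zip(embeds, labels)
)
accuracy = correct / N
print(f"probe accuracy: {accuracy:.2f}")
```

To probe how information is distributed across layers, as the paper does, one would repeat this with embeddings taken from each layer of the encoder and compare probe accuracies layer by layer.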


