A multilabel approach to morphosyntactic probing

04/17/2021
by Naomi Tachikawa Shapiro, et al.

We introduce a multilabel probing task to assess the morphosyntactic representations of word embeddings from multilingual language models. We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages of varying morphological complexity: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but robust paradigm, we show that multilingual BERT renders many morphosyntactic features easily and simultaneously extractable (e.g., gender, grammatical case, pronominal type). We further evaluate the probes on six "held-out" languages in a zero-shot transfer setting: Arabic, Chinese, Marathi, Slovenian, Tagalog, and Yoruba. This style of probing has the added benefit of revealing the linguistic properties that language models recognize as being shared across languages. For instance, the probes performed well at recognizing nouns in the held-out languages, suggesting that multilingual BERT has a conception of noun-hood that transcends individual languages; yet, the same was not true of adjectives.
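To make the paradigm concrete, here is a minimal sketch of a multilabel probe: a single linear layer with one sigmoid output per morphosyntactic feature, trained on frozen embedding vectors. The feature names, toy embeddings, and labels below are illustrative assumptions, not the paper's actual data or architecture.

```python
import math
import random

# Hypothetical feature inventory; real probes would use Universal
# Dependencies-style features (gender, case, pronominal type, ...).
FEATURES = ["Noun", "Plural", "Nominative"]
DIM = 4  # toy embedding dimension

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MultilabelProbe:
    """One linear layer; each label gets an independent sigmoid output,
    so several features can be predicted simultaneously for one token."""

    def __init__(self, dim, n_labels, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                  for _ in range(n_labels)]
        self.b = [0.0] * n_labels

    def forward(self, x):
        return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
                for row, b in zip(self.w, self.b)]

    def train(self, data, epochs=200, lr=0.5):
        # Per-label binary cross-entropy, minimized by plain SGD.
        # Only the probe is updated; the embeddings stay frozen.
        for _ in range(epochs):
            for x, y in data:
                p = self.forward(x)
                for k in range(len(self.b)):
                    g = p[k] - y[k]  # dBCE/dlogit for label k
                    for j in range(len(x)):
                        self.w[k][j] -= lr * g * x[j]
                    self.b[k] -= lr * g

# Toy "frozen embeddings" paired with binary feature labels.
data = [
    ([1.0, 0.0, 1.0, 0.0], [1, 0, 1]),
    ([1.0, 1.0, 0.0, 0.0], [1, 1, 0]),
    ([0.0, 0.0, 0.0, 1.0], [0, 0, 0]),
    ([0.0, 1.0, 1.0, 1.0], [0, 1, 1]),
]

probe = MultilabelProbe(DIM, len(FEATURES))
probe.train(data)
preds = [[p > 0.5 for p in probe.forward(x)] for x, _ in data]
```

The key design point is the independent sigmoid per label (rather than one softmax over all features), which is what lets a single probe read out several morphosyntactic properties of a word at once.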


